Archive for the ‘Computing’ Category

Distributions in Mathematica and R/S-PLUS

Tuesday, July 1st, 2008

I posted some notes this evening on working with probability distributions in Mathematica and R/S-PLUS.

I much prefer Mathematica’s syntax. The first time I had to read some R code I ran across a statement something like runif(1, 3, 4). I thought it was some sort of conditional execution statement: run something if some condition holds. No, the code generates a random value uniformly from the interval (3, 4). The corresponding Mathematica syntax is Random[ UniformDistribution[3,4] ].

Another example. The statement pnorm(x, m, s) in R corresponds to CDF[ NormalDistribution[m, s], x ] in Mathematica. Both evaluate the CDF of a normal random variable with mean m and standard deviation s at the point x.
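
For readers who know neither system, here is a rough Python rendering of the same calls, using only the standard library. In R's naming scheme the r prefix draws random values, the p prefix evaluates the cumulative distribution function, and the d prefix evaluates the density. The function names below are borrowed from R purely for illustration; they are not part of Python.

```python
import random
from statistics import NormalDist


def runif_one(low, high):
    """Rough analogue of R's runif(1, low, high): one uniform draw."""
    return random.uniform(low, high)


def pnorm(x, mean, sd):
    """Rough analogue of R's pnorm(x, mean, sd): the normal CDF at x."""
    return NormalDist(mean, sd).cdf(x)


def dnorm(x, mean, sd):
    """Rough analogue of R's dnorm(x, mean, sd): the normal density at x."""
    return NormalDist(mean, sd).pdf(x)


print(runif_one(3, 4))       # a random value between 3 and 4
print(pnorm(0.0, 0.0, 1.0))  # 0.5, since half the mass lies below the mean
print(dnorm(0.0, 0.0, 1.0))  # the peak height of the standard normal density
```

Python lands somewhere between the two styles: the distribution is an object with a spelled-out name, as in Mathematica, while cdf and pdf are short method names.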

It’s a matter of taste. Some people prefer terse notation, especially for things they use frequently. I’d rather type more and remember less.

Mathematica turns 20

Tuesday, July 1st, 2008

Mathematica was first released June 23, 1988. I started using Mathematica not long after it came out and used it for a few years. Then for several years after that I didn’t touch it. When I began using Mathematica again, I expected that, like Rip Van Winkle, I’d find many things had changed while I was gone. Instead, I was pleasantly surprised how easy it was to start using it again.

Mathematica syntax is simple, consistent, and predictable. They got this right twenty years ago and stuck to it. They’ve managed to grow over the years without alienating users, even those of us who take a long hiatus from using the product. I’ve used Mathematica more or less regularly over the last few years, but I’ll still go for weeks at a time without using it. It’s easy to pick up every time I return to it. (The opposite of my experience with Perl.)

Monitoring legacy code that fails silently

Tuesday, June 24th, 2008

Clift Norris and I just posted an article on CodeProject entitled Monitoring Unreliable Scheduled Tasks about some software Clift wrote to resolve problems we had calling some legacy software that would fail silently. His software adds from the outside monitoring and logging functions that better software would have provided on the inside.

The monitoring and logging software, called RunAndWait, kicks off a child process and waits a specified amount of time for the process to complete. If the child does not complete in time, a list of people is notified by email. The software also checks return codes and writes all its activity to a log.
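
The general pattern is easy to sketch. The Python below is not the RunAndWait code from the article, only a minimal illustration of the same idea under some assumptions: the name run_and_wait and its parameters are made up here, and the email step is replaced by a hypothetical notify callback so the example stays self-contained.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")


def run_and_wait(command, timeout_seconds, notify):
    """Run command as a child process and wait up to timeout_seconds.

    Returns the child's exit code, or None if it ran too long and was
    killed. notify is a hypothetical stand-in for the email step; it is
    called with a one-line description of any problem.
    """
    logging.info("starting %s", command)
    try:
        # subprocess.run kills the child itself when the timeout expires.
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        message = f"{command} did not finish within {timeout_seconds} seconds"
        logging.error(message)
        notify(message)
        return None
    if completed.returncode != 0:
        message = f"{command} exited with return code {completed.returncode}"
        logging.error(message)
        notify(message)
    else:
        logging.info("%s completed normally", command)
    return completed.returncode


if __name__ == "__main__":
    # A child that fails with exit code 3; print stands in for sending email.
    run_and_wait([sys.executable, "-c", "import sys; sys.exit(3)"], 10, print)
```

Passing the notification step in as a function keeps the monitoring logic independent of how people actually get told, whether by email, pager, or something else.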

RunAndWait is a simple program, but it has proven very useful over the last year and a half since it was written. We use RunAndWait in combination with PowerShell for scheduling our nightly processes to interact with the legacy system. Since PowerShell has verbose error reporting, calling RunAndWait from PowerShell rather than from cmd.exe gives additional protection against possible silent failures.

Managing passwords II

Monday, June 16th, 2008

PasswordMaker is a clever solution to the problem of managing passwords. Instead of storing passwords for each web site, you use their software to generate a unique password for each site. The idea is quite simple: use a master password and a one-way hash function to turn the URL of a site into the password for that site. For each site, this generates the same password each time. Each site has a different password, but you only have one password to remember.

You don’t have to use the URL. You can come up with any string you want to identify a context where you need a password. But the URL is a natural choice.
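
Here is a minimal Python sketch of the idea. It is not PasswordMaker's actual algorithm or output format. The choice of HMAC-SHA256, the base64 encoding, the default length of 12, and the name site_password are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def site_password(master_password, site, length=12):
    """Derive a per-site password from a master password and a site string.

    Illustration only: a keyed one-way hash (HMAC-SHA256) of the site
    string, encoded as text and cut to the requested length.
    """
    digest = hmac.new(master_password.encode("utf-8"),
                      site.encode("utf-8"),
                      hashlib.sha256).digest()
    # The 32-byte digest encodes to 44 characters, so length can be up to 44.
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# The same inputs always give the same password; different sites differ.
print(site_password("correct horse", "example.com"))
print(site_password("correct horse", "example.org"))
```

Because the hash is one-way, someone who learns the password for one site cannot easily work backward to the master password, though a weak master password could still be guessed.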

The software comes in many variations: browser-based, command line, etc.

Managing passwords

Friday, May 23rd, 2008

When everything you do requires a different password, how do you keep up with them all? The most common solution is to use the same username and password in as many contexts as possible. Not only is this ill-advised, it’s not all that practical. Maybe someone else is using your favorite username. Maybe your favorite password is too short or too long for some contexts, etc. So you end up with dozens of minor variations on a preferred username/password pair.

One solution is to keep all your passwords in one place and have a strong password that unlocks your password collection. A security professional friend of mine recommends Password Safe for this purpose. It works well as long as you’re at your own computer or at a computer where you can access Password Safe on a flash drive, but not if you’re using a public computer.

Another solution is to use a third party authentication service like OpenID. Jeff Atwood posted a thorough discussion of the pros and cons of OpenID on his blog yesterday. OpenID can reduce the number of passwords you need to manage, but it won’t cut the number down much until more sites accept OpenID.

Customizing the PowerShell command prompt II

Tuesday, May 13th, 2008

I just picked up a copy of Windows PowerShell Cookbook by Lee Holmes. One of the first examples in the book is customizing the PowerShell command prompt. His example sets the command window title as part of the prompt function. For example, adding

$host.UI.RawUI.WindowTitle = "$env:computername $($pwd.Path)"

to the function given in my previous post would display the computer name and full path to the working directory in the title bar. The full code would be

function prompt
{
    $m = 30 # maximum prompt length
    $str = $pwd.Path
    if ($str.length -ge $m)
    {
        # The prompt will begin with "...",
        # end with ">", and in between contain
        # as many of the path characters as will fit,
        # reading from the end of the path.
        $str = "..." + $str.substring($str.length - $m + 4)
    }
    $host.UI.RawUI.WindowTitle = "$env:computername $($pwd.Path)"
    "$str> "
}

Customizing the PowerShell command prompt

Monday, May 12th, 2008

By default, the PowerShell command prompt shows the full path of the current working directory after a PS prefix. To customize the command prompt, simply create a function named prompt. If you want this customization to persist, add it to your profile.

For example, adding the following line to your profile will cause the working directory to be displayed much like it is in cmd.exe.

function prompt { "$pwd>" }

However, the prompt function can contain any code at all. Here’s a prompt function that will display the right-most part of the working directory. This keeps long working directory names from taking up most of the space at the command line.

function prompt
{
    $m = 30 # maximum prompt length
    $str = $pwd.Path
    if ($str.length -ge $m)
    {
        # The prompt will begin with "...",
        # end with ">", and in between contain
        # as many of the path characters as will fit,
        # reading from the end of the path.
        $str = "..." + $str.substring($str.length - $m + 4)
    }
    "$str> "
}

For example, if

C:\Documents and Settings\Administrator\My Documents\My Music

is the current directory, the prompt would be

...ator\My Documents\My Music>
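
To check the arithmetic, here is the same truncation rule transcribed into Python. The function name prompt_text is made up for this sketch. The prompt is "..." (3 characters), then the last m - 4 characters of the path, then ">", for m characters in all before the trailing space.

```python
def prompt_text(path, max_length=30):
    """Return the prompt string for path, shortened from the left if needed."""
    if len(path) >= max_length:
        # Keep the last (max_length - 4) characters: 3 are reserved for
        # the leading "..." and 1 for the closing ">".
        path = "..." + path[len(path) - max_length + 4:]
    return path + "> "


print(prompt_text(r"C:\Documents and Settings\Administrator\My Documents\My Music"))
print(prompt_text(r"C:\bin"))
```

The 61-character path in the example is cut at index 35, which falls inside "Administrator" and leaves "ator" as the first visible piece.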

Update: See the next post for an update.

Wikipedia in 10 GB

Friday, May 9th, 2008

The Stack Overflow podcast, episode 4, mentioned in passing that the Wikipedia database is about 10 GB. I was surprised it isn’t bigger. If that size is correct, you could download a snapshot of Wikipedia to your local hard drive.

Top five gotchas when learning PowerShell

Friday, May 2nd, 2008

Here is my list of the top five gotchas when learning Windows PowerShell.

5. PowerShell will not run scripts by default.

4. PowerShell requires .\ to run a script in the current directory.

3. PowerShell uses -eq, -gt, etc. for comparison operators.

2. PowerShell uses backquote as the escape character.

1. PowerShell separates function arguments with spaces, not commas.

See PowerShell gotchas for more details and an explanation for why PowerShell made the design decisions it did. As surprising as these features are, there are good reasons for each.

Readable path listings

Thursday, May 1st, 2008

Windows has never made it easy to read long environment variables. If I display the path on one machine I get something like this, both from cmd and from PowerShell.

C:\bin;C:\bin\Python25\;C:\bin\TeX\miktex\bin;C:\bin\TeX\MiKTeX\miktex\bin;C:\bin\Perl\bin\;C:\Program Files\Compaq\Compaq Management Agents\Dmi\Win32\Bin; ...

The System Properties window is worse since you can only see a tiny slice of your path at a time.

screen shot of path UI

Here’s a PowerShell one-liner to produce a readable path listing:

$env:path -replace ";", "`n"

This produces

C:\bin
C:\bin\Python25\
C:\bin\TeX\miktex\bin
C:\bin\TeX\MiKTeX\miktex\bin
C:\bin\Perl\bin\
C:\Program Files\Compaq\Compaq Management Agents\Dmi\Win32\Bin
...

(If you’re not familiar with PowerShell, note the backquote before the n. The sequence `n is PowerShell’s newline character, which here replaces each semicolon. This is one of the most unconventional features of PowerShell since backslash is the escape character in most contexts. Because Windows uses backslashes as path separators, PowerShell could not use backslash as an escape character. Think of the backquote as a little backslash. Once you get over the initial shock, you get used to the backquote quickly.)

Update: It occurred to me after the original post that there’s an even simpler way to display the path.

$env:path.split(';')
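
For comparison, the same listing can be produced from Python, which is handy on machines without PowerShell. This is only a sketch. The function name path_entries is made up, and dropping empty entries is a choice made here that the one-liners above do not make.

```python
import os


def path_entries(path_value=None, separator=os.pathsep):
    """Split a PATH-style string into a list of directories.

    With no arguments, reads the PATH environment variable and uses the
    platform's own separator (";" on Windows, ":" on Unix).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Skip the empty strings produced by doubled or trailing separators.
    return [entry for entry in path_value.split(separator) if entry]


for entry in path_entries():
    print(entry)
```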

Integrating the clipboard and the command line

Wednesday, April 30th, 2008

Two of my favorite cmdlets from the PowerShell Community Extensions are get-clipboard and out-clipboard. These cmdlets let you read from and write to the Windows clipboard from PowerShell. For example, the following code will grab the contents of the clipboard, replace every block of white-space with a comma, and paste the result back to the clipboard.

(get-clipboard) -replace '\s+(?!$)', ',' | out-clipboard 

I saved this to a file comma.ps1 in my path and run it when I get a list of numbers from one program delimited by newlines or tabs and need to make it the input to another program expecting comma-delimited values. For example, turning a column of numbers into an array for R. I copy one format, run comma.ps1, and paste in the new format.

In case you’re curious about the mysterious characters in the script, \s+(?!$) is a regular expression describing where I want to substitute a comma. The \s refers to white-space characters (tabs, spaces, newlines) and the + says this is repeated one or more times. So match one or more consecutive white-space characters. That would be enough by itself, but it would replace trailing white-space with a comma too, so I might get an unwanted comma at the end. The sequence (?!$) fixes that. The $ matches the end of line. The (?! before and the ) after form a negative look ahead, meaning “except when the thing inside matches.” So taken all together, the regular expression matches chunks of white-space except at the end of the input.
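
The same pattern can be tried outside PowerShell. Here is a small Python sketch using the standard re module, whose handling of \s, + and negative look-ahead matches the description above for inputs like these. The function name is made up, and the clipboard steps are left out.

```python
import re


def whitespace_to_commas(text):
    """Replace each run of white-space with a comma, except at the very end."""
    return re.sub(r"\s+(?!$)", ",", text)


# A column of numbers copied with a trailing newline keeps that newline
# and gains no trailing comma.
print(repr(whitespace_to_commas("1\n2\t3\n")))  # '1,2,3\n'
print(repr(whitespace_to_commas("1 2   3")))    # '1,2,3'
```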

Preventing an unpleasant Sweave surprise

Tuesday, April 29th, 2008

Sweave is a tool for making statistical analyses more reproducible by using literate programming in statistics. Sweave embeds R code inside LaTeX and replaces the code with the result of running the code, much like web development languages such as PHP embed code inside HTML.

Sweave is often launched from an interactive R session, but this can defeat the whole purpose of the tool. When you run Sweave this way, the Sweave document inherits the session’s state. Here’s why that’s a bad idea.

Say you’re interactively tinkering with some plots to make them look like you want. As you go, you’re copying R code into an Sweave file. When you’re done, you run Sweave on your file, compile the resulting LaTeX document, and get beautiful output. You congratulate yourself on having gone to the effort to put all your R code in an Sweave file so that it will be self-contained and reproducible. You forget about your project then revisit it six months later. You run Sweave and to your chagrin it doesn’t work. What happened? What might have happened is that your Sweave file depended on a variable that wasn’t defined in the file itself but happened to be defined in your R session. When you open up R months later and run Sweave, that variable may be missing. Or worse, you happen to have a variable in your session with the right name that now has some unrelated value.

I recommend always running Sweave from a batch file. On Windows you can save the following two lines to a file, say sw.bat, and process a file foo.Rnw with the command sw foo.

  R.exe -e "Sweave('%1.Rnw')"
  pdflatex.exe %1.tex

This assumes R.exe and pdflatex.exe are in your path. If they are not, you could either add them to your path or put their full paths in the batch file.
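
On other platforms, or if you prefer a script to a batch file, the same two steps can be driven from Python. This is a sketch under assumptions: the names sweave_commands and build are made up, and the program names R and pdflatex are assumed to be on your path, just as with the batch file.

```python
import subprocess


def sweave_commands(stem, r_program="R", latex_program="pdflatex"):
    """Return the two commands that process stem.Rnw, in order."""
    return [
        [r_program, "-e", f"Sweave('{stem}.Rnw')"],
        [latex_program, f"{stem}.tex"],
    ]


def build(stem):
    """Run Sweave in a fresh R process, then compile the resulting LaTeX."""
    for command in sweave_commands(stem):
        # check=True raises CalledProcessError on a nonzero exit code,
        # so a failure in R stops the build before pdflatex runs.
        subprocess.run(command, check=True)


print(sweave_commands("foo"))
```

Calling build("foo") plays the role of sw foo. One difference from the two-line batch file is that this version stops as soon as either step fails.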

Running Sweave from a clean session does not ensure that your file is self-contained. There could still be other implicit dependencies. But running from a clean session improves the chances that someone else will be able to reproduce the results.

See Troubleshooting Sweave for some suggestions for how to prevent or recover from other possible problems with Sweave.

Update: See the links provided by Gregor Gorjanc in the first comment below for related batch files and bash scripts.

60-second description of feeds

Monday, April 28th, 2008

If you don’t know what a “feed” is, as in RSS feed etc., here’s a 60-second audio explanation.

Audio clip from Sixty Second Tech

Transcript

One program to rule them all

Sunday, April 27th, 2008

Do you have a single program that you “live in” when you’re at a computer? Emacs users are known for “living” inside Emacs. This means more than just using the program for a large part of the day. It means using the program as the integration point for other programs, a sort of backplane for tying other things together.

Steve Yegge’s most recent blog post described his switch from Windows to Mac. He said the main reason for the switch was that he prefers the appearance of the fonts on a Mac. Changing operating systems was not a big deal for Yegge because he didn’t really live in Windows before, nor does he live in OS X now. He lives in Emacs. He concludes his essay by saying

So I’ll keep using my Macs. They’re all just plumbing for Emacs, anyway. And now my plumbing has nicer fonts.

Graphic artists may spend the majority of their work day using Photoshop, but they don’t send email from Photoshop, and they don’t keep their calendar in Photoshop. So I wouldn’t say they “live” in Photoshop. Microsoft developers spend a great deal of their time inside Visual Studio, though they don’t live inside Visual Studio to the same extent that Emacs users live inside Emacs. The Visual Studio experience is somewhere between Photoshop and Emacs on the “live in” scale. Unlike Emacs, Visual Studio has no ambition to become an operating system, probably because the company that makes Visual Studio already has an operating system.

I once knew someone who lived in Mathematica, doing his word processing etc. inside this mathematical package. Mathematica is a nice place to visit, but I wouldn’t want to live there. 

A growing number of people now live inside their web browser, particularly if that browser is Firefox. There are Firefox plug-ins available to mow your lawn and take your children to the orthodontist. Maybe Firefox is becoming the Emacs of a new generation.

The choice of a program to live in is really a choice of how you want to tie applications together. To live in Emacs, you have to write Emacs Lisp, and that’s a deal-breaker for many. Interestingly, Microsoft has a project to create a highly configurable editor some have nicknamed Emacs.NET. You can bet that the extension language will not be Emacs Lisp.

Some people live in their command shell and use shell scripts to tie everything together. While many Unix folks live that way, that hasn’t been practical on Windows until recently when PowerShell came out.

By the way, you can run PowerShell and Emacs at the same time. See Jeffrey Snover’s blog post PowerShell Running Inside of Emacs.

Cross-platform PowerShell

Friday, April 18th, 2008

I just found out there’s a project called Pash to create an open source, cross-platform version of Microsoft’s PowerShell. That should be very interesting.