Customizing the PowerShell command prompt

By default, the PowerShell command prompt does not echo the current working directory. To customize the command prompt, simply create a function named prompt. If you want this customization to persist, add it to your profile.

For example, adding the following line to your profile will cause the working directory to be displayed much like it is in cmd.exe.

    function prompt { "$pwd>" }

However, the prompt function can contain any code at all. Here’s a prompt function that will display the right-most part of the working directory. This keeps long working directory names from taking up most of the space at the command line.

    function prompt
    {
        $m = 30 # maximum prompt length
        $str = $pwd.Path
        if ($str.length -ge $m)
        {
            # The prompt will begin with "...",
            # end with ">", and in between contain
            # as many of the path characters as will fit,
            # reading from the end of the path.
            $str = "..." + $str.substring($str.length - $m + 4)
        }
       "$str> "
    }

For example, if

    C:Documents and SettingsAdministratorMy DocumentsMy Music

is the current directory, the prompt would be

    ...atorMy DocumentsMy Music>

Update: See the next post for an update.

Jenga mathematics

Jenga is a game where you start with a tower of wooden pegs and take turns removing pegs until someone makes the tower collapse. A style of mathematics analogous to Jenga reached the height of its popularity about 40 years ago and then fell out of fashion. I use the phrase “Jenga mathematics” to refer to generalizing a well-known theorem by weakening its hypotheses, seeing how many pegs you can pull out before it falls.

Many 20th century mathematicians spent their careers going over the work of 19th century mathematicians, removing every hypothesis they could. Sometimes a 20th century mathematician would get his name tacked on to a 19th century theorem due to his Jenga accomplishments.

Taken to extremes, Jenga mathematics turns theorems inside-out and proofs become hypotheses. Natural hypotheses are replaced with a laundry list of properties necessary to make the proof work. Start with some theorem of the form “Let X be a widget. Then X has a foozle.” Go back over the proof and see just what features of a widget are needed for the proof. Then restate the theorem as “Let X have the following apparently arbitrary list of properties necessary for my proof to work. Then X has a foozle.” Never mind whether anybody can think of anything other that a widget that satisfies the hypotheses of the new theorem.

Jenga mathematics is no longer fashionable. Mathematicians still value removing unneeded hypotheses, but they’re not as willing to go to extremes to do so. They are more interested in building new towers than in removing every piece possible from old towers.

Publishing correct sample code

It’s infuriating to read published sample code that’s wrong. Sometimes code given in books is not even syntactically correct. I’ve wondered why publishers didn’t have a way to verify that the code at least compiles, and maybe even check that it gives the stated output.

Dave Thomas said in recent interview that his publishing company, The Pragmatic Programmers, does just that. Authors write in a logical mark-up language and software turns that into a publishable form, compiling code samples and inserting the output. Sample code from one of their books is more likely to work the first time you type it in than code from other publishers.

Regular expressions in C++ TR1

Regular expressions are not a part of the C++ Standard Library quite yet, but there is a document (Technical Report 1, or TR1) that includes among other things a specification for regular expression support that will probably be added to the C++ standard eventually.

The Boost library has supported TR1 for a while. Microsoft just released a feature pack for Visual Studio 2008 a month ago that includes support for most of TR1. (They’ve left out support for mathematical special functions.) And Dinkumware sells a complete TR1 implementation.

I’ve added some notes to my website for getting started with C++ TR1 regular expressions. I took my PowerShell regex notes as a starting point and implemented some of the same examples in C++. I changed the organization though, because the C++ implementation is fairly different from PowerShell.

Working with regular expressions is harder in C++ than in scripting languages such as Perl or Python, but not unnecessarily so. C++ is optimized for fine-grained control and efficiency rather than ease of use; that’s what C++ is for. The TR1 implementation is internally consistent and elegant in its own way.

It’s easy to find API-level documentation but harder to find examples for getting started. (I’ve heard good things about Pete Becker’s book The C++ Standard Library Extensions but I haven’t read it.) So I decided to keep some notes as I played with the Visual Studio implementation. I imagine most of the content applies to other implementations, but I’ve only tested the examples using Visual Studio.

Update: GCC just added support for C++ TR1 two days ago with their version 4.3 release.  However, it appears support for regular expressions is not included.

Biased estimators

An unbiased estimator, very roughly speaking, is a statistic that gives the correct result on average. For a precise definition, see Wikipedia. Unbiasedness is an intuitively desirable property. In fact, it seems indispensable at first.

In the colloquial sense, “bias” is practically synonymous with self-serving dishonesty. Who wants a self-serving, dishonest statistical estimate? But it’s important to remember that “bias” in statistical sense has a technical meaning that may not correspond to the colloquial meaning.

Here’s the big problem with statistical bias: if U is an unbiased estimator of θ, f(U) is NOT an unbiased estimator of f(θ) in general. For example, standard deviation is the square root of variance, but the square root of an unbiased estimator for variance is not an unbiased estimator for standard deviation. This shows bias has nothing to do with accuracy, since the square root of an accurate estimation of variance is an accurate estimate of standard deviation. In fact, unbiased estimators can be terrible.

The fact that unbiasedness is not preserved under transformations calls into question its usefulness. People seldom care directly about abstract statistical parameters directly. Instead they care about some calculation based on those parameters. An unbiased estimate of the parameters does not generally lead to an unbiased estimate of what people really want to estimate.

LINQ to Regex

Roy Osherove just posted an article about his Introducing LINQ to Regex project.

LINQ stands for Language INtegrated Query, a way of baking query support into .NET programming languages. Microsoft has been promising a unified way to query all kinds of data for years now.  Along the way they came out with a score of new libraries that were going to be the solution. They’d work for all kinds of data that happened to look very much like a relational database. But now with LINQ they’ve finally delivered something that works well not only with relational data but also with hierarchical data such as XML. With LINQ to Regex, you can query unstructured text with LINQ as well.

There are two big advantages to LINQ. First, you can query different kinds of data sources with similar code. Second, “language integrated” means that your programming language knows about your query language, making strong typing and better tool support possible. (By contrast, if you have a SQL statement inside VB, for example, VB knows nothing about SQL. The SQL command is just a string as far as VB is concerned. If the SQL is malformed, you won’t know until runtime. But with LINQ, malformed queries generate compile errors.)

Update: See Scott Hanselman’s discussion of LINQ to Regex.

Reusable code vs. re-editable code

In a recent interview, Donald Knuth made this comment about reusable code. I also must confess to a strong bias against the fashion for reusable code.

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.

Knuth didn’t elaborate on what he means by “re-editable” code, but I assume he means code that is easy to maintain. The best chance most code has at reuse is remaining useful in its original project over multiple versions, so maybe we’d get more reuse if we focused more on maintainability.

I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.

Top five gotchas when learning PowerShell

Here is my list of the top five gotchas when learning Windows PowerShell.

5. PowerShell will not run scripts by default.

4. PowerShell requires . to run a script in the current directory.

3. PowerShell uses -eq, -gt, etc. for comparison operators.

2. PowerShell uses backquote as the escape character.

1. PowerShell separates function arguments with spaces, not commas.

See PowerShell gotchas for more details and an explanation for why PowerShell made the design decisions it did. As surprising as these features are, there are good reasons for each.

Readable path listings

Windows has never made it easy to read long environment variables. If I display the path on one machine I get something like this, both from cmd and from PowerShell.

C:bin;C:binPython25;C:binTeXmiktexbin;C:binTeXMiKTeXmiktexbin;C:binPerlbin;C:ProgramFilesCompaqCompaq Management AgentsDmiWin32Bin; ...

The System Properties window is worse since you can only see a tiny slice of your path at a time.

screen shot of path UI

Here’s a PowerShell one-liner to produce readable path listing:

$env:path -replace ";", "`n"

This produces

C:bin
C:binPython25
C:binTeXmiktexbin
C:binTeXMiKTeXmiktexbin
C:binPerlbin
C:Program FilesCompaqCompaq Management AgentsDmiWin32Bin
...

(If you’re not familiar with PowerShell, note the backquote before the n to indicate the newline character to replace semicolons. This is one of the most unconventional features of PowerShell since backslash is the escape character in most contexts. Because Windows uses either forward or backward slashes as path separators, PowerShell could not use backslash as an escape character. Think of the backquote as a little backslash. Once you get over the initial shock, you get used to the backquote quickly.)

Update: It occurred to me after the original post that there’s an even simpler way to display the path.

$env:path.split(';')