DSLs in PowerShell

In an earlier post, I quoted John Lam saying that one reason Ruby is such a good language for implementing DSLs (domain-specific languages) is that function calls do not require parentheses. This allows DSL authors to create functions that look like new keywords. I believe I heard Bruce Payette say in an interview that Ruby had some influence on the design of PowerShell. Maybe Ruby influenced the PowerShell team’s decision not to use parentheses around function arguments. (A bigger factor was convenience at the command line and shell language tradition.)

In what ways has Ruby influenced PowerShell? And if Ruby is good for implementing DSLs, how good would PowerShell be?
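To make the "functions that look like keywords" idea concrete, here is a minimal sketch. The function name repeat and its parameters are made up for illustration; nothing here comes from the PowerShell team or from Keith Hill's post.

```powershell
# Hypothetical mini-DSL: 'repeat' is an ordinary function, not a language keyword.
function repeat {
    param([int]$Count, [scriptblock]$Body)
    for ($i = 1; $i -le $Count; $i++) {
        # Run the caller's block, passing the current iteration number.
        & $Body $i
    }
}

# Called without parentheses or commas, it reads like a built-in loop construct.
$squares = repeat 3 { param($n) $n * $n }
```

Because arguments are separated by spaces and a script block can be passed as the last argument, the call site looks much like a for or foreach statement.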

Update: See Keith Hill’s blog post on PowerShell function names and DSLs.

Negative space in operating systems

Unix advocates often say Unix is great because it has all these powerful tools. And yet practically every Unix tool has been ported to Windows. So why not just run Unix tools on Windows so that you have access to both tool sets? Sounds reasonable, but hardly anyone does that. People either use Unix tools on Unix or Windows tools on Windows.

Part of the reason is compatibility. Not binary compatibility, but cultural compatibility. There’s a mental tax for shifting modes of thinking as you switch tools.

I think the reason why few people use Unix tools on Windows is a sort of negative space. Artists use the term negative space to discuss the importance of what is not in a work of art, such as the white space around a figure or the silence framing a melody.

Similarly, part of what makes an operating system culture is what is not there. You don’t have to worry about what’s not there. And not worrying about something frees up brain capacity to think about something else. Having too many options can be paralyzing. I think that even though people say they like Unix for what is there, they actually value what is not there.

Table-driven text munging in PowerShell

In my previous post, I mentioned formatting C++ code as HTML by doing some regular expression substitutions. I often need to write something that carries out a list of pattern substitutions, so I decided to rewrite the previous script to read a list of patterns from a file. Another advantage of putting the list of substitutions in an external file is that the same file could be used from scripts written in other languages.

Here’s the code:

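param($regex_file)

# Each line of the file holds a pattern and its replacement, separated by whitespace.
$lines = get-content $regex_file

# The text to transform comes from the clipboard (PSCX cmdlet).
$a = get-clipboard

foreach( $line in $lines )
{
    # Collapse runs of whitespace to one space, then split into (pattern, replacement).
    $line = ($line.trim() -replace "\s+", " ")
    $pair = $line.split(" ", [StringSplitOptions]::RemoveEmptyEntries)
    # -replace accepts the two-element array as pattern and replacement.
    $a = $a -replace $pair
}

# Put the result back on the clipboard.
out-clipboard $a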

The part of the script that is unique to formatting C++ as HTML is moved to a separate file, say cpp2html.txt, that is passed in as an argument to the script.

&  &amp;
<  &lt;
>  &gt;
"  &quot;
'  &#39;

Now I could use the same PowerShell script for any sort of task that boils down to a list of pattern replacements. (Often this kind of rough translation does not have to be done perfectly. It only has to be done well enough to reduce the amount of leftover manual work to an acceptable level. You start with a small list of patterns and add more patterns until it’s less work to do the remaining work by hand than to make the script smarter.)

Note that the order of the lines in the file can be important. Substitutions are done from the top of the list down. In the example above, we want to first convert & to &amp; then convert < to &lt;. Otherwise, < would first become &lt; and then become &amp;lt;.
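The effect of ordering is easy to check without touching the clipboard. The following is only a sketch: the helper name Invoke-ReplacementTable is hypothetical, and it takes the rules as an array of strings rather than reading them from a file.

```powershell
# Hypothetical helper: apply "pattern replacement" rules to a string, top to bottom.
function Invoke-ReplacementTable {
    param([string]$Text, [string[]]$Rules)
    foreach ($rule in $Rules) {
        # Split each rule on the first run of whitespace into pattern and replacement.
        $pair = $rule.Trim() -split '\s+', 2
        $Text = $Text -replace $pair[0], $pair[1]
    }
    $Text
}

# Ampersand rule first: the intended result.
$good = Invoke-ReplacementTable 'p < q' @('&  &amp;', '<  &lt;')   # p &lt; q

# Ampersand rule last: the & introduced by &lt; gets escaped a second time.
$bad  = Invoke-ReplacementTable 'p < q' @('<  &lt;', '&  &amp;')   # p &amp;lt; q
```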

Manipulating the clipboard with PowerShell

The PowerShell Community Extensions contain a couple handy cmdlets for working with the Windows clipboard: Get-Clipboard and Out-Clipboard. One way to use these cmdlets is to copy some text to the clipboard, munge it, and paste it somewhere else. This lets you avoid creating a temporary file just to run a script on it.

For example, occasionally I need to copy some C++ source code and paste it into HTML in a <pre> block. While <pre> turns off normal HTML formatting, special characters still need to be escaped: < and > need to be turned into &lt; and &gt; etc. I can copy the code from Visual Studio, run a script html.ps1 from PowerShell, and paste the code into my HTML editor. (I like to use Expression Web.)

The script html.ps1 looks like this.

$a = get-clipboard
# Escape & first so the entities added below are not escaped again.
$a = $a -replace "&", "&amp;"
$a = $a -replace "<", "&lt;"
$a = $a -replace ">", "&gt;"
$a = $a -replace '"', "&quot;"
$a = $a -replace "'", "&#39;"
out-clipboard $a

So this C++ code

double& x = y;
char c = 'k';
string foo = "hello";
if (p < q) ...

turns into this HTML code

double&amp; x = y;
char c = &#39;k&#39;;
string foo = &quot;hello&quot;;
if (p &lt; q) ...
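As an aside, the .NET framework has a built-in encoder that covers the same characters. This is a swapped-in alternative rather than what html.ps1 does, and it assumes a .NET version that includes System.Net.WebUtility, which older installations may not have.

```powershell
# Sample C++ fragment to escape; the variable names are only for illustration.
$cpp = 'if (p < q && r > s) ...'

# HtmlEncode escapes &, <, > and quotes in a single call.
$html = [System.Net.WebUtility]::HtmlEncode($cpp)
# $html is now: if (p &lt; q &amp;&amp; r &gt; s) ...
```

The hand-written replacements remain useful when you want to control exactly which characters are touched.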

Of course the PSCX clipboard cmdlets are useful for more than HTML encoding. For example, I wrote a post a few months ago about using them for a similar text manipulation problem.

If you’re going to do much text manipulation, you may want to look at these notes on regular expressions in PowerShell.

The only problem I’ve had with the PSCX clipboard cmdlets is copying formatted text. The cmdlets work as expected when copying plain text. But here’s what I got when I copied the word “snippets” from the CodeProject home page and ran Get-Clipboard:

Version:0.9
StartHTML:00000136
EndHTML:00000214
StartFragment:00000170
EndFragment:00000178
SourceURL:http://www.codeproject.com/
<html><body>
<!--StartFragment-->snippets<!--EndFragment-->
</body>
</html>

The Get-Clipboard cmdlet has a -Text option that you might think would copy content as text, but as far as I can tell the option does nothing. This may be addressed in a future release of PSCX; it has been assigned a work item.
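Until that is fixed, one possible workaround is to pull the text out from between the fragment markers yourself. The sketch below works on a copy of the output shown above stored in a here-string; in practice the string would come from Get-Clipboard. The tag-stripping step is deliberately rough.

```powershell
# Sample clipboard text in the HTML clipboard format, copied from the output above.
$clip = @'
Version:0.9
StartHTML:00000136
EndHTML:00000214
StartFragment:00000170
EndFragment:00000178
SourceURL:http://www.codeproject.com/
<html><body>
<!--StartFragment-->snippets<!--EndFragment-->
</body>
</html>
'@

# Grab whatever sits between the fragment markers; (?s) lets . match newlines too.
$fragment = $null
if ($clip -match '(?s)<!--StartFragment-->(.*?)<!--EndFragment-->') {
    $fragment = $matches[1]
}

# Crude tag removal in case the fragment itself contains markup.
$plain = $fragment -replace '<[^>]+>', ''
```

This ignores the byte offsets in the header and relies only on the comment markers, which is simpler but assumes the markers are present.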

Regular expressions in PowerShell and Perl

This is one of the most popular pages on my web site:

Regular expressions in PowerShell and Perl

It’s about how you use regular expressions in PowerShell — how to do matches, replacements, etc. — rather than the grammar of regular expressions. It makes comparisons to Perl, in case you’re already familiar with how to use regular expressions there.
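For a quick taste of what that page covers, here is a short sketch of the two operators used most often. The sample strings are invented for illustration.

```powershell
$line = 'Error 404 at 12:30'

# -match returns $true or $false and fills the automatic $matches table.
if ($line -match 'Error (\d+)') {
    $code = $matches[1]      # '404'
}

# -replace substitutes every match; $1, $2, ... refer to capture groups.
# Single quotes keep PowerShell from expanding $1 as a variable.
$swapped = '2008-07-04' -replace '(\d+)-(\d+)-(\d+)', '$2/$3/$1'   # '07/04/2008'

# Both operators ignore case by default; -cmatch is the case-sensitive form.
$ci = 'PowerShell' -match 'powershell'    # $true
$cs = 'PowerShell' -cmatch 'powershell'   # $false
```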

Experimenting with Out-Speech in PowerShell

I’ve played around with the PSCX script Out-Speech at home and at work. At home, running Vista, words come out in a natural female voice. At work, running XP, words come out in a robotic male voice.

The voice is somewhat configurable. I didn’t try it at home, but at work I opened the Speech Properties applet in the control panel. All three voices offered there are mechanical. I went to Microsoft’s web site to see if I could download a natural voice. The site said that Microsoft does not provide other voices but it gives a link to third party providers.

My guess is that Microsoft deliberately put lame voices in XP for fear of a lawsuit and that they were braver by the time Vista was released.

Another difference I noticed between Vista and XP is tolerance of misspellings. XP will correctly pronounce “Fahrenheit” but pronounces the incorrect “Farenheit” so that it rhymes with “heat” rather than “height”. Vista correctly pronounces the misspelled word.

Depend on objects, not their presentation

The most recent blog post by Jeffrey Snover emphasizes that PowerShell pipes objects, not text. When you use single PowerShell commands, you can get the impression that they output text.  But everything is an object until the pipeline spills onto the command line.

In UNIX, text output is effectively a programming contract because that is what the whole system is built upon.  One command outputs text and other programs know what to expect so they parse the text to get the appropriate data elements so that they can code against it.  In this model, if you change the text output of a command – you run the risk of breaking a bunch of scripts.  … In PowerShell … We reserve the right to radically change our text rendering to improve our customer experience.

(Emphasis in the original.)

The object interfaces won’t change, but the text rendering probably will.
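A small sketch of what depending on objects looks like in practice. The choice of Get-Process and of these particular properties is mine, not from Snover's post.

```powershell
# Sort and select by property name; nothing here parses column positions.
$top = Get-Process |
    Sort-Object WorkingSet64 -Descending |
    Select-Object -First 3 Name, Id

# The values are still typed data that later code can use directly.
$firstId = @($top)[0].Id

# Text appears only at the very end, and the same objects can be rendered
# in different layouts without affecting the code above.
$asTable = $top | Format-Table | Out-String
$asList  = $top | Format-List  | Out-String
```

A script that read the process ID out of the table text would break if the column layout changed; one that reads the Id property would not.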

PowerShell posts classified

Here’s a summary of the blog posts I’ve written so far regarding PowerShell, grouped by topic.

Three posts announced CodeProject articles related to PowerShell:  automated software builds, text reviews for software, and monitoring legacy code.

Three posts on customizing the command prompt: I, II, III.

Two posts on XML sitemaps: making a sitemap and filtering a sitemap.

Two Unix-related posts: cross-platform PowerShell and comparing PowerShell and bash.

The rest of the PowerShell posts I’ve written so far fall under miscellaneous.

gotchas
PolyMon
here-strings
readable paths
uninitialized variables
ASP.NET in PowerShell
clipboard and command line
launching PowerShell faster
rounding and integer division
one program to rule them all
redirection and Unicode vs ASCII

Much to my surprise, the post on integer division in PowerShell has been one of the most popular.

PowerShell output redirection: Unicode or ASCII?

What does the redirection operator > in PowerShell do to text: leave it as Unicode or convert it to ASCII? The answer depends on whether the thing to the right of the > operator is a file or a program.

Strings inside PowerShell are 16-bit Unicode, instances of .NET’s System.String class. When you redirect the output to a file, the file receives Unicode text. As Bruce Payette says in his book Windows PowerShell in Action,

myScript > file.txt is just syntactic sugar for myScript | out-file -path file.txt

and out-file defaults to Unicode. The advantage of explicitly using out-file is that you can then specify the output format using the -encoding parameter. Possible encoding values include Unicode, UTF8, ASCII, and others.
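Here is a minimal sketch of choosing the encoding explicitly. The file names are hypothetical. The default described above is that of Windows PowerShell; since other versions may default differently, stating the encoding removes the guesswork.

```powershell
# Hypothetical output files in the temp directory.
$dir = [System.IO.Path]::GetTempPath()
$unicodeFile = Join-Path $dir 'redirect-unicode.txt'
$asciiFile   = Join-Path $dir 'redirect-ascii.txt'

# Same text, two encodings.
'hello' | Out-File -FilePath $unicodeFile -Encoding Unicode
'hello' | Out-File -FilePath $asciiFile   -Encoding ASCII

# The Unicode (UTF-16) file uses two bytes per character plus a byte-order mark,
# so it comes out roughly twice the size of the ASCII file.
$unicodeSize = (Get-Item $unicodeFile).Length
$asciiSize   = (Get-Item $asciiFile).Length
```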

If the thing on the right side of the redirection operator is a program rather than a file, the encoding is determined by the variable $OutputEncoding. This variable defaults to ASCII encoding because most existing applications do not handle Unicode correctly. However, you can set this variable so PowerShell sends applications Unicode. See Jeffrey Snover’s blog post OuputEncoding to the rescue for details.
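A sketch of inspecting and changing the variable. This is one way to set it, not necessarily the one Snover's post uses, and it assumes the receiving program actually understands UTF-8.

```powershell
# See which encoding is currently used for text handed to external programs.
$before = $OutputEncoding.EncodingName

# Switch to UTF-8 for the rest of the session.
$OutputEncoding = [System.Text.Encoding]::UTF8
$after = $OutputEncoding.WebName   # 'utf-8'
```

Setting the variable in your profile script would make the change apply to every session.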

Of course if you’re passing strings between pieces of PowerShell code, everything stays in Unicode.

Thanks to J_Tom_Moon_79 for suggesting a blog post on this topic.

Improved PowerShell prompt

A while back I wrote a post on how to customize your PowerShell prompt. Last week Tomas Restrepo posted an article on a PowerShell prompt that adds color and shortens the path in a more subtle way. I haven’t tried it out yet, but his prompt looks much better than what I’ve been using.

If you’re a long-time Windows user you might be worried that all this PowerShell stuff is starting to look a lot like Unix. Well, it is. Some of the folks on the PowerShell team have a Unix background and they’re bringing some of the best of Unix to Windows. The Unix world has more experience operating from the command line and so it’s wise to learn from them.

On the other hand, PowerShell is emphatically not bash for Windows. PowerShell is thoroughly object oriented and in that respect unlike any Unix shell. Also, PowerShell is strongly tied to Microsoft libraries, particularly .NET but also COM and WMI.

PowerShell one-liner to filter a sitemap

Suppose you have an XML sitemap and you want to extract a flat list of URLs. This PowerShell code will do the trick.

([xml] (gc sitemap.xml)).urlset.url | % {$_.loc}

This code calls Get-Content, using the shortcut gc, to read the file sitemap.xml and casts the file to an XML document object. It then makes an array of all blocks of XML inside a <url> tag. It then pipes the array to the foreach command, using the shortcut %, and selects the content of the <loc> tag which is the actual URL.

Now if you want to filter the list further, say to pull out all the PDF files, you can pipe the previous output to a Where-Object filter.

([xml] (gc sitemap.xml)).urlset.url | % {$_.loc} |
? {$_ -like "*.pdf"}

This code uses the ? shortcut for the Where-Object command. The -like filter uses command line style matching. You could use -match to filter on a regular expression.
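To try this without a sitemap file on hand, here is a self-contained sketch with a small made-up sitemap inlined as a here-string. The example.com URLs are placeholders. It also shows the -match variant side by side with -like.

```powershell
# Hypothetical sitemap content, inlined so the example does not need a file.
[xml]$sitemap = @'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/index.html</loc></url>
  <url><loc>http://www.example.com/notes/stats.pdf</loc></url>
  <url><loc>http://www.example.com/notes/pdf-guide.html</loc></url>
</urlset>
'@

# Flat list of URLs, as in the one-liner.
$urls = $sitemap.urlset.url | % { $_.loc }

# -like uses wildcard matching; -match uses a regular expression.
$pdfLike  = $urls | ? { $_ -like '*.pdf' }
$pdfMatch = $urls | ? { $_ -match '\.pdf$' }
```

Both filters keep only the URL ending in .pdf. The page pdf-guide.html is excluded because the pattern is anchored to the end of the string.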

Related resources: PowerShell script to make an XML sitemap, Regular expressions in PowerShell