Comparing the Unix and PowerShell pipelines

This is a blog post I’ve intended to write for some time now. I meant to come up with a great example first, but I’ve decided to go ahead and publish it and let you come up with your own examples. Please share them in the comments.

One of the great strengths of Unix is the shell pipeline. Unix has thousands of little utilities that can be strung together via a pipeline. The output of one program can be the input to another. But in practice, things don’t go quite so smoothly. Suppose the conceptual pattern is

A | B | C

meaning the output of A goes to B, and the output of B goes to C. This is actually implemented as

A | <grubby text munging> | B | <grubby text munging> | C

because B doesn’t really take the output of A. There’s some manipulation going on to prepare the output of A as the input of B: strip these characters from these columns, replace this pattern with that one, etc. The key point is that Unix commands spit out text. At a high level you may care about programs A, B, and C, but in between are calls to utilities like grep, sed, or awk that bridge the gaps between output formats and input formats.
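To make the grubby munging concrete, here is a small made-up sketch (not from the post): printf stands in for program A, emitting labeled records; sort -n stands in for program B, which needs bare numbers; awk is the text munging in between that bridges the two formats.

```shell
# Conceptual pipeline: A | B.
# Actual pipeline: A | <strip the "id=" labels> | B.
# printf plays A, sort -n plays B, and the awk stage is the grubby
# text munging that makes A's output parseable by B.
printf 'id=3\nid=1\nid=2\n' | awk -F= '{ print $2 }' | sort -n
```

If A’s output format ever changes — an extra column, a different delimiter — the awk stage silently breaks, which is exactly the fragility described above.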

The PowerShell pipeline is different because PowerShell commands spit out objects. For example, if the output of a PowerShell command is a date, the command returns a .NET object representing a date, not a text string. The command may display a string on the command line, but that string is just a human-readable representation; the string representation of an object is not the object. If the output is piped to another command, the latter receives a .NET date object, not a string. This is the big idea behind PowerShell: commands pass around objects, not strings. The grubby, error-prone text munging between commands goes away.
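For instance (a minimal sketch of an ordinary PowerShell session, not taken from the post): Get-Date returns a System.DateTime object, and the next command in the pipeline can call the object’s methods directly instead of parsing text.

```
PS> (Get-Date).GetType().FullName
System.DateTime
PS> Get-Date | ForEach-Object { $_.AddDays(7).DayOfWeek }
```

The second command pipes the date object to ForEach-Object, which invokes AddDays and reads DayOfWeek on the object itself — no string parsing anywhere in the pipeline.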

Not all problems go away just because commands pass around objects. For example, maybe one command outputs a COM object and another takes in a .NET object. This is where more PowerShell magic comes in. PowerShell does a lot of work behind the scenes to implicitly convert output types to input types when possible. This sort of magic makes me nervous when I’m programming. I like to know exactly what’s going on, especially when debugging. But when using a shell, magic can be awfully convenient.

14 thoughts on “Comparing the Unix and PowerShell pipelines”

  1. I am paraphrasing Jeffrey Snover, who says: Unix shells have a rich, powerful history. Piping is a parse-and-pray operation.

    I am all for the magic, well, except when it doesn’t do what I expect. :)

  2. If you want to know what is going on in terms of how PowerShell binds objects to a cmdlet parameter – including any type conversions – this cmdlet is your friend:

    PS> Trace-Command -Name ParameterBinding -PSHost -Expression { Get-ChildItem log.txt | Get-Content }

  3. I think you’re missing something when you refer to unix pipes and the ‘grubby text munging’. You ignore that in unix, everything is a ‘file’, so that ‘text’ that you talk about is actually a ‘stream of bytes’. What that ‘stream of bytes’ constitutes depends on the output of A, be it an image, a word-processing document or even plain old text. The pipe should be forbidden to change anything about the stream and merely transfer the data from the output of A to the input of B.

    I agree with Doug Finke’s sentiment about ‘magic’.

  4. I have really never touched PowerShell, but I suspect part of the reason you experience this magic is that the tools you pipe together are more customized for each other than the tools we use on the Unix command line. Someone has done a great deal of integration programming by mapping COM objects to .NET objects, and possibly different .NET objects to each other, since two complex objects might well contain largely overlapping information that a user might want to compare.

    The real power of the Unix pipe is that tools not written specifically to be integrated with each other can be integrated anyway, simply by inserting some “grubby text munging” into the pipeline.

  5. “….What that ’stream of bytes’ constitutes depends on the output of A, be it an image, a word-processing document or even plain old text. The pipe should be forbidden to change anything about the stream and merely transfer the data from the output of A to the input of B…”

    The stream of bytes is not necessarily 100% text and should be treated accordingly; it may even contain bytes with a value of 0 that are NOT string terminators.

    “The stream of bytes is not necessarily 100% text and should be treated accordingly; it may even contain bytes with a value of 0 that are NOT string terminators.”

    Of course it is. It could even be “…an image, …”. That is why the “stream of bytes” is not to be interfered with. Imagine what could happen to an image if the stream of bytes was altered in any way by the pipe.

  7. To respond to one frequent criticism here: one might call converting text into various formats for input to other tools an advantage, but I think that is misplaced. I rarely pipe binary data from program to program; most of the time it goes straight to a file instead. The one exception is probably tar and compression tools (but most tar versions handle that directly already).

    What’s more, PowerShell knows what it’s passing, what the consuming cmdlet expects, and what conversions can be applied. Contrast that with the Unix pipeline model, which exhibits a major flaw: no one knows what is being passed or what is expected. Of course any alteration of the piped data is harmful, then.

    That’s not to say that everything PowerShell might convert there is a good thing; in fact it’s quite silent about lossy conversions as well:

    PS Home:> function A ([int]$i) { $i }
    PS Home:> A 3.14
    3

    But to be honest, in the time I have been using PowerShell so far, I have never experienced a conversion that took me by surprise. It might be outrageous in theory, but in practice (or at least my practice) it is a non-problem, just as many Unix users don’t see shell globbing as a problem either (though it has bitten me once, by making a command line way too long).

  8. “That’s not to say that everything PowerShell might convert there is a good thing; in fact it’s quite silent about lossy conversions as well:

    PS Home:> function A ([int]$i) { $i }
    PS Home:> A 3.14
    3 ”

    That example is not an example of a pipe; it is an example of a function. The ‘pipe’ in Unix is a connector between two separate programs. It is the programs that do all the processing on data. The pipe merely transfers the unchanged data output by the first program into the input of the second program. The data is called ‘text’ but can be anything at all, even binary, if that is what the first program happens to output.
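    The byte-for-byte claim is easy to check with a small sketch (mine, not the commenter’s): NUL bytes survive a trip through a pipe, because the pipe never interprets the stream as text.

```shell
# printf emits five bytes -- a, NUL, b, NUL, c -- and the pipe hands
# them to wc unchanged; wc -c counts 5 bytes, NULs included.
printf 'a\0b\0c' | wc -c
```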

  9. Meh, yes; you’re right. My bad. Ok, now I’d struggle to actually come up with a pipe example that converts in an unexpected way. Might not even be too easy. But that actually underscores my point that this is a non-issue in the real world.
