PowerShell one-liner to filter a sitemap

Suppose you have an XML sitemap and you want to extract a flat list of URLs. This PowerShell code will do the trick.

        ([xml] (gc sitemap.xml)).urlset.url | % {$_.loc}

This code calls Get-Content, using the shortcut gc, to read the file sitemap.xml and casts the result to an XML document object with [xml]. It then collects every <url> element under <urlset> into an array. Finally, it pipes the array to the ForEach-Object command, using the shortcut %, and selects the content of each <loc> tag, which is the actual URL.
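The same steps can be written out one at a time with full command names. The following is only a sketch: the sitemap text is a made-up sample inlined as a string so the code runs without a file on disk, and the variable names are illustrative, not part of the one-liner above.

```powershell
# Hypothetical sample sitemap, inlined so the sketch needs no file on disk.
$sitemapText = @'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/notes.pdf</loc></url>
  <url><loc>http://example.com/about.html</loc></url>
</urlset>
'@

# Step 1: cast the text to an XML document object.
# With a real file this would be: [xml] (Get-Content sitemap.xml)
$sitemap = [xml] $sitemapText

# Step 2: walk down to the <url> elements under <urlset>.
$entries = $sitemap.urlset.url

# Step 3: pull the text of the <loc> child out of each entry.
$urls = $entries | ForEach-Object { $_.loc }

# Print the flat list of URLs, one per line.
$urls
```

Splitting the pipeline into named steps makes it easy to inspect each intermediate value, for example by typing $entries at the prompt.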

Now if you want to filter the list further, say to pull out all the PDF files, you can pipe the previous output to a Where-Object filter.

        ([xml] (gc sitemap.xml)).urlset.url | % {$_.loc} | ? {$_ -like "*.pdf"}

This code uses the ? shortcut for the Where-Object command. The -like operator uses command-line style wildcard matching, so "*.pdf" matches any URL that ends in .pdf. You could use -match instead to filter on a regular expression.
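To make the difference between the two operators concrete, here is a small sketch that applies both to a hypothetical list of URLs. The URLs and variable names are invented for illustration.

```powershell
# Hypothetical URLs standing in for the output of the sitemap pipeline.
$urls = @(
    'http://example.com/',
    'http://example.com/notes.pdf',
    'http://example.com/slides.PDF',
    'http://example.com/pdf-guide.html',
    'http://example.com/paper.ps'
)

# Wildcard match: * stands for any run of characters.
# -like ignores case by default, so slides.PDF is kept too.
$pdfByLike = $urls | Where-Object { $_ -like '*.pdf' }

# Regular expression match: \. is a literal dot, (pdf|ps) allows either
# extension, and $ anchors the pattern to the end of the URL.
$pdfOrPs = $urls | Where-Object { $_ -match '\.(pdf|ps)$' }

$pdfByLike
$pdfOrPs
```

The wildcard form is enough for a single extension. The regular expression earns its keep when the filter needs alternatives or anchoring, as in the second case.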

Related resources: PowerShell script to make an XML sitemap, Regular expressions in PowerShell

3 comments on “PowerShell one-liner to filter a sitemap”
  1. Codewiz51 says:

    John,

    You are really into this PowerShell thing. The syntax reminds me of Perl and bash. How long did it take you to get up to speed?

  2. John says:

    It doesn’t take long to get up to speed with PowerShell. It’s very consistent. Bruce Payette’s book is a good place to start. He has an appendix on coming to PowerShell from various backgrounds: Unix shells, DOS, Perl.

    Everyone runs into the same small number of gotchas when learning PowerShell.

    There are a ton of resources at PowerShellCommunity.org including a forum where you can post a chunk of code and have people critique it.

  3. Photon VPS says:

    I’m really enjoying the design and layout of your site. It’s very easy on the eyes, which makes it much more pleasant for me to come here and visit more often. Did you hire out a developer to create your theme? Outstanding work!