A while back I wrote a post on how to create a sitemap in the standard sitemaps.org format using Python. This post does the same task using PowerShell. The solution presented here is an idiomatic PowerShell solution using pipes, not a direct translation of the Python code. I’ll introduce the script in pieces, then present the entire script at the end.
The final line of the script is
dir | wrap | out-file -encoding ASCII sitemap.xml
The heart of the script is the function wrap, which wraps each file’s properties in the necessary XML tags. This function uses the pipeline, and so it has begin, process, and end blocks. The begin block prints out the XML header and the opening <urlset> tag. The end block prints out the closing </urlset> tag. In between is the process block that does most of the work.
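To make the three blocks concrete, here is a minimal sketch of a pipeline function, separate from the sitemap script. The function name Add-Brackets and the sample input are made up for illustration.

```powershell
# Hypothetical function, for illustration only; not part of the sitemap script.
function Add-Brackets
{
    begin   { "start" }    # runs once, before the first pipeline object
    process { "[$_]" }     # runs once for each pipeline object
    end     { "finish" }   # runs once, after the last pipeline object
}

# Pipe three numbers through the function and collect everything it emits.
$result = 1, 2, 3 | Add-Brackets
$result
```

The output should be start, then [1], [2], [3], then finish, each on its own line. The wrap function has the same shape, with the XML header and footer in place of the start and finish strings.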
Since all unassigned expressions are returned from PowerShell functions, the code is very clean. No need for print statements, just state the strings that make up the output. Variable interpolation helps keep the code succinct as well: simply use the name of a variable where you want to insert that variable’s value in a string. (Be sure to use double quotes if you want interpolation.)
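As a small sketch of both points, the hypothetical function below (Get-Greeting is a made-up name) returns three strings without any print statement, and shows how the choice of quotes changes the result.

```powershell
# Hypothetical function, for illustration only.
function Get-Greeting
{
    $name = "world"
    "Hello, $name"                   # double quotes: $name is replaced by its value
    'Hello, $name'                   # single quotes: the text is kept literally
    "Length: $($name.Length)"        # $( ) is needed to interpolate a property
}

# Each bare string above becomes one element of the function's output.
$lines = Get-Greeting
$lines
```

Note the third line: inside a double-quoted string, a property access such as $name.Length has to be wrapped in $( ), otherwise only the variable itself is interpolated.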
The wrap function uses the implicit variable $_, which means “the current object in the pipeline.” Since we’re piping in the output of dir (alias for Get-ChildItem), $_ represents a FileSystemInfo object. We look at the extension property on this object to see whether the file is one of the types we want to include in the sitemap: in this case, .html, .htm, or .pdf. You can edit the value of the variable $extensions if you want to include different file types in your sitemap.
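The extension test can be tried on its own, without touching the file system. The sketch below uses a made-up list of file names and pulls the extension out of each name with [System.IO.Path]::GetExtension, which stands in here for the extension property of a real file object.

```powershell
$extensions = ".htm", ".html", ".pdf"

# Hypothetical file names, for illustration only.
$names = "index.html", "notes.txt", "paper.pdf", "logo.png", "INDEX.HTM"

# Keep only the names whose extension appears in $extensions.
# -contains compares strings case-insensitively by default.
$kept = $names | Where-Object {
    $extensions -contains [System.IO.Path]::GetExtension($_)
}
$kept
```

This should keep index.html, paper.pdf, and INDEX.HTM. The last one passes because -contains ignores case; use -ccontains if you want a case-sensitive match.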
Getting the file timestamp in the necessary format is particularly easy. The format specifier {0:s} causes the date and time to be written in the ISO 8601 format that the sitemap standard requires. The Z tacked on at the end says that the time is UTC rather than some other time zone, which is why the script reads LastWriteTimeUTC rather than LastWriteTime.
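Here is the format specifier applied to a fixed date, so the result is easy to check. The date itself is arbitrary, and the [datetime]::new syntax assumes PowerShell 5 or later.

```powershell
# An arbitrary UTC timestamp, chosen only for illustration.
$stamp = [datetime]::new(2009, 3, 14, 15, 9, 26, [System.DateTimeKind]::Utc)

# {0:s} is the sortable date/time pattern: yyyy-MM-ddTHH:mm:ss
$line = "<lastmod>{0:s}Z</lastmod>" -f $stamp
$line
# <lastmod>2009-03-14T15:09:26Z</lastmod>
```

The s pattern does not depend on the current culture, so the output has the same shape on any machine.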
This script will produce a file sitemap.xml in the standard format. Once you upload the sitemap to your server, you’ve got to let the search engines know how to find it. The simplest way to do this is to create a file called robots.txt at the top of your site containing one line, Sitemap: followed by the URL of your sitemap.
Sitemap: http://www.yourdomain.com/sitemap.xml
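If you would rather generate robots.txt from PowerShell as well, something like the following sketch should work. It assumes the same placeholder domain as the sitemap script and writes to the current directory, overwriting any robots.txt already there.

```powershell
# Placeholder domain; change this to your URL.
$domain = "http://www.yourdomain.com"

# Build the one-line robots.txt content.
$robotsLine = "Sitemap: $domain/sitemap.xml"

# Write the file. This replaces an existing robots.txt in the current directory.
$robotsLine | Out-File -Encoding ASCII robots.txt
```

If you already have a robots.txt with other rules in it, append the line instead, for example with Add-Content, rather than overwriting the file.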
Now here’s the full script.
# Change this to your URL
$domain = "http://www.yourdomain.com"
# file extensions to include in sitemap
$extensions = ".htm", ".html", ".pdf"
# wrap file information in XML tags
function wrap
{
    begin
    {
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    }
    process
    {
        if ($extensions -contains $_.extension)
        {
            "`t<url>"
            # Use the Name property explicitly: newer PowerShell versions
            # expand a bare $_ to the file's full path instead of its name.
            "`t`t<loc>$domain/$($_.Name)</loc>"
            "`t`t<lastmod>{0:s}Z</lastmod>" -f $_.LastWriteTimeUTC
            "`t</url>"
        }
    }
    end
    {
        "</urlset>"
    }
}
dir | wrap | out-file -encoding ASCII sitemap.xml