How to grep Twitter

Twitter has an extensive search API. To build the URL for a query, start with the base http://search.twitter.com/search.atom?q=. To search for a word, just append that word to the base, such as http://search.twitter.com/search.atom?q=Coltrane to search for tweets containing “Coltrane.”

To search for a term within a particular user’s tweet stream, start with the base URL and append +from%3A and the user’s name. (The %3A is a URL- encoded colon.) See the search API page for other options, such as specifying the number of requests per page to return (look for rpp) or restricting the language (look for lang).

As far as I can tell, the API does not support regular expressions, but you could loop over the search results programmatically. Here’s how you’d do it in PowerShell.

First, grab the search results as a string. Say we want to search through the latest tweets from Hal Rottenberg, @halr9000.

$base = "http://search.twitter.com/search.atom?q="
$query = "from%3Ahalr9000"
$str = (new-object net.webclient).DownloadString($base + $query)

Now $str contains an XML string of results formatted as an Atom feed. The root element is <feed> and individual tweets are contained in <entry> tags under that. The text of the tweet is in the <title> tag and the other tags contain auxiliary data such as time stamps. The following code shows how to search for the regular expression d{4}. (Look for four-digit numbers.)

([ xml] $str).feed.entry | where-object {$_.title -match "d{4}"}

In English, the code says to cast $str to an XML document object and pipe the <entry> contents to the filter that selects those objects whose <title> strings match the regular expression.

The search API limits the number of entries it will return, so it’s best to do as much filtering as you can via the Twitter site before doing your own filtering.

Related posts

Regular expressions in PowerShell and Perl
Table-driven text munging in PowerShell