Logarithms, music, and arsenic

Scientific American has an article suggesting that our natural sense of numbers may operate on a logarithmic scale rather than a linear scale.

It has long been known that our senses often work on a logarithmic scale. For example, sound intensity is measured in decibels, a logarithmic scale. Pitch is perceived on a logarithmic scale, as the Pythagoreans discovered. When moving up a chromatic scale, it’s not the differences in frequencies but the ratios of frequencies that are constant. An octave is a ratio of 2 to 1, so a half step is a ratio of 2^(1/12) to 1, since there are 12 half steps in an octave.
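
As a quick worked example, using the standard A = 440 Hz reference pitch (my example, not the article’s): each half step multiplies the frequency by the same factor, and twelve of those factors compound to exactly one octave.

        \[ 2^{1/12} \approx 1.0595 \]
        \[ 440\,\text{Hz} \times 2^{1/12} \approx 466.16\,\text{Hz (the A}\sharp\text{ a half step up)} \]
        \[ \bigl(2^{1/12}\bigr)^{12} = 2 \quad\text{(a full octave)} \]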

The Statistical Modeling, Causal Inference, and Social Science blog gives an interesting example combining linear and logarithmic perceptions. They quote a study suggesting that when deciding whether to walk to a new well based on information regarding arsenic levels, Bangladeshis perceived “distance to nearest safe well” linearly but perceived “arsenic level” logarithmically.

Subjecting fewer patients to ineffective treatments

Tomorrow morning I’m giving a talk on how to subject fewer patients to ineffective treatment in clinical trials. I should have used something like the title of this post as the title of my talk, but instead my talk is called “Clinical Trial Monitoring With Bayesian Hypothesis Testing.” Classic sales mistake: emphasizing features rather than benefits. But the talk is at a statistical conference, so maybe the feature-oriented title isn’t so bad.

Ethical concerns are the main consideration that makes biostatistics a separate branch of statistics. You can’t test experimental drugs on people the way you test experimental fertilizers on crops. In human trials, you want to stop the trial early if it looks like the experimental treatment is not as effective as a comparable established treatment, but you want to keep going if it looks like the new treatment might be better. You need to establish rules before the trial starts that quantify exactly what it means to look like a treatment is doing better or worse than another treatment. There are a lot of ways of doing this quantification, and some work better than others. Within its context (single-arm phase II trials with binary or time-to-event endpoints), the method I’m presenting stops ineffective trials sooner than the methods we compare it to, while stopping no more often in situations where you’d want the trial to continue.
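
To make “quantify” a little more concrete, here is one generic form such a rule can take for a binary endpoint; this is a sketch of the general idea, not the specific method in my talk. Give the experimental treatment’s response rate p a Beta(a, b) prior. After x responses among the first n patients the posterior is Beta(a + x, b + n − x), and a pre-specified rule might stop the trial early if

        \[ \Pr(p > p_0 \mid x, n) < \theta, \]

where p_0 is the response rate of the standard treatment and θ is a small threshold, say 0.05 or 0.10, fixed before the trial begins.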

If you’re not familiar with statistics, this may sound strange. Why not always stop when a treatment is worse and never stop when it’s better? Because you never know with certainty that one treatment is better than another. The more patients you test, the more sure you can be of your decision, but some uncertainty always remains. So you face a trade-off between being more confident of your conclusion and experimenting on more patients. If you think a drug is bad, you don’t want to treat thousands more patients with it in order to be extra confident that it’s bad, so you stop. But you run the risk of shutting down a trial of a treatment that really is an improvement but by chance appeared to be worse at the time you made the decision to stop. Statistics is all about such trade-offs.

Related: Adaptive clinical trial design

Side benefits of accessibility

Elliotte Rusty Harold makes the following insightful observation in his new book Refactoring HTML:

Wheelchair ramps are far more commonly used by parents with strollers, students with bicycles, and delivery people with hand trucks than they are by people in wheelchairs. When properly done, increasing accessibility for the disabled increases accessibility for everyone.

For example, web pages accessible to the visually impaired are also more accessible to search engines and mobile devices.

Migrating from HTML to XHTML

I migrated the HTML pages on my website to XHTML this weekend. I’d been hesitant to do this after hearing a couple of horror stories of how a slight error could have big consequences (for example, see how extra slashes caused Google to stop indexing CodeProject), but I took the plunge. Mostly this was a matter of changing <br> to <br /> and the like, but I did discover a few errors such as missing </p> tags. I was surprised to find such things because I had previously validated the site as HTML 4.01 strict.
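
Much of that mechanical tag cleanup can be scripted. Here’s a rough PowerShell sketch of the kind of replacement involved; the *.html file pattern and the in-place overwrite are my assumptions, so run it on a backup copy of the site rather than the live files.

        # Sketch: replace bare <br> tags with XHTML-style <br /> in every
        # HTML file in the current directory. Overwrites the files in place.
        Get-ChildItem *.html | ForEach-Object {
            (Get-Content $_.FullName) -replace '<br>', '<br />' |
                Set-Content $_.FullName
        }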

I thought that HTML entities such as &beta; for the Greek letter β were illegal in XHTML. Apparently not: three validators (Microsoft Expression Web 2, W3C, and WDG) all accept them. It seems they’re defined in XHTML, though not in XML in general. I looked at the official W3C docs and didn’t see anything ruling them out.
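
As an aside, if you’re curious which named entities your own pages actually use, something like the following PowerShell sketch will list them; the *.html filter and directory layout are assumptions on my part.

        # List the distinct named character entities (&beta;, &nbsp;, etc.)
        # used across the site's HTML files
        Get-ChildItem -Recurse -Filter *.html |
            Select-String -Pattern '&[a-zA-Z][a-zA-Z0-9]*;' -AllMatches |
            ForEach-Object { $_.Matches } |
            ForEach-Object { $_.Value } |
            Sort-Object -Unique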

Also, I’ve read that <i> and <b> are not allowed in strict XHTML. That’s what Elliotte Rusty Harold says in Refactoring HTML, and he certainly knows more about (X)HTML than I do. But the three validators I mentioned before all approved of these tags on pages marked as XHTML strict. I changed the <i> and <b> tags to <em> and <strong> respectively just to be safe, but I didn’t see anything in the W3C docs suggesting that the <i> and <b> tags are illegal or even deprecated. (I understand that italic and bold refer to presentation rather than content, but it seems pedantic to suggest that <em> and <strong> are any different from their frowned-upon counterparts.)

Validating a website

The W3C HTML validator is convenient for validating HTML on a single page. The site lets you specify a file to validate by entering a URL, uploading a file, or pasting the content into a form. But the site is not designed for bulk validation. There is an offline version of the validator intended for bulk use, but it’s a Perl script and difficult to install, at least on Windows. There is also a web service API, but apparently it has no WSDL file and so is inaccessible from tools requiring such a file.

The WDG HTML validator is easier to use for a whole website. It lets you enter a URL, and it will crawl the entire site and report the validation results for each page. (It’s not clear how it knows what pages to check. Maybe it looks for a sitemap or follows internal links.) If you’d like more control over the list of files it validates, you can paste a list of URLs into a form. (This was the motivation for my previous post on filtering an XML sitemap to create a list of URLs.)

PowerShell one-liner to filter a sitemap

Suppose you have an XML sitemap and you want to extract a flat list of URLs. This PowerShell code will do the trick.

        ([xml] (gc sitemap.xml)).urlset.url | % {$_.loc}

This code calls Get-Content, using the shortcut gc, to read the file sitemap.xml and casts the content to an XML document object. The expression .urlset.url then produces an array of all the <url> elements in the sitemap. Finally the code pipes the array to the ForEach-Object command, using the shortcut %, and selects the content of each <loc> tag, which is the actual URL.

Now if you want to filter the list further, say to pull out all the PDF files, you can pipe the previous output to a Where-Object filter.

        ([xml] (gc sitemap.xml)).urlset.url | % {$_.loc} |
        ? {$_ -like '*.pdf'}

This code uses the ? shortcut for the Where-Object command. The -like operator does command-line-style wildcard matching. You could use -match instead to filter on a regular expression.
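
For example, here is the same filter written with -match and a regular expression anchored to the end of the URL:

        ([xml] (gc sitemap.xml)).urlset.url | % {$_.loc} |
        ? {$_ -match '\.pdf$'}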

Related resources: PowerShell script to make an XML sitemap, Regular expressions in PowerShell