Weekend miscellany

Two links on naked mole rats and cancer

Three links on software development, academia, and industry

Four kinds of periodic tables

Five miscellaneous links

Simple backup software

I was asking about backup software for Windows the other day and a couple people recommended Cobian Backup. It’s simple to use, but also very configurable. And it’s free.

You can have the software simply copy files or you can have it zip the output (.zip or .7z format). In either case, you don’t need the backup software in order to restore your files.

The software has all the features you’d expect. You can perform full, incremental, or differential backups. You can run backups manually or as scheduled tasks. Etc.

How to test a random number generator

Random number generators are challenging to test.

  • The output is supposed to be unpredictable, so how do you know when the generator working correctly?
  • Your tests will fail occasionally, but how do you decide whether they’re failing too often?
  • What kinds of errors are most common when writing random number generation software?

These are some of the questions I address in Chapter 10 of Beautiful Testing.

Beautiful Testing: Leading Professionals Reveal How They Test

The book is now in stock at Amazon. It is supposed to be in book stores by Friday. All profits from Beautiful Testing go to Nothing But Nets, a project to distribute anti-malarial bed nets.

Click to learn more about help with randomization


Bayesian clinical trials in one zip code

I recently ran across this quote from Mithat Gönen of Memorial Sloan-Kettering Cancer Center:

While there are certainly some at other centers, the bulk of applied Bayesian clinical trial design in this country is largely confined to a single zip code.

from “Bayesian clinical trials: no more excuses,” Clinical Trials 2009; 6; 203.

The zip code Gönen alludes to is 77030, the zip code of M. D. Anderson Cancer Center. I can’t say how much activity there is elsewhere, but certainly we design and conduct a lot of Bayesian clinical trials at MDACC.

Update: After over a decade working at MDACC, I left to start my own consulting business. If you’d like help with adaptive clinical trials please let me know.

Related posts:

Shallow bugs versus reported bugs

The open source community has a saying: With enough eyes, all bugs are shallow. When enough people look at a piece of code, someone is going find and fix the bugs.

A related principle is that with enough users, all bugs will be reported. With enough people use the software, someone else is going to run into the problem. Someone will report it. Someone will talk about it in an online forum. Someone will blog about it and post a work-around until the bug is fixed. This principle deserves more attention; it’s not cited as often as the shallow bugs principle.

Ideally, you want to use software with lots of eyes and lots of users. Firefox is an open source product with lots eyes and lots of users. But more often you have to pick eyes or users. You have to choose between open but obscure software and closed but popular software. Open source projects may have more people looking at the source code, and so they have the  “many eyes make shallow bugs” maxim working for them. But the user base for many open source projects is tiny compared to their commercial counterparts. The number of users to find and report bugs is small, and the number who document fixes and work-arounds is even smaller.

I’m not ideologically attached to open source or commercial software. I use both. I just want my software to work. And when it doesn’t work, I want to find a solution quickly.

Related posts:

How to differentiate a non-differentiable function

How can we extend the idea of derivative so that more functions are differentiable? Why would we want to do so? How can we make sense of a delta “function” that isn’t really a function? We’ll answer these questions in this post.

Suppose f(x) is a differentiable function of one variable. Suppose φ(x) is an infinitely differentiable function that is zero outside of some finite interval. Functions like φ are called test functions. Integration by parts says that

int_{-infty}^infty f'(x), varphi(x) , dx = -int_{-infty}^infty f(x), varphi'(x) , dx

where the integrals are over the entire real line. (The fact that φ is zero outside a finite interval mean the “uv” term from integration by parts is zero.) Now suppose f(x) is not differentiable. Then the left side of the equation above does not make sense, but the right side does. We use the right hand side to develop the definition of the generalized derivative.

We think of the function f not as a function of real numbers, but as a distribution that operates on tests functions. That is, we associate with f the linear functional on the space of tests functions that maps φ to ∫ f(x) φ(x) dx. Then the distributional derivative of this functional is another linear functional, the distribution that maps test functions φ to -∫ f(x) φ'(x) dx. In summary,

f: varphi mapsto int_{-infty}^infty f(x) , varphi(x), dx \ f': varphi mapsto -int_{-infty}^infty f(x) , varphi'(x), dx

We can use this procedure to define as many derivatives of f as we’d like, as long as f is integrable. So f could be some horribly ill-behaved function, differentiable nowhere in the classical sense, and we could define its 37th derivative by repeatedly applying this idea. (Distributions are also called “generalized functions.” Distributional derivatives are also called “generalized derivatives” or “weak derivatives.”)

By the way, this same procedure is used to make sense of the delta function. The delta function isn’t a function at all. It is the distribution δ that evaluates test functions at zero, i.e. δ maps φ to φ(0). (The delta function often nonsensically defined to be a function that is infinite at zero and zero everywhere else.)

Why would we want to be able to differentiate more functions? When we can differentiate more functions, we can look in a bigger space for solutions to differential equations. Sometimes this allows us to find solutions to equations that do not have classical solutions. Other times this allows us to find classical solutions more easily. We may first prove that a generalized solution exists, and then prove that the generalized solution is in fact a classical solution.

Here’s an analogy that explains how generalized solutions might lead to classical solutions. Suppose you want to find the minimum value of a function for integer arguments. You might first look for a real number that minimizes the function. This lets you, for example, use derivatives in your search for the minimum. If the real minimum you find happens to also be an integer, then you’ve solved your original problem. Distributions and generalized derivatives work much the same way. You might find a classical solution by first looking in a larger space of possible solutions, a space that allows you to use more powerful techniques in your search for a solution.

Related posts:

Reviewing catch blocks

Here’s an interesting exercise. If you’re writing code in a language like C# or C++ that has catch statements, write a script to report all catch blocks. You might be surprised at what you find. Some questions to ask:

  • Do catch blocks swallow exceptions and thus mask problems?
  • Is information lost by catching an exception and throwing a new one?
  • Are exceptions logged appropriately?
  • Are notification messages grammatically correct and helpful?

Here’s a PowerShell script that will report all catch statements plus the five lines following the catch statement.

Related post: Finding embarrassing and unhelpful error messages

Worthless technical books

I sold six technical books to a used book store on the way home today. The store paid me $5 total for four of the books. Two books they didn’t want at all. The books were not that old, but they were practically worthless.

It’s sobering to think how little a technical book is worth a few years after it is printed. It’s a good reminder to focus on things that will last.

Related posts:

Book review: Trade-Off

I enjoyed listening to Moira Gunn’s interview with Kevin Maney, author of the new book Trade-Off: Why Some Things Catch On and Others Don’t (ISBN 038552594X).

The book was a little disappointing after listening to the interview. I felt I had heard most of what Maney had to say before I read the book.

In a nutshell, the message of the book is that you should either strive for fidelity (exclusivity, quality) or convenience (accessibility, affordability). You can succeed by excelling at fidelity or at convenience. But if you strive for both, you’ll lose to companies that are better at one criteria or the other. Maney gives several interesting examples of companies that have succeeded along the edges of the fidelity/convenience graph but then failed when they started pursuing the diagonal.

Related post: I am not an operating system (how Microsoft and Apple are forced into their respective marketing positions)

A sort of opposite of Parkinson’s Law

Parkinson’s Law says that work expands to the time allowed. I’ve seen that play out over and over. However, I’ve also seen a sort of opposite of Parkinson’s Law. Sometimes work gets done faster when you have more time for it.

Sometimes when I’ve planned a large block of uninterrupted, say going into the office when hardly anyone else is there, I get through my to do list in the first hour of the day. Knowing that I have plenty of time, I think more clearly and end up not needing the extra time. When that happens, I sometimes think “If I’d known this would just take 30 minutes to solve, I would have done it sooner.” But it’s not that simple. Just because it took 30 minutes on a good day doesn’t mean that it could have been done during just any 30-minute time slot earlier.

In his book Symmetry and the Monster, Mark Ronan shares a story along these lines. Ronan tells how John Conway worked on a famous mathematical problem. Conway and his wife agreed that he would carve out Saturdays from noon to midnight and Wednesdays from 6 PM to midnight for working on this challenge. He started one Saturday and cracked the problem that evening. Perhaps Conway would have been able to solve his problem by working on it an hour at a time here and there. But it seems reasonable that having a large block of time, and knowing that other large blocks were scheduled, helped Conway think more clearly.

My guess is that Parkinson’s law applies best to projects involving several people and to one-person projects that are not well defined. But for well-defined projects, especially projects requiring creative problem solving, having more time might lead to not needing so much time.

Related posts:

Weekend miscellany

The Fourth Paradigm: Data-Intensive Scientific Discovery free book

Beautiful Testing has gone to press. Should be in bookstores by October 30.

Placebos have side effects too

10/GUI prototype for human-computer interaction

Math Teachers at Play #17. This is a much larger edition than usual.

How schools invented letter grades

Assembly of the International Space Station time-lapse animation

When noise is your friend: smoothed analysis

Fame Frenzy music video