Simple backup software

I was asking about backup software for Windows the other day and a couple people recommended Cobian Backup. It’s simple to use, but also very configurable. And it’s free.

You can have the software simply copy files or you can have it zip the output (.zip or .7z format). In either case, you don’t need the backup software in order to restore your files.

The software has all the features you’d expect. You can perform full, incremental, or differential backups. You can run backups manually or as scheduled tasks. Etc.
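
The difference between incremental and differential backups is easy to mix up, so here is a minimal Python sketch of which files each mode would select. It is not how Cobian Backup works internally. It assumes changes are detected by modification time, whereas real tools may use other signals such as the Windows archive attribute. The function name `files_to_back_up` and the example files and dates are hypothetical.

```python
from datetime import datetime

def files_to_back_up(mtimes, mode, last_full=None, last_backup=None):
    """Return the sorted paths a backup run of the given mode would copy.

    mtimes      -- dict mapping file path to its last-modified datetime
    mode        -- "full", "differential", or "incremental"
    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if mode == "full":
        # Full: copy everything, regardless of when it last changed.
        return sorted(mtimes)
    if mode == "differential":
        # Differential: everything changed since the last *full* backup.
        cutoff = last_full
    elif mode == "incremental":
        # Incremental: only what changed since the last backup of *any* kind.
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup mode: {mode!r}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)


# Hypothetical example: a full backup on day 2, an incremental one on day 4.
mtimes = {
    "a.txt": datetime(2024, 1, 1),
    "b.txt": datetime(2024, 1, 3),
    "c.txt": datetime(2024, 1, 5),
}
last_full, last_backup = datetime(2024, 1, 2), datetime(2024, 1, 4)

print(files_to_back_up(mtimes, "full"))                                  # ['a.txt', 'b.txt', 'c.txt']
print(files_to_back_up(mtimes, "differential", last_full=last_full))     # ['b.txt', 'c.txt']
print(files_to_back_up(mtimes, "incremental", last_backup=last_backup))  # ['c.txt']
```

The trade-off shows up at restore time: restoring from differential backups needs the last full backup plus the latest differential, while restoring from incremental backups needs the last full backup plus every incremental made since.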

How to test a random number generator

Random number generators are challenging to test.

  • The output is supposed to be unpredictable, so how do you know whether the generator is working correctly?
  • Your tests will fail occasionally, but how do you decide whether they’re failing too often?
  • What kinds of errors are most common when writing random number generation software?

These are some of the questions I address in Chapter 10 of Beautiful Testing.
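
One common way to approach the second question is to run a statistical test many times and compare its failure rate with the test's significance level. Below is a minimal Python sketch using a chi-square goodness-of-fit test for uniformity. It is a generic illustration, not necessarily the approach taken in the chapter; the bin count, sample size, 5% level, and the function names `chi_square_uniform` and `failure_rate` are arbitrary assumptions.

```python
import random

# Upper 5% point of the chi-square distribution with 9 degrees of freedom
# (10 bins minus 1), from standard tables.
CRITICAL_9DF_5PCT = 16.919

def chi_square_uniform(generate, samples=2000, bins=10):
    """Chi-square statistic for the hypothesis that generate() is uniform on [0, 1)."""
    counts = [0] * bins
    for _ in range(samples):
        u = generate()
        # Clamp so a (buggy) value of exactly 1.0 lands in the last bin.
        counts[min(int(u * bins), bins - 1)] += 1
    expected = samples / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def failure_rate(generate, trials=100):
    """Fraction of independent runs whose statistic exceeds the 5% critical value."""
    failures = sum(chi_square_uniform(generate) > CRITICAL_9DF_5PCT
                   for _ in range(trials))
    return failures / trials

rng = random.Random(2009)
# A correct generator should still fail roughly 5% of the time at this level.
print(failure_rate(rng.random))
# A buggy generator that never returns values above 0.5 fails essentially every run.
print(failure_rate(lambda: rng.random() / 2))
```

A single failure says little on its own; a failure rate well above the 5% level across many independent runs is the warning sign.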

Beautiful Testing: Leading Professionals Reveal How They Test

The book is now in stock at Amazon. It is supposed to be in book stores by Friday. All profits from Beautiful Testing go to Nothing But Nets, a project to distribute anti-malarial bed nets.

Bayesian clinical trials in one zip code

I recently ran across this quote from Mithat Gönen of Memorial Sloan-Kettering Cancer Center:

While there are certainly some at other centers, the bulk of applied Bayesian clinical trial design in this country is largely confined to a single zip code.

from “Bayesian clinical trials: no more excuses,” Clinical Trials 2009; 6; 203.

The zip code Gönen alludes to is 77030, the zip code of M. D. Anderson Cancer Center. I can’t say how much activity there is elsewhere, but certainly we design and conduct a lot of Bayesian clinical trials at MDACC.

Update: After over a decade working at MDACC, I left to start my own consulting business. If you’d like help with adaptive clinical trials please let me know.

Shallow bugs versus reported bugs

The open source community has a saying: With enough eyes, all bugs are shallow. When enough people look at a piece of code, someone is going to find and fix the bugs.

A related principle is that with enough users, all bugs will be reported. When enough people use the software, someone else is going to run into the problem. Someone will report it. Someone will talk about it in an online forum. Someone will blog about it and post a work-around until the bug is fixed. This principle deserves more attention; it’s not cited as often as the shallow bugs principle.

Ideally, you want to use software with lots of eyes and lots of users. Firefox is an open source product with lots of eyes and lots of users. But more often you have to pick eyes or users: you have to choose between open but obscure software and closed but popular software. Open source projects may have more people looking at the source code, and so they have the “many eyes make shallow bugs” maxim working for them. But the user base for many open source projects is tiny compared to that of their commercial counterparts. The number of users to find and report bugs is small, and the number who document fixes and work-arounds is even smaller.

I’m not ideologically attached to open source or commercial software. I use both. I just want my software to work. And when it doesn’t work, I want to find a solution quickly.

How to differentiate a non-differentiable function

How can we extend the idea of derivative so that more functions are differentiable? Why would we want to do so? How can we make sense of a delta “function” that isn’t really a function? We’ll answer these questions in this post.

Suppose f(x) is a differentiable function of one variable. Suppose φ(x) is an infinitely differentiable function that is zero outside of some finite interval. Functions like φ are called test functions. Integration by parts says that

$$\int_{-\infty}^\infty f'(x)\, \varphi(x)\, dx = -\int_{-\infty}^\infty f(x)\, \varphi'(x)\, dx$$

where the integrals are over the entire real line. (The fact that φ is zero outside a finite interval means the “uv” term from integration by parts is zero.) Now suppose f(x) is not differentiable. Then the left side of the equation above does not make sense, but the right side does. We use the right-hand side to develop the definition of the generalized derivative.

We think of the function f not as a function of real numbers, but as a distribution that operates on test functions. That is, we associate with f the linear functional on the space of test functions that maps φ to ∫ f(x) φ(x) dx. Then the distributional derivative of this functional is another linear functional, the distribution that maps test functions φ to -∫ f(x) φ'(x) dx. In summary,

$$
\begin{aligned}
f &: \varphi \mapsto \int_{-\infty}^\infty f(x)\, \varphi(x)\, dx \\
f' &: \varphi \mapsto -\int_{-\infty}^\infty f(x)\, \varphi'(x)\, dx
\end{aligned}
$$

We can use this procedure to define as many derivatives of f as we’d like, as long as f is integrable. So f could be some horribly ill-behaved function, differentiable nowhere in the classical sense, and we could define its 37th derivative by repeatedly applying this idea. (Distributions are also called “generalized functions.” Distributional derivatives are also called “generalized derivatives” or “weak derivatives.”)
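
As a worked example, take f(x) = |x|, which has no classical derivative at 0. Split the integral at 0 and integrate by parts on each half; the boundary terms vanish because x = 0 at the split point and φ is zero outside a finite interval:

$$
\begin{aligned}
-\int_{-\infty}^\infty |x|\, \varphi'(x)\, dx
&= \int_{-\infty}^0 x\, \varphi'(x)\, dx - \int_0^\infty x\, \varphi'(x)\, dx \\
&= -\int_{-\infty}^0 \varphi(x)\, dx + \int_0^\infty \varphi(x)\, dx \\
&= \int_{-\infty}^\infty \operatorname{sgn}(x)\, \varphi(x)\, dx
\end{aligned}
$$

So the distributional derivative of |x| is the sign function sgn(x), which agrees with the classical derivative everywhere except at 0.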

By the way, this same procedure is used to make sense of the delta function. The delta function isn’t a function at all. It is the distribution δ that evaluates test functions at zero, i.e. δ maps φ to φ(0). (The delta function is often nonsensically defined to be a function that is infinite at zero and zero everywhere else.)
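
For example, let H be the unit step function, equal to 0 for x < 0 and 1 for x > 0. H has no classical derivative at 0, but applying the generalized derivative to any test function φ, and using the fact that φ is zero outside a finite interval, gives

$$-\int_{-\infty}^\infty H(x)\, \varphi'(x)\, dx = -\int_0^\infty \varphi'(x)\, dx = -\bigl(0 - \varphi(0)\bigr) = \varphi(0)$$

so H' = δ in the distributional sense.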

Why would we want to be able to differentiate more functions? When we can differentiate more functions, we can look in a bigger space for solutions to differential equations. Sometimes this allows us to find solutions to equations that do not have classical solutions. Other times this allows us to find classical solutions more easily. We may first prove that a generalized solution exists, and then prove that the generalized solution is in fact a classical solution.

Here’s an analogy that explains how generalized solutions might lead to classical solutions. Suppose you want to find the minimum value of a function for integer arguments. You might first look for a real number that minimizes the function. This lets you, for example, use derivatives in your search for the minimum. If the real minimum you find happens to also be an integer, then you’ve solved your original problem. Distributions and generalized derivatives work much the same way. You might find a classical solution by first looking in a larger space of possible solutions, a space that allows you to use more powerful techniques in your search for a solution.

Reviewing catch blocks

Here’s an interesting exercise. If you’re writing code in a language like C# or C++ that has catch statements, write a script to report all catch blocks. You might be surprised at what you find. Some questions to ask:

  • Do catch blocks swallow exceptions and thus mask problems?
  • Is information lost by catching an exception and throwing a new one?
  • Are exceptions logged appropriately?
  • Are notification messages grammatically correct and helpful?

Here’s a PowerShell script that will report all catch statements plus the five lines following the catch statement.
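
For those not using PowerShell, here is a minimal Python sketch of the same idea. It is not the script referred to above; the extension list and the function names `catch_blocks` and `report` are assumptions, and because it is a plain text search, matches inside comments or string literals are reported too.

```python
import re
from pathlib import Path

# Assumed file extensions for C# and C++ sources; adjust for your code base.
EXTENSIONS = {".cs", ".cpp", ".cc", ".cxx", ".h", ".hpp"}
CATCH = re.compile(r"\bcatch\b")

def catch_blocks(root, context=5):
    """Yield (path, line_number, lines) for every line mentioning `catch`.

    `lines` is the matching line plus up to `context` lines after it.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in EXTENSIONS:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for i, line in enumerate(lines):
            if CATCH.search(line):
                yield path, i + 1, lines[i : i + 1 + context]

def report(root, context=5):
    """Print each catch statement with its location and the lines that follow."""
    for path, lineno, lines in catch_blocks(root, context):
        print(f"{path}:{lineno}")
        for text in lines:
            print("    " + text)
        print()
```

Calling `report("path/to/src")` prints every match under that directory, which you can then read through with the questions above in mind.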

Related post: Finding embarrassing and unhelpful error messages

Worthless technical books

I sold six technical books to a used book store on the way home today. The store paid me $5 total for four of the books. Two books they didn’t want at all. The books were not that old, but they were practically worthless.

It’s sobering to think how little a technical book is worth a few years after it is printed. It’s a good reminder to focus on things that will last.

Book review: Trade-Off

I enjoyed listening to Moira Gunn’s interview with Kevin Maney, author of the new book Trade-Off: Why Some Things Catch On and Others Don’t (ISBN 038552594X).

The book was a little disappointing after listening to the interview. I felt I had heard most of what Maney had to say before I read the book.

In a nutshell, the message of the book is that you should strive for either fidelity (exclusivity, quality) or convenience (accessibility, affordability). You can succeed by excelling at fidelity or at convenience. But if you strive for both, you’ll lose to companies that are better at one criterion or the other. Maney gives several interesting examples of companies that have succeeded along the edges of the fidelity/convenience graph but then failed when they started pursuing the diagonal.

Related post: I am not an operating system (how Microsoft and Apple are forced into their respective marketing positions)

A sort of opposite of Parkinson’s Law

Parkinson’s Law says that work expands to fill the time allowed. I’ve seen that play out over and over. However, I’ve also seen a sort of opposite of Parkinson’s Law. Sometimes work gets done faster when you have more time for it.

Sometimes when I’ve planned a large block of uninterrupted time, say by going into the office when hardly anyone else is there, I get through my to-do list in the first hour of the day. Knowing that I have plenty of time, I think more clearly and end up not needing the extra time. When that happens, I sometimes think “If I’d known this would just take 30 minutes to solve, I would have done it sooner.” But it’s not that simple. Just because it took 30 minutes on a good day doesn’t mean that it could have been done during just any 30-minute time slot earlier.

In his book Symmetry and the Monster, Mark Ronan shares a story along these lines. Ronan tells how John Conway worked on a famous mathematical problem. Conway and his wife agreed that he would carve out Saturdays from noon to midnight and Wednesdays from 6 PM to midnight for working on this challenge. He started one Saturday and cracked the problem that evening. Perhaps Conway would have been able to solve his problem by working on it an hour at a time here and there. But it seems reasonable that having a large block of time, and knowing that other large blocks were scheduled, helped Conway think more clearly.

My guess is that Parkinson’s Law applies best to projects involving several people and to one-person projects that are not well defined. But for well-defined projects, especially projects requiring creative problem solving, having more time might lead to not needing so much time.

Opening black boxes

Rookie programmers don’t know how to reuse code. They write too much original code because they either don’t know about libraries or they don’t know how to use them. And if they do reuse someone else’s code, they copy and paste it, creating maintenance problems.

The next step in professional development is learning to reuse code. Encapsulation! Black boxes! Buy, don’t build! etc.

But this emphasis on reuse and black boxes can go too far. We can be intimidated by these black boxes and afraid to open them. We can come to believe the black boxes were created by superior beings. We can spend more time inferring the behavior of the black boxes than it would take to open them up or rewrite them. Then we pile leaky abstraction on top of leaky abstraction when we treat our own code as black boxes.

Joe Armstrong said in Coders at Work

Over the years I’ve kind of made a generic mistake … to not open the black box. … It’s worthwhile seeing if the direct route is quicker than the packaged route.

Several of the programmers who were interviewed in the book made similar remarks. They attribute part of their success to being unafraid of black boxes. They gained experience and confidence by taking things apart to see how they work.

Donald Knuth once said in an interview

I also must confess to a strong bias against the fashion for reusable code. To me, “re-editable code” is much, much better than an untouchable black box or toolkit. I could go on and on about this. … you’ll never convince me that reusable code isn’t mostly a menace.

Knuth returns to this theme in Coders at Work.

There’s this overemphasis on reusable software where you never get to open up the box … It’s nice to have these black boxes but, almost always, if you can look inside the box you can improve it …

Well, Knuth can almost always improve any code he finds. Less talented programmers need to be more humble. But too often programmers who are talented enough to make improvements are reluctant to do so. As Yeats said in his poem The Second Coming,

The best lack all conviction, while the worst are full of passionate intensity.

In any discussion of opening black boxes, someone will bring up the analogy of cars: Not everyone needs to know how a car works inside. I would agree that drivers no longer need to understand how a car works, but automotive engineers do. The problem isn’t users who don’t understand how software works, it’s software developers who don’t understand how software works.

Of course software libraries are extremely valuable. Knuth goes too far when he says reusable code is usually a menace. But I see a disturbing lack of curiosity among programmers. They are far too willing to use code they don’t understand.

Related post: Reusable code versus re-editable code

The opening chord of "A Hard Day’s Night"

The opening chord of the Beatles song “A Hard Day’s Night” has been something of a mystery. Guitarists have tried to reproduce the chord with limited success. Turns out there’s a good reason why they haven’t figured it out: the chord cannot be played on a guitar alone.

Jason Brown has digitally analyzed the chord using Fourier analysis and determined that there must have been a piano in the recording studio playing along with the guitars. Brown has determined what notes each member of the Beatles was playing.
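
This is not Brown’s analysis, which worked on the actual recording. It is a minimal Python sketch of the underlying idea: a discrete Fourier transform turns a mixture of tones into peaks at the component frequencies. The signal, tone frequencies, sample rate, and function names are assumptions for illustration, and a naive DFT is used instead of an FFT to avoid dependencies.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0 .. n//2."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def peak_frequencies(samples, sample_rate, count):
    """Return the `count` strongest frequencies in Hz, sorted from low to high."""
    mags = dft_magnitudes(samples)
    strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    # Bin k corresponds to k * sample_rate / n Hz.
    return sorted(k * sample_rate / len(samples) for k in strongest)

# Synthetic one-second "chord" of three pure tones (hypothetical frequencies,
# rounded to whole hertz so each falls exactly on a DFT bin).
SAMPLE_RATE = 1024
tones = [220, 277, 330]
signal = [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in tones)
          for t in range(SAMPLE_RATE)]

found = peak_frequencies(signal, SAMPLE_RATE, 3)
print(found)  # [220.0, 277.0, 330.0]
```

Real instruments add overtones and decay, so working out who played which note in a recording is much harder than picking the largest peaks from pure synthetic tones.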

I heard Jason Brown’s story on the Mathematical Moments podcast. In addition to the chord discussed above, Brown talks about other things he has discovered about the Beatles and about the relationship between music and math in general. Unfortunately, Mathematical Moments does not make it easy to link to individual episodes. Here is a link to a PDF file of show notes with the audio embedded. The file is slow to download, and your PDF viewer may not support it. Here’s a link directly to just the MP3 audio file.

The Mathematical Moments podcast also does not make it obvious that you can subscribe to the podcast; they only provide links to individual episodes with fat PDF files. However, you can subscribe by using the URL http://www.ams.org/rss/mathmoments.rss.
