Bioinformatics, Biostatistics, and Biomathematics

Because they are new fields, the boundaries of bioinformatics, biostatistics, and biomathematics are blurry. I’ll illustrate this by relating some of my experience working in these fields.


I joined MD Anderson Cancer Center in 2000, around the time the Human Genome Project was declaring success. It was an exciting time for genetics and bioinformatics. There were great hopes that sequencing the human genome would soon lead to cures for cancer and other diseases. Biologists could now reduce biology to data, and so mathematicians and computer scientists could take over. (While the vision was exciting, the day-to-day reality was more dreary. Much of bioinformatics at the time was the science of file formats and database schemas, cleaning and transforming data from one format to another.)

When I arrived at MDACC there was a Department of Biomathematics and a newly formed Department of Biostatistics. Bioinformatics was part of the department’s research but not part of the name. Later Biomathematics became part of the Department of Biostatistics and Applied Mathematics. (The chairman of the former biomath department noted that it would be more logical for biostatistics to be part of applied mathematics than the other way around, but institutional considerations trumped logical classification.)

Most of my work was with biostatistics, though I also worked on/managed projects in bioinformatics. My background was more suited to biomathematics (modeling biological processes such as tumor growth with differential equations, etc.), but there was more work to be done elsewhere. After leaving MD Anderson I became involved with consulting work that would be classified as biomathematics.

When I went to bioinformatics conferences, I was surprised how many different ideas there were of what bioinformatics meant. In some institutions, it was a branch of computer science and focused on things like machine learning. In others it was part of MIS (management information systems) and focused on patient records, database integration, etc. In still other departments, such as my own, it was part of biostatistics and focused more on statistical analysis of biological data: microarrays, gene sequences, proteomic mass spectroscopy data, etc.

Sometime after biomath became a part of biostat, the department split again. The institution created a new Division of Quantitative Sciences to contain the Department of Biostatistics and a new Department of Bioinformatics and Computational Biology. The actual work we did didn’t change as often as our business cards. In practice the lines between departments were fuzzy, fortunately.

From the perspective of software development, the main division was between working with human-generated data and working with machine-generated data. Or you might say clinical data versus molecular data. Software for clinical trial conduct and data management gathers input from research nurses. Bioinformatics software has to manage data generated by lab equipment. The former is concerned with clear communication, data validation, etc. The latter is more concerned with volume and efficiency.

Another way to distinguish biostatistical software from software for bioinformatics, at least in my experience, is that the former is usually CPU-bound while the latter is mostly I/O-bound. (The kind of user input mentioned above isn’t CPU-bound, but the computations behind the scenes, especially simulations, are CPU-bound.)
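
To make the CPU-bound point concrete, here is a minimal sketch of the kind of simulation that might sit behind a clinical trial design. It is not code from any actual project. The single-arm design, the function names, and all the numbers are assumptions chosen for illustration. The input is a handful of parameters, yet the work is hundreds of thousands of random draws, so nearly all the time goes to computation rather than to reading or writing data.

```python
import random


def simulate_trial(n_patients, response_rate, min_responses, rng):
    """Simulate one hypothetical single-arm trial.

    Each patient responds independently with probability response_rate.
    The trial "succeeds" if at least min_responses patients respond.
    """
    responses = sum(rng.random() < response_rate for _ in range(n_patients))
    return responses >= min_responses


def estimate_success_probability(n_trials, n_patients, response_rate,
                                 min_responses, seed=0):
    """Estimate by brute-force simulation how often the design succeeds."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    successes = sum(
        simulate_trial(n_patients, response_rate, min_responses, rng)
        for _ in range(n_trials)
    )
    return successes / n_trials


if __name__ == "__main__":
    # Illustrative numbers only: 10,000 simulated trials of 40 patients each.
    print(estimate_success_probability(10_000, 40, 0.3, 16))
```

This toy case has an exact binomial answer, so simulation isn't strictly necessary. It stands in for designs where no closed form exists and the simulation has to be repeated across many scenarios.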