Comments on: Managing biological data

By: John

John — Tue, 15 Dec 2009 16:19:31 +0000

In reply to David Clark. David, I managed a project to build a clinical trial data management framework that uses an EAV (entity, attribute, value) database approach on the back end. It was optimized for rapid development of data entry applications, and made some deliberate trade-offs on other criteria. We got up and running quickly. We were collecting data and learning from client feedback while we would still have been drawing database schemas on whiteboards if we'd used a more traditional approach. I was also part of an effort to create a relational database for microarray experiment data. Like the projects Randall Julian refers to, this project folded and was replaced by something more like a document management system.
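For readers unfamiliar with EAV, here is a minimal sketch of the general idea in Python with SQLite. It is not the framework described above: the table name `eav` and the helpers `make_store`, `set_value`, and `get_record` are made up for illustration. The point is that a new attribute needs no schema change, at the cost of storing every value as text.

```python
import sqlite3


def make_store():
    """Create an in-memory database with one generic EAV table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        " entity TEXT NOT NULL,"
        " attribute TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (entity, attribute))"
    )
    return conn


def set_value(conn, entity, attribute, value):
    """Store one attribute of one entity; values are kept as text."""
    conn.execute(
        "INSERT OR REPLACE INTO eav (entity, attribute, value) VALUES (?, ?, ?)",
        (entity, attribute, str(value)),
    )


def get_record(conn, entity):
    """Pivot all rows for an entity back into a single dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ?", (entity,)
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = make_store()
    set_value(conn, "patient-1", "age", 54)
    # A new attribute appears later; no ALTER TABLE is needed.
    set_value(conn, "patient-1", "arm", "placebo")
    print(get_record(conn, "patient-1"))
```

The trade-off is visible even at this size: types and constraints move out of the schema and into application code.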

By: David Clark

David Clark — Tue, 15 Dec 2009 15:50:31 +0000

Can you comment or blog in more detail about the project where you used “a flexible but structured approach that worked quite well”?

By: Ian mulvany

Ian mulvany — Tue, 15 Dec 2009 08:23:10 +0000

Yes, what Janne said: the important thing is to have as many of the supporting objects with the data as possible. It would also be good to be able to maintain the relationships, “script a runs on data set b”, “conclusion z is drawn from step y”, but beyond that the tools used to represent and store these relationships are an open question. myExperiment supports upload of workflows such as those from Taverna. They also expose the relationships between the parts of a research object with the repository standard OAI-ORE, which lends itself to being represented in RDF, but you could look to extending a federation scheme like SWORD, which is built on top of Atom, and my favourite new tool to try to adapt to this mixing of data and logic is Google Wave.

All of these considerations sadly remain moot, as the vast majority of scientific data (if counted by experiment and not data volume) is stored in Excel files. That tends to be because big science projects get funding to take care of data citation and they can invest in doing it properly (think of the vast amount of software that supports the LHC). Normal science, by contrast, tends to leave researchers to their own devices. Excel is not as good as data purists would wish, but it is very powerful, does the job for the most part, and in the few cases where a bench scientist might actually want to share their data it’s very easy to do with Excel.
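As a rough illustration of keeping those relationships next to the data, here is a sketch in plain Python that stores them as subject/predicate/object triples and walks them back from a conclusion. The `ProvenanceGraph` class is hypothetical and uses no RDF or OAI-ORE library. The link “step y is produced by script a” is an assumption added to connect the two quoted examples.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy store for (subject, predicate, object) relationships between artefacts."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Record one relationship, e.g. ("script a", "runs on", "data set b")."""
        self._edges[subject].append((predicate, obj))

    def trace(self, start):
        """Return every artefact reachable from `start`, depth first."""
        seen, stack, order = set(), [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # Push in reverse so edges are visited in insertion order.
            for _, target in reversed(self._edges.get(node, [])):
                stack.append(target)
        return order[1:]  # drop the starting node itself


if __name__ == "__main__":
    graph = ProvenanceGraph()
    graph.add("conclusion z", "is drawn from", "step y")
    graph.add("step y", "is produced by", "script a")  # assumed link
    graph.add("script a", "runs on", "data set b")
    print(graph.trace("conclusion z"))
```

The same triples could be serialised to RDF later; the sketch only shows that the relationships are cheap to record at the time the work is done.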

By: Janne

Janne — Tue, 15 Dec 2009 05:44:11 +0000

How about sort of giving up on abstracting the data format and semantics too much, and instead storing the data together with snippets of code that can read, analyze and export the relevant information from the data dump? Could be Perl/Python/Ruby for instance, or R or Matlab modules, but something runnable that verifiably does what it says, and runs in a standard environment. The code is both the analysis and the documentation of the process and the data format itself. Sort of object orientation, but from a data point of view.
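A minimal sketch of what such a bundle might look like in Python, assuming a made-up layout: a directory holding `data.csv` plus a `reader.py` that knows how to load it. The names `write_bundle`, `read_bundle`, and `load` are hypothetical. The consumer never parses the file format itself; it only runs the reader that ships with the data.

```python
import csv
import importlib.util
from pathlib import Path

# Source of the reader that travels with the data (hypothetical format:
# a CSV with "sample" and "value" columns).
READER_SOURCE = '''\
import csv
from pathlib import Path


def load(bundle_dir):
    """Return the measurements as a list of (sample, value) pairs."""
    with open(Path(bundle_dir) / "data.csv", newline="") as f:
        return [(row["sample"], float(row["value"])) for row in csv.DictReader(f)]
'''


def write_bundle(bundle_dir, rows):
    """Write the data dump and the code that reads it into one directory."""
    bundle = Path(bundle_dir)
    bundle.mkdir(parents=True, exist_ok=True)
    with open(bundle / "data.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample", "value"])
        writer.writerows(rows)
    (bundle / "reader.py").write_text(READER_SOURCE)


def read_bundle(bundle_dir):
    """Load the bundled reader module and let it interpret its own data."""
    reader_path = Path(bundle_dir) / "reader.py"
    spec = importlib.util.spec_from_file_location("bundle_reader", reader_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.load(bundle_dir)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        write_bundle(tmp, [("s1", 1.5), ("s2", 2.0)])
        print(read_bundle(tmp))
```

Running code that arrives with a data set means trusting whoever produced it, so a real version of this would need some kind of sandboxing or review.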

By: John

John — Tue, 15 Dec 2009 04:14:26 +0000

I agree. I think CouchDB could be a useful part of a solution.

By: Gabe Moothart

Gabe Moothart — Tue, 15 Dec 2009 04:09:23 +0000

Sounds like a document-oriented database like CouchDB would be a good fit.

By: Ian mulvany

Ian mulvany — Mon, 14 Dec 2009 23:10:56 +0000

I was at the UK e-Science meeting last week, and much of the discussion revolved around data. The nice folk at myexperiment.org are starting to promote the idea of a “research object”. This would be a graph of artefacts deposited in a repository that represents the many interrelated bits of an experiment: part data, part notebook, part article. You could even expose it in RDF if you wished. I think the idea has a lot of promise, and could help with the data-experiment issue you mention above.

By: John

John — Mon, 14 Dec 2009 20:48:47 +0000

In reply to Robert Talbert. You may need a lot of biology to get a degree in bioinformatics, but you don't need to know a lot of biology to do research in bioinformatics. Some bioinformatics researchers have a substantial background in biology, but many do not. It's often possible to learn what you need to know just-in-time.

By: Robert Talbert

Robert Talbert — Mon, 14 Dec 2009 20:41:51 +0000

That remark about being “hard to represent the experiment” makes me think of a question I’ve had about bio- and medical informatics for a while now: Just how much science does one have to know in order to be a good bio- or medical informaticist?

I’ve been kicking around possibly someday doing an MS in bioinformatics; I’ve looked into some degree programs in informatics and have been turned back by the sheer number of biology courses in them. I enjoy biology, but to get an MS in bioinformatics I would need to take something like 30 hours in biology and chemistry in some of these programs, and that’s a lot of time and money spent there. But there are other programs that are just informatics with no science requirements. These are appealing for logistical and cost reasons, but I wonder how well they really prepare somebody to work in the discipline.

Anybody with some expertise here have a comment about that?