<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Managing biological data</title>
	<atom:link href="http://www.johndcook.com/blog/2009/12/14/managing-biological-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/</link>
	<description>The blog of John D. Cook</description>
	<lastBuildDate>Sat, 11 Feb 2012 01:10:06 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28949</link>
		<dc:creator>John</dc:creator>
		<pubDate>Tue, 15 Dec 2009 16:19:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28949</guid>
		<description>David, I managed a project for a clinical trial data management framework that uses an EAV (entity, attribute, value) database approach on the back-end. It was optimized for rapid development of data entry applications, and made some deliberate trade-offs on other criteria. We got up and running quickly. We were collecting data and learning from client feedback while we would still have been drawing database schemas on whiteboards if we&#039;d used a more traditional approach.

I was also part of an effort to create a relational database for microarray experiment data. Like the projects Randall Julian refers to, this project folded and was replaced by something more like a document management system.</description>
		<content:encoded><![CDATA[<p>David, I managed a project for a clinical trial data management framework that uses an EAV (entity, attribute, value) database approach on the back-end. It was optimized for rapid development of data entry applications, and made some deliberate trade-offs on other criteria. We got up and running quickly. We were collecting data and learning from client feedback while we would still have been drawing database schemas on whiteboards if we&#8217;d used a more traditional approach.</p>
<p>I was also part of an effort to create a relational database for microarray experiment data. Like the projects Randall Julian refers to, this project folded and was replaced by something more like a document management system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Clark</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28943</link>
		<dc:creator>David Clark</dc:creator>
		<pubDate>Tue, 15 Dec 2009 15:50:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28943</guid>
		<description>Can you comment or blog in more detail about the project where you used &quot;a flexible but structured approach that worked quite well&quot;?</description>
		<content:encoded><![CDATA[<p>Can you comment or blog in more detail about the project where you used &#8220;a flexible but structured approach that worked quite well&#8221;?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian mulvany</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28915</link>
		<dc:creator>Ian mulvany</dc:creator>
		<pubDate>Tue, 15 Dec 2009 08:23:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28915</guid>
		<description>Yes, what Janne said: the important thing is to have as many of the support objects with the data as possible. It would also be good to be able to maintain the relationships, &quot;script a runs on data set b&quot;, &quot;conclusion z is drawn from step y&quot;, but beyond that the tools used to represent and store these relationships are an open question. myExperiment supports upload of workflows such as those from Taverna. They also expose the relationships between the parts of a research object with the repository standard OAI-ORE, which lends itself to being represented in RDF, but you could look at extending a federation scheme like SWORD, which is built on top of Atom, and my favourite new tool to try to adapt to this mixing of data and logic is Google Wave. All of these considerations sadly remain moot, as the vast majority of scientific data (if counted by experiment and not data volume) is stored in Excel files. That tends to be because big science projects get funding to take care of data citation and they can invest in doing it properly (think of the vast amount of software that supports the LHC). Normal science, by contrast, tends to leave researchers to their own devices. Excel is not as good as data purists would wish, but it is very powerful, does the job for the most part, and in the few cases where a bench scientist might actually want to share their data it&#039;s very easy to do with Excel.</description>
		<content:encoded><![CDATA[<p>Yes, what Janne said: the important thing is to have as many of the support objects with the data as possible. It would also be good to be able to maintain the relationships, &#8220;script a runs on data set b&#8221;, &#8220;conclusion z is drawn from step y&#8221;, but beyond that the tools used to represent and store these relationships are an open question. myExperiment supports upload of workflows such as those from Taverna. They also expose the relationships between the parts of a research object with the repository standard OAI-ORE, which lends itself to being represented in RDF, but you could look at extending a federation scheme like SWORD, which is built on top of Atom, and my favourite new tool to try to adapt to this mixing of data and logic is Google Wave. All of these considerations sadly remain moot, as the vast majority of scientific data (if counted by experiment and not data volume) is stored in Excel files. That tends to be because big science projects get funding to take care of data citation and they can invest in doing it properly (think of the vast amount of software that supports the LHC). Normal science, by contrast, tends to leave researchers to their own devices. Excel is not as good as data purists would wish, but it is very powerful, does the job for the most part, and in the few cases where a bench scientist might actually want to share their data it&#8217;s very easy to do with Excel.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Janne</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28905</link>
		<dc:creator>Janne</dc:creator>
		<pubDate>Tue, 15 Dec 2009 05:44:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28905</guid>
		<description>How about sort of giving up on abstracting the data format and semantics too much, and storing the data together with snippets of code that can read, analyze and export the relevant information from the data dump? Could be Perl/Python/Ruby for instance, or R or Matlab modules, but something runnable that verifiably does what it says, and runs in a standard environment. The code is both the analysis and the documentation of the process and the data format itself. Sort of object orientation, but from a data point of view.</description>
		<content:encoded><![CDATA[<p>How about sort of giving up on abstracting the data format and semantics too much, and storing the data together with snippets of code that can read, analyze and export the relevant information from the data dump? Could be Perl/Python/Ruby for instance, or R or Matlab modules, but something runnable that verifiably does what it says, and runs in a standard environment. The code is both the analysis and the documentation of the process and the data format itself. Sort of object orientation, but from a data point of view.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28898</link>
		<dc:creator>John</dc:creator>
		<pubDate>Tue, 15 Dec 2009 04:14:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28898</guid>
		<description>I agree. I think &lt;a href=&quot;http://couchdb.apache.org/&quot; rel=&quot;nofollow&quot;&gt;CouchDB&lt;/a&gt; could be a useful part of a solution.</description>
		<content:encoded><![CDATA[<p>I agree. I think <a href="http://couchdb.apache.org/" rel="nofollow">CouchDB</a> could be a useful part of a solution.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gabe Moothart</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28897</link>
		<dc:creator>Gabe Moothart</dc:creator>
		<pubDate>Tue, 15 Dec 2009 04:09:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28897</guid>
		<description>Sounds like a document-oriented database like CouchDB would be a good fit.</description>
		<content:encoded><![CDATA[<p>Sounds like a document-oriented database like CouchDB would be a good fit.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian mulvany</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28876</link>
		<dc:creator>Ian mulvany</dc:creator>
		<pubDate>Mon, 14 Dec 2009 23:10:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28876</guid>
		<description>I was at the UK e-Science meeting last week, and much of the discussion revolved around data. The nice folk at myexperiment.org are starting to promote the idea of a &quot;research object&quot;. This would be a graph of artefacts deposited in a repository that represents the many interrelated bits of an experiment: part data, part notebook, part article. You could even expose it in RDF if you wished. I think the idea has a lot of promise, and could help with the data-experiment issue you mention above.</description>
		<content:encoded><![CDATA[<p>I was at the UK e-Science meeting last week, and much of the discussion revolved around data. The nice folk at myexperiment.org are starting to promote the idea of a &#8220;research object&#8221;. This would be a graph of artefacts deposited in a repository that represents the many interrelated bits of an experiment: part data, part notebook, part article. You could even expose it in RDF if you wished. I think the idea has a lot of promise, and could help with the data-experiment issue you mention above.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28861</link>
		<dc:creator>John</dc:creator>
		<pubDate>Mon, 14 Dec 2009 20:48:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28861</guid>
		<description>You may need a lot of biology to get a &lt;em&gt;degree&lt;/em&gt; in bioinformatics, but you don&#039;t need to know a lot of biology to do &lt;em&gt;research&lt;/em&gt; in bioinformatics. 

Some bioinformatics researchers have a substantial background in biology, but many do not. It&#039;s often possible to learn what you need to know just-in-time.</description>
		<content:encoded><![CDATA[<p>You may need a lot of biology to get a <em>degree</em> in bioinformatics, but you don&#8217;t need to know a lot of biology to do <em>research</em> in bioinformatics. </p>
<p>Some bioinformatics researchers have a substantial background in biology, but many do not. It&#8217;s often possible to learn what you need to know just-in-time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Talbert</title>
		<link>http://www.johndcook.com/blog/2009/12/14/managing-biological-data/comment-page-1/#comment-28860</link>
		<dc:creator>Robert Talbert</dc:creator>
		<pubDate>Mon, 14 Dec 2009 20:41:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.johndcook.com/blog/?p=3937#comment-28860</guid>
		<description>That remark about being &quot;hard to represent the experiment&quot; makes me think of a question I&#039;ve had about bio- and medical informatics for a while now: Just how much science does one have to know in order to be a good bio- or medical informaticist? 

I&#039;ve been kicking around the idea of possibly doing an MS in bioinformatics someday; I&#039;ve looked into some degree programs in informatics and have been turned back by the sheer number of biology courses in them. I enjoy biology, but to get an MS in bioinformatics I would need to take something like 30 hours in biology and chemistry in some of these programs, and that&#039;s a lot of time and money spent there. But there are other programs that are just informatics with no science requirements. These are appealing for logistical and cost reasons, but I wonder how well they really prepare somebody to work in the discipline.

Anybody with some expertise here have a comment about that?</description>
		<content:encoded><![CDATA[<p>That remark about being &#8220;hard to represent the experiment&#8221; makes me think of a question I&#8217;ve had about bio- and medical informatics for a while now: Just how much science does one have to know in order to be a good bio- or medical informaticist? </p>
<p>I&#8217;ve been kicking around the idea of possibly doing an MS in bioinformatics someday; I&#8217;ve looked into some degree programs in informatics and have been turned back by the sheer number of biology courses in them. I enjoy biology, but to get an MS in bioinformatics I would need to take something like 30 hours in biology and chemistry in some of these programs, and that&#8217;s a lot of time and money spent there. But there are other programs that are just informatics with no science requirements. These are appealing for logistical and cost reasons, but I wonder how well they really prepare somebody to work in the discipline.</p>
<p>Anybody with some expertise here have a comment about that?</p>
]]></content:encoded>
	</item>
</channel>
</rss>


