Computational challenges in large-scale sequencing

April 6th, 2009

Thanks to everyone who provided comments and suggestions for my talk for NIH NHGRI’s five-year planning process meeting on The Future of the Large-Scale Sequencing Program, March 23-24 in Bethesda. A copy of my slides is here, if you’re interested.

3 responses

  1. Jason Stajich comments:

    Brilliant! – I really hope they listened.

  2. Ian Holmes comments:

    yes, thanks, that was interesting. I like the slide that says “this data volume is nothing compared to military informatics”.

  3. Sean Eddy comments:

    I borrowed that slide from Dan Meiron, and actually both of us have used it as an example of what *not* to do, i.e. as a concrete example of how overly hyped many claims of “data deluge” are. (It’s hard to tell that from the slide without hearing how I talked about it.) Note, for example, that the y-axis of that plot is already on a log scale; so for a curve to rise exponentially on that plot, the rate of data accumulation would have to be superexponential, not exponential. The plot is adding new acquisition capabilities together in layers *in log space*, which means that somehow the data acquired by new means are *multiplying*, not summing with previous technologies, which is nonsense. Meiron’s version goes on to show a big red ‘X’ over the whole slide. A back-of-the-envelope calculation of complete imaging of the entire Earth surface at superhigh space/time resolution can’t even reach data rates that high. Military imaging data acquisition does have some of the same data throughput issues we see in other fields, but it’s just nowhere, nowhere near as bad as that bogus slide implies.
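
The two log-scale points in the comment above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the rates, the base, and the function names are made up for the example and are not taken from the slide. It shows that stacking layers on a log axis multiplies the underlying quantities, and that a curve which rises exponentially on a log axis has a growth ratio that itself keeps growing.

```python
import math

# Hypothetical data rates (arbitrary units) for an old and a new technology.
old_rate = 1e3
new_rate = 1e4

# Stacking layers on a linear axis adds the rates.
linear_stack = old_rate + new_rate

# Stacking layers on a log10 axis adds the logarithms,
# which is the same as multiplying the rates.
log_stack = 10 ** (math.log10(old_rate) + math.log10(new_rate))


def exponential(t, base=2.0):
    """Ordinary exponential growth: a straight line on a log axis."""
    return base ** t


def exponential_on_log_axis(t, base=2.0):
    """Growth whose log10 is exponential in t: a curve on a log axis."""
    return 10 ** (base ** t)


def growth_ratios(f, steps):
    """Ratio f(t+1)/f(t) for t = 0 .. steps-1."""
    return [f(t + 1) / f(t) for t in range(steps)]


exp_ratios = growth_ratios(exponential, 4)
superexp_ratios = growth_ratios(exponential_on_log_axis, 4)

print("linear stack:", linear_stack)
print("log-space stack:", log_stack)
print("exponential ratios:", exp_ratios)
print("superexponential ratios:", superexp_ratios)
```

For ordinary exponential growth the step-to-step ratio stays constant, while for the curve that bends upward on a log axis the ratio grows at every step, which is what "superexponential" means here.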
