Posts about New from the lab

Dfam: annotation of transposable elements with profile HMMs

September 3rd, 2012

We’re happy to announce the release of Dfam 1.0, a set of profile HMMs for genomic DNA annotation of transposable elements. This essentially constitutes an upgrade of repeat element annotation from using searches with single sequence consensuses to using searches with profile HMMs, now that the HMMER3 project has made DNA/DNA profile HMM searches sufficiently fast for whole genomes. Dfam is a collaboration between Jerzy Jurka and his Repbase resources (Genetic Information Research Institute), Arian Smit and his RepeatMasker software (Institute for Systems Biology, Seattle), the HMMER3 development team at Janelia Farm (particularly Travis Wheeler, leading nhmmer development), and the Xfam database consortium (particularly Rob Finn, here at Janelia). Among other effects of this work, we expect the widely used RepeatMasker software to include nhmmer, Dfam models, and profile HMM searches in the near future. A preprint of the first Dfam paper is available now on our preprint server, and the database itself is available for use at dfam.janelia.org.

Infernal 1.1: RNA alignment and database search, 10,000x faster

June 30th, 2012

One of our lab’s goals is to make it possible to systematically search for homologs of RNAs in genomes, not just by looking for sequence conservation but also by looking for RNA secondary structure conservation. A powerful model framework for RNA structure/sequence comparison, called profile stochastic context-free grammars (profile SCFGs), was introduced in the mid-1990s both by Yasu Sakakibara and by us. But profile SCFG methods are among the most computationally intensive algorithms used in genome sequence analysis, requiring (in their textbook description, anyway) O(N^4) time and O(N^3) memory for an RNA of N residues. Profile SCFG implementations like our Infernal software have required immense computational power to get even the most basic sort of searches done.
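To get a feel for what that textbook scaling means, here is a rough sketch in Python. It ignores constant factors entirely and only compares an RNA of N residues against a reference length; the function name and the reference length of 100 are illustrative assumptions, not anything taken from Infernal.

```python
def relative_cost(n, n_ref=100):
    """Cost of an RNA of n residues relative to one of n_ref residues,
    under the textbook O(N^4) time / O(N^3) memory scaling.
    Constant factors are ignored; n_ref is an arbitrary reference length."""
    ratio = n / n_ref
    time_factor = ratio ** 4
    memory_factor = ratio ** 3
    return time_factor, memory_factor


if __name__ == "__main__":
    for n in (100, 200, 400):
        t, m = relative_cost(n)
        print(f"N={n}: time x{t:g}, memory x{m:g}")
```

Doubling the length of the RNA multiplies the time by 16 and the memory by 8, which is why even modest-sized RNAs get expensive quickly.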

We are happy to announce a new landmark in our work on these methods, with a new version of Infernal that is about 100x faster than the previous (1.0) version, and 10,000x faster than when Eric Nawrocki started working on making Infernal fast enough for routine use. Over at infernal.janelia.org, Eric has made available the first release candidate of Infernal 1.1, 1.1rc1, including source code and binaries for Linux and MacOS/X. A typical RNA homology search of a vertebrate genome that used to require a cpu-year can now be done in about an hour on a single CPU, or a few seconds on a cluster.
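As a back-of-the-envelope check on those numbers, the sketch below converts a CPU-year into hours and applies the quoted speedups. The helper name and the use of a 365.25-day year are assumptions made for illustration only.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, assuming a 365.25-day year


def hours_after_speedup(cpu_years, speedup):
    """Single-CPU hours needed once a job of `cpu_years` runs `speedup` times faster."""
    return cpu_years * HOURS_PER_YEAR / speedup


if __name__ == "__main__":
    # A search that used to take one CPU-year, at the quoted 10,000x speedup.
    print(f"{hours_after_speedup(1, 10_000):.2f} hours")
    # The same search at only the 100x gain over Infernal 1.0.
    print(f"{hours_after_speedup(1, 100):.1f} hours")
```

A CPU-year divided by 10,000 comes out a little under an hour, which matches the "about an hour on a single CPU" figure.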

So really for the first time, Infernal has become practical for systematic RNA sequence analysis of whole genomes. Roughly speaking, Infernal 1.1 is running at a speed comparable to what HMMER2 ran at — we’ve brought the RNA search problem down from the utterly ridiculous to the merely difficult.

The next version of the Rfam RNA sequence family database will be the first to be computed entirely natively with Infernal RNA structure comparison, instead of using BLASTN as a prefilter. An all-vs-all comparison of all 2000 Rfam models against the entire EMBL DNA database (170 Gb) would take 30,000 cpu-years using Infernal 0.55; now with Infernal 1.1, that enormous Rfam compute is only going to take us about a day on Janelia’s cluster.

Like Infernal 1.0, 1.1 is achieving its speed by using profile HMMs as heuristic prefilters. Whereas 1.0 used HMMER2-like prefilters, 1.1 has now switched to using HMMER3’s vector engine, sharing code with Travis Wheeler’s soon-to-be-announced nhmmer program for DNA/DNA comparison.

Happy RNA hunting — and don’t let anyone tell you that O(N^4) algorithms aren’t tractable!

More domains and motifs

June 20th, 2012

In the latest version of the HMMER website we have focused on enhancing the recognition and display of domains and motifs found in query sequences. To achieve this we added two new features to the site: additional HMM databases and simple motif detection.

Interactive, iterative searches using jackhmmer

April 16th, 2012

It has been a couple of weeks now since we released jackhmmer on the HMMER website, and so far (touch wood etc…) it seems to be performing as we had hoped. Here on ‘the farm’ we are getting very excited about the results we are observing.

A different view on search results

October 17th, 2011

Have you ever wondered how a new protein family would look in the context of other Pfam domains? Well, look no further than the HMMER website! At the end of last week we released a new way of visualizing search results according to ‘domain architecture’ (applies to both phmmer and hmmsearch).

RNA secondary structure prediction with probability models

August 23rd, 2011

Over at our publications page, I’ve posted a preprint of Elena Rivas’ latest paper on RNA secondary structure prediction, which she submitted for review today.


HMMER3 at your (web) service

February 14th, 2011


Over at hmmer.janelia.org, you’ll notice a significant change on the right side of the page. See the “Search” button? You don’t have to use HMMER at the UNIX command line any more. Thanks to support from the Howard Hughes Medical Institute, and hard work from Rob Finn and Jody Clements here in the skunkworks at HMMER Labs, HMMER searches are now available on interactive web servers.

Departures

June 11th, 2010

Our little Janelia lab got even smaller this week.

Sergi Castellano left to take a new faculty position in the Department of Evolutionary Genetics with Svante Pääbo at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany. Sergi’s postdoctoral work on single-sequence-query homology searches in HMMER, the project we call “Smith/Waterman: reloaded”, is still in progress: a generative probabilistic interpretation of what all the explicit and implicit zero scores in Smith/Waterman scoring really “mean”, and the nonzero values we say they ought to have. Temple Smith has said his career was based on zero, referring to the extra step in the Smith/Waterman local alignment recursion compared to global sequence alignment. Sergi’s going to show that Temple’s career is and always should have been nonzero.
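For readers who haven’t seen where that zero lives, here is a minimal sketch of the textbook recursions with a linear gap penalty. It is not the “Smith/Waterman: reloaded” work and not HMMER code; the function name and the match/mismatch/gap scores are arbitrary values chosen for illustration. The only differences between the two modes are the boundary conditions, the extra zero in the max, and where the final score is read from.

```python
def align_score(a, b, local, match=2, mismatch=-1, gap=-2):
    """Optimal alignment score of strings a and b with a linear gap penalty.

    local=True  -> Smith/Waterman local alignment
    local=False -> global (Needleman/Wunsch style) alignment
    Scoring parameters are illustrative defaults, not recommended values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment starts at 0.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                        H[i - 1][j] + gap,     # gap in b
                        H[i][j - 1] + gap)     # gap in a
            if local:
                score = max(0, score)          # the extra zero: restart the alignment here
            H[i][j] = score
            best = max(best, score)
    # Local: best cell anywhere. Global: the bottom-right corner.
    return best if local else H[-1][-1]


if __name__ == "__main__":
    x, y = "TTTTACGT", "ACGTGGGG"
    print("local :", align_score(x, y, local=True))
    print("global:", align_score(x, y, local=False))
```

The zero lets a local alignment drop a poorly scoring prefix for free, which is why the local score can never be negative while the global score can.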

Diana Kolbe, one of our two last Washington University grad students, defended her thesis back in St. Louis yesterday. She’s worked for many (!) years on ways to accelerate the Infernal software for RNA similarity search. An important last chunk of her work, which we hope to write up for publication in addition to what’s in her thesis, established proof of principle for a structure-based heuristic acceleration that complements Infernal’s current sequence-based heuristic acceleration. It will take significant software engineering to implement her ideas in the production codebase, and she’s more than done her time in grad school already, so we’re letting her go; the proof of principle is captured in an Infernal development snapshot release. She’s off to join Laura Elnitski’s lab at NIH NHGRI as a postdoc.

The lab is getting close to completing a gradual four-year transition from its WashU configuration (where we were about 15 people, almost entirely built around me working with graduate students, and a sprinkling of postdocs and staff) to its Janelia configuration (where we are about 6 people, and the long-term plan is to have a core of staff working on software engineering in HMMER and Infernal, with me working on shiny new things with one or two postdocs). The last Washington University graduate student, Seolkyoung Jung, is now writing her thesis and her last paper; we’ve just submitted her magnum opus, a paper describing an enormous chunk of work on finding ncRNAs in Oxytricha trifallax.

New manuscripts from the lab

May 3rd, 2010

A burst of new work from the lab is available over on our publications page. Summaries and backstories for three of these manuscripts are below the fold.

our secret agent man infiltrates wikipedia.

December 16th, 2008

Nature has a news article by Declan Butler about a new paper from our secret agent man Tom Jones, in collaboration with Peter Stadler’s lab in Leipzig, which is about to appear in RNA Biology.
