HMMER 3.0
March 28th, 2010
Our quest is at an end.
– Monty Python and the Holy Grail
Four years in development, and a year in testing: HMMER3 has reached its first public production release. Do we have time for a beer and a small celebration before we write the manuscripts and move straight on to 3.1 development? No? Thought not.
HMMER3 is available for download as a source code tarball. Over at hmmer.org, there are also links for downloading tarballs including precompiled binaries for Linux/Intel ia32, Linux/Intel x86_64, and Mac OS X Intel platforms.
The release notes for 3.0 follow:
HMMER 3.0 release notes
http://hmmer.org/
SRE, Sun Mar 28 09:12:01 2010
This is the first release of HMMER 3.0.
H3 has been in testing since January 2009. It is now ready for production use. This means we’ll actually accept the blame now if it doesn’t work. It has been stable for many months. It is already widely deployed in its beta test versions for Pfam, InterPro, and other protein databases.
We are already working on 3.1. 3.1 is expected to bring in several new features that did not make it into 3.0, including DNA/DNA searches, and a wider set of alignment formats beyond Stockholm and aligned FASTA. But before that happens, we’re going to take a sort of breather, and finish the manuscripts that describe how H3 works.
There are only small differences in 3.0 relative to the previous 3.0rc2 release:
- The User’s Guide now documents the UCSC SAM profile software’s A2M format, which H3 can export but not read, at present. (A2M is not aligned FASTA.)
- Issues detected by the cppcheck static analyzer have been fixed.
March 29th, 2010 at 8:43 pm
Congrats on the release! You and the team are making a lot of people more productive with the new implementation (though having significantly less time waiting for results to come back is cutting down on the blogging).
Just wanted to say thanks to you and the SELAB folks for the effort, for releasing early, for the open source nature of the code, and for your willingness to communicate to the masses what you are doing and why.
March 29th, 2010 at 10:28 pm
Hoooraaaaay! Amazing. The single most important and fundamental tool in all of computational biology, created by the most brilliant and influential mind (and team!) in the field. Eddy-Method forward scores are today what Karlin-Altschul statistics were to the last generation of algorithms. I’m preempting any modesty on your part – it is true! (Also thank you for the Mac OSX binaries…)
Great job on the release, the world is significantly better off because of you and your work. Unimaginably outstanding. Please don’t defect to neuroscience!
March 29th, 2010 at 11:08 pm
Congrats! I’ve been using it for months already but it’s nice to know it’s at point-oh!
March 31st, 2010 at 8:17 am
Go for the beer and the celebration, your work has been impressive. And thanks for the pre-compiled binaries, it makes my life SO much easier!
March 31st, 2010 at 5:08 pm
Boohoo! Let me be the first one to whine that the nucleotide functionality isn’t there yet – and I REALLY need it! I don’t think there should be time for beer yet. When 3.1 comes out, I’d be happy to deliver a case. Or a keg. Of your choice.
Seriously. Wonderful job, and a million congratulations. But when will 3.1 be available? Is there a beta for that yet that we can test…?
April 4th, 2010 at 8:36 am
Great work. Congratulations!
Here is some information for those who want to write a fast parser quickly.
—————————————————————
One can modify line 1181 of p7_tophits.c to
fprintf(ofp, "%-*s %-*s %5d %-*s %-*s %5d %9.2g %6.1f %5.1f %3d %3d %9.2g %9.2g %6.1f %5.1f %5d %5d %5ld %5ld %5d %5d %4.2f %s\t%s\t%s\t%s\t%s\t%s\n",
and then line 1204 to
(th->hit[h]->desc ? th->hit[h]->desc : "-"),
th->hit[h]->dcl[d].ad->model,
th->hit[h]->dcl[d].ad->mline,
th->hit[h]->dcl[d].ad->aseq,
(th->hit[h]->dcl[d].ad->csline ? th->hit[h]->dcl[d].ad->csline : "-"),
(th->hit[h]->dcl[d].ad->rfline ? th->hit[h]->dcl[d].ad->rfline : "-")
);
Then, for example, one can run
hmmsearch --domtblout /dev/stdout -o /dev/null file.hmm file.fasta
This writes the domain table to standard output in a modified tabular format, with the alignment information appended to the end of each line.
Note that the alignment code is probably not final yet; the authors intend to work on it further.
April 20th, 2010 at 5:44 pm
As a feature request, could you make this work:
hmmscan -o /dev/null --domtblout - Pfam-A.hmm - out.txt
The input can already be taken from standard input using this notation, and I can keep the regular output from going to standard output, but can I make the “domain table” go to standard output as suggested above?
Of course, the idea is to be able to actually pipe the “domain table” into some other program, without having to write an intermediate temporary file.
-Alex
April 20th, 2010 at 5:58 pm
Let me try again; the redirection symbols got lost because they look like HTML:
hmmscan -o /dev/null --domtblout - Pfam-A.hmm - < in.fa > out.txt
April 20th, 2010 at 6:28 pm
Alex,
yeah, I want that feature too. I’ll add it soon.
May 12th, 2010 at 3:49 pm
Any more thoughts on starting up a forum/mailing list?
I’ve got a bucket full of questions regarding Infernal but I don’t necessarily want to address them directly to Sean, as I believe that other people might have had similar issues before…
May 12th, 2010 at 6:27 pm
I think it would be great if someone would create (sub)forums on some of the web sites that are already supporting bio software questions and answers!
May 24th, 2010 at 8:15 am
Congratulations on the final release. I have been a user of the previous betas for the last 6 months. Thank you so much for the great effort and support.
I second Alex Ochoa's suggestions; they are much needed.
Perhaps tab-separated output instead of space-separated output would be useful as well, so that users could open the output directly in a spreadsheet instead of going through parsers.
July 15th, 2010 at 10:16 pm
Hi, this is great work and very helpful for me. But I have a question: I built an HMM file and read it, but I have no idea how HMMER 3.0 calculates the HMM probabilities (match emission, insert emission, and transition probabilities). Could you show me how it works? Your answer is appreciated.
Best regards!
July 17th, 2010 at 10:16 am
The source code is the best reference, at present.
August 3rd, 2010 at 2:34 am
Thanks Sean, this is a truly valuable contribution to the field and we are all grateful to all the developers for it. On the subject of feature requests, I have two:
1) Something similar to what Alex and Hiroshi want. The option to produce only machine-readable tab-delimited output *including alignments* with one domain per line. Trivial parsing and minimal I/O.
2) It seems (I am just guessing) that the multithreading is done internally, and it doesn’t completely fill a many-processor machine. So it would be great to have a sequence-level threading option, as this would completely fill the available CPUs. In the meantime I have written a wrapper that does this; just specify the number of threads and use it as you would hmmscan:
http://www.cs.bris.ac.uk/~gough/software/hmmscan.pl
USE AT YOUR OWN RISK! -report any issues-