HMMER policy on trademark, copyright, patents, and licensing

April 29th, 2009


There are two ways of spreading light: to be the candle or the mirror that reflects it.

– Edith Wharton, Vesalius in Zante

On April 14, the US Patent and Trademark Office awarded us a trademark on HMMER. This is a good moment to explain how we plan to deal with intellectual property.

HMMER is scientific software, and its methods are described in journal publications. That means that it must be made available in a form that enables any scientist to understand, reproduce, and extend – like any other result of a scientific paper. For software, this is essentially the same as what people mean by “open source”. Our intent is to make HMMER widely and freely available to the entire scientific community as open source code. At the same time, we have to recognize that HMMER is a large, growing, and increasingly valuable codebase, not just a one-off result, so we’re taking steps to make sure we can sustain it as a long-term, coherent open source project.

LICENSE

HMMER3 is licensed under the GPLv3 (GNU General Public License, version 3). This means anyone can use it, study it, modify it, and even redistribute it — with the requirement that any modified/redistributed versions must also be licensed under the GPLv3. This explicitly includes both “noncommercial” and “commercial” use (whatever that means, in these days of multibillion dollar research universities and garage biotech startups). People at companies are scientists too, with the same rights and responsibilities regarding results in the scientific literature. The only thing the GPLv3 really blocks is someone forking a derivative copy of HMMER and distributing it under a different license, such as a closed-source proprietary license; to do that, you’d need to negotiate a non-GPL license with us first.

COPYRIGHT

We really don’t expect to negotiate any non-GPL licenses, though. We want to enable many different people to contribute to a single open source HMMER codebase, as a shared codebase for bioinformatics and computational biology. Having a lot of contributors means having a lot of copyrights. Many different copyrights already apply to HMMER, and we plan for even more; negotiating with all those copyright holders to obtain a non-GPL license will be prohibitive (for me, if not for you). It’s relatively easy to get everyone to agree to donate their copyrighted code under license terms compatible with the open source GPLv3, and that’s what we plan to do.

The Howard Hughes Medical Institute (HHMI; my employer) is the main copyright holder. Our terrific Hughes lawyer, Heidi Henning, has negotiated with Washington University (St. Louis USA) and the Medical Research Council Laboratory of Molecular Biology (MRC-LMB; Cambridge UK), my former employers, to transfer their copyrights to HHMI. We also have bits of code from a number of other sources, including Apple, IBM, and some other companies of various sizes, and several individuals in the comp bio community.

TRADEMARK

Did I mention, we want to enable a single open source HMMER codebase? There are several different “HMMERs” out there, some of which have forked HMMER code, some of which say they are independent implementations, and some of which aren’t very clear what they are. I don’t think this confusion around the name is useful for the community, and frankly, I find it somewhat annoying that people are forking rather than working together. (I also disbelieve that it is possible to independently implement HMMER, because there’s so much unpublished trickery in the code; so as far as I’m concerned, either a faux non-open HMMER is getting different and probably wrong answers and making me look bad, or it’s infringing my work and my license and making me mad.) Especially now with the advent of HMMER3, these other “HMMERs” are obsolete, imho.

To help drive cohesion of a single codebase, we have trademarked HMMER. I would now ask anyone who is distributing something called “HMMER” that is not HMMER to change their name to something else, in order not to confuse people. We will soon start “enforcing” the HMMER trademark with some friendly letters, if needed — these letters will be requests to work together on a common codebase. Of course you are still free to use the codebase under the terms of the GPLv3 for whatever you want — just please don’t call modified versions “HMMER”. If you make useful modifications, please consider contributing them back to HMMER instead. We think the “brand recognition” of HMMER is going to help motivate people to cooperate rather than fork.

Part of this plan involves us taking on more responsibility — we are making a commitment to spending time and effort on integrating useful modifications into the HMMER codebase. For example, I’m already making plans with Bjarne Knudsen and CLCbio (Copenhagen, Denmark) to work together to make sure that CLC will be able to integrate the open source version of HMMER3, rather than needing their own version.

PATENTS

We have debated defensively patenting the key innovations in HMMER3, but decided against it. HHMI, to its great credit, is perfectly prepared to file patents solely to defend the intellectual turf of an open source software tool — that is, if we were to be challenged by some commercial patent holder on something, we could fight fire with fire. In the end, though, I feel that software patents on published scientific results are sufficiently controversial and in conflict with the openness required of published scientific results that we decided we didn’t want to go there.

We are prepared to license and incorporate other people’s patented technologies in HMMER3, if necessary. The first example of that is the incorporation of patent-pending technology from Michael Farrar, which I use at the heart of HMMER3’s SIMD vector acceleration code. We licensed that technology nonexclusively from Michael specifically limited to its use in HMMER open source code, and we’ll do that with other future technologies as needed. The “patent clause” of the GPLv3 automatically conveys a nonexclusive license to you, and on through to derivative works. This means that you don’t have to do anything; the GPLv3 is automagically taking care of patent issues, once HHMI and I have done the right licensing up front. This is a big reason why I’m using the GPLv3.

A lot of thought has gone into our positions on HMMER’s intellectual property, thanks especially to discussions with HHMI lawyers and staff (Heidi Henning, Seth Brown, and Joanne Theurich). We think we’ve got this right, for a sustainable long-term plan of open source software development that benefits the whole community. But if you have comments or criticisms, this is a good time to hear them.

8 responses

  1. Ian Holmes comments:

    “I also disbelieve that it is possible to independently implement HMMER, because there’s so much unpublished trickery in the code”

    what then is your position on straightforward reimplementations of the HMMer Profile HMM which very clearly *are* independent (but may lack some of HMMer’s pre/post processing code)?

    For example, the HMMer adapters for HMMoC?

  2. Sean Eddy comments:

    Hi Ian,

    My concern is only with people calling their software “HMMER” when it’s not HMMER. The issue isn’t about profile HMM technology, nor over compatibility with HMMER. Everyone should be using profile HMM technology, imho; everyone should feel free to reuse HMMER code (subject to keeping it open sourced, anyway); and it’s in everyone’s interests, including mine, to be sure that HMMER plays well with others, so compatibility/interchange with other software is important and encouraged.

    Gerton’s HMMoC is perfectly clear about what it is – no one is going to confuse HMMoC with HMMER.

    My main concern is with things like Progeniq’s “BioBoost Accelerated HMMER”, Logical Depth’s “LD-HMMER”, and CLCbio’s “HMMER”. None of these companies distribute their code under a GPL (afaik), and none of them have a non-GPL license to my code — so (at best) they’re using the name HMMER to describe a codebase that isn’t HMMER, and at worst, they’re infringing the GPL. We are assuming the former. We will be asking these companies to stop using the name HMMER to describe a non-HMMER software package. Advertising HMMER *compatibility* is perfectly fine — and encouraged; we will be working directly with CLCbio, for example, to make sure they can use the open source HMMER3 code.

  3. scalability.org » Blog Archive » Short article on the growth of accelerators in life science work pings back:

    [...] confusing the mark, or b) simply stop distributing it. The rationale behind this is explained in this post. I take issue with the discussion on forking, as from what I understand it, the team tried to [...]

  4. Steven Salzberg comments:

    hi Sean,
    I must say I’m disappointed to see this. I understand your arguments, but they seem to boil down to (a) spending time and money on lawyers, and (b) preventing companies from using the HMMer name for their commercial software.

    My view is that this just makes lawyers happy, it doesn’t save you any time, and it doesn’t promote your software in the end. HMMer has been around for a long time and is very popular, and it’s managed to survive just fine without a trademarked name. My group’s GLIMMER software has been around a long time too, and it’s used all over the world, and it’s open source, and I would never patent or trademark any part of it. I think fights over intellectual property are an enormous, wasteful drain on the incredibly valuable intellectual energies of people like you, who have much better things to do than to talk to lawyers.

    Lawyers have developed a host of very clever arguments about why they disagree with what I just wrote, but I think their biases are obvious.

    Okay, now I’ve spent more time than I really wanted to on this issue. See how it sucks the life out of us?
    -Steven

  5. Sean Eddy comments:

    Oi vey. I’m not at all “preventing companies from using the HMMER name”. I encourage anyone — companies included — to use the HMMER name — provided they’re actually using HMMER, rather than attaching the name to something else that has nothing to do with the HMMER code! This is just a basic issue of correct attribution. I don’t think you’ve had to face this problem with GLIMMER (yet).

    I can’t agree with your comments about lawyers. Our legal counsel at HHMI is smart, sensible, efficient, and committed to open source and open research.

  6. Bob Carpenter comments:

    In my field (computational linguistics), there is a huge tension between reality, as represented by the quote:

    “I also disbelieve that it is possible to independently implement HMMER, because there’s so much unpublished trickery in the code; …”

    and the scientific ideal, as represented by the quote:

    “… it must be made available in a form that enables any scientist to understand, reproduce, and extend – like any other result of a scientific paper.”

    The problem is “understand” here. We can reproduce the result if we have the same source that’s used in house (that is, the exact same source, not a released version that’s somewhat like it), as well as the same data pre-processing and post-processing code. But we can’t really understand “unpublished trickery” buried in a code base. At least without a huge time investment.

    There’s also a sociological obstacle — it’s hard to publish on the trickery because, well, it looks like a hack. Even insightful reports on pristine implementations are difficult to publish. That’s one of the reasons we started a software workshop at the Association for Computational Linguistics meetings — it was just impossible to get implementation details into main conference papers, yet they were critical for anyone like me who spends most of their time implementing systems.

    PS. I understand wanting trademark control and copyright. GPL’s more generous than some “academic use only” code. It only really prevents companies from redistributing the code along with non-GPL-ed linked components or modifications. That’ll stop companies from creating a software business around GPL-ed code, but it won’t stop them from using it or contributing to it if their business consumes rather than sells the software. We release our commercial natural language processing software under a GPL-like license, because it’s toxic enough to most companies (even if they don’t redistribute code) that they’ll pay us for commercial licenses (it’s like MySQL’s dual license strategy).

    PPS. I’ve wondered if Farrar’s striped Smith-Waterman (which is one of those things that hardly seems patentable, especially given the prior art) is really the best way to go for bio sequence mapping (e.g. as used, in variant form, in SHRiMP). I thought most sequencing problems were bounded difference problems, and the striped evaluation, while able to speed up brute-force Smith-Waterman using GPU or other vector operations, isn’t easy to use to evaluate only the near off-diagonal elements, which seems to provide just as much speedup; especially if used in an iterative deepening fashion, like BWA’s search (that is, first looking for 0-mismatch alignments, then 1-mismatch alignments, etc.). (There are nice algorithm descriptions in Gusfield’s strings book.)

  7. James Lindelien comments:

    It seems simple enough that if an organization is not using the pure code base then its implementation should best be described by a different name, as Dr. Eddy notes. (A nit to be sure but Dr. Eddy should also make clear to the community whether his attorneys have also copyrighted HMMER3 or only HMMER, as USPTO will likely not consider these the same and in fact they name two separate codes. Otherwise some smart lawyer will create new headaches for everyone. Then there is the basic problem that, over time as the code evolves in small ways, “HMMER3” shall not necessarily refer to a single entity.)

    There is another tension at play here. Experience gained during my former role as CEO at TimeLogic and with the DeCypher FPGA accelerator product teaches that virtually all academic codes are not well structured for direct acceleration on FPGAs. This is said not with lack of respect to the academic, but merely an engineering reality driven by the nature of FPGAs. The flow and timing of the original code, vs. that of the accelerated portions in adapted code, often demands a significant restructuring of the code if meaningful acceleration is to be realized. More commonly it is necessary to revert to the pure mathematical description and reimplement from scratch if one wishes a high performance FPGA based variant of the original code. Depending on the details, this may involve tradeoffs which might show up as differences in the output or may not. Marketplace factors (demands by important customers for proprietary advantages to be encoded into the product’s implementation) also put the commercial vendor in a difficult spot with the original academic, and often none of this can even be discussed with the original academic under terms of commercial non-disclosure, which results in tension. It is also in the commercial vendor’s competitive interests to offer “something beyond” the standard code base to the degree these claimed benefits can be confirmed by independent third party researchers on apropos applications of interest broadly within the scientific community. Finally, as Dr. Eddy has highlighted, it is a major challenge for the original author to properly document all his tricks and hacks, although open scientific disclosure demands his best attempt to do so. Nor is it possible in every case for a commercial vendor to immediately adopt a new code base and make it simultaneously available without compromising product testing and reliability.

    Then there is the matter of revision level vs. practical day-to-day considerations. I can appreciate that from the point of view of the academic scientific community, it is vital to be able to reproduce with 100% fidelity a prior research result by, for example, downloading and re-executing an analysis via an older (and generally buggier and less competent) release of the code base (and the database of models, which also keeps changing). But given the frequent number of code base changes over time, the “thing” Dr. Eddy wishes to unambiguously protect is always changing anyway, and this naturally leads to differences in output results even between “officially sanctioned” versions of the code base. From a practical viewpoint, user-institutions are depending on their staff being well trained and smart enough to interpret the output of any of their analytical tools, at whatever revision level of the code implementation happens to be running that day.

    From my perspective there is no easier answer to these tensions than to rename one’s particular embodiment of the code base (or in the case of FPGA design, the underlying mathematical theory that led to the non-accelerated code base), so as to provide disambiguation to the community. Then let an independent accredited research authority compare the variants and formally report on their relative merits or demerits.

    Regards,

    Jim Lindelien

  8. Sean Eddy comments:

    Thanks, Jim. The tension you’re describing is exactly right, and it’s what’s led us (for better or worse) to give up on the idea of commercialized forks of HMMER, and to emphasize a single unified code base as we move forward with H3.

    On your smaller points: All our versions of HMMER are indeed copyrighted and GPL’ed; the umbrella term “HMMER” itself is an HHMI trademark.

    And for what it’s worth, my experience is that “buggier” and “less competent” are not terms that are well correlated with academic vs. commercial software development.

    Regarding Bob’s earlier comments, let me clarify: we distinguish individual published results from the entirety of the HMMER package. For any paper we write, we provide more than enough information to enable anyone to reproduce and extend our work, freely. That often means providing documented snapshots of source code. This is not the sort of “trickery” I was referring to – anything we publish, we are absolutely obligated to explain well, and make freely available. But HMMER is necessarily larger than any of our academic papers, and it necessarily contains “unpublished trickery”: hacks that we are unable if not actively embarrassed to publish, but which definitely affect details of results for users of HMMER. We are actually *not* obligated to properly document these, at least not by scientific publication standards, because we haven’t published them in the literature (that said, my personal craftsmanship standards are a lot higher than what we’re merely obligated to do, and I make every effort to document thoroughly). It’s because of this “unpublished trickery” that I disbelieve that anyone can exactly reproduce the results of HMMER without consulting us, or consulting the source code to such a degree that one would have to be supercareful not to infringe our copyright, if one were developing a “HMMER compatible” commercial clone. I emphasize again that results described in our *publications* are absolutely intended to be fully available, documented, and reproducible, and we put a lot of effort into making sure this is the case.
