Archive for September, 2013

The Crandall lab explores solutions to incomplete phylogenies

The Crandall Lab is in charge of uploading and curating animal studies for the AVAToL-Open Tree project.  Chris Owen, postdoctoral researcher, has been leading the animal portion of the project since March 2013.  To date, the Crandall Lab has contributed over 400 studies and has sent requests to the authors of over 100 studies asking them to contribute their phylogenies to the Open Tree project.

As with the Soltis Lab group, the Crandall Lab's success rate for obtaining published phylogenies directly from authors has been rather low.  As a result, many animal lineages are currently represented in the Open Tree as taxonomic graphs.  One example of a poorly sampled group is the decapods (crabs, crayfish, lobsters, prawns, and shrimp).  Dr. Keith Crandall has studied decapods for most of his career, and his phylogenies generate a well-sampled backbone, but each higher taxon is represented by few species.  Many researchers want to use the tree for some downstream analysis that benefits from sampling all species; therefore, at this stage of the project one must ask, “How can I obtain a phylogeny of all species for my favorite group, if the only thing available in Open Tree is a well-resolved backbone, while lower taxonomic ranks are represented primarily by unresolved taxonomic graphs?”

Recently, a paper was published in the journal Nature that may present a workaround for people who wish to obtain a mostly bifurcating comprehensive phylogeny when only a bifurcating backbone is available in Open Tree.  The study by Jetz et al. (2012) used a phylogeny of birds to explore changes in speciation and extinction rates through time, while also mapping all bird diversity, to gain insight into bird evolution.  To explore these characteristics of bird evolution, the authors first needed a phylogeny of birds that included all species.  However, no such phylogeny had ever been published, and the most comprehensive bird phylogenies available at the time of the study did not contain all species for each crown clade.  Their solution began by assigning each avian genus to a crown clade represented in the backbone phylogenies.  Next, sequence data for a set of loci for each species in a crown clade were downloaded from public databases, and a phylogeny for each crown clade was estimated using Bayesian inference.  Since the crown clades of the backbone tree contain taxa that are also in the newly estimated crown phylogenies, the newly estimated crown phylogenies were sub-sampled with the backbone phylogenies to generate a pseudo-posterior distribution of complete avian phylogenies, which was used to represent the avian phylogeny with all species in downstream analyses.

As the organismal labs continue to track down studies and wait for requested published phylogenies, a method such as this may be a temporary solution for obtaining mostly bifurcating phylogenies for lineages not well represented by source trees. Variations on this theme could also be used. For example, one could estimate a single tree for each crown clade and, using Open Tree software, merge each of those trees with an Open Tree phylogeny that has a well-resolved backbone but unresolved recent clades, ultimately creating a synthetic tree for one's favorite group.
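The grafting idea described above can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions, not the procedure of Jetz et al. or any Open Tree software: trees are nested tuples, each backbone tip is assumed to be a placeholder named after a crown clade, and all function names, clade samples, and species labels (`graft`, `sample_complete_trees`, `Astacidea_sp1`, and so on) are hypothetical.

```python
import random

# A tree is a nested tuple; a leaf is a string (a species or a placeholder clade name).

def graft(backbone, clade_trees):
    """Return a copy of `backbone` with each placeholder leaf replaced by its clade tree."""
    if isinstance(backbone, str):
        # Leaves without a matching clade tree are kept unchanged.
        return clade_trees.get(backbone, backbone)
    return tuple(graft(child, clade_trees) for child in backbone)

def to_newick(tree):
    """Serialize a nested-tuple tree to a Newick string (topology only, no ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

def sample_complete_trees(backbone_samples, crown_samples, n, seed=0):
    """Build n complete trees by pairing a random backbone with random crown-clade trees."""
    rng = random.Random(seed)
    complete = []
    for _ in range(n):
        backbone = rng.choice(backbone_samples)
        picks = {name: rng.choice(trees) for name, trees in crown_samples.items()}
        complete.append(graft(backbone, picks))
    return complete

# Hypothetical toy data: one backbone and a few alternative trees per crown clade.
backbone_samples = [(("Astacidea", "Achelata"), "Brachyura")]
crown_samples = {
    "Astacidea": [
        (("Astacidea_sp1", "Astacidea_sp2"), "Astacidea_sp3"),
        (("Astacidea_sp1", "Astacidea_sp3"), "Astacidea_sp2"),
    ],
    "Achelata": [("Achelata_sp1", "Achelata_sp2")],
    "Brachyura": [("Brachyura_sp1", "Brachyura_sp2")],
}

for tree in sample_complete_trees(backbone_samples, crown_samples, n=3):
    print(to_newick(tree) + ";")
```

Drawing many such combinations gives a set of complete trees that carries forward the uncertainty within each crown clade; taking a single tree per clade instead corresponds to the one-tree-per-clade variation mentioned above.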

These are a couple of potential methods for using the Open Tree to generate comprehensive phylogenies for poorly resolved lineages represented only by taxonomy, and we look forward to the new ideas other researchers will offer once the tree becomes public.

Keith Crandall is a professor and director at the George Washington University Institute of Computational Biology.

Chris Owen is a postdoctoral researcher for the AVAToL grant at George Washington University.

Recommending CC0 for GBIF data

GBIF (Global Biodiversity Information Facility) recently issued a request for comment on its data licensing policy. While Open Tree of Life does not currently use specimen data, we do use the GBIF classification to help resolve names and as part of the Open Tree backbone. Jonathan Rees, Karen Cranston, Todd Vision and Hilmar Lapp wrote a response recommending a CC0 waiver for all GBIF data. Here is our summary, and a link to the full response on Figshare.


As a data aggregator, the goal of GBIF should be to find policies that benefit both its data providers and data reusers. Clearly, a GBIF that has few or no data will have little value, but so will a GBIF full of data encumbered with restrictions to an extent that stifles reuse.  Our response follows from the proposition that promoting data reuse should be a shared interest of all the parties: data providers, data users, and GBIF itself. We feel the consultation document missed the opportunity to recognize this shared interest, and that furthering the goal of data reuse should in fact be a primary yardstick by which different licensing options are measured.

Tracking the reuse of data is a critically important goal, as it provides a means of reward to data providers, allows scrutiny of derived results, and enables discovery of related research. Initiatives such as DataCite have made considerable progress in recent years by addressing the sociotechnical obstacles to tracking data reuse. By contrast, the consultation, in our view, puts undue weight on legal requirements for attribution. Legal instruments such as licenses were not designed for this purpose, are unsuitable for it, and offer little if any benefit. Moreover, in most of the world, there is little to no formally recognized intellectual property protection for data, and it is on such protection that licenses rest.

In short, our recommendations are (1) that all data in GBIF be released under Creative Commons Zero (CC0), which is a public domain dedication that waives copyright rather than asserting it; (2) that GBIF set clear expectations, in the form of community norms, for how the data it serves is to be referenced when reused; and (3) that GBIF work with partner organizations to promote standards and technologies that enable the effective tracking of data reuse.

We note that our analysis is based on our understanding of the law; we are not legal professionals and this is not legal advice.

Full response

Response to GBIF request for consultation on data licenses. Karen Cranston, Todd Vision, Hilmar Lapp, Jonathan Rees. figshare.

The Soltis lab fills the gaps in green plant phylogeny for the Open Tree of Life

Phylogenetic tree summarizing relationships among major lineages of green plants (Viridiplantae)

In the Soltis lab at the University of Florida, Bryan Drew and Jiabin Deng have spent much of the past year collecting trees and alignments of green plants (Viridiplantae) as part of an effort to produce a synthetic tree that represents all of the described organisms on Earth. As part of the tree-gathering process, they have searched public database archives and contacted corresponding authors directly to request data. Although these methods were not as successful as had been hoped, they recovered trees from over 1000 publications involving green plants.

As might be expected, some areas of the green plant tree are better resolved than others. For example, within gymnosperms and flowering plants we have author-submitted trees that support the monophyly of most major lineages, but for other major lineages of green plants, such as green algae and bryophytes, sampling is not as complete and those parts of the tree are not as well resolved. Fortunately, for green algae at least, help is on the way in the form of the NSF-funded “Assembling the Green Algae Tree of Life” project. Although results from this project will not be incorporated into the upcoming Open Tree of Life “Big Bang Tree”, within a few years the green algae portion of the Open Tree will undoubtedly benefit greatly from the inclusion of trees from the Green Algae Tree of Life project. Other parts of the green plant tree are shaping up nicely, and the Soltis lab is sending out some last-minute requests to authors in an attempt to shore up regions of the tree that are presently underrepresented.

Here we provide a basic summary of what we know about green plant phylogeny, stressing that there is much we still do not know about relationships in this large clade of perhaps 500,000 species. We know from the fossil record that many green plant taxa have gone extinct; these extinctions contribute to “long branches” in the Tree of Life and can make it very difficult to determine relationships between older lineages. In the green plant tree, two main clades have been recovered, the Chlorophyta and the Streptophyta. The chlorophytes contain most of what is traditionally known as green algae, while the streptophytes contain the remaining green algae as well as land plants (Embryophyta). One of the many insights provided by molecular systematics during the past twenty years is that “green algae” as long recognized are not actually a natural group (i.e., they are not monophyletic), and that some traditionally classified “green algae” are actually more closely related to land plants. However, the closest “green algal” relative of land plants remains unclear; some studies suggest Charales whereas others indicate Zygnematales or Coleochaetales. The land plants (embryophytes) include bryophytes (mosses, hornworts, and liverworts) and vascular plants (tracheophytes). There is still some question as to whether the bryophytes are a natural group or comprise separate evolutionary lineages. The vascular plants comprise lycophytes (clubmosses and quillworts), monilophytes (e.g., ferns and horsetails), gymnosperms (cycads, Ginkgo, gnetophytes, and conifers), and angiosperms (flowering plants).

Though the relationships of some large clades are uncertain, these uncertainties will be shown in the Big Bang tree, given that we possess many of the trees that highlight these different clade placements. In other areas of the green plant tree we are sorely lacking data, and the Soltis lab (in close collaboration with Stephen Smith’s lab at the University of Michigan) is still working hard to fill in the tens of thousands of holes in the tree that remain. This is a beautiful part of the Open Tree of Life: as with the organisms that it represents, the tree is ever growing!

Doug Soltis is a distinguished professor at the University of Florida.