Wanted: All your favorite trees

With eleven investigators, the Open Tree of Life project is already a large-scale research endeavor. But that does not mean that they can add all 1.9 million known species to a database by themselves. In fact, they are looking for help.

A lot of help.

The main goal of the project is to merge all existing phylogenetic trees into one overarching tree of life. In the past few months, the researchers have been working on software applications that make it possible to store all known species and, more importantly, to specify how they are all linked to each other in evolutionary terms.

The group has already started some internal testing of the system. “Some of our labs are adding trees to give feedback to the members who are creating the computer systems, so we can see what works well and what needs some adjustment. We will work on it until the end of next month and then publicize a data curation sprint, so that other people can submit trees and provide immediate feedback about the software,” says Karen Cranston, the principal investigator of the project.

She is pleased with the progress made during the first few months. “We already have a taxonomic backbone in place, but we continue with some minor updating along the way. We want to make sure that we know exactly what happens with the system when we add phylogenetic trees and other data before we present this to the entire community.”


So how can other researchers become involved with the project right now?

The first step is to add citations to the project’s Mendeley page, together with information about the availability of digital data for the phylogenetic trees. Scientists can create an account on Mendeley, join the Open Tree of Life group on that website, and use the web importer to create a personal library. Then it is possible to simply drag a paper from the personal library into the Open Tree of Life group library. (Alternatively, select the checkboxes next to the papers in your library and choose ‘OpenTree’ from the ‘Add selected documents to…’ pulldown list.)

There are already 150 papers in the group’s Mendeley database with information about phylogenetic trees of flagellated fungi (Chytridiomycota), rhynchosaurs, Excavata, and mimetic butterflies, among many other groups and branches of life. Cranston wants to hear from scientists in the phylogenetics community about which trees they think are most important to include in the beginning phase of the project. “There are tens of thousands of trees out there, but we cannot include them all immediately. So that is why we need to know what people consider the most important trees in their specific fields. And then we, together with the community, can add all the other ones over time as well.”

Digital data

The database only publishes phylogenies for which the data (tree and alignment files) are available. Scientists can use tags to track data availability when they upload papers to the Open Tree of Life library. For instance, the tag treebase indicates that the data are located in TreeBASE, and the tag datadryad indicates that the data are available in Dryad.
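The tagging convention described above can be pictured as a simple lookup. The following is only a minimal sketch in Python, not the project's actual software: the function names, the dictionary, and the sample papers are hypothetical, and only the two tags named above (treebase and datadryad) are assumed.

```python
# Hypothetical mapping from data-availability tags to repositories.
# Only these two tags are mentioned in the post; others may exist.
TAG_TO_REPOSITORY = {
    "treebase": "TreeBASE",
    "datadryad": "Dryad",
}


def data_locations(tags):
    """Return the repositories indicated by a paper's tags, in tag order."""
    return [
        TAG_TO_REPOSITORY[tag.lower()]
        for tag in tags
        if tag.lower() in TAG_TO_REPOSITORY
    ]


def has_available_data(tags):
    """A paper qualifies only if at least one tag points to stored data."""
    return bool(data_locations(tags))


# Made-up sample entries, for illustration only.
papers = [
    {"title": "Example paper A", "tags": ["treebase", "fungi"]},
    {"title": "Example paper B", "tags": ["butterflies"]},
]

# Keep only the papers whose tags indicate that digital data are available.
publishable = [p["title"] for p in papers if has_available_data(p["tags"])]
```

In this sketch, a paper without a recognized data tag is simply filtered out, which mirrors the rule that only phylogenies with available data are published.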

Because the project’s database consists of digital records, not every tree that has been published in the past decades can easily get a place in the Open Tree of Life. A lot of the generated data has never been stored or made accessible to the public.

“Some of the data are really lost. It could be that someone’s hard drive crashed or some other accident. But in most cases the data were not stored in the first place,” Cranston explains. “Researchers were not thinking about that back in the day. It was standard to publish only a picture of the tree. The audience studied the illustrations to come up with their own follow-up research. Access to the digital data did not matter to them.”

Nowadays, however, a new generation of researchers depends on those large datasets to do their computational analyses of phylogenetic trees and taxonomies. “Only with such data can they do their estimations,” she says. “Therefore, the Open Tree of Life project is not only intended to build the database, but also to raise awareness within the scientific community about the benefits of sharing data of all newly developed phylogenetic trees. And, of course, we hope that researchers will add their data to the system we are creating right now, so that the tree of life eventually keeps growing.”
