Open Data in the Open Tree of Life
Making valuable research data available to others in the scientific community is at the heart of open science, an idea central to the Open Tree of Life project. Through collaboration and the sharing of information, the goal of the Open Tree of Life is to take the discoveries about the phylogenetics of all life and make them easily accessible to everyone.
With 1.9 million species described, and with thousands more being discovered and named each year, there is no shortage of new research in phylogenetics, the study of the relationships between species, genera, and families. What there is a shortage of, however, is digital data provided with these findings – data that can be used in projects like the Open Tree of Life.
Why is this? Despite decades of funding towards this type of research, a huge amount of our knowledge isn’t available in ways that are reusable. This lack of data availability is due to several factors: the data used to construct the Tree of Life have not always been provided when scholarly articles have been published, or they have been stored in a way that isn’t easily accessed, manipulated, or maintained.
This by no means implies that the researchers and scientists making these findings were hiding their data – researchers now regularly deposit genetic sequences and other datasets into online databases like GenBank or TreeBase. However, even these sophisticated resources do not integrate the data into a comprehensive picture of the Tree of Life, and many discoveries about relationships are represented only as figures in scientific journals, which cannot easily be computed on or combined.
There was also a question of the target audience, and of how people were trying to use the data. Typically, the audience for these species discoveries was taxonomists – accomplished specialists in their own right – but with very different uses for the information than the Open Tree of Life. There was not yet a large enough demand for data to be stored in open digital forms, so publishing findings without reusable data became the discipline’s norm (as in many other disciplines).
The goal of the Open Tree of Life is to take these phylogenetic discoveries, both the newly described species and the data on their phylogenetic relationships, and put them together into a draft tree. The draft tree will provide a framework to illustrate the science in such a way that the relationships can be clearly accessed and understood by a variety of users, as well as easily contributed to and edited by the scientists with species-specific expertise.
Working on four exemplary clades for the draft tree are the research teams of Doug Soltis (green plants), Keith Crandall (arthropods), David Hibbett (Fungi), and Laura Katz (Amoebozoa). Through their detailed knowledge of the respective clades, the four teams will provide the basic backbone of how genera and species will be represented in the tree, and how those relationships are explained. Work in these groups will provide a model for how the Tree of Life can be developed by the broad community of phylogenetic scientists, with expertise in all groups of organisms.
As the draft tree takes shape and more research information is shared because of open science, the Open Tree will become more comprehensive. Karen Cranston, the lead principal investigator of the project, described how important open science is to the Open Tree of Life. “We’re focusing more on changing behaviors – on getting scientists and researchers to share their findings. If we don’t publish our data, we are in danger of causing other people to redo our research, and in this era of very tight funding, that’s very inefficient.”
“We’re a large-scale downstream user of phylogenetic and biodiversity data, and some of the information and resources that we need just aren’t available yet.” As more scientific data is made available to the Open Tree of Life, hopefully that will change. Ideally, the Open Tree of Life will become a major place where the information is synthesized and shared – and that is a truly ‘open’ idea.