You don’t want to build a new tree from scratch?

‘Let the computer do the work’

Creating a phylogenetic tree is no easy task. It usually involves a complex synthesis of multiple datasets, and finishing one brings great satisfaction, at least until new data come in.

Then, the process typically starts all over again: building a new tree from scratch.

Mark Holder, a professor of statistical phylogenetics at the University of Kansas and one of the investigators of the Open Tree of Life project, maintains that scientists have a real need for digital tools that spare them many labor-intensive procedures.

“In the past, researchers combined information from different trees and then analyzed the data. But they never made good computer systems that allowed for continuous updating. They would not be able to see what an entire tree would look like when they added more data or another individual tree. In that case, they had to start over.”

“Scientists in our field are continuously swamped by new data”

That is the main reason tree synthesis has always been a time-consuming job: every tree is basically handmade. “Older tools just do not work as well. They can be used for visualizing trees, but they are optimized for about a hundred species in a tree. We are focusing on about two million of them,” says Holder. “Because scientists in our field are continuously swamped by new data, we need to develop an interface where we can add new trees to the database and let the computer do the work to provide a cohesive, synthetic view of the updated tree.”

Efficient and meaningful

Stephen Smith, an evolutionary biology professor at the University of Michigan, is working with several other researchers to develop the programs and methods for bringing together the individual trees that are used to construct the entire tree of life. Scientists can then study relationships between species and synthesize all relevant information in the database. “We are currently building the back-end of the Open Tree of Life. We need to create software that allows us to put all our information in a graph network, so that we can easily retrieve the information that researchers are specifically looking for in the database.”

That graph network is constructed in much the same way as popular social media networks such as Facebook and Twitter, where millions of people are linked to each other. Instead of connecting personal accounts to those of Facebook friends or Twitter followers, the Open Tree of Life network links species based on their evolutionary relationships.
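To make the idea of a graph network concrete, here is a minimal sketch of how several source trees could be merged into one graph of parent-to-child links. It is an illustration only, not the Open Tree of Life software: the class `TreeGraph`, its methods, and the study names are all hypothetical, and the real system has to deal with conflicting trees and millions of species.

```python
from collections import defaultdict


class TreeGraph:
    """Toy graph that accumulates parent -> child links from many source trees.

    Hypothetical illustration; not the Open Tree of Life implementation.
    """

    def __init__(self):
        # (parent, child) -> names of the source trees that contain this link
        self.edge_sources = defaultdict(set)
        # parent -> set of direct children
        self.children = defaultdict(set)

    def add_tree(self, name, edges):
        """Merge one source tree, given as (parent, child) pairs, into the graph."""
        for parent, child in edges:
            self.edge_sources[(parent, child)].add(name)
            self.children[parent].add(child)

    def descendants(self, node):
        """Return every taxon reachable below `node`, across all merged trees."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for child in self.children.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def support(self, parent, child):
        """List the source trees that contain the link parent -> child."""
        return sorted(self.edge_sources.get((parent, child), ()))


graph = TreeGraph()
graph.add_tree("study_A", [("Mammalia", "Primates"),
                           ("Primates", "Homo"),
                           ("Primates", "Pan")])
# Adding a second tree updates the existing graph; nothing is rebuilt from scratch.
graph.add_tree("study_B", [("Mammalia", "Primates"),
                           ("Mammalia", "Carnivora"),
                           ("Carnivora", "Felis")])

print(sorted(graph.descendants("Mammalia")))
print(graph.support("Mammalia", "Primates"))
```

Because each link remembers which studies support it, a new tree only adds or annotates links, and queries such as "everything below Mammalia" run against the merged graph rather than against any single handmade tree.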

“There will be a full-running system before the summer”

“With such a large amount of data it is important for social media companies to learn how those networks can retrieve information swiftly,” explains Smith. “We are basically doing the same thing, trying to connect the dots in a meaningful way and as efficiently as possible for the researchers who will be using the software.”

Smith and his colleagues have set August of next year as the deadline for presenting the first draft of the Open Tree of Life. But the team needs to have the software running much earlier to be able to test it. “We need a working back-end ready in a couple of months, because we want a full-running system before the summer. That is completely possible and we are working hard to get much done quickly so we can hit that later deadline. If we have a relatively well-working back-end around New Year’s, then we can do a lot of bug testing to get it all worked out before we are supposed to be ready.”

Photo credit: Dirk Schumacher
