Connecting millions of pieces

Building the entire tree of life is like completing a jigsaw puzzle with more than two million pieces. And to make it even harder, the picture showing what the solved puzzle should look like is missing.

No one knows precisely how all the pieces are related.

This uncertainty is clearly demonstrated by disagreements among evolutionary biologists about how certain species and branches are linked. Over the years they have produced a large variety of trees for specific groups of species, and many of those trees contradict each other. For example, one researcher maintains that species A is the closest living relative of species B, while another thinks that species C is actually most closely related to B.

Spotting conflicting data

This has huge consequences for the builders of the Open Tree of Life database. They not only need software that stores tens of thousands of individual trees and combines them into one tree with all known species; the programmers also need to let scientists upload many more trees, even when those trees overlap and contain contradictory data. In addition, the system needs to recognize such conflicts and promptly warn researchers about them.
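To make the idea of "recognizing a conflict" concrete, here is a minimal sketch in Python. It is not the Open Tree of Life software; the nested-tuple tree format and the function names (`leaf_set`, `clades`, `find_conflicts`) are assumptions made for illustration. It relies on a standard rule for rooted trees: two groups of species are compatible if they share no species or if one group sits entirely inside the other, and they conflict otherwise.

```python
from itertools import product


def leaf_set(tree):
    """Return all species names in a nested-tuple tree (hypothetical format)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Return every group of species that the tree places on one branch."""
    if isinstance(tree, str):
        return set()
    result = {leaf_set(tree)}
    for child in tree:
        result |= clades(child)
    return result


def find_conflicts(tree_a, tree_b):
    """Return pairs of groups, one from each tree, that cannot both be true."""
    # Only species that appear in both trees can be compared.
    shared = leaf_set(tree_a) & leaf_set(tree_b)
    conflicts = set()
    for x, y in product(clades(tree_a), clades(tree_b)):
        x, y = x & shared, y & shared
        # Overlapping groups conflict unless one is nested inside the other.
        if x & y and not (x <= y or y <= x):
            conflicts.add((x, y))
    return conflicts


# The A/B/C disagreement described earlier in the article:
study_1 = (("A", "B"), "C")   # A is the closest relative of B
study_2 = ("A", ("B", "C"))   # C is the closest relative of B

for x, y in find_conflicts(study_1, study_2):
    print(sorted(x), "conflicts with", sorted(y))
```

A real system would work with far larger trees and a shared naming scheme for species, so this sketch only shows the core comparison.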

“It will give us a lot of headaches”

But the project team does not see those challenges as entirely negative. Sure, they lead to ample debate during many online and offline brainstorming sessions. And there is definitely a lot of trial and error involved in revising the computer code. (Or as Mark Holder, one of the investigators, simply puts it: “it gives us a lot of headaches.”) On a more positive note, the Open Tree of Life team will accomplish something that no one has ever done on such a massive scale: providing an overview of all the conflicts and contradictory evidence for branches where there is no consensus on the evolutionary relationships between species.

“There are databases with a long list of individual trees, but they are not dynamic sources. You have to manually discover that there is a conflict between two sets of data. That takes a lot of time, especially when you have many more datasets to compare,” explains Holder, who is a professor of statistical phylogenetics at the University of Kansas. “Instead, we want to create a system where it is easy to identify conflicts so that the community can spend most of their time on finding solutions instead.”

Reanalyzing the data

Stephen Smith, an evolutionary biologist at the University of Michigan, is spearheading the programming effort to build the software tools that should allow researchers to assemble and analyze the tree of life. He is very excited about the additional feature that flags all those contradictions. “There are a lot of benefits for doing that, as it makes the data more meaningful for researchers. For example, it could be that fifty trees in the database support Hypothesis A, which states that there is a certain relationship between two species, but that two or three other studies support Hypothesis B, which states that there is no such relationship. For phylogenists that is meaningful to know.”
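Smith's example of counting how many trees back each hypothesis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: each study is represented simply as a set of species groups, the function name `classify` is hypothetical, and the fifty-versus-three data below is made up to mirror the numbers in the quote.

```python
from collections import Counter


def classify(study_clades, hypothesis):
    """Label one study as supporting, conflicting with, or silent on a grouping."""
    taxa = frozenset().union(*study_clades)
    # A study that lacks some of the species cannot speak to the hypothesis.
    if not hypothesis <= taxa:
        return "uninformative"
    if hypothesis in study_clades:
        return "supports"
    for clade in study_clades:
        # Overlapping but not nested groups contradict each other.
        if clade & hypothesis and not (clade <= hypothesis or hypothesis <= clade):
            return "conflicts"
    return "uninformative"


# Hypothesis A: species A and B are each other's closest relatives.
hypothesis_a = frozenset({"A", "B"})

# Made-up studies, each given as the set of groups its tree contains.
supporting = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
conflicting = {frozenset({"B", "C"}), frozenset({"A", "B", "C"})}
studies = [supporting] * 50 + [conflicting] * 3

tally = Counter(classify(study, hypothesis_a) for study in studies)
print(tally)
```

A tally like this does not settle which hypothesis is right, but it shows at a glance where the published evidence is lopsided and where it is genuinely split.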

“We want a system that makes it easy to identify conflicts”

Hence, the system will provide a comprehensive view of what has been contributed to date about specific areas of the tree of life. Those overviews can then serve as a starting point for further research, because scientists can easily access the original data for all conflicting trees, which are part of the database that Smith and his colleagues are creating. Scholars can reanalyze the datasets to resolve a conflict if they wish. Moreover, they can add new hypotheses when they submit new sequencing data, and compare those findings with earlier assessments of how certain species may be related.

Eventually, the software should allow researchers to connect more and more pieces of the jigsaw puzzle that is the Open Tree of Life. “We provide a basic draft tree with certain synthetic estimates and then it is up to experts of specific species to examine what will be considered the best fit for all those parts of the tree,” says Holder. “This way, the entire community keeps updating the individual trees to give us a more complete picture of what the entire tree looks like.”
