Is it a plant? Or is it a monkey?
It should not be hard to tell furry night monkeys apart from the bright yellow flowers of golden peas. But they have something peculiar in common that leads to some confusion once in a while: their name. Both genera are officially known as Aotus.
There are about two million known species on the planet, so it should not come as a surprise that scientists have accidentally given certain species, or groups of species, the same name. For instance, Proboscidea is the order that contains the elephants, but it is also the name of the genus of devil’s claws. Other examples include Myrmecia pyriformis (insect and green alga), Ficus elegans (mollusc and plant), Ormosia nobilis (insect and plant), and Trigonidium grande (orchid and katydid).
“That has historically been a problem,” says Laura Katz, a professor of biological sciences at Smith College, who is leading an effort to create a single list with all species for the Open Tree of Life database. “Someone in Europe discovers a new insect species and gives it a name, while someone else in the United States wants to label a group of bacteria and gives it a similar name. That is often hard to avoid, especially in the days when the Internet was not around yet.”
Those overlapping names generally do not cause any confusion in phylogenetic research, because night monkey experts would not mistake those animals for plants. However, they become an issue when you create a database of all known species, because the computer systems must be programmed in such a way that they point users to the proper records when they look for Anthrax (the genus of bombyliid flies) and not anthrax (the bacterial disease), or for any other genus whose name is shared.
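One way to picture the problem is a lookup that never assumes a name is unique. The short Python sketch below is only an illustration, not the Open Tree of Life software: the function names `build_index` and `lookup` and the `group` labels are assumptions made for this example, and the records are limited to the homonyms mentioned in this article.

```python
from collections import defaultdict

# Illustrative records only: (name, group, description).
# The "group" labels are an assumption made for this sketch.
RECORDS = [
    ("Aotus", "animal", "night monkeys"),
    ("Aotus", "plant", "golden peas"),
    ("Proboscidea", "animal", "order of elephants"),
    ("Proboscidea", "plant", "genus of devil's claws"),
]


def build_index(records):
    """Group records by lower-cased name so homonyms sit side by side."""
    index = defaultdict(list)
    for name, group, description in records:
        index[name.lower()].append(
            {"name": name, "group": group, "description": description}
        )
    return index


def lookup(index, name, group=None):
    """Return every record with this name, optionally narrowed by group."""
    matches = index.get(name.lower(), [])
    if group is not None:
        matches = [m for m in matches if m["group"] == group]
    return matches


INDEX = build_index(RECORDS)

# A bare name is ambiguous, so the caller gets both candidates back.
for record in lookup(INDEX, "Aotus"):
    print(record["name"], "->", record["description"])

# Adding context narrows the search to a single record.
print(lookup(INDEX, "Aotus", group="plant"))
```

The point of the design is that an ambiguous name returns all candidates rather than silently picking one, and the extra context decides which record the user meant.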
The software that is currently being developed should enable users to find information about one or more species instantly and, equally important, to leave the millions of other species out of the search results. Stephen Smith, an assistant professor of evolutionary biology at the University of Michigan, is one of the technology designers for the Open Tree of Life project. Multiple species with the same name, and species that have evolved partly through lateral gene transfer, are among the many difficulties in developing an efficient search engine for the tree of life.
“We really need a functioning taxonomy. It may sound trivial, but you actually want all of the individual trees consistent with the terms that are being used. It needs to be clear what are considered bacteria and what are not, especially when you are dealing with about two million species. We really try to avoid entering data with multiple meanings, because that eventually leads to lots of problems.”
Making a list…
The goal is to produce a tree structure that can eventually encompass all life forms. This includes the life forms characterized thoroughly with both morphological and genetic data, species for which there are only genetic data, and species from the early years of systematics for which the only information is a physical description or a drawing. Complete molecular data exist for fewer than a quarter of a million of the known species. About 90 percent of known species have not been sampled with modern techniques at all.
“The value of our effort is to put a list together that allows for phylogenetic synthesis of all the kinds of data that are available, whether it is a description from 1880 about a microorganism that was studied with a crappy microscope or high-tech molecular sequencing performed today. Right now, there is no resource to get all these data in one computer-readable format,” explains Katz.
Besides the homonyms, there are other big challenges in creating such a comprehensive list. For instance, scientists around the world do not use one uniform naming system for newly discovered species. Naming codes are different for plants, animals, microbes, and other forms of life, and the codes themselves have been applied in different ways, which adds to the confusion.
Naming all organisms in a consistent way that can be understood by all users depends on a number of factors: what family the organism comes from, what naming system is used, and what data are available about the organism, to name just a few. One research team might store their data under one scientific name, while another might use a completely different one.
“We really have to deal with all the chaos. Otherwise, we could get the synthesis wrong at the end. Our aim is to allow for a plurality of approaches. We want to try and put all species in context. So that means that we need to disambiguate names,” she maintains.
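The other half of the problem, one organism stored under several names, can be sketched in the same spirit. The Python example below is a minimal illustration under stated assumptions: the taxon names are made-up placeholders rather than real species, and the helpers `accepted_name` and `merge_datasets` are hypothetical, not part of any Open Tree of Life tool.

```python
# Hypothetical synonym table: each name on the left is treated as another
# label for the accepted name on the right. These are placeholder names,
# not real taxa.
SYNONYMS = {
    "Examplus albus": "Examplus alba",
    "Specimenia alba": "Examplus alba",
}


def accepted_name(name, synonyms=SYNONYMS):
    """Follow synonym links until an accepted name is reached."""
    seen = set()
    # The "seen" set guards against accidental loops in the table.
    while name in synonyms and name not in seen:
        seen.add(name)
        name = synonyms[name]
    return name


def merge_datasets(datasets, synonyms=SYNONYMS):
    """Combine data from several teams under one accepted name."""
    merged = {}
    for dataset in datasets:
        for name, value in dataset.items():
            merged.setdefault(accepted_name(name, synonyms), []).append(value)
    return merged


# Two teams stored their results under different names for the same organism.
team_a = {"Examplus albus": "tree from team A"}
team_b = {"Specimenia alba": "tree from team B"}
print(merge_datasets([team_a, team_b]))
```

Both entries end up under a single accepted name, which is the kind of disambiguation that has to happen before data from different sources can be synthesized.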
… and checking it twice (or more often)
The Open Tree of Life team is making considerable progress generating a complete list, according to Katz. “We have captured roughly 1.9 million species and we are adding another 200,000 species, right now. More will follow soon. It is now the only list available with this many species. That is the good news.”
That does not mean that she is satisfied with the quality of the list yet, as many names still need to be standardized. Much additional work lies ahead to create order in the colossal taxonomic maze created by the different naming customs and practices that have evolved over hundreds of years. “Actually, right now, the list is bad. It is awful,” says Katz, laughing, poking fun at the long way she and her colleagues still have to go to create their envisioned tree of life. “We are talking about millions of species, not just a few hundred. The scale of this project is massive. So it is only a start and we have a whole lot to do before we present a draft.”
That means much work not only for the eleven Open Tree of Life investigators, but also for the many scientists with an interest in taxonomy and phylogeny. Participation by researchers from all over the world is critical to the success of the project.
Currently, scientists can submit publications containing their favorite phylogenetic trees on the Open Tree of Life page on Mendeley, and they will be able to help with taxonomic issues as well when the first draft of the database is released next year. “We are trying really hard to continue with some refinement in the upcoming months, but then other researchers can help us clean up the data in their individual areas of expertise. We are creating a mechanism for the community to make the tree better, enabling anyone to contribute. That is our overall objective.”
(Rosemary Keane contributed to this article)
Aotus ericoides (Australia) by “Melburnian”
Aotus lemurinus zonalis (Panama) by “dsasso”