Open data

Why Do We Need Big Trees, Anyway?

An explicit goal of the Open Tree of Life is to create a single phylogenetic tree that encompasses all living (and some extinct) biodiversity on Earth. A question some may have, especially non-scientists, is why we need a tree like that and what we would do with it. You can’t even see it all at once, right? The answer, of course, is that with bigger and more resolved trees we can answer evolutionary questions on scales not previously possible.

Currently, postdocs from the labs of Doug Soltis (Univ. of Florida) and Stephen Smith (Univ. of Michigan) are collaborating on several projects within the plant world that leverage the power of big trees. Cody Hinchliff, a postdoc in the Smith lab, recently presented some of these findings during a standing-room-only presentation at the Botanical Society of America conference in Boise, Idaho. He used a tree with almost complete generic-level sampling to unravel the evolution and diversification of epiphytes across vascular plants. Perhaps most surprisingly, Hinchliff found that most epiphyte lineages are relatively young, suggesting that either the widespread success that epiphytes currently exhibit is a recent phenomenon, or that epiphytic lineages are relatively short-lived and evolve opportunistically in response to large-scale climate fluctuations. This and other associated findings are novel and exciting discoveries, and are examples of the insights that can be gleaned by analyzing character data across a massive data set.

Other collaborative “big tree” projects between the Soltis and Smith labs address the evolution of the aquatic habit within land plants and the evolution of floral characters in the order Lamiales. These studies involve Cody Hinchliff, Stephen Smith, Doug Soltis, and Bryan Drew from the University of Nebraska at Kearney (formerly a postdoc with Doug Soltis), along with undergraduates from all three institutions. The aquatic evolution project is looking at how the re-colonization of aquatic habitats by plants is linked to lineage diversification and whether an aquatic habit is associated with other character or habitat traits. The Lamiales study is investigating which suites of floral characters may be responsible for the extraordinary evolutionary success of the lineage, which at 23,000 species comprises about 1/12th of all flowering plants.

The fact that studies of this magnitude are not only possible, but ongoing, is a testament to the utility of big trees. Because these trees are nearly complete in terms of genera, we can account for virtually all diversity across these clades. Sparse lineage sampling, and hence unaccounted-for diversity, has previously been a hindrance when analyzing evolutionary trends that span the tree of life, but the time is approaching (or might be here already!) when the size of the phylogenies will not be the limiting factor in studying broad-scale evolutionary questions. This exciting development leaves researchers more time to examine and ponder truly interesting questions that could not be addressed previously. This is the power that big trees give us, and this is one of the reasons we need big trees.

Chronogram showing epiphytic evolution within vascular plants. Epiphytic lineages are shown in orange, and likely branches of epiphytic origin are in red. Root of tree is ~485 million years old.

Doug Soltis is a distinguished professor at the University of Florida.
Bryan Drew was previously a post-doctoral researcher in the Soltis lab and is currently an assistant professor at the University of Nebraska-Kearney.


A push for fungal phylogenies in the Open Tree of Life

The summer of 2014 was a busy one for the mycology group in the Open Tree of Life. Postdoctoral Fellow Romina Gazis gave presentations on the Open Tree of Life at the Annual Meeting of the Mycological Society of America (June 8-12, East Lansing, Michigan) and the International Mycological Congress (Aug. 3-8, Bangkok, Thailand). You can download the IMC presentation here.

Meanwhile, back in Worcester, we continued to compile published phylogenetic trees for incorporation into the Open Tree database. Our goal is to create a synthetic tree that represents, as closely as possible, our current understanding of the broad outlines of fungal phylogenetic relationships, based on molecular studies and taxonomy in Index Fungorum and other sources. We plan to use the tree as the centerpiece of a revision of “higher level” fungal taxonomy, updating a study that we published with seventy coauthors way back in 2007 [1].

Dr. Romina Gazis is a postdoc at Clark University. Dr. Gazis specializes in systematics of endophytes, including symbionts of rubber trees (Hevea brasiliensis) and the newly-described class Xylonomycetes, and also works on phylogenies for the Open Tree of Life project.

To this end, we reviewed the recent and not-so-recent fungal biology literature, emphasizing studies that made a major contribution to understanding of higher-level relationships. We thus identified 314 important studies that are a priority for inclusion in Open Tree of Life. The list of “critical” higher-level studies can be viewed here. Mycologists reading this blog post may wish to check our list of references, and let us know if we have missed anything! Please realize that at this point, we are prioritizing studies that resolve major clades, or that have particularly strong sampling of large groups.

Jiaqi Mei is an undergraduate research assistant at the Katz Lab at Smith College. Jiaqi has been working on gathering information on missing phylogenies for the Open Tree of Life project. Photo: Katz Lab

Having identified the critical higher-level analyses, our next job was to search for the phylogenies in TreeBase and upload them to Open Tree of Life via PhyloGrafter. We were assisted in this time-consuming work by Jiaqi Mei, an undergraduate from Laura Katz’s lab at Smith College who joined us for the summer. Of the 314 “higher level” studies, 119 (38%) had trees available in TreeBase or other sources. In contrast, Drew et al. (2013) [2] found that only about 17% of published phylogenetic studies from all groups have available phylogenies. This suggests that mycologists who look at “big picture” phylogenetic relationships are particularly conscientious about data deposition! Nonetheless, there were still many missing phylogenies, so Jiaqi and Romina initiated an e-mail campaign, reaching out to authors of the 195 critical higher-level studies for which we had no trees. We are very grateful to have received responses from almost 50 authors so far. If you are among those who replied to our plea for data, we want to take this opportunity to say Thank You! You should have received a note from us. If not, something may have been lost in transit, so please write again!

Our immediate goal is to compile phylogenies that address higher-level relationships, but we are not neglecting fungal studies at low taxonomic levels. In fact, one of Jiaqi’s major tasks was to update our literature review of all fungal phylogenies, reviewing publications since the 2013 study of Drew et al. [2], which included studies published up to 2012. Overall, we have identified 2314 fungal phylogenetic studies published since 2000 in 17 journals, of which 640 (28%) have associated treefiles.

It is hard to believe that the Open Tree of Life Project is already in its third year. Our major goal by the end of this academic year is to produce a synthetic phylogenetic tree that significantly updates the 2007 “AFTOL Classification” [1] of Fungi, with direct connections to taxonomy and diverse phylogenetic studies. With the continued cooperation of the mycological community we are optimistic that we will reach this goal.

[1] Hibbett, D. S., M. Binder, J. F. Bischoff, M. Blackwell, P. F. Cannon, O. E. Eriksson, S. Huhndorf, T. James, P. M. Kirk, R. Lücking, T. Lumbsch, F. Lutzoni, P. B. Matheny, D. J. McLaughlin, M. J. Powell, S. Redhead, C. L. Schoch, J. W. Spatafora, J. A. Stalpers, R. Vilgalys, M. C. Aime, A. Aptroot, R. Bauer, D. Begerow, G. L. Benny, L. A. Castlebury, P. W. Crous, Y.-C. Dai, W. Gams, D. M. Geiser, G. W. Griffith, C. Gueidan, D. L. Hawksworth, G. Hestmark, K. Hosaka, R. A. Humber, K. Hyde, J. E. Ironside, U. Kõljalg, C. P. Kurtzman, K.-H. Larsson, R. Lichtwardt, J. Longcore, J. Miądlikowska, A. Miller, J.-M. Moncalvo, S. Mozley-Standridge, F. Oberwinkler, E. Parmasto, V. Reeb, J. D. Rogers, C. Roux, L. Ryvarden, J. P. Sampaio, A. Schüßler, J. Sugiyama, R. G. Thorn, L. Tibell, W. A. Untereiner, C. Walker, Z. Wang, A. Weir, M. Weiß, M. M. White, K. Winka, Y.-J. Yao, N. Zhang. 2007. A higher-level phylogenetic classification of the Fungi. Mycological Research 111: 509-547.

[2] Drew, B. T., R. Gazis, P. Cabezas, K. S. Swithers, J. Deng, R. Rodriguez, L. A. Katz, K. A. Crandall, D. S. Hibbett, D. E. Soltis. 2013. Lost branches on the tree of life. PLOS Biology 11: e1001636.

David Hibbett is a professor of biology and PI of the Hibbett lab at Clark University.

Romina Gazis is a postdoc at Clark University. 


Free webinar: Putting all species in a graph database

Biology + Technology = OTOL

On Thursday, one of the developers of the Open Tree of Life will demonstrate, in a free webinar, how graph databases are used to construct a tree of life. The lecture is organized by Neo Technology, the maker of Neo4j, an open-source graph database that is used for OTOL.

Stephen Smith, an ecology and evolutionary biology professor at the University of Michigan, will explain how Neo4j and other digital technologies are assisting in constructing the tree of life. Starting at 10:00 PDT (19:00 CEST), he will also discuss other aspects of the interface of biology with next-generation technologies.

“Our project is building the tools with which scientists in the community can continually improve the tree of life as we gather new information. Neo4j allows us to not only store trees in their native graph form, but also allows us to map trees to the same structure, the graph. So in fact, we are facilitating the construction of the graph of life,” says Smith.
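To make the idea in the quote a little more concrete, here is a minimal sketch of what "mapping trees to the same graph" could look like. It is not the Open Tree of Life code and does not use the Neo4j API; the taxon names, tree names, and the `add_tree` helper are all hypothetical, and a plain Python dictionary stands in for the graph database. Each source tree contributes parent-to-child edges, and every edge remembers which trees support it.

```python
from collections import defaultdict


def add_tree(graph, tree_name, edges):
    """Record each parent -> child edge of one source tree in the shared graph.

    `graph` maps an edge (parent, child) to the set of tree names supporting it.
    This helper is hypothetical and only illustrates the idea.
    """
    for parent, child in edges:
        graph[(parent, child)].add(tree_name)


# Two made-up source trees that agree on most relationships
# but disagree on where taxon "C" attaches.
tree_a = [("root", "AB"), ("AB", "A"), ("AB", "B"), ("root", "C")]
tree_b = [("root", "AB"), ("AB", "A"), ("AB", "B"), ("AB", "C")]

graph = defaultdict(set)
add_tree(graph, "tree_a", tree_a)
add_tree(graph, "tree_b", tree_b)

# Edges supported by more than one source tree.
shared = sorted(edge for edge, sources in graph.items() if len(sources) > 1)

# Every parent proposed for "C"; more than one parent signals a conflict.
parents_of_c = sorted(parent for (parent, child) in graph if child == "C")

print("Shared edges:", shared)
print("Proposed parents of C:", parents_of_c)
```

Because "C" ends up with two candidate parents, the combined structure is a graph rather than a tree, which is one way to read the phrase "graph of life". A real system would then need a rule for choosing among conflicting edges when producing a single synthetic tree; that step is left out here.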

Neo4j approached the Open Tree of Life team to present a webinar because it is a project that utilizes the Neo4j graph database to represent the interconnectedness of biological data. The company considers the project a great example of how a graph database can better model the natural world.

The online lecture is intended for a broad audience including beginner computer programmers, advanced hackers, data scientists, natural scientists, and anyone interested in the intersection of science and technology, especially data modeling. Over 150 people have already registered online.

Update: The video from this webinar is available on Vimeo.

Karen Cranston talks about open licensing and Dryad

Making scientific data available


In case you haven’t seen the video yet, Open Tree of Life investigator Karen Cranston talked about sharing research data in open access data repositories during the Creative Commons 10-year celebration in Raleigh, North Carolina, last December. She mainly focused on the use of CC0 for Dryad, which is a curated general-purpose repository that makes the data underlying scientific publications discoverable and freely reusable. Cranston also mentioned that data availability leads to more citations, which are highly valued in the academic community. The presentation slides are available as well.