With the soft release of v1.0 of the Open Tree of Life (see Karen Cranston’s Evolution talk for details), we also have methods for accessing the data:
* a not-very-pretty but functional page to download the entire 2.5 million tip tree as Newick
* API access to subtrees and source trees as well as taxon name services
* a GitHub repository of all input trees, available to clone
A few folks have started to think about ways to interact with the very large Newick file, specifically extracting subtrees. Yan Wong posted a Perl solution a few weeks ago:
Michael Elliot has a C++ package called Gulo, which seems to be very efficient (see comments on the post):
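For anyone curious what subtree extraction from a Newick string involves, here is a minimal Python sketch. It is not the approach used by either tool above, just one simple way to do it. It assumes the node you want has a unique, unquoted label with no parentheses or commas inside it, and that the whole file fits in memory as a single string. The function name `extract_subtree` and the toy tree are made up for illustration.

```python
def extract_subtree(newick: str, label: str) -> str:
    """Return the Newick subtree rooted at the node named `label`.

    Assumes labels are unique and unquoted (an assumption of this sketch).
    """
    start = 0
    while True:
        pos = newick.find(label, start)
        if pos == -1:
            raise ValueError(f"label not found: {label}")
        end = pos + len(label)
        before = newick[pos - 1] if pos > 0 else "("
        after = newick[end] if end < len(newick) else ";"
        # Accept only whole-label matches, not pieces of longer names.
        if before in "()," and after in "),:;":
            break
        start = pos + 1

    # A label that does not follow ')' names a tip, so it is its own subtree.
    if before != ")":
        return newick[pos:end] + ";"

    # Internal node: walk backwards from its ')' to the matching '('.
    depth = 0
    i = pos - 1
    while i >= 0:
        ch = newick[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                return newick[i:end] + ";"
        i -= 1
    raise ValueError("unbalanced parentheses before label")


# Toy tree, not real Open Tree data.
tree = "((A,B)AB,(C,(D,E)DE)CDE)root;"
print(extract_subtree(tree, "DE"))   # (D,E)DE;
print(extract_subtree(tree, "CDE"))  # (C,(D,E)DE)CDE;
```

Because this only scans the string rather than building a full tree in memory, the same idea should scale reasonably to a very large file, though labels that are quoted or contain special characters would need extra handling.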
Thrilled to see people working with the data! I note that, despite having APIs to return a subtree or a pruned subtree, downloading all of the data and working with it locally is still an easy and flexible option for many users. We will continue to make our datasets available, and that download page should have more options and tree metrics soon!