FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix


Open Access

17 April 2009

journal article
Published by Oxford University Press (OUP) in Molecular Biology and Evolution

Vol. 26 (7), 1641-1650
https://doi.org/10.1093/molbev/msp077

Abstract

Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N²) space and O(N²L) time, but FastTree requires just O(NLa + N√N) memory and O(N√N log(N) La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes–Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
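Why a profile can stand in for many rows of a distance matrix can be seen in a small case. The sketch below is a simplified illustration under stated assumptions, not FastTree's implementation: it assumes a gap-free nucleotide alignment (a = 4) and uncorrected p-distances (fraction of differing sites), and the helper names `profile`, `profile_distance`, `merge`, `p_distance`, and `mean_pairwise` are hypothetical. A profile is an L × a table of per-site character frequencies. Under these assumptions, the distance between two profiles equals the average pairwise distance between the sequences they summarize, so one profile per node replaces a row of the matrix. The size-weighted `merge` is one simple way to build a parent profile; FastTree's own weighting, gap handling, and distance corrections are not reproduced here.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: a = 4 nucleotide characters; gaps are not handled


def profile(seqs):
    """Per-site character frequencies for aligned sequences (an L x a table)."""
    length = len(seqs[0])
    prof = []
    for i in range(length):
        counts = [0.0] * len(ALPHABET)
        for s in seqs:
            counts[ALPHABET.index(s[i])] += 1.0
        prof.append([c / len(seqs) for c in counts])
    return prof


def profile_distance(p, q):
    """Average over sites of the chance that characters drawn from p and q differ."""
    total = 0.0
    for p_site, q_site in zip(p, q):
        total += 1.0 - sum(x * y for x, y in zip(p_site, q_site))
    return total / len(p)


def merge(p, n_p, q, n_q):
    """Hypothetical parent profile: size-weighted average of two child profiles."""
    return [
        [(n_p * x + n_q * y) / (n_p + n_q) for x, y in zip(p_site, q_site)]
        for p_site, q_site in zip(p, q)
    ]


def p_distance(s, t):
    """Uncorrected distance: fraction of aligned sites that differ."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


def mean_pairwise(group_a, group_b):
    """Brute-force average over all cross pairs, as a distance matrix would need."""
    pairs = list(product(group_a, group_b))
    return sum(p_distance(s, t) for s, t in pairs) / len(pairs)


A = ["ACGT", "ACGA"]
B = ["TCGT", "ACCT"]
print(profile_distance(profile(A), profile(B)))  # 0.375
print(mean_pairwise(A, B))                       # 0.375
```

With size weighting, `merge(profile(A), 2, profile(B), 2)` equals `profile(A + B)`, so a parent node's profile can be built from its children without revisiting the leaves. The identity breaks once a nonlinear correction such as Jukes–Cantor is applied inside the average, which is one reason this sketch sticks to uncorrected distances.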
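The 50 GB figure for storing pairwise distances among the 158,022 16S sequences can be checked with back-of-the-envelope arithmetic. This is a minimal sketch assuming the matrix keeps only the N(N − 1)/2 distinct pairs (one triangle of the symmetric matrix) as 4-byte single-precision floats. Both storage choices are assumptions, not details given in the abstract.

```python
N = 158_022              # distinct 16S rRNA sequences in the abstract's example
BYTES_PER_DISTANCE = 4   # assumption: one single-precision float per pair

pairs = N * (N - 1) // 2  # one triangle of the symmetric N x N matrix
matrix_gb = pairs * BYTES_PER_DISTANCE / 1e9

print(f"{pairs:,} pairs -> {matrix_gb:.1f} GB")  # 12,485,397,231 pairs -> 49.9 GB
```

Because the pair count grows as N², doubling the number of sequences would roughly quadruple this figure, whereas FastTree's memory bound grows much more slowly with N.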
