Prospects for Building the Tree of Life from Large Sequence Databases

12 November 2004

journal article
Published by American Association for the Advancement of Science (AAAS) in Science

Vol. 306 (5699), 1172-1174
https://doi.org/10.1126/science.1102036

Abstract

We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two “supermatrices” suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.

Cited by 211 articles