Distinct Gene Number-Genome Size Relationships for Eukaryotes and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes
Open Access
- 14 September 2009
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (9), e6978
- https://doi.org/10.1371/journal.pone.0006978
Abstract
The ability to predict gene content is highly desirable for characterizing not-yet-sequenced genomes such as those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log₁₀-transformed protein-coding gene number (Y′) and log₁₀-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(−46.200 + 22.678X′), whereas non-eukaryotes best fit a linear model, Y′ = 0.045 + 0.977X′, both with high significance (p < 0.001, r² > 0.91). Total gene number follows trends in both groups similar to those of their respective protein-coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%), compared with higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) contains 87,688 protein-coding (92,013 total) genes, corresponding to gene-coding percentages of 1.8% and 0.05%, respectively. These estimates likely do not represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes, as evidenced by the high gene copy numbers documented for various dinoflagellate species.
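As an illustration of how the two regression models in the abstract can be evaluated, the following is a minimal Python sketch. It assumes that the eukaryotic model reads Y′ = ln(−46.200 + 22.678X′) with ln as the natural logarithm, that X′ and Y′ are log₁₀ values as defined above, and that back-transforming with 10^Y′ yields the gene number. The function names are hypothetical and are not taken from the article.

```python
import math


def eukaryote_protein_genes(genome_kbp: float) -> float:
    """Protein-coding gene estimate from the eukaryotic model in the abstract.

    Assumes Y' = ln(-46.200 + 22.678 * X'), with X' = log10(genome size in kbp)
    and Y' = log10(gene number); ln is taken to be the natural logarithm.
    """
    x = math.log10(genome_kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        # The logarithm is undefined for very small genomes (below roughly 110 kbp).
        raise ValueError("genome size is outside the domain of the model")
    return 10 ** math.log(inner)


def non_eukaryote_protein_genes(genome_kbp: float) -> float:
    """Protein-coding gene estimate from the non-eukaryotic linear model.

    Assumes Y' = 0.045 + 0.977 * X' with the same log10 transforms.
    """
    x = math.log10(genome_kbp)
    return 10 ** (0.045 + 0.977 * x)


if __name__ == "__main__":
    # Smallest and largest dinoflagellate genome sizes quoted in the abstract.
    for size_kbp in (3e6, 245e6):
        estimate = eukaryote_protein_genes(size_kbp)
        print(f"{size_kbp:.3g} kbp -> about {estimate:,.0f} protein-coding genes")
```

With the rounded coefficients printed in the abstract, this sketch gives roughly 41,000 and 93,000 protein-coding genes for the two genome sizes. These values are in the same range as the reported 38,188 and 87,688 but do not reproduce them exactly. The model is sensitive to small changes in the coefficients, so the reported figures were presumably computed from unrounded values.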