Comparison of multiple vertebrate genomes reveals the birth and evolution of human exons

Abstract
Orthologous gene structures in eight vertebrate species were compared on a genomic scale to detect the birth and maturation of new internal exons during the course of evolution. We found that 40% of new human exons are alternatively spliced, and most of these are cassette exons (exons that are either included or skipped in their entirety) with low inclusion rates. The proportion of alternatively spliced exons decreases steadily with exon age, even as splicing efficiency increases. Remarkably, the great majority of new cassette exons are composed of highly repeated sequences, especially Alu. Many new cassette exons are 5' untranslated exons; the proportion that code for protein increases steadily with age. New protein-coding exons evolve at a high rate, as evidenced by their initially high synonymous and nonsynonymous substitution rates (K(s) and K(a)) and by their high SNP density compared with older exons. This dynamic picture suggests that de novo recruitment rather than shuffling is the major route by which exons are added to genes, and that species-specific repeats could play a significant role in recent evolution.