Effects of RNA editing and gene processing on phylogenetic reconstruction

Abstract
RNA editing is a widespread phenomenon affecting most mitochondrial and chloroplast genomes, and some nuclear genomes, in which mutations in genomic DNA are "corrected" in the mRNA during transcriptional processing. Most editing in plants and animals corrects T-to-C substitutions at nonsynonymous first or second codon positions, and the overall effect is an mRNA and protein sequence that differs from that predicted by the DNA. It has been suggested that genomic sequences that undergo editing should not be used in phylogenetics. We contend that editing will have little or no effect on DNA-based phylogenetic reconstruction because it is an intrinsic transcriptional process that does not alter the historical information in the DNA sequence. The only effect of editing on protein-coding DNA should be an increase in the rate of T-to-C transitions. Here we test the effects of RNA editing on phylogenetic reconstruction using two data sets with high levels of editing, plant coxII and coxIII. Even with high levels of editing, phylogenies based on DNA and on edited mRNA are virtually identical. The two types of sequences should not be used in the same analysis, however, because sequences of the same form (DNA or edited mRNA) will tend to group together. We also examine the effects of processed paralogs, a term proposed for mRNA sequences that are reverse transcribed and reinserted into the genome as intact gene sequences. Because of RNA editing, processed paralogs are a distinct and underappreciated source of conflict among gene trees. Analyses that include unidentified processed paralogs may yield incorrect phylogenies, and the sequences may evolve at different rates if the gene has been transferred from one genetic compartment (nuclear, mitochondrial, chloroplast) to another. Although RNA editing itself is not a problem in phylogenetic reconstruction, analyses should not combine mRNA and DNA sequences, and processed paralogs should be either excluded or analyzed with caution.