Evolutionary and Expression Signatures of Pseudogenes in Arabidopsis and Rice

Abstract
Pseudogenes (Ψ) are nonfunctional genomic sequences resembling functional genes. Knowledge of Ψs can improve genome annotation and our understanding of genome evolution. However, there has been relatively little systematic study of Ψs in plants. In this study, we characterized the evolution and expression patterns of Ψs in Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa). In contrast to animal Ψs, many plant Ψs have experienced much stronger purifying selection. In addition, plant Ψs experiencing stronger selective constraints tend to be derived from relatively ancient duplicates, suggesting that they were functional for a relatively long time but became Ψs recently. Interestingly, the regions 5′ to the first stops in the Ψs have experienced stronger selective constraints than the 3′ regions, suggesting that the 5′ regions were functional for a longer period of time after the premature stops appeared. We found that few Ψs have expression evidence, and their expression levels tend to be lower than those of annotated genes. Furthermore, Ψs with expressed sequence tags tend to be derived from relatively recent duplication events, indicating that Ψ expression may be due to insufficient time for complete degeneration of regulatory signals. Finally, larger protein domain families have significantly more Ψs in general. However, while families involved in environmental stress responses have a significant excess of Ψs, transcription factors and receptor-like kinases have lower than expected numbers of Ψs, consistent with their elevated retention rate in plant genomes. Our findings illustrate peculiar properties of plant Ψs, providing additional insight into the evolution of duplicate genes and benefiting future genome annotation.