Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity

Abstract
Centromeric retrotransposons (CR) are located almost exclusively at the centromeres of plant chromosomes. Analysis of the emerging Zea mays inbred B73 genome sequence revealed two novel subfamilies of CR elements of maize (CRM), bringing the total number of known CRM subfamilies to four. Orthologs of each of these CRM subfamilies were discovered in the rice lineage, and the orthologous relationships were demonstrated with extensive phylogenetic analyses. The much higher number of CRs in maize than in Oryza sativa is due primarily to the recent expansion of the CRM1 subfamily in maize. At least one incomplete copy of a CRM1 homolog was found in O. sativa ssp. indica and O. officinalis, but no member of this subfamily could be detected in the finished O. sativa ssp. japonica genome, implying loss of this prolific subfamily in that subspecies. CRM2 and CRM3, as well as the corresponding rice subfamilies, have been recently active but are present in low numbers. CRM3 is a full-length element related to the non-autonomous CentA, which was the first CRM described. The oldest subfamily (CRM4), as well as its rice counterpart, appears to contain only inactive members that are not located in currently active centromeres. The abundance of active CR elements is correlated with chromosome size in the three plant genomes for which high-quality genomic sequence is available, and the emerging picture of CR elements is one in which different subfamilies are active at different evolutionary times. We propose a model by which CR elements might influence chromosome and genome size.