Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize

Abstract
Thousands of species will be sequenced in the next few years; however, understanding how their genomes work without an unlimited budget requires both molecular and novel evolutionary approaches. We developed a sensitive sequence alignment pipeline to identify conserved noncoding sequences (CNSs) in the Andropogoneae tribe (multiple crop species descended from a common ancestor ~18 million years ago). The Andropogoneae share similar physiology while being tremendously genomically diverse, harboring a broad range of ploidy levels, structural variation, and transposons. These contribute to the potential of Andropogoneae as a powerful system for studying CNSs and are factors we leverage to understand the function of maize CNSs. We found that 86% of CNSs consisted of annotated features, including introns, UTRs, putative cis-regulatory elements, chromatin loop anchors, noncoding RNA genes, and several transposable element superfamilies. CNSs were enriched in active regions of DNA replication in the early S phase of the mitotic cell cycle and showed different DNA methylation ratios compared to the genome-wide background. More than half of putative cis-regulatory sequences (identified via other methods) overlapped with CNSs detected in this study. Variants in CNSs were associated with gene expression levels, and CNS absence contributed to loss of gene expression. Furthermore, the evolution of CNSs was associated with the functional diversification of duplicated genes in the context of maize subgenomes. Our results provide a quantitative understanding of the molecular processes governing the evolution of CNSs in maize.
Funding Information
  • Germplasm Repository Information Network
  • Extreme Science and Engineering Discovery Environment
  • National Science Foundation (#ACI-1548562)
  • U.S. Department of Agriculture–Agricultural Research Service (USDA-ARS) and National Science Foundation (#1822330)
