Evolution and Selection in Yeast Promoters: Analyzing the Combined Effect of Diverse Transcription Factor Binding Sites

Abstract
In comparative genomics, one jointly analyzes evolutionarily related species in order to identify conserved and diverged sequences and to infer their function. While such studies have enabled the detection of conserved sequences in large genomes, the evolutionary dynamics of regulatory regions as a whole remain poorly understood. Here we present a probabilistic model for the evolution of promoter regions in yeast, combining the effects of regulatory interactions of many different transcription factors. The model explicitly expresses the selection forces acting on transcription factor binding sites in the context of a dynamic evolutionary process. We develop algorithms to compute the likelihood and to learn, de novo, collections of transcription factor binding motifs and their selection parameters from alignments. Using the new techniques, we examine the evolutionary dynamics in promoters of Saccharomyces species. Analyses of two evolutionary models, one constructed from all known transcription factor binding motifs and one learned automatically from the data, reveal relatively weak selection on most binding sites. Moreover, according to our estimates, strong binding sites constrain only a fraction of the yeast promoter sequence that is under selection. Our study demonstrates how complex evolutionary dynamics in noncoding regions emerge from formalizing the evolutionary consequences of known regulatory mechanisms.

Cells use sophisticated regulation to transform static genomic information into flexible function. We are still far from understanding how such regulation evolves. Short DNA sequences that physically bind transcription factors in promoter regions near target genes play an important role in gene regulation and are directly subject to mutation and selection. In this work, we develop a methodology for studying the evolution of promoter sequences under the effect of multiple regulatory interactions. We present a model that describes the evolutionary process at each genomic locus, taking into account the random flux of mutations occurring at that locus and the effects of gaining or losing transcription factor binding sites. Our model accounts for dependencies (epistasis) between adjacent loci that contribute to the same regulatory interaction: a mutation at one such locus immediately changes the effect of mutations at the others. Using our model, we characterize the evolution of promoters in yeast, showing that many regulatory interactions that were discovered experimentally or computationally are evolutionarily unstable. The dynamic nature of transcriptional interactions may be explained if the regulatory phenotype is achieved through multiple interactions at different levels of specificity, and if only relatively few of these interactions are individually essential.
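
The dependency between adjacent loci described above can be illustrated with a toy sketch. This is not the model from this work: it assumes, purely for illustration, that a binding site is an exact match to a hypothetical consensus string, and that a substitution's rate equals a background mutation rate scaled by a factor for site loss or site gain. The names `count_sites` and `substitution_rate`, the consensus `TGACTC`, and the values of `mu`, `loss_factor`, and `gain_factor` are all assumptions made for the example.

```python
def count_sites(seq, consensus):
    """Count exact occurrences of the consensus in seq (toy site definition)."""
    k = len(consensus)
    return sum(seq[i:i + k] == consensus for i in range(len(seq) - k + 1))


def substitution_rate(seq, pos, new_base, consensus,
                      mu=1.0, loss_factor=0.1, gain_factor=0.5):
    """Toy rate of substituting seq[pos] with new_base.

    The background mutation rate mu is scaled by loss_factor when the
    substitution destroys a site and by gain_factor when it creates one.
    All parameter values are hypothetical.
    """
    if seq[pos] == new_base:
        return 0.0  # not a substitution
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    before = count_sites(seq, consensus)
    after = count_sites(mutated, consensus)
    if after < before:
        return mu * loss_factor
    if after > before:
        return mu * gain_factor
    return mu  # no site gained or lost: neutral


consensus = "TGACTC"       # hypothetical motif
intact = "AATGACTCAA"      # carries one site at positions 2..7
broken = "AATGCCTCAA"      # same sequence after A->C at position 4

# The same T->A change at position 2 is selected against while the site
# is intact, but becomes neutral once a neighbouring position has mutated.
rate_in_intact = substitution_rate(intact, 2, "A", consensus)
rate_in_broken = substitution_rate(broken, 2, "A", consensus)
# Reverting position 4 recreates the site, so it uses the gain factor.
rate_regain = substitution_rate(broken, 4, "A", consensus)
```

In this sketch the rate at position 2 depends on the state of position 4, which is the kind of context dependence between loci of the same binding site that the summary refers to as epistasis. A probabilistic motif model would replace the exact-match test used here.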