Comprehensive comparison of the interaction of the E2 master regulator with its cognate target DNA sites in 73 human papillomavirus types by sequence statistics

Abstract
Mucosal human papillomaviruses (HPVs) are etiological agents of oral, anal and genital cancer. Properties of high- and low-risk HPV types cannot be reduced to discrete molecular traits. The E2 protein regulates viral replication and transcription through a finely tuned interaction with four sites in the upstream regulatory region of the genome. A computational study of the E2–DNA interaction in all 73 types within the alpha papillomavirus genus, including all known mucosal types, indicates that E2 proteins have similar DNA discrimination properties. Differences in E2–DNA interaction among HPV types lie mostly in the target DNA sequence, as opposed to the amino acid sequence of the conserved DNA-binding alpha helix of E2. Sequence logos of natural and in vitro selected sites show an asymmetric pattern of conservation arising from indirect readout, and reveal evolutionary pressure for a putative methylation site. Based on DNA sequences only, we could predict differences in binding energies with a standard deviation of 0.64 kcal/mol. These energies cluster into six discrete affinity hierarchies and reveal a fifth E2-binding site in the genome of six HPV types. Finally, certain distances between sites, affinity hierarchies and their eventual changes upon methylation are statistically associated with high-risk types.
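
As a rough illustration of how sequence statistics can be related to binding energies, the following minimal sketch builds per-position base frequencies from a set of aligned sites, computes the information content that a sequence logo displays, and scores a site with a generic additive (Berg and von Hippel style) energy model. It is not necessarily the procedure used in this study. The four aligned sites are hypothetical toy sequences rather than data from any HPV genome, and the function names, the pseudocount and the value of RT are assumptions made for the example.

```python
import math
from collections import Counter

RT = 0.593          # kcal/mol at about 298 K (assumed temperature)
BASES = "ACGT"


def position_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs.append({b: (counts.get(b, 0) + pseudocount) / total for b in BASES})
    return freqs


def information_content(freqs):
    """Information per position in bits (the height of a logo column)."""
    return [2.0 + sum(f * math.log2(f) for f in col.values()) for col in freqs]


def relative_energy(site, freqs):
    """Additive energy relative to the most frequent base at each position.

    Each position contributes -RT * ln(f_base / f_max), so a site made of
    the most frequent bases scores 0 and any other site scores higher.
    """
    return sum(
        -RT * math.log(col[b] / max(col.values()))
        for b, col in zip(site, freqs)
    )


# Hypothetical aligned sites (toy data, for illustration only).
toy_sites = [
    "ACCGAAAACGGT",
    "ACCGTTTTCGGT",
    "ACCGAATTCGGT",
    "ACCGATATCGGT",
]

freqs = position_frequencies(toy_sites)
bits = information_content(freqs)
e_consensus = relative_energy("ACCGAATTCGGT", freqs)
e_mutant = relative_energy("TCCGAATTCGGT", freqs)
```

In this toy model, positions that are invariant across the aligned sites carry more information and penalize substitutions more heavily than variable positions, which is the qualitative link between logo height and predicted energy differences.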