GC Composition of the Human Genome: In Search of Isochores

Abstract
The isochore theory, proposed nearly three decades ago, depicts the mammalian genome as a mosaic of long, fairly homogeneous genomic regions that are characterized by their guanine and cytosine (GC) content. The human genome, for instance, was claimed to consist of five distinct isochore families: L1, L2, H1, H2, and H3, with GC contents of <38%, 38%-42%, 42%-47%, 47%-52%, and >52%, respectively. In this paper, we address the validity of the isochore theory through a rigorous sequence-based analysis of the human genome. Toward this end, we adopt a set of six attributes that are generally claimed to characterize isochores and statistically test their veracity against the available draft sequence of the complete human genome. Using the selection criteria of this study (distinctiveness, homogeneity, and a minimal length of 300 kb), we identify 1,857 genomic segments that warrant the label "isochore." These putative isochores are nonuniformly scattered throughout the genome and cover about 41% of it. We found that a four-family model of putative isochores is the most parsimonious multi-Gaussian model that can be fitted to the empirical data. These families, however, are GC poor, with mean GC contents of 35%, 38%, 41%, and 48%, and do not resemble the five isochore families in the literature. Moreover, because of large overlaps among the families, it is impossible to classify genomic segments into isochore families reliably on the basis of compositional properties alone. These findings undermine the utility of the isochore theory and indicate that the theory may have reached the limits of its usefulness as a description of genomic compositional structures.
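
To make the quantities in the abstract concrete, the following is a minimal Python sketch of how the GC content of a sequence can be computed in fixed-size windows and mapped onto the five classical families. It is an illustration only, not the segmentation or statistical procedure used in this study. The function names, the use of non-overlapping windows, and the handling of values that fall exactly on a boundary are assumptions; the boundaries themselves (38%, 42%, 47%, 52%) are the ones commonly quoted for the classical five-family scheme.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Returns None when the sequence holds no unambiguous bases
    (for example, a run of N in a draft assembly).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def windowed_gc(seq, window):
    """GC content of consecutive non-overlapping windows.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


# Upper bounds of the classical families, as fractions. How a value lying
# exactly on a boundary is assigned is an assumption made here.
FAMILY_UPPER_BOUNDS = [(0.38, "L1"), (0.42, "L2"), (0.47, "H1"), (0.52, "H2")]


def classical_family(gc):
    """Map a GC fraction onto the classical five-family scheme."""
    for upper, name in FAMILY_UPPER_BOUNDS:
        if gc < upper:
            return name
    return "H3"
```

For example, `windowed_gc(chromosome, 300_000)` would give one GC value per 300 kb window of a hypothetical `chromosome` string, and `classical_family` assigns each value to a family by threshold alone. Such a hard threshold ignores the overlap among family distributions, which is the difficulty the abstract raises.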