Gene deletions causing human genetic disease: mechanisms of mutagenesis and the role of the local DNA sequence environment

Abstract
Reports describing short (< 20 bp) gene deletions causing human genetic disease were collated in order to study underlying causative mechanisms. Deletion breakpoint junction regions were found to be non-random at both the nucleotide and dinucleotide sequence levels, an observation consistent with an endogenous sequence-directed mechanism of mutagenesis. Direct repeats of between 2 bp and 8 bp were found in the immediate vicinity of all but one of the 60 deletions analysed. Direct repeats are a feature of a number of recombination-, replication- or repair-based models of deletion mutagenesis, and the possible contribution of each to the spectrum of mutations examined was assessed. The influence of parameters such as repeat length and length of DNA between repeats was studied in relation to the frequency, location and extent of these deletions. Findings were broadly consistent with a slipped mispairing model, but the predicted deletion of one whole repeat copy was found only rarely. A modified version of the slipped mispairing hypothesis was therefore proposed and was shown to possess considerable explanatory value for ∼25% of deletions examined. Whereas the frequency of inverted repeats in the vicinity of gene deletions was not significantly elevated, these elements may nevertheless promote instability by facilitating the formation of secondary structure intermediates. A significant excess of symmetrical sequence elements was, however, found at sites of single base deletions. A new model to explain the involvement of symmetric elements in frameshift mutagenesis was devised, which successfully accounted for a majority of the single base deletions examined. In general, the loss of one or a few base pairs of DNA was found to be more compatible with a replication-based model of mutagenesis than with a recombination or repair hypothesis. Seven hitherto unrecognized hotspots for deletion were noted in five genes (AT3, F8, HBA, HBB and HPRT).
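As an illustration of the kind of direct-repeat search described above, the following minimal Python sketch lists non-overlapping direct repeats of 2 bp to 8 bp in a short sequence window and builds the product expected if one repeat copy plus the intervening DNA were lost. It is not the procedure used in the study: the function names `find_direct_repeats` and `slipped_mispairing_product`, the example sequence, and the simplification of pairing each later copy only with the first occurrence of a motif are all assumptions made for illustration, as is the reading of the classical slipped mispairing prediction as loss of one repeat copy together with the sequence between the repeats.

```python
def find_direct_repeats(seq, min_len=2, max_len=8):
    """Return (first_start, second_start, motif) for non-overlapping direct repeats.

    Simplification (assumption): each later copy is paired only with the
    first occurrence of the same motif in the window.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_len, max_len + 1):
        first_seen = {}
        for i in range(len(seq) - k + 1):
            motif = seq[i:i + k]
            if motif in first_seen:
                j = first_seen[motif]
                if i - j >= k:  # skip copies that overlap the first one
                    hits.append((j, i, motif))
            else:
                first_seen[motif] = i
    return hits


def slipped_mispairing_product(seq, first_start, second_start, k):
    """Hypothetical deletion product: keep the first repeat copy, drop the
    intervening sequence and the second copy."""
    return seq[:first_start + k] + seq[second_start + k:]


if __name__ == "__main__":
    window = "AAGCTTCCGCTTAA"  # made-up sequence with GCTT repeated
    for first, second, motif in find_direct_repeats(window):
        print(first, second, motif)
    print(slipped_mispairing_product(window, 2, 8, 4))
```

In practice such a search would be restricted to a window around the reported breakpoint, and the spacing between the two copies (`second_start - first_start - k`) corresponds to the length of DNA between repeats mentioned above.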
Considerable sequence homology was found between these different hotspot sites, and a consensus sequence, TG(A/G)(A/G)(G/T)(A/C), was drawn up. Sequences fitting this consensus (i) were noted in the immediate vicinity of 41% of the other (sporadic) gene deletions, (ii) were found frequently at sites of spontaneous deletion in the hamster APRT gene, (iii) were found to be associated with many larger human gene deletions/translocations, (iv) act as arrest sites for human polymerase α during DNA replication and (v) have been shown by in vitro studies of human polymerase α to be especially prone to frameshift mutation. It is proposed that dissociation of polymerase α at arrest sites may, by providing a stable single-stranded substrate, lead to deletion of a DNA sequence either by slipped mispairing via a number of different secondary structure intermediates, or by strand-switching or base misincorporation. Human gene deletions thus appear to be caused by multiple mechanisms whose relative importance is probably governed by local primary and secondary DNA structure. Our ability to predict precisely the location and extent of a gene deletion is, however, hampered both by this complexity and by the possibility that these mechanisms may often act in combination.
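A search for sequences fitting the deletion hotspot consensus can be sketched as a simple pattern scan. The Python example below assumes the consensus reads TG(A/G)(A/G)(G/T)(A/C), i.e. a 6 bp motif with two fixed and four two-fold degenerate positions, and it scans both strands, which is an assumption of this sketch rather than something stated above. The names `CONSENSUS`, `reverse_complement` and `find_consensus_sites` and the example sequences are hypothetical.

```python
import re

# Assumed reading of the consensus: TG(A/G)(A/G)(G/T)(A/C).
# The lookahead lets overlapping matches be reported.
CONSENSUS = re.compile(r"(?=(TG[AG][AG][GT][AC]))")
MOTIF_LEN = 6
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_consensus_sites(seq):
    """Return sorted (start, strand, match) tuples.

    `start` is always the leftmost forward-strand coordinate (0-based);
    `match` is written 5'->3' on the strand where it was found.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in CONSENSUS.finditer(seq)]
    for m in CONSENSUS.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset to a forward-strand coordinate.
        hits.append((n - m.start() - MOTIF_LEN, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_consensus_sites("CCTGAAGACC"))  # made-up example sequence
```

Because the motif is short and degenerate, matches are expected by chance in any sufficiently long sequence, so counts from such a scan are only meaningful when compared against an appropriate background frequency.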