Estimating abundances of retroviral insertion sites from DNA fragment length data
Open Access
- 11 January 2012
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 28 (6), 755-762
- https://doi.org/10.1093/bioinformatics/bts004
Abstract
Motivation: The relative abundance of retroviral insertions in a host genome is important in understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. It could be estimated from a sample of cells if the host genomic sites of retroviral insertions could be counted directly. When host genomic DNA is randomly broken by sonication and then amplified, amplicons of varying lengths are produced. The number of unique amplicon lengths observed for an insertion site tends to increase with its abundance, providing a basis for estimating relative abundance. However, as abundance increases, amplicons of the same length arise by chance, leading to a non-linear relation between the number of unique lengths and relative abundance. The difficulty in calibrating this relation is compounded by sample-specific variation in the relative frequencies of clones of each length.

Results: A likelihood function is proposed for the discrete lengths observed in each of a collection of insertion sites, and it is maximized with a hybrid expectation–maximization algorithm. Patient data illustrate the method, and simulations show that relative abundance can be estimated with little bias, but that variation in highly abundant sites can be large. In replicated patient samples, variation exceeds what the model implies, requiring adjustment as in Efron (2004) or the use of jackknife standard errors. Consequently, it is advantageous to collect replicate samples to strengthen inferences about relative abundance.

Availability: An R package implements the algorithm described here. It is available at http://soniclength.r-forge.r-project.org/

Contact: ccberry@ucsd.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
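The saturation described under Motivation, and the likelihood idea under Results, can be made concrete with a small sketch. It rests on an assumption the abstract does not spell out: the number of sonicated fragments of length x from a site with abundance θ is taken to be Poisson(θ·φ_x), where the length frequencies φ_x are known, positive, and sum to 1. Under that assumption a length is observed at a site with probability 1 - exp(-θ·φ_x), so the expected number of unique lengths is Σ_x (1 - exp(-θ·φ_x)). This is close to θ when θ is small and levels off at the number of possible lengths. All function names here are hypothetical and are not the soniclength package's API. The abundance estimate is a single-site maximum-likelihood fit by bisection with φ treated as given, not the paper's hybrid EM algorithm, which works over a whole collection of sites. The jackknife helper is the generic delete-one formula applied to replicate samples.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical sketch, not the soniclength package's API. ASSUMPTION: the number
# of fragments of length x from a site with abundance theta is
# Poisson(theta * phi[x]), with known, strictly positive length frequencies phi.


def _check_phi(phi: Sequence[float]) -> None:
    if any(p <= 0 for p in phi):
        raise ValueError("length frequencies phi must be strictly positive")


def expected_unique_lengths(theta: float, phi: Sequence[float]) -> float:
    """Expected number of distinct lengths seen at one site: sum_x P(count_x >= 1)."""
    _check_phi(phi)
    return sum(-math.expm1(-theta * p) for p in phi)  # each term is 1 - exp(-theta*p)


def log_likelihood(theta: float, seen: Sequence[bool], phi: Sequence[float]) -> float:
    """Log-likelihood of which lengths were (True) or were not (False) observed."""
    _check_phi(phi)
    total = 0.0
    for s, p in zip(seen, phi):
        rate = theta * p
        # seen: log(1 - exp(-rate)); unseen: log(exp(-rate)) = -rate
        total += math.log(-math.expm1(-rate)) if s else -rate
    return total


def _score(theta: float, seen: Sequence[bool], phi: Sequence[float]) -> float:
    """Derivative of log_likelihood in theta; strictly decreasing for theta > 0."""
    total = 0.0
    for s, p in zip(seen, phi):
        rate = theta * p
        # seen: p*exp(-rate)/(1 - exp(-rate)), written so it cannot overflow
        total += p * math.exp(-rate) / -math.expm1(-rate) if s else -p
    return total


def mle_theta(seen: Sequence[bool], phi: Sequence[float], rel_tol: float = 1e-12) -> float:
    """Maximum-likelihood abundance for one site, found by bisection on the score."""
    _check_phi(phi)
    if len(seen) != len(phi):
        raise ValueError("seen and phi must have the same length")
    if not any(seen):
        return 0.0       # nothing observed: likelihood is maximized at theta = 0
    if all(seen):
        return math.inf  # every possible length observed: no finite maximum
    lo, hi = 0.0, 1.0
    while _score(hi, seen, phi) > 0:  # widen the bracket until the score turns negative
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if _score(mid, seen, phi) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jackknife_se(replicates: Sequence, statistic: Callable[[List], float]) -> float:
    """Delete-one jackknife standard error of statistic() over replicate samples."""
    n = len(replicates)
    if n < 2:
        raise ValueError("need at least two replicates")
    loo = [statistic(list(replicates[:i]) + list(replicates[i + 1:])) for i in range(n)]
    mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in loo))


if __name__ == "__main__":
    phi = [1 / 100] * 100  # hypothetical: 100 equally likely fragment lengths
    for theta in (10, 50, 100, 300):
        # expected unique lengths track theta at first, then flatten toward 100
        print(theta, round(expected_unique_lengths(theta, phi), 1))
    seen = [True] * 63 + [False] * 37  # hypothetical site: 63 of 100 lengths observed
    print(round(mle_theta(seen, phi), 2))
```

With k of K equally likely lengths observed, setting the score to zero gives the closed form θ̂ = K·ln(K/(K - k)), which offers a quick check on mle_theta; with unequal φ no closed form is needed because the score is monotone and bisection suffices.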