Genome measures used for quality control are dependent on gene function and ancestry
Open Access
- 8 October 2014
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 31 (3), 318-323
- https://doi.org/10.1093/bioinformatics/btu668
Abstract
Motivation: The transition/transversion (Ti/Tv) ratio and heterozygous/nonreference-homozygous (het/nonref-hom) ratio have been commonly computed in genetic studies as a quality control (QC) measurement. Additionally, these two ratios are helpful in our understanding of the patterns of DNA sequence evolution.
Results: To thoroughly understand these two genomic measures, we performed a study using 1000 Genomes Project (1000G) released genotype data (N = 1092). An additional two datasets (N = 581 and N = 6) were used to validate our findings from the 1000G dataset. We compared the two ratios among continental ancestry, genome regions and gene functionality. We found that the Ti/Tv ratio can be used as a quality indicator for single nucleotide polymorphisms inferred from high-throughput sequencing data. The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry. The het/nonref-hom ratio varies greatly by ancestry, but not by genome regions and functionality. Furthermore, extreme guanine + cytosine content (either high or low) is negatively associated with the Ti/Tv ratio magnitude. Thus, when performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region. Failure to take these considerations into account at the QC stage will bias any following analysis.
Contact: yan.guo@vanderbilt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
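As an illustration of the two measures named in the abstract, the following is a minimal Python sketch of how they could be computed from genotype calls. It is not the authors' pipeline. The function names `ti_tv_ratio` and `het_nonref_hom_ratio` are hypothetical, the input is assumed to be biallelic SNPs given as (ref, alt) base pairs and diploid VCF-style genotype strings such as `0/1` or `1|1`, and any genotype with two different alleles is counted as heterozygous.

```python
# Illustrative sketch only; names and input conventions are assumptions.

BASES = set("ACGT")

# Transitions: purine <-> purine (A <-> G) or pyrimidine <-> pyrimidine (C <-> T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        # Skip anything that is not a single-base substitution.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


def het_nonref_hom_ratio(genotypes):
    """Return heterozygous / nonreference-homozygous counts for diploid GT strings."""
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        # Skip missing or non-diploid calls.
        if len(alleles) != 2 or "." in alleles:
            continue
        a, b = alleles
        if a != b:
            het += 1          # two different alleles (assumption: includes e.g. 1/2)
        elif a != "0":
            hom_alt += 1      # both alleles are the same nonreference allele
        # a == b == "0" is homozygous reference and is not counted
    return het / hom_alt if hom_alt else float("nan")


if __name__ == "__main__":
    example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
    example_gts = ["0/1", "1/1", "0|1", "0/0", "./.", "1|0"]
    print(ti_tv_ratio(example_snps))          # 3 transitions, 1 transversion
    print(het_nonref_hom_ratio(example_gts))  # 3 het, 1 nonref-hom
```

Because the abstract reports that the Ti/Tv ratio depends on genome region and the het/nonref-hom ratio depends on ancestry, such functions would in practice be applied separately per region or per ancestry group rather than once over a pooled call set.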