GeTallele: A Method for Analysis of DNA and RNA Allele Frequency Distributions
Open Access
- 16 September 2020
- journal article
- research article
- Published by Frontiers Media SA in Frontiers in Bioengineering and Biotechnology
Abstract
Variant allele frequencies (VAF) are an important measure of genetic variation that can be estimated at single-nucleotide variant (SNV) sites. RNA and DNA VAFs are used as indicators of a wide range of biological traits, including tumor purity and ploidy changes, allele-specific expression and gene-dosage transcriptional response. Here we present a novel methodology to assess gene and chromosomal allele asymmetries and to aid in identifying genomic alterations in RNA and DNA datasets. Our approach is based on analysis of the VAF distributions in chromosomal segments (continuous multi-SNV genomic regions). In each segment we estimate variant probability, a parameter of a random process that can generate synthetic VAF samples that closely resemble the observed data. We show that variant probability is a biologically interpretable quantitative descriptor of the VAF distribution in chromosomal segments that is consistent with other approaches. To this end, we apply the proposed methodology to data from 72 samples obtained from patients with breast invasive carcinoma (BRCA) from The Cancer Genome Atlas (TCGA). We compare DNA and RNA VAF distributions from matched RNA and whole exome sequencing (WES) datasets and find that both genomic signals give very similar segmentation and estimated variant probability profiles. We also find a correlation between variant probability and copy number alterations (CNA). Finally, to demonstrate a practical application of variant probabilities, we use them to estimate tumor purity. Tumor purity estimates based on variant probabilities demonstrate good concordance with other approaches (Pearson's correlation between 0.44 and 0.76). Our evaluation suggests that variant probabilities can serve as a dependable descriptor of VAF distribution, further enabling the statistical comparison of matched DNA and RNA datasets. They also provide conceptual and mechanistic insights into the relations between the structure of VAF distributions and genetic events. The methodology is implemented in a Matlab toolbox that provides a suite of functions for analysis, statistical assessment and visualization of Genome and Transcriptome allele frequency distributions. GeTallele is available at https://github.com/SlowinskiPiotr/GeTallele.
Funding Information
- Wellcome Trust
- Engineering and Physical Sciences Research Council
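The abstract describes variant probability as a parameter of a random process that generates synthetic VAF samples resembling the observed data in a segment. The following Python sketch illustrates that idea only; it is not the GeTallele toolbox code. It assumes a symmetric binomial mixture for read counts, an earth mover's distance to compare samples, and a simple grid search over probabilities in [0.5, 1]. None of these choices is stated in the abstract, and all function names (`synthetic_vafs`, `emd_1d`, `estimate_variant_probability`) are hypothetical.

```python
import random


def synthetic_vafs(p, depths, rng):
    """Draw one synthetic VAF per site (assumed model, not from the source).

    At each site the variant allele is sampled with probability p or 1 - p
    (chosen with equal chance), and the read count is binomial in the depth.
    """
    vafs = []
    for n in depths:
        q = p if rng.random() < 0.5 else 1.0 - p
        k = sum(rng.random() < q for _ in range(n))  # binomial draw
        vafs.append(k / n)
    return vafs


def emd_1d(a, b):
    """Earth mover's distance between two equal-size 1-D samples."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def estimate_variant_probability(observed, depths, rng, grid_steps=26, n_rep=3):
    """Grid search for the p whose synthetic VAFs best match the observed ones."""
    best_p, best_d = None, float("inf")
    for i in range(grid_steps):
        p = 0.5 + 0.5 * i / (grid_steps - 1)  # candidates in [0.5, 1.0]
        # Average the distance over a few synthetic samples to reduce noise.
        d = sum(
            emd_1d(observed, synthetic_vafs(p, depths, rng)) for _ in range(n_rep)
        ) / n_rep
        if d < best_d:
            best_p, best_d = p, d
    return best_p


# Toy segment: 300 sites with simulated read depths and a known p of 0.7.
rng = random.Random(0)
depths = [rng.randint(20, 80) for _ in range(300)]
observed = synthetic_vafs(0.7, depths, rng)
p_hat = estimate_variant_probability(observed, depths, rng)
print(f"estimated variant probability: {p_hat:.2f}")
```

On this toy segment the estimate should land near the simulated value of 0.7. With real data, `observed` and `depths` would come from read counts at the SNV sites of one segment, and the sampling model would need to match the one documented in the toolbox.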