Sampling Variation of RAD-Seq Data from Diploid and Tetraploid Potato (Solanum tuberosum L.)
Open Access
- 7 February 2021
- Vol. 10 (2), 319
- https://doi.org/10.3390/plants10020319
Abstract
New sequencing technology enables genome-wide identification of sequence-based variants at the population level and at a competitively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generating the sequence data involves a sophisticated experimental process that carries substantial non-biological variation. Statistically, the sequencing process amounts to sampling DNA fragments from an individual's sequence. Adequate knowledge of the sampling variation in sequence data generation is a key statistical prerequisite for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation into modeling the sampling variation of sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring in diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows significant overdispersion in the sampling variation of the sequence data relative to that expected under the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and estimating the model parameters, which may be easily implemented for real sequence datasets. The optimized design of the RAD-seq experiments enabled effective control of the representation of undesirable chloroplast DNA and RNA genes in the sequence data generated.
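The abstract's central claim, that read counts vary more than a multinomial model predicts, can be illustrated for the two-allele (binomial) case with a simple overdispersion check. The sketch below is not the paper's method. It assumes a set of sites that share a known expected reference-allele fraction `p` (for example 0.5 at heterozygous diploid sites), and the function names `pearson_dispersion` and `moment_rho` are hypothetical. It computes a Pearson dispersion ratio, which is close to 1 under pure binomial sampling, and a method-of-moments estimate of the beta-binomial correlation parameter, using Var(k) = n p (1 - p) [1 + (n - 1) rho].

```python
def pearson_dispersion(counts, depths, p):
    """Mean Pearson chi-square term across sites.

    counts: reference-allele read counts per site
    depths: total read depth per site
    p:      assumed (known) expected reference-allele fraction
    Values near 1 are consistent with binomial sampling; values well
    above 1 suggest overdispersion.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if len(counts) != len(depths) or not counts:
        raise ValueError("counts and depths must be non-empty and equal length")
    total = 0.0
    for k, n in zip(counts, depths):
        if n <= 0:
            raise ValueError("every depth must be positive")
        total += (k - n * p) ** 2 / (n * p * (1.0 - p))
    return total / len(counts)


def moment_rho(counts, depths, p):
    """Method-of-moments estimate of the beta-binomial correlation rho.

    Solves sum((k - n p)^2) = p (1 - p) [sum(n) + rho * sum(n (n - 1))]
    for rho, truncated at 0 (rho = 0 corresponds to the plain binomial).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    denom = sum(n * (n - 1) for n in depths)
    if denom <= 0:
        raise ValueError("at least one site needs depth greater than 1")
    squared = sum((k - n * p) ** 2 for k, n in zip(counts, depths))
    rho = (squared / (p * (1.0 - p)) - sum(depths)) / denom
    return max(0.0, rho)


if __name__ == "__main__":
    # Made-up illustrative counts, not data from the paper.
    ref_counts = [14, 3, 11, 19, 6]
    read_depths = [20, 20, 20, 20, 20]
    print(pearson_dispersion(ref_counts, read_depths, 0.5))
    print(moment_rho(ref_counts, read_depths, 0.5))
```

For autotetraploid sites the same check could be run separately for each allele dosage class, with `p` set to the expected fraction for that dosage (for example 0.25, 0.5, or 0.75); this extension is likewise an assumption rather than something stated in the abstract.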
Funding Information
- Biotechnology and Biological Sciences Research Council (BB/N008952/1)
- National Natural Science Foundation of China (31671328 and 31871240)