Short reads and nonmodel species: exploring the complexities of next‐generation sequence assembly and SNP discovery in the absence of a reference genome
- 17 February 2011
- journal article
- research article
- Published by Wiley in Molecular Ecology Resources
- Vol. 11 (s1), 93-108
- https://doi.org/10.1111/j.1755-0998.2010.02969.x
Abstract
How practical is gene and SNP discovery in a nonmodel species using short read sequences? Next-generation sequencing technologies are being applied to an increasing number of species with no reference genome. For nonmodel species, the cost, availability of existing genetic resources, genome complexity and the planned method of assembly must all be considered when selecting a sequencing platform. Our goal was to examine the feasibility and optimal methodology for SNP and gene discovery in the sockeye salmon (Oncorhynchus nerka) using short read sequences. SOLiD short reads (up to 50 bp) were generated from single- and pooled-tissue transcriptome libraries from ten sockeye salmon. The individuals were from five distinct populations from the Wood River Lakes and Mendeltna Creek, Alaska. As no reference genome was available for sockeye salmon, the SOLiD sequence reads were assembled to publicly available EST reference sequences from sockeye salmon and two closely related species, rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar). Additionally, de novo assembly of the SOLiD data was carried out, and the SOLiD reads were remapped to the de novo contigs. The results from each reference assembly were compared across all references. The number and size of contigs assembled varied with the size of the reference sequences. In silico SNP discovery was carried out on contigs from all four EST references; however, discovery of valid SNPs was most successful using one of the two conspecific references.
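As a rough illustration of what in silico SNP discovery on reference-mapped reads involves, the sketch below counts the bases observed at each reference position and flags positions where a second allele is well supported. It is not the pipeline used in the study: the function name `call_candidate_snps`, the pileup input format, and all three thresholds are assumptions chosen for illustration only.

```python
from collections import Counter


def call_candidate_snps(pileup, min_coverage=8, min_minor_count=2, min_minor_freq=0.2):
    """Flag candidate biallelic SNPs from per-position read bases.

    pileup: dict mapping a reference position to a string of the read
    bases aligned at that position (hypothetical input format).
    The thresholds are illustrative defaults, not values from the article.
    """
    candidates = []
    for pos in sorted(pileup):
        # Count only unambiguous bases; N and other symbols are ignored.
        counts = Counter(b for b in pileup[pos].upper() if b in "ACGT")
        depth = sum(counts.values())
        # Skip poorly covered or monomorphic positions.
        if depth < min_coverage or len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        # Require the minor allele to be seen often enough to be
        # unlikely to result from sequencing error alone.
        if minor_n >= min_minor_count and minor_n / depth >= min_minor_freq:
            candidates.append((pos, major, minor, depth))
    return candidates


example = {
    10: "AAAAAAAAAA",  # monomorphic
    11: "AAAAAAGGGG",  # well-supported second allele
    12: "AAAAAAAAAG",  # single mismatch, likely an error
    13: "AG",          # coverage too low
}
print(call_candidate_snps(example))
```

In a duplicated genome such as a salmonid's, positions that pass a filter of this kind may still reflect differences between paralogous sequences rather than true SNPs, so they remain candidates until validated.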