Long-read transcriptome and other genomic resources for the angiosperm Silene noctiflora
Open Access
- 3 June 2021
- journal article
- research article
- Published by Oxford University Press (OUP) in G3 Genes|Genomes|Genetics
- Vol. 11 (8)
- https://doi.org/10.1093/g3journal/jkab189
Abstract
The angiosperm genus Silene is a model system for several traits of ecological and evolutionary significance in plants, including breeding system and sex chromosome evolution, host-pathogen interactions, invasive species biology, heavy metal tolerance, and cytonuclear interactions. Despite its importance, genomic resources for this large genus of approximately 850 species are scarce, with only one published whole-genome sequence (from the dioecious species Silene latifolia). Here, we provide genomic and transcriptomic resources for a hermaphroditic representative of this genus (S. noctiflora), including a PacBio Iso-Seq transcriptome, which uses long-read, single-molecule sequencing technology to analyze full-length mRNA transcripts. Using these data, we have assembled and annotated high-quality full-length cDNA sequences for approximately 14,126 S. noctiflora genes and 25,317 isoforms. We demonstrated the utility of these data to distinguish between recent and highly similar gene duplicates by identifying novel paralogous genes in an essential protease complex. Furthermore, we provide a draft assembly for the approximately 2.7-Gb genome of this species, which is near the upper range of genome-size values reported for diploids in this genus and threefold larger than the 0.9-Gb genome of Silene conica, another species in the same subgenus. Karyotyping confirmed that S. noctiflora is a diploid, indicating that its large genome size is not due to polyploidization. These resources should facilitate further study and development of this genus as a model in plant ecology and evolution.
Funding Information
- National Science Foundation (grant (MCB-1733227))
- Colorado State University, and graduate fellowships from NSF
- National Institutes of Health