Genomic Resources for Erysimum spp. (Brassicaceae): Transcriptome and Chloroplast Genomes

Abstract
Erysimum (Brassicaceae) is a genus of more than 200 species (Al-Shehbaz, 2012). It is widely distributed in the Northern Hemisphere and has been the focus of active research in ecology, evolution, and genetics (Gómez and Perfectti, 2010; Gómez, 2012; Valverde et al., 2016). Despite long-standing interest in Erysimum, its taxonomy has yet to be properly established, partly because a complex and reticulate evolutionary history makes phylogenetic reconstruction highly challenging (Ancev, 2006; Marhold and Lihová, 2006; Abdelaziz et al., 2014; Gómez et al., 2014; Moazzeni et al., 2014; Züst et al., 2020).

The Baetic Mountains (South-Eastern Iberia) are among the most critical glacial refugia in Europe. The waxing and waning of plant populations following climatic fluctuations have likely complicated the distribution and genetic variation of extant diversity in this region. Isolation and subsequent secondary contact between taxa may have favored hybridization and introgression (Médail and Diadema, 2009). The Erysimum species that inhabit these mountains have been a particularly fruitful system for plant evolutionary ecology (e.g., Gómez et al., 2006, 2008; Gómez and Perfectti, 2010; Gómez, 2012; Valverde et al., 2016). However, the relationships among these species remain unresolved, hampering comparative and evolutionary studies. Genome duplications, incomplete lineage sorting, and hybridization have compromised phylogenetic reconstructions within Erysimum (Marhold and Lihová, 2006; Osuna-Mascaró, 2020). Additionally, clarifying this group's complex evolution requires extensive genomic resources, which are only now being produced and remain mostly lacking.

The fast development of high-throughput sequencing technologies has led to a rapid increase in genomic and transcriptomic resources for many plant species (Dong et al., 2004; Duvick et al., 2007; Sundell et al., 2015; Boyles et al., 2019).
However, obtaining complete genome sequences remains a challenge for large genomes enriched in repetitive DNA. Transcriptome sequencing is comparatively more accessible, providing a relatively cheap and fast method to obtain large amounts of functional genomic data (Timme et al., 2012; Yang and Smith, 2013; Wickett et al., 2014; Léveillé-Bourret et al., 2017). Accordingly, global initiatives such as the 1,000 plants (1KP) project have generated transcriptomic resources for over 1,000 plant species (Matasci et al., 2014; Leebens-Mack et al., 2019). In addition, RNA-Seq can be used to obtain complete chloroplast genomes in a reliable and accessible way, making it possible to use complete molecules in phylogenomic analyses (Smith, 2013; Osuna-Mascaró et al., 2018; Morales-Briones et al., 2021).

Here, we report the annotation of 18 floral transcriptomes assembled de novo from total RNA-Seq libraries and nine chloroplast genomes from seven Erysimum species inhabiting the Baetic Mountains. The chloroplast genomes were assembled from total RNA-Seq data following a previously validated reference-based assembly approach (Osuna-Mascaró et al., 2018). The data presented here represent reliable genomic resources for transcriptomic, proteomic, and phylotranscriptomic studies. These data contribute to the ecological and genetic resources available for Brassicaceae in general and the genus Erysimum in particular, and they are the only genomic resources for these species derived from flower buds.

We sampled flower buds at the same developmental stage (fully developed, unopened buds) from three populations each of Erysimum mediohispanicum, E. nevadense, E. popovii, and E. baeticum, four populations of E. bastetanum, and one population each of E. lagascae and E. fitzii, for a total of 18 populations (see Supplementary Table 1 for details). We stored the samples in liquid nitrogen and maintained them in an ultra-freezer (−80°C) until RNA extraction.
Then, we extracted RNA from the buds under highly sterile conditions. The buds were snap-frozen in liquid nitrogen and ground with mortar and pestle. We extracted total RNA with the Qiagen RNeasy Plant Mini Kit, following the manufacturer's protocol, and checked its quality and quantity using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, United States) and an Agilent 2100 Bioanalyzer system (Agilent Technologies Inc). Library preparation and RNA sequencing were conducted by Macrogen Inc. (Seoul, Korea). We used rRNA depletion (Ribo-Zero) to enrich for mRNA and to avoid sequencing rRNAs. Library preparation was performed using the TruSeq Stranded Total RNA LT Sample Preparation Kit (Plant). The 18 libraries were sequenced using the HiSeq 3000-4000 sequencing protocol and TruSeq 3000-4000 SBS Kit v3 reagent, following a paired-end 150 bp strategy on the Illumina HiSeq 4000 platform. A summary of sequencing statistics appears in Supplementary Table 2.

We analyzed the fastq files for each library using FastQC v 0.11.5 (Andrews, 2010). Then, we trimmed the adapters using cutadapt v 1.15 (Martin, 2011), specifying the “-b” option to trim adapters at the 5′ and 3′ ends and the “-n” option to search repeatedly for the adapter sequences (28 iterations). The “-n” option helps ensure that all adapter copies are removed by searching in loops until no further adapter match is found or until the specified number of rounds is reached. Next, we trimmed the reads by quality using Sickle v 1.33 (Joshi and Fass, 2011), using the “pe” option for paired-end reads and the “-t” option to use Illumina quality values (see https://github.com/najoshi/sickle). This trimming software uses sliding-window analyses and quality and length thresholds to cut and discard the reads that do not fit the selected threshold values. We...
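The sliding-window quality trimming mentioned above can be illustrated with a minimal Python sketch. It is a simplified illustration of the general technique and not Sickle's actual implementation. The function names (`decode_phred`, `sliding_window_trim`) and the window size, quality threshold, length threshold, and Phred offset are hypothetical values chosen for the example; the text does not state the thresholds that were used.

```python
from statistics import mean


def decode_phred(quality_string, offset=33):
    """Convert a FASTQ quality string to integer Phred scores.

    The offset (33 here) is an assumption for illustration; it depends
    on the quality encoding of the sequencing data.
    """
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(seq, quals, window=4, min_quality=20, min_length=20):
    """Trim the 3' end of a read with a simple sliding window.

    The window slides from the 5' end; the read is cut at the start of
    the first window whose mean quality falls below `min_quality`.
    Reads shorter than `min_length` after trimming are discarded
    (returned as None). Reads shorter than the window are not trimmed,
    only length-filtered. All default values are hypothetical.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")

    cut = len(seq)  # default: keep the whole read
    for start in range(0, len(seq) - window + 1):
        if mean(quals[start:start + window]) < min_quality:
            cut = start
            break

    if cut < min_length:
        return None  # read fails the length threshold and is discarded
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    read = "ACGTACGTAC"
    scores = [30] * 6 + [5] * 4  # quality drops sharply at the 3' end
    print(sliding_window_trim(read, scores, window=4, min_quality=20, min_length=4))
    print(sliding_window_trim(read, scores, window=4, min_quality=20, min_length=5))
```

In this toy read, the first window whose mean quality falls below 20 starts at position 4, so only the first four bases are kept. Raising the minimum length to 5 discards the read entirely, which is how a length threshold interacts with quality trimming.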
Funding Information
  • Ministerio de Ciencia y Tecnología (CGL2016-79950-R, CGL2017-86626-C2-2-P)
  • European Regional Development Fund (A-RNM-505-UGR18, SOMM17/6109/UGR)
  • Ministerio de Economía y Competitividad (BES-2014-069022)