Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C
Open Access
- 1 July 2019
- journal article
- research article
- Published by Oxford University Press (OUP) in GigaScience
Abstract
The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study its population genetics, breeding, cultivation, and stock enrichment have been hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, a first reference genome of the family Arcidae. A total of 75.79 Gb of clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, representing approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and a scaffold N50 of 45.00 Mb. Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of the bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences; 24,045 protein-coding genes were predicted, of which 84.7% were annotated. We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and the aquaculture sector.
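The coverage and N50 figures quoted in the abstract follow from simple arithmetic, which the minimal Python sketch below illustrates. It is not the authors' pipeline: the function names `sequencing_coverage` and `n50` are hypothetical, the coverage calculation assumes the assembled size (884.5 Mb) is used as the denominator (the paper may have used a separate genome-size estimate), and the contig lengths in the N50 example are toy values, not data from the assembly.

```python
def sequencing_coverage(total_bases: float, genome_size: float) -> float:
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Figures from the abstract; the denominator is an assumption (assembly size).
clean_data_bp = 75.79e9   # 75.79 Gb of clean long-read data
assembly_bp = 884.5e6     # 884.5-Mb assembled genome
depth = sequencing_coverage(clean_data_bp, assembly_bp)
print(f"coverage: about {depth:.1f}x")

# Toy contig lengths (hypothetical), only to show how N50 is computed.
toy_contigs = [10, 20, 30, 40]
print("toy N50:", n50(toy_contigs))
```

Under these assumptions the depth comes out slightly below 86×, consistent with the "approximately 86×" stated in the abstract. The same `n50` function applies to contigs or scaffolds; only the input list of lengths changes.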
Funding Information
- National Key R&D Program of China (2018YFD0900304)
- China Agriculture Research System (CARS-49)
- National Natural Science Foundation of China (31602142, 31502208)