Whole-genome assembly of Corylus avellana cv “Tonda Gentile delle Langhe” using linked-reads (10X Genomics)

Abstract
The European hazelnut (Corylus avellana L.; 2n=2x=22) is a tree nut of worldwide economic importance that is cross-pollinated due to sporophytic incompatibility; individual plants are therefore highly heterozygous. Cultivars are clonally propagated by mound layering, rooted suckers and micropropagation. Interest in this crop has increased in recent years, driven by growing demand related to the recognized health benefits of nut consumption. C. avellana cv ‘Tonda Gentile delle Langhe’ (‘TGdL’) is well known for its high kernel quality, and the premium price paid for this cultivar is an economic benefit for producers in northern Italy. Assembling a high-quality genome is difficult in many plant species because of high heterozygosity. We assembled a chromosome-level genome sequence of ‘TGdL’ with a two-step approach. First, 10X Genomics Chromium technology was used to create a high-quality sequence, which was then assembled into scaffolds using the cv ‘Tombul’ genome as the reference. Eleven pseudomolecules were obtained, corresponding to the 11 chromosomes. A total of 11,046 scaffolds remained unplaced, representing 11% of the genome (46,504,161 bp). Gene prediction, performed with MAKER-P, identified 27,791 genes (AED ≤ 0.4; 92% BUSCO completeness), whose functions were analysed with BLASTP and InterProScan. To characterise ‘TGdL’-specific genetic mechanisms, OrthoFinder was used to detect orthologs between hazelnut and closely related species. The ‘TGdL’ genome sequence is expected to be a powerful tool for understanding hazelnut genetics and for detecting markers and genes for important traits to be used in targeted breeding programs.
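As an illustration of the AED ≤ 0.4 threshold mentioned above, the following is a minimal sketch of how gene models might be filtered from an annotation file. It assumes a GFF3 file in which each mRNA feature carries an `_AED=` key in its attribute column, as MAKER output typically does; the function name `filter_by_aed`, the example records, and the choice to filter at the transcript level are hypothetical and are not taken from the study.

```python
import re

# Assumption: AED is stored as "_AED=<value>" in column 9 of mRNA features.
# The leading (?:^|;) keeps the pattern from matching the separate "_eAED=" key.
AED_RE = re.compile(r"(?:^|;)_AED=([0-9.]+)")
ID_RE = re.compile(r"(?:^|;)ID=([^;]+)")


def filter_by_aed(gff_lines, max_aed=0.4):
    """Return the IDs of mRNA features whose AED is at most max_aed."""
    kept = []
    for line in gff_lines:
        # Skip blank lines and GFF3 directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        aed_match = AED_RE.search(cols[8])
        id_match = ID_RE.search(cols[8])
        if aed_match is None or id_match is None:
            continue
        if float(aed_match.group(1)) <= max_aed:
            kept.append(id_match.group(1))
    return kept


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "##gff-version 3",
        "chr1\tmaker\tmRNA\t1\t100\t.\t+\t.\tID=g1-mRNA-1;Parent=g1;_AED=0.12;_eAED=0.12",
        "chr1\tmaker\tmRNA\t200\t300\t.\t+\t.\tID=g2-mRNA-1;Parent=g2;_AED=0.75;_eAED=0.80",
    ]
    print(filter_by_aed(example))
```

Lower AED values indicate closer agreement between a gene model and its supporting evidence, so a cutoff of this kind retains the better-supported models.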