ISSN / EISSN : 2160-1836 / 2160-1836
Published by: Oxford University Press (OUP) (10.1093)
Total articles ≅ 3,175
Latest articles in this journal
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab228
In conventional linear models for whole-genome prediction and genome-wide association studies (GWAS), it is usually assumed that the relationship between genotypes and phenotypes is linear. Bayesian neural networks have been used to account for non-linearity such as complex genetic architectures. Here, we introduce a method named NN-Bayes, where "NN" stands for neural networks and "Bayes" stands for Bayesian Alphabet models, including a collection of Bayesian regression models such as BayesA, BayesB, BayesC, and Bayesian LASSO. NN-Bayes incorporates Bayesian Alphabet models into non-linear neural networks via hidden layers between SNPs and observed traits. Thus, NN-Bayes attempts to improve the performance of genome-wide prediction and GWAS by accommodating non-linear relationships between the hidden nodes and the observed trait, while maintaining genomic interpretability through the Bayesian regression models that connect the SNPs to the hidden nodes. For genomic interpretability, the posterior distribution of marker effects in NN-Bayes is inferred by Markov chain Monte Carlo (MCMC) approaches and used for inference of association through posterior inclusion probabilities (PIPs) and window posterior probability of association (WPPA). In simulation studies with dominance and epistatic effects, the performance of NN-Bayes was significantly better than that of conventional linear models for both GWAS and whole-genome prediction, and the differences in prediction accuracy were substantial in magnitude. In real data analyses, for the soy dataset, NN-Bayes achieved significantly higher prediction accuracies than conventional linear models, and results from four other species showed that NN-Bayes had similar prediction performance to linear models, which is potentially due to the small sample size. Our NN-Bayes is optimized for high-dimensional genomic data and implemented in an open-source package called "JWAS". NN-Bayes can lead to greater use of Bayesian neural networks to account for non-linear relationships due to its interpretability and computational performance.
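As a rough illustration of how posterior inclusion probabilities can be read off MCMC output, the sketch below assumes the sampler stores one row of marker effects per retained sample and records an excluded marker as exactly zero. The function names and the toy data are hypothetical and are not taken from JWAS. The window statistic is a simplified stand-in (the fraction of samples in which any marker in the window is included), not the WPPA used in the article.

```python
import numpy as np


def posterior_inclusion_probabilities(effect_samples):
    """Fraction of MCMC samples in which each marker has a non-zero effect.

    effect_samples has shape (n_samples, n_markers). Assumption: a marker
    that is excluded from the model in a given sample is stored as 0.0.
    """
    samples = np.asarray(effect_samples, dtype=float)
    return (samples != 0.0).mean(axis=0)


def window_inclusion_probability(effect_samples, start, stop):
    """Simplified window statistic (not WPPA): fraction of samples in which
    at least one marker in the half-open window [start, stop) is included."""
    samples = np.asarray(effect_samples, dtype=float)
    return (samples[:, start:stop] != 0.0).any(axis=1).mean()


# Toy MCMC output: 4 retained samples for 3 markers (made-up numbers).
toy_samples = np.array([
    [0.0, 0.2, 0.0],
    [0.1, 0.3, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0],
])

pips = posterior_inclusion_probabilities(toy_samples)
window_prob = window_inclusion_probability(toy_samples, 0, 2)
print(pips)         # marker-level inclusion frequencies
print(window_prob)  # inclusion frequency for the window covering markers 0 and 1
```

Window-level statistics are useful because linkage disequilibrium can spread the signal of one causal variant over several neighbouring markers, so no single marker needs a high PIP for the region to be flagged.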
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab230
Life requires the oligomerization of individual proteins into higher-order assemblies. In order to form functional oligomers, monomers must adopt appropriate three-dimensional structures. Molecular chaperones transiently bind nascent or misfolded proteins to promote proper folding. Single missense mutations frequently cause disease by perturbing folding despite chaperone engagement. A misfolded mutant capable of oligomerizing with wild-type proteins can dominantly poison oligomer function. We previously found evidence that human-disease-linked mutations in Saccharomyces cerevisiae septin proteins slow folding and attract chaperones, resulting in a kinetic delay in oligomerization that prevents the mutant from interfering with wild-type function. Here we build upon our septin studies to develop a new approach for identifying chaperone interactions in living cells, and use it to expand our understanding of chaperone involvement, kinetic folding delays, and oligomerization in the recessive behavior of tumor-derived mutants of the tumor suppressor p53. We find evidence of increased binding of several cytosolic chaperones to a recessive, misfolding-prone mutant, p53(V272M). Similar to our septin results, chaperone overexpression inhibits the function of p53(V272M) with minimal effect on the wild type. Unlike mutant septins, p53(V272M) is not kinetically delayed under conditions in which it is functional. Instead, it interacts with wild-type p53 but this interaction is temperature sensitive. At high temperatures or upon chaperone overexpression, p53(V272M) is excluded from the nucleus and cannot function or perturb wild-type function. Hsp90 inhibition liberates mutant p53 to enter the nucleus. These findings provide new insights into the effects of missense mutations.
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab224
Gene conversion is GC-biased across a wide range of taxa. Large palindromes on mammalian sex chromosomes undergo frequent gene conversion that maintains arm-to-arm sequence identity greater than 99%, which may increase their susceptibility to the effects of GC-biased gene conversion. Here, we demonstrate a striking history of GC-biased gene conversion in 12 palindromes conserved on the X chromosomes of human, chimpanzee, and rhesus macaque. Primate X-chromosome palindrome arms have significantly higher GC content than flanking single-copy sequences. Nucleotide replacements that occurred in human and chimpanzee palindrome arms over the past 7 million years are one-and-a-half times as GC-rich as the ancestral bases they replaced. Using simulations, we show that our observed pattern of nucleotide replacements is consistent with GC-biased gene conversion with a magnitude of 70%, similar to previously reported values based on analyses of human meioses. However, GC-biased gene conversion since the divergence of human and rhesus macaque explains only a fraction of the observed difference in GC content between palindrome arms and flanking sequence, suggesting that palindromes are older than 29 million years and/or had elevated GC content at the time of their formation. This work supports a greater than 2:1 preference for GC bases over AT bases during gene conversion, and demonstrates that the evolution and composition of mammalian sex chromosome palindromes is strongly influenced by GC-biased gene conversion.
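The link between the 70% magnitude and the "greater than 2:1" preference can be checked with one line of arithmetic. The sketch below assumes that the magnitude is the probability that a GC/AT mismatch is resolved in favor of the GC base; the function name is hypothetical.

```python
def gc_to_at_preference(gc_bias):
    """Odds of resolving a GC/AT mismatch toward GC rather than AT.

    gc_bias is assumed to be the probability that the GC base is retained.
    """
    if not 0.0 < gc_bias < 1.0:
        raise ValueError("gc_bias must lie strictly between 0 and 1")
    return gc_bias / (1.0 - gc_bias)


ratio = gc_to_at_preference(0.70)
print(f"{ratio:.2f}:1")  # prints 2.33:1
```

Under this reading, a bias of 70% corresponds to odds of 7:3, which is slightly above 2:1; unbiased conversion (50%) would give 1:1.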
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab225
Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking in simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium. We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios, and compared the advantages and limitations of each.
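To make the idea of four effect size classes concrete, the sketch below draws marker effects from a four-component normal mixture. The class variances (0, 0.0001, 0.001, and 0.01 times the genetic variance) are a choice often seen in descriptions of BayesR and are treated here as an assumption; the abstract does not state them. The mixing proportions, function name, and seed are hypothetical, and the code only samples from a prior. It does not fit the model.

```python
import numpy as np

# Assumed class variances, expressed as fractions of the genetic variance.
# Class 0 is the "zero effect" class.
CLASS_VARIANCE_FRACTIONS = np.array([0.0, 1e-4, 1e-3, 1e-2])


def sample_marker_effects(n_markers, mixing_proportions, genetic_variance, rng):
    """Assign each marker to one of four classes and draw its effect."""
    proportions = np.asarray(mixing_proportions, dtype=float)
    if proportions.shape != (4,) or not np.isclose(proportions.sum(), 1.0):
        raise ValueError("mixing_proportions must hold four values summing to 1")
    classes = rng.choice(4, size=n_markers, p=proportions)
    std = np.sqrt(CLASS_VARIANCE_FRACTIONS[classes] * genetic_variance)
    effects = rng.normal(0.0, 1.0, size=n_markers) * std
    return classes, effects


rng = np.random.default_rng(0)
classes, effects = sample_marker_effects(
    n_markers=1000,
    mixing_proportions=[0.94, 0.04, 0.015, 0.005],  # hypothetical values
    genetic_variance=1.0,
    rng=rng,
)
print(np.bincount(classes, minlength=4))  # number of markers per class
```

In a fitted model, the posterior probability that a marker belongs to each class is what provides the interpretable classification of SNPs mentioned in the abstract.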
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab226
The Salmonella research community has used strains and bacteriophages over decades, exchanging useful new isolates among laboratories for studies of cell surface antigens, metabolic pathways, and restriction-modification systems. Here we present the sequences of two laboratory Salmonella strains (STK005, an isolate of LB5000; and its descendant ER3625). In the ancestry of LB5000, segments of ∼15 and ∼42 kb were introduced from Salmonella enterica sv Abony 103 into Salmonella enterica sv Typhimurium LT2, forming strain SD14; this strain is thus a hybrid of S. enterica isolates. Strains in the SD14 lineage were used to define flagellar antigens from the 1950s to the 1970s, and to define three restriction-modification systems from the 1960s to the 1980s. LB5000 was also used as a host in phage typing systems used by epidemiologists. In the age of cheaper and easier sequencing, this resource will provide access to the sequence that underlies the extensive literature.
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab219
The wheat leaf rust fungus, Puccinia triticina Erikss., is a worldwide pathogen of tetraploid durum and hexaploid wheat. Many races of P. triticina differ for virulence to specific leaf rust resistance genes and are found in most wheat-growing regions of the world. Wheat cultivars with effective leaf rust resistance exert selection pressure on P. triticina populations for virulent race types. The objectives of this study were to examine whole-genome sequence data of 121 P. triticina isolates and to gain insight into race evolution. The collection included isolates comprising many different race phenotypes collected worldwide from common and durum wheat. One isolate from the wild wheat relative Aegilops speltoides and two from Ae. cylindrica were also included for comparison. Based on 121,907 informative variants identified relative to the reference Race 1-1 genome, isolates were clustered into 11 major lineages with 100% bootstrap support. The isolates were also grouped based on variation in 1311 predicted secreted protein genes. In gene-coding regions, all groups had high ratios of non-synonymous to synonymous mutations and nonsense to readthrough mutations. Grouping of isolates based on the two main principal components of variation, for either genome-wide variation or variation just within the secreted protein genes, indicated similar groupings. Variants were distributed across the entire genome, not just within the secreted protein genes. Our results suggest that recurrent mutation and selection play a major role in differentiation within the clonal lineages.
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab223
Apis mellifera L., the western honey bee, is a major crop pollinator that plays a key role in beekeeping and serves as an important model organism in social behavior studies. Recent efforts have improved the quality of the honey bee reference genome and developed a chromosome-level assembly of sixteen chromosomes, two of which are gapless. However, the rest suffer from 51 gaps, 160 unplaced/unlocalized scaffolds, and the lack of 2 distal telomeres. The gaps are located in hard-to-assemble, extended, highly repetitive chromosomal regions that may contain functional genomic elements. Here, we use de novo re-assemblies from the raw reads of the most recent reference genome, Amel_HAv_3.1, and other long-read-based assemblies (INRA_AMelMel_1.0, ASM1384120v1, and ASM1384124v1) of the honey bee genome to resolve 13 gaps, five unplaced/unlocalized scaffolds, and the missing telomeres of Amel_HAv_3.1. The total length of the resolved gaps is 848,747 bp. The accuracy of the corrected assembly was validated by mapping PacBio reads and performing gene annotation assessment. Comparative analysis suggests that the PacBio-reads-based assemblies of the honey bee genomes failed in the same highly repetitive extended regions of the chromosomes, especially on chromosome 10. To fully resolve these extended repetitive regions, further work using ultra-long Nanopore sequencing would be needed. Our updated assembly facilitates more accurate reference-guided scaffolding and marker/sequence mapping in honey bee genomics studies.
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab227
Pea (Pisum sativum L.) is an important cool season food legume for sustainable food production and human nutrition due to its nitrogen fixation capabilities and nutrient-dense seed. However, minimal breeding research has been conducted to improve the nutritional quality of the seed for biofortification, and most genomic-assisted breeding studies utilize small populations with few single nucleotide polymorphisms (SNPs). Genomic resources for pea have lagged behind those of other grain crops, but the recent release of the Pea Single Plant Plus Collection (PSPPC) and the pea reference genome provide new tools to study nutritional traits for biofortification. Calcium, phosphorus, potassium, iron, zinc, and phytic acid concentrations were measured in a study population of 299 different accessions grown under greenhouse conditions. Broad phenotypic variation was detected for all parameters except phytic acid. Calcium exhibited moderate broad-sense heritability (H2) estimates, at 50%, while all other minerals exhibited low heritability. Of the accessions used, 267 were previously genotyped in the PSPPC release by the USDA, and we mapped the genotyping data to the pea reference genome for the first time. This study generated 54,344 high-quality SNPs used to investigate the population structure of the Pea Single Plant Plus Collection and perform a genome-wide association study to identify genomic loci associated with mineral concentrations in mature pea seed. Overall, we were able to identify multiple significant SNPs and candidate genes for iron, phosphorus, and zinc. These results can be used for genetic improvement in pea for nutritional traits and biofortification, and the candidate genes provide insight into mineral metabolism.
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab220
Identifying gene × environment (G × E) interactions, especially when rare variants are included in genome-wide association studies, is a major challenge in statistical genetics. However, the detection of G × E interactions is very important for understanding the etiology of complex diseases. Although some statistical methods have been developed to detect the interactions between genes and environment, the detection of such interactions for rare variants is still limited. Therefore, it is particularly important to develop a new method to detect the interactions between genes and environment for rare variants. In this paper, we extend an existing method of adaptive combination of P-values (ADA) and design a novel strategy (called iSADA) for testing the effects of G × E interactions for rare variants. We propose a new two-stage test to detect the interactions between genes and environment in a certain region of a chromosome or even for the whole genome. First, the score statistic is used to test the associations between the trait value and the interaction terms of genes and environment and to obtain the original P-values. Then, based on the idea of the ADA method, we further construct a full test statistic via the P-values of the preliminary tests in the first stage, so that we can comprehensively test the interactions between genes and environment in the considered genome region. Simulation studies are conducted to compare our proposed method with other existing methods. The results show that iSADA has higher power than other methods in each case. The method is also applied to a GAW17 data set to illustrate its applicability.
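The second stage described above can be sketched generically as an adaptive combination of first-stage P-values. The code below is not iSADA itself: it is a simplified permutation scheme in the spirit of ADA, assuming that first-stage P-values are already available for the observed data and for a set of permuted data sets. The truncation thresholds, function names, and toy numbers are all hypothetical.

```python
import numpy as np


def truncated_statistic(pvalues, threshold):
    """Sum of -log(p) over the P-values that fall below the threshold."""
    p = np.asarray(pvalues, dtype=float)
    kept = p[p < threshold]
    return float(np.sum(-np.log(kept)))


def adaptive_combination_pvalue(observed_p, permuted_p, thresholds):
    """Minimum-P test over several truncation thresholds.

    observed_p: first-stage P-values, shape (n_variants,)
    permuted_p: first-stage P-values from permuted data, shape (n_perm, n_variants)
    """
    all_p = np.vstack([np.asarray(observed_p, dtype=float),
                       np.asarray(permuted_p, dtype=float)])
    # Row 0 holds the observed statistics; rows 1.. hold the permuted ones.
    stats = np.array([[truncated_statistic(row, t) for t in thresholds]
                      for row in all_p])
    perm_stats = stats[1:]
    # Per-threshold P-value of every row, judged against the permuted rows.
    per_threshold_p = (perm_stats[None, :, :] >= stats[:, None, :]).mean(axis=1)
    min_p = per_threshold_p.min(axis=1)
    # Calibrate the minimum with the same permutations (add-one correction).
    n_perm = perm_stats.shape[0]
    return (1 + np.sum(min_p[1:] <= min_p[0])) / (n_perm + 1)


rng = np.random.default_rng(1)
observed = np.array([1e-6, 1e-5, 0.5, 0.8])  # toy first-stage P-values
permuted = rng.uniform(size=(200, 4))        # toy null P-values
overall_p = adaptive_combination_pvalue(observed, permuted, (0.05, 0.1, 0.2))
print(overall_p)
```

Taking the minimum over thresholds lets the data choose how many variants to combine, and reusing the permutations to calibrate that minimum accounts for having tried several thresholds.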
G3 Genes|Genomes|Genetics; doi:10.1093/g3journal/jkab221
Wolbachia is arguably one of the most ubiquitous heritable symbionts among insects, and understanding its transmission dynamics is crucial for understanding why it is so common. While previous research has studied the transmission pathways of Wolbachia in several insect lineages including Lepidoptera, this study takes advantage of data collected from the lepidopteran tribe Aeromachini in an effort to assess patterns of transmission. Twenty-one of the 46 Aeromachini species were infected with Wolbachia. Overall, 25% (31/125) of Aeromachini specimens tested were Wolbachia positive. All Wolbachia strains were species specific except for the wJho strain, which appeared to be shared by three host species with a sympatric distribution based on a co-phylogenetic comparison between Wolbachia and the Aeromachini species. Two tests of phylogenetic congruence did not find any evidence for cospeciation between Wolbachia strains and their butterfly hosts. The co-phylogenetic comparison, divergence time estimation, and Wolbachia recombination analysis revealed that Wolbachia acquisition in Aeromachini appears to have occurred mainly through horizontal transmission rather than codivergence.