Latest articles in this journal

Aviv Omer, Or Shemesh, Ayelet Peres, Pazit Polak, Adrian J Shepherd, Corey T Watson, Scott D Boyd, Andrew M Collins, William Lees, Gur Yaari
Nucleic Acids Research; doi:10.1093/nar/gkz872

Abstract:VDJbase is a publicly available database that offers easy searching of data describing the complete sets of gene sequences (genotypes and haplotypes) inferred from adaptive immune receptor repertoire sequencing datasets. VDJbase is designed to act as a resource that will allow the scientific community to explore the genetic variability of the immunoglobulin (Ig) and T cell receptor (TR) gene loci. It can also assist in the investigation of Ig- and TR-related genetic predispositions to diseases. Our database includes web-based query and online tools to assist in visualization and analysis of the genotype and haplotype data. It enables users to detect those alleles and genes that are significantly over-represented in a particular population, in terms of genotype, haplotype and gene expression. The database website is freely accessible, and no login is required. The data and code use Creative Commons licenses and are freely downloadable.
David S Wishart, Carin Li, Ana Marcu, Hasan Badran, Allison Pon, Zachary Budinski, Jonas Patron, Debra Lipton, Xuan Cao, Eponine Oler, et al.
Nucleic Acids Research; doi:10.1093/nar/gkz861

Abstract:PathBank is a new, comprehensive, visually rich pathway database containing more than 110 000 machine-readable pathways found in 10 model organisms (Homo sapiens, Bos taurus, Rattus norvegicus, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, Escherichia coli and Pseudomonas aeruginosa). PathBank aims to provide a pathway for every protein and a map for every metabolite. This resource is designed specifically to support pathway elucidation and pathway discovery in transcriptomics, proteomics, metabolomics and systems biology. It provides detailed, fully searchable, hyperlinked diagrams of metabolic, metabolite signaling, protein signaling, disease, drug and physiological pathways. All PathBank pathways include information on the relevant organs, organelles, subcellular compartments, cofactors, molecular locations, chemical structures and protein quaternary structures. Each small molecule is hyperlinked to the rich data contained in public chemical databases such as HMDB or DrugBank, and each protein or enzyme complex is hyperlinked to UniProt. All PathBank pathways are accompanied by references and detailed descriptions that provide an overview of the pathway, condition or processes depicted in each diagram. Every PathBank pathway is downloadable in several machine-readable and image formats, including BioPAX, SBML, PWML, SBGN, RXN, PNG and SVG. PathBank also supports community annotations and submissions through the web-based PathWhiz pathway illustrator. The vast majority of PathBank's pathways (>95%) are not found in any other public pathway database.
Monika Sharma, Shakshi Sharma, Apoorv Alawada
Nucleic Acids Research; doi:10.1093/nar/gkz877

Abstract:Mammalian Quaking (QKI) protein, a member of the STAR family of proteins, is an mRNA-binding protein that post-transcriptionally modulates its target RNA. QKI protein possesses a maxi-KH domain, composed of a single heterogeneous nuclear ribonucleoprotein K homology (KH) domain and a C-terminal QUA2 domain, that binds a sequence-specific QKI RNA recognition element (QRE), CUAAC. To understand the binding specificities of the KH-QUA2 domain of QKI protein for different mRNA sequences, we introduced point mutations at different positions in the QRE, resulting in twelve different mRNA sequences, each with a single nucleotide change. We carried out long unbiased molecular dynamics simulations using two different sets of recently updated forcefield parameters: AMBERff14SB+RNAχOL3 and CHARMM36 (with CMAP correction). We analyzed the changes in intermolecular dynamics as a result of mutation. Our results show that the AMBER forcefields modeled the interactions between mRNA and protein better. We also calculated the binding affinities of different mRNA sequences and found that the relative order correlates with the reported experimental studies. Our study shows that favorable binding, with formation of a stable complex, occurs when the total number of intermolecular contacts between mRNA and protein increases without loss of native contacts within the KH-QUA2 domain.
Marc Laforet, Thomas A McMurrough, Michael Vu, Christopher M Brown, Kun Zhang, Murray S Junop, Gregory B Gloor, David R Edgell
Nucleic Acids Research; doi:10.1093/nar/gkz866

Abstract:Identifying and validating intermolecular covariation between proteins and their DNA-binding sites can provide insights into mechanisms that regulate selectivity and starting points for engineering new specificity. LAGLIDADG homing endonucleases (meganucleases) can be engineered to bind non-native target sites for gene-editing applications, but not all redesigns successfully reprogram specificity. To gain a global overview of residues that influence meganuclease specificity, we used information theory to identify protein–DNA covariation. Directed evolution experiments of one predicted pair, 227/+3, revealed variants with surprising shifts in I-OnuI substrate preference at the central 4 bases where cleavage occurs. Structural studies showed significant remodeling distant from the covarying position, including restructuring of an inter-hairpin loop, DNA distortions near the scissile phosphates, and new base-specific contacts. Our findings are consistent with a model whereby the functional impacts of covariation can be indirectly propagated to neighboring residues outside of direct contact range, allowing meganucleases to adapt to target site variation and indirectly expand the sequence space accessible for cleavage. We suggest that some engineered meganucleases may have unexpected cleavage profiles that were not rationally incorporated during the design process.
Eric W Sayers, Jeff Beck, J Rodney Brister, Evan E Bolton, Kathi Canese, Donald C Comeau, Kathryn Funk, Anne Ketter, Sunghwan Kim, Avi Kimchi, et al.
Nucleic Acids Research; doi:10.1093/nar/gkz899

The publisher has not yet granted permission to display this abstract.
Zhonglong Guo, Zheng Kuang, Ying Wang, Yongxin Zhao, Yihan Tao, Chen Cheng, Jing Yang, Xiayang Lu, Chen Hao, Tianxin Wang, et al.
Nucleic Acids Research; doi:10.1093/nar/gkz894

Abstract:MicroRNAs (miRNAs) are small non-coding RNA molecules that function as diverse endogenous gene regulators at the post-transcriptional level. In the past two decades, as research effort on miRNA identification, function and evolution has soared, so has the demand for miRNA databases. However, the current plant miRNA databases suffer from several typical drawbacks, including a lack of entries for many important species, uneven annotation standards across different species, abundant questionable entries, and limited annotation. To address these issues, we developed a knowledge-based database called Plant miRNA Encyclopedia (PmiREN), which is based on uniform processing of sequenced small RNA libraries using miRDeep-P2, followed by manual curation using newly updated plant miRNA identification criteria, and comprehensive annotation. PmiREN currently contains 16,422 high confidence novel miRNA loci in 88 plant species and 3,966 retrieved from miRBase. For every miRNA entry, information on precursor sequence, precursor secondary structure, expression pattern, clusters and synteny in the genome, potential targets supported by Parallel Analysis of RNA Ends (PARE) sequencing, and references is attached whenever possible. PmiREN is hierarchically accessible and has eight built-in search engines. We believe PmiREN is useful for plant miRNA cataloguing and data mining, and is therefore a resource for data-driven miRNA research in plants.
Flore Beurton, Przemyslaw Stempor, Matthieu Caron, Alex Appert, Yan Dong, Ron A-J Chen, David Cluet, Yohann Couté, Marion Herbette, Ni Huang, et al.
Nucleic Acids Research; doi:10.1093/nar/gkz880

Abstract:The CFP1 CXXC zinc finger protein targets the SET1/COMPASS complex to non-methylated CpG rich promoters to implement tri-methylation of histone H3 Lys4 (H3K4me3). Although H3K4me3 is widely associated with gene expression, the effects of CFP1 loss vary, suggesting additional chromatin factors contribute to context dependent effects. Using a proteomics approach, we identified CFP1 associated proteins and an unexpected direct link between Caenorhabditis elegans CFP-1 and an Rpd3/Sin3 small (SIN3S) histone deacetylase complex. Supporting a functional connection, we find that mutants of COMPASS and SIN3 complex components genetically interact and have similar phenotypic defects including misregulation of common genes. CFP-1 directly binds SIN-3 through a region including the conserved PAH1 domain and recruits SIN-3 and the HDA-1/HDAC subunit to H3K4me3 enriched promoters. Our results reveal a novel role for CFP-1 in mediating interaction between SET1/COMPASS and a Sin3S HDAC complex at promoters.
William D Baez, Bappaditya Roy, Zakkary A McNutt, Elan A Shatoff, Shicheng Chen, Ralf Bundschuh, Kurt Fredrick
Nucleic Acids Research; doi:10.1093/nar/gkz855

Abstract:In all cells, initiation of translation is tuned by intrinsic features of the mRNA. Here, we analyze translation in Flavobacterium johnsoniae, a representative of the Bacteroidetes. Members of this phylum naturally lack Shine–Dalgarno (SD) sequences in their mRNA, and yet their ribosomes retain the conserved anti-SD sequence. Translation initiation is tuned by mRNA secondary structure and by the identities of several key nucleotides upstream of the start codon. Positive determinants include adenine at position –3, reminiscent of the Kozak sequence of Eukarya. Comparative analysis of Escherichia coli reveals use of the same Kozak-like sequence to enhance initiation, suggesting an ancient and widespread mechanism. Elimination of contacts between A-3 and the conserved β-hairpin of ribosomal protein uS7 fails to diminish the contribution of A-3 to initiation, suggesting an indirect mode of recognition. Also, we find that, in the Bacteroidetes, the trinucleotide AUG is underrepresented in the vicinity of the start codon, which presumably helps compensate for the absence of SD sequences in these organisms.
Wubin Ding, Jiwei Chen, Guoshuang Feng, Geng Chen, Jun Wu, Yongli Guo, Xin Ni, Tieliu Shi
Nucleic Acids Research; doi:10.1093/nar/gkz830

Abstract:Aberrant DNA methylation plays an important role in cancer progression. However, no resource has been available that comprehensively provides DNA methylation-based diagnostic and prognostic models, expression–methylation quantitative trait loci (emQTL), pathway activity-methylation quantitative trait loci (pathway-meQTL), differentially variable and differentially methylated CpGs, and survival analysis, as well as functional epigenetic modules for different cancers. These provide valuable information for researchers to explore DNA methylation profiles from different aspects in cancer. To this end, we constructed a user-friendly database named DNA Methylation Interactive Visualization Database (DNMIVD), which comprehensively provides the following important resources: (i) diagnostic and prognostic models based on DNA methylation for multiple cancer types of The Cancer Genome Atlas (TCGA); (ii) meQTL, emQTL and pathway-meQTL for diverse cancers; (iii) Functional Epigenetic Modules (FEM) constructed from Protein-Protein Interactions (PPI) and Co-Occurrence and Mutual Exclusive (COME) network by integrating DNA methylation and gene expression data of TCGA cancers; (iv) differentially variable and differentially methylated CpGs and differentially methylated genes as well as related enhancer information; (v) correlations between methylation of gene promoter and corresponding gene expression; and (vi) patient survival-associated CpGs and genes with different endpoints. DNMIVD is freely available. We believe that DNMIVD can facilitate research of diverse cancers.
Marta Iannuccelli, Elisa Micarelli, Prisca Lo Surdo, Alessandro Palma, Livia Perfetto, Ilaria Rozzo, Luisa Castagnoli, Luana Licata, Gianni Cesareni
Nucleic Acids Research; doi:10.1093/nar/gkz871

Abstract:CancerGeneNet is a resource that links genes that are frequently mutated in cancers to cancer phenotypes. The resource takes advantage of a curation effort aimed at embedding a large fraction of the gene products that are found altered in cancer cells into a network of causal protein relationships. Graph algorithms, in turn, allow the inference of likely paths of causal interactions linking cancer-associated genes to cancer phenotypes, thus offering a rational framework for the design of strategies to revert disease phenotypes. CancerGeneNet bridges two interaction layers by connecting proteins whose activities are affected by cancer drivers to proteins that impact on the ‘hallmarks of cancer’. In addition, CancerGeneNet annotates curated pathways that are relevant to rationalize the pathological consequences of cancer driver mutations in selected common cancers and ‘MiniPathways’ illustrating regulatory circuits that are frequently altered in different cancers.