PaCRISPR: a server for predicting and visualizing anti-CRISPR proteins
Open Access
- 27 May 2020
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 48 (W1), W348-W357
- https://doi.org/10.1093/nar/gkaa432
Abstract
Anti-CRISPRs are widespread amongst bacteriophage and promote bacteriophage infection by inactivating the bacterial host's CRISPR-Cas defence system. Identifying and characterizing anti-CRISPR proteins opens an avenue to explore and control CRISPR-Cas machineries for the development of new CRISPR-Cas based biotechnological and therapeutic tools. Past studies have identified anti-CRISPRs in several model phage genomes, but a challenge exists to comprehensively screen for anti-CRISPRs accurately and efficiently from genome and metagenome sequence data. Here, we have developed an ensemble learning based predictor, PaCRISPR, to accurately identify anti-CRISPRs from protein datasets derived from genome and metagenome sequencing projects. PaCRISPR employs different types of feature recognition united within an ensemble framework. Extensive cross-validation and independent tests show that PaCRISPR achieves a significantly more accurate performance compared with homology-based baseline predictors and an existing toolkit. The performance of PaCRISPR was further validated in discovering anti-CRISPRs that were not part of the training for PaCRISPR, but which were recently demonstrated to function as anti-CRISPRs for phage infections. Data visualization on anti-CRISPR relationships, highlighting sequence similarity and phylogenetic considerations, is part of the output from the PaCRISPR toolkit, which is freely available at http://pacrispr.erc.monash.edu/.
Funding Information
- National Health and Medical Research Council (1092262)
- National Natural Science Foundation of China (61862017)