A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Open Access

3 August 2010

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 38 (17), e171
https://doi.org/10.1093/nar/gkq667

Abstract

Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to elucidating the biological events that govern the development and progression of human diseases, to exploring the mechanisms of survival, drug resistance and virulence in pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum), as well as for the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. This system is now used routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomic data sets from any organism.
