Bioinformatics services for analyzing massive genomic datasets

Open Access

31 March 2020

journal article
Published by Korea Genome Organization in Genomics & Informatics

Vol. 18 (1), e8
https://doi.org/10.5808/gi.2020.18.1.e8

Abstract

The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

Funding Information

National Research Foundation of Korea (2014M3C9A3064552, 2014M3C9A3065221, 2014M3C9A3064548, 2014M3C9A3068554, 2014M3C9A3068822, 2019M3C9A5069653)
