Deep learning model for metagenome fragment classification using spaced k-mers feature extraction

Open Access

Abstract

An open challenge in bioinformatics is the analysis of sequenced metagenomes from various environments. Several studies have demonstrated bacterial classification at the genus level using k-mers for feature extraction, where higher values of k give better accuracy but are costly in computational resources and time. In this research, the spaced k-mers method was used to extract sequence features with the pattern 111 1111 10001, where 1 marks a position that must match and 0 marks a position that may or may not match. Deep learning currently provides the best solutions to many problems in image recognition, speech recognition, and natural language processing. Two deep learning architectures, a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN), were trained for the taxonomic classification of metagenome data, with spaced k-mers used for feature extraction. The DNN classifier reached 90.89% accuracy and the CNN classifier reached 88.89% accuracy at the genus level.
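
The spaced k-mers idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `spaced_kmer_counts` and `feature_vector`, the normalization to relative frequencies, and the short mask `1101` used in the usage note are assumptions chosen for illustration. A mask is a string of 1s and 0s; bases under a 1 are kept and bases under a 0 are skipped.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(seq, mask):
    """Count spaced k-mers in seq: keep bases under '1', skip bases under '0'."""
    keep = [i for i, m in enumerate(mask) if m == "1"]
    counts = Counter()
    # Slide a window of len(mask) over the sequence, one base at a time.
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        key = "".join(window[i] for i in keep)
        # Ignore windows containing ambiguous bases such as N (an assumption).
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def feature_vector(seq, mask):
    """Relative frequency of every possible spaced k-mer, in a fixed order."""
    weight = mask.count("1")  # number of positions that must match
    counts = spaced_kmer_counts(seq, mask)
    total = sum(counts.values()) or 1  # avoid division by zero on short reads
    return [counts["".join(p)] / total for p in product("ACGT", repeat=weight)]
```

For example, `spaced_kmer_counts("ACGTAC", "1101")` reads the windows ACGT, CGTA, and GTAC and records ACT, CGA, and GTC once each, and `feature_vector("ACGTAC", "1101")` has 4^3 = 64 entries. The feature length depends only on the number of 1s in the mask, which is why a spaced pattern can span a longer window than a contiguous k-mer with the same feature dimension.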

Funding Information

Institut Pertanian Bogor
