A metagenomic content and knowledge management ecosystem platform

Abstract
The reduced cost of DNA sequencing allows metagenomics to be applied on a larger scale. Metagenomic analysis gives better insight into supplement usage, methane production, and feed conversion efficiency in livestock systems. However, sequencing machines generate enormous amounts of complex data. Conventional methods used in the analysis of genomic data involve pre-processing and synchronous reconstruction by multiple systems, which is time-consuming and prone to failure. Furthermore, sequencing datasets and analysis results need to be organized and stored properly so that scientists can search and access them. To tackle these challenges, a new workflow for metagenomic analysis with improved infrastructure is needed. The MetaPlat project supports experts in both academic and non-academic sectors in dealing with challenges in the field of metagenomics by focusing on improved hardware and software platforms. High-performance, fault-tolerant, flexible, and scalable processors and analysis systems will help to increase the effectiveness and efficiency of current metagenomics studies. In this paper, we propose such an infrastructure, applying emerging technologies such as Kafka, Docker, and Hadoop. Details of the infrastructure solution and some preliminary results are also discussed.