Next-generation genomics: an integrative approach

Abstract
The integration of transcriptomic, genetic, genomic, epigenetic and network interaction data is crucial for a unified view of biological processes and for advancing our understanding of human disease and biology. The genome sequence is a scaffold on which known annotations and experimental data can be assembled, and it is useful to view these different levels of information together in a genome browser. Data integration can be used to identify functional elements in the genome, explore the function of genetic variation and improve understanding of gene regulation. With minimal parameters, unsupervised learning techniques can be applied to large multidimensional data sets to identify frequently occurring patterns and thereby suggest hypotheses. Carefully designed computational experiments for supervised integration can be used to test hypotheses on a global scale. Other supervised approaches, such as Bayesian networks, can also generate hypotheses about function. A range of online and stand-alone tools is available to bench scientists for tackling large-scale data sets. Several analytical hurdles remain and are being addressed by bioinformaticians.