MLTreeMap - accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies

Abstract
Background: Shotgun sequencing of environmental DNA is an essential technique for characterizing uncultivated microbes in situ. However, the taxonomic and functional assignment of the obtained sequence fragments remains a pressing problem. Results: Existing algorithms are largely optimized for speed and coverage; in contrast, we present here a software framework that focuses on a restricted set of informative gene families, using Maximum Likelihood to assign these with the best possible accuracy. This framework ('MLTreeMap'; http://mltreemap.org/) uses raw nucleotide sequences as input, and includes hand-curated, extensible reference information. Conclusions: We discuss how we validated our pipeline using complete genomes as well as simulated and actual environmental sequences.