An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning
- 1 November 2019
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Abstract
Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are identically and independently distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling (“SERES”) framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in terms of topological accuracy. We further evaluate method performance using an empirical HIV genome sequence dataset.
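To make the idea of resampling by random walk concrete, the following is a minimal sketch of a walk over the columns of an aligned sequence input. It is an illustration under stated assumptions, not the SERES algorithm as published: the function name `random_walk_resample`, the `reversal_prob` parameter, the uniform choice of starting column, and the reflection at the alignment boundaries are all hypothetical choices made for this example. Unlike the standard bootstrap, which draws columns independently, consecutive sampled columns here are always neighbors in the original alignment, so local dependence between sites is retained.

```python
import random


def random_walk_resample(alignment, reversal_prob=0.1, seed=None):
    """Resample alignment columns with a random walk (illustrative sketch).

    alignment     : list of equal-length strings, one row per sequence.
    reversal_prob : assumed per-step probability of reversing direction.
    Returns (resampled_alignment, visited_column_indices).
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one column")
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all rows must have the same length")

    rng = random.Random(seed)
    pos = rng.randrange(n_sites)      # assumed: uniform random start column
    direction = rng.choice((-1, 1))   # assumed: random initial direction
    visited = []

    # Walk until the replicate has as many columns as the original input.
    while len(visited) < n_sites:
        visited.append(pos)
        if rng.random() < reversal_prob:
            direction = -direction
        nxt = pos + direction
        if nxt < 0 or nxt >= n_sites:
            # Reflect off the ends of the alignment.
            direction = -direction
            nxt = pos + direction
        # Clamp covers the degenerate single-column case.
        pos = max(0, min(n_sites - 1, nxt))

    resampled = ["".join(row[i] for i in visited) for row in alignment]
    return resampled, visited


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGAACGA", "TCGTACCT"]  # toy data, not from the paper
    replicate, path = random_walk_resample(toy, reversal_prob=0.2, seed=1)
    print(path)
    for row in replicate:
        print(row)
```

In a resampling and re-estimation workflow of the kind the abstract describes, each replicate would be passed to the downstream estimator and the per-replicate results combined; that step depends on the estimator and is not sketched here.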