An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning
Open Access
- 8 May 2020
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Nanobioscience
- Vol. 19 (3), 506-517
- https://doi.org/10.1109/tnb.2020.2991302
Abstract
Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are independent and identically distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling ("SERES") framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in topological accuracy. We further evaluate method performance using empirical HIV genome sequence datasets.
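The abstract contrasts the standard bootstrap, which draws alignment columns i.i.d. with replacement, with SERES random walks over the input sequences. The Python sketch below illustrates that contrast for an aligned input under simplifying assumptions: the walk starts at a uniformly random column, moves one column per step, reverses direction with a fixed probability `reverse_prob`, and reflects at the alignment ends. The function names and the `reverse_prob` parameter are hypothetical, and this is not the published SERES procedure, whose exact rules for seeding and reversing walks are given in the original SERES paper.

```python
import random
from typing import List, Sequence


def bootstrap_columns(n_cols: int, rng: random.Random) -> List[int]:
    """Standard bootstrap: draw n_cols column indices i.i.d. with replacement."""
    return [rng.randrange(n_cols) for _ in range(n_cols)]


def random_walk_columns(n_cols: int, reverse_prob: float,
                        rng: random.Random) -> List[int]:
    """Hypothetical SERES-style random walk over alignment columns.

    Assumptions (illustrative only): start at a random column, step to an
    adjacent column each time, reverse direction with probability
    reverse_prob, and reflect off the first and last columns.
    """
    if n_cols <= 0:
        return []
    pos = rng.randrange(n_cols)
    step = rng.choice((-1, 1))
    out = [pos]
    while len(out) < n_cols:
        if rng.random() < reverse_prob:
            step = -step  # spontaneous reversal
        nxt = pos + step
        if nxt < 0 or nxt >= n_cols:
            step = -step  # reflect at an alignment end
            nxt = pos + step
        pos = nxt
        out.append(pos)
    return out


def resample_alignment(alignment: Sequence[str],
                       columns: Sequence[int]) -> List[str]:
    """Build a pseudo-replicate alignment by reading columns in the given order."""
    return ["".join(row[c] for c in columns) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    aln = ["ACGTACGT", "ACGTTCGA", "TCGTACGA"]  # toy alignment, 3 taxa x 8 sites
    walk = random_walk_columns(len(aln[0]), reverse_prob=0.1, rng=rng)
    boot = bootstrap_columns(len(aln[0]), rng)
    print("walk columns:", walk, resample_alignment(aln, walk))
    print("bootstrap columns:", boot, resample_alignment(aln, boot))
```

Because every step moves to an adjacent column, neighbouring sites in a walk replicate were also neighbours in the input, so short-range dependence (for example, sites sharing a local genealogy between recombination breakpoints) is largely kept, whereas the bootstrap's i.i.d. draws scatter those sites apart.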
Funding Information
- U.S. National Science Foundation (CCF-1565719, CCF-1714417)
- U.S. National Science Foundation (DEB-1737898)
- U.S. National Science Foundation (IOS-1740874)
- Michigan State University
References (10 indexed):
- Non-parametric and Semi-parametric Support Estimation Using SEquential RESampling Random Walks on Biomolecular Sequences. Springer Science and Business Media LLC, 2018.
- An HMM-Based Comparative Genomic Framework for Detecting Introgression in Eukaryotes. PLoS Computational Biology, 2014.
- Estimating Divergence Time and Ancestral Effective Population Size of Bornean and Sumatran Orangutan Subspecies Using a Coalescent Hidden Markov Model. PLoS Genetics, 2011.
- Accurate Detection of Recombinant Breakpoints in Whole-Genome Alignments. PLoS Computational Biology, 2009.
- Near Full-Length Sequence Analysis of a Unique CRF01_AE/B Recombinant from Kuala Lumpur, Malaysia. AIDS Research and Human Retroviruses, 2007.
- Recombination rate estimation in the presence of hotspots. Genome Research, 2007.
- Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments. Molecular Biology and Evolution, 2007.
- A Fine-Scale Map of Recombination Rates and Hotspots Across the Human Genome. Science, 2005.
- Approximating the coalescent with recombination. Philosophical Transactions B, 2005.
- Estimating recombination rates from population-genetic data. Nature Reviews Genetics, 2003.