rtREV: An Amino Acid Substitution Matrix for Inference of Retrovirus and Reverse Transcriptase Phylogeny

Abstract
Retroviral and other reverse transcriptase (RT)-containing sequences may be subject to unique evolutionary pressures, so models of molecular sequence evolution developed using other kinds of sequences may not be optimal for them. Here we develop and present a new amino acid substitution matrix for maximum likelihood (ML) phylogenetic analysis, optimized on a dataset of 33 amino acid sequences from the retroviral Pol proteins. When compared to other matrices, this model (rtREV) yields higher log-likelihood values on a range of datasets including lentiviruses, spumaviruses, betaretroviruses, gammaretroviruses, and other elements containing reverse transcriptase. We provide evidence that rtREV is a more realistic evolutionary model for analyses of the pol gene, although it is inapplicable to analyses involving the gag gene.