Abstract
Cavender (1978) introduced a model of an evolutionary branching process on a sequence of characters, in which each character takes one of two states with symmetric probabilities of change between them. Given an evolutionary tree T and a character change probability p_e for each edge e of T, we show how to derive from this model some properties of the resulting sequences and distance measures between pairs of taxa. These can be used to test the effectiveness of current algorithms for recovering T, such as parsimony or distance methods. The relationships are described in terms of two matrices of exponential order. Although these matrices are large, they have simple inverses, so the initial conditions of the model can be recovered from the predicted properties. Assuming that some observed data are good approximations to the theoretical data derived from Cavender's model on a tree T, we can use a least squares fit to estimate T, together with the probabilities p_e. We refer to the inferred tree, with its probabilities p_e, as the “closest tree” for those data. This provides a new criterion for inferring evolutionary trees from sequence data, one that has a simple algorithm for its computation and overcomes some drawbacks of earlier criteria. These relationships are illustrated with hypothetical and real data, showing how the calculations are performed in the case of n = 4 taxa.
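
As a rough illustration of how pairwise quantities follow from the edge change probabilities, the sketch below uses a standard property of the symmetric two-state model: the two ends of a path differ in state exactly when an odd number of changes occurs along it, which happens with probability (1 - Π(1 - 2p)) / 2, the product running over the change probabilities p of the edges on the path. This is not the matrix formulation described in the abstract. The tree shape ((1,2),(3,4)), the edge names e1 to e5, and all numerical values are hypothetical, chosen only to show the n = 4 case.

```python
from itertools import product
from math import prod


def path_difference_probability(edge_probs):
    """Probability that the two ends of a path differ in state.

    Assumes independent symmetric two-state changes on each edge, with
    edge_probs holding the change probability of every edge on the path.
    """
    return 0.5 * (1.0 - prod(1.0 - 2.0 * p for p in edge_probs))


def brute_force_difference_probability(edge_probs):
    """Same quantity by summing over every pattern with an odd number of changes."""
    total = 0.0
    for flips in product([0, 1], repeat=len(edge_probs)):
        if sum(flips) % 2 == 1:
            total += prod(p if f else 1.0 - p for p, f in zip(edge_probs, flips))
    return total


# Hypothetical 4-taxon tree ((1,2),(3,4)): e1..e4 are the pendant edges
# leading to taxa 1..4, and e5 is the single internal edge.
EDGE_PROBS = {"e1": 0.10, "e2": 0.05, "e3": 0.20, "e4": 0.15, "e5": 0.02}

# Edges on the path between each pair of taxa in that tree.
PATHS = {
    (1, 2): ["e1", "e2"],
    (1, 3): ["e1", "e5", "e3"],
    (1, 4): ["e1", "e5", "e4"],
    (2, 3): ["e2", "e5", "e3"],
    (2, 4): ["e2", "e5", "e4"],
    (3, 4): ["e3", "e4"],
}


def pairwise_difference_probabilities(edge_probs, paths):
    """Expected proportion of differing characters for each pair of taxa."""
    return {
        pair: path_difference_probability([edge_probs[e] for e in edges])
        for pair, edges in paths.items()
    }


if __name__ == "__main__":
    for pair, d in pairwise_difference_probabilities(EDGE_PROBS, PATHS).items():
        print(pair, round(d, 4))
```

The brute-force function is included only as a cross-check on the closed form. With these hypothetical values, pairs separated by the internal edge e5 have a slightly larger expected difference than the closed form gives for the same pendant edges alone, and this is the kind of signal that distance methods rely on.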