Abstract
A new statistical method called the phylogenetic regression is proposed that applies multiple regression techniques to cross-species data. It allows continuous and categorical variables to be tested for and controlled for. The new method is valid despite the problem that phylogenetically close species tend to be similar, and is designed to be used when information about the phylogeny is incomplete. Information about the phylogeny of the species is assumed to be available in the form of a working phylogeny, which contains multiple nodes representing ignorance about the order of splitting of taxa. The non-independence between species is divided into that due to recognized phylogeny, that is, to phylogenetic associations represented in the working phylogeny; and that due to unrecognized phylogeny. The new method uses one linear contrast for each higher node in the working phylogeny, thus applying the ‘radiation principle’. For binary phylogenies the method is similar to an existing method. A criterion is suggested in the form of a simulation test for deciding on the acceptability of proposed statistical methods for analysing cross-species data with a continuous y-variable. This criterion is applied to the phylogenetic regression and to some other methods. The phylogenetic regression passes this test; the other methods tested fail it. Arbitrary choices have to be made about the covariance structure of the error in order to implement the method. It is argued that error results from omitted but relevant variables, and the implications for those arbitrary choices are discussed. One conclusion is that the dates of splits between taxa, even supplemented by rates of neutral gene evolution, do not provide the ‘true’ covariance structure. A pragmatic approach is adopted. Several analytical results about the phylogenetic regression are given, without proof, in a mathematical appendix.
A computer program has been written in GLIM to implement the phylogenetic regression, and readers are informed how to obtain a copy.