Abstract
Null hypothesis significance tests are commonly used to provide a link between empirical evidence and theoretical interpretation. However, this strategy is prone to the "p-value fallacy," in which effects and interactions are classified as either "noise" or "real" depending on whether the associated p value is greater or less than .05. This dichotomous classification can lead to dramatic misconstruals of the evidence provided by an experiment. For example, similar patterns of means can yield entirely different patterns of significance, and the same pattern of significance can arise from completely different patterns of means. Describing data as an inventory of significant and nonsignificant effects can thus completely misrepresent the results. An alternative analytical technique is to identify competing interpretations of the data and then use likelihood ratios to assess which interpretation provides the better account. Several different methods of calculating such likelihood ratios are illustrated. It is argued that this approach satisfies a principle of "graded evidence," according to which similar data should provide similar evidence.
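For concreteness, the following is a minimal sketch, not drawn from the article itself, of one common way a likelihood ratio can be computed for two competing interpretations of the same data: a "no effect" model with a single mean versus an "effect" model with separate condition means, assuming normal errors with variance estimated by maximum likelihood. The data values and variable names are hypothetical and purely illustrative.

```python
import numpy as np

# Hypothetical scores in two conditions (illustrative values only).
group_a = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])
group_b = np.array([13.0, 14.2, 13.6, 14.8, 13.9, 14.4])
y = np.concatenate([group_a, group_b])
n = len(y)

# Interpretation 1 ("no effect"): one common mean for all observations.
sse_null = np.sum((y - y.mean()) ** 2)

# Interpretation 2 ("effect"): each condition has its own mean.
sse_effect = (np.sum((group_a - group_a.mean()) ** 2)
              + np.sum((group_b - group_b.mean()) ** 2))

# For normal-error models with the variance set to its maximum-likelihood
# estimate, the maximized likelihood ratio reduces to a power of the ratio
# of residual sums of squares.
likelihood_ratio = (sse_null / sse_effect) ** (n / 2)
print(f"Likelihood ratio favoring the 'effect' interpretation: {likelihood_ratio:.2f}")
```

A ratio well above 1 favors the interpretation that the conditions differ; a ratio near 1 indicates the data discriminate only weakly between the two accounts, which is the kind of graded statement a dichotomous significance decision cannot express.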