Decompositions of Proper Scores

Preprint
Abstract
Scoring rules are an important tool for evaluating the performance of probabilistic forecasts. A popular example is the Brier score, which can be decomposed into terms related to the sharpness (or information content) and to the reliability of the forecast. This feature makes the Brier score a very intuitive measure of forecast quality. In this paper, it is demonstrated that all strictly proper scoring rules allow for a similar decomposition into reliability- and sharpness-related terms. This finding underpins the importance of proper scores and lends further credence to the practice of measuring forecast quality by proper scores. Furthermore, the effect of averaging multiple probabilistic forecasts on the score is discussed. It is well known that the Brier score of a mixture of several forecasts is never worse than the average score of the individual forecasts. This property hinges on the convexity of the Brier score, a property not universal among proper scores. Arguably, this phenomenon raises epistemological questions which require clarification.
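
As a numerical illustration of the two Brier score properties mentioned in the abstract, the following minimal Python sketch may be helpful. It assumes binary outcomes coded as 0 or 1 and forecasts that take only a few distinct values, and it uses the classical three-term form (reliability, resolution, uncertainty); whether this matches the exact decomposition and terminology used in the paper is an assumption. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    n = len(forecasts)
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / n


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Assumption: instances are grouped by identical forecast value, in which
    case score = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the outcomes observed for each distinct forecast value.
    groups = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        groups[p].append(y)

    reliability = 0.0
    resolution = 0.0
    for p, ys in groups.items():
        observed_freq = sum(ys) / len(ys)
        reliability += len(ys) * (p - observed_freq) ** 2
        resolution += len(ys) * (observed_freq - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Toy data (hypothetical): two forecasters issuing probabilities for five events.
outcomes = [0, 1, 1, 1, 0]
forecasts_a = [0.1, 0.1, 0.8, 0.8, 0.8]
forecasts_b = [0.3, 0.3, 0.3, 0.9, 0.9]

# Check the decomposition for forecaster A.
score = brier_score(forecasts_a, outcomes)
rel, res, unc = brier_decomposition(forecasts_a, outcomes)

# Equal-weight mixture of the two forecasts versus the average of their scores.
mixture = [0.5 * (a + b) for a, b in zip(forecasts_a, forecasts_b)]
mix_score = brier_score(mixture, outcomes)
avg_score = 0.5 * (brier_score(forecasts_a, outcomes) + brier_score(forecasts_b, outcomes))

print("score:", score, "rel - res + unc:", rel - res + unc)
print("mixture score:", mix_score, "average score:", avg_score)
```

The mixture inequality in this sketch follows from the convexity of the squared error in the forecast probability, so it holds for any weights that sum to one, not only for the equal weights used here. Since a lower Brier score is better, "never worse" corresponds to the mixture score not exceeding the average score.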