Abstract
The performance of ensemble prediction systems (EPSs) is investigated by examining the probability distribution of 500-hPa geopotential height over Europe. The probability score (or half Brier score) is used to evaluate the quality of probabilistic forecasts of a single binary event. The skill of an EPS is assessed by comparing its performance, in terms of the probability score, to the performance of a reference probabilistic forecast. The reference forecast is based on the control forecast of the system under consideration, using model error statistics to estimate a probability distribution. A decomposition of the skill score is applied in order to distinguish between the two main aspects of the forecast performance: reliability and resolution. The contribution of the ensemble mean and the ensemble spread to the performance of an EPS is evaluated by comparing the skill score to the skill score of a probabilistic forecast based on the EPS mean, using model error statistics to estimate a probability distribution.
The performance of the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS is reviewed. The system is skillful (with respect to the reference forecast) from +96 h onward. There is some skill from +48 h in terms of reliability. The performance comes mainly from the contribution of the ensemble mean. The contribution of the ensemble spread is slightly negative, but becomes positive after a calibration of the EPS standard deviation. The calibration predominantly improves the reliability contribution to the skill score. The calibrated EPS is skillful from +72 h onward.
The impact of ensemble size on the performance of an EPS is also investigated. The skill score of the ECMWF EPS decreases steadily as the number of ensemble members is reduced, and the resolution is particularly affected. The impact is mainly due to the ensemble spread contributing negatively to the skill. The ensemble mean contribution to the skill decreases only marginally when the ensemble size is reduced to as few as 11 members.
The performance of the U.S. National Centers for Environmental Prediction (NCEP) EPS is also reviewed. The NCEP EPS has a lower skill score (versus a reference forecast based on its control forecast) than the ECMWF EPS, especially in terms of reliability. This is mainly due to the smaller spread of the NCEP EPS contributing negatively to the skill. On the other hand, the NCEP and ECMWF ensemble means contribute similarly to the skill. As a consequence, the performance of the two systems in terms of resolution is comparable.
The performance of a poor man’s EPS, consisting of the forecasts of different numerical weather prediction (NWP) centers, is discussed. The poor man’s EPS is more skillful than either the ECMWF EPS or the NCEP EPS up to +144 h, despite a negative contribution of the spread to the skill score. The higher skill of the poor man’s EPS is mainly due to better resolution.
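The verification quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation, and several choices are assumptions made only for illustration: the EPS probability is taken as the fraction of members exceeding a threshold, the reference probability is taken from a Gaussian centered on the control forecast with a standard deviation given by the model error statistics, and the reliability and resolution terms follow the standard Murphy decomposition of the Brier score, which may differ in detail from the skill score decomposition used here. All function names are hypothetical.

```python
import math
from collections import defaultdict


def half_brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_score(score, reference_score):
    """Relative improvement over the reference: 1 is perfect, 0 is no better."""
    return 1.0 - score / reference_score


def ensemble_probability(members, threshold):
    """Assumed estimator: fraction of ensemble members exceeding the threshold."""
    return sum(m > threshold for m in members) / len(members)


def gaussian_exceedance_probability(center, error_std, threshold):
    """Assumed reference: P(X > threshold) for X ~ Normal(center, error_std)."""
    z = (threshold - center) / (error_std * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))


def murphy_decomposition(probs, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by distinct probability."""
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    return reliability, resolution, base_rate * (1.0 - base_rate)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    threshold = 5600.0  # hypothetical 500-hPa height threshold (m)
    members = [5580.0, 5610.0, 5625.0, 5590.0, 5640.0]
    p_eps = ensemble_probability(members, threshold)
    p_ref = gaussian_exceedance_probability(5595.0, 40.0, threshold)
    print(p_eps, p_ref)
```

With forecasts grouped by distinct probability value, the half Brier score equals reliability minus resolution plus uncertainty, so a lower reliability term and a higher resolution term both improve the score. Replacing the control forecast with the ensemble mean as the Gaussian center gives the kind of mean-based forecast the abstract uses to separate the contributions of ensemble mean and ensemble spread.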