Impact of Ensemble Size on Ensemble Prediction

Abstract
The impact of ensemble size on the performance of the European Centre for Medium-Range Weather Forecasts ensemble prediction system (EPS) is analyzed. The skill of ensembles generated using 2, 4, 8, 16, and 32 perturbed ensemble members is compared over a period of 45 days, from 1 October to 15 November 1996. For each ensemble configuration, the skill is compared with the potential skill, measured by randomly choosing one of the 32 ensemble members as verification (idealized ensemble). Results are based on forecasts of the 500-hPa geopotential height field. Various measures of performance are applied: skill of the ensemble mean, spread–skill relationship, skill of the most accurate ensemble member, Brier score, ranked probability score, relative operating characteristic, and the outlier statistic. The relation between ensemble spread and control error is studied using the L2, L8, and L∞ norms to measure distances between ensemble members and the control forecast or the verification. It is argued that the supremum (L∞) norm is a more suitable measure of distance, given the strategy of constructing ensemble perturbations from rapidly growing singular vectors. Results indicate that, for the supremum norm, any increase of ensemble size within the range considered in this paper is strongly beneficial. With the smaller ensemble sizes, ensemble spread does not provide a reliable bound on control error in many cases. By contrast, with 32 members, spread provides a bound on control error in nearly all cases, and further improvement could be anticipated with still larger ensembles. On the other hand, the spread–skill relationship was not consistently improved by larger ensembles when the L2 norm was used. The overall conclusion is that the extent to which an increase of ensemble size (particularly from 8 to 16, and from 16 to 32 members) improves EPS performance is strongly dependent on the measure used to assess performance.
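The norm dependence of the spread versus control-error comparison can be illustrated with a minimal sketch. It assumes that each field is a flat list of grid-point values, that distances use an unweighted, grid-mean-normalized Lp norm (no latitude weighting), and that "spread" means the largest distance of any member from the control. These definitions, and the names `lp_distance` and `spread_bounds_error`, are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Grid-mean-normalized Lp distance between two fields (assumed definition).

    Passing p=math.inf gives the supremum norm, i.e. the maximum absolute
    difference over all grid points.
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same number of grid points")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return (sum(d ** p for d in diffs) / len(diffs)) ** (1.0 / p)


def spread_bounds_error(
    members: Sequence[Sequence[float]],
    control: Sequence[float],
    verification: Sequence[float],
    p: float,
) -> bool:
    """True if the largest member-to-control distance is at least the control error.

    Spread is assumed here to be the maximum distance of any member from the
    control; control error is the distance from the control to the verification.
    """
    spread = max(lp_distance(m, control, p) for m in members)
    error = lp_distance(verification, control, p)
    return spread >= error


if __name__ == "__main__":
    # Toy three-point "fields": one member perturbs a single point strongly,
    # while the verification differs moderately at two points.
    members = [[0.0, 1.0, 0.0], [0.0, -2.0, 0.0]]
    control = [0.0, 0.0, 0.0]
    verification = [1.5, 1.5, 0.0]
    for p in (2, 8, math.inf):
        print(p, spread_bounds_error(members, control, verification, p))
```

In this toy case the control error exceeds the spread in the L2 norm but not in the L8 or supremum norms, so the script prints `False` for `p = 2` and `True` for the other two. The L2 norm averages a localized large perturbation over the whole domain, whereas the supremum norm retains its full amplitude.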
In addition to the spread–skill relationship, the measures most sensitive to ensemble size are shown to be the skill of the best ensemble member (particularly when evaluated on a point-wise basis) and the outlier statistic.
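The two measures named above can be sketched in the same style. This sketch assumes two things. The outlier statistic is the fraction of grid points at which the verification lies outside the range spanned by the ensemble members. The point-wise best member is chosen independently at each grid point. Both definitions and all function names are illustrative assumptions. For a statistically consistent ensemble of N members, the verification is equally likely to fall in any of the N + 1 intervals defined by the ordered members. The expected outlier fraction is therefore 2/(N + 1).

```python
from typing import Sequence


def outlier_fraction(
    members: Sequence[Sequence[float]], verification: Sequence[float]
) -> float:
    """Fraction of grid points where the verification falls outside the ensemble range."""
    outside = 0
    for i, v in enumerate(verification):
        values = [m[i] for m in members]
        if v < min(values) or v > max(values):
            outside += 1
    return outside / len(verification)


def expected_outlier_fraction(n_members: int) -> float:
    """Expected outlier fraction 2/(N+1) for a statistically consistent N-member ensemble."""
    return 2.0 / (n_members + 1)


def pointwise_best_member_error(
    members: Sequence[Sequence[float]], verification: Sequence[float]
) -> float:
    """Mean over grid points of the smallest absolute member error at each point.

    The "best" member is picked separately at every grid point (assumed definition).
    """
    total = 0.0
    for i, v in enumerate(verification):
        total += min(abs(m[i] - v) for m in members)
    return total / len(verification)


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 32):
        print(n, round(expected_outlier_fraction(n), 3))
```

For N = 2, 4, 8, 16, and 32 the expected outlier fraction is roughly 0.67, 0.40, 0.22, 0.12, and 0.06. This is one way to see why the outlier statistic responds strongly to ensemble size. A point-wise minimum over more members can likewise only stay the same or decrease.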