Subjective Verification of Numerical Models as a Component of a Broader Interaction between Research and Operations

Abstract
Systematic subjective verification of precipitation forecasts from two numerical models is presented and discussed. The subjective verification effort was carried out as part of the 2001 Spring Program, a seven-week collaborative experiment conducted at the NOAA/National Severe Storms Laboratory (NSSL) and the NWS/Storm Prediction Center, with participation from the NCEP/Environmental Modeling Center, the NOAA/Forecast Systems Laboratory, the Norman, Oklahoma, National Weather Service Forecast Office, and Iowa State University. This paper focuses on a comparison of the operational Eta Model and an experimental version of this model run at NSSL; results are limited to precipitation forecasts, although other models and model output fields were verified and evaluated during the program. By comparing forecaster confidence in model solutions to next-day assessments of model performance, this study yields unique information about the utility of models for human forecasters. It is shown that, when averaged over many forecasts, subjective verification ratings of model performance were consistent with preevent confidence levels. In particular, models that earned higher average confidence ratings were also assigned higher average subjective verification scores. However, confidence and verification scores for individual forecasts were very poorly correlated; that is, forecast teams showed little skill in assessing how “good” individual model forecasts would be. Furthermore, the teams were unable to choose reliably which model, or which initialization of the same model, would produce the “best” forecast for a given period. The subjective verification methodology used in the 2001 Spring Program is presented as a prototype for more refined and focused subjective verification efforts in the future. The results demonstrate that this approach can provide valuable insight into how forecasters use numerical models. It has great potential as a complement to objective verification scores and can have a significant positive impact on model development strategies.
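
The distinction drawn above, between aggregate consistency and per-forecast correlation, can be made concrete with a small numerical sketch. The following Python snippet uses entirely hypothetical confidence and verification ratings (not data from the 2001 Spring Program; the 0-10 rating scale, sample size, and all generated values are assumptions for illustration) to show how one model can earn both a higher mean confidence rating and a higher mean verification score while the per-forecast correlation between confidence and verification remains near zero.

```python
# Illustrative sketch only: hypothetical confidence/verification ratings
# showing how aggregate agreement can coexist with weak per-forecast
# correlation. All values below are invented for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 30  # hypothetical number of forecast periods

# Hypothetical pre-event confidence ratings (assumed 0-10 scale) for two
# models, with model A rated somewhat higher on average than model B.
conf_a = np.clip(rng.normal(6.5, 1.0, n), 0, 10)
conf_b = np.clip(rng.normal(5.5, 1.0, n), 0, 10)

# Hypothetical next-day subjective verification scores, drawn independently
# of the confidence ratings to mimic the finding that forecast teams showed
# little skill in anticipating the quality of individual model forecasts.
verif_a = np.clip(rng.normal(6.5, 1.5, n), 0, 10)
verif_b = np.clip(rng.normal(5.5, 1.5, n), 0, 10)

# Aggregate view: the model with higher mean confidence also earns a
# higher mean verification score, consistent over many forecasts.
print(f"Model A: mean confidence {conf_a.mean():.2f}, "
      f"mean verification {verif_a.mean():.2f}")
print(f"Model B: mean confidence {conf_b.mean():.2f}, "
      f"mean verification {verif_b.mean():.2f}")

# Per-forecast view: correlation between confidence and verification,
# which stays near zero despite the aggregate agreement.
r_a = np.corrcoef(conf_a, verif_a)[0, 1]
r_b = np.corrcoef(conf_b, verif_b)[0, 1]
print(f"Per-forecast correlation: model A r = {r_a:.2f}, model B r = {r_b:.2f}")
```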