Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone, 2014-15
11 February 2019
Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ
PLoS Computational Biology (2019) 15(2):e1006785
Abstract
Real-time forecasts based on mathematical models can inform critical decision-making during infectious disease outbreaks. Yet epidemic forecasts are rarely evaluated during or after the event, and there is little guidance on the best metrics for assessment. Here, we propose an evaluation approach that disentangles different components of forecasting ability using metrics that separately assess the calibration, sharpness and bias of forecasts. This makes it possible to assess not just how close a forecast was to reality but also how well uncertainty was quantified. We used this approach to analyse the performance of weekly forecasts we generated in real time for Western Area, Sierra Leone, during the 2013–16 Ebola epidemic in West Africa. We investigated a range of forecast model variants based on the model fits generated at the time with a semi-mechanistic model, and found that good probabilistic calibration was achievable at short time horizons of one or two weeks ahead, but that model predictions were increasingly unreliable at longer forecasting horizons. This suggests that forecasts may have been of good enough quality to inform decision-making based on predictions a few weeks ahead of time but not longer, reflecting the high level of uncertainty in the processes driving the trajectory of the epidemic. Comparing forecasts based on the semi-mechanistic model to simpler null models showed that the best semi-mechanistic model variant performed better than the null models with respect to probabilistic calibration, and that this would have been identified from the earliest stages of the outbreak. As forecasts become a routine part of the toolkit in public health, standards for evaluating performance will be important for assessing quality and improving the credibility of mathematical models, and for elucidating difficulties and trade-offs when aiming to make the most useful and reliable forecasts.
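The abstract names calibration, sharpness and bias as separate components of forecasting ability but does not define them. The Python sketch below illustrates one way to compute them when a forecast is represented as Monte Carlo samples of a weekly case count. The specific definitions are assumptions chosen for illustration: a randomised probability integral transform (PIT) for calibration, the median absolute deviation about the median for sharpness, and a bias score of 1 - (P(x) + P(x - 1)) for an observed count x. The Kolmogorov-Smirnov distance is a simple stand-in for a formal uniformity test. All function names (`ecdf`, `randomised_pit`, `sharpness`, `bias`, `ks_distance_from_uniform`, `simulated_pits`) are hypothetical, and the simulated forecasts are toy data, not data from the Ebola epidemic.

```python
import random
import statistics
from bisect import bisect_right


def ecdf(samples, x):
    """Empirical predictive CDF, P(X <= x), estimated from forecast samples."""
    ordered = sorted(samples)
    return bisect_right(ordered, x) / len(ordered)


def randomised_pit(samples, observed, rng=random):
    """Randomised probability integral transform (PIT) for a count.

    Draws uniformly between P(observed - 1) and P(observed); for a
    well-calibrated count forecast these values are uniform on [0, 1].
    """
    lower = ecdf(samples, observed - 1)
    upper = ecdf(samples, observed)
    return lower + rng.random() * (upper - lower)


def sharpness(samples):
    """Median absolute deviation of the samples about their median.

    Smaller means sharper; note that it ignores the observation entirely.
    """
    centre = statistics.median(samples)
    return statistics.median(abs(s - centre) for s in samples)


def bias(samples, observed):
    """Bias in [-1, 1]: 1 - (P(observed) + P(observed - 1)).

    +1: all predictive mass above the observation (over-prediction);
    -1: all predictive mass below it (under-prediction).
    """
    return 1 - (ecdf(samples, observed) + ecdf(samples, observed - 1))


def ks_distance_from_uniform(pit_values):
    """Kolmogorov-Smirnov distance between PIT values and U(0, 1).

    A simple stand-in for a formal uniformity test: values close to 0
    are consistent with good probabilistic calibration.
    """
    u = sorted(pit_values)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))


def simulated_pits(predictive_sd, rng, n_forecasts=200, n_samples=500):
    """PIT values for hypothetical forecasts of counts whose true spread is 10.

    Observations are rounded Normal(50, 10) draws; each forecast is
    n_samples rounded Normal(50, predictive_sd) draws.
    """
    pits = []
    for _ in range(n_forecasts):
        observed = round(rng.gauss(50, 10))
        draws = [round(rng.gauss(50, predictive_sd)) for _ in range(n_samples)]
        pits.append(randomised_pit(draws, observed, rng))
    return pits


if __name__ == "__main__":
    toy = list(range(100))  # toy predictive distribution: uniform on 0..99
    print(sharpness(toy))   # 25.0
    print(bias(toy, 500))   # -1.0: every sample lies below the observation
    rng = random.Random(2019)
    # Predictive spread matches the truth: PIT values close to uniform.
    print(ks_distance_from_uniform(simulated_pits(10, rng)))
    # Predictive spread far too narrow (overconfident): PIT piles up near 0 and 1.
    print(ks_distance_from_uniform(simulated_pits(2, rng)))
```

Sharpness is computed from the forecast alone, whereas calibration and bias compare the forecast with what was observed. This is why the metrics are assessed separately: the overconfident forecasts in the example are sharper than the calibrated ones yet have a much larger distance from uniformity.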