This entry reviews early forecasting of the COVID pandemic in the context of forecast accuracy and epidemic and pandemic forecasting.
The COVID-19 pandemic has been a unique experience for many people in the current era. While there have been other epidemics and pandemics, they have been regionally isolated, relatively mild, or primarily associated with particular demographic groups. The AIDS pandemic and previous SARS pandemics, while worldwide, affected limited populations. The Ebola outbreaks, while severe, have affected a limited geographical area. Living adults in much of the world may remember the poliomyelitis epidemics of the mid-20th century, but even these were less severe when considering the combination of health severity and number of people affected. The closest comparison is the 1918 influenza pandemic. COVID-19 was first identified in late 2019 and became recognized as a worldwide pandemic in early 2020. By March 2020 it was a source of concern for the entire planet. It also became the frequent subject of forecasts, some of which were soon questioned.[1][2][3]
There has been extensive research on forecast accuracy across many domains, although it is heavily weighted toward finance and related matters.[4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19] These studies often focus on technical issues such as the precise error measure to consider or the comparative accuracy of various approaches. One well-known set of studies, the M-competitions, has compared the accuracy of numerous time series forecast methods.[20][21][22][23] A well-established practice is to compare multiple forecasts and examine variation by forecaster characteristics.[24][25][26][27] In recent years, short-term disease forecasts, such as those for influenza and dengue, have also been evaluated.[28][29][30]
The COVID pandemic led to a variety of forecasting challenges. First, there was a poor understanding of factors such as the transmission rate, the virus's method of spread, and the amount of time the virus survives in different environments. Second, there was no significant precedent for the outcomes of social distancing or for the general population’s compliance with shutdowns. Forecasting models now have far more information and more robust assumptions than were available in the initial months of the coronavirus outbreak. Table 1 summarizes the major forecasts for the United States that were being aggregated as part of the early ensemble forecasting efforts of the Centers for Disease Control and Prevention (CDC). Some of these forecasts were epidemiological, and some were purely statistical, based on exponential smoothing and time series of observed trends.
The epidemiological projections relied on a variety of methods, including simulations based on the behavior of earlier members of the coronavirus family, like SARS-CoV-1 and MERS, transmission data from China, Italy, and Spain, and, increasingly, information gained from U.S. transmission behavior and social distancing compliance. Kissler, Tedijanto, Goldstein, Grad, and Lipsitch[31] used a medical model of transmission behavior, based on the immunity, cross-immunity, and seasonality of HCoV-OC43 and HCoV-HKU1, to suggest that a seasonal resurgence was the most likely scenario, requiring intermittent social distancing through 2022 and with resurgences as late as 2024. Another study used early data from Hubei province in China between 11 January 2020 and 10 February 2020 and predicted that the cumulative case count in China by 29 February could reach 180,000, with a lower bound of 45,000 cases.[32] The reported cases were fewer than forecast, possibly because of underreporting, effective non-pharmaceutical interventions, or flawed model assumptions. Epidemiological models rely on various scientific assumptions related to the behavior of viruses, underlying health conditions and immunity, availability of health infrastructure, and so on, which may pose a significant challenge for forecasting in the initial days of a new virus, such as COVID-19. However, as data become available, time series or purely statistical forecasting models can also be effective. For example, Petropoulos and Makridakis[33] used exponential smoothing with multiplicative error and multiplicative trend components to forecast the trajectory of COVID-19 outcomes.
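The idea of exponential smoothing with a multiplicative trend can be illustrated with a minimal sketch. This is not the cited authors' implementation: the function name, the fixed smoothing weights, and the synthetic series are all assumptions made for illustration. It produces point forecasts only; the multiplicative error component, which mainly matters for prediction intervals and parameter estimation, is not modeled here.

```python
def holt_multiplicative(series, alpha=0.5, beta=0.3, horizon=3):
    """Point forecasts from exponential smoothing with a multiplicative trend.

    alpha and beta are fixed, illustrative smoothing weights (assumptions);
    in practice they would be estimated from the data.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if min(series) <= 0:
        raise ValueError("a multiplicative trend requires positive values")

    level = series[0]
    trend = series[1] / series[0]  # initial growth ratio
    for y in series[1:]:
        prev_level = level
        # New level: blend the observation with the last level grown by the trend.
        level = alpha * y + (1 - alpha) * prev_level * trend
        # New trend: blend the latest growth ratio with the previous trend.
        trend = beta * (level / prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast compounds the growth ratio h times.
    return [level * trend ** h for h in range(1, horizon + 1)]


# Synthetic cumulative counts growing 20% per step (not real data).
counts = [100 * 1.2 ** t for t in range(10)]
forecasts = holt_multiplicative(counts)
```

Because the trend is a growth ratio rather than an additive increment, forecasts compound over the horizon, which suits the early exponential phase of an outbreak but can overshoot badly once growth slows.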
The forecasts, which were released and subsequently followed up on social media, typically over-forecast cases and deaths, but the actual values were within the 50% prediction intervals, except in the first round.[34] Castle, Doornik, and Hendry[35][36] used local averaged time trend estimation that assumed no seasonality, and they argued that, in the short run, their forecasts outperformed epidemiological forecasts, such as the ones from Imperial College. The developers of a probabilistic model at the Los Alamos National Laboratory in New Mexico compared its forecasts to the actual values and reported fairly robust coverage for the three-week periods following each release.[37]
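Statements about actual values falling within a prediction interval are usually checked with empirical coverage: the share of outcomes that land inside the forecast bounds. The following is a minimal sketch of that calculation; the function name and the numbers are hypothetical and are not taken from any of the cited studies.

```python
def interval_coverage(actuals, lowers, uppers):
    """Share of actual values that fall inside their prediction intervals."""
    if not (len(actuals) == len(lowers) == len(uppers)) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)


# Hypothetical weekly counts and 50% interval bounds (not real data).
actual = [120, 150, 170, 160]
lower = [100, 140, 175, 150]
upper = [130, 160, 190, 170]
coverage = interval_coverage(actual, lower, upper)
```

Over many forecasts, a well-calibrated 50% interval should contain roughly half of the outcomes; coverage far above or below the nominal level suggests intervals that are too wide or too narrow.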
Though statistical forecasts have immense appeal, since they can be created in real time and can forecast micro-level patterns (for example, state- or county-level trends), they have also come under criticism. Jewell, Lewnard, and Jewell[38] raised several concerns that warrant a careful approach toward statistical models in the case of epidemics. They highlighted that epidemic curves may not follow a normal distribution and that curves may fit early data in various ways, which may change as the epidemic progresses; for example, a second wave may occur. They suggested that such models can be helpful for short-term predictions but that extreme caution has to be exercised otherwise, a point that is reaffirmed by Holmdahl and Buckee.[1] They argued that, for long-term outcomes, only mechanistic models, like SEIR (Susceptible–Exposed–Infectious–Recovered) models, are reliable. Many of the forecasts listed in Table 1 used SEIR models.
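To make the term concrete, a basic SEIR model moves people through the four compartments at rates set by a transmission rate, an incubation rate, and a recovery rate. The sketch below integrates the standard equations with simple Euler steps. It is a textbook illustration under stated assumptions, not any of the models in Table 1: the function name is hypothetical, and the parameter values are arbitrary and are not estimates for COVID-19.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate a basic SEIR model with forward-Euler steps.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day).
    Returns one (S, E, I, R) tuple per day, starting at day 0.
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative parameter values only; they are not estimates for COVID-19.
trajectory = simulate_seir(
    population=1_000_000, initial_infectious=10,
    beta=0.5, sigma=0.2, gamma=0.1, days=200,
)
# Day on which the infectious compartment is largest.
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
```

Because the compartments encode how infection depletes the susceptible population, such a model can produce a peak and decline without that shape being imposed on the curve, which is one reason mechanistic models are preferred for longer horizons.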
Some recent studies have examined the effectiveness of different models and also compared individual models to ensemble forecasts. A 2022 study evaluated individual and ensemble forecasts, using data from more than 90 different forecasting agencies at https://covid19forecasthub.org/, and found that ensemble forecasts outperformed any individual forecast. Marchant et al.[2] evaluated the accuracy of IHME forecasts and found that they underestimated mortality and that the results did not improve over time. Perone[39] examined the efficacy of hybrid models against a range of technical forecasting models, using Italian Ministry of Health data from early 2020, and found that hybrid models were better at capturing the pandemic’s linear, non-linear, and seasonal patterns and significantly outperformed single time series models. Wang et al.[40] used data from the second wave of the pandemic in India and the United States. They found that the ARIMA model had the best fit for India and the ARIMA-GRNN model had the best fit for the United States. Bracher et al.[41] evaluated thirteen forecasts for Germany and Poland during ten weeks of the second wave and found considerable heterogeneity in both point estimates and the forecasts concerning spread. Pathak and Williams[42] examined forecast errors for two SEIR models and two time series models and found suggestive evidence that the SEIR models performed better over the long run, but the time series models performed better over shorter horizons. Ioannidis, Cripps, and Tanner[43] argued that the COVID-19 pandemic highlighted the weakness of epidemic forecasting and that, when forecasts and forecast errors could determine the strength of policy measures, such as the implementation of lockdowns, they should receive closer scrutiny.
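Comparisons of this kind typically score each model's forecasts against observed values and then score a combined forecast in the same way. The sketch below shows one simple version, using mean absolute error, mean absolute percentage error, and a median ensemble. The choice of measures, the function names, and all numbers are assumptions for illustration; the data are constructed and do not reproduce any result from the studies cited above.

```python
from statistics import mean, median


def mean_absolute_error(actuals, forecasts):
    """Average absolute gap between observed and forecast values."""
    return mean(abs(a - f) for a, f in zip(actuals, forecasts))


def mean_absolute_percentage_error(actuals, forecasts):
    """Average absolute error as a percentage of the observed value."""
    # Undefined when an observed value is zero; callers must avoid that case.
    return 100 * mean(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts))


def median_ensemble(model_forecasts):
    """Combine models by taking the median forecast at each target date."""
    return [median(values) for values in zip(*model_forecasts.values())]


# Hypothetical forecasts from three unnamed models (not real data).
actual = [100, 120, 150, 180]
model_forecasts = {
    "model_a": [90, 130, 140, 200],
    "model_b": [110, 115, 165, 170],
    "model_c": [105, 140, 155, 150],
}
ensemble = median_ensemble(model_forecasts)
errors = {name: mean_absolute_error(actual, f) for name, f in model_forecasts.items()}
errors["ensemble"] = mean_absolute_error(actual, ensemble)
ensemble_mape = mean_absolute_percentage_error(actual, ensemble)
```

In this constructed example the median ensemble has the lowest mean absolute error because the individual models err in different directions on different dates. Whether that holds in practice depends on how diverse and how biased the component forecasts are.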
This study adds to this growing literature and undertakes a comparative evaluation of two sets of statistical and epidemiological models, using trend-based comparison, and a set of forecast accuracy measures discussed in the next section. The findings have relevance for the practice of ensemble forecasting and the study of short-term forecasting of infectious diseases.
Table 1. Selected COVID-19 Forecasts, Methodology, Assumptions.
| No. | Models | Method | Assumptions | Webpage |
|---|---|---|---|---|
| 1. | Institute for Health Metrics and Evaluation (IHME) | Combination of mechanistic transmission model and curve-fitting approach | Adjusted to differences in mobility. | https://covid19.healthdata.org/united-states-of-america |
| 2. | Columbia University | Metapopulation SEIR model | Accounts for social distancing. | https://columbia.maps.arcgis.com/apps/webappviewer/index.html?id=ade6ba85450c4325a12a5b9c09ba796c |
| 3. | Auquan Data Science | SEIR model | No assumption about interventions. | https://covid19-infection-model.auquan.com/ |
| 4. | COVID-19 Simulator Consortium | SEIR model | 20% increase in contact rates after lifting stay-at-home orders. | https://www.covid19sim.org/team |
| 5. | Georgia Institute of Technology | Deep learning | Assumes effects of interventions embedded in the data. | https://www.cc.gatech.edu/~badityap/covid.html |
| 6. | Imperial College, London | Ensembles of mechanistic transmission models | No specific assumptions about the interventions. | https://mrc-ide.github.io/covid19-short-term-forecasts/index.html |
| 7. | Johns Hopkins University | Stochastic metapopulation SEIR model | Assumes reduction in effectiveness of mitigation after lifting shelter-in-place. | https://github.com/HopkinsIDD/COVIDScenarioPipeline |
| 8. | Los Alamos National Laboratory | Statistical dynamic growth model accounting for population susceptibility | Assumes the NPIs would continue. | https://covid-19.bsvgateway.org/ |
| 9. | Massachusetts Institute of Technology | SEIR model | Assumes continuation of present interventions. | https://www.covidanalytics.io/projections |
| 10. | Northeastern University | Metapopulation, age-structured SLIR model | Assumes continuation of social distancing policies. | https://covid19.gleamproject.org/ |
| 11. | Iowa State University | Nonparametric spatiotemporal model | No specific assumptions related to interventions. | http://www.covid19dashboard.us/ |
| 12. | Predictive Science Inc. | Stochastic SEIRX model | Assumes that current interventions would not change. | https://github.com/predsci/DRAFT |
| 13. | US Army Engineer Research and Development Center | SEIR mechanistic model | Projections assume that interventions would not change. | https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html |
| 14. | University of California, Los Angeles | Modified SEIR model | Projections assume that interventions would not change. | https://covid19.uclaml.org/ |
| 15. | University of Texas, Austin | Nonlinear Bayesian hierarchical regression with a negative-binomial model | Estimates the extent of social distancing using mobile phone geolocation data. Does not assume changes in social distancing during the forecast period. | https://covid-19.tacc.utexas.edu/projections/ |