This entry reviews early forecasting of the COVID-19 pandemic in the context of forecast accuracy and of epidemic and pandemic forecasting.
The COVID-19 pandemic has been a unique experience for many people in the current era. While there have been other epidemics and pandemics, they have been regionally isolated, relatively mild, or primarily associated with particular demographic groups. The AIDS pandemic and previous SARS pandemics, while worldwide, affected a limited population. The Ebola outbreaks, while severe, have affected a limited geographical area. Living adults in much of the world may remember the poliomyelitis epidemics of the mid-20th century, but even those were less severe when the combination of health severity and number of people affected is considered. The closest comparable pandemic is the 1918 influenza pandemic. COVID-19 was first identified in late 2019 and became recognized as a worldwide pandemic in early 2020. By March 2020 it was a source of concern for the entire planet. It also became the frequent subject of forecasts, some of which were soon questioned.[1][2][3]
The next section briefly discusses the study of forecast accuracy. The section after that discusses the recent study of COVID-19 forecasts, and it is followed by a section on COVID-19 forecast accuracy. The final section defines key terms.
There has been extensive research on forecast accuracy across many domains, although it is heavily weighted toward finance and related matters.[4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19] These studies often focus on technical issues such as the precise error measure to consider or the comparative accuracy of various approaches. One well-known set of studies, the M-competitions, has compared the accuracy of numerous time series forecast methods.[20][21][22][23] A well-established practice is to compare multiple forecasts and examine variation by forecaster characteristics.[24][25][26][27] In recent years, short-term forecasts of diseases such as influenza and dengue have also been evaluated.[28][29][30]
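To make the idea of an error measure concrete, the following minimal sketch computes two measures that are common in the forecast accuracy literature: the mean absolute error and the mean absolute percentage error. The function names and the weekly counts are hypothetical and chosen only for illustration. They are not taken from any of the studies cited above.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute gap between forecast and actual values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_percentage_error(actual, forecast):
    """Average absolute error expressed as a percentage of the actual value."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical weekly counts, invented purely for illustration.
actual = [100, 200, 400]
forecast = [110, 180, 440]

mae = mean_absolute_error(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
print(f"MAE = {mae:.2f}, MAPE = {mape:.2f}%")
```

The two measures can rank forecasts differently. The absolute error is dominated by periods with large counts, while the percentage error weights every period equally, which is one reason the choice of measure is itself a subject of study.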
There are two major forms of COVID-19 forecasts. The primary form is epidemiological, that is, forecasting built on knowledge of how epidemics spread. The second form is stochastic, that is, forecasting that uses the statistical properties of time series data. Each had limits at the beginning of the pandemic. For the statistical methods, no time series existed until the pandemic had progressed for some time. For the epidemiological methods, there was a poor understanding of the virus's transmission rate, its method of spread, and the amount of time the virus survives in different environments. There was also insufficient information on the effect of social distancing and on the general population's compliance with shutdowns. As time passed, these information deficits were filled with current empirical data.
The epidemiological models rest on hypotheses about virus characteristics, the effects of underlying health conditions and immunity, the availability of health infrastructure, and sometimes other factors. To make these epidemiological projections, forecasters used a variety of methods, including simulations based on the behavior of earlier members of the coronavirus family such as SARS-CoV-1 and MERS. The simulations used current transmission data from China, Italy, and Spain as those data became available. Later, U.S. data on transmission behavior and social distancing compliance became available and were used. Kissler, Tedijanto, Goldstein, Grad, and Lipsitch[31] used a medical model of transmission behavior, based on the immunity, cross-immunity, and seasonality of the common coronaviruses HCoV-OC43 and HCoV-HKU1, and found that seasonal resurgence was the most likely scenario, requiring intermittent social distancing through 2022, with resurgences possible as late as 2024. Anastassopoulou, Russo, Tsakris, and Siettos[32] used early data from Hubei province, China, covering 11 January 2020 to 10 February 2020 and predicted between 45,000 and 180,000 cumulative cases by 29 February. The reported cases were fewer than forecast. The overestimation may have reflected flawed model assumptions. Alternatively, there may have been underreporting or effective non-pharmaceutical interventions. As shown in Table 1 there are many other models. None of these models appears to have anticipated the widespread active campaign to undermine advice provided by experts and governmental officials. Despite these shortcomings, Holmdahl and Buckee[1] recommend that for long-term outcomes the epidemiological models are the most reliable.
Table 1. Epidemiological models.
| Source | Method | Webpage |
| --- | --- | --- |
| Columbia University | Metapopulation SEIR model | https://columbia.maps.arcgis.com/apps/webappviewer/index.html?id=ade6ba85450c4325a12a5b9c09ba796c |
| Auquan Data Science | SEIR model | https://covid19-infection-model.auquan.com/ |
| COVID-19 Simulator Consortium | SEIR model | https://www.covid19sim.org/team |
| Johns Hopkins University | Stochastic SLIR model | https://github.com/HopkinsIDD/COVIDScenarioPipeline |
| Massachusetts Institute of Technology | SEIR model | https://www.covidanalytics.io/projections |
| Northeastern University | Age-structured SLIR model | https://covid19.gleamproject.org/ |
| US Army Engineer Research and Development Center | SEIR mechanistic model | https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html |
| University of California, Los Angeles | Modified SEIR model | https://covid19.uclaml.org/ |
| Predictive Science Inc. | Stochastic SEIRX model | https://github.com/predsci/DRAFT |
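Most of the epidemiological models listed here are variants of the SEIR structure, in which people move from Susceptible to Exposed to Infectious to Recovered. The sketch below shows that mechanism in its most basic deterministic form, stepped forward with simple Euler updates. It is not the implementation used by any of the listed groups, whose models add stochasticity, age structure, or metapopulations. The function name and all parameter values are assumptions chosen only for illustration and are not estimates for COVID-19.

```python
def simulate_seir(population, beta, sigma, gamma, initial_infectious, days, dt=0.1):
    """Integrate a basic SEIR model with forward-Euler steps.

    beta is the transmission rate, sigma is 1 / latent period, and gamma is
    1 / infectious period (all per day). Returns one (S, E, I, R) tuple per
    day, starting with day 0.
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative parameter values only (assumed, not fitted to any data).
trajectory = simulate_seir(
    population=1_000_000, beta=0.5, sigma=0.2, gamma=0.1,
    initial_infectious=10, days=200,
)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][2])
print(f"Infectious compartment peaks on day {peak_day}")
```

Every output of such a model depends on the assumed rates, which is why poor early knowledge of transmission and of compliance with interventions made the first epidemiological forecasts so uncertain.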
Nevertheless, once serial data become available, standard statistical models can be developed and used to project the direction of the series. Two sorts of serial models are pure time series methods, such as exponential smoothing, and time-series-dependent models, such as curve fitting. Petropoulos and Makridakis[33] used exponential smoothing with multiplicative error and multiplicative trend components to forecast the trajectory of COVID-19 outcomes. They updated these forecasts from time to time and released them over social media. Their forecasts typically overestimated cases and deaths but, after the first round, the actual values fell within the 50% prediction interval.[34] Castle, Doornik, and Hendry[35][36] used local averaged time trend estimation that assumed no seasonality. They assert that their forecasts outperformed epidemiological forecasts, such as those from Imperial College, for shorter horizons. Los Alamos National Laboratory in New Mexico produced a probabilistic model.[37] They reported fairly robust accuracy for three-week periods following their forecast releases.
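As a rough illustration of exponential smoothing with a multiplicative trend, the sketch below applies the standard point-forecast recursions: the level is a weighted average of the new observation and the previous level times the growth ratio, and the growth ratio is smoothed in the same way. It is not the implementation used by Petropoulos and Makridakis. In particular, the smoothing weights `alpha` and `beta` are fixed here by assumption rather than estimated, the initialization is deliberately simple, and no prediction intervals are produced. The series is hypothetical.

```python
def multiplicative_trend_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Exponential smoothing with a multiplicative trend (point forecasts only)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = float(series[1])
    trend = series[1] / series[0]  # initial growth ratio between periods
    for y in series[2:]:
        previous_level = level
        # New level: blend the observation with the one-step-ahead projection.
        level = alpha * y + (1 - alpha) * previous_level * trend
        # New trend: blend the latest growth ratio with the previous trend.
        trend = beta * (level / previous_level) + (1 - beta) * trend
    # An h-step-ahead forecast compounds the growth ratio h times.
    return [level * trend ** h for h in range(1, horizon + 1)]


# Hypothetical cumulative counts that double each period.
cases = [100, 200, 400, 800, 1600]
forecasts = multiplicative_trend_forecast(cases)
print([round(f) for f in forecasts])
```

Because the trend is a ratio rather than a difference, the forecasts continue the doubling pattern of this toy series. On real epidemic data the same property can produce large over-forecasts once growth slows, which is consistent with the overestimation described above.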
Statistical forecasts are appealing because they can be created in real time and can capture micro-level patterns such as state or county-level trends, but they have also come under criticism. Jewell, Lewnard, and Jewell[38] recommend caution when using statistical models to forecast epidemics. For example, early data may fit curves in a variety of ways, and the fit may change as the epidemic progresses. A typical non-epidemic curve shows some variant of S-shaped growth, which may then lead to gradual decline. However, epidemics can have multiple subsequent waves. While such models can be helpful for short-term predictions, they may fail if not used with caution.
When multiple forecasts are available, they may all contain some useful information. Forecast literature dating to the 1960s recommends combining such forecasts.[39][40][41] This approach is now labeled ensemble forecasting.[42][43] The United States Centers for Disease Control and Prevention (CDC) produced an ensemble forecast. Its components are shown in Table 1 for epidemiological models and Table 2 for statistical models.
Table 2. Statistical models.
| Source | Method | Webpage |
| --- | --- | --- |
| Institute for Health Metrics and Evaluation (IHME) | Combination of mechanistic transmission model and curve-fitting approach | https://covid19.healthdata.org/united-states-of-america |
| Los Alamos National Laboratory | Statistical dynamic growth model accounting for population susceptibility | https://covid-19.bsvgateway.org/ |
| Georgia Institute of Technology | Deep learning | https://www.cc.gatech.edu/~badityap/covid.html |
| Imperial College, London | Ensembles of mechanistic transmission models | https://mrc-ide.github.io/covid19-short-term-forecasts/index.html |
| Iowa State University | Nonparametric spatiotemporal model | http://www.covid19dashboard.us/ |
| University of Texas, Austin | Nonlinear Bayesian hierarchical regression with a negative-binomial model | https://covid-19.tacc.utexas.edu/projections/ |
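The idea of combining forecasts can be sketched in a few lines. The example below takes point forecasts from several models and combines them horizon by horizon with an equal-weight mean and a median. This is only a simple illustration of the principle and is not the procedure used by the CDC for its ensemble. The model names and numbers are hypothetical.

```python
from statistics import mean, median


def combine_forecasts(forecasts_by_model):
    """Combine point forecasts from several models, one horizon at a time."""
    # zip(*...) regroups the per-model lists into per-horizon tuples.
    per_horizon = zip(*forecasts_by_model.values())
    return [{"mean": mean(values), "median": median(values)} for values in per_horizon]


# Hypothetical model names and forecasts for three successive weeks.
forecasts_by_model = {
    "model_a": [120, 150, 190],
    "model_b": [100, 130, 160],
    "model_c": [140, 200, 300],
}

ensemble = combine_forecasts(forecasts_by_model)
for week, combined in enumerate(ensemble, start=1):
    print(week, combined)
```

The median is less sensitive than the mean to a single extreme forecast, such as the third-week value from the hypothetical `model_c`, which is one reason a median is often considered when component forecasts disagree widely.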
Turning to accuracy, Marchant et al.[2] evaluated the accuracy of the IHME forecasts. They found that these models underestimated mortality and that the results did not improve over time. Cramer et al.[42] examined ensemble forecasts using data from more than 90 different forecasting agencies as posted at https://covid19forecasthub.org/. They found that ensemble forecasts outperformed any of the individual component forecasts. Using Italian Ministry of Health data from early 2020, Perone[44] examined the efficacy of hybrid models relative to a range of technical forecasting models. The hybrid models were better at capturing the pandemic's linear, non-linear, and seasonal patterns, and they significantly outperformed single time series models. Focusing on the second wave in India and the United States, Wang et al.[45] found that the ARIMA model had the best fit for India and the ARIMA-GRNN model had the best fit for the United States. Bracher et al.[46] evaluated thirteen forecasts for Germany and Poland during ten weeks of the second wave. They found considerable heterogeneity in both the point estimates and the spread of the forecasts. Pathak and Williams[47] examined forecast errors for two SEIR models and two time series models and found suggestive evidence that the SEIR models performed better over the long run, but the time series models performed better over shorter horizons. Ioannidis, Cripps, and Tanner[48] assert that the COVID-19 pandemic shows that epidemic forecasting is weak. They suggest that better forecasts could strengthen policy responses such as the implementation of lockdowns.
AIDS: Acquired Immunodeficiency Syndrome.
ARIMA: Autoregressive Integrated Moving Average, a commonly used model for time series data and forecasts.
ARIMA-GRNN: A variation of the ARIMA model. The Generalized Regression Neural Network (GRNN) model provides greater capability for non-linear fitting.
COVID 19: Corona (CO) Virus (VI) Disease originating in the year 2019.
Curve-fitting: Determines the functional relationship between time and the forecasted unit in the past to project the future trend.
Deep Learning: A subcategory of machine learning models.
Ebola: Short for Ebola Virus or Ebola Fever, a hemorrhagic (causing severe blood loss) fever.
Ensemble model: A model that combines multiple forecasts.
Exponential smoothing: Smoothing technique for time series data using exponentially declining weights.
Geolocation: Geographic coordinates (typically latitude and longitude) of a place.
Horizon: The future period for which a forecast is made.
Hybrid models: Models that use both epidemiological parameters and stochastic techniques.
IHME: Institute for Health Metrics and Evaluation, University of Washington. IHME forecasts were widely used by the federal government for policy decisions in the early months of the pandemic.
Local averaged time trend estimation: Time series estimates adjusted for seasonality using a moving average.
Mechanistic transmission: Epidemiological models accounting for disease transmission dynamics, e.g., SEIR models.
MERS: Middle East respiratory syndrome, originated in 2012.
Nonlinear Bayesian hierarchical regression with a negative-binomial model: A statistical curve-fitting approach with time-evolving Gaussian curves.
Nonparametric spatiotemporal model: Nonparametric models for predicting transmission.
NPI: Non-pharmaceutical intervention. For COVID 19, these include social distancing, lockdowns, mandatory use of face masks, and careful hand hygiene.
Pandemic: Similar to an epidemic, but generally over a wider geographic area.
Probabilistic model: A model that outlines probabilities associated with outcomes rather than a fixed forecast.
SARS-CoV-1: The first known instance of severe acute respiratory syndrome coronavirus. Originated in 2002.
SARS-COVID 19: Severe acute respiratory syndrome coronavirus disease originating in the year 2019 (synonym for COVID 19).
Seasonality: Regular periodic fluctuations in serial data.
SEIR Model: A disease spread model that encapsulates the dynamics of infection and progression through the disease states Susceptible (S), Exposed (E), Infectious (I), and Recovered (R).
SEIRX Model: A modified version of the SEIR model.
Serial data: A series of data for the same units over time. For most methods, the time units need to be of equal, or nearly equal, distance.
SLIR model: A type of disease transmission model, Susceptible-Latent-Infected-Removed (SLIR).
Statistical dynamic growth: Time series models for projections.
Selected COVID-19 forecasts, methodology, and assumptions.
| No. | Models | Method | Assumptions | Webpage |
| --- | --- | --- | --- | --- |
| 1 | Institute for Health Metrics and Evaluation (IHME) | Combination of mechanistic transmission model and curve-fitting approach | Adjusted to differences in mobility. | https://covid19.healthdata.org/united-states-of-america |
| 2 | Columbia University | Metapopulation SEIR model | Accounts for social distancing. | https://columbia.maps.arcgis.com/apps/webappviewer/index.html?id=ade6ba85450c4325a12a5b9c09ba796c |
| 3 | Auquan Data Science | SEIR model | No assumption about interventions. | https://covid19-infection-model.auquan.com/ |
| 4 | COVID-19 Simulator Consortium | SEIR model | 20% increase in contact rates after lifting stay-at-home orders. | https://www.covid19sim.org/team |
| 5 | Georgia Institute of Technology | Deep learning | Assumes effects of interventions embedded in the data. | https://www.cc.gatech.edu/~badityap/covid.html |
| 6 | Imperial College, London | Ensembles of mechanistic transmission models | No specific assumptions about the interventions. | https://mrc-ide.github.io/covid19-short-term-forecasts/index.html |
| 7 | Johns Hopkins University | Stochastic metapopulation SEIR model | Assumes reduction in effectiveness of mitigation after lifting shelter-in-place. | https://github.com/HopkinsIDD/COVIDScenarioPipeline |
| 8 | Los Alamos National Laboratory | Statistical dynamic growth model accounting for population susceptibility | Assumes the NPIs would continue. | https://covid-19.bsvgateway.org/ |
| 9 | Massachusetts Institute of Technology | SEIR model | Assumes continuation of present interventions. | https://www.covidanalytics.io/projections |
| 10 | Northeastern University | Metapopulation, age-structured SLIR model | Assumes continuation of social distancing policies. | https://covid19.gleamproject.org/ |
| 11 | Iowa State University | Nonparametric spatiotemporal model | No specific assumptions related to interventions. | http://www.covid19dashboard.us/ |
| 12 | Predictive Science Inc. | Stochastic SEIRX model | Assumes that current interventions would not change. | https://github.com/predsci/DRAFT |
| 13 | US Army Engineer Research and Development Center | SEIR mechanistic model | Projections assume that interventions would not change. | https://www.cdc.gov/coronavirus/2019-ncov/covid-data/forecasting-us.html |
| 14 | University of California, Los Angeles | Modified SEIR model | Projections assume that interventions would not change. | https://covid19.uclaml.org/ |
| 15 | University of Texas, Austin | Nonlinear Bayesian hierarchical regression with a negative-binomial model | Estimates the extent of social distancing using mobile phone geolocation data. Does not assume changes in social distancing during the forecast period. | https://covid-19.tacc.utexas.edu/projections/ |