In recent years, substantial progress has been achieved in the domain of atmospheric predictability and forecasting, harnessing both traditional partial differential equation (PDE)-based methods and advanced artificial intelligence (AI) technologies. Drawing on our recent comprehensive review of predictability studies, this investigation explores the potential for extending the two-week predictability limit initially proposed in the 1960s. While PDE-physics-based systems have consistently provided valuable insights, the emergence of AI-powered models, particularly those employing deep learning and transformer-based architectures, has paved the way for extending prediction horizons. These AI models have demonstrated performance that is comparable to, or even surpasses, traditional methods in short-term forecasts (3-14 days) and hold significant promise for addressing the challenges of subseasonal prediction. The synergy of AI and traditional approaches also underscores the potential for cost-effective, long-range weather predictions, with reports indicating promising predictions beyond 30 days. Furthermore, the development of generalized Lorenz models, incorporating time-varying parameters, has deepened our understanding of the coexistence of chaotic and regular behaviors with distinct predictability, challenging the conventional perception of weather systems as purely chaotic. This dual nature introduces fresh perspectives on long-term predictability and regional dependencies, such as seasonal variations and blocking patterns. In addition to reviewing recent advancements, this study proposes future research directions aimed at enhancing predictive accuracy and further exploring the limits of predictability in the realms of both weather and climate modeling. 
The authors’ recent study (https://www.mdpi.com/2073-4433/15/7/837) offers a review of the origins of the two-week predictability limit and recent advancements in predictability and prediction using PDE-based, AI-powered, and idealized chaotic models. To help readers quickly grasp the concepts presented in the latter part of the study, the following discussions are adapted from Sections 3 and 4 of the aforementioned work.
To indirectly support the concept of the two-week predictability limit, the saturation time has been employed as a measure across various time scales for predictability estimates. However, instances of saturation times exceeding two weeks have been documented (Magnusson and Kallen 2013 [1]; Zagar and Szunyogh 2020 [2]). For instance, Figure 1 of Krishnamurthy (2019) [4], produced with an extended version of the Lorenz 1965 model (Lorenz 1965) [3], displayed a saturation time of about 100 days.
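As a rough illustration of how a saturation time can be estimated, the sketch below runs a twin experiment (a control run and a slightly perturbed run) and records when their separation stops growing. It uses the classical three-variable Lorenz (1963) system purely as a convenient stand-in; it is not the extended Lorenz 1965 model or any of the real-world models cited above. The perturbation size, the 90% threshold, and the helper names `twin_errors` and `saturation_time` are assumptions made for this example.

```python
import math

def lorenz63_rhs(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the classical three-variable Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (r - z) - y, x * y - b * z)

def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    k1 = lorenz63_rhs(state)
    k2 = lorenz63_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = lorenz63_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = lorenz63_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (q1 + 2.0 * q2 + 2.0 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

def twin_errors(control0, perturbation=1e-8, dt=0.01, n_steps=6000):
    """Euclidean distance between a control run and a perturbed run at each step."""
    control = control0
    perturbed = (control0[0] + perturbation, control0[1], control0[2])
    errors = []
    for _ in range(n_steps + 1):
        errors.append(math.dist(control, perturbed))
        control = rk4_step(control, dt)
        perturbed = rk4_step(perturbed, dt)
    return errors

def saturation_time(errors, dt, fraction=0.9, tail=1000):
    """First time the error reaches `fraction` of its late-time mean level."""
    saturation_level = sum(errors[-tail:]) / tail
    for step, err in enumerate(errors):
        if err >= fraction * saturation_level:
            return step * dt, saturation_level
    return None, saturation_level

# Spin up onto the attractor before starting the twin experiment.
state = (1.0, 1.0, 1.0)
for _ in range(2000):
    state = rk4_step(state, 0.01)

errors = twin_errors(state)
t_sat, level = saturation_time(errors, dt=0.01)
print(f"saturation level ~ {level:.2f}, reached at t ~ {t_sat:.2f} (nondimensional)")
```

The times here are nondimensional model units, so the printed value says nothing about days in the real atmosphere; the point is only that the saturation time depends on the model, the initial error, and the chosen threshold.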
Recent studies have also reported reasonable predictions at time scales longer than two weeks (e.g., Liu et al. 2009 [5]; Shen et al. 2010, 2011 [6, 7]; Judt 2018 [8]; Krishnamurthy 2019 [4]). For example, applying ensemble forecasts to simulate stratospheric sudden warming (Mukougawa and Hirooka 2004) [9], Mukougawa et al. (2005) [10] reported a lead time of more than two weeks. By examining the dependence of predictability on regions, Judt (2020) [11] suggested that the tropics have longer predictability than the middle latitudes and polar regions (tropics > 20 days). Using an atmosphere–ocean coupled model (or a stand-alone model), Mishra et al. (2021) [12] reported a predictability limit of 22 days (or 20 days) for Indian monsoon rainfall.
Compared to PDE-physics-based methods, machine learning (ML), or more broadly, artificial intelligence (AI) methods have shown promise in improving weather predictions (Weyn et al. 2019, 2020, 2021 [13-15]; Rasp and Thuerey 2021 [16]; Pathak et al. 2022 [17]; Bi et al. 2023 [18]; Bonev et al. 2023 [19]; Chen, Han, et al. 2023 [20]; Chen, Zhong, et al. 2023 [21]; Nguyen et al. 2023 [22]; Lam et al. 2023 [23]; Selz and Craig 2023 [24]; Watt-Meyer et al. 2023 [25]; Bach et al. 2024 [26]; Bouallègue et al. 2024 [27]; Li et al. 2024 [28]). As shown in Table 1, these AI-powered models were trained using the ERA5 reanalysis dataset (Hersbach et al. 2018, 2020) [29, 30] and CMIP6 data (Eyring et al. 2016 [31]) and assessed using various metrics including RMSE, ACC, Continuous Ranked Probability Score (CRPS), Temporal Anomaly Correlation Coefficient (TCC), Ranked Probability Skill Score (RPSS), Brier Skill Score (BSS), and bivariate correlation (COR).
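For readers less familiar with the two most common metrics above, the following minimal sketch computes RMSE and ACC for a handful of grid-point values. It assumes the simple uncentered form of the anomaly correlation and omits the latitude (area) weighting that operational verification normally applies; the function names and the sample numbers are invented for illustration and are not taken from any of the cited studies.

```python
import math

def rmse(forecast, observed):
    """Root-mean-square error over paired grid-point values."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)

def acc(forecast, observed, climatology):
    """Anomaly correlation coefficient (uncentered form, no area weighting)."""
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    o_anom = [o - c for o, c in zip(observed, climatology)]
    numerator = sum(fa * oa for fa, oa in zip(f_anom, o_anom))
    denominator = math.sqrt(sum(fa ** 2 for fa in f_anom)
                            * sum(oa ** 2 for oa in o_anom))
    return numerator / denominator

# Made-up temperatures (K) at four grid points, for illustration only.
climatology = [280.0, 285.0, 290.0, 295.0]
observed = [282.0, 284.0, 293.0, 294.0]
forecast = [281.0, 285.0, 292.0, 296.0]

print("RMSE:", rmse(forecast, observed))
print("ACC: ", acc(forecast, observed, climatology))
```

RMSE measures the size of the error in the units of the variable, whereas ACC measures how well the forecast departure from climatology matches the observed departure in pattern, which is why the two are usually reported together.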
As summarized in Table 1, by applying deep convolutional neural networks (CNNs), Weyn et al. (2019) [13] reported lead times of 14 days. More importantly, recent advances in AI technology, particularly transformer technology (e.g., Vaswani et al. 2017) [32] and its variants, including the “vision transformer” (Dosovitskiy et al. 2020 [33]), have offered significant opportunities to reduce the cost of weather predictions and revisit the predictability limit. Table 1 lists major AI-powered systems, most of which were published in 2023 and 2024 following the widespread recognition of transformer technology due to its major application in ChatGPT. Compared to PDE-physics-based systems, all of the listed AI-powered systems produced comparable or slightly better predictions for conventional short-term forecasts (3–14 days). Three studies have attempted to perform simulations at subseasonal or larger scales. Among the three, the ClimaX system was reported in a conference article. The enhanced FuXi system was documented in a preprint article, and its base version in a journal article. In the third study, Bach et al. (2024, PNAS) [26] applied a hybrid dynamical and data-driven approach to successfully demonstrate the potential for improving subseasonal monsoon prediction. As derived from their study, Figure 1 displays a correlation above 0.5 over a 46-day period in two predictions.
Table 1. A list of major AI-powered systems.
| Study | Model’s Name | AI Technology | Data | Simulation Length | Evaluation Metric | Remark |
|---|---|---|---|---|---|---|
| Weyn et al. (2020) [14] | Deep Learning Weather Prediction (DLWP) | CNN | ERA5, 1979–2018, 2° | up to 7 days | RMSE, ACC | |
| Weyn et al. (2021) [15] | | CNN | ERA5, 1979–2018, 1.4° | up to 6 weeks | RMSE, ACC, Continuous Ranked Probability Score (CRPS) | |
| Rasp and Thuerey (2021) [16] | WeatherBench ResNet | Residual Neural Network (ResNet) | ERA5, 1979–2018; CMIP6, climate model simulations | up to 5 days | RMSE, ACC | |
| Bi et al. (2023) [18] | Pangu-Weather | (modified) Vision Transformer | ERA5, 1979–2017, 2.5° | up to 7 days | RMSE, ACC | |
| Selz and Craig (2023) [24] | the same | the same | the same | up to 72 h | RMSE, ACC | study butterfly effect |
| Bouallègue et al. (2024) [27] | the same | the same | the same | up to 10 days | the same | in an operational-like context |
| Lam et al. (2023) [23] | GraphCast | Graph Neural Network (GNN) | ERA5, 1979–2018, 2.5° | up to 14 days | RMSE, ACC | developed by Google |
| Pathak et al. (2022); Bonev et al. (2023) [17, 19] | FourCastNet | Vision Transformer with Fourier Neural Operators | ERA5, 1979–2018, 2.5° | up to 1 or 2 weeks | ACC | manuscript posted; sponsored by Nvidia |
| Watt-Meyer et al. (2023) [25] | ACE | the same | the same FVGFS | 10 years | RMSE, time-mean RMSE | ACE stands for AI2 Climate Emulator |
| Nguyen et al. (2023) [22] | ClimaX | Vision Transformer | CMIP6, 1850-current, various; ERA5, 1979–2018, 2.5° | up to 1 month | RMSE, ACC | sponsored by Microsoft |
| Chen, Zhong, et al. (2023) [21] | FuXi | modified Vision Transformer | ERA5, 1979–2018, 2.5° | up to 15 days | RMSE, ACC, CRPS | |
| Li et al. (2024) [28] | FuXi-S2S | Enhanced FuXi base model with other modules | ERA5, 1950–2021, 1.5° | up to 42 days | TCC, RPSS, BSS, COR | manuscript posted 14 Feb 2024 |
| Chen, Han, et al. (2023) [20] | FengWu | a cross-modal fusion transformer | ERA5, 1979–2018, 2.5° | up to 14 days | RMSE, ACC | |
| Bach et al. (2024) [26] | hybrid dynamical and data-driven methods | EOF, Neural network architecture, Ensemble Oscillation Correction (EnOC) | ERA5, 1979–2018, 2.5°; IMD rainfall, 1901–2016 | up to 46 days | RMSE, ACC, Bivariate Correlation Coefficient | |
Further studies utilizing AI-powered systems include, but are not limited to, works by Bodnar et al. (2024) [34], Kochkov et al. (2024) [35], Lang et al. (2024) [36], Mardani et al. (2023) [37], Price et al. (2024) [38], Vonich and Hakim (2024) [39], and Wu and Xue (2024) [40], all demonstrating the swift advancements facilitated by AI. Notably, Wu and Xue (2024) [40] conducted a thorough review of AI-based models from a developmental viewpoint. Lang et al. (2024) [36] described the ECMWF’s AI-based system. Kochkov et al. (2024) [35] introduced a novel model that integrates PDEs and ML to generate ensemble weather forecasts more accurately, as assessed by CRPS, than the existing ECMWF model. Vonich and Hakim (2024) [39] reported a predictability of 23 days for the Pacific Northwest heatwave.
Although the above systems have not yet established a new predictability horizon, viewing the two-week limit as a predictability hypothesis, as we suggest, makes it easier to understand why the promising results above for specific weather systems are possible, and it encourages attempts to prove or disprove the hypothetical two-week predictability limit. More importantly, AI-powered methods provide alternative, cost-effective approaches.
Figure 1. Time-varying correlation coefficient between predicted and observed monsoon intraseasonal oscillation (MISO) modes, for forecasts initiated on the 1st of July, August, and September, spanning the years 2008 to 2016. (Bach et al. (2024) [26])
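To make the kind of curve in Figure 1 more concrete, the sketch below computes a correlation between a two-component observed index and its forecast at each lead time, then reports how long the correlation stays above 0.5. It assumes the bivariate correlation form commonly used for two-component oscillation indices; whether Bach et al. (2024) [26] use exactly this form is not confirmed here. The synthetic data (a unit-amplitude oscillation whose forecast phase error grows by a hypothetical 0.035 rad per day) and the helper names are invented for illustration and are not the MISO data behind the figure.

```python
import math

def bivariate_correlation(obs1, obs2, fc1, fc2):
    """Bivariate correlation between two-component observed and forecast indices."""
    numerator = sum(a1 * b1 + a2 * b2
                    for a1, a2, b1, b2 in zip(obs1, obs2, fc1, fc2))
    obs_norm = math.sqrt(sum(a1 ** 2 + a2 ** 2 for a1, a2 in zip(obs1, obs2)))
    fc_norm = math.sqrt(sum(b1 ** 2 + b2 ** 2 for b1, b2 in zip(fc1, fc2)))
    return numerator / (obs_norm * fc_norm)

def skill_horizon(correlations, threshold=0.5):
    """Index of the first lead time whose correlation falls below the threshold."""
    for lead, value in enumerate(correlations):
        if value < threshold:
            return lead
    return len(correlations)

# Synthetic example: 36 forecast cases with evenly spaced observed phases.
phases = [2.0 * math.pi * k / 36 for k in range(36)]
phase_error_per_day = 0.035  # hypothetical phase drift of the forecast (rad/day)

correlations = []
for lead in range(47):  # lead times 0..46 days
    drift = phase_error_per_day * lead
    obs1 = [math.cos(p) for p in phases]
    obs2 = [math.sin(p) for p in phases]
    fc1 = [math.cos(p + drift) for p in phases]
    fc2 = [math.sin(p + drift) for p in phases]
    correlations.append(bivariate_correlation(obs1, obs2, fc1, fc2))

print("first lead time (days) with correlation below 0.5:",
      skill_horizon(correlations))
```

In this idealized setup the correlation at each lead time equals the cosine of the accumulated phase error, so the 0.5 threshold corresponds to a phase error of 60 degrees. Real forecasts also lose amplitude, which this sketch ignores.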
By guiding the choice of numerical results from real-world models, the concept of Lorenz’s chaos, characterized by aperiodic features and instability, indirectly influenced the establishment of the two-week predictability limit. Specifically, the key characteristics of Lorenz’s 1963 and 1969 models provide evidence for the existence of a predictability limit. However, it is important to note that the underlying mechanisms or sensitivities that lead to finite predictability differ between these models (e.g., as discussed in Shen et al. 2021a [41], 2022a, c [42, 43]).
Lorenz’s 1963 model is characterized by its limited scale and chaotic nature, while the Lorenz 1969 model is closure-based, physically multiscale, mathematically linear, and numerically ill-conditioned. Furthermore, as elaborated on in Section 3.1.2 of Shen et al. 2022a [42], the Lorenz 1969 ill-conditioned system tends to easily capture numerical instability.
As discussed by Shen et al. (2023a) [44] and Shen (2023a) [45], most of Lorenz’s models did not incorporate spatial and time-varying “backgrounds.” For example, the Lorenz 1963 model utilized time-independent parameters and the Lorenz 1969 model applied a time-independent kinetic energy spectrum, as well as assumed homogeneity and isotropy. To address this limitation, we employed a generalized Lorenz model (Shen 2019a [46]; Shen et al. 2019 [47]) and applied a time-varying parameter that emulates the impact of slowly varying variables.
As illustrated by Shen et al. (2021a) [41], the Rayleigh parameter is set to a periodic function of time that allows different types of solutions; equations for the generalized Lorenz model are additionally provided in the Supplementary Materials of Shen et al. (2021a) [41]. Our findings revealed the coexistence of chaotic and non-chaotic properties, including nonlinear oscillations (i.e., limit cycle solutions), and the coexistence of rapidly and slowly varying solutions (Shen et al. 2021a) [41]. Such findings challenge the conventional view that the system is solely chaotic and suggest a revised view on the dual nature of chaos and order with coexisting short-term and long-term predictability. For example, the appearance of the theoretical nonlinear oscillations could lend support to the existence of oscillations such as the monsoon intraseasonal oscillation (MISO) with better predictability, as shown in Figure 1 (Bach et al. 2024) [26]. The concept of attractor coexistence also helped us uncover regional dependencies, such as blocking patterns and seasonal variations.
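The idea of a periodically varying Rayleigh parameter can be sketched with a short numerical experiment. The example below is only a three-variable stand-in based on the classical Lorenz (1963) equations, not the higher-dimensional generalized Lorenz model of Shen (2019a) [46]. The functional form r(t) = r_mean + r_amp sin(2πt/period) and all numerical values are assumptions chosen for illustration, not the settings used in Shen et al. (2021a) [41].

```python
import math

def rayleigh(t, r_mean=28.0, r_amp=15.0, period=50.0):
    """Hypothetical periodic Rayleigh parameter r(t) (assumed form and values)."""
    return r_mean + r_amp * math.sin(2.0 * math.pi * t / period)

def rhs(t, state, sigma=10.0, b=8.0 / 3.0):
    """Lorenz (1963) tendencies with the time-varying Rayleigh parameter."""
    x, y, z = state
    r = rayleigh(t)
    return (sigma * (y - x), x * (r - z) - y, x * y - b * z)

def rk4_step(t, state, dt):
    """One fourth-order Runge-Kutta step for the non-autonomous system."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = rhs(t, state)
    k2 = rhs(t + 0.5 * dt, shifted(k1, 0.5 * dt))
    k3 = rhs(t + 0.5 * dt, shifted(k2, 0.5 * dt))
    k4 = rhs(t + dt, shifted(k3, dt))
    return tuple(s + dt / 6.0 * (q1 + 2.0 * q2 + 2.0 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))

def integrate(state0, dt=0.01, n_steps=10000):
    """Return a list of (t, r(t), x, y, z) samples along the trajectory."""
    samples = []
    state = tuple(state0)
    for step in range(n_steps + 1):
        t = step * dt
        samples.append((t, rayleigh(t)) + state)
        state = rk4_step(t, state, dt)
    return samples

trajectory = integrate((1.0, 1.0, 1.0))
for t, r, x, y, z in trajectory[::1000]:
    print(f"t={t:6.1f}  r={r:5.1f}  x={x:8.3f}  y={y:8.3f}  z={z:8.3f}")
```

With the classical values sigma = 10 and b = 8/3, the constant-parameter system has stable nontrivial equilibria for r below about 24.74 and chaotic solutions at r = 28, so letting r(t) sweep across that range is one simple way to explore how regular and irregular intervals can alternate in a single run.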
Furthermore, as depicted in Figure 5 of Shen et al. (2022c) [43], our generalized Lorenz model results suggest the possibility of regime transitions between regular and chaotic solutions. Both the previously mentioned studies and recent research (e.g., Zeng 2023) [48] support Lorenz’s 1997 updated perspective, suggesting a possibility for the coexisting long predictability of ENSO and short-term predictability (i.e., 2-week predictability, when applicable).
The “Next Generation Earth System Prediction: Strategies for Subseasonal to Seasonal Forecasts” report by the National Academy of Sciences (2016) [49] underscores the societal benefits of the broad implementation of subseasonal to seasonal (S2S) forecasts, which provide predictions ranging from two weeks to twelve months ahead. The report projects that S2S forecasts will become as commonplace as daily weather reports, playing a vital role in sectors such as agriculture, water management, and public health. These forecasts are pivotal in reducing risks associated with extreme weather events, thereby safeguarding lives and reducing economic losses. The report points out a significant disparity in the support for S2S forecasts compared to immediate weather predictions and long-term climate projections, and it identifies substantial challenges, including the need for forecasts that align more closely with the specific temporal and spatial requirements of users. It calls for collaborative efforts between physical and social scientists to enhance the applicability of S2S forecasts across various decision-making scenarios. Concurrently, as mentioned earlier, recent advancements in AI and ML have produced promising forecasts that extend beyond two weeks (refer to Figure 1) and have improved forecast customization through techniques such as downscaling, as demonstrated by Mardani et al. (2023) [37], highlighting the critical role of data scientists in this field.
Building on the insights from this report and previous research, we are optimistic that ongoing innovations in both practical and theoretical modeling, augmented by artificial intelligence, will continually enhance our understanding and prediction of weather phenomena. These advancements are especially beneficial for investigating pivotal phenomena such as the butterfly effect (model sensitivities), multiscale interactions, and multistability, which encompass feedback from numerically or physically small-scale processes, modulation by large-scale systems, and predictability that varies with different types of weather systems. To advance these efforts, we propose a series of research areas that would benefit from an integrated approach combining PDE-based methods and AI-enhanced techniques.
For additional details, please refer to Shen et al. (2024) [50].
To provide a clearer perspective on the two-week predictability limit, we have conducted a systematic study of Lorenz’s models and predictability, collectively suggesting the following statement:
“Much like Moore’s Law in the realm of computing, the predictability limit hypothesis, specifically the two-week predictability limit, is an empirical association based on practical modeling and idealized chaotic modeling from the 1960s. It stands as a limited set of observed findings and as a reasonable extrapolation from early modeling results during the 1960s, rather than constituting fundamental physics.”
The above, referred to as the Predictability Limit Hypothesis, summarizes the historical context of predictability research, encompassing real-world models, theoretical models, and other approaches from the 1960s. Our reevaluation highlights quantitative estimates of two-week predictability using the Mintz–Arakawa model, as well as qualitative, finite predictability within Lorenz’s 1963 and 1969 models, under the leadership of Charney et al. This concept also aligns with Lorenz’s evolved perspective on predictability limits in the 1990s and 2000s.
Our studies, combined with our previous research (e.g., Shen et al. 2021a; 2022c [41, 43]), highlight cumulative advancements in both PDE-physics-based and AI-powered systems since the 1960s, which have shown promising results in long-term simulations. Although these simulations extend beyond the traditional two-week limit, they do not contradict the Predictability Limit Hypothesis based on 1960s models. This new concept helps explain why the practical capabilities of current models are still consistent with the major findings of finite predictability within Lorenz’s theoretical models.
Weather prediction is a combined boundary-initial value problem, and recognizing this imposes constraints on temporal changes. This concept can also be extended to examine predictability at seasonal, yearly, decadal, and longer climate time scales. The feasibility of this approach was illustrated by developing a unified weather and climate model, which led to successful short-term weather predictions in the early 2000s (e.g., Lin et al. 2003 [51]; Atlas et al. 2005 [52]; Shen et al. 2006a, b, 2010, 2011 [53, 54, 6, 7]). In comparison, AI-powered systems trained over several decades using ERA5 reanalysis and/or CMIP6 data can be viewed as weather-climate unified systems. Both weather and climate forecasts predict changes in the state of variables such as temperature and precipitation, then compute their averages over different time scales. With weather, this can involve hourly or daily averages, while climate focuses on yearly and longer-term statistics. Both forecasts essentially use the same mathematical framework (with climate involving more nonlinear interactions) within PDE-physics-based or AI-powered systems. The question remains whether and how the concepts discussed for short time scales can extend to climate prediction at longer time scales.