Monitoring and forecasting hospitalization rates are essential for public health systems to understand and manage overall healthcare delivery and to plan for long-term sustainability. Early-stage prediction of hospitalization rates is crucial for meeting the medical needs of large numbers of patients during emerging epidemic diseases such as COVID-19. Nevertheless, this is a challenging task because data and experience are insufficient. In addition, existing work neglects or fails to exploit the contribution of external factors such as news, policies, and geolocations. Herein, researchers demonstrate the significant relationship between hospitalization rates and COVID-19 infection cases. A transfer learning architecture with dynamic location-aware sentiment and semantic analysis (TLSS) is adapted to a new application scenario: hospitalization rate prediction during COVID-19. This architecture learns general transmission patterns of existing epidemic diseases and transfers them to predict hospitalization rates during COVID-19. Researchers combine the learned knowledge with time series features and news sentiment and semantic features in a dynamic propagation process. Extensive experiments compare the proposed approach with several state-of-the-art machine learning methods at different lead times relative to the ground truth. The results show that TLSS exhibits outstanding predictive performance for hospitalization rates. Thus, it provides advanced artificial intelligence (AI) techniques for supporting decision-making in healthcare sustainability.
1. Introduction
Healthcare systems face severe pressure and sustainability challenges as priorities shift during widespread pandemics such as COVID-19 [1][2][3]. In 2019, the coronavirus (COVID-19) outbreak in Wuhan, China, rapidly spread to over 228 countries [4][5][6]. This emerging infectious disease became a pandemic of alarming scale within a short period. In January 2020, it was declared a public health emergency of international concern by the WHO [7]. The Centers for Disease Control and Prevention (CDC) announced that more than 98 million confirmed cases and more than 1 million deaths have been recorded in the United States [8]. This public health issue has forced the world to reconsider existing sustainability strategies in healthcare, especially for responding rapidly to crises when a new pandemic breaks out [9]. Despite the intense growth of COVID-19 research and the current demand for a wide range of healthcare services, relatively little progress has been made in improving healthcare sustainability [10]. Healthcare sustainability refers to the ability of healthcare systems to meet present and future healthcare needs while simultaneously considering social, economic, and environmental factors [11]. It involves the provision of high-quality healthcare services, efficient resource allocation, and the promotion of positive health outcomes [12][13][14]. Achieving these purposes requires effective and accurate AI support for big data analysis that derives meaningful information for decision-making [15].
2. Hospitalization Rate Forecasting
Early monitoring and forecasting of hospitalization rates provide valuable opportunities for health organizations and public authorities to adjust sustainability strategies. We aim to design an analytic framework using machine learning methods to accurately and effectively predict hospitalization rates during emerging pandemics (e.g., COVID-19). Existing research on healthcare analytics and forecasting has made great progress in various aspects. Mathematical methods, such as stochastic processes, Markov decision processes, and compartmental models, show great success in the theoretical analysis of macroscopic regularities of epidemic transmission, like the epidemic threshold and epidemic infection scale [16]. Yet the assumption of homogeneous data and the small set of variables these methods consider are insufficient to capture the variety of factors related to epidemic transmission processes [17]. Other statistical models such as autoregressive (AR), autoregressive moving average (ARMA), and seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) are convenient and straightforward for obtaining accurate results in short-term time series analysis [18][19][20]. Nevertheless, their performance decreases in relatively long-term forecasting because of the evolution of COVID-19 and the impact of multiple complex factors over time. Although deep learning approaches, such as deep neural networks (DNN), recurrent neural networks (RNN), and temporal fusion transformers (TFT), are burgeoning methods for learning temporal patterns from different perspectives [21], these models need to include, or be extended to include, social factors in order to predict healthcare performance indicators.
In general, several grand challenges lie in hospitalization rate forecasting, especially with sparse data and deficient historical experience. First, most existing mathematical and statistical methods work in isolation and cannot exploit previous experience from existing diseases in the forecasting problems of an emerging disease. For instance, they fail to transfer and utilize the learned knowledge of existing diseases (e.g., flu) to predict hospitalization rates during new epidemics (e.g., COVID-19). Second, most studies cannot directly and accurately capture the temporal dependencies of cultural/social factors during an emerging epidemic, such as the growing hospitalization rates caused by large-scale epidemic outbreaks after holidays and festivals in November and December 2021. Third, existing research in healthcare sustainability overlooks the importance of forecasting techniques for a long-term development strategy. Accurate and effective analysis results can support reliable and intelligent decisions that benefit health system management.
In this paper, we propose a novel analytical framework from the perspective of data science to provide more accurate monitoring and forecasting results for hospitalization. Given the delay in data monitoring and collection, we initially tackle the issue of hospitalization rate forecasting based on CDC track data of 50 US states with a lead time from 1 to 14 days. (Lead time is the time span by which the model forecasts in advance. For instance, if the input is $X_T$ and lead time = 14, the expected output is $X_{T+14}$, where $T$ is the window size.) Some studies indicate similar evolving patterns between existing contagious diseases and newly emerging diseases [22]. It is intuitive to conduct research in the initial phase of pandemics, based on the critical information and clues hidden in epidemic emergence and persistence mechanisms [23]. In this work, we aim to exploit the experience of existing infectious diseases, such as influenza (flu), to forecast hospitalization rates during an emerging pandemic, such as COVID-19. Due to data scarcity, learning and transferring knowledge directly from historical hospitalization data of existing diseases (e.g., flu) is difficult. We use non-linear correlation tests to demonstrate the significant relationship between infection cases and hospitalization rates [24]. Based on the discovered significant correlations, we apply a Heterogeneous Transfer Learning (HTL) approach to learn common characteristics from the rich infection case data of flu, and transfer the learned knowledge to predict hospitalization rates during COVID-19. Several prior studies have demonstrated the association of social factors with healthcare problems, such as the mediating impact of human awareness and behavior change [25][26]. Despite the potentially valuable information in text data, it is under-utilized in time series prediction. In our work, we analyze the effect of social factors on hospitalization rates during COVID-19 from two aspects: sentiment and semantic features of COVID-19 related news articles. Specifically, we address three motivating questions: (1) Will public sentiments and attitudes (e.g., pessimistic or optimistic) affect hospitalization rates? (2) Will public policies (e.g., lockdown and quarantine) affect hospitalization rates? (3) Will the news information from different locations affect local health situations (e.g., COVID-19 rates and hospitalization rates)?
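To make the lead-time definition concrete, the following is a minimal sketch of how input windows could be paired with targets observed a fixed number of days later. The function name `make_lead_time_pairs`, the sliding of the window across the series, and the toy values are assumptions made for illustration; they are not taken from the TLSS implementation.

```python
from typing import List, Sequence, Tuple


def make_lead_time_pairs(
    series: Sequence[float], window: int, lead_time: int
) -> List[Tuple[List[float], float]]:
    """Pair each input window with the value observed `lead_time` steps later.

    A window that ends at index `end` (covering `window` consecutive values)
    is paired with the value at index `end + lead_time`.
    """
    if window < 1 or lead_time < 1:
        raise ValueError("window and lead_time must be positive")
    pairs = []
    # The window must end early enough that the target index still exists.
    last_end = len(series) - lead_time
    for end in range(window - 1, last_end):
        inputs = list(series[end - window + 1 : end + 1])
        target = series[end + lead_time]
        pairs.append((inputs, target))
    return pairs


# Toy daily hospitalization rates (made-up numbers, for illustration only).
rates = [float(v) for v in range(1, 11)]  # 1.0, 2.0, ..., 10.0
pairs = make_lead_time_pairs(rates, window=3, lead_time=2)
```

With a window of 3 and a lead time of 2, the first pair uses the first three observations as input and the fifth observation as the target. A longer lead time leaves fewer usable pairs from the same series, which is one reason forecasting further ahead is harder when data are scarce.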
This paper proposes an analytical framework using machine learning techniques to provide AI technical support for healthcare sustainability. We formulate the problem as predicting hospitalization rates during emerging epidemics (e.g., COVID-19) using limited historical time series data and epidemic-related news articles. Our key contributions can be summarized as follows: (1) We apply the transfer learning architecture with dynamic location-aware sentiment and semantic analysis (TLSS) [27], which was originally designed for emerging epidemic forecasting. We extend TLSS to a new application scenario: hospitalization rate prediction during the outbreak of an emerging pandemic. (2) We leverage non-linear correlation tests to demonstrate the significant correlation between COVID-19 infection cases and hospitalization rates. This correlation allows us to use the rich infection data of existing diseases for hospitalization rate forecasting during an emerging disease outbreak. (3) We use sentiment and semantic analysis methods to extract relevant features from news articles. We apply multimodal data learning within TLSS to learn the impact of news sentiment and semantic information on hospitalization and to interpret non-traditional variation patterns. (4) We then concatenate the learned information from the infection records of an existing disease (e.g., flu), COVID-19 news semantic/sentiment features, and temporal dependencies in local time series data to forecast hospitalization rates during COVID-19 in a dynamic propagation process. (5) We conduct state- and country-level experiments on real-world hospitalization data during COVID-19 with different time settings. We evaluate the performance of various state-of-the-art methods with exogenous variables to demonstrate the efficacy and flexibility of TLSS in different application scenarios (e.g., hospitalization rate forecasting).
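As an illustration of contribution (2), the sketch below computes a rank-based correlation between infection cases and hospitalization rates. The text does not name the specific non-linear correlation test, so the choice of Spearman's rank correlation is an assumption, as are the helper names and the toy data. Spearman's coefficient measures monotonic association, which may be non-linear; assessing statistical significance would need an additional step not shown here.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): rates grow with cases, but not linearly.
cases = [10.0, 20.0, 40.0, 80.0, 160.0, 320.0]
hosp_rate = [math.sqrt(c) for c in cases]
rho = spearman(cases, hosp_rate)
```

Because the toy hospitalization rates increase monotonically with cases, the rank correlation is at its maximum even though the relationship is not linear, which is the kind of dependence a linear-only measure can understate.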
Overall, our research provides valuable statistical evidence and support that can enhance the sustainability of healthcare systems. We have developed and optimized an early-stage forecasting method for hospitalization rates during emerging epidemics, which can help predict the expected volume of patients in advance. Knowing the likely number of future hospitalizations allows health systems to manage costs accordingly, primarily to keep health costs under control and sustainable over the long term during pandemics. Furthermore, our method offers healthcare providers the opportunity to anticipate future demand, adjust medical resource allocation and staffing levels, and prevent hospitals from becoming overburdened. By forecasting hospitalization rates, our proposed method also helps public health institutions enhance patient care planning and coordination, improve service quality, and achieve overall healthcare sustainability.
In the future, we intend to refine our AI analytics framework and improve its flexibility, adapting it to various scenarios by continuously updating it and purposefully applying appropriate techniques. We plan to provide comprehensive, readable, and visualized forecasting results that enable broad application in healthcare sustainability and respond to the needs of both health providers and patients. Additionally, we aim to extend the exploration of environmental, social, and governance (ESG) factors, including governance and climate risks, which may also have specific impacts on hospitalization rates or other critical health-related indicators. Such further research will provide a valuable and comprehensive analysis that supports the long-term development of healthcare sustainability.