The COVID-19 epidemic has caused a large loss of human life and havoc in economic, social, and health systems around the world. Controlling such an epidemic requires understanding its characteristics and behavior, which can be identified by collecting and analyzing the related big data. Big data analytics tools play a vital role in building the knowledge required for making decisions and taking precautionary measures. However, due to the vast amount of data available on COVID-19 from various sources, there is a need to review the roles of big data analysis in controlling the spread of COVID-19, to present the main challenges and directions of COVID-19 data analysis, and to provide a framework of the related existing applications and studies to facilitate future research on COVID-19 analysis.
On 30 January 2020, the World Health Organization (WHO) declared the spread of COVID-19 a public health emergency of international concern. Afterward, the government of the Kingdom of Saudi Arabia urgently took several strict measures to limit the spread of the pandemic within the regions of Saudi Arabia [1][2]. The Saudi Ministry of Health (MoH), like health authorities in many other countries, implemented the WHO recommendations related to the identification and isolation of suspected COVID-19 cases.
Nevertheless, the pandemic has spread dramatically, with more than 82 million people infected and more than one million deaths [3]. The rapid spread of the pandemic, with its continuously evolving patterns and the variation in its symptoms, makes it more difficult to control. Moreover, the pandemic has affected health systems and the availability of medical resources in several countries around the world, contributing to the high death rate [4].
A regular monitoring and remote detection system for individuals will assist in the fast-tracking of suspected COVID-19 cases. Moreover, using such systems will generate a huge amount of data, which will provide many opportunities for applying big data analytics tools [5] that are likely to improve the level of healthcare services. A large number of open-source tools, such as the big data components of the Apache project [6], are designed to operate in cloud computing and distributed environments and can assist in the development of big data-based solutions. Big data is commonly described by six key characteristics, called the Six V’s [7]: Value, Volume, Velocity, Variety, Veracity, and Variability. The original definition, however, considers only three V’s, namely Volume, Velocity, and Variety [8].
The big data characteristics apply to data acquired from the healthcare sector, which increases the tendency to use big data analysis tools to improve sector services and performance. There are wide applications of big data analytics in the healthcare sector, including genomics [9], drug discovery and clinical research [10], personalized healthcare [11], gynecology [12], nephrology [13], oncology [9][12], and several other applications found in the literature. However, in this paper, we present the contributions of the most important review papers found in the literature that cover the field of big data in healthcare. We also investigate the opportunities and challenges for applying big data analytics tools to COVID-19 data and provide findings and future directions at the end of the paper.
Wearable technology is a promising option and is expected to be one of the primary sources of health information, given its widespread availability and acceptance by people. In a survey conducted in January 2020, 88% of the 4600 subjects included in the study indicated a willingness to use wearable technology to measure and track their vital signs. In addition, 47% of chronically ill patients and 37% of non-chronically ill patients reported a willingness to blindly share their health information with healthcare research organizations. Of the same group, 59% said they would likely use artificial intelligence (AI)-based services to diagnose their health symptoms [14]. People sharing such data routinely will greatly increase the volume of data, which calls for planning the design and implementation of data analysis tools and models in this sector.
Several studies used big data for sentiment analysis, such as Reference [15], which linked social media behavior to political views, opinions, and expressions. The study consisted of a representative survey conducted on 62.5% of adults from Chile, and it showed the huge effect of social media on changing people’s opinions regarding political views and elections. Similarly, the authors of Reference [16] studied how management responses to online customer reviews affect customers’ choice of facilities or hotels. They found a positive correlation between the response and customer satisfaction. The authors of Reference [17] reviewed classification techniques, including deep and convolutional approaches, for identifying a writer from their handwriting. They discussed several challenges in identification related to language characteristics, scripts, and the lack of datasets. The authors of Reference [18] reviewed and analyzed recent papers on the latest developments, capabilities, and benefits of big data analytics. Their study showed that big data can support business industries in many functions, including prediction, planning, management, decision-making, and traceability. A limitation of their study is that data sources were hard to find due to privacy restrictions and the confidentiality of the information. Moreover, the authors of Reference [19] surveyed numerous papers on mathematical models for improving the efficiency of detecting and predicting COVID-19. Their survey suggested using artificial intelligence to detect COVID-19 cases, big data to trace cases, and nature-inspired computing (NIC) to select suitable features to increase the accuracy of detection. Some surveys studied heart-related diseases and offered recommendations and guidelines, such as Reference [20], to help people understand the causes and symptoms of heart failure and the most affected groups. The authors stated that heart failure can worsen a patient’s condition, especially in patients with serious illnesses.
Analyzing health data in real-time with the utilization of AI techniques will have a vital role in predictive and preventive healthcare. For example, it will help predict the sites of infection and the flow of the virus. It will also help in estimating the needs of beds, healthcare specialists, and medical resources during such pandemic crises as well as in the diagnosis and characterization of the virus [21].
The spread of the global pandemic, COVID-19, has generated a huge and varied amount of data, which is increasing rapidly. This data can be used by applying big data analytics techniques in multiple areas, including diagnosis, risk score estimation or prediction, healthcare decision-making, and the pharmaceutical industry [22]. Figure 1 shows examples of potential application areas.
Figure 1. Potential application areas of big data analytics for COVID-19.
In the following subsections, we present several examples of COVID-19 data utilization from the literature, with a primary focus on studies that have provided solutions to control the COVID-19 pandemic and that fall within one of three areas, namely (1) diagnosis, (2) risk score estimation or prediction, and (3) healthcare decision-making. We also summarize the data analysis techniques and the data types used in each study in Table 1.
Table 1. Data analysis technique, type, source, and findings of the existing studies.
Area | Ref | Aim | Technique | Used Data Type | Data Source | Findings |
---|---|---|---|---|---|---|
Diagnosis | [23] | Develop a diagnosis model for COVID-19 detection and diagnosis of symptoms to define appropriate care measures | Best Worst Method (BWM) | Symptoms and CT scans | Body sensors | The model can differentiate COVID-19 from four other viral chest diseases with 98% accuracy |
Diagnosis | [24] | Design a medical device to detect and track respiratory symptoms of COVID-19 | N/A | Symptoms | Headsets and mobile phone | The approach provided good and stable results and can be expanded to include more sensors to detect other COVID-19 symptoms |
Diagnosis | [25] | Develop a remote patient monitoring program (RPM) for discharged COVID-19 cases | Mixed-effects logistic regression model | Demographics, medical data | The remote monitoring program, pulse oximeter, and thermometer | RPM provides scalable remote monitoring capabilities and decreases readmission risk |
Diagnosis | [26] | Investigate the usefulness of smartwatches in pre-symptomatic COVID-19 detection | Two anomaly detection models (RHR-Diff and HROS-AD) | Demographics, activity, medical data, COVID-19 status | Smartwatches and MyPHD mobile app | Respiratory infections can be detected through activity tracking and health monitoring via wearable devices |
Diagnosis | [27] | Identify symptoms associated with positive COVID-19 cases | Principal component analysis (PCA) and logistic regression model | Demographics, medical data | Screening via phone and COVID-19 PCR test | Fever, anosmia/ageusia, and myalgia were the strongest signs of positive COVID-19 cases, while negative cases mostly had no symptoms or symptoms limited to nasal congestion/sore throat |
Diagnosis | [28] | Determine the clinical characteristics and outcomes of COVID-19 patients in the NY area | N/A | Demographics, medical data, COVID-19 status | Northwell Health system | The common comorbidities were obesity, hypertension, and diabetes. Of the patients who were discharged or died (n = 2634): 21% died, 14.2% were treated in the ICU, 12.2% received MV, and 3.2% were treated with kidney replacement |
Diagnosis | [29] | Distinguish COVID-19 cough sound from other respiratory diseases through crowdsourced data | Logistic Regression (LR), Gradient Boosting Trees, and Support Vector Machines (SVMs) | Demographics, medical data, COVID-19 data | Web app and Android app | Wet and dry cough are the common symptoms of positive COVID-19 cases, whereas chest tightness and the lack of smell are the common combination symptoms |
Diagnosis | [30] | Discuss the importance of developing complementary technologies to diagnose and monitor COVID-19 infections | N/A | Activity data, medical data | Sensors | Recommend deploying advanced wearable technologies configured to directly address needs in COVID-19 monitoring and symptom detection |
Diagnosis | [31] | Identify the clinical characteristics of COVID-19 to help in mapping the disease and guiding pandemic management | N/A | Demographics, medical data, COVID-19 status, travel data | Health Electronic Surveillance Network (HESN) database for all Saudi Arabia regions | Fever and cough were common symptoms in the study sample |
Diagnosis | [32] | Employ a two-stage cascading platform to enhance the accuracy of machine learning models | Progressive machine learning technique merged with Spark-based linear models, Multilayer Perceptron (MLP), and LSTM | Medical data | Cardiac Arrhythmia Database and Uniform Resource Locator (URL) Reputation Dataset from the University of California Irvine Machine Learning (UCI ML) Repository | Using an improved algorithm with two-step data analysis platforms can increase accuracy in lower computation time |
Diagnosis | [33] | Analyze the dense layers of a convolutional network to increase the accuracy of image classification for diabetic retinopathy | Deep learning model | Medical data, demographics data | The Messidor-2 dataset from the hospital | Using improved programming technology can enhance accuracy |
Diagnosis | [34] | Analyze the effects of COVID-19 on patients with cardiovascular disease | Generalized linear mixed model | Demographics, medical data, COVID-19 status | EHRs from General Hospital of Central Theatre Command in Wuhan, China | Middle-aged and elderly heart patients are most likely to have COVID-19, whereas new-onset hypertension and heart injury are common complications of severe COVID-19 cases |
Estimate or Predict Risk Score | [35] | Specify the effect of COVID-19 on the cardiovascular system | Multi-factor logistic regression model | Demographics and medical data | EHRs | Cardiac function and vital signs should be monitored in COVID-19 patients, especially those with hypotension, pericardial effusion, or severe myocardial injury |
Estimate or Predict Risk Score | [36] | Develop and validate a risk score to predict adverse events of suspected COVID-19 patients | Least absolute shrinkage and selection operator (LASSO) and logistic regression models | Demographics and medical data | 15 EDs in Southern California | COVAS score can help physicians to identify patients who may experience a serious event within 7 days |
Estimate or Predict Risk Score | [37] | Discover unregistered suspected COVID-19 patients and infectious places | SIR and θ-SEIHRD mathematical models | Demographics and COVID-19 data | IoT-based system and GPS | The proposed system helps identify people who had close contact with COVID-19 patients |
Estimate or Predict Risk Score | [38] | Verify if the COVID-19 virus can be transmitted through indirect contact | N/A | Demographics, medical, environmental, and other data | Guangzhou CDC database and sample collection | The virus can survive for a short period on surfaces, allowing indirect transmission of infection to uninfected people |
Estimate or Predict Risk Score | [39] | Identify the COVID-19 outbreak impact on the psychological side | Bivariate linear regression | Demographics, medical, social data | Online questionnaire | The COVID-19 outbreak has a significant mental impact on people |
Estimate or Predict Risk Score | [40] | Analyze the risk of tuberculosis skin on getting infected by tuberculosis | Statistical | Medical data, demographics data | Public source | The tuberculin skin can increase the infection by up to 20% |
Estimate or Predict Risk Score | [41] | Predict the course of the COVID-19 epidemic to design a control strategy | A designed mathematical model called SIDARTHE | Demographics, medical, environmental data | Public data from Italian MoH and Italian Civil Protection | Social distancing measures and lockdowns are necessary and effective, and precautionary measures for COVID-19 can only be relieved when tests are conducted on a large scale and a mechanism for contact tracing is in place |
Healthcare Decision-Making | [42] | Evaluate the effectiveness of COVID-19 control measures | C-SEIR model (mathematical model of disease transmission dynamics) | Confirmed COVID-19 data | Public data sources | Quarantine measures have an effective role in containing COVID-19, but they are economically expensive |
Healthcare Decision-Making | [43] | Develop a patient monitoring platform to directly provide the necessary care | N/A | Demographics, medical, COVID-19 data | Online questionnaire via patient monitoring program | Analyzing patient monitoring data helps to know the risk score to determine the care required, allowing optimal consumption of medical resources |
Healthcare Decision-Making | [44] | Provide a platform for data collection and analysis to estimate disease incidence to develop risk mitigation strategies and resource allocation | Weighted prediction model | Demographics, medical, COVID-19, and other data | Mobile app | Existing data collection methods can be repurposed to track and obtain real-time data for the population during any rapid global health crisis |
Healthcare Decision-Making | [45] | Identify the regional distribution of the spread of infection and the percentage of healthcare consumption in each region | N/A | Demographics, medical, and other data | Mobile app | The mobile app can be relied on for self-assessment and data collection; the data can be displayed on an interactive map and linked to COVID-19 test results to support decision-makers and healthcare providers in making decisions |
Healthcare Decision-Making | [46] | Forecast the census and ventilator requirements for a specific hospital | Weibull and conditional distributions (analytical model) | Statistical data | COVID-19 hospitalized patient records | The model can predict the census and the required number of MV in one, three, and seven days after the simulation run date |
Healthcare Decision-Making | [47] | Estimate the need for health services and the number of daily deaths over the next 4 months from the date of the study | Statistical model | COVID-19 and other data | WHO websites and local and national authorities in the US states | The model predicts an increased death rate and demand for medical beds, ICU, and MVs |
Healthcare Decision-Making | [48] | Prove that the three clinical variables: age, fever, tachypnea, can be used to predict the need to admit COVID-19 patients into the ICU | EHRead from Savana [49], and deep learning convolutional neural network classification methods (Prediction model) | Demographics, medical data | EHRs of the hospitals within the Servicio de Salud de Castilla-La Mancha (SESCAM) Healthcare Network in Castilla-La Mancha, Spain | The most common symptoms of male COVID-19 patients with an average age of 58.2 years who were admitted to ICU are coughing, fever, and shortness of breath, while those between 40 and 79 years of age are likely to be admitted to the ICU if they suffer from rapid breathing |
Healthcare Decision-Making | [50] | Pre-risk assessment of the epidemic in Italy and identification of high-risk areas | a-priori effect of hazard and vulnerability model (a-priori E_H_V) | Statistical and environmental data | Data from Italian Ministry of Economic Policy Planning and Coordination, Italian Ministry of Health website, WHO, Italian Ministry of Agriculture, and ISTAT database | The risk of a pandemic is higher in some northern regions of Italy and the policy model developed can help policymakers make decisions |
Healthcare Decision-Making | [51] | Estimate the remaining period before consuming the operational capacity of the hospital and its resources | Monte Carlo simulation, SIR model, and COVID-19 Hospital Impact Model (CHIME) | Statistical data | Academic health system for three hospitals in the Philadelphia region | The model can help in making proactive decisions |
Suspected COVID-19 cases are diagnosed using the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test. This test takes from around 24 h to several days, depending on multiple conditions. Many countries experienced increased demand for diagnosing suspected COVID-19 cases, which exceeded the available local testing capacity. Therefore, several researchers have proposed alternatives to the COVID-19 RT-PCR diagnostic test, including the following.
The authors in Reference [23] proposed a model to differentiate between COVID-19 and four other viral chest diseases. The model utilizes several body sensors to collect information and monitor the patient’s health condition, including temperature, blood pressure, heart rate, respiratory monitoring, glucose detection, and others. The collected data is stored in a cloud database containing AI-enabled expert systems that help diagnose the symptoms of patients infected with or suspected of having COVID-19 and determine the appropriate procedure for dealing with them. However, it is not clear how the patient’s health information will be presented to the hospital staff.
In Reference [24], the authors provided a flexible and low-cost design of a medical device that can be used to detect and track symptoms of COVID-19. It utilizes headphones and a mobile phone to detect breathing problems. The signals are collected and saved in an audio file format through the mobile app, after which the signals are analyzed using the MATLAB program to identify the respiratory symptoms associated with COVID-19.
Researchers [25] also developed a program to remotely monitor discharged COVID-19 patients. Each patient registered to the app is provided with a pulse oximeter and thermometer to self-report daily symptoms, O2 saturation, and temperature. The abnormal vital signs and symptoms are flagged to be assessed by a group of nurses. Depending on the evaluation outcome, the patient might be readmitted to the Emergency Department (ED). The program helps reduce ED utilization and provides scalable remote monitoring capabilities when a patient is discharged from the hospital.
The authors in Reference [26] found that smartwatches could be utilized in COVID-19 pre-symptomatic detection. They analyzed the physiological and activity data collected from smartwatches of the infected COVID-19 cases. They concluded that 63% of COVID-19 cases could be detected before symptoms appear by applying a two-level warning system based on severe elevations in resting heart rate relative to individual baseline. Moreover, they found that activity tracking and health monitoring using wearable devices can help in early detection of respiratory infections.
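The idea of a two-level warning based on resting heart rate (RHR) elevation relative to an individual baseline can be illustrated with a minimal sketch. It does not reproduce the RHR-Diff or HROS-AD models of Reference [26]; the baseline window, the z-score thresholds, and the function name `rhr_alerts` are assumptions chosen only for illustration.

```python
from statistics import mean, stdev


def rhr_alerts(daily_rhr, baseline_days=7, yellow_z=2.0, red_z=3.0):
    """Label each day after the baseline window as 'none', 'yellow', or 'red'.

    The baseline is the mean and standard deviation of the first
    `baseline_days` readings (a hypothetical choice, not taken from [26]).
    """
    baseline = daily_rhr[:baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)
    alerts = []
    for rhr in daily_rhr[baseline_days:]:
        # Elevation of today's RHR in units of the personal baseline spread.
        z = (rhr - mu) / sigma if sigma > 0 else 0.0
        if z >= red_z:
            alerts.append("red")
        elif z >= yellow_z:
            alerts.append("yellow")
        else:
            alerts.append("none")
    return alerts


# Seven baseline days followed by three monitored days (beats per minute).
print(rhr_alerts([60, 61, 59, 60, 62, 60, 61, 61, 63, 70]))
```

Using a per-person baseline rather than a fixed population threshold reflects the point made above: the signal of interest is a severe elevation relative to the individual's own normal range.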
Since the COVID-19 symptoms have not been fully identified and due to the changing nature of COVID-19, some studies have focused on identifying the medical characteristics and symptoms associated with positive COVID-19 cases. The study in Reference [27] focused on identifying the symptoms associated with the positive results of the COVID-19 examination, and it was conducted on a group of healthcare workers (HCWs). Initial screening was performed by phone, and a COVID-19 PCR test was also performed for each HCW to identify symptoms associated with each case. The study found that the most common symptoms of positive COVID-19 cases were fever, myalgia, and anosmia/ageusia, while the negative cases mostly have no symptoms, or the symptoms are limited to nasal congestion and sore throat.
The study in Reference [28] aimed to determine the clinical characteristics and outcomes of 5700 hospitalized patients with COVID-19 in the NY area. However, the study included non-critically ill patients and the follow-up time was limited.
Another study [29] proposed a website and an Android app to separate COVID-19 cough sounds from other respiratory sounds with the aid of crowdsourced data from about 7000 unique users (more than 200 of whom reported a recent positive test for COVID-19). The proposed method employed Logistic Regression (LR), Gradient Boosting Trees, and Support Vector Machine (SVM) classifiers to distinguish the cough sound data based on gender, age, and symptoms. The classifiers also distinguish users based on other features, such as whether they are asthmatic patients, smokers, or healthy. The app asks the user to cough three to five times and then to repeat the process every two days to update the user’s health status. The method showed that a COVID-19 cough can be distinguished from the coughs of other lung diseases by combining the sound of the cough with the breathing sound to screen for the disorder. It achieved an Area Under the Curve (AUC) of 82% in identifying the cases that tested positive for COVID-19. The authors recommended more studies in the field to specify more characteristics of a COVID-19 cough sound and make it more distinguishable from other respiratory sounds.
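To make the AUC figure concrete, the following minimal sketch computes AUC from classifier scores as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is a generic illustration, not the evaluation code of Reference [29]; the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = tested positive, 0 = tested negative.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value of 0.82 means that a positive case is ranked above a negative one in roughly four out of five pairs.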
The authors in Reference [30] emphasized the importance of using complementary technologies, such as on-body sensors, for diagnosing and monitoring COVID-19 infections. They stated that clinical devices are more reliable and provide more functions than smartwatches, since these devices are distributed over different areas of the human body to detect different body signals. In particular, a thin, soft sensor with a high-bandwidth accelerometer and a precision temperature sensor, placed on the neck, can record respiratory activity ranging from cough frequency, intensity, and duration, to respiratory rate and effort, to high-frequency respiratory features associated with wheezing and sneezing. They also recommended machine learning and predictive algorithms to help diagnose and monitor COVID-19.
In Reference [31], researchers emphasized the importance of identifying the characteristics of COVID-19 among patients in Saudi Arabia for managing the pandemic. The study included 1519 cases, for which data related to age, gender, vital signs, public data, and clinical examinations were collected. Testing was based on the quantitative RT-PCR approach, which is the protocol established by the World Health Organization. After the data was gathered, it was entered into electronic sheets by distinct data collectors and analyzed with the Statistical Package for Social Sciences program, version 24 (SPSS-24). The statistics showed that the most common symptoms of COVID-19 are cough and fever, present in 89.4% and 85% of reported positive cases, respectively. The study also confirmed that the most affected patient groups include elderly males, patients with severe cardiac conditions, and diabetic patients.
The authors in Reference [32] utilized machine learning techniques along with Spark-based linear models, Multilayer Perceptron (MLP), and Long Short-Term Memory (LSTM) in a two-stage cascading platform to enhance the prediction accuracy on different datasets. They applied their method to two datasets, one on cardiac arrhythmia and one on URL reputation, and their model achieved higher accuracy with lower computation time. Similarly, the authors in Reference [33] proposed a computer program to aid a classification model in analyzing retinal images for diabetic retinopathy, a cause of blindness among adults. The study showed that dense connections among the layers of the convolutional network improve the accuracy of the classification result.
The retrospective, observational study in Reference [34] conducted a statistical analysis to show the cardiovascular implications of COVID-19 for patients. The study was performed on 116 patients who tested positive for COVID-19. The data was collected clinically and examined to extract clinical symptoms and signs, chest computed tomography, treatment measures, and medical records. The statistical analysis revealed results similar to those reported in Reference [31]: the common symptoms were fever and dry cough, and elderly or middle-aged males, patients with heart injury, patients with hypertension, and diabetics were the most affected populations.
Estimating the risk score helps in determining the care level and priority for each patient and gives insight into the necessary proactive measures. In the following section, we present the studies that cover this area.
In Reference [35], the authors aimed to validate the hypothesis that COVID-19 infection could lead to serious cardiovascular disease. They performed a statistical analysis using a multi-factor logistic regression model to analyze COVID-19-related causes. The study was conducted on 54 patients of different ages, genders, and vital signs, of whom 39 were diagnosed as severe COVID-19 cases and 15 as critical COVID-19 cases. The data was collected clinically from the patients through attached vital sign measurement devices that were updated every four hours. The results showed that elderly males, diabetic patients, and patients with hypotension are more likely to develop a serious heart-related condition and need more care. The study is limited by its small sample size, and the authors suggested a larger sample to verify the results.
The authors in Reference [36] developed and validated a risk score to predict adverse events among patients suspected of having COVID-19. They conducted a retrospective cohort study of adult visits to the emergency department. The primary outcome was death or respiratory decompensation within 7 days. To derive the risk score, they used Least Absolute Shrinkage and Selection Operator (LASSO) and logistic regression models. They concluded that the COVID-19 Acuity Score (COVAS) can assist in decisions to discharge patients during the COVID-19 pandemic. They also reported the derivation and validation metrics for the cohorts and for subgroups with a pneumonia or COVID-19 diagnosis.
The authors in Reference [37] proposed an Internet of Things (IoT)-based system to discover unregistered COVID-19 patients as well as infectious places. This would help the responsible authorities disinfect contaminated public places and quarantine infected persons and their contacts, even if they show no symptoms. Newly confirmed and recovered cases would be recorded in the system by healthcare staff, while geolocation data would be collected automatically through the Global Positioning System (GPS) technology in the IoT devices. The authors discussed how their proposed system could be utilized to apply three different mathematical prediction models, namely the θ-SEIHRD model, the Susceptible-Infected-Recovered (SIR) model, and the Susceptible-Exposed-Infectious-Removed (SEIR) model.
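Of the models named above, SIR is the simplest, and a minimal sketch shows how such a compartmental model turns case counts into a forecast. The code below integrates the standard SIR equations with a one-day forward Euler step; it is not the implementation used in Reference [37], and the function name `simulate_sir` and all parameter values (population size, transmission rate `beta`, recovery rate `gamma`) are illustrative assumptions.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Integrate the SIR equations with a one-day forward Euler step.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s, i, r = float(population - infected0), float(infected0), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: beta / gamma = 3 plays the role of R0.
history = simulate_sir(population=1000, infected0=1, beta=0.3, gamma=0.1, days=100)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"infections peak on day {peak_day} with about {history[peak_day][1]:.0f} cases")
```

The SEIR and θ-SEIHRD models extend the same scheme with additional compartments (for example, exposed, hospitalized, or deceased), at the cost of more parameters that must be estimated from data.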
Another study [38] demonstrated the possibility of transmitting the COVID-19 virus through indirect contact, such as touching surfaces contaminated with the droplets of an infected person. The authors therefore recommended paying attention to personal hygiene and the disinfection of public places, which could reduce the incidence.
Furthermore, researchers [39] conducted a cross-sectional study to show the psychological impact of the COVID-19 outbreak. They found that fear of a COVID-19 outbreak can have significant psychological repercussions on people, which requires more attention from the relevant authorities to cope with this impact. Also, the authors in Reference [40] proposed a model that identifies the risk of getting infected by tuberculosis based on several factors related to tuberculin skin, age, and a weak immune system. They stated that those factors can increase the infection from 10% to 20%.
The authors in Reference [41] provided a model that predicts the course of the outbreak to help plan an efficient prevention strategy. The model, called SIDARTHE, considers eight stages: susceptible, infected, diagnosed, ailing, recognized, threatened, healed, and extinct. It discriminates between infected people based on whether they have been diagnosed and on the severity of their symptoms. The simulation results obtained by combining the model with the available data on the COVID-19 pandemic in Italy indicate that social distancing measures and lockdowns are an urgent necessity.
During the COVID-19 pandemic, the demand for emergency departments and medical equipment such as ventilators increased. Therefore, many studies have aimed to provide monitoring tools and models that help in making several medical decisions to mitigate potential risks, and these solutions include the following.
The authors in Reference [42] designed a prediction model called the Conscious-based Susceptible-Exposed-Infective-Recovered (C-SEIR) model to assess the usefulness of the lockdown and protective countermeasures in decreasing the influence of the pandemic in Wuhan city. The proposed model adds two groups, namely the quarantined suspected infection group (P) and the quarantined diagnosed infection group (Q). The results are presented as blue and green curves, with a solid line for daily patients and a dashed line for cumulative patients. The predictions showed that the number of patients drops or increases depending on the city lockdown precautions in Wuhan. The authors also gave guidance for protection against COVID-19, such as being educated about the virus, social distancing, and lockdown.
In Reference [43], the authors developed a patient monitoring program that allows daily electronic checking of symptoms, provides advice and reminders via text messages, and provides care by phone. Patients registered in the system complete a daily questionnaire in which they rate 10 symptoms on a scale from 0 to 4 and report how much they feel the infection is affecting them, the number of analgesic/antipyretic tablets they have taken, and their measured temperature. The questionnaire responses are used to classify patients and specify the care needed. The study focused on three measures, namely the number of patients monitored over time, the daily symptoms score, and daily ED referrals.
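A minimal sketch of how questionnaire responses could be mapped to a care level is given below. The summing rule, the two thresholds, the care-level labels, and the function name `triage` are all hypothetical; the text above does not state how Reference [43] weights the responses, and the sketch ignores the tablet count and temperature fields.

```python
def triage(symptom_ratings, phone_threshold=10, ed_threshold=25):
    """Sum ten 0-4 symptom ratings and map the total to a care level.

    Both thresholds are hypothetical and are not taken from [43].
    """
    if len(symptom_ratings) != 10:
        raise ValueError("expected ratings for exactly 10 symptoms")
    if any(not 0 <= rating <= 4 for rating in symptom_ratings):
        raise ValueError("each rating must be between 0 and 4")
    total = sum(symptom_ratings)
    if total >= ed_threshold:
        return total, "refer to ED"
    if total >= phone_threshold:
        return total, "phone follow-up"
    return total, "self-monitoring"


# One patient's daily questionnaire: ten symptoms rated from 0 to 4.
print(triage([2, 1, 0, 3, 1, 0, 0, 2, 1, 1]))
```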
Likewise, the authors in Reference [44] developed a mobile app to track the spread of COVID-19 symptoms in the UK by analyzing a set of data reported by patients registered in the app, including location, age, health risk factors, symptoms, healthcare visits, and COVID-19 test results. Survey data helped in determining patients’ type and intensity, availability of personal protective equipment, and work-related stress and anxiety.
The study presented in Reference [45] was concerned with evaluating one of the COVID-19 applications in terms of user satisfaction and the possibility of using the data collected to support decision-makers and healthcare providers. The app collects information daily from patients, including symptoms, vital signs, and an assessment of their satisfaction with the services provided by the app. The data collected is distributed on an interactive map according to the postal code for each user, which helps in knowing the regional distribution of the spread of infection in addition to the percentage of healthcare consumption in each region.
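The regional aggregation behind such a map can be sketched as a group-by on postal code. The record fields used here (`postal_code`, `has_symptoms`, `used_healthcare`) are assumed for illustration and are not the schema of the app evaluated in Reference [45].

```python
from collections import defaultdict


def summarize_by_postal_code(reports):
    """Aggregate daily user reports into per-postal-code shares.

    `reports` is an iterable of dicts with the hypothetical keys
    'postal_code', 'has_symptoms', and 'used_healthcare'.
    """
    counts = defaultdict(lambda: {"users": 0, "symptomatic": 0, "healthcare": 0})
    for rep in reports:
        bucket = counts[rep["postal_code"]]
        bucket["users"] += 1
        bucket["symptomatic"] += bool(rep["has_symptoms"])
        bucket["healthcare"] += bool(rep["used_healthcare"])
    return {
        code: {
            "users": c["users"],
            "symptomatic_share": c["symptomatic"] / c["users"],
            "healthcare_share": c["healthcare"] / c["users"],
        }
        for code, c in counts.items()
    }


if __name__ == "__main__":
    sample = [  # made-up records for illustration only
        {"postal_code": "00001", "has_symptoms": True, "used_healthcare": False},
        {"postal_code": "00001", "has_symptoms": False, "used_healthcare": False},
        {"postal_code": "00002", "has_symptoms": True, "used_healthcare": True},
    ]
    for code, stats in summarize_by_postal_code(sample).items():
        print(code, stats)
```

The resulting per-code shares are the kind of values a map layer could color by region.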
Another study [46] provided an analytical model for predicting patient census and estimating ventilator needs for a given hospital during the COVID-19 pandemic. The study observed that estimates of bed and ventilator needs are influenced by the length of hospital stay and the number of days of inpatient ventilator use. It found no relationship between the age of hospitalized patients and the likelihood of needing a ventilator, or between inpatient gender and length of stay. The authors recommended that each hospital rely on its internal data for accurate resource planning.
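The dependence of bed and ventilator estimates on length of stay and ventilator days can be shown with a simple census calculation. This is a rough sketch rather than the model of Reference [46]: it assumes a fixed length of stay for every patient, a fixed fraction of patients ventilated, and ventilation starting on the day of admission. The function and parameter names are illustrative.

```python
def project_census(daily_admissions, length_of_stay, vent_fraction, vent_days):
    """Project occupied beds and ventilators in use for each day.

    daily_admissions: number of patients admitted on each day
    length_of_stay:   days each patient occupies a bed (assumed constant)
    vent_fraction:    share of admitted patients who need a ventilator (assumed)
    vent_days:        days each ventilated patient uses a ventilator (assumed)
    """
    horizon = len(daily_admissions)
    beds = [0.0] * horizon
    vents = [0.0] * horizon
    for day, admitted in enumerate(daily_admissions):
        # Each cohort occupies beds from its admission day until discharge.
        for t in range(day, min(day + length_of_stay, horizon)):
            beds[t] += admitted
        # A fraction of the cohort is ventilated starting on admission.
        for t in range(day, min(day + vent_days, horizon)):
            vents[t] += admitted * vent_fraction
    return beds, vents


if __name__ == "__main__":
    beds, vents = project_census([10, 10, 10, 10], length_of_stay=2,
                                 vent_fraction=0.5, vent_days=1)
    print(beds)   # [10.0, 20.0, 20.0, 20.0]
    print(vents)  # [5.0, 5.0, 5.0, 5.0]
```

Doubling `length_of_stay` roughly doubles the steady-state bed census in this sketch, which is why a hospital's own stay and ventilator-use data matter for planning.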
Furthermore, the Institute for Health Metrics and Evaluation (IHME) COVID-19 health service utilization forecasting team conducted a study to predict the expected daily use of health services and the number of deaths due to COVID-19 for the next four months from the date of the study for each state in the US [47].
The authors in Reference [48] described the clinical characteristics of COVID-19 patients and identified factors that predict intensive care unit (ICU) admission. They found that the need for ICU admission can be predicted from a set of easily obtained clinical parameters: age, fever, and tachypnea with or without respiratory crackles. They used the EHRead [49] technique, developed by Savana, to extract information from the medical records, and deep learning convolutional neural network classification methods to classify the extracted data.
The authors in Reference [50] provided a data-driven framework to pre-assess the risks of the COVID-19 pandemic and to identify high-risk areas in Italy. The framework assesses the risk index using a function of three criteria, namely disease risk, area exposure, and the vulnerability of its population. The twenty Italian regions are classified based on available historical data, which include population density, age, human mobility, air pollution, and winter temperature. The study showed a correlation between the risk index and the numbers of deaths, infections, and ICU patients. The authors also provided a policy model to assist authorities in decision-making.
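A composite index of this kind can be sketched as follows. The source says only that the risk index is a function of three criteria. The multiplicative form used here, the scaling of each criterion by its maximum, and the region names and scores are all assumptions made for illustration, not the formula or data of Reference [50].

```python
def scale_by_max(values):
    """Scale positive scores to (0, 1] by dividing by the largest value."""
    top = max(values)
    if top <= 0:
        raise ValueError("scores must include at least one positive value")
    return [v / top for v in values]


def risk_index(regions):
    """Combine three criteria per region into a single index.

    `regions` maps a region name to a tuple of raw scores
    (disease_risk, exposure, vulnerability). The product form is an
    assumed combination rule, used here only as an example.
    """
    names = list(regions)
    columns = [scale_by_max([regions[n][k] for n in names]) for k in range(3)]
    return {n: columns[0][i] * columns[1][i] * columns[2][i]
            for i, n in enumerate(names)}


if __name__ == "__main__":
    # Placeholder regions and scores, not real data.
    scores = {"Region A": (2, 4, 1), "Region B": (1, 2, 1), "Region C": (2, 4, 2)}
    ranked = sorted(risk_index(scores).items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ranked:
        print(f"{name}: {value:.3f}")
```

With a product, a region scores high only when all three criteria are high. A weighted sum would be a different, equally hypothetical choice.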
Moreover, regional healthcare models have been developed to estimate the course of the pandemic, such as the Monte Carlo simulation approach developed at the University of Pennsylvania [51]. Such models can be used to manage facilities and plan for an anticipated increase in patient numbers, but not to estimate daily operational needs. Applying the Pennsylvania model to an individual hospital requires parameters that are typically unknown, such as the proportion of the region’s patients expected to visit that hospital and the percentage of the regional population isolated sufficiently to avoid infection.
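One common way to handle such unknown parameters is to sample them from plausible ranges and report the spread of outcomes. The following is a generic Monte Carlo sketch and not the Pennsylvania model itself: the uniform ranges, the hospitalization rate, and all names are assumptions.

```python
import random


def simulate_hospital_admissions(regional_cases, hosp_rate, share_range,
                                 isolated_range, n_draws=10_000, seed=0):
    """Monte Carlo estimate of admissions at one hospital.

    regional_cases: infections expected in the region if nobody isolates
    hosp_rate:      fraction of infections that need hospitalization
    share_range:    (low, high) share of regional patients visiting this hospital
    isolated_range: (low, high) fraction of the population effectively isolated
    Both ranges are sampled uniformly, which is itself an assumption.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    draws = []
    for _ in range(n_draws):
        share = rng.uniform(*share_range)
        isolated = rng.uniform(*isolated_range)
        draws.append(regional_cases * (1 - isolated) * hosp_rate * share)
    draws.sort()
    return {
        "p05": draws[n_draws * 5 // 100],
        "median": draws[n_draws // 2],
        "p95": draws[n_draws * 95 // 100],
    }


if __name__ == "__main__":
    result = simulate_hospital_admissions(
        regional_cases=100_000, hosp_rate=0.05,
        share_range=(0.10, 0.30), isolated_range=(0.30, 0.70),
    )
    print({k: round(v) for k, v in result.items()})
```

Reporting a range instead of a single number makes explicit how much the hospital-level estimate depends on parameters that cannot be measured directly.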