For instance, in the education sector, big data analytics played a key role in overcoming the negative consequences of the pandemic. It helped tutors and instructors personalize the remote learning experience for educators. Additionally, it helped bridge the unemployment gap that resulted from the major global economic losses caused by COVID-19. The importance of big data applications and analytics in the transportation field during COVID-19 has been explicitly shown to decision-makers. For instance, regulators based their decisions and judgments on data captured and analyzed via AI techniques and predictive models. Based on the results, precautionary measures were clearly defined, and any violations were easily detected. Furthermore, predictive models guided decision-makers on citizens’ movement within and among cities and metropolitan areas; consequently, they were able to detect and predict future endemic areas.
2. Big Data Literature Review
Big Data is a critical asset in the competitive market of the digital economy. The benefits of Big Data allow organizations to achieve various objectives under the umbrella of Big Data insights
[17]. The following sub-sections present the overall review of Big Data and its applications.
2.1. Big Data and Analytics
There is no standardized definition of BD among industry, business, media, academia, and other stakeholders. The absence of a systematic definition of the BD concept leads to confusion
[12,18,19]. BD is usually defined by individuals. The definition differs from one industry to another, according to the sizes of the available datasets and the software tools that are common in a particular industry
[8,20,21,22].
There have been remarkable thoughts from both industry and academia on BD definition
[23]. By coupling the concept of BD with current grounded academic research, the BD concept can be more understandable. A clear view of BD concept will enhance the awareness about BD phenomenon for both practitioners and academics, resulting in faster growth and more efficient value obtained from BD
[24]. In spite of the fact that there is no identified definition for BD, from a technical and business point of view, BD is identified as the increasing flow of various types of data from different resources
[25].
The first BD definition was written by scientists from NASA. The paper, published by NASA in 1997, referred to data volume as an exciting challenge for computer systems because it increased the demand for main memory, local disk, and remote disk. NASA identified this as the problem of BD, which required obtaining more resources
[8,9,10,11,12,13,26]. The META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities in three dimensions (velocity, volume, variety)
[18,27].
Researchers have defined the BD concept from different points of view (BD characteristics, technology, business, innovation, etc.). One of the definitions was updated by Gartner in 2013, which defined the BD concept as “high-volume, high velocity and/or high variety information assets that demand cost-effective innovative forms of information processing for enhanced insight, decision making, and process optimization”
[26,28,29]. The Statistical Analysis System Institute (SAS) defined BD as “Popular term used to describe the exponential growth, availability, and use of information, both structured and unstructured”
[30]. IBM also added a definition for BD, “Data is coming from everywhere; sensors that gather climate information, social media posts, digital videos and pictures, purchase transaction record, and GPS signal of mobile phone to name a few”, “BD can be defined as large set of very unstructured and disorganized data”, “BD is a form of data that oversteps the processing power of traditional database infrastructures or engines”
[30,31,32].
BD has been viewed from more than one perspective (BD as technology, entity, and process)
[33]. The definition of BD analytics covers the technologies (database and data mining tools) and techniques (analytical methods) that organizations can utilize to analyze vast amounts of complex data for a variety of applications intended to increase organizational performance from many perspectives. BD can be considered as both an entity and a process. BD as an entity includes a volume of data captured from a variety of resources (internal and external) and consists of structured, semi-structured, and unstructured data that cannot be processed using traditional databases and software techniques. BD as a process refers to both the organizations’ infrastructure and the technologies used to capture, store, and analyze numerous types of data
[10,11,12,13,33].
New insights are provided by BD to discover new values, supporting organizations to get the benefit of a deep understanding of the hidden values
[23]. BD is pointed out as a technology that enables the processing of unstructured data; and BD technologies are the systems and tools used to process BD such as NoSQL databases, the Hadoop Distributed File System, and MapReduce
[34,35].
According to
[14], different theories and definitions exist regarding the shape BD takes. The most often cited definition is that BD oversteps the capabilities of popular, currently used software tools and hardware platforms to capture, manage, and process it within an acceptable and bounded time. The concept of BD has been promoted to describe the novel and powerful computational technologies that have been developed to process an enormous volume of data. BD has been described in various ways; fundamentally, however, it is a modern technology that is primarily characterized by and derived from Business Analytics (BA) and Business Intelligence (BI). It is capable of creating business value via its predictive analytics and decision support abilities, which give it the power to deal with data that traditional techniques cannot process
[25,34].
According to the studies by
[13,36], BD is defined as “a term that describes large volumes of high velocity, complex and variable data that requires advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information”.
2.2. Characteristics of Big Data
Existing work characterized BD as novel technologies and architectures which are designed for extracting value from enormous volumes of a wide range of data, by empowering high-velocity capture, discovery, and analysis in a cost-effective way
[28]. Since BD is relatively new, it is important for organizations to know what makes this trend valuable, and they should identify the “Vs” that describe the key characteristics of BD
[37]. Still, a lot of confusion and obscurity among the Vs of BD exists. Some pioneering studies pointed out that there are three, four, five, and sometimes even seven characteristics of BD
[38].
The large-scale feature of BD is reflected in three different characteristics of volume, variety, and velocity. Traditional technologies do not have the ability to successfully deal with the enormous data volume, which is generated at a growing velocity, via online streaming and a variety of other different resources such as transactional systems, sensors, social media, product/service instrumentation, and web platforms
[38]. META Group analyst Doug Laney (now Gartner) presented 3Vs of BD to characterize the data management in 3 dimensions represented by three main Vs of Volume, Velocity, and Variety
[39]. Volume represents the amount of data. Velocity represents the speed of data generation and processing. Variety refers to the diversity of resources and data types
[40,41,42]. The three Vs were mentioned by NIST and Gartner in 2012 and extended by IBM to include a fourth V, “Veracity”. In contrast, Oracle avoided using the “Vs” paradigm in its BD definition. Instead, it considers BD to be the derivation of value from traditional relational database-driven business decision making, grown with new sources of unstructured data
[18].
The 4Vs (volume, variety, velocity and value) model was presented by
[14,33,41,43,44]. Beyond these 4Vs, another V, veracity, is identified to represent the uncertainties of BD and of data analysis outcomes. Other research conducted by
[41,42] pointed out four major Vs of BD, namely volume, velocity, variety, and value; value pertains to the insight organizations obtain from BD, which requires not only scalability but also suitable operational procedures and strategies. The studies in
[41,44,45,46] pointed out five key characteristics of BD as the 5Vs (Volume, Velocity, Variety, Veracity, and Value). “Complexity” is a “C” feature added to the 4Vs (Volume, Variety, Velocity, Value) of BD by
[47,48,49] to formulate another set of five characteristics of BD. Security and management are additional characteristics to the 3Vs (Volume, Variety, and Value)
[48]. A study by
[48] also presented a critical problem of technical research that requires more investigation by scholars.
Recently, other Vs (Visualization/Visibility, Variability/Volatility, Validity, Virtual, and Complexity) were added to BD characteristics by
[26]. Another work by
[50] defined the 7 Vs of BD, namely Volume, Velocity, Variety, Value, Veracity, Variability, which implies inconstancy and heterogeneity, and Visualization, which implies the illustrative character of data. Volume, Velocity, Variety, Veracity, and Value are the Vs most widely accepted by stakeholders. However, the other Vs are important for the BD paradigm too. By comparing existing definitions of BD and its related aspects, the 5Vs (volume, velocity, variety, veracity, and value) characteristics are extracted and formulated to point out how traditional data and BD differ
[24,51,52] as illustrated in
Figure 1.
Figure 1.
The Five Features of Big Data.
Big data is more of a concept than an exact term. Some classify big data as a volume problem only for petabyte-scale (>1 million GB) data collection. Some people associate big data with different data types, even if the volume is measured in terabytes. These interpretations made the big data problem situational
[51].
2.3. The Types of Data Analytics
Big Data Analytics refers to the process of collecting, organizing, and analyzing high-volume, high-velocity, and high-variety data to discover valuable patterns that can be used for making decisions. Analyzing big data requires new tools, methods, and technologies such as data mining, predictive analytics, and prescriptive analytics
[52].
Most of the existing literature describes the use of big data applications in terms of four types of data analytics. The four types of big data analytics that can be implemented in governments are: (i) Descriptive, (ii) Diagnostic, (iii) Predictive, (iv) Prescriptive
[53].
The following paragraphs describe each type with related examples in governments and, more specifically, during COVID-19:
Descriptive analytics is the preliminary stage in the analytics categorization. It is also known as business reporting, since this stage emphasizes creating summary reports to highlight business activities and to answer the question “what is happening or happened?”
[54,55,56].
This type of data analysis depends on analyzing past data to visualize and understand historical trends. An example of BDA during COVID-19 is the dashboards used in the health care sector to monitor live data about the spread of COVID-19 in a particular area. Such dashboards track, illustrate, and statistically explain the historical records captured about COVID-19 cases in a specific area, city, or country
[29].
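As a minimal sketch of the kind of summary such a dashboard might compute, the following Python snippet derives a few descriptive statistics from a short series of daily case counts. The numbers, the variable names, and the 7-day window are illustrative assumptions and are not taken from the cited work.

```python
from statistics import mean

# Hypothetical daily new-case counts for one area (illustrative numbers only).
daily_cases = [12, 15, 9, 20, 25, 22, 30, 28, 35, 40]


def moving_average(values, window):
    """Return the trailing moving average for every full window."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Descriptive analytics: summarize what has happened so far.
summary = {
    "total_cases": sum(daily_cases),
    "peak_day_index": daily_cases.index(max(daily_cases)),
    "mean_daily_cases": mean(daily_cases),
    "trend_7day": moving_average(daily_cases, 7),
}
print(summary)
```

The snippet only reports on past records; it makes no statement about causes or future values, which is what separates descriptive analytics from the types described next.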
The second data analytics type is diagnostic data analysis. This type focuses on illustrating correlations, hidden patterns, cause-effect relations, and interrelationships between different variables. An example would be the data captured from job portals. Such data is used to analyze and visualize potential market sectors and match them with the relevant workforce in the country
[54].
Diagnostic analytics answers the question “why did it happen?”. The main goal of diagnostic analytics is to highlight the root causes of a challenge or problem. Identifying such root causes depends on specialized techniques such as visualization, drill-down, data discovery, and data mining
[54,55,56].
Predictive analytics is categorized at the third level of the data analytics hierarchy. More specifically, it is the stage that follows descriptive analytics. Based on the data analytics maturity model, organizations that have matured in descriptive analytics can move forward to the next stage to answer the question “What will happen?”
[55,56].
This third type of data analysis focuses on patterns in existing past data and predicts what will happen when changes occur in that data. An example is the vaccine distribution prioritization mechanism: data analysts use a machine learning model to predict who is next in need of the vaccine and prepare a patient priority list accordingly
[57].
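The prioritization idea can be sketched as follows. This is an illustration under stated assumptions, not the method used in the cited study: the coefficients stand in for a logistic-regression model that would normally be trained on historical data, and the feature names and patient records are hypothetical.

```python
import math

# Hypothetical coefficients standing in for a trained logistic-regression model.
WEIGHTS = {"age_over_65": 1.5, "chronic_condition": 1.2, "frontline_worker": 0.8}
BIAS = -2.0


def risk_score(patient):
    """Predicted probability (0 to 1) that the patient is in urgent need."""
    z = BIAS + sum(WEIGHTS[name] * patient[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient records with binary features.
patients = [
    {"id": "P1", "age_over_65": 1, "chronic_condition": 1, "frontline_worker": 0},
    {"id": "P2", "age_over_65": 0, "chronic_condition": 0, "frontline_worker": 1},
    {"id": "P3", "age_over_65": 1, "chronic_condition": 0, "frontline_worker": 0},
    {"id": "P4", "age_over_65": 0, "chronic_condition": 0, "frontline_worker": 0},
]

# Predictive analytics: rank patients by predicted need, highest first.
priority_list = sorted(patients, key=risk_score, reverse=True)
print([p["id"] for p in priority_list])
```

In practice the weights would be estimated from data and validated before use; the sketch only shows how a predicted score turns into an ordered priority list.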
The last data analysis type is prescriptive analytics. It is where the best course of action among many alternatives, which are usually created or identified by predictive and/or descriptive analytics, is determined using sophisticated mathematical models. Therefore, in a sense, this type of analytics tries to answer the question of “What should I do?”. Prescriptive analytics uses optimization, simulation, and heuristics-based decision modelling techniques
[58].
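To make the optimization step concrete, the sketch below allocates a limited vaccine supply across regions with a simple greedy rule. The region names, demands, and benefit-per-dose values are hypothetical; in a real setting the benefit estimates would come from predictive models. Under the assumption that benefit grows linearly with doses up to each region's demand, serving the highest benefit per dose first maximizes total benefit.

```python
def allocate_doses(regions, supply):
    """Greedy allocation: serve regions with the highest benefit per dose first.

    regions: list of (name, demand, benefit_per_dose) tuples.
    supply:  total number of doses available.
    """
    allocation = {}
    remaining = supply
    for name, demand, _benefit in sorted(regions, key=lambda r: r[2], reverse=True):
        doses = min(demand, remaining)
        allocation[name] = doses
        remaining -= doses
    return allocation


# Hypothetical inputs: (region, doses demanded, expected benefit per dose).
regions = [("North", 500, 0.02), ("Central", 800, 0.05), ("South", 300, 0.03)]

# Prescriptive analytics: recommend what to do with the 1000 available doses.
plan = allocate_doses(regions, supply=1000)
print(plan)
```

More realistic formulations add constraints such as fairness or logistics, which typically call for linear or integer programming or for simulation rather than a greedy rule.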
Prescriptive analytics is ranked as the highest level in the data analytics maturity model; it is also viewed as the most sophisticated and complex type of data analysis. Both AI and big data analysis techniques are used in prescriptive analytics. Utilizing such techniques helps decision makers frame the optimal strategic decision, which they reach via selected optimization models. For example, prescriptive analytics was used during the COVID-19 pandemic to understand citizens’ reactions towards the vaccine and to support decision makers in structuring optimal strategic decisions to address citizens’ hesitancy towards the vaccine
[55,56].
To illustrate how the four data analytics types are classified based on the level of sophistication and data complexity, researchers have introduced the following two categories, as shown in
Figure 2:
Figure 2.
A simple taxonomy for analytics.
3. Big Data Analytics Opportunities and Applications
Big data analytics can be described as the use of mathematical and statistical techniques to find hidden patterns and variances in large amounts of data from multiple sources and of different types (structured, semi-structured, unstructured) in order to gain future insight and make faster decisions
[60]. Such findings will be the base for organizations to provide them with valuable knowledge and support them in their strategic decisions
[61]. The utilization of big data analytics has added value for governments and firms during COVID-19. Consequently, those who have implemented big data analytics outperform others. For instance, they were able to map their current status and make better strategic decisions
[55].
A McKinsey report highlighted that big data analytics empowered those who applied it by increasing their annual economic value by between $9.5 trillion and $15.4 trillion
[62]. Furthermore, since the COVID-19 outbreak, big data analytics has demonstrated its effectiveness in detecting the spread of COVID-19 and has supported governments in reaching optimal decisions against it
[63].
The main goal for organizations is the bottom line represented in their profits, market share and customer loyalty and satisfaction levels
[64]. This holds for both business firms and governmental entities. With the exponential increase in the volume of data, the speed at which it is generated, and the variety of sources generating it, and with the growing importance of its quality and relevance, big data applications have become a necessity for the success of various business sectors and governmental entities
[65]. The implementation of big data applications has supported organizations to enhance their customers experience, improve cost savings, and facilitate strategic decision making
[66]. Consequently, organizations’ processes and operations achieve a higher level of effectiveness and efficiency
[67].
New, advanced, and tactical digital technologies, such as big data applications, were recently considered as a response to the COVID-19 pandemic. Countries such as Taiwan, South Korea, Hong Kong, and Singapore have demonstrated the significant positive impact of adopting such applications. Those countries demonstrated that the expected risks of the pandemic could be controlled seamlessly and effectively
[61].
Big Data Applications can derive insights from various data sources to provide ideal solutions for several sectors
[52]. Organizations from a variety of industries have started using MapReduce-based solutions for processing enormous amounts of data
[68]. To meet their needs for handling large-scale data processing, many businesses rely on MapReduce. As businesses from a variety of sectors embrace MapReduce together with parallel databases, new MapReduce workloads have appeared that contain a large number of brief interactive tasks
[68].
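For readers unfamiliar with the programming model, the following single-process Python sketch mimics the three conceptual phases of MapReduce (map, shuffle, reduce) on a word-count task. It is an illustration of the idea only: it does not use Hadoop or any distributed framework, and the function names and input records are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical input records; a real job would read them from distributed storage.
records = ["big data analytics", "big data applications", "data processing"]

mapped = [pair for record in records for pair in map_phase(record)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real deployment the map and reduce functions run in parallel on many machines and the framework performs the shuffle over the network; the logic of each phase stays the same.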
Table 1 highlights the alignment of each big data application to its respective big data analytics model. It provides an explanation on how such an implementation has supported organizations and governments to cope with COVID-19 pitfalls. Furthermore, such an implementation provided an optimal solution to harness its operations and decision-making process. This categorization has been developed based on the description of each application in their relevant fields, and on the definition highlighted in the section on the four big data analytics types.
Table 1.
How to utilize Big Data Analytics in Healthcare, Education, Transportation, and Banking.
| Field | Data Analytics Type | How BDA Has Been Utilized | Data Processing Models Used to Analyse Big Data | Reference |
| --- | --- | --- | --- | --- |
| Healthcare | Descriptive and Predictive Data Analytics | Proactive actions and interventions based on predictive models to trigger any noncommunicable diseases. | Predictive models based on search engine and social media data. Smartphone application tracking systems to identify infection hot spots. | [69,70] |
| | Prescriptive Data Analytics | Vaccine distribution | Sentiment analysis to reduce community resistance towards the vaccine. | [69,70,71] |
| | Diagnostic and Predictive Data Analytics | Vaccine distribution | Machine learning models to prioritize citizens’ need for and urgency of the vaccine. | |
| | Diagnostic and Prescriptive Data Analytics | Monitoring live and frequent data on the spread of the disease; providing more personalized consultations by “virtual doctors” | Dashboards; AI chatbot | [72,73] |
| Education | Descriptive Data Analytics | Enhance online educational platform experience | Analyzing data captured from online educational platforms can ease educators’ remote learning experience. | [69,74] |
| | Diagnostic Data Analytics | Bridge the gap of unemployment | Analysis of data captured from job portals | [69,75] |
| Transportation | Descriptive and Prescriptive Data Analytics | Implementation of precautionary measures; ensuring social distancing in public transportation | Capturing relevant data and using machine learning techniques to detect non-compliant actions | [76] |
| | | Detect citizens’ commute routes to store their travel history. | Use both AI and big data applications to capture, track, and predict valuable insights about citizens’ movement within and across cities and countries. | [77] |
| Banking | | Fraud Detection | Use AI and ML techniques to describe and detect real-time abnormal activities and online transactions, and build ML models based on classification algorithms to predict any suspicious case. | [78] |
| | Descriptive and Predictive Data Analytics | Risk Assessment | Use both diagnostic and prescriptive data analytics models to analyze real-time data and assess the creditworthiness of customers, consequently developing the appropriate customer portfolio and tailoring services to clients’ needs. This in turn boosts customers’ satisfaction and loyalty and enhances banks’ bottom-line records. | [78] |