Business Analytics | Encyclopedia MDPI

Business Analytics: History

Please note this is an old version of this entry, which may differ significantly from the current revision.

Subjects: Computer Science, Interdisciplinary Applications

Contributor:

Junyang Chen

Business analytics has been widely used in various business sectors and has been effective in increasing enterprise value. With the advancement of science and technology in the Big Data era, business analytics techniques have been changing and evolving rapidly.

business analytics
descriptive analytics
predictive analytics
prescriptive analytics

1. Introduction

In recent decades, data have been rapidly changing the world. Especially in the era of Big Data, data are cheap and ubiquitous, but what makes data a valuable asset is how they are used to obtain useful information. Since there are many different types of business objectives, different analytics techniques are needed to achieve them. These techniques have many applications in the business area and “business analytics” enables the business application of Big Data. Since the term business analytics emerged, the field has grown by leaps and bounds, reflecting the increasing importance of data in terms of volume, variety and velocity [1]. Although there is no uniform definition of business analytics, the existing definitions can be grouped along several dimensions, such as a movement, a transformation process, a capability set and so on [2].

Interest in analytics and data science is growing as business organizations use business analytics extensively to improve their business value. Business analytics has evolved into an important part of the business decision-making process, using data to drive decisions and support decision-makers in making strategic, operational and tactical decisions [3]. Specifically, business analytics can help companies to leverage the value of historical data by harnessing the power of statistical and mathematical models and advanced techniques such as artificial intelligence algorithms. Through these models and algorithms, enterprises can integrate disparate data sources for trend prediction, decision optimization and more. As business analytics continues to evolve, its applications continue to broaden. It has been adopted in functional departments within the enterprise as well as in some non-business areas.

2. Definitions of Business Analytics

At present, there is still no uniform definition of business analytics. Scholars in different fields have defined the term from several perspectives. Holsapple summarized 18 definitions of analytics along 6 dimensions [2].

First, from the perspective of techniques, business analytics is considered an application of any data analytics [5] or data science [6] in business fields, which uses statistical and quantitative tools and techniques to analyze a huge collection of data sources to support business decisions [7]. More specifically, business analytics can be viewed as ‘a broad category of applications, technologies, and processes for gathering, storing, accessing, and analyzing data to help business users make better decisions’ [8]. With the continuous emergence of new technologies, business analytics can also be viewed as a combination of operations research, artificial intelligence (machine learning) and information systems [1].

Second, from the process perspective, business analytics is an encapsulation of tools to convert data into actionable insights through a scientific/mathematical/intelligent process [9]. The Institute for Operations Research and the Management Sciences (INFORMS) defined it as ‘a scientific process of transforming data into insight for making better decisions’ [10].

Third, from the practice perspective, business analytics is defined as ‘an ability of firms and organizations to collect, manage, and analyze data from a variety of sources to enhance the understanding of business processes, operations, and systems’ [11]. Business analytics refers to ‘the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions’ [12].

Finally, from the management perspective, business analytics is a qualitative methodology to derive valuable meanings based on data [13] and is ‘a paradigm shifter of models, technologies, opportunities, and capabilities used to scrutinize a corporation’s data and performance to transpire data-driven decision-making analytics for the corporation’s future direction and investment plans’ [3].

Overall, regardless of the perspective from which it is defined, it can be concluded that the implementers of business analytics are enterprises; the approaches to achieving business analytics are various techniques; and the ultimate goal of business analytics is to improve enterprise value.

3. Techniques of Business Analytics

Business analytics is generally divided into three types: descriptive analytics, predictive analytics and prescriptive analytics [9]. Descriptive analytics is used to provide a summary of descriptive statistics as a straightforward presentation of facts. Predictive analytics is used to discover what is likely to happen in the future based on current data. Prescriptive analytics focuses on identifying optimal actions in the decision-making process.

3.1. Descriptive Analytics Techniques

Descriptive analytics is a process of characterizing historical data. There are two core techniques of descriptive analytics: data visualization and data analysis. Data visualization produces graphical images of data or concepts, which helps decision making [15]. Data analysis consists of common statistical techniques, including the mean, median, standard deviation, range, stem-and-leaf plots and histograms, as well as advanced data mining techniques used to describe hidden patterns in the data.
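As a minimal sketch of the common statistical summaries mentioned above, the following Python snippet computes the mean, median, standard deviation and range with the standard library. The function name describe and the monthly sales figures are hypothetical and only serve as an illustration.

```python
import statistics


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical monthly sales figures, used only for illustration.
monthly_sales = [120, 135, 150, 110, 160, 145]
summary = describe(monthly_sales)
print(summary)
```

Such a summary is usually the first step before any visualization or data mining is attempted.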

3.1.1. Data Visualization

Over the years, many data visualization techniques have been developed to represent and examine large amounts of information. These methods include bar charts, box-and-whisker plots, bubble charts, choropleth maps, dot distribution maps, histograms, line graphs, pie charts, population pyramids, proportional symbol maps, scatter plots, stacked bar charts and tree maps.

When working with data sets that include large numbers of data points, automating the data visualization process makes the work much easier. Therefore, a large variety of data visualization tools have been developed to create visual representations of large data sets, including Tableau Software 2022.4 [16], Microsoft Power BI [17], Excel, FusionCharts, Sisense, etc. In addition to these software tools, there are many online visualization tools such as Infogram, RAWGraphs, Sovit, etc.

3.1.2. Data Analysis

Data analysis examines the collected data and derives various quantitative characteristics reflecting objective phenomena. In addition to the traditional statistical methods of central tendency analysis, dispersion analysis and frequency distribution analysis, advanced data mining techniques probe more deeply into the underlying characteristics of data. Association analysis and cluster analysis are two typical data analysis methods used in descriptive analytics.

Association analysis

Association analysis, also called association rule mining, is an unsupervised method used to mine potential association relationships from data. There are two classical algorithms in association analysis: the Apriori algorithm and the Frequent Pattern tree (FP-tree) algorithm. The Apriori algorithm searches the database iteratively, level by level, to find frequent item sets from which rules are formed. Each iteration consists of a join (concatenation) step and a pruning step. Many improvements to the Apriori algorithm have been proposed, including the Direct Hashing and Pruning (DHP) algorithm [18], Dynamic Itemset Counting (DIC) [19], parallel Apriori algorithms based on various frameworks such as MapReduce [20,21,22], Spark [23,24] and Flink [25] and adaptive Apriori algorithms [26]. Compared to the Apriori algorithm, the FP-tree algorithm only requires two scans of the database when performing frequent pattern mining and does not generate candidate item sets. There are various improved algorithms based on FP-tree, such as QFP-growth [27], fuzzy FP-tree [28], PFP [29], balanced parallel FP-tree (BPFP) [30] and tree partition based parallel FP-tree [31].
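The level-by-level search of Apriori can be illustrated with a short Python sketch. It is a simplified teaching version, not any of the cited implementations: it only finds frequent item sets (rule generation is omitted), min_support is an absolute count, and the baskets data set is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: count each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical shopping baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(baskets, min_support=3)
print(result)
```

With a minimum support of 3, the only frequent pair in this toy data is bread and milk; all other pairs appear in only two baskets and are discarded before any triple is considered.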

Cluster analysis

Cluster analysis is a multivariate statistical analysis method for grouping samples or indicators. Clustering algorithms can be divided into five categories: partitioning-based, hierarchical-based, density-based, grid-based and model-based.

Partitioning-based algorithms include K-means [32], Fuzzy C-means (FCM) [33], K-medoids [34], CLARA (Clustering Large Applications) [34], K-modes [35], and CLARANS (Clustering Large Applications based on a RANdomized Search) [36]. Hierarchical clustering algorithms include BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) [37], CURE (Clustering Using REpresentatives) [38], ROCK (Robust Clustering using Links) [39] and Chameleon (clustering using interconnectivity) [40]. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the first density-based clustering algorithm [41]. In addition, DENCLUE (DENsity-based CLUstEring) [42] and OPTICS (Ordering Points to Identify the Clustering Structure) [43] are both widely used in cluster analysis. Typical grid-based algorithms include STING (Statistical Information Grid) [44], CLIQUE (Clustering in Quest) [45] and WaveCluster [46]. Model-based algorithms usually follow one of two approaches: statistical methods and neural network methods. Statistical methods include the COBWEB algorithm [47] and GMM (Gaussian Mixture Model) [48], while the typical neural network method is the SOM (Self-Organizing Map) algorithm [49].
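To make the partitioning-based idea concrete, the following is a minimal K-means sketch in plain Python. It assumes squared Euclidean distance, a fixed number of iterations and random initial centroids drawn from the data; the function name kmeans and the sample points are hypothetical.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster tuples of numbers into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids picked from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical two-dimensional customer features, used only for illustration.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

In practice a convergence check and several random restarts are usually added, since K-means can stop at a poor local solution on less clearly separated data.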

3.2. Predictive Analytics Techniques

In general, predictive analytics techniques can be divided into statistical techniques and machine learning techniques. Statistical prediction methods mainly involve building suitable forecasting models, estimating the model parameters, deriving forecasting formulas and then making extrapolated forecasts. In machine learning techniques, systems are trained with specialized algorithms to study, learn and make predictions and recommendations based on large amounts of data.

3.2.1. Statistical Techniques

In statistical predictive techniques, statistical theories and methods are used for prediction by building statistical models and fitting the model parameters with past data. There are two groups of methods in statistical predictive techniques: regression models and time series models.

Regression model

The regression model is one of the best-known statistical techniques used for prediction. The linear regression model is the basic model. It is represented as an equation that finds specific weights for the input variables, which in turn describe a straight line that best fits the relationship between the input variables and the output variables [50]. When the output variable is a categorical variable, a classification model such as the logistic regression model [51] is needed. Meanwhile, polynomial regression models are used to fit nonlinear relationships between variables.
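For the simplest case of one input variable, the best-fitting straight line can be computed in closed form with ordinary least squares. The sketch below assumes a single predictor; the function name fit_line and the advertising and sales numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical advertising spend (x) and sales (y), lying exactly on y = 2x + 1.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(spend, sales)
forecast = slope * 6 + intercept  # extrapolated forecast for a spend of 6
print(slope, intercept, forecast)
```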

Time series model

Time series models can be divided into two groups: exponential smoothing models and ARIMA series models [52]. The exponential smoothing model decomposes the time series into components and uses an additive or multiplicative structure to reassemble the smoothed components to predict future values [53]. Typical exponential smoothing models include simple exponential smoothing, Holt’s exponential smoothing and Holt-Winters’ seasonal exponential smoothing [54]. The ARIMA series model mainly includes AR (AutoRegressive) model, MA (Moving Average) model, ARMA (AutoRegressive Moving Average) model, ARIMA (AutoRegressive Integrated Moving Average) model and SARIMA (seasonal ARIMA) model [55].
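As an illustration of the simplest member of the exponential smoothing family, the sketch below implements simple exponential smoothing, where the smoothed level is a weighted average of the newest observation and the previous level. Initializing the level with the first observation is one common convention and is an assumption here; the function name and the demand figures are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series."""
    level = series[0]  # assumed initialization: first observation
    for y in series[1:]:
        # New level = alpha * newest observation + (1 - alpha) * old level.
        level = alpha * y + (1 - alpha) * level
    return level


# Hypothetical weekly demand, used only for illustration.
demand = [10, 12, 14]
forecast = simple_exponential_smoothing(demand, alpha=0.5)
print(forecast)  # 12.5
```

A larger alpha makes the forecast react faster to recent observations, while a smaller alpha gives a smoother, slower-moving forecast.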

3.2.2. Machine Learning and Artificial Intelligence Techniques

With the advent of the big data era, using machine learning to guide predictive analytics has become a widely used approach. There are many classic machine learning prediction algorithms, such as support vector machines, nearest neighbor, decision trees, ensemble learning and artificial neural networks, as well as more advanced deep learning techniques.

Support vector machine

In machine learning, the support vector machine (SVM) is a supervised learning model, with associated learning algorithms, for analyzing data in classification and regression analysis [56]. Given a set of training instances, each labeled as belonging to one or the other of two classes, the SVM training algorithm builds a model that assigns new instances to one of the two classes. Thus, it is usually used to solve binary classification problems.

Nearest neighbor

The k-nearest neighbor algorithm, known as KNN, is a non-parametric supervised learning classifier [57]. It can be applied to classification problems or regression problems. In a classification problem, the output is a member of a category, and in a regression problem, the output is the value of an object’s attributes. The nearest neighbor is considered the simplest type of machine learning algorithm [58].
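A minimal sketch of KNN classification is given below. It assumes Euclidean distance and a simple majority vote among the k closest training examples; the function name knn_predict and the labeled customer data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_tuple, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers described by two features and a spending label.
train = [
    ((1, 1), "low"), ((1, 2), "low"), ((2, 1), "low"),
    ((8, 8), "high"), ((9, 8), "high"), ((8, 9), "high"),
]
print(knn_predict(train, (2, 2)))      # low
print(knn_predict(train, (8.5, 8.5)))  # high
```

For a regression problem, the vote would be replaced by an average of the neighbors' numeric values.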

Decision tree

A decision tree is a non-parametric supervised learning algorithm for classification and regression tasks. It is a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes. There are three typical decision tree algorithms: ID3, C4.5 and CART (Classification and Regression Tree). Iterative Dichotomiser 3 (ID3) uses information entropy and information gain as metrics to evaluate candidate splits [59]. C4.5 is an improved version of ID3, which does not use the information gain directly but introduces the information gain ratio as the basis for feature selection [60].
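The splitting metrics used by ID3 and C4.5 can be sketched as follows. This is a simplified illustration of entropy, information gain and gain ratio for one candidate split, not a full tree-building implementation; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy reduction when labels are partitioned into groups (ID3)."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


def gain_ratio(labels, groups):
    """Information gain divided by the split information (C4.5)."""
    total = len(labels)
    split_info = -sum(
        len(g) / total * math.log2(len(g) / total) for g in groups if g
    )
    return information_gain(labels, groups) / split_info if split_info else 0.0


# Hypothetical churn labels and two candidate splits of the same four records.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]
print(information_gain(labels, perfect_split), gain_ratio(labels, perfect_split))
print(information_gain(labels, useless_split))
```

A split that separates the classes completely recovers all of the entropy, whereas a split that leaves each branch as mixed as the parent yields no gain.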

Ensemble learning

The basic idea of ensemble learning is to combine multiple classifiers into an integrated classifier with better predictive performance. Ensemble learning includes bagging, boosting and stacking methods.

Artificial neural network

An artificial neural network (ANN) is a model that mimics the structure and function of biological neural networks, especially in the brain [70]. According to how their neurons are connected, ANNs can be divided into feedforward neural networks and feedback neural networks. Feedforward neural networks (FNN) divide the neurons into groups according to the order in which they receive information, and each group can be considered a neural layer [71]. The neurons in each layer receive the output of the neurons in the previous layer and pass their output to the neurons in the next layer. FNNs fall into two categories depending on the number of layers: single-layer and multi-layer networks [72]. Single-layer FNN is also known as fully connected feedforward neural networks (FC), and a typical multi-layer network is the convolutional neural network (CNN). In feedback neural networks, neurons can receive signals from other neurons as well as their own feedback signals. Compared with feedforward neural networks, the neurons in feedback neural networks have a memory function and take different states at different moments. Common feedback neural networks include recurrent neural networks (RNN) [73], Hopfield networks [74] and Boltzmann machines [75].

Deep learning

The concept of deep learning originates from the study of artificial neural networks; a multilayer perceptron with multiple hidden layers is a deep learning structure. Recently, deep learning models, including RNN, CNN, Transformer and N-BEATS, have been widely used in predictive analytics. LSTM is a well-known RNN architecture used in prediction [76]. DeepAR employs a classical RNN model to solve the time series forecasting problem [77], and the deep state space model was proposed to address the limitations of DeepAR [78]. Since DeepAR and the deep state space model are both one-horizon forecast models, MQRNN, a multi-horizon forecast model, was designed to predict multiple future time steps simultaneously [79]. The CNN-LSTM algorithm, which combines CNN and LSTM, has been applied in many predictive analyses [80,81,82].

3.3. Prescriptive Analytics

Prescriptive analytics is the final step of business analytics. Prescriptive analytics mainly refers to the use of operations research methods such as mathematical programming models and intelligent optimization algorithms to give recommendations on the optimal actions that an enterprise should take. Compared to the traditional decision methods which rely too much on human experience, prescriptive analytics gives more reliable and reasonable decisions through scientific approaches including traditional optimization algorithms and heuristic algorithms.

3.3.1. Traditional Optimization Algorithm

Based on the features of the objective function, constraints and decision variables, mathematical programs can be divided into linear programming, nonlinear programming, integer programming, stochastic programming, dynamic programming and so on [89]. To solve these problems, many traditional optimization algorithms have been proposed. For constrained programming, the simplex algorithm is a well-known linear programming algorithm [90], and penalty function methods have been proposed for nonlinear programming. The gradient descent method [91], the quasi-Newton method [92] and the conjugate gradient method [93] are classical iterative algorithms for unconstrained optimization.
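As a minimal sketch of an iterative method for unconstrained optimization, the snippet below applies gradient descent with a fixed step size to a one-dimensional quadratic. The fixed learning rate, the number of steps and the example objective are assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a differentiable function given its gradient grad."""
    x = x0
    for _ in range(steps):
        # Move against the gradient, i.e., in the direction of steepest descent.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2 with gradient f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to 3
```

Practical implementations usually choose the step size adaptively, for example with a line search, rather than keeping it fixed.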

3.3.2. Heuristic Algorithm

Simple heuristic algorithms

Simple heuristic algorithms mainly include greedy algorithms, local search algorithms and hill-climbing algorithms. The greedy algorithm makes the locally optimal choice in the current state at each step of the selection process, in the hope of reaching a globally optimal outcome [94]. The local search algorithm is based on the greedy idea of starting with a candidate solution and continuously searching in its neighborhood until there are no better solutions in the neighborhood [95]. The hill-climbing algorithm is a simple greedy search algorithm that repeatedly selects the best solution in the neighborhood of the current solution as the new current solution until a local optimum is reached [96].
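The hill-climbing procedure can be sketched in a few lines of Python. The example assumes a maximization problem over integer prices with a single peak; the functions profit and price_neighbors are hypothetical.

```python
def hill_climb(score, start, neighbors):
    """Move to the best neighbor until no neighbor improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # local optimum: no better neighbor exists
        current = best


def profit(price):
    # Hypothetical profit curve with a single peak at price 7.
    return -(price - 7) ** 2 + 50


def price_neighbors(price):
    # Neighboring solutions: one unit cheaper or dearer, within 0..20.
    return [p for p in (price - 1, price + 1) if 0 <= p <= 20]


print(hill_climb(profit, start=2, neighbors=price_neighbors))  # 7
```

On an objective with several peaks the same procedure may stop at a local optimum that depends on the starting point, which is the main motivation for the meta-heuristics described next.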

Meta-heuristic algorithms

Meta-heuristic algorithms are improvements of simple heuristic algorithms, usually using randomized search techniques, and can be applied to a wide range of problems. Meta-heuristic algorithms include evolutionary algorithms, swarm intelligence algorithms, simulated annealing algorithms and tabu search algorithms. Evolutionary algorithms are inspired by the evolutionary mechanisms of living organisms and simulate evolutionary processes to evolve the candidate solutions of optimization problems. Typical evolutionary algorithms are the Genetic Algorithm (GA), Differential Evolution (DE) and the Immune Algorithm (IM). Swarm intelligence refers to the property that unintelligent individuals exhibit intelligent behavior through cooperation, and it is a computational technique based on the behavioral laws of biological groups. Two representative swarm intelligence algorithms are Particle Swarm Optimization (PSO) [97] and Ant Colony Optimization (ACO) [98]. Simulated annealing is an algorithm that searches for the global optimum by looking for states with relatively small objective values in the neighborhood of the current state [99]. The tabu search algorithm looks for a better solution in the neighborhood of the current solution and records the search history in a tabu list to avoid revisiting solutions [100].
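A minimal simulated annealing sketch is shown below. It uses the standard Metropolis acceptance rule, in which a worse neighbor is accepted with probability exp(-delta / temperature), together with a geometric cooling schedule. These choices, the parameter values and the toy cost function are assumptions made for illustration and are not taken from the cited work.

```python
import math
import random


def simulated_annealing(cost, start, neighbor, temp=10.0, cooling=0.95,
                        steps=500, seed=0):
    """Minimize cost by randomized neighborhood search with cooling."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling  # geometric cooling schedule
    return best


def cost(x):
    # Hypothetical cost with its minimum at x = 4.
    return (x - 4) ** 2


def step(x, rng):
    # Neighbor: move one unit left or right at random.
    return x + rng.choice([-1, 1])


best = simulated_annealing(cost, start=20, neighbor=step)
print(best, cost(best))
```

Accepting occasional worse moves at high temperature is what lets the search escape local optima that would trap a pure hill-climbing procedure.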

Hyper-heuristic algorithms

Hyper-heuristic algorithms provide a high-level heuristic by managing or manipulating a set of Low-Level Heuristics (LLH) to generate new heuristics. These new heuristics are used to solve various combinatorial optimization problems.

4. Business Analytics Applications

4.1. Applications in Functional Areas

Supply chain management is a representative application of business analytics in the business area. Business analytics has a strong impact on supply chain performance in the plan, source, make and deliver areas [106,107,108]. For example, descriptive analytics helps to identify demand patterns, and predictive analytics forecasts future customer demand through statistical and machine learning algorithms. Based on the predictions, optimization algorithms are used to make pricing and inventory management decisions that maximize retailers’ profit.

In the area of marketing management, business analytics integrates market and customer-related data and uses analysis algorithms to provide managers with a variety of relevant perspectives for better optimization decisions. Among the various areas of marketing, customer relationship management (CRM) is a key area that uses business analytics to analyze, integrate and utilize information resources and customer feedback to support CRM technology, such as acquiring and retaining customers [109].

Risk management is an essential area of company management, and business analytics techniques are widely used in the process of risk management. Predictive analytics techniques such as artificial neural networks and support vector machines are applied to establish early warning systems [111,112] and perform risk evaluation [113,114]. Optimization tools of prescriptive analytics are used to make better risk-based decisions [115].

Strategic management, which consists of the analyses, decisions and actions an enterprise undertakes, plays an important role in creating or sustaining competitive advantages. Business analytics helps firms to reveal their strengths and weaknesses by identifying business units, activities and processes [116].

The emergence of business analytics drives the development of data-driven human resources (HR) management [120]. Human resources management is progressively increasing its adoption of advanced data analytics, visualization models and techniques to strengthen strategic decision-making and serve the needs of decision-makers. Descriptive analytics uses internal and external organizational data and HR administrative information to generate ratios, metrics, dashboards and reports on HR. Predictive analytics can analyze process data and make predictions. Based on predictive analytics and the large and diverse HR data available, HR departments gain decision options to optimize performance and completely reshape the decision-making process [121].

4.2. Applications in Industry Sectors

Business analytics is widely used in the healthcare sector. Data visualization tools such as dashboards and control charts are used to monitor outcomes and look for process variations [123]. Descriptive analytics techniques are used to mine genetic data to identify the relationships between human genes, diseases, variants, proteins, cells and biological pathways [124]. Predictive analytics methods help to forecast the emergence and development of diseases [125]. The application of prescriptive algorithms can increase efficiency and reduce costs in the healthcare industry [126].

The retail industry has various applications of business analytics. Retailers can collect customer demographics and behavior data to analyze customer preferences and shopping features through business analytics. The classical example is market basket analysis, which uses data mining methods to examine large transaction databases and determine which items are most frequently purchased together [131,132]. Customer visit segments can be mined by data mining rules [133]. Business analytics techniques are also used to build recommender systems, especially in e-commerce [134,135].

5. Challenges in Business Analytics

5.1. Data Quality

With the advent of the Big Data era, the accessibility of data and the volume of data available have increased significantly compared to the past. However, the problem that arises is how to select useful and accurate data for analytics from the vast amount of information. Machine learning plays an important role in business analytics, which relies on data. Thus, business analytics can be considered a data-driven analytics process; so, data quality is very important for subsequent analysis and guidance. In business analytics, data quality challenges mainly include data completeness, consistency and accuracy.

Data accuracy concerns anomalies or errors in the recorded information. Common data accuracy errors include garbled data and abnormally large or small values. There are various outlier detection algorithms, each with its advantages, disadvantages and scope of application, and it is difficult to determine directly which one is the best. In practical applications, an appropriate outlier detection algorithm is selected according to the characteristics of business operations, such as the requirements for computational cost and the tolerance for outliers.
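As one simple example of an outlier detection rule, the sketch below flags values that lie more than 1.5 interquartile ranges outside the quartiles. The interquartile range rule is chosen here only for illustration, as the text does not prescribe a specific algorithm; the function name iqr_outliers and the order values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily order quantities with one abnormally large record.
orders = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(orders))  # [95]
```

The multiplier k controls the tolerance for outliers: a larger value flags fewer records, which may suit businesses where extreme but genuine values are common.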

5.2. Data Security and Privacy

There is no completely secure data infrastructure unless it is isolated and disconnected from all other networks. However, this is impossible for business analytics, especially with the emergence of cloud computing [9]. Throughout the data lifecycle, enterprises need to comply with stricter security standards and confidentiality regulations; therefore, the security requirements for data storage and use are increasingly high.

Meanwhile, the security needs of data are changing, and a new complete chain has been formed from data collection, data integration, data refinement, data mining, security analysis, security posture determination and security detection to threat discovery. In this chain, data may be lost, leaked, accessed without authorization or tampered with, and user privacy and corporate secrets may even be exposed. Therefore, data security protection in the big data environment is a significant challenge for business analytics. From the perspective of customers, there are concerns about the privacy of individuals. The use of the personal data of customers, even within the limits of the law, should be avoided or scrutinized to protect the organization from adverse effects and public condemnation.

This entry is adapted from the peer-reviewed paper 10.3390/math11040899

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.