Statistical predictive techniques use statistical theories and methods to make predictions: they build statistical models and fit the model parameters to past data. These techniques fall into two groups: regression models and time series models.
The regression model is one of the best-known statistical prediction techniques. Its basic form, the linear regression model, is an equation that assigns a weight to each input variable so that the resulting straight line (a hyperplane when there are several inputs) best fits the relationship between the input variables and the output variable [50]. When the output variable is categorical, a classification model such as the logistic regression model [51] is needed. Polynomial regression models, meanwhile, are used to fit nonlinear relationships between variables.
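As a minimal sketch of how the weights of a linear regression model can be estimated, the following Python code fits the single-input case y = w0 + w1 * x by ordinary least squares. Least squares is assumed as the fitting criterion (the most common choice, although the text above does not name one); the function names and the data, generated from y = 1 + 2x, are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = w0 + w1 * x with one input variable."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    w0 = mean_y - w1 * mean_x
    return w0, w1


def predict(w0, w1, x):
    return w0 + w1 * x


# Hypothetical data generated exactly from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1, predict(w0, w1, 6.0))  # 1.0 2.0 13.0
```

With several input variables, the same criterion leads to the normal equations, which are usually solved with a linear algebra library rather than by hand.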
Time series models can be divided into two groups: exponential smoothing models and ARIMA-family models [52]. Exponential smoothing models decompose a time series into components (such as level, trend and seasonality) and recombine the smoothed components in an additive or multiplicative structure to predict future values [53]. Typical exponential smoothing models include simple exponential smoothing, Holt's exponential smoothing and Holt-Winters' seasonal exponential smoothing [54]. The ARIMA family mainly includes the AR (AutoRegressive), MA (Moving Average), ARMA (AutoRegressive Moving Average), ARIMA (AutoRegressive Integrated Moving Average) and SARIMA (seasonal ARIMA) models [55].
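To make the smoothing idea concrete, the sketch below implements simple exponential smoothing and Holt's linear trend method in plain Python. The initialization choices (first observation as the initial level, first difference as the initial trend) and the parameter values are illustrative assumptions; the Holt-Winters seasonal component and the ARIMA family are not shown.

```python
def simple_exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the final level, which serves
    as the flat forecast for every future period."""
    level = series[0]  # initialize the level with the first observation
    for y in series[1:]:
        # New level: weighted average of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return level


def holt_linear(series, alpha, beta, horizon):
    """Holt's linear trend method (additive trend, no seasonal component)."""
    level = series[0]
    trend = series[1] - series[0]  # naive initialization of the trend
    for y in series[1:]:
        prev_level = level
        # Smooth the level toward the observation, allowing for the trend.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Smooth the trend toward the latest change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecasts extend the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


print(simple_exponential_smoothing([10, 12, 11, 13], alpha=0.5))  # 12.0
print(holt_linear([10, 12, 14, 16, 18], alpha=0.5, beta=0.3, horizon=3))
```

Simple exponential smoothing produces a flat forecast, whereas Holt's method extrapolates the smoothed trend; for the perfectly linear series above it returns values close to 20, 22 and 24.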
3.2.2. Machine Learning and Artificial Intelligence Techniques
With the advent of the big data era, machine learning has become a widely used approach to predictive analytics. Classic machine learning prediction algorithms include support vector machines, nearest neighbors, decision trees, ensemble learning and artificial neural networks, alongside more advanced deep learning techniques.
Support vector machine
In machine learning, the support vector machine (SVM) is a supervised learning model, with associated learning algorithms, that analyzes data for classification and regression analysis [56]. Given a set of training instances, each labeled as belonging to one of two classes, the SVM training algorithm builds a model that assigns new instances to one class or the other; it does so by seeking the hyperplane that separates the two classes with the largest margin. The SVM is therefore most commonly used for binary classification problems.
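The following is a minimal sketch of a linear SVM, trained here by stochastic subgradient descent on the regularized hinge loss. This is one simple training method chosen for illustration, not necessarily the algorithm described in [56]; kernels are omitted, and the learning rate, regularization strength, epoch count and toy data are assumptions.

```python
def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Linear soft-margin SVM via stochastic subgradient descent on
    lam/2 * ||w||^2 + hinge loss max(0, 1 - y_i * (w . x_i + b)).
    X: list of feature vectors; y: labels in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Inside the margin or misclassified: the hinge term contributes -yi * xi.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer acts, shrinking w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def svm_predict(w, b, x):
    """Assign x to class +1 or -1 by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable training data.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-3, -2]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in ([5, 4], [-4, -5])])
```

The regularization term keeps the weights small, which is what pushes the solution toward a wide margin rather than any separating line.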
Nearest neighbor
The k-nearest neighbor algorithm (KNN) is a non-parametric supervised learning method [57] that can be applied to both classification and regression problems. It predicts from the k training instances closest to a new instance: in classification, the output is a class membership, usually decided by a majority vote of those neighbors, and in regression, the output is a property value of the object, usually the average of the neighbors' values. The nearest neighbor algorithm is considered one of the simplest machine learning algorithms [58].
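A minimal KNN sketch, assuming Euclidean distance, majority voting for classification and plain averaging for regression (common defaults, not the only options). The helper name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict for `query` from its k nearest training instances."""
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        zip((math.dist(x, query) for x in train_X), train_y),
        key=lambda pair: pair[0],
    )
    nearest = [target for _, target in by_distance[:k]]
    if task == "classification":
        # Majority vote among the k neighbors' class labels.
        return Counter(nearest).most_common(1)[0][0]
    # Regression: average of the k neighbors' numeric targets.
    return sum(nearest) / len(nearest)


X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]  # hypothetical points
labels = ["A", "A", "A", "B", "B", "B"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_predict(X, labels, [1.5, 1.5]))  # A
print(knn_predict(X, values, [1.5, 1.5], task="regression"))  # 2.0
```

There is no training phase: all work happens at prediction time, which is why KNN is simple to implement but can be slow on large datasets.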
Decision tree
The decision tree is a non-parametric supervised learning algorithm for classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes. Three typical decision tree algorithms are ID3, C4.5 and CART (Classification and Regression Tree). Iterative Dichotomiser 3 (ID3) uses information entropy and information gain as metrics to evaluate candidate splits [59]. C4.5, an improved version of ID3, uses the information gain ratio rather than the raw information gain as the basis for feature selection [60], which reduces the bias of information gain toward attributes with many distinct values. CART builds binary trees and, for classification, typically uses the Gini index as its splitting criterion.
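The split criteria of ID3 and C4.5 can be illustrated with a small sketch that computes information gain and gain ratio for categorical features. The dataset is hypothetical and deliberately includes an identifier-like feature to show why C4.5 normalizes by the split information; tree construction itself is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_by(rows, labels, feature):
    """Group the class labels by the value each row takes on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    return groups


def information_gain(rows, labels, feature):
    """ID3 criterion: parent entropy minus weighted child entropy."""
    n = len(labels)
    groups = split_by(rows, labels, feature)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children


def gain_ratio(rows, labels, feature):
    """C4.5 criterion: information gain divided by the split information."""
    n = len(labels)
    groups = split_by(rows, labels, feature)
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, feature) / split_info


rows = [
    {"weather": "sunny", "id": "a"},
    {"weather": "sunny", "id": "b"},
    {"weather": "rainy", "id": "c"},
    {"weather": "rainy", "id": "d"},
]
labels = ["yes", "yes", "no", "no"]
for feature in ("weather", "id"):
    print(feature, information_gain(rows, labels, feature), gain_ratio(rows, labels, feature))
```

Both features have an information gain of 1.0, so ID3 cannot tell them apart, but the gain ratio is 1.0 for `weather` and 0.5 for `id`, so C4.5 prefers the feature that generalizes beyond individual rows.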
Ensemble learning
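The basic idea of ensemble learning is to combine multiple base classifiers into an integrated classifier that usually predicts better than a single classifier. Ensemble learning methods include bagging, boosting and stacking: bagging trains base learners on bootstrap samples of the training data and aggregates their predictions by voting or averaging, boosting trains base learners sequentially so that each focuses on the errors of its predecessors, and stacking trains a meta-learner on the outputs of the base learners.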
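As an illustration of bagging only (boosting and stacking are not shown), the sketch below trains decision stumps on bootstrap samples of a one-dimensional dataset and combines them by majority vote. The choice of stumps as base learners, the number of estimators and the data are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-dimensional decision stump by minimizing training errors."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging(xs, ys, n_estimators=11, seed=0):
    """Train stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: n indices drawn uniformly with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


xs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical feature values
ys = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical class labels
ensemble = bagging(xs, ys)
print([ensemble(x) for x in (1, 8)])
```

Using an odd number of estimators avoids tied votes in this two-class setting.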
Artificial neural network
An artificial neural network (ANN) is a model that mimics the structure and function of biological neural networks, especially those in the brain [70]. According to how their neurons are connected, ANNs can be divided into feedforward neural networks and feedback neural networks. Feedforward neural networks (FNN) group neurons according to the order in which they receive information, and each group can be regarded as a neural layer [71]. The neurons in each layer receive the outputs of the neurons in the previous layer and pass their own outputs to the neurons in the next layer. Depending on the number of layers, FNNs fall into two categories: single-layer and multi-layer networks [72]. In a single-layer FNN, every input is connected to every output neuron, i.e., the layer is fully connected (FC); a typical multi-layer network is the convolutional neural network (CNN). In feedback neural networks, neurons can receive signals from other neurons as well as their own feedback signals. Unlike the neurons in feedforward networks, the neurons in feedback neural networks therefore have memory and take different states at different moments. Common feedback neural networks include recurrent neural networks (RNN) [73], Hopfield networks [74] and Boltzmann machines [75].
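The structural difference between feedforward and feedback networks can be sketched with hand-set weights: a feedforward pass moves strictly from layer to layer, whereas a recurrent neuron feeds its own previous output back in, which gives it a memory. Training (e.g., backpropagation) is omitted, and all weights and function names are hypothetical.

```python
import math


def dense_layer(inputs, weights, biases, activation=math.tanh):
    """Fully connected layer: every output neuron receives every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feedforward(x, layers):
    """Feedforward network: signals move layer by layer, with no cycles."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x


def recurrent_run(sequence, w_in, w_rec, bias):
    """Single recurrent neuron: its previous output h is fed back as an input,
    so h acts as a memory whose value differs from one time step to the next."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = math.tanh(w_in * x_t + w_rec * h + bias)
        states.append(h)
    return states


# Hypothetical weights: a 2-input, 2-hidden-neuron, 1-output network.
layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
print(feedforward([1.0, 2.0], layers))
# With feedback, the state stays non-zero after the input drops to zero.
print(recurrent_run([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0))
```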
Deep learning
The concept of deep learning originates from the study of artificial neural networks; a multilayer perceptron with multiple hidden layers is one example of a deep learning structure. Deep learning has recently been widely used in predictive analytics, with architectures including RNN, CNN, Transformer and N-BEATS. Long short-term memory (LSTM) is a well-known RNN architecture used for prediction [76]. DeepAR applies an autoregressive RNN model to probabilistic time series forecasting [77], and the deep state space model was proposed to address limitations of DeepAR [78]. Because DeepAR and the deep state space model both produce one-step-ahead (one-horizon) forecasts, MQRNN, a multi-horizon forecast model, was designed to predict multiple future time steps simultaneously [79]. The CNN-LSTM algorithm, which combines CNN and LSTM, has been applied in many predictive analyses [80,81,82].
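Since LSTM underlies several of the models above, a minimal sketch of a single-unit LSTM cell may help show how its gates control what the cell remembers. The scalar state, the parameter layout and the weights are simplifying assumptions; real forecasting models use vector states, learned weights and an output layer, none of which is shown here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell. `params` maps each gate
    name to an (input weight, recurrent weight, bias) triple."""
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))  # share of the old cell state to keep
    i = sigmoid(pre_activation("input"))   # share of the candidate to write
    g = math.tanh(pre_activation("cell"))  # candidate cell value
    o = sigmoid(pre_activation("output"))  # share of the cell state to expose
    c = f * c_prev + i * g                 # updated cell state (long-term memory)
    h = o * math.tanh(c)                   # new hidden state (output)
    return h, c


def run_lstm(sequence, params):
    """Feed a sequence through the cell, starting from zero states."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hypothetical, hand-picked weights; in practice they are learned by training.
params = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.5, 0.0),
    "cell": (1.0, 0.5, 0.0),
    "output": (0.5, 0.5, 0.0),
}
print(run_lstm([0.2, 0.4, 0.6, 0.8], params))
```

The additive update of the cell state `c`, gated by `f` and `i`, is what lets an LSTM carry information over many time steps.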
3.3. Prescriptive Analytics
Prescriptive analytics is the final step of business analytics. It mainly refers to the use of operations research methods, such as mathematical programming models and intelligent optimization algorithms, to recommend the optimal actions an enterprise should take. Compared with traditional decision methods, which rely heavily on human experience, prescriptive analytics yields more reliable and better-justified decisions through scientific approaches, including traditional optimization algorithms and heuristic algorithms.
3.3.1. Traditional Optimization Algorithm
Based on the characteristics of the objective function, constraints and decision variables, mathematical programs can be divided into linear programming, nonlinear programming, integer programming, stochastic programming, dynamic programming and so on [89]. Many traditional optimization algorithms have been proposed to solve these problems. For constrained problems, the simplex algorithm is a well-known linear programming algorithm [90], and penalty function methods have been proposed for nonlinear programming. The gradient descent method [91], quasi-Newton methods [92] and the conjugate gradient method [93] are classical iterative algorithms for unconstrained optimization.
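As an example of an unconstrained iterative method, the following sketch applies gradient descent with a fixed step size to a simple quadratic function. The step size, stopping tolerance and test function are illustrative assumptions; quasi-Newton and conjugate gradient methods choose better search directions and are not shown.

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a differentiable function, given its gradient, from start x0."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # Stop near a stationary point, where the gradient norm is ~0.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Take a fixed-size step in the direction of steepest descent.
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def grad_f(v):
    """Gradient of f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimized at (3, -1)."""
    x, y = v
    return [2 * (x - 3), 4 * (y + 1)]


print(gradient_descent(grad_f, [0.0, 0.0]))  # approximately [3.0, -1.0]
```

If the step size is too large relative to the curvature of the function, the iterates overshoot and can diverge, which is one motivation for line searches and second-order information.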
3.3.2. Heuristic Algorithm
Simple heuristic algorithms
Simple heuristic algorithms mainly include greedy algorithms, local search algorithms and hill-climbing algorithms. A greedy algorithm makes the locally optimal choice at each step of the selection process, in the hope that this leads to a globally optimal outcome [94]. Local search applies the greedy idea by starting from a candidate solution and repeatedly moving to a better solution in its neighborhood until no better neighbor exists [95]. Hill climbing is a simple greedy search algorithm that, at each iteration, replaces the current solution with the best solution in its neighborhood, stopping when a local optimum is reached [96].
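The behavior of hill climbing, including its tendency to stop at local optima, can be sketched on a small hypothetical one-dimensional landscape in which the neighbors of a position are the adjacent positions. The landscape and neighborhood definition are assumptions made for illustration.

```python
def hill_climb(f, start, neighbors):
    """Steepest-ascent hill climbing (maximization)."""
    current = start
    while True:
        # Pick the best solution in the neighborhood of the current one.
        best = max(neighbors(current), key=f, default=current)
        if f(best) <= f(current):
            return current  # no neighbor improves f: a local optimum
        current = best


landscape = [1, 3, 5, 4, 2, 6, 9, 7]  # hypothetical objective values


def f(i):
    return landscape[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


print(hill_climb(f, 0, neighbors))  # 2 (local optimum, value 5)
print(hill_climb(f, 4, neighbors))  # 6 (global optimum, value 9)
```

Starting from position 0, the search stops at a local optimum, while starting from position 4 it reaches the global optimum; this dependence on the starting point is the weakness that meta-heuristics aim to overcome.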
Meta-heuristic algorithms
Meta-heuristic algorithms improve on simple heuristic algorithms, usually by using randomized search techniques, and can be applied to a wide range of problems. Meta-heuristic algorithms include evolutionary algorithms, swarm intelligence algorithms, simulated annealing algorithms and tabu search algorithms. Evolutionary algorithms are inspired by the evolutionary mechanisms of living organisms: they simulate evolutionary processes such as selection, crossover and mutation on a population of candidate solutions to an optimization problem. Typical evolutionary algorithms are the Genetic Algorithm (GA), Differential Evolution (DE) and the Immune Algorithm (IM). Swarm intelligence refers to the intelligent collective behavior that emerges when individually unintelligent agents cooperate, and swarm intelligence algorithms are computational techniques based on the behavioral rules of biological groups. Two representative swarm intelligence algorithms are Particle Swarm Optimization (PSO) [97] and Ant Colony Optimization (ACO) [98]. Simulated annealing searches for a global optimum by moving between neighboring states: it always accepts better states and accepts worse states with a probability that decreases as a "temperature" parameter is lowered, which allows it to escape local optima [99]. Tabu search moves to the best solution in the current neighborhood and records the search history in a tabu list, which forbids recently visited solutions and thus avoids repeated searches [100].
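Simulated annealing differs from hill climbing mainly in its acceptance rule. The sketch below minimizes a hypothetical one-dimensional landscape; the initial temperature, geometric cooling rate, step count and neighborhood are illustrative assumptions, and the function returns the best solution seen during the search.

```python
import math
import random


def simulated_annealing(f, start, neighbor, t0=10.0, cooling=0.99, steps=2000, seed=0):
    """Minimize f, starting from `start`; returns the best solution seen."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = f(candidate) - f(current)
        # Always accept improvements; accept a worse candidate with probability
        # exp(-delta / t), which shrinks as delta grows and as t decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if f(current) < f(best):
                best = current
        t = max(t * cooling, 1e-9)  # geometric cooling, kept strictly positive
    return best


costs = [5, 3, 4, 6, 2, 8, 1, 7, 4, 5]  # hypothetical objective values


def f(i):
    return costs[i]


def neighbor(i, rng):
    # Move one position left or right, clamped to the valid index range.
    return min(max(i + rng.choice((-1, 1)), 0), len(costs) - 1)


best = simulated_annealing(f, 0, neighbor)
print(best, f(best))
```

At high temperature the search wanders almost freely across the landscape; as the temperature falls, it behaves increasingly like greedy descent.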
Hyper-heuristic algorithms
Hyper-heuristic algorithms operate at a higher level: rather than searching the solution space directly, they manage or manipulate a set of low-level heuristics (LLHs), selecting among them or combining them to generate new heuristics. These heuristics are then used to solve various combinatorial optimization problems.
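A minimal sketch of a selection hyper-heuristic, assuming a simple greedy strategy: at each iteration, the high-level heuristic applies every low-level heuristic to the current solution and keeps the best result if it is not worse. The problem (ordering jobs to minimize total weighted completion time), the three LLHs and all names are hypothetical; generation hyper-heuristics, which construct new heuristics, are not shown.

```python
import random


def weighted_completion_time(order, jobs):
    """Objective: sum of weight * completion time when jobs run in `order`."""
    t = total = 0
    for j in order:
        duration, weight = jobs[j]
        t += duration
        total += weight * t
    return total


# Low-level heuristics (LLHs): each returns a perturbed copy of the solution.
def swap_adjacent(order, rng):
    s = order[:]
    i = rng.randrange(len(s) - 1)
    s[i], s[i + 1] = s[i + 1], s[i]
    return s


def move_to_front(order, rng):
    s = order[:]
    s.insert(0, s.pop(rng.randrange(len(s))))
    return s


def reverse_segment(order, rng):
    s = order[:]
    i, j = sorted(rng.sample(range(len(s)), 2))
    s[i:j + 1] = reversed(s[i:j + 1])
    return s


def selection_hyper_heuristic(start, cost, llhs, iterations=200, seed=0):
    """High-level heuristic: try every LLH on the current solution, keep the
    best candidate if it is not worse, and count how often each LLH wins."""
    rng = random.Random(seed)
    current = start
    usage = {h.__name__: 0 for h in llhs}
    for _ in range(iterations):
        scored = []
        for h in llhs:
            candidate = h(current, rng)
            scored.append((cost(candidate), h.__name__, candidate))
        best_cost, name, best = min(scored, key=lambda item: item[0])
        if best_cost <= cost(current):
            current = best
            usage[name] += 1
    return current, usage


jobs = [(3, 1), (1, 4), (2, 2), (4, 3)]  # hypothetical (duration, weight) pairs


def cost(order):
    return weighted_completion_time(order, jobs)


order, usage = selection_hyper_heuristic(
    [0, 1, 2, 3], cost, [swap_adjacent, move_to_front, reverse_segment]
)
print(order, cost(order), usage)
```

For this instance, ordering jobs by processing time divided by weight (Smith's rule) gives the optimal cost of 41, which provides a check on the search result; the `usage` counts show which low-level heuristics the high-level strategy relied on.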
4. Business Analytics Applications
4.1. Applications in Functional Areas
Supply chain management is a representative application of business analytics in the business area. Business analytics has a strong impact on supply chain performance in the plan, source, make and deliver areas [106,107,108]. For example, descriptive analytics helps identify demand patterns, and predictive analytics forecasts future customer demand using statistical and machine learning algorithms. Based on these forecasts, optimization algorithms are used to make pricing and inventory management decisions that maximize retailers' profit.
In marketing management, business analytics integrates market and customer-related data and applies analytical algorithms to give managers a variety of relevant perspectives for better-optimized decisions. Among the areas of marketing, customer relationship management (CRM) is a key one: business analytics is used to analyze, integrate and utilize information resources and customer feedback in support of CRM activities such as acquiring and retaining customers [109].
Risk management is an essential area of company management, and business analytics techniques are widely used throughout the risk management process. Predictive analytics techniques such as artificial neural networks and support vector machines are applied to build early warning systems [111,112] and to evaluate risk [113,114]. Optimization tools from prescriptive analytics are used to make better risk-based decisions [115].
Strategic management, which consists of the analyses, decisions and actions an enterprise undertakes to create or sustain competitive advantages, plays an important role in the business area. Business analytics helps firms reveal their strengths and weaknesses by identifying business units, activities and processes [116].
The emergence of business analytics drives the development of data-driven human resources (HR) management [120]. HR management is increasingly adopting advanced data analytics, visualization models and techniques to strengthen strategic decision-making and serve the needs of decision-makers. Descriptive analytics uses internal and external organizational data and HR administrative information to generate HR ratios, metrics, dashboards and reports. Predictive analytics analyzes HR process data to make predictions. By combining predictive analytics with the large and diverse HR data available, HR departments gain decision options for optimizing performance and can fundamentally reshape their decision-making process [121].
4.2. Applications in Industry Sectors
Business analytics is widely used in the healthcare sector. Data visualization tools such as dashboards and control charts are used to monitor outcomes and detect process variations [123]. Descriptive analytics techniques are used to mine genetic data and identify the relationships among human genes, diseases, variants, proteins, cells and biological pathways [124]. Predictive analytics methods help forecast the emergence and progression of diseases [125]. Prescriptive algorithms can increase efficiency and reduce costs in the healthcare industry [126].
Business analytics has various applications in the retail industry. Retailers can collect customer demographic and behavioral data and analyze it to understand customer preferences and shopping characteristics. A classic example is market basket analysis, which uses data mining methods to examine large transaction databases and determine which items are most frequently purchased together [131,132]. Customer visit segments can also be mined with data mining rules [133]. Business analytics techniques are further used to build recommender systems, especially in the e-commerce field [134,135].
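The core counting step of market basket analysis can be sketched as computing the support of item pairs and the confidence of simple association rules. The baskets are hypothetical, and complete algorithms such as Apriori, which prune candidate itemsets of all sizes, are not shown.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Support of each item pair: share of transactions containing both items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e. the share of
    baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [set(b) for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


transactions = [  # hypothetical baskets
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
print(pair_support(transactions)[("bread", "milk")])  # 0.5
print(confidence(transactions, "butter", "milk"))     # 1.0
```

Support measures how common a combination is, while confidence measures how reliably one item predicts another; practical rule mining keeps only rules above chosen thresholds for both.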
5. Challenges in Business Analytics
5.1. Data Quality
With the advent of the big data era, both the accessibility and the volume of available data have increased significantly. The problem that arises, however, is how to select useful and accurate data for analytics from this vast amount of information. Machine learning, which plays an important role in business analytics, relies on data; business analytics can therefore be considered a data-driven process, and data quality is critical for subsequent analysis and guidance. In business analytics, data quality challenges mainly concern data completeness, consistency and accuracy.
Data accuracy concerns anomalies or errors in the information recorded in the data. Common accuracy errors include garbled data and abnormally large or small values (outliers). Many outlier detection algorithms exist, each with its own advantages, disadvantages and scope of application, and it is difficult to determine in advance which one is best. In practice, an appropriate outlier detection algorithm is selected according to the characteristics of the business operations, such as the acceptable computational cost and the tolerance for outliers.
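The trade-offs between outlier detection algorithms can be seen even with two simple rules: a z-score threshold and Tukey's interquartile-range (IQR) fences. The thresholds (3 standard deviations, 1.5 × IQR) are common conventions rather than requirements, and the data are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Tukey's fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_sales = [102, 98, 101, 99, 100, 97, 103, 100, 950]  # hypothetical data
print(zscore_outliers(daily_sales))  # []
print(iqr_outliers(daily_sales))     # [950]
```

Here the IQR rule flags the anomalous value while the z-score rule misses it: the outlier inflates the standard deviation, and with nine values no point can lie more than sqrt(8) ≈ 2.83 population standard deviations from the mean.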
5.2. Data Security and Privacy
No data infrastructure is completely secure unless it is isolated and disconnected from all other networks, which is impossible for business analytics, especially since the emergence of cloud computing [9]. Throughout the data lifecycle, enterprises must comply with increasingly strict security standards and confidentiality regulations; the security requirements for data storage and use are therefore increasingly demanding.
Meanwhile, the security needs of data are changing, and a new, complete chain has formed that runs from data collection, data integration, data refinement, data mining, security analysis and security posture determination through security detection to threat discovery. Along this chain, data may be lost, leaked, accessed without authorization or tampered with, and may involve user privacy and corporate secrets. Data security protection in the big data environment is therefore a significant challenge for business analytics. From the customers' perspective, there are concerns about individual privacy: the use of customers' personal data, even within the limits of the law, should be avoided or carefully scrutinized to protect the organization from adverse effects and public condemnation.