Business Purchase Prediction Based on Artificial Intelligence



Subjects: Computer Science, Artificial Intelligence


An architecture of a machine learning time series prediction system for business purchase prediction, based on neural networks and enhanced with explainable artificial intelligence (XAI) techniques, is proposed. The architecture is implemented as a system for predicting subsequent purchases from time series data using long short-term memory (LSTM) neural networks and Shapley additive explanations (SHAP) values. Explanations generated by the XAI module are provided to the user together with the prediction results, allowing them to understand the system's decisions.

explainable AI
neural networks
purchase prediction

1. Introduction

The Industry 4.0 paradigm aims to automate all business processes and replace human workers wherever possible. The adoption of Industry 4.0 technologies is an ongoing process, and the introduction of artificial intelligence (AI) into business systems is part of it. AI systems make recommendations, classify instances of specific objects, predict future values of certain features, and so on. Their performance is measured with metrics appropriate for the specific task the system performs. However, especially in domains dealing with sensitive data (medicine, military) [1,2], trust in the system is also required before it can be used in practice. Many AI systems, such as neural networks, operate as black boxes, i.e., users only know the input and the expected output but not how the input is transformed to produce the output [1]. Trust in such systems is, therefore, difficult to achieve. Explainable artificial intelligence (XAI) provides a means to justify and interpret the decisions made by a system and makes the process transparent to the user. Its main focus is to explain the reasoning of an AI model. When users understand why the system has produced a specific output, they can view it critically and judge it more easily [2]. Mistakes are, therefore, easier to recognize, while some decisions that initially seem peculiar may become understandable and appear appropriate. The goal of incorporating XAI can be viewed as keeping humans in the loop and at the center. This goal aligns with the emerging Industry 5.0 paradigm, in which expert workers manage and oversee automated processes, creating collaboration between humans and machines [3]. Domain experts must be confident that the system makes appropriate decisions in order to act on them.

There are different approaches to introducing explainability into a machine learning system. Two main directions are developing systems that already include explainability at their core and adding explainability components to existing systems [1]. Both approaches have benefits and disadvantages. While developing an innately explainable system seems like a superior proposition, it may result in a system with inferior performance, either in efficiency or accuracy [1,4]. Additionally, it may be more expensive or more complicated to design and build a new explainable system than to upgrade an existing well-performing system by incorporating an explainability module.

2. Explainable Artificial Intelligence

Most of the existing literature on XAI deals with classification, specifically assigning a single class to an instance described by selected input features [6]. Another common theme is that most researchers focus on image and text classification, presumably because features derived from images and text can be more understandable to humans than numerical features. However, some studies address explainability in systems that solve regression problems. They conclude that not all XAI methods are suitable for regression and that applying them to regression is not always straightforward. Additionally, the authors of [6] explicitly recommend SHAP as one of the preferred methods for regression problems. This may indicate a need to define individual approaches to XAI incorporation for different classes of problems.
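Since [6] recommends SHAP for regression, it may help to see what a Shapley value is for a single regression prediction. The sketch below computes exact Shapley values by enumerating feature coalitions. It assumes that a feature absent from a coalition is replaced by a baseline value, which is only one of several possible conventions, and the function name `shapley_values` is hypothetical. Exact enumeration is exponential in the number of features, which is why practical SHAP implementations rely on approximations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of a regression model's output for one instance x.

    Assumption: features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Model output when only the features in `coalition` take their real values.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phis: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phis.append(phi)
    return phis
```

For the toy model f(x) = 2*x0 + x1*x2 with x = (1, 2, 3) and an all-zero baseline, the values come out as approximately (2, 3, 3). The linear term is credited entirely to x0, the interaction is split evenly between x1 and x2, and the values sum to f(x) - f(baseline) = 8.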

The systematic meta-survey of challenges and future research directions in XAI [2] focuses on two main themes: general challenges and research directions in XAI, and those based on machine learning lifecycle phases. Some of the most significant conclusions highlighted are the role of explainability in fostering trustworthy AI, the interpretability vs. performance trade-off, the value of communicating the model's underlying uncertainties to the user, and the imperative to establish reproducibility standards for XAI models in order to facilitate comparison between existing work and new ideas. One of the main contributions is defining the distinction between interpretability and explainability.

A detailed analysis of XAI research across various domains and applications is given in [7]. It provides an additional perspective on interpretability techniques as tools that give machine learning models the ability to explain or present their behavior in a way humans can understand. The authors believe that XAI will become mandatory in the near future to address transparency in designing, validating, and implementing black-box models. Safety-critical applications, for which assurance and explainability methods have yet to be developed, are singled out as an especially important case for introducing proper explanations.

A study examining the application of existing XAI methodologies to financial time series prediction is described in [8]. Ablation, permutation, added noise, and integrated gradients were applied to a gated recurrent unit (GRU) network, a long short-term memory neural network, and a recurrent neural network. The explainability analysis focused on the networks' ability to retain long-term information, and the different XAI methods provided complementary results. The overall conclusion was that existing methods are transferable to financial prediction; however, the development of less abstract metrics that carry more practical information was recommended.
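Of the methods applied in [8], permutation is the simplest to illustrate. The following is a minimal sketch, not the exact procedure of [8], assuming that the model is any callable mapping a flat feature vector (for example, lagged values of a series) to a number and that mean squared error is the error metric. Each feature column is shuffled across samples, and the resulting increase in error is averaged over several repeats. The function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def mse(model: Model, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Mean squared error of the model over a dataset."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Sequence[float]], y: Sequence[float],
                           repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is shuffled across samples."""
    rng = random.Random(seed)  # seeded for reproducible shuffles
    base_error = mse(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Break the link between feature j and the target; keep all other features.
            permuted = [list(row[:j]) + [value] + list(row[j + 1:])
                        for row, value in zip(X, column)]
            increase += mse(model, permuted, y) - base_error
        importances.append(increase / repeats)
    return importances
```

A feature that the model ignores receives an importance of exactly zero, since shuffling it cannot change any prediction.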

Ref. [6] reviews the conceptual differences between applying XAI to classification and to regression, and also establishes novel insights and analysis in XAI for regression models. XAI for regression is demonstrated on a few practical regression problems involving image data and molecular data from atomistic simulations. An especially meaningful conclusion is that the overall benefit to the user can be ensured by extending the evaluation to consider whether an attribution of input features or a more structured explanation is more desirable.

XAI is regarded from a multimedia (image, audio, video, and text) point of view in [1], and methods are grouped by media type with the aim of providing a reference for future multi-modal applications. The need of laypeople for transparency and trust is highlighted as a reason to step away from the traditional black-box model and towards explainability. This is demonstrated in two specific case studies. However, some key issues with XAI are also outlined, such as providing identical explanations for multiple classes or the possibility of achieving the same predictions with different sets of features.

In [9], convolutional neural networks (CNN) are used to achieve explainable predictions with multivariate time series data. This is achieved with a two-stage CNN architecture, which allows the use of gradient-based techniques for creating saliency maps. Saliency maps are defined for both the time dimension and features of the data. The specific type of two-stage network utilized results in preserving the temporal and spatial dynamics of the multivariate time series throughout the complete network. Explainability consists of determining specific features responsible for a given prediction during a defined time interval, but also detecting time intervals during which the joint contribution of all features is most important for prediction.
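In [9], the saliency maps come from gradients propagated through the CNN. As a framework-free illustration of the same idea, the sketch below estimates the absolute gradient of the output with respect to every (time step, feature) cell using central finite differences instead of backpropagation, and then sums each row to obtain a per-time-step joint contribution of all features. The model is assumed to be any callable on a time × feature matrix; all names are hypothetical.

```python
from typing import Callable, List

Series = List[List[float]]  # series[t][f]: value of feature f at time step t
Model = Callable[[Series], float]


def saliency_map(model: Model, series: Series, eps: float = 1e-4) -> List[List[float]]:
    """|d output / d input| for every (time step, feature) cell via central differences."""
    saliency: List[List[float]] = []
    for t, row in enumerate(series):
        saliency_row: List[float] = []
        for f in range(len(row)):
            up = [list(r) for r in series]    # copies, so the input is not modified
            down = [list(r) for r in series]
            up[t][f] += eps
            down[t][f] -= eps
            grad = (model(up) - model(down)) / (2 * eps)
            saliency_row.append(abs(grad))
        saliency.append(saliency_row)
    return saliency


def time_importance(saliency: List[List[float]]) -> List[float]:
    """Joint contribution of all features at each time step (row sums of the map)."""
    return [sum(row) for row in saliency]
```

Per-feature importance over a chosen interval can be read from the same map by summing a column over the rows of that interval.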

3. Long Short-Term Memory Neural Networks

The non-stationary nature of financial time series makes them difficult to analyze using statistical methods [10]. LSTM neural networks have been used both for financial data prediction [11,12] and for general purchase prediction [13,14]. In experiments varying the input length [15], LSTM networks performed better with longer time ranges than other types of neural networks and statistical methods. They are particularly suitable for, and generally used with, time series that have long-term dependencies [16].
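The entry describes LSTMs only at a high level. The sketch below illustrates two pieces involved, under stated assumptions: framing a univariate purchase series as fixed-length input windows (the `window` parameter roughly corresponds to the input length varied in [15]) and the standard LSTM cell update with forget, input, and output gates. All names and the gate dictionary keys are hypothetical, weights are set by hand, and training as well as the output layer that maps the final hidden state to a predicted value are omitted; in practice a framework such as PyTorch or Keras would be used.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def make_windows(series: Sequence[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Frame a univariate series as (input window, next value) training pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    starts = range(len(series) - window)
    inputs = [list(series[s:s + window]) for s in starts]
    targets = [series[s + window] for s in starts]
    return inputs, targets


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class GateParams:
    W: Matrix  # hidden_size x input_size, applied to the input x_t
    U: Matrix  # hidden_size x hidden_size, applied to the previous hidden state
    b: Vector  # hidden_size bias


def lstm_step(x: Sequence[float], h: Vector, c: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; returns the new (hidden state, cell state)."""
    def pre_activation(gate: str) -> Vector:
        p = params[gate]
        return [a + r + bias for a, r, bias in zip(matvec(p.W, x), matvec(p.U, h), p.b)]

    f = [sigmoid(z) for z in pre_activation("forget")]       # share of old cell state kept
    i = [sigmoid(z) for z in pre_activation("input")]        # share of candidate written
    o = [sigmoid(z) for z in pre_activation("output")]       # share of cell state exposed
    g = [math.tanh(z) for z in pre_activation("candidate")]  # candidate cell content
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def lstm_forward(sequence: Sequence[Sequence[float]], params: Dict[str, GateParams],
                 hidden_size: int) -> Vector:
    """Run the cell over a whole input window and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

To feed a window of a univariate series to the cell, each value is wrapped as a one-element input vector, e.g. `lstm_forward([[v] for v in X[0]], params, hidden_size)`. The gate comments show why the architecture suits long-term dependencies: when the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged over many steps.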

In [4], an energy usage forecasting model based on LSTM neural networks and explainable artificial intelligence was proposed. In the experiments conducted, this model achieved high performance in forecasting, and the SHAP method was used to identify features that had a strong influence on the model output. The authors emphasized the expectation that the model will offer insight for policymakers and industry leaders to make more informed decisions, develop more effective strategies, and support the transition to sustainable development.
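In [4], SHAP was used to identify the features with the strongest influence on the model output. A common way to turn per-prediction SHAP values into such a ranking is the mean absolute SHAP value of each feature. The sketch below assumes a linear model with independent features, for which SHAP values have the closed form phi_i = w_i * (x_i - mean_i); for an LSTM, the per-prediction values would instead come from an approximate explainer. The function and feature names are hypothetical.

```python
from typing import List, Sequence, Tuple


def linear_shap(weights: Sequence[float], X: Sequence[Sequence[float]]) -> List[List[float]]:
    """SHAP values of a linear model, assuming independent features:
    phi[n][i] = w_i * (X[n][i] - mean of feature i over X)."""
    n_samples = len(X)
    means = [sum(row[i] for row in X) / n_samples for i in range(len(weights))]
    return [[w * (row[i] - means[i]) for i, w in enumerate(weights)] for row in X]


def rank_features(names: Sequence[str],
                  shap_values: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value, a common global importance summary."""
    n = len(shap_values)
    scores = {name: sum(abs(row[i]) for row in shap_values) / n
              for i, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For each row, the SHAP values sum to that row's prediction minus the mean prediction over X, so the ranking stays consistent with the individual explanations shown to the user.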

A visually explainable LSTM network framework focused on temporal prediction was introduced in [17]. Irregular instances that hinder the training process are highlighted throughout the entire architecture. The framework's interactive features support users in customizing and rearranging network structures. The evaluation is performed on several use cases that demonstrate features such as highlighting abnormal time series, filtering, focusing on temporal profiles, and explaining temporal contributions versus variable contributions.

4. Purchase Prediction

In the field of purchase prediction, a great deal of research is focused on forecasting the object of the next purchase, primarily in systems that recommend products to customers [18,19]. The recommendations are generally based on customer preferences, product relationships, and customer purchasing histories. A greater number of feature interactions were detected in [20] for customers that proceeded with purchases than for those that did not. These results were achieved by considering 22 decontextualized features defining customer purchasing decisions as input for a Naïve Bayes classifier and a random forest.
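The entry does not list the 22 features used in [20], so the following is only a sketch of the Naïve Bayes side of that setup, assuming categorical features and add-alpha (Laplace) smoothing. The class name is hypothetical, and the random forest is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, X: Sequence[Sequence[Hashable]], y: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(y)
        self.n_samples = len(y)
        n_features = len(X[0])
        # counts[c][j][v]: how often feature j takes value v among samples of class c
        self.counts: Dict[Hashable, List[Counter]] = defaultdict(
            lambda: [Counter() for _ in range(n_features)])
        self.values: List[set] = [set() for _ in range(n_features)]
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posteriors(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Unnormalized log P(class) + sum over j of log P(x_j | class)."""
        scores: Dict[Hashable, float] = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_samples)
            for j, v in enumerate(row):
                k = len(self.values[j])  # number of distinct values seen for feature j
                score += math.log((self.counts[c][j][v] + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)
```

For instance, after fitting on rows such as `["yes", "mobile"]` (hypothetically: item added to cart, device type) labelled `"buy"` or `"leave"`, `predict` returns the class with the largest log posterior. Working in log space avoids numerical underflow when many features are multiplied together.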

Another direction is predicting the timing of the next purchase, either jointly with the purchase target or separately [5,21]. The approach described in [22] uses customer features derived from the times and contents of earlier purchases to predict whether the customer will make a purchase within a predefined time frame, with the features recalculated each month. Gradient tree boosting was the most successful technique in this research, and the biggest challenge was distinguishing between customers who had switched to another supplier and those who simply had a gap in their transactions.
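The setup in [22] can be illustrated with a feature snapshot taken at each monthly cutoff: features are computed only from purchases before the cutoff, and the label records whether a purchase occurs within the following time frame. The specific features below (recency, frequency, and total spend, an RFM-style choice), the 30-day horizon, and all names are assumptions rather than details from [22]; the gradient tree boosting model that would consume these snapshots is not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple


@dataclass
class Purchase:
    day: date
    amount: float


def month_starts(first: date, count: int) -> List[date]:
    """First day of `count` consecutive months, starting with the month of `first`."""
    out: List[date] = []
    year, month = first.year, first.month
    for _ in range(count):
        out.append(date(year, month, 1))
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return out


def customer_snapshot(purchases: Sequence[Purchase], cutoff: date,
                      horizon_days: int = 30) -> Tuple[dict, int]:
    """Features from purchases before `cutoff`; label 1 if a purchase follows within the horizon."""
    history = [p for p in purchases if p.day < cutoff]
    future = [p for p in purchases if cutoff <= p.day < cutoff + timedelta(days=horizon_days)]
    last: Optional[date] = max((p.day for p in history), default=None)
    features = {
        "recency_days": (cutoff - last).days if last is not None else None,
        "frequency": len(history),
        "monetary": sum(p.amount for p in history),
    }
    return features, 1 if future else 0
```

Recalculating the features monthly then amounts to calling `customer_snapshot` once per cutoff returned by `month_starts`, which yields one training row per customer and month.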

Purchase confirmation emails and customer characteristics, such as age, gender, income, and location, were analyzed to build a model that predicts the next purchase date and the spending amount for each customer [23]. This consumer behavior analysis yielded the highest accuracy when combined with Bayesian network classification.

An interesting approach used in [13] relied on the collection of tweets that mentioned mobile devices and cameras for purchase prediction. The sequential nature of tweets was shown to be a very significant factor in the process of predicting a realized purchase. While an LSTM neural network had the best performance in determining which user would buy a device, a feed-forward neural network proved most successful in assigning relevance to customer purchase behavior.

A multi-task LSTM neural network model presented in [14] uses online grocery shopping transactions as input to predict the day of the week, the time of day, and the product category. Multiple network settings and feature combinations were tested, but none was the most successful in all three tasks, and the product category was the most difficult to predict.
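A multi-task model such as the one in [14] needs one label per task for each transaction and is usually trained on a combined loss. The sketch below assumes that the day of the week is taken from the transaction timestamp, that the time of day is bucketed into four 6-hour intervals, and that the total loss is a weighted sum of per-task cross-entropies computed from the outputs (logits) of separate heads on a shared network. These choices, and all names, are hypothetical rather than taken from [14].

```python
import math
from datetime import datetime
from typing import Dict, List, Mapping, Sequence


def time_of_day(hour: int) -> int:
    """Bucket an hour: 0 = night (0-5), 1 = morning (6-11), 2 = afternoon (12-17), 3 = evening (18-23)."""
    return hour // 6


def targets_from_transaction(timestamp: datetime, category_id: int) -> Dict[str, int]:
    """Class labels for the three tasks; the bucketing is an assumption."""
    return {
        "day_of_week": timestamp.weekday(),  # Monday = 0 ... Sunday = 6
        "time_of_day": time_of_day(timestamp.hour),
        "category": category_id,
    }


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(logits: Mapping[str, Sequence[float]], targets: Mapping[str, int],
                   weights: Mapping[str, float]) -> float:
    """Weighted sum of per-task cross-entropy losses from the heads of one shared network."""
    return sum(weights[task] * -math.log(softmax(logits[task])[targets[task]])
               for task in targets)
```

The task weights control how much each head influences the shared layers; with a category task that is harder to predict, as reported in [14], they are one of the settings that would typically be tuned.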

This entry is adapted from the peer-reviewed paper 10.3390/electronics12214510

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.