Credit Card Fraud Detection: History

With the rapid developments in electronic commerce and digital payment technologies, credit card transactions have increased significantly. Machine learning (ML) has been vital in analyzing customer data to detect and prevent fraud.

  • credit card
  • feature selection
  • fraud detection

1. Introduction

Over the years, electronic payments (e-payments) have become the most common payment option due to technological advancements and the development of several electronic funding methods [1]. E-payment systems are essential to the present competitive financial sector and are mostly performed using credit cards [2]. The introduction of credit cards has resulted in convenient and seamless e-payments. A recent study stated that in the second quarter of 2021, Mastercard and Visa issued 1131 million and 1156 million cards, respectively [3]. However, the rise of credit card usage globally has increased the fraud rate, affecting consumers and merchants [4]. For instance, a report stated that financial losses due to credit and debit card fraud are among the leading causes of losses in the financial sector [3]. Therefore, developing efficient credit card fraud-detection systems is necessary to reduce such losses.
Machine learning algorithms have been widely employed to detect credit card fraud [5,6,7]. Meanwhile, with the advent of big data and the Internet of Things (IoT), datasets have become enormous and very high-dimensional [8,9]. Furthermore, some features in these datasets might be redundant or only weakly related to the response variable. Using such features for machine learning could increase the complexity of the model and lead to overfitting [10]. Therefore, to handle the high-dimensionality issue, a dimensionality-reduction approach such as feature selection is necessary to obtain valuable insights and make accurate predictions [11].
Feature-selection techniques aim to identify the most important attributes needed to develop a well-performing machine learning model [12,13], improving classification performance and reducing computational complexity by removing irrelevant and redundant features. Feature-selection techniques are usually grouped into three methodological categories: filters, wrappers, and embedded methods [10,14]. The internal workings and configuration of the various feature-selection methods make them suitable for different applications. Filter methods employ attribute ranking to determine the most informative features: features that attain scores above a given threshold are selected, and those below the threshold are discarded. The selected features can then be fed as input to the learning algorithm. Filter methods differ from wrapper and embedded methods in that they do not depend on a classifier and are, therefore, independent of the classifier’s bias [15].
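As an illustration, a filter method based on information gain can be sketched in a few lines. The example below is a minimal, self-contained sketch (not the implementation used in this study): it scores each discrete feature by IG(Y; X) = H(Y) − H(Y | X) and keeps the features whose score exceeds a threshold.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

def filter_select(X, y, threshold):
    """Rank every feature by IG and keep indices scoring above the threshold."""
    scores = [information_gain(col, y) for col in zip(*X)]
    return [i for i, s in enumerate(scores) if s > threshold], scores
```

Note that the selection happens without ever training a classifier, which is exactly what makes filter methods classifier-independent.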
In contrast, wrapper methods use an ML classifier’s performance as the evaluation metric when selecting the most relevant feature set. Wrapper methods usually lead to better classification performance than filter techniques because the feature-selection procedure is optimized for the chosen classification algorithm [16,17]. Generally, wrapper methods employ a search strategy to identify candidate subsets: the classifier’s performance on the various feature subsets is measured, and the subset that yields the highest performance is selected as the most informative. Examples of wrapper-based feature-selection techniques include the Boruta algorithm, forward selection, backward elimination, and the genetic algorithm. Embedded methods select the features that enhance the model’s performance during training; the feature selection is incorporated into the learning procedure [13]. Unlike wrapper methods, embedded methods avoid the cost of training the model on many candidate subsets. Embedded methods include random forest, decision tree, gradient boosting, elastic net, and LASSO [10].
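For instance, forward selection, one of the wrapper methods listed above, can be sketched as a greedy loop that repeatedly adds the feature that most improves the classifier’s score. The example below is an illustrative sketch, not the method used in this study; it wraps a leave-one-out 1-nearest-neighbour classifier, chosen here purely to keep the code self-contained.

```python
def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    correct = 0
    for i, row in enumerate(X):
        best_label, best_dist = None, float("inf")
        for j, other in enumerate(X):
            if j == i:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in feats)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def forward_selection(X, y, score=loo_1nn_accuracy):
    """Greedy wrapper: keep adding the feature that most improves the score."""
    selected, remaining, best_score = [], list(range(len(X[0]))), 0.0
    while remaining:
        s, f = max((score(X, y, selected + [f]), f) for f in remaining)
        if s <= best_score:  # stop when no candidate improves the classifier
            break
        best_score = s
        selected.append(f)
        remaining.remove(f)
    return selected, best_score
```

Because the classifier is retrained for every candidate subset, the classifier’s training speed directly bounds how large a search the wrapper can afford, a point that motivates the choice of learning algorithm later in this study.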
Meanwhile, the genetic algorithm (GA) wrapper is an effective method for feature selection, with applications in diverse domains, including natural language processing (NLP) [18], fraud detection [19], sentiment analysis [20], and medical diagnosis [21]. This study proposes a hybrid feature-selection approach combining an information gain (IG)-based filter and a GA-based wrapper technique. The main contributions and objectives of the work include the following:
  • Firstly, the information gain technique is used for initial feature selection to rank the features in the credit card dataset; only the top-ranked features are fed into the GA wrapper to reduce the search space and enhance the classification performance.
  • Secondly, the GA wrapper is employed to select the feature subset that yields optimal classification performance, and the extreme learning machine (ELM) is employed as the learning algorithm in the GA wrapper.
  • Additionally, this study employs the geometric mean (G-mean) as the fitness function in the GA wrapper instead of the conventional accuracy criterion, ensuring that the recognition rate of the minority (fraud) samples is considered and improved.
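The G-mean referred to above is the geometric mean of sensitivity and specificity; unlike accuracy, it collapses to zero whenever a classifier ignores the minority (fraud) class. A minimal sketch, computed directly from the confusion-matrix counts:

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (fraud recall) and specificity."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

On a dataset with 98 legitimate and 2 fraudulent transactions, a model that predicts “legitimate” for everything attains 98% accuracy but a G-mean of 0, which illustrates why the G-mean is the more suitable fitness function on imbalanced data.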
The rationale behind this approach is that the initial IG-based feature selection, together with the ELM’s ability to produce promising performance while converging faster than traditional neural networks, could reduce the computational complexity of the GA and improve the classification performance. The ELM is chosen as the learning algorithm in the GA wrapper because it converges far more rapidly and achieves higher generalization performance than conventional neural networks; its learning process is thousands of times quicker than that of neural networks trained via backpropagation [22]. For convenience, the proposed hybrid approach is called IG-GAW. It is compared with the conventional ELM classifier, an ELM classifier with IG-based feature selection (IG-ELM), the GA wrapper (GAW), and well-performing methods in the related literature.

2. Credit Card Fraud Detection

Recently, ML algorithms have been widely applied for credit card fraud detection [23,24,25]. Researchers have used both traditional ML and deep learning (DL) algorithms to predict credit card fraud efficiently. For example, Alarfaj et al. [26] conducted a study using ML and DL techniques for detecting credit card fraud, while Van Belle et al. [27] employed inductive graph representation learning, Esenogho et al. [28] used a neural network ensemble, and Zhang et al. [29] employed an ensemble classifier based on isolation forest and adaptive boosting.
Some problems encountered when dealing with credit card datasets include high dimensionality and class imbalance [30,31], which make it difficult for ML classifiers to learn and make accurate predictions. In addition, high-dimensional data often make the learning process complex and computationally expensive, resulting in models with poor generalization ability [32]. Therefore, feature selection is essential for such datasets to reduce the computational burden and enhance the model’s generalization ability. For example, Chaquet-Ulldemolins et al. [33] recorded an increase in the classification performance of ML classifiers after introducing feature selection. Generally, feature-selection methods are useful in applications where the number of features affects the classifier’s performance.
Wrapper feature-selection methods have been widely applied in numerous applications [34,35]. They compute the importance of each feature based on its usefulness when training the ML model. The primary components of a wrapper method are the learning classifier and the search strategy. The wrapper technique is built around the learning classifier and uses that same classifier to select the most relevant features; therefore, a robust learning classifier could enhance wrapper-based feature selection. Furthermore, the search strategy employed in the wrapper affects the selected features, and using the right search strategy for a given application is crucial to obtaining good performance.
Evolutionary search techniques such as genetic algorithms can avoid becoming stuck in local optima. Unlike deterministic algorithms, they can identify reduced feature sets that effectively represent the original feature set [36]. The GA-based wrapper can easily identify feature redundancy and correlations. In addition, selecting a suitable classifier is vital in developing robust GA wrapper models, since the wrapper procedure is tied to the selected classifier’s performance. There are specific issues to consider when selecting the classifier. Firstly, the classifier should achieve good classification performance and have excellent generalization ability. Secondly, since the classifier is used to train numerous subsets, it should train quickly. Thirdly, the number of features in the various subsets might differ, so using the same model parameters might not be sufficient to obtain good performance across all subsets [37]. Hence, a classifier that automatically updates its model parameters for every feature subset is preferred.
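To make the GA wrapper concrete, the sketch below shows a generic GA over feature bit masks. It is an illustrative outline under simplifying assumptions (tournament selection, one-point crossover, bit-flip mutation), not the configuration used in this study; the `fitness` callback stands in for training the chosen classifier on the masked feature subset and scoring it, e.g., with the G-mean.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_p=0.8, mutation_p=0.05, seed=0):
    """Minimal GA wrapper: chromosomes are bit masks over features; the
    fitness callback scores a classifier trained on the masked subset."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: keep a copy of the fitter of two random parents.
        a, b = rng.sample(scored, 2)
        return max(a, b)[1][:]

    best, best_fit = None, float("-inf")
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]
        gen_fit, gen_best = max(scored)
        if gen_fit > best_fit:
            best_fit, best = gen_fit, gen_best[:]
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_p:          # one-point crossover
                cut = rng.randrange(1, n_features)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                  # bit-flip mutation
                offspring.append([b ^ (rng.random() < mutation_p)
                                  for b in child])
        pop = offspring[:pop_size]
    return best, best_fit
```

Every call to `fitness` implies a full training run of the wrapped classifier, which is why a fast learner such as the ELM, and a pre-pruned search space, matter so much here.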
Other recent methods for credit card fraud detection include a signal processing framework [38], signal processing on graphs [39], and a deep learning ensemble [40]. In addition, several learning algorithms (such as decision tree [41], naïve Bayes [42], SVM [43], and random forest [44]) have been used as the classifier in the GA wrapper in the literature. However, these classifiers do not address all the issues mentioned above. Therefore, a hybrid wrapper approach that considers all the above-mentioned issues is proposed. The proposed approach employs IG-based filter feature selection to rank the attributes, and only the top-ranked features are used as input to the GA wrapper. Meanwhile, the GA wrapper employs the ELM as the learning classifier. The ELM can achieve excellent classification performance and generalization ability with an extremely fast learning speed compared to conventional training methods. Furthermore, unlike traditional neural networks trained via backpropagation, the ELM’s training process is entirely automatic and does not require iterative tuning.
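The ELM’s non-iterative training described above can be illustrated in a few lines: hidden weights are drawn randomly and never tuned, and the output weights are obtained in a single least-squares solve via the Moore-Penrose pseudoinverse. This is a minimal sketch of the standard ELM formulation, not the exact model or hyperparameters used in this study.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: a random, fixed hidden layer plus
    output weights solved in closed form with the pseudoinverse."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden weights and biases are drawn once and never updated.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # One least-squares solve replaces iterative backpropagation.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0.5).astype(int)
```

Because `fit` involves no iteration, retraining on each candidate feature subset inside the GA wrapper stays cheap, which is the property the proposed approach relies on.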

This entry is adapted from the peer-reviewed paper 10.3390/app13127254
