Techniques for Credit Card Fraud Detection

Fraud is on the rise across the financial sector, and credit card fraud in particular is growing rapidly as daily credit card usage increases. The Federal Trade Commission (FTC) report underscores the severity of the issue, noting that 2021 was the most challenging year on record for identity theft. Many cases of identity theft go unreported, so the actual number may exceed the reported figures. The FTC report emphasises the need for innovative approaches to safeguard the financial well-being of both consumers and businesses. In an era of digital advancement, the escalation of credit card fraud calls for robust and efficient fraud detection systems.

  • credit card fraud detection
  • ensemble model
  • machine learning
  • data

1. Statistical Methods

Statistical approaches have been used extensively to identify credit card fraud. These methods uncover suspicious trends by analysing the statistical properties of transaction data [10], and they flag outlier transactions using thresholds or other criteria. Popular statistical methods include descriptive statistics, hypothesis testing, and time series analysis.
Descriptive statistics, such as the mean, standard deviation, and percentiles, can help uncover abnormal transactions [11]. Hypothesis testing compares genuine and fraudulent transactions by formulating null and alternative hypotheses and applying statistical tests such as t-tests or chi-square tests [12]. Time series methods such as ARIMA (AutoRegressive Integrated Moving Average) models and STL (Seasonal and Trend Decomposition using Loess) expose patterns and trends in transaction data that support fraud detection [13].
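As a minimal sketch of the descriptive-statistics idea above, the following flags transactions whose amount lies far from the mean in standard-deviation units (a z-score rule). The function name `flag_outliers`, the threshold of 3, and the transaction amounts are illustrative assumptions, not values taken from the cited studies.

```python
import statistics

def flag_outliers(amounts, z_threshold=3.0):
    """Return indices of amounts whose absolute z-score exceeds z_threshold."""
    mean = statistics.fmean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all amounts identical, nothing stands out
    return [
        i for i, amount in enumerate(amounts)
        if abs(amount - mean) / stdev > z_threshold
    ]

# Hypothetical data: 20 ordinary purchases followed by one very large one.
amounts = [20.0 + (i % 5) for i in range(20)] + [950.0]
flagged = flag_outliers(amounts)
print(flagged)  # [20]
```

A single global threshold like this is only a starting point. In practice the statistics would more plausibly be computed per cardholder or per merchant category, since a normal amount for one account can be abnormal for another.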

2. Deep Learning (DL) in Credit Card Fraud Detection

Deep learning trains multi-layered neural networks to learn hierarchical representations of data. These techniques extract complex patterns and relevant attributes from high-dimensional data, and they have revolutionised fields such as computer vision and natural language processing as well as credit card fraud detection. Deep learning algorithms include Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, Multilayer Feed Forward Neural Networks (MLFF), Artificial Neural Networks (ANNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GAN).
CNNs, which are adept at classifying images, can also extract features from temporal data, making them suitable for detecting fraud in transaction sequences. LSTM, a type of recurrent neural network, excels at analysing sequential data and capturing long-term dependencies, which allows it to identify complex fraud patterns that span multiple transactions. GANs, with their generator and discriminator networks, can synthesise realistic fraud patterns, enhancing the adaptability and robustness of fraud detection systems. These deep learning approaches have significantly improved the accuracy and efficiency of credit card fraud detection [14,15,16,17].

3. Machine Learning (ML) in Credit Card Fraud Detection

Machine learning algorithms are important in credit card fraud detection because they can learn from data, find complex patterns, and predict fraudulent transactions. They include both supervised and unsupervised learning methods. Algorithms used for Credit Card Fraud Detection (CCFD) include Logistic Regression (LR), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Trees (DT), Random Forest (RF), and Tree-Augmented Naive Bayes (TAN).
SVM separates classes using the best hyperplane [18], KNN classifies a transaction based on its k nearest neighbours [19], NB uses probabilistic learning to estimate class probabilities [20], DT builds a tree of feature-based decision rules for classification [20], RF combines many decision trees to reduce overfitting [21], and TAN enhances NB with a tree-like dependency structure that captures feature correlations [22]. These models offer diverse approaches to identifying and preventing fraudulent transactions, contributing to robust fraud detection systems. Each algorithm has advantages and disadvantages. When choosing an algorithm for an application, dataset size, feature space, processing needs, interpretability, and the nature of the fraud must be considered.
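To make one of these classifiers concrete, here is a minimal pure-Python sketch of K-Nearest Neighbors voting. The two features (a scaled amount and a scaled time of day), the training points, and the function name `knn_predict` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical scaled features: (amount, time of day). Label 1 = fraud, 0 = genuine.
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.90, 0.95), (0.85, 0.90), (0.95, 0.80)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.88, 0.90)))  # 1
print(knn_predict(train_X, train_y, (0.12, 0.18)))  # 0
```

Because the vote depends on raw distances, features are assumed to be on comparable scales. An odd `k` avoids ties when there are two classes.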
Several researchers have explored routes to improved fraud prevention and detection using machine learning. In [23], Prasad Chowdary et al. propose an ensemble technique to improve credit card fraud detection. The authors focus on optimising model parameters, improving performance measures, and integrating deep learning to fix identification errors and reduce false negatives. Decision Tree (DT), Gradient Boosting Classifier (XGBoost), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) were used in this paper. The paper compares these algorithms across multiple evaluation metrics and finds that DT performs best with a recall of 100%, while XGBoost, LR, RF, and SVM achieve 85%, 74.49%, 75.9%, and 69%, respectively. The work combines multiple classifier ensembles and rigorously assesses their performance with the aim of improving CCFD system efficiency. However, the evaluation metrics reveal the low performance of the model.
Sahithi et al. [1] developed a credit card fraud detection model in 2022. Their model used a Weighted Average Ensemble to combine Logistic Regression (LR), Random Forest (RF), K-Nearest Neighbors (KNN), Adaboost, and Bagging. The paper used the European Credit Card Company dataset. Their model reached 99% accuracy, outperforming base models such as RF Bagging (98.91%), LR (98.90%), Adaboost (97.91%), KNN (97.81%), and Bagging (95.37%). The paper shows that their ensemble model can detect credit card fraud effectively. Nevertheless, the feature selection process was not described, which hinders reproducibility.
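The general idea of a weighted average ensemble can be sketched as follows: each base model outputs a fraud probability per transaction, and the ensemble takes a weighted mean before thresholding. The weights, probabilities, and threshold below are hypothetical. The entry does not describe how Sahithi et al. chose their weights, so this is an illustration of the technique rather than a reproduction of their model.

```python
def weighted_average_ensemble(prob_lists, weights, threshold=0.5):
    """Combine per-model fraud probabilities into one score and label per transaction."""
    total_weight = sum(weights)
    n_transactions = len(prob_lists[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total_weight
        for i in range(n_transactions)
    ]
    labels = [1 if p >= threshold else 0 for p in combined]
    return combined, labels

# Hypothetical fraud probabilities from three base models for three transactions.
model_probs = [
    [0.10, 0.80, 0.40],  # e.g. a logistic regression model
    [0.20, 0.90, 0.70],  # e.g. a random forest
    [0.05, 0.60, 0.30],  # e.g. a KNN classifier
]
weights = [1.0, 2.0, 1.0]  # assumed weights, e.g. derived from validation scores

combined, labels = weighted_average_ensemble(model_probs, weights)
print(labels)  # [0, 1, 1]
```

Giving the second model twice the weight pushes the third transaction over the threshold even though two of the three models scored it below 0.5, which is the behaviour that distinguishes weighted averaging from simple majority voting.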
Also in 2022, Qaddoura et al. [24] investigated the effectiveness of several oversampling methods for credit card fraud detection: SMOTE, ADASYN, borderline1, borderline2, and SVM oversampling. The paper used Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree classifiers. The authors found that oversampling can improve model performance, although the best strategy depends on the machine learning algorithm. However, the computational overhead may limit the applicability of the model in real-life situations.
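The core step of SMOTE, the first oversampling method listed above, is to create a synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows only that step under simplifying assumptions: the function name `smote_sketch`, the feature values, and the parameters are hypothetical, and production implementations handle many more details.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class (fraud) samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3)]
synthetic = smote_sketch(minority, n_new=6)
print(len(synthetic))  # 6
```

Every synthetic point lies on a line segment between two real fraud samples, so the new samples stay inside the region the minority class already occupies instead of duplicating existing records.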
Tanouz et al. [25] extensively studied machine learning for credit card fraud classification. Decision Tree, Random Forest (RF), Logistic Regression (LR), and Naive Bayes (NB) classifiers were evaluated, with a focus on imbalanced datasets. Random Forest performed best, with an accuracy of 96.77%. Logistic Regression, Naive Bayes, and Decision Tree classifiers had accuracy scores of 95.16%, 95.16%, and 91.12%, respectively. The investigation shows that Random Forest is effective at credit card fraud detection. Nonetheless, the performance of the proposed models is hampered by the lack of feature selection.
Ruttala et al. [26] compared the Random Forest and AdaBoost algorithms for credit card fraud detection. Their analysis showed similar accuracy for the two algorithms, but Random Forest performed better than AdaBoost in terms of precision, recall, and F1-score. However, the dataset used by the authors is skewed, and the paper does not clearly state how this issue was addressed.
Sadgali et al. [27] set out to identify the most effective approaches for detecting financial fraud. Their paper covered a wide range of techniques, including Support Vector Machine (SVM), Bayesian Belief Networks, Naive Bayes, Genetic Algorithm, Multilayer Feed Forward Neural Network (MLFF), and Classification and Regression Tree (CART). As a review and evaluation of previous studies, their paper did not use a particular dataset for analysis. Their results highlighted the dominant performance of Naive Bayes, which achieved the highest accuracy at 99.02%. SVM followed closely with an accuracy of 98.8%, and the genetic algorithm reached 95%. However, the authors limited their work to insurance fraud.
Raghavan et al. [28] aimed to detect anomalies or fraudulent actions using data mining techniques. They used three distinct datasets: Australian (AU), German, and European (EU). Their work employed Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest algorithms, and also built two ensembles: one combining KNN, SVM, and a Convolutional Neural Network (CNN), and another combining KNN, SVM, and Random Forest. SVM achieved the highest accuracy at 68.57%, while Random Forest and KNN reached 64.37% and 60.47%, respectively. Their paper offers useful information on the effectiveness of various algorithms and ensemble strategies for fraud detection. However, the performance of the model was low for all the datasets used.
Saputra et al. [29] compared the effectiveness of Decision Tree, Naïve Bayes, Random Forest, and Neural Network machine learning approaches. SMOTE was used to address the class imbalance in the dataset, which was obtained from Kaggle and in which fraudulent transactions made up only 0.093% of records. Evaluation using confusion matrices showed that the Neural Network had the highest accuracy (96%), followed by Random Forest (95%), Naïve Bayes (95%), and Decision Tree (91%). SMOTE improved the average F1-Score and G-Score and helped address the skewed data. However, the dataset used in the paper does not fully represent all e-commerce platforms.
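Because fraud is so rare in such datasets, accuracy alone can be misleading, which is one reason confusion-matrix metrics such as precision, recall, and F1-score are reported. The sketch below uses hypothetical counts (10 frauds in 10,000 transactions, not figures from the cited paper) to show how a classifier that never predicts fraud still scores very high accuracy.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical: 10 frauds among 10,000 transactions.
# A model that labels everything as genuine catches no fraud at all.
always_genuine = confusion_metrics(tp=0, fp=0, fn=10, tn=9990)
# A model that catches 8 of the 10 frauds at the cost of 4 false alarms.
useful_model = confusion_metrics(tp=8, fp=4, fn=2, tn=9986)

print(always_genuine["accuracy"], always_genuine["f1"])
print(useful_model["accuracy"], useful_model["f1"])
```

Both models have accuracy above 99.9%, yet the first has an F1-score of zero. This is why the accuracy figures quoted in this section are hard to compare without knowing each dataset's class balance.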
Tiwari et al. [30] conducted a comparative analysis of credit card fraud detection methods. The authors examined SVM, ANN, Bayesian Network, K-Nearest Neighbor (KNN), Hidden Markov Model (HMM), Fuzzy Logic-Based System, and Decision Trees. Analysis on the standard KDD CUP 99 Intrusion Dataset showed differing accuracy levels across approaches: SVM 94.65%, ANN 99.71%, Bayesian Network 97.52%, KNN 97.15%, HMM 95.2%, Fuzzy Logic-Based System 97.93%, and Decision Trees 94.7%. However, the dataset did not fully depict financial activities.
Naik et al. [31] evaluated and compared several machine learning algorithms for Credit Card Fraud Detection (CCFD), including Naïve Bayes, J48, Logistic Regression, and AdaBoost. Their approach used an online dataset of 1000 records containing both fraudulent and non-fraudulent transactions. Logistic Regression and AdaBoost both reached an accuracy of 100%, while Naïve Bayes and J48 achieved 83% and 69.93%, respectively. The findings highlight how differently algorithms can behave on the same credit card fraud detection task. Nevertheless, the dataset was limited to 1000 credit card transaction records, which is not representative of the credit card user population.
Karthik et al. [9] introduced a novel model for credit card fraud detection that combines ensemble learning techniques such as boosting and bagging. The model incorporates the key characteristics of both techniques to obtain a hybrid model of bagging and boosting ensemble classifiers. The authors employed Adaboost for feature engineering of the behavioural feature space. The model’s predictive performance was analysed using the area under the precision-recall (AUPR) curve, showing marginal improvement in the range of 58.03–69.97% and 54.66–69.40% on the Brazilian bank dataset and UCSD-FICO dataset, respectively. Nevertheless, the paper did not provide an in-depth analysis of the computational complexity or resource requirements of the proposed model.
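For readers unfamiliar with the AUPR metric, one common step-wise estimate of the area under the precision-recall curve is average precision: rank transactions by predicted fraud score and average the precision observed at each true fraud. The sketch below is an illustration with hypothetical scores and labels. The entry does not state how Karthik et al. computed AUPR, so this should not be read as their procedure.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision at the rank of each positive label."""
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("need at least one positive (fraud) label")
    # Rank transactions from highest to lowest predicted fraud score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    return precision_sum / n_positive

# Hypothetical scores for five transactions; label 1 marks actual fraud.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
ap = average_precision(scores, labels)
print(round(ap, 4))  # 0.8333
```

Unlike accuracy, this measure ignores true negatives entirely, so it is not inflated by the large number of genuine transactions. Tied scores are not treated specially in this sketch.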
Similarly, Forough et al. [8] proposed an ensemble model based on the sequential modelling of data using deep recurrent neural networks and a novel voting mechanism based on an artificial neural network to detect fraudulent actions. The proposed model uses several recurrent networks, either LSTM or GRU, as base classifiers and aggregates their outputs using a feed-forward neural network (FFNN) as the voting mechanism. The GRU-based ensemble achieves its best results with two base classifiers on both the European cards dataset and the Brazilian dataset. It outperforms the solo GRU model in all metrics and the baseline ensemble model in most metrics. However, the authors did not discuss the limitations of the proposed ensemble model.
Esenogho et al. [32] proposed an efficient approach for credit card fraud detection using a neural network ensemble classifier and a hybrid data resampling method. The ensemble classifier was obtained using a long short-term memory (LSTM) neural network as the base learner in the adaptive boosting (AdaBoost) technique. The hybrid resampling technique is the synthetic minority oversampling technique combined with edited nearest neighbour (SMOTE-ENN). SMOTE is an oversampling technique that balances the class distribution by adding synthetic samples to the minority class, while ENN is an under-sampling method that removes some majority class samples. SMOTE-ENN performs both oversampling and under-sampling to obtain a balanced dataset. However, the authors did not explore the impact of different hyperparameter settings or variations in the neural network architecture on the performance of the proposed method.
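The under-sampling half of this resampling scheme can be sketched as an edited-nearest-neighbour style cleaning pass: a majority-class sample is dropped when most of its nearest neighbours belong to another class. The code below is a simplified pure-Python illustration with hypothetical data and names. It follows the description above in editing only the majority class and is not the authors' implementation.

```python
import math
from collections import Counter

def enn_clean(X, y, majority_label=0, k=3):
    """Drop majority-class samples whose k nearest neighbours mostly disagree with them."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == majority_label:
            # Indices of the k nearest other samples, by Euclidean distance.
            neighbours = sorted(
                (j for j in range(len(X)) if j != i),
                key=lambda j: math.dist(X[j], xi),
            )[:k]
            votes = Counter(y[j] for j in neighbours)
            if votes.most_common(1)[0][0] != yi:
                continue  # sample sits among the other class: remove it
        keep_X.append(xi)
        keep_y.append(yi)
    return keep_X, keep_y

# Hypothetical data: genuine (0) cluster near the origin, fraud (1) cluster near (1, 1),
# plus one genuine sample lying inside the fraud cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
     (1.05, 1.05)]
y = [0, 0, 0, 0, 1, 1, 1, 0]

clean_X, clean_y = enn_clean(X, y)
print(len(clean_X))  # 7
```

In a SMOTE-ENN pipeline this cleaning would run after the oversampling step, removing majority samples that sit in regions now dominated by the minority class and so sharpening the class boundary.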
Table 1 presents a summary of ensemble machine-learning models used for credit card fraud detection.
Table 1. Comparison of ML Techniques Used in Credit Card Fraud Detection Research.

This entry is adapted from the peer-reviewed paper 10.3390/bdcc8010006
