Customer Churn: Comparison
Please note this is a comparison between Version 1 by Milan Mirkovic and Version 2 by Rita Xu.

Customer churn is a problem virtually all companies face, and the ability to predict it reliably can be a cornerstone for successful retention campaigns. 

  • churn prediction
  • machine learning
  • B2B
  • predictive analytics

1. Background

Companies across virtually all industries have long recognized the importance of keeping their customers engaged and active, as that directly translates into more revenue and reduces overall costs, especially given that it can be several times more expensive to attract a new customer than to retain an existing one [1]. However, since customers tend to explore different offers on the market and are always on the lookout for better deals, understanding when they are about to stop transacting with a company is paramount for formulating effective and efficient strategies to persuade them otherwise. The phenomenon of a customer ceasing to make purchases from a company (that is, when they stop buying products or paying for the services a company offers) is known as customer churn, and the ability to predict it accurately can have significant implications for different processes across the organization (e.g., marketing, sales, procurement), as well as for overall profitability [2]. However, even though identifying customers who are at risk of leaving is recognized as one of the key prerequisites for devising retention activities [3], there are many complexities in just defining churn, which stem from the numerous contexts and business models that exist across organizations operating in distinct domains and environments [4]. For example, companies leveraging contractual business models (such as those offering subscriptions to services or products) might be able to directly observe customer churn (when a subscription expires and is not renewed or is terminated by a customer), but need to decide whether to take into account all subscriptions a customer might have (total or complete churn) or just those pertinent to particular groups of services or products (partial churn) [5].
Companies operating in non-contractual environments (such as retail or wholesale) have an even more difficult task, since churn cannot be observed explicitly: customer purchasing frequencies or payments are not known in advance, and customers are free to transact with the company whenever they wish. This implies that one of the biggest challenges faced by organizations relying on this business model is to determine a meaningful time period to use for defining a customer as lost (e.g., if no purchases are made in three consecutive months, then a customer is considered a churner), as this definition will affect all further modeling efforts and classification results [6]. It is also one of the main reasons for the disproportion that can be observed when the number of studies focusing on contractual business settings is compared to the number of those exploring cases where formal contracts between a company and its customers do not exist (i.e., non-contractual business settings) [7].
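The inactivity-based definition described above can be illustrated with a minimal sketch. It assumes purchases are available as (customer, date) pairs; the function names, the sample data, and the whole-calendar-month convention are illustrative assumptions rather than part of any cited study.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not yet complete
    return months


def label_churners(purchases, as_of, inactivity_months=3):
    """Map each customer to True (churner) or False (still active).

    A customer counts as a churner when at least `inactivity_months`
    whole months have passed since their last purchase on or before `as_of`.
    """
    last_purchase = {}
    for customer, day in purchases:
        if day > as_of:
            continue  # purchases after the reference date are not observable
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
    return {
        customer: months_between(day, as_of) >= inactivity_months
        for customer, day in last_purchase.items()
    }


# Hypothetical purchase log, for illustration only.
purchases = [
    ("A", date(2021, 1, 15)),
    ("A", date(2021, 5, 20)),
    ("B", date(2021, 2, 3)),
    ("C", date(2021, 3, 30)),
]
labels = label_churners(purchases, as_of=date(2021, 6, 30))
```

Changing `inactivity_months` changes which customers are labeled as churners, which is why the choice of this period affects all subsequent modeling.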
These complexities are further compounded by the fact that customer characteristics and behavior can vary quite substantially depending on whether a company is operating in a business-to-business (B2B) or a business-to-consumer (B2C) domain [8], which needs to be taken into account when devising churn prediction models and retention strategies. B2B companies usually have fewer customers that make larger and more frequent purchases compared to their B2C counterparts [9], so retaining even a single customer in this context can make a significant difference to the financial bottom line of a company [10]. This is at odds with findings that B2B companies have traditionally struggled with data gathering and analysis [11] and that they have exhibited inertia when it comes to utilizing modern customer relationship analytics that leverage ‘big data’ [12]. However, changes in macro trends such as globalization of markets, rapid adoption of modern Information and Communication Technologies (ICT) for e-commerce [13], and a shift from the ‘contractual-relationship dominant’ paradigm [14] in the B2B domain have caused an increase in efforts to adapt to the new environment [15] and apply knowledge and good practices demonstrated to yield tangible results in identifying customers at risk of leaving. Most notably, the feasibility of approaches to customer relationship analytics commonly leveraged in the B2C domain (which has received significantly more attention where predictive churn modeling is concerned [16]) has been explored [17], indicating that some could be effectively used in the B2B context as well. Such efforts are attracting increasing interest from both academia and industry, but there is still a notable lack of studies where the results of field experiments with real-world data are reported.

2. Customer Churn Prediction

Customer churn prediction modeling has often been the focus of researchers, as evidenced by numerous studies published on this topic. Particularly well-explored are the contractual business settings in the B2C domain, such as those commonly encountered in the telecommunications [18][19][20], banking [21][22], and insurance [23][24] sectors, where customers at risk of terminating or not renewing their contracts are identified and targeted with retention campaigns in efforts to persuade them otherwise. Non-contractual settings have also often been studied, with efforts directed towards predicting which retail customers are least likely to make a purchase in the future [25][26], which users are most at risk of quitting mobile games [6], or which passengers are not planning to use a particular airline for their future flights [27]. The B2B domain, on the other hand, has received less attention so far. Within the contractual settings in this domain, approaches have been proposed to identify business clients who are likely to close all contracts with a financial service provider [28], to identify business customers who are least likely to renew a subscription to a software service [29][30][31], or to estimate the probability of corporate users switching to a different B2B telecommunications service provider given a set of incentives [32]. Non-contractual B2B settings have started receiving more interest fairly recently, with efforts being made to help companies identify customers at risk of leaving.
However, even though some general guidelines in terms of the most promising approaches to the problem can be inferred from relevant studies, it may be difficult for practitioners to decide which approach (or combination of approaches) to use, as there is significant variability in the methods used to create models (distinct algorithms and hyperparameter values), the data sources leveraged (spanning transactional, CRM, quality-of-service, and e-commerce systems), the characteristics of raw datasets (in terms of the time span they cover, number of customers, and churn rates), and the approaches to deriving features. This variability is illustrated in Table 1, which provides an overview of relevant studies with respect to:
  • Raw data characteristics (domain they come from, time period they span, number of customers included, and churn rates);
  • Source systems the data were extracted from (transactional, quality-of-service (QoS), Customer Relationship Management (CRM), and web data);
  • Churn definitions used (single or multiple);
  • Types of features extracted (L: length, R: recency, F: frequency, M: monetary, P: profit);
  • Type of feature extraction window considered (fixed or variable);
  • Approach to creating the training dataset (single-slicing or multi-slicing).
Table 1. Relevant studies overview.
| Study | Chen et al. [33] | Schaeffer et al. [34] | Gordini et al. [9] | Gattermann-Itschert et al. [35] | Jahromi et al. [12] | Janssens et al. [36] | This Study |
|---|---|---|---|---|---|---|---|
| Domain | Logistics | Logistics | Wholesale (fast moving consumer goods) | Wholesale (fast moving consumer goods) | Retailer (fast moving consumer goods) | Retailer (beverages) | Wholesale (agricultural goods) |
| Dataset span | 29 months | 40 months | 12 months | 30 months | 12 months | 31 months | 38 months |
| # of Customers | 69,170 | 1968 | 80,000 | 5000 | 11,021 | 41,739 | 3470 |
| Churn definitions | 1 month | 3, 7 months | 12 months | 3 months | 6 months | 12 months | 1, 2, 3 months |
| Churn rates | 2% | 4–19% | 10% | 7–15% | 28% | 4% | 5–38% |
| Data sources | Transactions, QoS | Transactions | Transactions, QoS, web data | Transactions, QoS, CRM | Transactions | Transactions, QoS, CRM | Transactions |
| Features extracted | LRFMP | F | LRFM, QoS, platform usage | LRFM, QoS | RFM | LRFM | LRFM |
| Feature window | Fixed | Variable | Fixed | Fixed | Fixed | Fixed | Variable |
| Training set creation | Single-slicing | Single-slicing | Single-slicing | Multi-slicing | Single-slicing | Single-slicing | Multi-slicing |
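To make the LRFM feature types listed above concrete, the following is a minimal sketch of how they might be derived from a transaction log for a single feature window. The exact definitions differ between studies; here length is assumed to be the number of days since the first purchase, recency the number of days since the last purchase, frequency the number of purchases, and monetary the total amount spent. The function name and sample data are hypothetical.

```python
from datetime import date


def lrfm_features(transactions, window_end):
    """Compute per-customer LRFM features up to `window_end`.

    `transactions` is an iterable of (customer_id, purchase_date, amount).
    The feature definitions used here are assumptions for illustration.
    """
    per_customer = {}
    for customer, day, amount in transactions:
        if day > window_end:
            continue  # ignore anything after the feature window
        per_customer.setdefault(customer, []).append((day, amount))

    features = {}
    for customer, rows in per_customer.items():
        days = [day for day, _ in rows]
        features[customer] = {
            "length": (window_end - min(days)).days,   # tenure in days
            "recency": (window_end - max(days)).days,  # days since last purchase
            "frequency": len(rows),                    # number of purchases
            "monetary": sum(amount for _, amount in rows),
        }
    return features


# Hypothetical transactions, for illustration only.
transactions = [
    ("A", date(2021, 1, 1), 100.0),
    ("A", date(2021, 3, 1), 50.0),
    ("B", date(2021, 2, 15), 20.0),
    ("B", date(2021, 4, 10), 999.0),  # falls after the window and is ignored
]
features = lrfm_features(transactions, window_end=date(2021, 3, 31))
```

A fixed feature window corresponds to calling such a function once with a single `window_end`, whereas a variable window would vary how far back the transactions are considered.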
Chen et al. [33] examined the importance of length, recency, frequency, monetary, and profit (LRFMP) variables for predicting churn in the case of one of the largest logistics companies in Taiwan. The company defines lost business customers (i.e., churners) as those who did not engage in any transactions in the past month. The dataset (after applying business-domain knowledge and relevant filtering) comprised 69,170 business customers, among which 1321 were churners. The authors applied binary classification techniques common in the domain, namely Decision Tree (DT), feed-forward Multi-Layer Perceptron neural network (MLP), Support Vector Machines (SVM), and Logistic Regression (LR), to assess their effectiveness in predicting churn. Their experiment showed that the DT model achieved superior results compared to the other models on all reported measures (accuracy, precision, recall, and F1), and they reported that the three most influential predictors were recency of purchase, length of the relationship (i.e., tenure), and the monetary indicator (i.e., amount spent).
Schaeffer et al. [34] considered the case of a Mexican company that sells parcel delivery as a prepaid service to business clients. Clients are able to purchase the desired number of delivery units from the company at any point in time and then consume them at their discretion, thus making this a non-contractual B2B scenario. The authors experimented with different definitions of churn (i.e., inactivity of customers in consecutive future time periods) and used inventory level-based (i.e., amount of services available) time series of varying lengths to derive features that were fed to selected machine learning algorithms in order to predict whether a client will be active or not. In particular, the authors extracted trend and level, magnitude, auto-correlations, and Fourier coefficients (as derived by fast Fourier transform) and used them as features.
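As a rough illustration of this kind of feature extraction, the sketch below derives a level, a linear trend, a lagged auto-correlation, and Fourier coefficient magnitudes from a short series. It is not the authors' implementation: the function names and the sample series are hypothetical, the definitions are common textbook ones, and a naive discrete Fourier transform is used in place of a fast Fourier transform (both produce the same coefficients).

```python
import cmath
from statistics import mean


def level_and_trend(series):
    """Mean level and least-squares slope against the time index (needs >= 2 points)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return y_mean, cov / var


def autocorrelation(series, lag):
    """Sample auto-correlation of the series at the given lag."""
    y_mean = mean(series)
    denom = sum((y - y_mean) ** 2 for y in series)
    if denom == 0:
        return 0.0  # a constant series has no variation to correlate
    num = sum(
        (series[i] - y_mean) * (series[i + lag] - y_mean)
        for i in range(len(series) - lag)
    )
    return num / denom


def fourier_magnitudes(series, k):
    """Magnitudes of the first k non-constant DFT coefficients (naive O(n^2) DFT)."""
    n = len(series)
    return [
        abs(sum(y * cmath.exp(-2j * cmath.pi * f * t / n) for t, y in enumerate(series)))
        for f in range(1, k + 1)
    ]


# Hypothetical inventory-level series of one client, for illustration only.
inventory = [10, 8, 6, 4, 2, 0]
level, trend = level_and_trend(inventory)
feature_vector = [level, trend, autocorrelation(inventory, 1)] + fourier_magnitudes(inventory, 2)
```

The resulting `feature_vector` is the kind of fixed-length input that a classifier can consume regardless of the original series length.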
The dataset comprised transactions made by 1968 clients who ordered and consumed services over a period of just over three years (between January 2014 and April 2017), among which, depending on the churn definition used, there were between 56 and 346 churners. The authors reported that Random Forest (RF) outperforms SVM, AdaBoost, and k-Nearest Neighbors (kNN) classifiers for the majority of time series lengths and churn definitions used when evaluated on specificity, but that SVM also performs acceptably over the majority of combinations when balanced accuracy is considered.
Gordini et al. [9] proposed a novel parameter-selection approach for an established classification technique (SVM), which they used to create a predictive churn model that was subsequently tested on real-world data obtained from a major Italian online fast moving consumer goods company. The dataset was derived from the activities of clients on a B2B e-commerce website (as well as the customer-level information provided by the company) and comprised 80,000 business customers, with their transactional records spanning the period from September 2013 to September 2014. According to company business rules, customers who do not make a purchase within one year are considered churners and labeled accordingly in the dataset. While the training set contained an equal percentage of churners and non-churners, the test set was imbalanced and contained 10% churners and 90% non-churners (both sets comprised 40,000 customers). The authors proposed the area under the receiver operating characteristic curve (AUC) as the metric on which to optimize model parameters (during cross-validation in the training phase) and reported that such an approach outperforms the commonly used accuracy measure when evaluated on the number of correctly classified churners.
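For reference, AUC and top-decile lift (TDL), two ranking metrics commonly reported in churn studies, can be computed directly from churn labels and model scores. The sketch below uses the standard pairwise definition of AUC and defines TDL as the churn rate among the 10% highest-scored customers divided by the overall churn rate. The labels and scores are made up for illustration and do not come from any cited study.

```python
def auc(labels, scores):
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_decile_lift(labels, scores):
    """Churn rate in the top 10% of scores divided by the overall churn rate."""
    ranked = [
        y for _, y in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ]
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(ranked[:top_n]) / top_n
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Hypothetical test set: 20 customers, 4 of them churners (label 1),
# listed in descending order of predicted churn score.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [(20 - i) / 20 for i in range(20)]

model_auc = auc(labels, scores)
model_tdl = top_decile_lift(labels, scores)
```

Unlike accuracy, neither metric depends on a classification threshold, which makes them less sensitive to class imbalance of the kind present in most churn datasets.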
In terms of performance when compared to LR and MLP, this approach also yielded higher AUC and top-decile lift (TDL) when evaluated on the test set (holdout sample). Finally, the authors reported that recency of the latest purchase, frequency of purchases, and the length of relationship (i.e., tenure) are the most important variables for successfully identifying churners.
Particularly relevant for the work presented in this paper is a recent study by Gattermann-Itschert and Thonemann [35], who demonstrated that the multi-slicing approach to creating the training dataset and testing on out-of-period data leads to superior churn prediction models when compared to the traditionally used single-slicing approach and testing on out-of-sample data. The authors obtained transactional data (invoicing, delivery, and CRM) from one of Europe’s largest convenience wholesalers selling goods (such as beverages, tobacco, food, and other essential supplies) to smaller retailers. The dataset comprised around 5000 active customers and spanned a period of 2.5 years (from January 2017 to June 2019). Then, instead of deriving features and churn labels only for the customers active in the fixed (i.e., most recent) observation period, they repeatedly shifted the origin of observation by one month backwards in time, thus yielding multiple snapshots of customer behavior (and corresponding labels) that they used for training predictive models. This approach is quite similar to the one presented by Mirkovic et al. in [37]. The churn definition used was three consecutive months of inactivity (i.e., no purchases made during that period by a customer), and the reported churn rate fluctuated around 10% but exhibited seasonality (ranging from around 7% to 15%).
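The multi-slicing idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: purchases are reduced to (customer, month index) pairs, the only feature is purchase frequency in the observation window, and the function names, window lengths, and data are hypothetical.

```python
def build_slice(purchases, origin, feature_months, churn_months):
    """Build one training slice for a given observation origin.

    Features come from months [origin - feature_months, origin) and the
    churn label from months [origin, origin + churn_months).
    """
    observed, future = {}, set()
    for customer, month in purchases:
        if origin - feature_months <= month < origin:
            observed[customer] = observed.get(customer, 0) + 1
        elif origin <= month < origin + churn_months:
            future.add(customer)
    # Only customers active in the observation window get a row.
    return [
        {"customer": c, "origin": origin, "frequency": n, "churned": c not in future}
        for c, n in sorted(observed.items())
    ]


def multi_slice(purchases, latest_origin, n_slices, feature_months=6, churn_months=3):
    """Stack slices whose origin is shifted back one month at a time."""
    rows = []
    for shift in range(n_slices):
        rows.extend(build_slice(purchases, latest_origin - shift, feature_months, churn_months))
    return rows


# Hypothetical data: customer A buys every month, customer B stops after month 7.
purchases = [("A", m) for m in range(1, 13)] + [("B", 2), ("B", 4), ("B", 7)]

single = multi_slice(purchases, latest_origin=10, n_slices=1)  # single-slicing
multi = multi_slice(purchases, latest_origin=10, n_slices=4)   # multi-slicing
```

With four slices, each customer contributes several rows, and customer B appears both as a non-churner (earliest origin) and as a churner (later origins). To evaluate on out-of-period data, the test slice would need an origin later than every label window used for training.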
The authors hypothesized that using multi-slicing would yield more robust and accurate models: since the behavior of customers changes over time, this approach reduces the chances of overfitting (to which models trained on a single slice of data might be more susceptible). Experimental results confirmed this, and the authors reported that both the increased sample size and training on observations from different time slices enhance the predictive performance of classifiers. In particular, LR, SVM, and RF were compared, with recursive feature elimination (RFE) and hyperparameter tuning (grid search) applied for each classification method. RF exhibited the best performance, showing a significantly higher AUC score than the other two classifiers and a significantly higher TDL than LR.
Jahromi et al. [12] proposed a method for maximizing the total profit of a retention campaign and determining the optimum number of customers to contact within it. They calculated the potential profit to be made at a customer level, provided that the customer responds favorably to an offer within the retention campaign and maintains average spending levels in the prediction period, which they then used as a sorting criterion for creating lists of customers to offer incentives to. An integral part of that calculation is the probability of a customer becoming a churner, which is obtained via predictive churn models devised using DT and LR classifiers (in the case of DT, they considered simple, cost-sensitive, and boosted variants). Two other important components of the calculation are the probability that a customer accepts the offer (which is kept constant across the entire customer base at 30%) and the magnitude of the incentive (the authors operate within a scenario where a 5% discount is offered).
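A profit-based ranking of this kind can be sketched as below. The formula is a deliberately simplified assumption for illustration and is not the calculation used in [12]: a would-be churner who accepts the offer is assumed to keep spending at the average level minus the discount, while every non-churner is assumed to receive the discount on spending that would have happened anyway. Only the 30% acceptance probability and the 5% discount are taken from the description above; the customer data and function names are hypothetical.

```python
def expected_profit(p_churn, avg_spend, p_accept=0.30, discount=0.05):
    """Simplified expected profit of targeting one customer (assumed formula)."""
    # Revenue retained from a would-be churner who accepts the offer.
    retained = p_churn * p_accept * avg_spend * (1 - discount)
    # Discount given away to a customer who would have stayed anyway.
    wasted = (1 - p_churn) * avg_spend * discount
    return retained - wasted


def rank_for_campaign(customers):
    """Sort customers by expected profit, highest first.

    `customers` is an iterable of (customer_id, churn_probability, avg_spend).
    """
    scored = [(cid, expected_profit(p, spend)) for cid, p, spend in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical customers: (id, predicted churn probability, average spend).
customers = [
    ("A", 0.8, 1000.0),
    ("B", 0.1, 5000.0),
    ("C", 0.5, 400.0),
]
ranked = rank_for_campaign(customers)
# Contact only customers for whom the campaign is expected to pay off.
selected = [cid for cid, profit in ranked if profit > 0]
```

Under these assumptions, a high-spending customer with a low churn probability (B) is not worth targeting, since the discount given away outweighs the revenue expected to be saved.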
They then tested the proposed approach on a real-world dataset of 11,021 B2B customers of a major Australian online fast moving consumer goods retailer who made transactions within the span of one calendar year. Churn is defined as inactivity (no purchases made) in 6 consecutive months, with a reported churn rate of 28%. The authors reported that the boosting approach outperforms LR and the simple and cost-sensitive DT variants, and that using this method for sorting and selecting potential churners can lead to significant business effects. They also identified recency and frequency as the most important predictors of churn.
Most recently, Janssens et al. [36] proposed a novel measure, called EMPB (Expected Maximum Profit measure for B2B customer churn), that can be used to increase the profitability of retention campaigns. Unlike in [12], where all customers are treated as equals, the authors of this study took into account the variability in the customer base (i.e., high-value vs. low-value customers), which they showed can be leveraged to create retention campaigns that maximize expected profits. They compared the performance of customer churn prediction models devised with respect to the proposed measure and concluded that it can yield considerable and measurable business gains compared to traditionally used metrics such as AUC. They used a dataset obtained from a large North American beverage retailer, comprising purchases of 41,739 B2B customers spanning 12 months, of which roughly 4% are churners, to create predictive models using algorithms such as XGBoost, ProfLogit, ProfTree, RF, and LASSO regression that leverage this measure to recommend customers to be included in retention campaigns to maximize profits. The most important features that the authors identified were monetary value and recency, as well as purchase quantity and the average difference in days with respect to the due date for handling reported issues (QoS).