Click-Through Rate (CTR) prediction is a significant subject in e-commerce for both academia and industry. In order to accurately predict a customer's click intent, it is necessary to create a personalized customer representation. Learning such customer representations directly from data is the current state of the art.
1. Introduction
Recently, large deep learning models have dominated various domains such as natural language processing (NLP) and computer vision (CV) in academia and industry. Since the introduction of the transformer model
[1] in 2017, such models have repeatedly achieved state-of-the-art results. Recent examples like ChatGPT (
https://openai.com/blog/chatgpt/ (accessed on 26 September 2023)), GPT-3
[2], or Dall-E
[3] show what such deep models are capable of. A similar trend can be observed in e-commerce, especially with recommender models like “Wide & Deep”
[4] or Bert4Rec
[5] and Click-Through Rate (CTR) prediction (CTR-P) models like “Deep & Cross”
[6].
In recent years, CTR-P has become a core task in online advertising (also called ads)
[7][8]. This is mainly because search engines, and especially recommender systems, are playing a significant role in e-commerce businesses
[9][10][11][12]. Furthermore, predicting CTR accurately leads to a better user experience, which has been shown to have a great impact on business effectiveness
[8][13]. Additionally, CTR is a key performance indicator for online ads; its prediction therefore influences the ranking and pricing of online ads and the revenue of sponsored search
[13][14][15]. Although there is a huge amount of data in the e-commerce sector, customer behavior, unlike natural language or images with their recurring patterns, is subject to constant change, as it depends heavily on factors such as season, inflation, and local as well as global developments. In addition, the data are typically use case- and user-specific and therefore cannot easily be shared across organizations. These two reasons raise the question of the extent to which deep and wide models are suitable in the context of e-commerce. Another aspect is that deep learning models require a considerable amount of computing resources, an ever-growing concern in light of rising energy costs. Furthermore, companies have limited resources and need to plan them accordingly
[16]. Consequently, in the e-commerce sector, companies should ideally spend their resources only on reactive customers, e.g., display recommendations only to those customers who are most likely to click on them. Lastly, advertising and recommendations can lead to negative experiences for certain customers, resulting in negative attitudes towards the operating company: shorter visit durations, fewer visits, fewer referral opportunities, and increased negative word-of-mouth. It is therefore crucial to display advertising and recommendations only when success is probable, which makes it important for a business to understand its customers' intentions and engage them with personalized targeting.
2. Approaching Click-Through Rate Prediction
CTR-P has received a lot of attention in industry and academia in recent years. It is approached as a binary classification problem: the probability of an item click is to be predicted regardless of the use case, e.g., a retrieved item in a search, a clicked ad, or a clicked product. In the literature, there is not one CTR-P use case but multiple kinds. For example, Chen et al.
[17], Ge et al.
[18], and Fan et al.
[9] propose a CTR-P model to optimize the retrieved items of a search engine. Others predict the CTR for shown ads
[6][19] or products in general
[10][20].
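Framing CTR-P as binary classification can be made concrete with a minimal sketch. The following is not taken from any of the surveyed papers; it shows logistic regression over hashed categorical features, with illustrative field names ("user_id", "item_id") and toy data:

```python
import math

DIM = 2 ** 12  # size of the hashed feature space (illustrative)

def features(sample):
    """Hash categorical fields into sparse feature indices."""
    return [hash(f"{k}={v}") % DIM for k, v in sample.items()]

def predict(weights, sample):
    """Predicted click probability via the logistic function."""
    z = sum(weights[i] for i in features(sample))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=10, lr=0.1):
    """Stochastic gradient descent on the log loss."""
    w = [0.0] * DIM
    for _ in range(epochs):
        for sample, clicked in data:
            err = predict(w, sample) - clicked  # gradient of log loss w.r.t. z
            for i in features(sample):
                w[i] -= lr * err
    return w

# Toy click log: (input features, clicked yes/no)
data = [({"user_id": "u1", "item_id": "shoes"}, 1),
        ({"user_id": "u1", "item_id": "books"}, 0),
        ({"user_id": "u2", "item_id": "shoes"}, 1)]
w = train(data)
prob = predict(w, {"user_id": "u2", "item_id": "shoes"})
```

The deep models surveyed below replace the linear scoring function with neural networks, but the binary classification framing and log-loss objective stay the same.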
Table 1 presents a comprehensive overview of state-of-the-art CTR-P approaches, including information on authorship, publication year, proposed approach, and the datasets used. All approaches are based on deep neural networks, i.e., mixtures and ensembles of multi-layer perceptrons, recurrent layers, and attention layers intended to capture customers' behavioral information. Furthermore, all models contain an embedding input layer to embed the available information, which is usually given by the use case and/or selected by the data engineers. Typical inputs are the user id, the target item id, additional user information, and additional target item information. The DIN
[21], DIEN
[10], TIEN
[22], and MARN
[8] approaches use sequential activity information, on which Alves Gomes et al. also rely in their approach. They propose a decoupled approach consisting of an activity embedding that learns historical customer behavior from context in a self-supervised manner, and an LSTM that learns to predict whether the customer will click on a product or recommendation based on the embedded behavior. CTR approaches are evaluated on different datasets; some publications rely only on closed data
[13][18][23][24][25] which are not included in
Table 1. Others, as shown in
Table 1, use openly available datasets to evaluate their approach. Across the reviewed publications, the Amazon Review dataset is the most frequently used.
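The sequence-based approaches above share a common pattern: embed each activity (e.g., an item id) and feed the embedded sequence into a recurrent network whose final state scores the click probability. A minimal, framework-free sketch of that forward pass follows; all names, dimensions, and the simple tanh cell (a real LSTM adds gating) are illustrative assumptions, not any paper's exact architecture:

```python
import math
import random

random.seed(0)
EMB, HID = 8, 8  # embedding and hidden dimensions (illustrative)

# Embedding table: one learnable vector per item id (toy vocabulary).
vocab = ["shoes", "books", "shirt", "phone"]
emb = {item: [random.uniform(-0.1, 0.1) for _ in range(EMB)] for item in vocab}

# Recurrent cell parameters: h_t = tanh(W x_t + U h_{t-1}).
W = [[random.uniform(-0.1, 0.1) for _ in range(EMB)] for _ in range(HID)]
U = [[random.uniform(-0.1, 0.1) for _ in range(HID)] for _ in range(HID)]
out = [random.uniform(-0.1, 0.1) for _ in range(HID)]  # output scoring vector

def step(x, h):
    """One recurrent update over an embedded activity."""
    return [math.tanh(sum(W[i][j] * x[j] for j in range(EMB)) +
                      sum(U[i][j] * h[j] for j in range(HID)))
            for i in range(HID)]

def click_probability(session):
    """Encode a sequence of item interactions; score the final hidden state."""
    h = [0.0] * HID
    for item in session:
        h = step(emb[item], h)
    z = sum(out[i] * h[i] for i in range(HID))
    return 1.0 / (1.0 + math.exp(-z))

p = click_probability(["shoes", "shirt", "phone"])
```

In practice, the embedding table and recurrent weights are trained jointly (end-to-end) or, as in the decoupled approach above, the embedding is pre-trained separately before the recurrent classifier is fitted.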
Table 1. Overview of publications proposing CTR-P approaches with information on the datasets, evaluation metrics, and scores used.
| Author | Year | Approach | Dataset | AUC | F1 | Logloss |
|---|---|---|---|---|---|---|
| Fan et al. [9] | 2022 | RACP | Avito | 0.794 | | |
| | | | Taobao (closed) | 0.7623 | | |
| C. Li et al. [20] | 2021 | Mul-AN | Criteo | 0.8 | | 0.483 |
| | | | MovieLens-100k | 0.847 | | 0.395 |
| X. Li et al. [8] | 2020 | MARN | Amazon Review Electro | 0.803 | | |
| | | | Amazon Review Clothing | 0.791 | | |
| | | | Taobao (closed) | 0.749 | | |
| X. Li et al. [22] | 2020 | TIEN | Amazon Review Beauty | 0.8701 | 0.784 | 0.4479 |
| | | | Amazon Review Clothing | 0.7962 | 0.698 | 0.5476 |
| | | | Amazon Review Grocery | 0.8252 | 0.7524 | 0.5019 |
| | | | Amazon Review Phones | 0.839 | 0.7427 | 0.4949 |
| | | | Amazon Review Sports | 0.8266 | 0.7543 | 0.5101 |
| Zeng et al. [26] | 2020 | USRF | RetailRocket | 0.8888 | 0.8001 | |
| | | | Amazon Review Digital Music | 0.7086 | 0.6709 | |
| | | | MovieLens-1M | 0.9921 | 0.8445 | |
| Zhou et al. [10] | 2019 | DIEN | Amazon Review Electro | 0.7792 | | |
| | | | Amazon Review Books | 0.8453 | | |
| | | | Taobao | 0.6541 | | |
| Zhou et al. [21] | 2018 | DIN | Amazon Review Electro | 0.8871 | | |
| | | | MovieLens-20M | 0.7348 | | |
| | | | Alibaba (closed) | | | |
| Wang et al. [27] | 2017 | DCN | Criteo | | | 0.4419 |
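The AUC, F1, and Logloss columns in Table 1 are standard binary classification metrics. As a reference, here is one common way to compute two of them in pure Python: AUC via its pairwise ranking (Wilcoxon-Mann-Whitney) formulation, and log loss as the average negative log-likelihood; the toy labels and scores are illustrative:

```python
import math

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs, eps=1e-15):
    """Average negative log-likelihood of predicted click probabilities."""
    total = 0.0
    for l, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(l * math.log(p) + (1 - l) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4]
# All positives outrank all negatives here, so AUC = 1.0; log loss ≈ 0.299.
```

Note that AUC measures ranking quality and is insensitive to probability calibration, while log loss directly penalizes miscalibrated probabilities; this is why several publications in Table 1 report both.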
3. Customer Representation
Traditionally, customer behavior is modeled by domain experts to predict customers' intentions and future behavior. To this end, data like clickstream data or demographic information are incorporated into the data analysis and feature engineering process
[28][29][30][31][32]. As shown by Alves Gomes et al.
[33], most customer representations are modeled with features manually extracted by experts or with RFM (Recency, Frequency, Monetary) analysis
[34]. For example, Perisic et al.
[35] and Friedrich et al.
[36] extracted RFM-based features by extending the RFM analysis from historical data for customer representation. Wu et al.
[37] modeled and analyzed customer behavior with an extended RFM approach by adding customer contribution time and repeat purchase attributes and combining it with k-means clustering. K-means clustering is also used by Fazlollahtabar
[38]. The author chose different customer information gathered from their transactions and applied k-means clustering of different combinations of two features, e.g., gender and product or age and product. Wang et al.
[39] analyzed influence factors of second-hand customer-to-customer e-commerce platforms using questionnaire and demographic information from customers. Esmeli et al.
[32] modeled customers with twelve features derived solely from session information. Berger et al.
[40] used features that describe the change in customer behavior based on the current session and information retrieved from previous sessions. This manual customer representation process is time-consuming and expensive, especially since it needs to be repeated for each new use case or marketing campaign.
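The RFM analysis referred to above condenses a customer's transaction history into three features: recency (time since the last purchase), frequency (number of purchases), and monetary value (total spend). A minimal sketch, assuming a simple hypothetical transaction log of (customer id, date, amount) tuples:

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 9, 1), 30.0),
    ("c1", date(2023, 9, 20), 45.0),
    ("c2", date(2023, 6, 5), 120.0),
]

def rfm(transactions, today):
    """Recency (days since last purchase), Frequency, Monetary per customer."""
    acc = {}
    for cid, day, amount in transactions:
        last, freq, money = acc.get(cid, (None, 0, 0.0))
        last = day if last is None or day > last else last
        acc[cid] = (last, freq + 1, money + amount)
    return {cid: ((today - last).days, freq, money)
            for cid, (last, freq, money) in acc.items()}

features = rfm(transactions, today=date(2023, 9, 26))
# features["c1"] -> (6, 2, 75.0): bought 6 days ago, 2 purchases, 75.0 spent.
```

Extended variants like those cited above add further attributes (e.g., contribution time, repeat purchases) on top of these three base features before clustering or classification.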
Recent approaches that use embedding layers simplify customer modeling by feeding the available information into the learning model without an explicit feature engineering process. Most of the aforementioned CTR-P approaches utilize embedding layers to learn customer behavior. Sheil et al.
[41] proposed an end-to-end three-layered LSTM to predict future customer behavior by learning patterns of the product the customer interacts with, the interaction time, and additional product-related information. Ni et al.
[11] proposed the Deep User Perception Network (DUPN), an end-to-end Long Short-Term Memory (LSTM) network with an embedding input that is trained on multiple tasks to obtain a general customer representation. Yang et al.
[42] and Wu et al.
[43] represented customers based on textual features like product names, categories, and reviews written by the customers. However, in addition to embedding input data, embeddings can also be used to represent features. In the e-commerce context especially, embeddings have been used in recommendation scenarios; for this purpose, product embeddings were created and trained
[44][45][46][47]. A recent approach using pre-trained embedding features to represent customer behavior was proposed by Alves Gomes et al.
[48][49]. The authors pre-trained an embedding to encode customers’ behavior and used the representation to predict customers’ purchase intention.
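The idea of building a customer representation from pre-trained embeddings can be illustrated as follows: given item vectors (here random stand-ins for vectors learned, e.g., word2vec-style), a customer is represented by pooling the vectors of the items they interacted with. All names and dimensions are hypothetical:

```python
import random

random.seed(42)
DIM = 4  # embedding dimension (illustrative)

# Stand-in for pre-trained item embeddings.
item_emb = {item: [random.gauss(0, 1) for _ in range(DIM)]
            for item in ["shoes", "books", "shirt", "phone"]}

def customer_vector(history):
    """Represent a customer as the mean of their interacted items' embeddings."""
    vecs = [item_emb[item] for item in history]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(DIM)]

def cosine(a, b):
    """Cosine similarity between two customer representations."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

u1 = customer_vector(["shoes", "shirt"])
u2 = customer_vector(["shoes", "shirt", "books"])
similarity = cosine(u1, u2)  # customers with overlapping histories score higher
```

Such a pooled vector can then serve as the input representation for a downstream intent classifier, which is the decoupled pattern the pre-training approaches above follow; more elaborate pooling (attention, recurrence) replaces the simple mean in practice.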