CTR-P has received considerable attention in industry and academia in recent years. It is approached as a binary classification problem in which the probability of a click on an item is predicted, regardless of the use case, e.g., a retrieved item in a search, a clicked ad, or a clicked product. The literature accordingly covers several kinds of CTR-P use cases rather than a single one. For example, Chen et al. [17], Ge et al. [18], and Fan et al. [9] propose CTR-P models to optimize the items retrieved by a search engine. Others predict the CTR for displayed ads [6,19] or for products in general [10,20].
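To make the binary classification framing concrete, the following minimal sketch scores predicted click probabilities against observed clicks. The use of a sigmoid output and the log loss (binary cross-entropy) is an assumption reflecting common practice, not something stated for any specific approach cited here; the scores and labels are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued model score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(labels: list[int], probs: list[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between click labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical impressions: raw model scores and whether the item was clicked.
scores = [2.0, -1.0, 0.0]
clicks = [1, 0, 1]
probs = [sigmoid(s) for s in scores]
loss = log_loss(clicks, probs)
```

The same loss applies whether the "item" is a search result, an ad, or a product, which is why the use cases can share one problem formulation.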
Table 1 presents an overview of state-of-the-art CTR-P approaches, including the authors, publication year, proposed approach, and datasets used. All approaches are based on deep neural networks, that is, mixtures and ensembles of multi-layer perceptrons, recurrent layers, and attention layers intended to capture customers’ behavioral information. Furthermore, all models contain an embedding input layer to embed the available information, which is usually determined by the use case and/or selected by the data engineers. Typical inputs are the user ID, the target item ID, additional user information, and additional target item information. The DIN [21], DIEN [10], TIEN [22], and MARN [8] approaches use sequential activity information, on which Alves Gomes et al. [53] also rely in their approach. They propose a decoupled approach consisting of an activity embedding that learns historical customer behavior from context in a self-supervised manner, and an LSTM that learns to predict whether the customer will click on a product or recommendation based on the embedded behavior. CTR-P approaches are evaluated on different datasets. Some publications rely only on closed data [13,18,23,24,25] and are therefore not included in Table 1. Others, as shown in Table 1, use openly available datasets to evaluate their approach. Of all the reviewed publications, the Amazon review dataset is the most frequently used.
Table 1. Overview of publications proposing CTR-P approaches with information on the datasets.
| Author | Year | Approach | Dataset |
|---|---|---|---|
| Alves Gomes et al. [53] | 2024 | Decoupled Embedding + LSTM | Amazon Review; closed dataset |
| Fan et al. [9] | 2022 | RACP | Avito; Taobao (closed) |
| C. Li et al. [20] | 2021 | Mul-AN | Criteo; MovieLens-100k |
| X. Li et al. [8] | 2020 | MARN | Amazon Review Electro; Amazon Review Clothing; Taobao (closed) |
| X. Li et al. [22] | 2020 | TIEN | Amazon Review Beauty; Amazon Review Clothing; Amazon Review Grocery; Amazon Review Phones; Amazon Review Sports |
| Zeng et al. [29] | 2020 | USRF | RetailRocket datasets; Amazon Review Digital Music; MovieLens-1M |
| Zhou et al. [10] | 2019 | DIEN | Amazon Review Electro; Amazon Review Books; Taobao |
| Zhou et al. [21] | 2018 | DIN | Amazon Review Electro; MovieLens-20M; Alibaba (closed) |
| Wang et al. [30] | 2017 | DCN | Criteo |
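The structure shared by the approaches in Table 1, an embedding input layer followed by a network that outputs a click probability, can be illustrated with a deliberately simplified sketch. It is not the architecture of any listed model: a single sigmoid unit stands in for the multi-layer perceptrons, recurrent layers, and attention layers, the parameters are random and untrained, and all sizes and names are hypothetical.

```python
import math
import random

random.seed(0)
EMB_DIM = 4  # hypothetical embedding size

def make_table(n_rows: int, dim: int) -> list[list[float]]:
    """Randomly initialised embedding table with one row per ID."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]

user_emb = make_table(n_rows=3, dim=EMB_DIM)  # 3 hypothetical users
item_emb = make_table(n_rows=5, dim=EMB_DIM)  # 5 hypothetical target items
weights = [random.uniform(-0.1, 0.1) for _ in range(2 * EMB_DIM)]
bias = 0.0

def predict_click(user_id: int, item_id: int) -> float:
    """Look up both embeddings, concatenate them, and apply one sigmoid unit."""
    x = user_emb[user_id] + item_emb[item_id]  # list concatenation
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

p = predict_click(user_id=1, item_id=3)
```

In a trained model, the embedding tables and the weights would be learned jointly from click logs, and additional user and item information would be embedded and concatenated in the same way.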
3. Customer Representation
Traditionally, customer behavior is modeled by domain experts in order to predict customers’ intentions and future behavior. To this end, data such as clickstream data or demographic information are incorporated into the data analysis and feature engineering process [31,32,33,34,35]. As shown by Alves Gomes et al. [36], most customer representations are modeled with manual features extracted by experts or with the RFM analysis [37]. For example, Perisic et al. [38] and Friedrich et al. [39] extended the RFM analysis to extract RFM-based features from historical data for customer representation. Wu et al. [40] modeled and analyzed customer behavior with an extended RFM approach that adds customer contribution time and repeat purchase attributes and combines them with k-means clustering. K-means clustering is also used by Hamed Fazlollahtabar [41], who selected different customer information gathered from the customers’ transactions and applied k-means clustering to different combinations of two features, e.g., gender and product or age and product. Wang et al. [42] analyzed the influencing factors of second-hand customer-to-customer e-commerce platforms using questionnaires and demographic information of customers. Esmeli et al. [35] modeled customers with twelve features based solely on session information. Berger et al. [43] used features that describe the change in customer behavior based on the current session information and the information retrieved from the previous session history. This manual customer representation process is time-consuming and expensive, especially since it needs to be repeated for each new use case or marketing campaign.
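As an illustration of the kind of manual feature extraction discussed above, the following sketch computes basic RFM features (recency, frequency, monetary value) per customer. It shows only the plain RFM analysis, not the extended variants of the cited works, and the transaction log, the reference date, and the function name `rfm` are hypothetical.

```python
from datetime import date

# Hypothetical transaction log: (customer ID, purchase date, amount).
transactions = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 3, 1), 35.0),
    ("c2", date(2024, 2, 10), 80.0),
]

def rfm(transactions, reference_date):
    """Return {customer: (recency in days, purchase count, total spend)}."""
    features = {}
    for customer, day, amount in transactions:
        last, freq, total = features.get(customer, (None, 0, 0.0))
        last = day if last is None or day > last else last  # most recent purchase
        features[customer] = (last, freq + 1, total + amount)
    return {
        c: ((reference_date - last).days, freq, total)
        for c, (last, freq, total) in features.items()
    }

table = rfm(transactions, reference_date=date(2024, 3, 31))
# table["c1"] == (30, 2, 55.0)
```

Feature vectors of this form could then be passed to a clustering method such as k-means. Even this small example requires choices (reference date, aggregation window) that must be revisited for each new use case.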
Recent approaches that use embedding layers simplify customer modeling by feeding the available information directly into the learning model without a dedicated feature engineering process. Most of the aforementioned CTR-P approaches utilize embedding layers to learn customer behavior. Sheil et al. [44] proposed an end-to-end three-layer LSTM to predict future customer behavior by learning patterns from the products the customer interacts with, the interaction time, and additional product-related information. Ni et al. [11] proposed the Deep User Perception Network (DUPN), an end-to-end Long Short-Term Memory (LSTM) network with an embedding input that is trained on multiple tasks to obtain a general customer representation. Yang et al. [45] and Wu et al. [46] represented customers based on textual features such as product names, categories, and reviews written by the customers. In addition to serving as an input layer, embeddings can also be used to represent features. Especially in the e-commerce context, embeddings have been used in recommendation scenarios, for which product embeddings were created and trained [47,48,49,50]. A recent approach using pre-trained embedding features to represent customer behavior was proposed by Alves Gomes et al. [51,52,53]. The authors pre-trained an embedding to encode customers’ behavior and used the representation to predict customers’ purchase intention.
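The idea of using pre-trained embeddings as features can be sketched as follows. The mean pooling shown here is an assumption chosen for simplicity and is not claimed to be the method of any cited work; the embedding values, product IDs, and the function name `customer_vector` are hypothetical.

```python
# Hypothetical pre-trained embeddings, one vector per product ID.
pretrained = {
    "p1": [1.0, 0.0],
    "p2": [0.0, 1.0],
    "p3": [1.0, 1.0],
}

def customer_vector(activity: list[str], table: dict[str, list[float]]) -> list[float]:
    """Mean-pool the embeddings of the products a customer interacted with."""
    known = [table[a] for a in activity if a in table]  # skip unseen products
    if not known:
        raise ValueError("no known activities to embed")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# One customer's interaction history, in order.
vec = customer_vector(["p1", "p3", "p3"], pretrained)
```

The resulting fixed-length vector can serve as input to a downstream classifier. Note that mean pooling discards the order of interactions, whereas a sequence model such as an LSTM retains it.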