1. Introduction
Recommender systems are among the major advances in Internet technology and e-commerce. Their modern form dates back to the early 1990s, when they were applied mainly experimentally to personal email and information filtering [1]. They were later improved repeatedly to help users navigate fashion, videos, books, papers, and especially e-commerce. Recommender systems have become an essential part of large Internet retailers, driving up to 35% of Amazon sales [2] and over 80% of the content watched on Netflix [3]. E-commerce retailers started implementing fashion recommendation systems in the early 2000s [4]. E-commerce has kept growing, especially since 2013, as recommendation systems help users find fashion products relevant to their interests, and these systems have become both an active research topic and an essential tool for a large number of Internet retailers. In 2013, total e-commerce turnover in Europe grew by 17% compared with the previous 12 months, and large organizations may offer hundreds of products, or even more, for users to choose from on their websites [5]. Both customers and businesses want shoppers to discover relevant fashion items when they search, and this is where recommender systems come into the picture [6]. By 2024, the domestic fashion market is expected to have grown by 8.8% compared to its size in 2020 [7,8,9].
Recommender systems are one of the most active research topics at both the national and international levels, and they are well suited to fashion discovery in e-commerce. They have several strengths, but the accuracy of their results remains a weakness. Content-based information retrieval (CBIR), itself an active research area, retrieves items based entirely on image content. It reduces some of these weaknesses, especially in fashion, by computing similarities among items and users through a set of features associated with them [10].
For a fashion item, the proposed features can be a category such as short-sleeve tops, long-sleeve tops, short-sleeve outerwear, long-sleeve outerwear, vests, slings, shorts, trousers, skirts, short-sleeve dresses, long-sleeve dresses, vest dresses, and sling dresses, together with gender. Many current recommendation systems have difficulties with full-body images: customers often cannot retrieve what they want because such an image may contain more than one item, such as a t-shirt with pants, a blouse with shorts, or a shirt with a skirt. Additionally, customers who search for similar items sometimes receive results intended for the opposite gender.
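The feature-based matching described above can be illustrated with a minimal sketch. It assumes each item carries a category, a gender label, and a numeric feature vector; the field names, the toy catalog, and the choice of cosine similarity are illustrative assumptions, not details taken from the cited systems.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def recommend(query, catalog, top_k=3):
    """Rank catalog items of the same category and gender by similarity."""
    # Hypothetical filter: restricting candidates to the query's gender is
    # one simple way to avoid opposite-gender results.
    candidates = [
        item for item in catalog
        if item["category"] == query["category"]
        and item["gender"] == query["gender"]
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query["features"], item["features"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:top_k]]


# Toy catalog with made-up feature vectors (assumption for illustration).
catalog = [
    {"id": "A", "category": "trousers", "gender": "female", "features": [0.9, 0.1, 0.0]},
    {"id": "B", "category": "trousers", "gender": "male", "features": [0.9, 0.1, 0.0]},
    {"id": "C", "category": "trousers", "gender": "female", "features": [0.1, 0.9, 0.2]},
    {"id": "D", "category": "skirts", "gender": "female", "features": [0.9, 0.1, 0.0]},
]
query = {"category": "trousers", "gender": "female", "features": [1.0, 0.0, 0.0]}
print(recommend(query, catalog))
```

In this toy example, item B has the same features as item A but is excluded because its gender label differs from the query, and item D is excluded because it belongs to another category.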
2. Fashion Recommender Systems
P. Bellini worked on fashion retail recommendation based on a multi-clustering approach to items and users’ profiles in online and physical stores; the approach relies on mining techniques and makes it possible to predict the purchase behavior of newly acquired customers [10]. S. Chakraborty reviewed state-of-the-art fashion recommendation systems and the corresponding filtering techniques and explored various potential models that could be implemented to develop fashion recommendations [4]. T.H. Nobile focused on the categories of Design and Production and Culture and Society, which together accounted for approximately 48% of the selected literature [11]. M. Khalid designed and implemented a two-stage deep learning model for recommending clothing styles; it extracts various attributes from images of clothes to learn the user’s clothing style and preferences and uses a convolutional neural network (CNN) as the visual feature extractor [5]. L. Liao developed an EI tree, which organizes fashion concepts into multiple semantic levels and augments the tree structure with exclusive as well as independent constraints [12]. X. Yang proposed the Attribute-based Interpretable Compatibility (AIC) method, which injects interpretability into the compatibility modeling of items. Given a corpus of matched pairs of items, the method learns the interpretable patterns that lead to good matches, e.g., a white T-shirt matches black trousers. Z. Cheng proposed a general feature-level attention method for item-based collaborative filtering (ICF) models. The key to the method is to distinguish the importance of different factors when computing the item similarity for a prediction, and a light attention neural network integrates both item-level and feature-level attention for neural ICF models. F. Liu proposed the Interest-aware Message-Passing GCN (IMP-GCN) recommendation model, which performs high-order graph convolution inside subgraphs. Each subgraph consists of users with similar interests and the items they interacted with. An unsupervised subgraph generation module identifies users with common interests by exploiting both user features and graph structure. MLGFRS focuses on the content of the query image and recommends similar products for each object in it [13].
Taobao and Shein are examples of e-commerce platforms that allow users to browse and purchase a wide range of products, from clothing to accessories, making online shopping convenient and accessible for customers worldwide. In 2006, Taobao sellers tended to maintain stable transaction volumes while market growth slowed, which reflects the effect of competition in China’s online C2C market on Taobao’s performance [14]. Shein, a Chinese online fast-fashion retailer, is known for its affordably priced apparel [15]. However, Taobao and Shein detect only a single object in the query image, and in many instances they detect unwanted objects.
3. Gender-Aware Detection
Humans can easily distinguish female fashion from male fashion. Many studies have addressed automatic gender recognition, and machine learning (ML) has facilitated this work. K. Khan [16] proposed a framework that first segments a face image into face parts and then performs automatic gender classification; the face image is segmented into six classes: mouth, hair, eyes, nose, skin, and back. Boosted by advances in face detection, gender detection based on facial features has thrived over the past few decades [17,18]. Segmenting the face into parts facilitates gender recognition, but the approach is weak for full-body images of people wearing different clothes. H. Chen [19] attempted to build a list of image attributes, including gender, to describe clothes in images taken in unconstrained environments, and sampled patches only from the upper part of the body.
4. Object Detection
Object detection is one of the most important tasks in computer vision, and detecting objects in query images is essential for generating embeddings and recommending similar products. Various object detection methods exist, such as Darknet architectures, which are used by single-stage detectors like YOLO [20]. YOLO is one of the most recent and most accurate tools for real-time object detection, and it is still regarded as the ideal solution for detecting objects in query images.
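To make the role of detection in the pipeline concrete, the following minimal sketch shows how the output of a detector could be filtered down to the fashion objects of a query image and cropped for later embedding. The `Detection` record, the label set, and the confidence threshold are hypothetical; they do not reproduce the output format of YOLO or of any specific library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detector output for one object (not a real YOLO type)."""
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


# Assumed subset of fashion categories, for illustration only.
FASHION_LABELS = {"short-sleeve top", "trousers", "skirt", "shorts"}


def select_fashion_objects(detections, min_confidence=0.5):
    """Keep confident detections whose label is a fashion category."""
    return [
        det for det in detections
        if det.label in FASHION_LABELS and det.confidence >= min_confidence
    ]


def crop(image, box):
    """Cut the box region out of an image stored as a list of pixel rows."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Toy 5x5 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(5)] for r in range(5)]
detections = [
    Detection("short-sleeve top", 0.91, (1, 0, 4, 2)),
    Detection("trousers", 0.84, (1, 2, 4, 5)),
    Detection("handbag", 0.77, (0, 2, 1, 3)),   # unwanted object
    Detection("skirt", 0.30, (1, 2, 4, 4)),     # low-confidence detection
]

selected = select_fashion_objects(detections)
crops = {det.label: crop(image, det.box) for det in selected}
print([det.label for det in selected])
```

Each crop would then be passed to an embedding model, so that a full-body image containing several garments yields one query per detected item rather than a single query for the whole image.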
5. Similarity Learning
Similarity learning is an active topic in deep learning. It groups items so that the closest embedding vectors can be found, which makes it possible to retrieve and recommend items similar to the objects detected in query images. F. Schroff presented a system called FaceNet, which directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity [21]. For MLGFRS, embedding learning will be used to group products by similarity in order to recommend the most relevant ones.
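The idea of an embedding space in which distance reflects similarity can be sketched as follows. The triplet loss shown is the objective used by FaceNet; the margin value, the toy two-dimensional embeddings, and the product identifiers are assumptions for illustration, and nothing here is claimed to be the training setup of MLGFRS.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss for a single (anchor, positive, negative).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (in squared distance).
    """
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


def nearest_products(query_embedding, product_embeddings, top_k=2):
    """Return the ids of the top_k products closest to the query embedding."""
    ranked = sorted(
        product_embeddings.items(),
        key=lambda pair: squared_distance(query_embedding, pair[1]),
    )
    return [product_id for product_id, _ in ranked[:top_k]]


# Made-up 2-D embeddings; real embeddings would come from a trained network.
products = {
    "shirt_1": [0.0, 1.0],
    "shirt_2": [0.1, 0.9],
    "skirt_1": [1.0, 0.0],
}
query = [0.0, 0.8]
print(nearest_products(query, products))
```

Training with such a loss pulls embeddings of similar items together and pushes dissimilar ones apart, after which recommendation reduces to a nearest-neighbor search like `nearest_products`.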