Multi-label classification is widely used in data mining applications such as labeling videos, images, music, and texts. It assigns documents to several classes simultaneously based on their properties, in contrast to conventional single-label classification, which associates every document with exactly one class. Single-label classification can be further divided into binary and multi-class classification [1]. In multi-class classification, a document may fall under any of several label categories, but exactly one category is assigned to it. Multi-label classification generalizes both multi-class and binary classification, since it imposes no constraint on the number of labels assigned to each output [2]. Multi-label classification techniques are also affected by a high level of class imbalance, which prevents them from operating efficiently.
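To make the distinction concrete, the following minimal sketch (using scikit-learn; the toy labels are illustrative) contrasts a multi-class target, where each document receives exactly one class, with a multi-label target, where each document may receive any subset of the label set:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Multi-class: each document receives exactly one label.
multiclass_y = ["sports", "politics", "sports"]

# Multi-label: each document may receive any subset of the label set.
multilabel_y = [{"sports"}, {"politics", "economy"}, set()]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(multilabel_y)
print(mlb.classes_)  # ['economy' 'politics' 'sports']
print(Y)
# [[0 0 1]
#  [1 1 0]
#  [0 0 0]]  <- the empty label set is allowed, unlike multi-class
```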
Identifying multiple items in images, predicting gene expression, predicting tags for audio, and categorizing papers into specified categories are just a few of the many fields in which multi-label classification is a common challenge in artificial intelligence research [3][4]. Additionally, classifiers that build a single global model per class perform worse when classifying multi-label datasets [5]. To focus the classifier(s) on the decision boundaries within each region, clustering has previously been used to extract the distribution of the data, either as a whole or separately for each class [6][7]. In other words, the data are first clustered and one classifier is then trained per cluster, as sketched below. Clustering is unsupervised, so the data labels are not used at this stage [8][9]. Moreover, many algorithms that address multi-label classification neglect the connections between labels and therefore provide no mechanism for capturing label relationships.
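The following minimal sketch illustrates the cluster-then-classify scheme just described: the data are partitioned with unsupervised k-means, and one classifier is trained per cluster. It is a generic illustration under simplifying assumptions (every cluster is assumed to contain more than one class), not any of the cited algorithms:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_clusterwise(X, y, n_clusters=3):
    # unsupervised step: the labels y play no role in the clustering
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # local step: one classifier per cluster focuses on local boundaries
        # (assumes every cluster contains more than one class)
        models[c] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return km, models

def predict_clusterwise(km, models, X):
    clusters = km.predict(X)  # route each instance to its cluster's model
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(clusters, X)])
```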
2. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning
Mikolov et al. [10] introduced the recurrent neural network language model, in which every word is paired with a contextual real-valued input vector, which enhances performance. However, these techniques have drawbacks at various levels. With the CNN technique, after the convolutions are computed, k-max or average pooling must still be applied to fix the length of each document's representation. The RNN model is biased, with later words dominating earlier ones.
Lai et al. [11] suggested the RCNN model for document categorization, which can overcome the RNN's bias. The RCNN exploits the benefits of the recurrent structure to gather contextual information while learning the feature representation of texts with CNN. However, this approach still suffers from vanishing and exploding gradients when the error is updated through back-propagation.
Lin et al. [12] proposed performing Label Space Dimension Reduction (LSDR) with a technique called End-to-End Feature-Aware Label Space Encoding (E2FE). Unlike earlier studies, E2FE learns a code matrix directly from the code vectors of the training instances, without an explicit encoding function. The code matrix is obtained by jointly optimizing the recoverability and the predictability of the latent space, which accounts for E2FE's feature awareness. E2FE additionally learns a linear decoding matrix that efficiently recovers the label vector of an unseen instance from its estimated code vector. Based on the learned code matrix, E2FE trains predictive models that map instance features to code vectors.
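The following minimal sketch illustrates the general LSDR pattern that E2FE builds on: encode the label matrix into low-dimensional code vectors, train a predictor from features to codes, and linearly decode predicted codes back into label vectors. This is a generic SVD-based encoding in the spirit of principal label space transformation, not the actual E2FE optimization:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lsdr_fit(X, Y, k):
    y_mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    V = Vt[:k].T                      # top-k basis of the label space
    Z = (Y - y_mean) @ V              # low-dimensional code vectors
    reg = Ridge(alpha=1.0).fit(X, Z)  # predict codes from features
    return reg, V, y_mean

def lsdr_predict(reg, V, y_mean, X, threshold=0.5):
    Z_hat = reg.predict(X)            # estimated code vectors
    Y_hat = Z_hat @ V.T + y_mean      # linear decoding back to label space
    return (Y_hat >= threshold).astype(int)
```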
To handle multi-label categorization, Wang et al. [13] put forward an enhanced Convolutional Neural Network combined with a Hierarchical Dirichlet Process (HDP) model for NLP problems. The HDP model is applied first to exclude less semantically significant terms. Words are then converted into vectors using word-embedding techniques, and finally the CNN is trained on the word vectors.
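A minimal sketch of the embedding-plus-CNN stage of such a pipeline is shown below, using Keras with sigmoid outputs so that each label is scored independently; the HDP filtering step is abstracted away, and vocab_size, embed_dim, and num_labels are illustrative placeholders:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, num_labels = 20000, 100, 10

model = tf.keras.Sequential([
    layers.Embedding(vocab_size, embed_dim),        # words -> vectors
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),                    # fixed-length document representation
    layers.Dense(64, activation="relu"),
    layers.Dense(num_labels, activation="sigmoid"), # one independent score per label
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```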
A multi-label ranking approach for document categorization based on LSTM, named LSTM², was suggested by Yan et al. [14]. It combines rankLSTM, a unified learning-to-rank approach, with repLSTM, an adaptive data representation process. The supervised LSTM in repLSTM incorporates the document labels to learn the document representation.
Jindal [15] presented a novel technique for effective, automatic multi-label classification of text documents. The proposed approach is founded on lexical and semantic concepts: tokens in text sources are identified using the standard IEEE taxonomy, and the semantic links between tokens are examined using the well-known lexical database WordNet. The technique is evaluated on a collection of 150 computer science research papers from the IEEE Xplore digital library, where it shows notably good performance with a 75% accuracy rate.
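As a rough illustration of the kind of WordNet-based semantic check described above (not the author's exact procedure), the following sketch uses NLTK's WordNet interface to test synonymy and path similarity between two tokens:

```python
# requires: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def are_synonyms(t1, t2):
    # t2 is a synonym of t1 if it appears among the lemmas of any sense of t1
    lemmas = {l.name() for s in wn.synsets(t1) for l in s.lemmas()}
    return t2 in lemmas

def first_sense_similarity(t1, t2):
    s1, s2 = wn.synsets(t1), wn.synsets(t2)
    if not s1 or not s2:
        return 0.0
    # path similarity between the first senses; may be None if unrelated
    return s1[0].path_similarity(s2[0]) or 0.0
```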
Charte et al. [16] suggested a strategy that combines the best-known resampling methods, such as random oversampling, heuristic under-sampling, and synthetic sample generation. An empirical analysis is performed to understand how the label decoupling procedure affects the behavior of these hybrid approaches, and the experimentation yields a notable set of recommendations for combining the various strategies.
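A minimal sketch of one ingredient of such hybrids, random oversampling of instances that carry under-represented labels (loosely in the spirit of ML-ROS; the minority criterion and rate below are illustrative), is shown here:

```python
import numpy as np

def ml_random_oversample(X, Y, rate=0.2, seed=0):
    # clone instances that carry at least one under-represented label;
    # assumes at least one such instance exists
    rng = np.random.default_rng(seed)
    freq = Y.sum(axis=0)                               # per-label frequency
    minority_labels = np.where(freq < freq.mean())[0]  # below-average labels
    minority_rows = np.where(Y[:, minority_labels].any(axis=1))[0]
    picks = rng.choice(minority_rows, size=int(rate * len(X)), replace=True)
    return np.vstack([X, X[picks]]), np.vstack([Y, Y[picks]])
```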
Alyousef et al. [17] proposed combining latent class models with novel classification algorithms. The new approach clusters patients into groups using latent class models, which improves classification and makes it easier to identify the fundamental distinctions between the discovered groups. The methods are evaluated on data from individuals with Systemic Sclerosis, a rare and potentially fatal illness. Results reveal that the “Latent Class Multi-Label Classification Model” improves accuracy compared with competing approaches.
Wang et al. [18] introduced the ACkEL paradigm. Based on active learning, a label-selection criterion assesses class separability and balance. The criterion is then used to randomly choose the first label or label subset and to iteratively select the remaining ones. ACkEL employs pool-based and stream-based models in disjoint and overlapping modes, respectively. Comparative experiments show that the methods are feasible and effective.
Che et al. [19] suggested an innovative approach referred to as FL-MLC, a method for multi-label learning that takes feature–label interdependence into account. The intrinsic link between a feature variable and a label variable is first expressed as the discriminant weight of that feature for the label. The discriminant weights of the features then characterize each label's feature distribution over the inputs, with kernel alignment and multiple kernel learning used to improve the computation. The feature distribution-based label correlation combines the feature distributions on different labels through two aggregation processes, so that label variables with nearly identical feature distributions are treated as strongly correlated. This correlation is used to adjust the distance between the parameters of the predictive learners for different labels in the FL-MLC approach. Finally, experiments on twelve real-world datasets show that the method produces multi-label classification results that are both effective and diverse.
Sun et al. [20] presented an innovative classification margin-based MNRS model. A filter-wrapper pre-processing strategy for feature selection utilizing a modified Fisher score model reduces the spatiotemporal complexity of multi-label data and improves classification performance. Experiments on thirteen multi-label datasets confirm that the suggested technique is efficient at selecting salient features, demonstrating its strong classification potential on multi-label data.
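For reference, a classical single-label Fisher score filter is sketched below as a simplified stand-in for the modified Fisher score pre-processing described above; the multi-label variant in [20] differs in how the labels are handled:

```python
import numpy as np

def fisher_scores(X, y):
    # classical Fisher score: between-class scatter over within-class scatter
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)  # larger = more discriminative feature
```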
Huang et al. [21] proposed BLS-MLL, a novel multi-label classifier with two new mechanisms: a kernel-based feature-reduction module and correlation-based label thresholding. The kernel-based feature-reduction module comprises three layers: the feature-mapping layer, the enhancement-nodes layer, and the feature-reduction layer. In the feature-mapping layer, elastic-net regularization addresses feature randomness to improve performance. In the enhancement-nodes layer, the kernel approach is used for efficient high-dimensional nonlinear transformation. Using correlation-based label thresholding, BLS-MLL builds a label-thresholding function that converts the final decision values into logical outputs, boosting classification performance. On ten datasets, BLS-MLL proves superior to six state-of-the-art multi-label classifiers: the classification results show that it beats the compared algorithms in 86% of cases and has higher training efficiency in 90% of cases.
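The label-thresholding idea can be illustrated with a generic per-label calibration that picks, for each label, the cutoff maximizing F1 on validation scores; this is a simplified stand-in, not the correlation-based function used by BLS-MLL:

```python
import numpy as np
from sklearn.metrics import f1_score

def fit_thresholds(scores, Y, grid=np.linspace(0.05, 0.95, 19)):
    # for each label, keep the cutoff that maximizes F1 on the given scores
    return np.array([
        grid[np.argmax([f1_score(Y[:, j], scores[:, j] >= t, zero_division=0)
                        for t in grid])]
        for j in range(Y.shape[1])
    ])

def apply_thresholds(scores, thresholds):
    # logical outputs: label j is assigned when its score clears its threshold
    return (scores >= thresholds).astype(int)
```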
Bayati et al. [22] suggested a subspace learning-based memetic method for global and local search in multi-label data; this is the first attempt to use a filter-based memetic algorithm for multi-label feature selection. The objective function balances two conflicting goals, reconstruction error and sparsity regularization. The suggested multi-label feature-selection approach is compared against nine filter-based methods, with performance evaluated by classification accuracy, Hamming loss, average precision, and one-error. The suggested strategy outperforms the compared methods across all assessment criteria on eight real-world datasets.
Zhu et al. [23] suggested Multi-Label Classification with Dynamic Ensemble Learning (MLDE), an innovative approach in which each unseen instance is predicted by the most competent ensemble of base classifiers. MLDE uses classification accuracy and ranking loss as competence measures for the base classifiers to make dynamic selections for the multi-label problem and improve performance. Classification accuracy is decomposable over the labels and captures a classifier's ability to distinguish individual labels, whereas ranking loss focuses on a classifier's overall performance on the label set and fully accounts for the connections among labels. In comprehensive tests on 24 publicly available datasets, MLDE outperforms state-of-the-art techniques.
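A minimal sketch of the dynamic-selection idea is given below: for each test instance, the base classifiers that perform best on its nearest validation neighbors are selected and combined. This is generic dynamic ensemble selection over multi-label base classifiers; the competence measures and details differ from MLDE:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def des_predict(classifiers, X_val, Y_val, x, k=7, top_m=3):
    # find the k validation instances closest to the query
    nn = NearestNeighbors(n_neighbors=k).fit(X_val)
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    # competence of each base classifier = subset accuracy on the neighborhood
    comp = [np.mean([np.array_equal(c.predict(X_val[i].reshape(1, -1))[0], Y_val[i])
                     for i in idx])
            for c in classifiers]
    best = np.argsort(comp)[-top_m:]  # keep the top_m most competent classifiers
    votes = np.mean([classifiers[b].predict(x.reshape(1, -1))[0] for b in best], axis=0)
    return (votes >= 0.5).astype(int)  # per-label majority vote
```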
Zhang et al. [24] suggested Relief-LIFT, a multi-label classification approach that first uses LIFT to produce new label-specific features and then adapts Relief to choose the informative ones for the classification model.
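As a sketch of the first stage, the LIFT-style construction of label-specific features (distances to cluster centers of the positive and negative instances of one label) can be written as follows; the Relief-based selection step is omitted:

```python
import numpy as np
from sklearn.cluster import KMeans

def lift_features(X, y_label, ratio=0.1):
    # split instances into positives and negatives for this one label
    pos, neg = X[y_label == 1], X[y_label == 0]
    m = max(1, int(ratio * min(len(pos), len(neg))))  # clusters per side
    cp = KMeans(n_clusters=m, n_init=10, random_state=0).fit(pos).cluster_centers_
    cn = KMeans(n_clusters=m, n_init=10, random_state=0).fit(neg).cluster_centers_
    centers = np.vstack([cp, cn])
    # label-specific features: distance from every instance to every center
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
```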
Table 1 presents a comparative analysis of the existing multi-label classification techniques.
Table 1. Comparative analysis of the existing approaches.
| Author | Approach | Discussion | Demerits |
| --- | --- | --- | --- |
| Wang et al. [13] (2018) | Multi-label classification using an improved convolutional neural network algorithm | Achieves an average labeling accuracy above 93%. | Classifying images in various positions is quite difficult. |
| Jindal [15] (2018) | A novel automatic and effective method for multi-label text document categorization | Yields reasonable performance, achieving an accuracy of 75%. | Word embedding does not discriminate between different senses. |
| Charte et al. [16] (2019) | Tackling multi-label imbalance through label decoupling and data resampling hybridization | Hamming loss and ranking loss are minimized. | Performance degrades on large datasets. |
| Alyousef et al. [17] (2019) | Latent class multi-label classification for identifying illness subclasses for better prediction | The “Latent Class Multi-Label Classification Model” increases accuracy compared with contemporary techniques. | Unsuitable for datasets with a huge number of labels, owing to the massive exploration space. |
| Wang et al. [18] (2020) | Active k-labelsets ensemble | Feasible and effective. | Further improving training efficiency remains an important open issue. |
| Che et al. [19] (2021) | FL-MLC | Effective and diverse for multi-label classification. | Increases time complexity. |
| Sun et al. [20] (2021) | Margin-based MNRS model | Effective and feasible. | Increases the false positive rate. |
| Huang et al. [21] (2022) | Correlation-based label thresholding | Produces better performance. | Not evaluated on high-volume data. |
| Bayati et al. [22] (2022) | Subspace learning and memetic algorithm | Superior to the compared methods. | Increases the false positive rate. |
| Zhu et al. [23] (2023) | Dynamic ensemble learning | Outperforms state-of-the-art methods. | Time-consuming. |
| Zhang et al. [24] (2023) | Relief-LIFT | Achieves better performance. | Does not apply to all applications. |