Multi-Label Classification Based on Associations

Associative classification (AC) has been shown to outperform other methods of single-label classification for over 20 years. In order to create rules that are both more precise and simpler to grasp, AC combines the rules of mining associations with the task of classification.

  • associative classification
  • classification
  • machine learning
  • multi-label classification

1. Introduction

In data mining, classification is a common task. The goal is to correctly predict the class label of unseen instances using the rules or functions learned from a labeled set, or training set [1][2]. Classification has attracted many researchers [3][4][5][6][7][8] in recent decades, who have used a wide variety of learning approaches and strategies, including decision trees, neural networks, fuzzy logic, Bayesian and statistical approaches, rule-set induction, and more, to create highly accurate classifiers [9]. Classification falls into three major categories [10]. In the first two categories, each data point must match exactly one of the predefined classes. The third category [11], on the other hand, enables multiple class labels to be assigned to a single dataset instance. The first category, referred to as “binary classification”, has just two class labels, whereas the second, referred to as “multi-class classification”, contains more than two [12][13]. The more general multi-label classification (MLC) scheme [11][14] is the third category. This study focuses on a particular strategy that employs single-label classification (SLC) to handle the multi-label problem.
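To make the distinction between the three categories concrete, the following minimal Python sketch (illustrative data only, not from any dataset mentioned in this entry) shows how the label of an instance is represented under each scheme:

# Binary: one label out of two classes per instance.
binary_labels = ["spam", "not_spam", "spam"]

# Multi-class: one label out of more than two classes per instance.
multiclass_labels = ["cat", "dog", "bird"]

# Multi-label: zero or more labels per instance; labels are not mutually exclusive.
multilabel_labels = [["news", "sports"], ["news"], []]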
Associative classification (AC) is one of the primary approaches that have been actively used to address the classification problem [15]. AC is a rule-set induction approach that uses the Association Rule Mining (ARM) task to solve the classification problem [1]. In general, the AC approach has several distinguishing features over other learning approaches, such as the highly accurate rules produced by AC algorithms, the simplicity of representing the learned rules in the “IF-THEN” format, and its applicability to a wide range of real-life classification problems, e.g., medical diagnosis, e-mail phishing, fraud detection, and software defect prediction [16]. Most AC-based methods have only been used for binary and multi-class classification problems [17]. In contrast, only a few efforts have been made to apply AC to the broader form of classification termed MLC [16].
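As an illustration of the “IF-THEN” rule format, the following Python sketch shows one possible in-memory representation of a classification rule; the Rule class and matches function are hypothetical names introduced here for illustration, not part of any AC implementation discussed in this entry:

from dataclasses import dataclass

@dataclass
class Rule:
    antecedent: dict   # attribute -> required value (the IF part)
    consequent: str    # predicted class label (the THEN part)
    support: float
    confidence: float

def matches(rule: Rule, instance: dict) -> bool:
    # A rule fires when every attribute test in its antecedent holds.
    return all(instance.get(a) == v for a, v in rule.antecedent.items())

# IF outlook = sunny AND humidity = high THEN class = no
r = Rule({"outlook": "sunny", "humidity": "high"}, "no",
         support=0.21, confidence=0.93)  # metric values are made up
print(matches(r, {"outlook": "sunny", "humidity": "high", "wind": "weak"}))  # True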

2. MLC

MLC is a general classification type with distinguishable features over conventional single-label classification (binary and multi-class classification) [18][19][20]. First, in MLC, an instance can be associated with more than one class label simultaneously, whereas single-label classification requires each instance to be associated with exactly one class label [21]. Second, because more than one class label can apply to the same instance simultaneously, the labels in MLC are not mutually exclusive, as they are in single-label classification [21]. Finally, the complexity of SLC is very low compared with MLC [22]. MLC has recently attracted the interest of numerous researchers due to its applicability to a wide variety of contemporary domains, including video and image annotation [23][24][25], classifying songs based on the emotions they evoke [26], prediction of gene functionality [27][28][29], protein functionality detection [30][31], drug discovery [32], mining social networks [33][34][35], direct marketing [36], and Web mining [37]. Two main strategies are used to address the MLC problem. The first strategy converts the input multi-label dataset into a single-label dataset or several single-label datasets; the transformed dataset(s) are then used to train a single-label classification algorithm [22]. This strategy is referred to as the problem transformation method (PTM); a minimal sketch of one common variant is given after this paragraph. According to the literature, very few AC-based algorithms have been utilized as a base classifier in this method [15]. The second strategy [6] extends a single-label classification algorithm to handle datasets with multiple labels. This strategy is known as the algorithm adaptation method (AAM). Several single-label classification algorithms, including C4.5 [37], k-nearest neighbor (KNN) [38], back propagation [39], AdaBoost [40], and naive Bayes (NB) [41], have been adapted to address the MLC problem. Unfortunately, according to the literature [15], no AC-based algorithm has been adapted to address the MLC problem.
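The following Python sketch illustrates the PTM idea using the common binary relevance scheme, in which one binary single-label dataset is derived per label, each trainable by any single-label classifier (including an AC-based one). The function name and data are illustrative assumptions, not taken from a specific library:

def binary_relevance(instances, label_sets, all_labels):
    # Split one multi-label dataset into len(all_labels) binary datasets:
    # for each label, an instance is positive (1) if it carries that label.
    datasets = {}
    for label in all_labels:
        datasets[label] = [
            (x, 1 if label in labels else 0)
            for x, labels in zip(instances, label_sets)
        ]
    return datasets

X = [{"f1": "a"}, {"f1": "b"}, {"f1": "c"}]
Y = [["sports", "news"], ["news"], []]
per_label = binary_relevance(X, Y, ["sports", "news"])
print(per_label["sports"])
# [({'f1': 'a'}, 1), ({'f1': 'b'}, 0), ({'f1': 'c'}, 0)]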

3. Utilizing AC in MLC

According to previous studies, relatively few efforts to solve the MLC problem have used AC. Multi-class multi-label associative classification (MMAC) is among the first methods [42] to apply AC to MLC. MMAC turns the original multi-label dataset into a single-label one by replicating each instance associated with more than one class label a number of times equal to the number of class labels it is associated with, with or without a weight. Hence, the dataset becomes a single-label dataset, but with more instances than the original one (see the sketch after this paragraph). After that, MMAC applies any single-label classifier, such as CBA or msCBA, to the newly transformed dataset. MMAC then generates its rules by merging single-label rules that share the same antecedent into multi-label rules. Unfortunately, MMAC has only been tested on single-label datasets, and it may become too complex if the original dataset has many labels and a high number of instances [43]. A novel multi-label method based on AC is presented in [44]. The multi-label classifier based on associative classification (MCAC) introduced a new rule discovery approach that creates multi-label rules from a single-label dataset without the need for learning. These multi-label rules reflect important information that most earlier AC algorithms often disregard. The correlative lazy associative classifier (CLAC) method, described in [45], is a hybrid algorithm that combines the principles of AC and lazy learning. CLAC generates classification association rules (CARs) that are ranked according to their support and confidence values. Each class predicted by CLAC is immediately added as a new feature for predicting the next class. Compared to the BoosTexter method, CLAC performed well on three textual datasets. The authors of [46] presented an AC-based method similar to the MMAC algorithm. In contrast to MMAC, the suggested method was evaluated on a multi-label dataset (Scene), underscoring the value of adopting AC in addressing the MLC problem.
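A minimal Python sketch of the MMAC-style copy transformation described above follows. The function name is illustrative, and the 1/n weighting is an assumption (the source states only that a weight may or may not be used):

def replicate_transform(instances, label_sets, weighted=False):
    # Copy each instance once per label it carries, optionally weighting
    # each copy by 1/n so the replicated copies sum to one instance.
    single_label = []
    for x, labels in zip(instances, label_sets):
        w = 1.0 / len(labels) if (weighted and labels) else 1.0
        for label in labels:
            single_label.append((x, label, w))
    return single_label

X = [{"f1": "a"}, {"f1": "b"}]
Y = [["sports", "news"], ["news"]]
print(replicate_transform(X, Y, weighted=True))
# [({'f1': 'a'}, 'sports', 0.5), ({'f1': 'a'}, 'news', 0.5), ({'f1': 'b'}, 'news', 1.0)]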

4. CBA and msCBA Algorithms

CBA is one of the earliest algorithms to merge the ARM and classification tasks. CBA was introduced in [47]. Since then, several more techniques based on the combination of ARM and classification have been presented. The MMAC algorithm [42] and the multi-class associative classification (MAC) algorithm [48] are examples of algorithms that adhere to the AC methodology. CBA applies the Apriori method to a classification dataset in three key phases. First, all continuous attributes are discretized; discretization converts a continuous variable or attribute into a discrete one, and this step is compulsory for any AC-based classifier. Then, CARs are generated. CARs are rules with an arbitrary combination of items in the antecedent (the left-hand side) and a single class in the consequent (the right-hand side). CARs are selected using two metrics, support and confidence. The objective of the final phase is to construct a classifier from the best CARs [49]. CBA was subsequently enhanced in [50] by removing two flaws in the original algorithm. The first flaw is the use of a single minsup (minimum support) threshold, which may result in an unbalanced class distribution; the modified version addresses this problem by using several minsup thresholds. The second flaw is the exponential growth in the number of rules produced by CBA; this was fixed by combining CBA with a decision tree approach, as in C4.5, resulting in more precise rules. The modified version of CBA is referred to as CBA2 or msCBA, which is short for multiple support classification based on associations. Algorithm 1 illustrates the original CBA algorithm; a simplified sketch of its rule-generation loop follows the listing. Although msCBA demonstrated higher performance in single-label classification than classifiers from other learning strategies [16], it is incapable of handling multi-label datasets. The msCBA method assumes that each input instance has a single class label associated with it; hence, it generates single-label rules with a single class label as the rule’s consequent. When extending the msCBA method to accommodate multi-label datasets, this assumption must therefore be discarded. In addition, the msCBA method captures the global relationships between features (attributes) and class labels, even though local dependencies and associations outperform global ones [51][52].
Algorithm 1 CBA algorithm.
1:  F_1 = {large 1-ruleitems};
2:  CAR_1 = genRules(F_1);
3:  prCAR_1 = pruneRules(CAR_1);
4:  for (k = 2; F_{k-1} ≠ ∅; k++) do
5:      C_k = candidateGen(F_{k-1});
6:      for each data case d ∈ D do
7:          C_d = ruleSubset(C_k, d);
8:          for each candidate c ∈ C_d do
9:              c.condSupCount++;
10:             if d.class = c.class then
11:                 c.ruleSupCount++;
12:             end if
13:         end for
14:     end for
15:     F_k = {c ∈ C_k | c.ruleSupCount ≥ minsup};
16:     CAR_k = genRules(F_k);
17:     prCAR_k = pruneRules(CAR_k);
18: end for
19: CARs = ∪_k CAR_k;
20: prCARs = ∪_k prCAR_k;
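The following simplified Python sketch mirrors the counting logic of Algorithm 1 for 1-ruleitems only: it accumulates condSupCount and ruleSupCount over the dataset and keeps rules that meet the support and confidence thresholds. The helper names, threshold values, and toy data are illustrative assumptions, and candidate generation for k > 1 is omitted for brevity:

from collections import Counter

def generate_cars(data, minsup=0.1, minconf=0.6):
    # data: list of (attributes: dict, class_label). Returns 1-item CARs
    # as (antecedent, class, support, confidence) tuples.
    n = len(data)
    cond_count = Counter()   # how often the condition holds (condSupCount)
    rule_count = Counter()   # how often condition and class co-occur (ruleSupCount)
    for attrs, cls in data:
        for item in attrs.items():          # item = (attribute, value)
            cond_count[item] += 1
            rule_count[(item, cls)] += 1
    cars = []
    for (item, cls), rc in rule_count.items():
        support = rc / n
        confidence = rc / cond_count[item]
        if support >= minsup and confidence >= minconf:
            cars.append((dict([item]), cls, support, confidence))
    return cars

data = [({"outlook": "sunny"}, "no"), ({"outlook": "sunny"}, "no"),
        ({"outlook": "rain"}, "yes"), ({"outlook": "overcast"}, "yes")]
for rule in generate_cars(data):
    print(rule)  # e.g. ({'outlook': 'sunny'}, 'no', 0.5, 1.0)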

 
