Hierarchical Clustering Algorithms: History

In the field of data mining, clustering has proven to be an important technique. Hierarchical clustering algorithms are a family of clustering algorithms in which data items are partitioned into nested groups arranged in a hierarchy. Clusters are built iteratively, either top-down or bottom-up, to produce a dendrogram that shows the hierarchical structure of the resulting clusters.

  • clustering algorithms
  • data
  • cluster

1. Introduction

Hierarchical clustering algorithms enable data exploration at various levels of granularity [1]. Two strategies exist: the divisive method, which follows a top-down strategy, and the agglomerative method, which follows a bottom-up approach. In the agglomerative technique, clusters are formed by repeatedly merging similar items into progressively larger clusters, thereby establishing the various levels of the hierarchy. This process continues until all objects are merged into a single cluster or a stopping criterion is met. The divisive strategy works in the opposite direction.
Starting from the single cluster comprising all objects, clusters are split iteratively until either the halting requirement is satisfied or each object forms its own cluster. The closeness or dissimilarity of the cluster elements is used to decide whether to merge or split.
Because hierarchical clustering merges or splits subsets of points rather than individual points, the distance between subgroups must be derived from the distances between individual points. This is determined by the linkage metric, which measures proximity. Three kinds of linkage are commonly used in hierarchical clustering: single linkage, average linkage, and complete linkage [1][2][3][4][5]. Hierarchical clustering algorithms store the linkage metrics used for clustering in an n × n connectivity matrix. The similarity matrix is built by computing the similarity between every pair of data points. When deciding on a linkage criterion, it is common practice to measure the pairwise distance between clusters. Using the similarity measure, researchers can determine the separation between groups of clusters; it also describes how the clusters themselves take shape.
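As a minimal illustration of these ideas (our own sketch, assuming NumPy and SciPy are available; the toy data, the Gaussian-kernel conversion to similarities, and the variable names are assumptions, not taken from the cited works), the following code builds the n × n pairwise distance matrix for a small data set and derives a similarity matrix from it.

```python
# Minimal sketch: build the n x n connectivity (distance) matrix and a
# similarity matrix of the kind hierarchical clustering algorithms use.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.0, 0.0],
              [0.1, 0.2],
              [4.0, 4.1],
              [4.2, 3.9]])          # n = 4 data points in 2-D (toy data)

# n x n distance matrix from pairwise Euclidean distances
dist_matrix = squareform(pdist(X, metric="euclidean"))

# one common way to turn distances into similarities (an assumption,
# not prescribed by the entry): a Gaussian kernel
sim_matrix = np.exp(-dist_matrix**2 / (2 * dist_matrix.std()**2))

print(dist_matrix.round(2))
print(sim_matrix.round(2))
```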

2. Agglomerative Clustering

In unsupervised machine learning, hierarchical agglomerative clustering is a significant and well-established approach. Agglomerative methods begin by treating each data item as a singleton node and repeatedly combine the two currently closest nodes into a single node until only one node is left, which contains the whole data set. This process is a common description of several clustering systems, which differ in how the measure of inter-cluster dissimilarity is updated after each step [6]. The optimal value of the objective function serves as the criterion for selecting the pair of clusters to merge at each step. This type of clustering is best suited to quantitative variables rather than binary data. The authors of [7] devised a non-parametric agglomerative clustering method that, using a conventional nearest-neighbor approach, determines a sample point’s mutual neighborhood value (MNV) and mutual nearest neighbors (MNN). Agglomerative hierarchical clustering is further subdivided into the following categories.
  • Single-linkage clustering: this type of clustering is also known as the minimum, connectedness, or nearest-neighbor approach. The distance between two clusters is the closest distance between any member of one cluster and any member of the other; that is, the similarity of two clusters is computed from their closest pair of elements. Owing to its chaining effect, single-linkage clustering has a propensity to produce elongated clusters [8].
  • Average-linkage clustering: average-linkage clustering is also referred to as the minimum-variance linkage [1][9]. It determines the distance between two clusters as the average (or median) distance between their data points [8].
  • Complete-linkage clustering: complete linkage, often referred to as the maximum, diameter, or farthest-neighbor method, measures the distance between two clusters as the longest distance between any member of one cluster and any member of the other. Compared with single-linkage clustering, complete-linkage clusters are smaller and more tightly bound [9]. The three proximity metrics described above take all the points in a pair of clusters into account when calculating inter-cluster distances and are regarded as graph methods [10][11].
SLINK [12] is an efficient implementation of the single-linkage hierarchical clustering technique, and CLINK [13] is an efficient implementation of the complete-linkage clustering algorithm; the work in [14] is an example of average-link clustering. Other geometric techniques based on the same concept use a center point as the proximity measure; these comprise the centroid-linkage, median-linkage, and minimum-variance linkage metrics [15][16][17]. While similarity metrics capture intra-cluster connectedness, a distance-based proximity measure captures inter-cluster closeness. Hierarchical clustering techniques can handle an adjustable level of granularity and any similarity metric [8].
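To make the linkage variants above concrete, the following sketch (our own illustration using SciPy's hierarchical clustering routines; it does not reproduce the SLINK or CLINK algorithms, and the toy data are an assumption) runs agglomerative clustering under several linkage criteria and cuts each resulting dendrogram into two clusters.

```python
# Minimal sketch comparing linkage criteria in agglomerative clustering
# (illustrative only; not the SLINK/CLINK implementations themselves).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [4.0, 4.1], [4.2, 3.9], [8.0, 8.0]])   # toy data

for method in ("single", "complete", "average", "centroid", "median"):
    Z = linkage(X, method=method)                    # stepwise merge history
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into two clusters
    print(f"{method:>8}: {labels}")
```

Single linkage tends to chain the intermediate points together, while complete and average linkage produce more compact groups on the same data, mirroring the behavior described in the bullets above.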

3. Divisive Hierarchical Clustering

Divisive hierarchical clustering reverses the agglomerative process: instead of starting with each item in its own cluster and merging, it starts with all items in a single cluster and breaks clusters into smaller groups until the required number of clusters is reached. In contrast to the agglomerative method, the divisive approach is top-down: the data objects are initially treated as one fused cluster that is gradually separated until the desired number of clusters is obtained [18][19][20]. To divide a cluster into two subsets that each contain one or more elements, the standard procedure considers all possible bipartitions. Although examining all possible bipartitions of each cluster into two smaller clusters yields a global optimum, this complete enumeration is very costly in terms of computation.
Divisive clustering methods that do not consider all bipartitions have also been studied. For instance, [21] presented a bisecting K-Means divisive clustering method and compared it with the conventional K-Means and agglomerative methods. Another study [22] combined a reference-point-based dissimilarity measure with the divisive clustering approach, yielding a clustering technique dubbed DIVFRP for the purpose of dataset partitioning.
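A minimal sketch of the bisecting idea follows (our own illustration built on scikit-learn's KMeans; the "split the largest cluster" rule and the function name bisecting_kmeans are assumptions, and this does not reproduce the exact methods of [21] or [22]): the full data set is repeatedly divided by applying a 2-means step to the largest current cluster until the desired number of clusters is reached.

```python
# Bisecting-style divisive clustering sketch (illustrative assumption:
# always split the currently largest cluster with a 2-means step).
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, n_clusters, random_state=0):
    clusters = [np.arange(len(X))]             # start: one cluster with all points
    while len(clusters) < n_clusters:
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)             # take the largest cluster
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        split = km.fit_predict(X[members])      # bisect it with 2-means
        clusters.append(members[split == 0])
        clusters.append(members[split == 1])
    labels = np.empty(len(X), dtype=int)
    for lab, members in enumerate(clusters):
        labels[members] = lab
    return labels

X = np.vstack([np.random.randn(20, 2),
               np.random.randn(20, 2) + 5,
               np.random.randn(20, 2) + 10])
print(bisecting_kmeans(X, n_clusters=3))
```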
The authors of [23] proposed an improved discrete particle swarm optimizer (IDPSO) to identify the optimal partition hyperplane for dividing a chosen cluster into two parts; this dividing step is a practical and effective component of the divisive hierarchical approach. The authors of [24][25] investigated an iterative division technique based on the average dissimilarity between an object and a set of objects. A different strategy focuses instead on optimization criteria involving partitioning or bi-partitioning and uses a dissimilarity matrix as input [26][27]. There are two main categories of divisive clustering: monothetic and polythetic approaches. A cluster is called monothetic when a set of logical attributes is both necessary and sufficient for membership in that cluster [28].
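Returning to the average-dissimilarity division technique attributed above to [24][25], one concrete reading is sketched below (a splinter-group step in the spirit of dissimilarity analysis; the function name and the details are our assumptions, not the exact published method): a cluster is split by letting objects defect to a splinter group as long as they are, on average, closer to that group than to the remaining objects.

```python
# Sketch of one divisive split driven by average dissimilarity
# (a splinter-group step; illustrative, not the exact method of [24][25]).
import numpy as np
from scipy.spatial.distance import pdist, squareform

def splinter_split(D):
    """Split objects {0..n-1} into two groups given a distance matrix D."""
    n = D.shape[0]
    rest = list(range(n))
    # seed the splinter group with the object of largest average dissimilarity
    seed = max(rest, key=lambda i: D[i, rest].mean())
    splinter = [seed]
    rest.remove(seed)
    moved = True
    while moved and len(rest) > 1:
        moved = False
        for i in list(rest):
            if len(rest) == 1:
                break
            others = [j for j in rest if j != i]
            # object i defects if it is closer, on average, to the splinter group
            if D[i, splinter].mean() < D[i, others].mean():
                rest.remove(i)
                splinter.append(i)
                moved = True
    return splinter, rest

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5.5]])
D = squareform(pdist(X))
print(splinter_split(D))
```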
Monothetic divisive clusters are formed by splitting items on a single variable at each division, for example on whether or not they possess a certain attribute value. The “association analysis approach” of [29] is a monothetic method developed specifically for binary data. Several researchers have used monothetic clustering to solve problems. For instance, the authors of [30] provided an approach that yields an arrangement of objects together with a monothetic description of each cluster. Similarly, the author of [31] developed three monothetic techniques based on principal component analysis (PCA) for interval-valued data: the first applied PCA to the interval-valued data, the second relied on symbolic data, and the third was derived from the endpoints of the intervals. Finally, the author tested the model on real-world data to verify its accuracy.
In contrast, polythetic divisive clustering uses all variables simultaneously by computing distances or similarity values. Rather than relying on the values of individual variables, it relies solely on distance values, which reflect the dissimilarity across all variables at once [32].
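To contrast the two styles, the sketch below (our own illustration; the attribute-selection rule is an assumption and is not taken from [28][29][32]) performs a single monothetic split of a binary data matrix on one attribute, whereas a polythetic split would instead work from a full inter-object distance matrix, as in the earlier sketches.

```python
# Monothetic split sketch: divide objects on one binary attribute, chosen
# here (an assumption) as the attribute most correlated with the others.
import numpy as np

X = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 0, 0, 0]])          # 5 objects x 4 binary attributes

# score each attribute by its summed absolute correlation with the others
corr = np.corrcoef(X, rowvar=False)
scores = np.abs(corr).sum(axis=0) - 1           # drop self-correlation
best = int(np.argmax(scores))

group_with = np.where(X[:, best] == 1)[0]       # objects possessing the attribute
group_without = np.where(X[:, best] == 0)[0]    # objects lacking it
print(f"split on attribute {best}: {group_with} vs {group_without}")
```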

This entry is adapted from the peer-reviewed paper 10.3390/app13063529

References

  1. Saxena, A.; Prasad, M.; Gupta, A.; Bharill, N.; Patel, O.P.; Tiwari, A.; Er, M.J.; Ding, W.; Lin, C.-T. A review of clustering techniques and developments. Neurocomputing 2017, 267, 664–681.
  2. Olson, C.F. Parallel algorithms for hierarchical clustering. Parallel Comput. 1995, 21, 1313–1325.
  3. Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. 1999, 31, 264–323.
  4. Murtagh, F. A survey of algorithms for contiguity-constrained clustering and related problems. Comput. J. 1985, 28, 82–88.
  5. Sharif, Z.; Jung, L.T.; Ayaz, M. Priority-based Resource Allocation Scheme for Mobile Edge Computing. In Proceedings of the 2022 2nd International Conference on Computing and Information Technology (ICCIT), Tabuk, Saudi Arabia, 25–27 January 2022; pp. 138–143.
  6. Müllner, D. Modern hierarchical, agglomerative clustering algorithms. arXiv 2011, arXiv:1109.2378.
  7. Gowda, K.C.; Krishna, G.J.P.R. Agglomerative clustering using the concept of mutual nearest neighbourhood. Pattern Recognit. 1978, 10, 105–112.
  8. Rathore, P. Big Data Cluster Analysis and Its Applications. Ph.D. Thesis, University of Melbourne, Parkville, Victoria, Australia, 2018.
  9. Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice-Hall, Inc.: Hoboken, NJ, USA, 1988.
  10. Benabdellah, A.C.; Benghabrit, A.; Bouhaddou, I. A survey of clustering algorithms for an industrial context. Procedia Comput. Sci. 2019, 148, 291–302.
  11. Xu, R.; Wunsch, D. Survey of clustering algorithms. IEEE Trans. Neural Netw. 2005, 16, 645–678.
  12. Sibson, R. SLINK: An optimally efficient algorithm for the single-link cluster method. Comput. J. 1973, 16, 30–34.
  13. Defays, D. An efficient algorithm for a complete link method. Comput. J. 1977, 20, 364–366.
  14. Voorhees, E.M. Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Inf. Process. Manag. 1986, 22, 465–476.
  15. Murtagh, F. A survey of recent advances in hierarchical clustering algorithms. Comput. J. 1983, 26, 354–359.
  16. Day, W.H.; Edelsbrunner, H. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif. 1984, 1, 7–24.
  17. Sharif, Z.; Jung, L.T.; Razzak, I.; Alazab, M. Adaptive and priority-based resource allocation for efficient resources utilization in mobile edge computing. IEEE Internet Things J. 2021.
  18. Savaresi, S.M.; Boley, D.L.; Bittanti, S.; Gazzaniga, G. Cluster Selection in Divisive Clustering Algorithms. In Proceedings of the 2002 SIAM International Conference on Data Mining, Arlington, VA, USA, 11–13 April 2002; pp. 299–314.
  19. Boley, D. Principal direction divisive partitioning. Data Min. Knowl. Discov. 1998, 2, 325–344.
  20. Chavent, M.; Lechevallier, Y.; Briant, O. DIVCLUS-T: A monothetic divisive hierarchical clustering method. Comput. Stat. Data Anal. 2007, 52, 687–701.
  21. Karypis, G.; Kumar, V. Multilevel k-way hypergraph partitioning. In Proceedings of the 36th Annual Acm/Ieee Design Automation Conference, New Orleans, LA, USA, 21–25 June 1999; pp. 343–348.
  22. Zhong, C.; Miao, D.; Wang, R.; Zhou, X. DIVFRP: An automatic divisive hierarchical clustering method based on the furthest reference points. Pattern Recognit. Lett. 2008, 29, 2067–2077.
  23. Feng, L.; Qiu, M.-H.; Wang, Y.-X.; Xiang, Q.-L.; Yang, Y.-F.; Liu, K. A fast divisive clustering algorithm using an improved discrete particle swarm optimizer. Pattern Recognit. Lett. 2010, 31, 1216–1225.
  24. Macnaughton-Smith, P.; Williams, W.; Dale, M.; Mockett, L.J.N. Dissimilarity analysis: A new technique of hierarchical sub-division. Nature 1964, 202, 1034–1035.
  25. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
  26. Wang, Y.; Yan, H.; Sriskandarajah, C. The weighted sum of split and diameter clustering. J. Classif. 1996, 13, 231–248.
  27. Guénoche, A.; Hansen, P.; Jaumard, B. Efficient algorithms for divisive hierarchical clustering with the diameter criterion. J. Classif. 1991, 8, 5–30.
  28. Sneath, P.H. Thirty years of numerical taxonomy. Syst. Biol. 1995, 44, 281–298.
  29. Williams, W.T.; Lambert, J.M. Multivariate methods in plant ecology: I. Association-analysis in plant communities. J. Ecol. 1959, 47, 83–101.
  30. Brito, P.M.; Chavent, M. Divisive Monothetic Clustering for Interval and Histogram-Valued Data. In Proceedings of the ICPRAM 2012-1st International Conference on Pattern Recognition Applications and Methods, Algarve, Portugal, 6–8 February 2012; pp. 229–234.
  31. Zhu, J. Divisive Hierarchical Clustering for Interval-Valued Data. Ph.D. Thesis, University of Georgia, Athens, GA, USA, 2019.
  32. Kim, J.; Billard, L. A polythetic clustering process and cluster validity indexes for histogram-valued objects. Comput. Stat. Data Anal. 2011, 55, 2250–2262.