In the field of data mining, clustering has proven to be an important technique. Hierarchical clustering algorithms are a class of clustering algorithms in which data items are divided into groups arranged in a hierarchy. Clusters are created iteratively, in either a top-down or bottom-up fashion, to build a dendrogram that shows the hierarchical structure of the resulting clusters.
1. Introduction
Hierarchical clustering algorithms enable data exploration at various levels of granularity ^{[1]}. One approach is the divisive method, which follows a top-down strategy; the other is the agglomerative method, which follows a bottom-up approach. In the agglomerative technique, clusters are formed from similar items by repeatedly combining them into larger clusters, establishing the successive levels of the hierarchy. This process continues until all objects are merged into a single cluster or a stopping criterion is met. The divisive strategy works in the opposite direction.
Iteratively, the cluster comprising all the objects is split until either the halting requirement is satisfied or each object forms its own cluster. The proximity or dissimilarity of the cluster elements determines whether to merge or split the data.
Hierarchical clustering allows subsets of points to be merged or split; the distance between subgroups is computed from the distances between their individual points. The linkage metric, which measures proximity, is used for this purpose. Three kinds of linkage are commonly used in hierarchical clustering: single linkage, average linkage, and complete linkage ^{[1]}^{[2]}^{[3]}^{[4]}^{[5]}. Hierarchical clustering algorithms store the linkage metrics in an n × n connectivity matrix. The similarity matrix is built by finding the similarity between each pair of data points. When applying a linkage criterion, it is common practice to measure the pairwise distance between clusters. Using the similarity measure, researchers can determine the separation between groups of clusters; it is also used to answer the question of how the clusters themselves take shape.
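As a concrete illustration, a hierarchical clustering run can begin from the n × n matrix of pairwise distances just described. The following is a minimal sketch in Python; the function name and the choice of Euclidean distance are illustrative assumptions, not taken from the cited works:

```python
import math

def distance_matrix(points):
    """Build the n x n matrix of pairwise Euclidean distances,
    i.e. the connectivity structure hierarchical clustering operates on."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # the matrix is symmetric with a zero diagonal
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d

pts = [(0, 0), (0, 1), (5, 5)]
m = distance_matrix(pts)
```

A similarity matrix can be derived from the same structure, e.g. by applying a decreasing function to each distance.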
2. Agglomerative Clustering
In unsupervised machine learning, hierarchical agglomerative clustering is a significant and well-established approach. Agglomerative clustering methods begin by treating the data set as a collection of singleton nodes and gradually combine the two currently closest nodes into a single node until only one node is left, which contains the whole data set. This process serves as a common description for several clustering systems, which differ in how the measure of inter-cluster dissimilarity is updated after each step ^{[6]}. The optimal value of an objective function serves as the criterion for selecting the pair of clusters to merge at each phase. This clustering approach is best suited to quantitative variables rather than binary data. The research in ^{[7]} devised a nonparametric hierarchical agglomerative clustering method that, using a conventional nearest-neighbor approach, determines a sample point’s mutual neighborhood value (MNV) and mutual nearest neighbors (MNN). Agglomerative hierarchical clustering is further subdivided into the following categories.
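The merge loop described above can be sketched as follows. This naive version, which rescans all cluster pairs at every step, is for exposition only; the function name and the single-linkage default are assumptions, not the cited implementations:

```python
import math

def agglomerative(points, link=min):
    """Naive agglomerative clustering: start from singleton clusters and
    repeatedly merge the currently closest pair until one cluster remains.
    `link` reduces pairwise point distances to one cluster distance
    (min = single linkage, max = complete linkage)."""
    clusters = [[p] for p in points]
    merges = []  # record of (cluster A, cluster B, distance) per step
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(math.dist(a, b)
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i][:], clusters[j][:], d))
        clusters[i] = clusters[i] + clusters.pop(j)  # merge j into i
    return merges

merges = agglomerative([(0, 0), (0, 1), (5, 5)])
```

The returned merge list is exactly the information a dendrogram visualizes: which clusters were joined, and at what height.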

Single-linkage clustering: this type of clustering is also known as the minimum, connectedness, or nearest-neighbor method. It measures the closest distance between any member of one cluster and any member of the other; that is, it computes the similarity of two clusters from their single closest element pair. Owing to its chaining effect, single-linkage clustering has a propensity to produce elongated clusters ^{[8]}.

Average-linkage clustering: the minimum-variance linkage is another name for average-linkage clustering ^{[1]}^{[9]}. It determines the average or median distance between the data points of each pair of clusters ^{[8]}.

Complete linkage: the complete linkage, often referred to as the maximum, diameter, or farthest-neighbor method, measures the longest distance between any member of one cluster and any member of the other cluster in order to calculate the distance between two clusters. Compared with single-linkage clustering, complete-linkage clusters are smaller and more tightly bound ^{[9]}. The three proximity metrics described above take into account all the points in a pair of clusters when calculating inter-cluster distances; they are regarded as graph techniques ^{[10]}^{[11]}.
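Assuming Euclidean distances, the three linkage criteria above differ only in how the set of cross-cluster point distances is reduced to a single cluster-to-cluster distance. A hypothetical helper makes the contrast explicit:

```python
import math

def linkage_distance(c1, c2, kind="single"):
    """Distance between two clusters under the three classic linkage criteria."""
    ds = [math.dist(a, b) for a in c1 for b in c2]
    if kind == "single":      # nearest neighbor
        return min(ds)
    if kind == "complete":    # farthest neighbor
        return max(ds)
    if kind == "average":     # mean over all cross-cluster pairs
        return sum(ds) / len(ds)
    raise ValueError(kind)

a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]
```

For these two clusters the single-linkage distance is 2, the complete-linkage distance is 5, and the average-linkage distance lies between them, which is why complete linkage tends toward compact clusters and single linkage toward chains.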
SLINK is an implementation of the single-linkage hierarchical clustering technique ^{[12]}, while CLINK, developed by the author of ^{[13]}, implements the complete-linkage clustering algorithm; ^{[14]} is an example of an average-linkage clustering algorithm. Other geometrical techniques were created using a center point as the proximity measure based on the same concept; these comprise the minimum-variance, centroid, and median linkage metrics ^{[15]}^{[16]}^{[17]}. While similarity metrics capture intra-cluster connectedness, a distance-based proximity measure captures inter-cluster closeness. Hierarchical clustering techniques can handle an adjustable level of granularity and any similarity metric ^{[8]}.
3. Divisive Hierarchical Clustering
Divisive hierarchical clustering reverses the agglomerative process: rather than starting with each item in its own cluster and merging, it repeatedly breaks clusters into smaller groups until the required number of clusters is reached. In contrast to the agglomerative method, the divisive approach is top-down: the data objects are initially treated as one fused cluster that is gradually separated until the desired number of clusters is obtained ^{[18]}^{[19]}^{[20]}. To divide a cluster into two subsets that each contain one or more elements, the usual procedure considers all potential bipartitions. Although examining every possible bipartition of a cluster into two smaller clusters yields a global optimum, the full enumeration procedure is extremely expensive in terms of computational cost.
Divisive clustering methods that do not consider all bipartitions have been researched. For instance, ^{[21]} presented a bisecting K-Means divisive clustering method and compared it with the conventional K-Means and agglomerative methods. Another study ^{[22]} combined a reference-point-based dissimilarity measure (DIVFRP) with the divisive clustering approach to investigate a unique clustering technique for the purpose of dataset division.
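The bisecting strategy can be sketched in a few lines, assuming a simple k = 2 Lloyd's step as the splitter; all names here are illustrative, and ^{[21]} describes the idea rather than this code:

```python
import math
import random

def two_means(points, iters=20, seed=0):
    """Split one cluster in two with k=2 Lloyd's iterations."""
    rng = random.Random(seed)
    c = rng.sample(points, 2)  # two distinct points as initial centroids
    left, right = [], []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            (left if math.dist(p, c[0]) <= math.dist(p, c[1]) else right).append(p)
        # recompute centroids; keep the old one if a side is empty
        c = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c[k]
             for k, g in enumerate((left, right))]
    return left, right

def bisecting_kmeans(points, k):
    """Divisive clustering: repeatedly bisect the largest cluster until k remain."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        left, right = two_means(clusters.pop())  # split the largest cluster
        clusters += [left, right]
    return clusters

clusters = bisecting_kmeans(
    [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)
```

Each bisection examines only one candidate split rather than all bipartitions, which is what makes this family of methods computationally tractable.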
The author in ^{[23]} proposed an improved particle swarm optimizer (IDPSO) to identify the optimal partition hyperplane for dividing the chosen clusters into two parts. This dividing method is a practical and effective component of the divisive hierarchical approach. The authors in ^{[24]}^{[25]} investigated an iterative division technique based on the average dissimilarity between an object and a set of objects. A different strategy, however, focuses on optimization criteria that involve partitioning or bipartitioning and uses a dissimilarity matrix as input ^{[26]}^{[27]}. There are two main categories of divisive clustering: monothetic and polythetic approaches. When a set of logical attributes is both necessary and sufficient for membership in a cluster, researchers refer to that cluster as monothetic ^{[28]}.
Monothetic divisive clusters are formed by dividing items on a single variable at each split, such as whether or not they have a certain attribute value. A monothetic variant of the “association analysis approach” was developed by the author of ^{[29]} specifically for binary data. Several researchers have used monothetic clusters to solve problems. For instance, the authors of ^{[30]} provided an approach that yields an ordering of items and a monothetic description of each cluster. Similarly, the author in ^{[31]} developed three monothetic techniques based on principal component analysis (PCA) for interval-valued data. The first method applied PCA to the interval-valued data; the second relied on symbolic data; the third and final algorithm was derived from the endpoint values of the intervals. In the end, the author tested the model on real-world data to ensure its accuracy.
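A single monothetic splitting step on binary features can be illustrated as follows; the data and function are hypothetical examples, not drawn from ^{[29]}:

```python
def monothetic_split(items, attribute):
    """Monothetic divisive step: partition items on the presence or
    absence of one binary attribute."""
    have = [it for it in items if it[attribute]]
    lack = [it for it in items if not it[attribute]]
    return have, lack

animals = [
    {"name": "sparrow", "flies": 1, "aquatic": 0},
    {"name": "penguin", "flies": 0, "aquatic": 1},
    {"name": "bat",     "flies": 1, "aquatic": 0},
]
flyers, rest = monothetic_split(animals, "flies")
```

Because each split is defined by one attribute, every resulting cluster comes with a readable logical description (here, "flies = 1"), which is the characteristic advantage of monothetic methods.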
Contrarily, polythetic divisive clustering is a method that uses all variables concurrently by calculating distance or similarity values. Rather than relying on individual variables, it relies solely on distance values, which in turn reflect the dissimilarity across all of the variables simultaneously ^{[32]}.