1. Introduction
The concept of autonomous or driverless vehicles refers to vehicles that are intelligent in their operation and are intended to minimize the need for human assistance. Exteroceptive and proprioceptive sensors on these vehicles allow them to monitor their environment and internal states simultaneously
[1][2]. With heterogeneous sensors, such as cameras, light detection and ranging (LiDAR), radar, global positioning system (GPS), etc., the vehicle is able to learn different tasks and can use its understanding of the context in which it operates
[3]. For autonomous vehicles (AVs) to operate safely and reliably in environments that are complex and dynamic, they must be able to perceive the environment accurately and localize themselves precisely
[4][5]. It is necessary to acquire and process high-quality, information-rich data obtained from actual environments to accomplish both of these tasks
[6]. Multiple sensors, such as LiDAR and cameras, are used on AVs to capture target context. Digital camera data have traditionally been the most popular source of perception data because they provide two-dimensional (2D) appearance-based representations, are low cost, and are highly efficient
[7]. Due to the lack of three-dimensional (3D) geo-referenced information in image data, the dense, geo-referenced, and accurate 3D point clouds generated by LiDAR are exploited. In addition, LiDAR is not sensitive to changes in lighting conditions and can be used at any time of the day or night, even when glare or shadows are present
[8].
Using LiDAR to generate 3D points can be challenging due to the density of the points. As a result, pre-processing is used to remove noise and extract useful information from the data. It is extremely beneficial to cluster LiDAR data in a wide variety of applications, particularly in those with real-time edge-based data, such as object detection and classification
[9]. Three-dimensional data allow people to determine the shape, size, and other properties of the objects with great precision. However, the task of segmenting 3D point clouds is challenging. It is common for point cloud data to be noisy, sparse, and disorganized. As a result of the scanner’s varying linear and angular rates, the sampling density of points is also typically uneven. Aside from this, the shape of the surface is arbitrary, with sharp features, and the data do not follow a statistical distribution. It is also important to note that, because of the limitations of the 3D sensors, the foreground and background are frequently very entangled. Designing an algorithm to deal with these problems presents a significant challenge
[10].
A further challenge for autonomous vehicles is to perceive their surrounding environment, e.g., when performing complex maneuvers in urban environments for successful navigation
[11]. These maneuvers include merging into or taking out of a lane, following or overtaking the vehicle in front, and crossing an intersection simultaneously with vehicles from other directions. Without the ability to perceive the motion of other objects, it is difficult to manage these situations. Thus, detecting and tracking moving objects on the road is an essential task for intelligent vehicles
[12]. In modern tracking systems, Multi-Target Tracking (MTT) is usually employed, which adopts a single or multiple sensors to produce detections from multiple targets, as well as one or more tracks for the estimation of their states. Prior to updating tracks, MTTs must assign detections to tracks. However, there are a number of challenges in data association that need to be considered by MTT as mentioned in
[13].
-
The assignment of a target to a detection or a nearby detection becomes ambiguous if they are densely distributed;
-
Sensors with a small field of view (fov) might not be able to detect the true target during a sensor scan;
-
It is possible for two targets in close proximity to be detected as a single object if the sensor resolution is low;
-
The possibility of false alarms increases the complexity of data assignment by introducing additional possible assignments.
There have been several state-of-the-art techniques developed to address the challenges associated with clustering and Multi-Target Tracking. Ref.
[14] provides a complete system for detecting and tracking vehicles based solely on 3D LiDAR information. Using previously mapped LiDAR point clouds to reduce processing time, ref.
[15] describes real-time dynamic object detection algorithms. In
[16], a skeleton-based hierarchical method was proposed, which is capable of automatically detecting pole-like objects using mobile LiDAR point clouds. The authors have proposed a compression approach based on a convolutional long-short-term memory network (LSTM) for multi-line LiDAR point clouds
[17]. These different techniques produce promising results for clustering and tracking multiple objects.
2. A General Overview of Clustering Techniques
It is essential for most autonomous solutions, such as robotics and self-driving cars, to have light detection and ranging (LiDAR) sensors
[18]. Because of the dense 3D points generated by LiDAR, it can be difficult to work directly with them. In many applications, clustering LiDAR data are extremely beneficial, particularly those based on real-time edge-based data, such as the detection and classification of objects
[9]. Cluster analysis is a quantitative method of comparing multiple characteristics of individuals of a population to determine their membership in a particular group. Clustering algorithms are designed to identify natural groupings in unlabeled data by developing a technique that recognizes these groups
[19].
2.1. Partitioning-Based Clustering
Based on distances from the cluster center, this is an iterative approach that discovers similarities among intra-cluster points. Two assumptions are considered in the partitioning-based clustering process.
The initialization of cluster centers is the first step in this method. The distances between data points and all centers are calculated based on a particular metric. Data points are assigned to clusters with the closest centroid and the centroid of those clusters is reassigned. This category includes algorithms such as
K-means and
K-mods
[20][21][22].
2.2. Hierarchical Clustering
According to this model, two approaches are employed: the agglomerative approach (bottom-up) and the divisive approach (top-down). Data points are considered to be clusters in the first approach. Following the selection of a distance metric, the nearest pair of points is grouped into a single cluster. Clusters are formed by combining the data points iteratively until all of them have been combined. In the second approach, all the data points are clustered into a single cluster. As the distance between them increases, they are subsequently split into separate groups
[23][24].
2.3. Density-Based Clustering
Using this algorithm, clusters are formed depending on the density of data points in the data space. The dense regions are grouped as clusters, whereas the low-density regions are partitioned. As a result, this algorithm limits the impact of outliers or noise on data. In this algorithm, arbitrary data points that have not yet been visited are selected and their neighborhood is checked. The formation of a cluster occurs only when a sufficient number of points are located within a certain distance, epsilon. An outlier will be marked if the data point does not conform to the normal distribution. Each set of points that have not been visited is processed iteratively
[25][26].
2.4. Grid-Based Clustering
Algorithms based on grids do not directly access databases. The data are gathered from the database using statistical methods and then a uniform grid is created based on that data. In this case, the performance of the algorithm is determined by the size of the grid rather than the size of the data space itself. Since the algorithm operates with a smaller grid size, it requires fewer computational resources than directly accessing the database. Once the grid has been formed, it computes each cell’s density. A cell is discarded if its density is below the threshold value. As a final step, clusters are created from groups of dense cells that are contiguous
[27][28].
2.5. Model-Based Clustering
An algorithm based on model-based clustering employs statistical or mathematical models to generate clusters without requiring the number of clusters to be predetermined. The algorithms assume that the data are generated from a mixture of underlying probability distributions. This algorithm partitions the data points into clusters by estimating the parameters of these distributions. By using this approach, cluster formation can be flexible and data-driven, which enables it to adapt to various data characteristics and structures, ultimately leading to a deeper understanding of the underlying patterns
[29][30].
3. Clustering of Point Clouds for Autonomous Driving
Academic research teams have been using LiDAR for its extensive range and satisfactory accuracy. Moreover, recent hardware advancements promising superior, more affordable, and compact sensors have garnered interest from the industry. An autonomous vehicle is equipped with LiDAR that perceives the surrounding environment, and the task is to analyze and extract meaningful information, such as, for example, the number of obstacles
[31], their location and velocity with respect to the vehicle, and their classification as vehicles, pedestrians, poles, etc. Similarly, fine-segmenting the input data into meaningful clusters is the first step in this type of analysis
[32]. It is in this context that this section examines how the various clustering approaches have been employed in recent years to process point clouds within the context of autonomous driving.
Several studies have made significant advancements in the field of point cloud processing for autonomous vehicles, which will be discussed in detail in the following paragraphs. The purpose of these studies is to address a variety of issues relating to autonomous vehicle point cloud processing. A study
[9] investigates efficient parameter estimation for Density-Based Spatial Clustering of Applications with Noise (DBSCAN), implementing automatic background removal. Furthermore, researchers investigate adaptive clustering by using elliptical neighborhoods to avoid over-segmentation and under-segmentation
[33]. Additionally, point clouds and camera data are merged for real-time object detection in another study
[34]. Additionally, a work investigates dynamic clustering algorithms for point clouds generated by LiDAR, which adapt to non-uniform spatial distributions
[35]. The authors also discuss how to segment objects quickly and accurately using the InsClustering technique
[36]. A novel prediction method is employed in another study to address the challenges associated with the compression of LiDAR data
[37]. According to one study, obstacle fragmentation is the result of occlusions or oblique surface orientations that lead to the fragmentation of obstacles
[38]. In
[39] researchers use a hybrid machine learning approach, processing 3D point cloud data into a bird’s-eye view image, identifying objects using deep learning and DBSCAN clustering. There is a two-stage clustering method
[40] that combines ground plane extraction and an adaptive DBSCAN algorithm to reduce oversegmentation problems. Ref.
[41] introduced a TLG clustering technique for point clouds using range graphs, segmentation standards, and a search algorithm.
The work
[9] aimed to automatically estimate the parameters of DBSCAN by leveraging the structure of the point cloud. This technique implements a field of view division and empirical relations that allow each point to be independently estimated. A DNN is used before beginning parameter estimation to remove the background points. Based on a decoder/encoder structure, the network learns features that allow it to differentiate between, for example, foreground and background points. A feature vector is derived from the resulting points and is employed to classify them as foreground or background. With the background filtration process, the point cloud size can be reduced and clustering can be performed more quickly. This scheme involves dividing the field of view into equal-sized regions and further dividing each region into cells of equal size. By dividing the LiDAR point cloud into cells, local information about the point cloud can be calculated, such as the density of points within the cells.
The study in
[33] was carried out in two parts. First, the points were projected onto a grid map, ground points were removed by determining the maximum height difference, and roadsides were detected with the Hough transform to determine the dynamic region of interest (ROI). In the second part, a DBSCAN-based adaptive clustering method (DAC) has been proposed to reduce the risk of over-segmentation and under-segmentation due to the variations in spatial density of point clouds in relation to their positions. The elliptic neighborhood is designed to match the distribution properties of the point cloud to avoid the possibility of over-segmentation and under-segmentation. To handle the uniformity of points in different ranges, the parameters of the ellipse are adaptively adjusted with respect to the location of the sample point.
A real-time fusion framework was proposed in
[34] to detect all objects on the road in real time. This involves fusing point clouds with camera data. LiDAR point clouds are pre-processed to remove points corresponding to the ground or higher than expected objects’ heights from the sampled point clouds. The method propagates the concept of ground along the point cloud by classifying all neighboring points adjacent to each other as ground, by considering them in order. In other words, this classification is performed by considering the points sequentially, in accordance with their arrangement within the point cloud.
Based on DBSCAN, the clustering algorithm operates only in two dimensions, i.e., latitude and longitude, ignoring height. It follows that this solution is appropriate if objects are never stacked on top of one another. A window-based Lidar Clustering (WBLC) system is proposed that receives a 2D point cloud as input and generates a report of the clusters identified from that point cloud. There are two components of the algorithm: the search for neighbors and the merging of clusters. After finding the maximum neighbors for a specified distance criterion, the window is closed for reducing memory usage and computer complexity in the neighbor search method. Iteratively, the neighbor lists of each point are inspected to merge all lists belonging to the same cluster. If a list of neighbors contains fewer than a minimum number of points minCluster then the condition is true and considered as noise.
According to
[35], a dynamic clustering algorithm can be used to adapt autonomous vehicles to non-uniform spatial distributions of LiDAR point clouds. Based on the position of the core point, the algorithm employs an elliptical function to describe the neighbor. The KITTI dataset
[42] was used to develop clustering parameters and the effectiveness of the algorithm was explored using comparisons between clustering methods and projection planes, using three IBEO LUX 8 LiDARs mounted on an electric sedan.
In paper
[36], “InsClustering” was presented—a fast and accurate method of clustering point clouds for autonomous vehicles using LiDAR. Consequently, it provides an efficient means of segmenting the ground and clustering the objects within the limited amount of time available. With the use of Velodyne UltraPuck LiDAR range images in spherical coordinates, the method is capable of maintaining clustering accuracy and minimizing over-segmentation due to a coarse-to-fine segmentation process.
It has become increasingly important in recent years to compress point cloud data, especially in the context of autonomous vehicles, where accurate and efficient LiDAR data processing is crucial. Due to their limitations in encoding floating-point numbers and handling distance information inherent in LiDAR data, traditional image and video compression algorithms, such as JPEG2000, JPEG-LS, and High Efficiency Video Coding (HEVC), do not suit point cloud data compression. The proposed method
[37] addresses these challenges by employing a lossless compression scheme based on point cloud clustering as well as exploring lossy compression techniques. Rather than relying on traditional image prediction methods, this approach involves a novel prediction method based on correlations between distance information among points. As a result, spatial redundancies can be eliminated without compromising the integrity of the dataset.
The authors of
[38] investigated the problem of obstacle fragmentation, which can occur in LIDAR-based perception systems as a result of occlusion or when the detected object’s surface orientation is oblique to the LIDAR beams. To obtain a more accurate representation of each object, the proposed algorithm detects and joins fragmented segments. There is no restriction on the size or convexity of the objects to enable the detection of objects of any shape using this approach. It is focused on ‘L’ shaped objects (such as cars) and ‘I’ shaped objects (such as walls). Using the Ramer–Douglas–Peucker algorithm, each segment is evaluated to determine whether it can be approximated by one or two lines with closer distances to the points. In such a case, the segment is considered to be ‘L’ or ‘I’ shaped. The result of this process is a set of component lines for each segment. As long as two segments meet the following physical requirements, they are regarded as potentially belonging to the same object:
-
Segments located at a greater distance from the two segments are not considered.
-
As a result of joining, the resulting set is shaped as either an ‘L’ or an ‘I’.
The research in
[39] employed a hybrid approach to point cloud processing for object detection, using both traditional machine learning and deep learning techniques. To extract multi-scale features, the 3D point cloud data are first transformed into a bird’s eye view (BEV) rasterized image, which is then processed by a custom 2D convolutional feature pyramid network, a deep learning model. It identifies objects by detecting known anchors, learning a category-agnostic embedding space, and performing DBSCAN clustering. Through the use of this innovative method, it has been demonstrated that machine learning and deep learning can be applied synergistically to the processing of point clouds.
A two-stage clustering method for LiDAR data is presented in
[40]. First, ground line fitting reduces the data load by extracting the ground plane from the data. In the case of the non-ground data, a range image-based method is used, in which subclusters are initially created through a sliding window approach, and then refined through an adaptive DBSCAN algorithm. It effectively reduces over-segmentation and adapts to variances in object distances.
A method of clustering point clouds based on Two-Layer-Graphs (TLG)
[41] was proposed to improve the accuracy and speed of segmenting point clouds. This involves dividing the task into storage structures, segmentation standards, and category updates. Range graphs and point cloud set graphs were used to enable fast access to and relationships between neighboring points. Standards for segmentation include distance and angle characteristics in the horizontal direction and distance in the vertical direction. The category update was accomplished through the use of a search algorithm traversing the two layers of the graph. The results of the experiments demonstrated that clustering and differentiation of objects in traffic scenes can be achieved effectively.
4. Challenges
4.1. Data Complexity
When clustering LiDAR point clouds, data complexity poses a significant challenge, especially when addressing the need for real-time processing and the associated computational requirements. This complexity can be attributed to several factors. In LiDAR sensors, for example, high-dimensional data are captured in multiple dimensions, including spatial coordinates (x, y, z), intensity, and time. The computational requirements for the processing and clustering of the data increase as the dimensionality of the data increases. As a result, processing time and memory usage may increase, making real-time performance difficult
[43]. A LiDAR sensor can produce millions of points per second, which is a significant amount of data
[44]. Autonomous driving systems require algorithms that are capable of processing large data sets in real time to ensure accurate and timely decision-making. Furthermore, LiDAR data can be affected by many factors, including sensor noise, sensor imperfections, and environmental conditions, which may lead to errors and inconsistencies. To cluster points accurately, clustering algorithms must be robust enough to handle this noise and variability.
4.2. Over-Segmentation and Under-Segmentation
It is possible to over-segment an object by splitting it into multiple segments or clusters, leading to an inaccurate representation of the object. In contrast, under-segmentation occurs when multiple objects are clustered together, which may lead to misinterpretation of the scene. A LiDAR point cloud generally exhibits a non-uniform spatial distribution, with the point density varying from one region to another
[45]. As a result of this variability, it is difficult for clustering algorithms to accurately group points into meaningful groups. Regions with a high density may result in over-segmentation, while regions with a low density may result in under-segmentation
[32]. There can be a variety of noise sources that can affect LiDAR data, such as sensor errors, environmental factors, or surface reflections. Noise and outliers can confuse clustering algorithms, resulting in over-segmentation or under-segmentation
[46]. For clustering algorithms to be accurate and to improve the quality of the data, proper noise filtering and outlier detection methods are necessary.
4.3. Object Size and Shape Variations
Autonomous driving systems rely heavily on LiDAR data to accurately identify and track a diverse range of objects, such as vehicles, pedestrians, and infrastructure components
[47]. This diversity in object size and shape can pose challenges for clustering algorithms, as they must be able to differentiate between objects with varying characteristics. Consequently, it becomes crucial to develop algorithms that are not only robust but also adaptable to handle the wide array of size and shape variations encountered in real-world driving scenarios
[48]. To successfully navigate through complex environments, the clustering algorithms need to take into account several factors, such as the scale, orientation, and perspective of the detected objects. Moreover, these algorithms should be designed to handle occlusions, wherein some objects may be partially hidden or obstructed from view by other objects in the scene. This further complicates the process of accurate object identification and segmentation.
4.4. Occlusions and Fragmentation
In the case of an occluded scene, objects are partially hidden from the LiDAR sensor, resulting in an incomplete representation of the point cloud
[49]. A fragmented point cloud
[50] represents multiple disconnected segments of an object, which can be caused by a limited sensor resolution or the orientation of the surface of the object. The presence of both of these issues may hinder the accurate identification and clustering of objects within a point cloud, leading to possible misinterpretations of the scene. As a result of addressing these issues, algorithms must be designed in a way that improves the accuracy and reliability of object representation and recognition in LiDAR point cloud data for autonomous driving.
4.5. Dynamic Environments
Objects and their positions are constantly changing in dynamic environments, making clustering LiDAR point clouds challenging. It is crucial for safe and efficient navigation to be able to accurately identify and track objects in these dynamic contexts
[51]. Algorithms must be developed that can deal with continuous changes in velocities, orientations, and positions of objects in real time.
4.6. Ground Segmentation
In the context of autonomous vehicles and robotics applications, ground segmentation is a critical pre-processing step, as it involves accurately separating ground points from non-ground points within the LiDAR point cloud
[52]. Precise ground segmentation is essential for tasks such as object recognition, mapping, and navigation. Despite this, accurate ground segmentation can be challenging due to factors such as varying terrain, non-uniform point densities, and noise in the data. Furthermore, the diverse shapes and sizes of objects, as well as the complexity of urban environments, may make the task even more challenging. The segmentation problem needs to be addressed by developing techniques that incorporate machine learning approaches and additional sensor data to improve the accuracy of segmentation.
4.7. Evaluation Metrics
The selection of appropriate evaluation metrics and benchmark datasets is essential for assessing the performance of clustering algorithms in the context of LiDAR point cloud analysis for autonomous vehicles. As a consequence, this task may be challenging, since different metrics and datasets may emphasize different aspects of an algorithm’s performance, for example, accuracy, robustness, and computational efficiency. In addition, the suitability of a metric or dataset may depend on the specific scenario or application in which it is employed. Efforts are being made to establish standardized evaluation protocols and diverse benchmark datasets covering a wide range of scenarios, environments, and objects
[8]. Researchers are advised to carefully consider the selection of evaluation metrics and datasets when evaluating clustering algorithms. This will facilitate the comparison of different clustering algorithms and the identification of areas that need improvement in the future.