Semantic Segmentation of Key Railroad Structures

Please note this is a comparison between Version 1 by Junjie Chen and Version 2 by Lindsay Dong.

To ensure efficient railroad operation and maintenance management, the accurate reconstruction of railroad Building Information Modeling (BIM) models is a crucial step. The segmentation of railroad point cloud data is often challenging due to the large volume and complex structure of the data, making manual division a time-consuming and labor-intensive task. However, the dominant structures in railroads are typically linear, such as tracks and power lines, which allows for segmentation by leveraging geometric features and corresponding algorithms.

point cloud semantic segmentation
Railroad systems

1. Introduction

Railroad systems have long been recognized as vital components of transportation networks, playing a crucial role in driving economic growth and facilitating social development [1,2,3]. However, the operation of railroads is susceptible to various factors such as geological changes, line degradation, and train-induced vibrations, which pose risks to their safe operation [4,5,6]. To ensure the stability and safety of railroads, it is essential to establish a real-time monitoring and maintenance system that replaces the conventional manual inspection methods, known for being inefficient and time-consuming [7].

The foundation of such a system lies in the railroad model, which serves as a platform for displaying diverse data. However, the complexity of railroad infrastructure, extensive track networks, and intricate structures make the reconstruction of accurate railroad models challenging and labor-intensive [8,9]. Therefore, there is a pressing need for digital construction techniques to efficiently capture and represent engineering structures. Moreover, as the demand for modifications and expansions continues to rise, exploring more efficient and precise management approaches becomes increasingly critical.

Digital construction not only facilitates subsequent maintenance and transformation processes by providing comprehensive data sources but also significantly enhances the efficiency of maintenance tasks while streamlining data collection and decision-making procedures [10,11,12].

With the advancement of technology, the combination of Building Information Modeling (BIM) and point cloud technology has found extensive applications in railroad maintenance and operations within the transportation sector [13,14,15]. Laser scanners capture the surface information of railroad infrastructure as a vast amount of precise three-dimensional point cloud data with coordinates and intensity information, which facilitates the rapid and accurate reconstruction of large-scale BIM models [16,17,18]. This integration addresses various issues in railroad projects, such as incomplete drawing preservation, inaccuracies in construction descriptions, and variations during operational phases, which would otherwise hinder the precise establishment of BIM models [19,20,21]. Left unaddressed, these issues lead to increased operational difficulties, rising costs, and reduced efficiency in information dissemination and scheduling [22,23].

The integration of BIM and point cloud technology provides the railroad engineering domain with digital twin systems that accelerate information sharing, enhance maintenance effectiveness, simulate scenarios, acquire health status information, and offer other advantages [24,25,26].

In the railroad domain, point cloud data are commonly collected using track-mounted inspection vehicles [27,28].

2. Semantic Segmentation of Key Railroad Structures

The segmentation of railroad point cloud data is often challenging due to the large volume and complex structure of the data, which makes manual division time-consuming and labor-intensive. However, the dominant structures in railroads, such as tracks and power lines, are typically linear, which allows them to be segmented using geometric features and corresponding algorithms. Existing algorithms are primarily heuristic and use the external contour features and intensity information of rail tracks as the basis for segmentation. For instance, Sánchez-Rodríguez et al. [28] proposed a heuristic method that segmented various parts of a railroad tunnel by exploiting the geometric features and intensity information of rail tracks, effectively extracting structures such as the ground and tracks. In a subsequent study, Soilán et al. [29] employed a heuristic point cloud processing step to reliably extract rail track point clouds. They detected linearity through equation fitting and converted the data into a format compliant with the Industry Foundation Classes (IFC) standard for BIM modeling, thereby reconstructing a BIM model from point cloud data. However, the effectiveness of this method decreases in more complex scenes, such as those with multi-line tracks and other ground facilities.

The intensity information in point clouds is influenced by numerous factors and, more importantly, is relative and can vary significantly across point cloud datasets [30]. Hence, it is advisable to minimize reliance on intensity information during point cloud segmentation. The geometry of steel rails, on the other hand, remains relatively consistent, which makes geometry-based extraction more stable and easier to verify and evaluate.
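
As a rough illustration of detecting linearity through equation fitting, the sketch below fits a straight line to candidate rail points in plan view by ordinary least squares and accepts the candidate when the root-mean-square residual is small. It is not the procedure of Soilán et al.; the function names, the 0.05 m tolerance, and the assumption that x runs roughly along the track are illustrative choices.

```python
import math


def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) points.

    Returns (a, b, rms), where rms is the root-mean-square residual in y.
    Assumes x runs roughly along the track (hypothetical convention).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        raise ValueError("points have no extent along x")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    rms = math.sqrt(sum((y - (a * x + b)) ** 2 for x, y in points) / n)
    return a, b, rms


def is_linear(points, tol=0.05):
    """Accept a candidate as linear when the RMS residual is within tol (metres)."""
    return fit_line(points)[2] <= tol
```

A curved track section would fail this test over a long window, so in practice the fit would be applied piecewise or with a higher-order model; that choice is outside what the cited studies are described as doing here.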
Because rail geometry is stable, the crucial aspect of extracting steel rails from diverse point cloud data lies in effectively handling the ground information in different scenes. Yun-Jian Cheng [22], for example, extracted track vertices from relatively flat tunnels using only the height difference of the steel rails, and the extracted alignment was then used for track model reconstruction. However, such methods become ineffective when the ground is more complex. To the best of the author’s knowledge, there is currently no universal approach capable of accurately separating railroad tracks from ground surfaces in complex environments. Existing ground filtering algorithms, such as morphological operations, normal differences, and region growing, lack theoretical support when addressing these challenges [31,32,33].

In recent years, progressive morphological filters (PMF) [34] and cloth simulation filters (CSF) [35], which are scale-invariant and terrain-adaptive, have been widely used in combination with triangulated irregular networks (TIN) or differential digital elevation models (DEM) [36,37,38] to process digital terrain models (DTM) obtained from airborne LiDAR. These methods are often applied in their native form and combined with other approaches to separate the ground from large-scale scenes and extract structures such as trees, buildings, and power lines, but they typically place low demands on the level of detail. When specific structures must be separated, as in railroad vegetation filtering and shield tunnel bolt-hole extraction [27,39], a higher level of detail is required, which necessitates adaptive modifications to the CSF method. Even with these adaptations, the methods still focus primarily on extracting one particular type of outward-protruding structure, which demonstrates the versatility of the approach.
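
To make the ground-filtering idea more concrete, the following is a minimal sketch of a progressive morphological filter reduced to a one-dimensional height profile across the track. It loosely follows the general PMF idea (morphological opening with a growing window and a height-difference threshold that grows with it) and is not the CSF method or any of the cited implementations. The function names, the doubling window schedule, and all parameter values are assumptions chosen for illustration.

```python
def _erode(z, half):
    """Minimum of each sample's neighborhood of +/- half cells."""
    return [min(z[max(0, i - half): i + half + 1]) for i in range(len(z))]


def _dilate(z, half):
    """Maximum of each sample's neighborhood of +/- half cells."""
    return [max(z[max(0, i - half): i + half + 1]) for i in range(len(z))]


def pmf_profile(z, cell=0.05, slope=0.1, dh0=0.05, dh_max=0.5, max_half=8):
    """Label each sample of a height profile as ground (True) or not (False).

    z        : heights of a regularly sampled cross-track profile (metres)
    cell     : sample spacing (metres)
    slope    : assumed maximum terrain slope
    dh0      : initial height-difference threshold (metres)
    dh_max   : cap on the threshold (metres)
    max_half : largest half-window, in cells
    """
    surface = list(z)
    ground = [True] * len(z)
    half, prev_half = 1, 0
    while half <= max_half:
        # Opening = erosion followed by dilation; removes bumps narrower
        # than the window while keeping the broad terrain shape.
        opened = _dilate(_erode(surface, half), half)
        if prev_half == 0:
            dh = dh0
        else:
            dh = min(dh_max, slope * 2 * (half - prev_half) * cell + dh0)
        for i, (s, o) in enumerate(zip(surface, opened)):
            if s - o > dh:
                ground[i] = False
        surface = opened
        half, prev_half = half * 2, half
    return ground
```

On a flat profile with two narrow bumps of rail-head height, only the bump samples are flagged as non-ground, while a gently sloping profile is kept entirely as ground. A real railroad scene would need a two-dimensional grid and tuning of the window schedule, since ballast shoulders and platforms are wider than a rail head.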
Currently, no studies have employed the CSF method for railroad structure extraction. Hence, there is value and rationale in enhancing the CSF method to suit the extraction of railroad structures.

In the segmentation of overhead line-type structures, a common approach is mixed model fitting. Liang et al. used the least squares method (LSM) to identify power lines and reconstruct them based on the spatial distribution characteristics of adjacent point clouds [40]. Yadav et al. employed the Hough transform (HT) to separate power lines from diverse scenes, including urban and rural areas, with an accuracy of 98.84% [41]. Furthermore, by combining principal component analysis (PCA) with the RANSAC algorithm, Lehtomäki et al. extracted column and power line data with 93.6% completeness from various complex environments [42]. These methods have proven effective for extracting different power line models, but their performance may decline when the point cloud distribution is uneven or contains significant noise, a limitation that needs to be addressed in future work.

Moreover, existing point cloud segmentation methods rely heavily on the device trajectory recorded during scanning as the basis for line segmentation [28,29]. However, such devices are relatively expensive, and their use is constrained by the occupation time of railroad works and track inspection equipment. For railroad maintenance and operation, handheld laser scanners are lightweight, low-cost, and flexible, allowing workers to scan the railroad structure during non-occupation periods, and their precision is sufficient to extract key information about the railroad line. However, structural occlusion occurs during scanning, and scanning personnel need to move left and right along the railroad line to cover the whole structure, which renders the trajectory information of limited value.
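
The model-fitting approach described above for overhead lines can be illustrated with a small RANSAC sketch that fits a straight 3D line and returns the indices of the points lying close to it. It is a generic RANSAC line fit, not a reproduction of any cited method: a straight line is only a rough stand-in for a wire that sags between supports, and the function names, tolerance, iteration count, and seed are assumptions.

```python
import math
import random


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _norm(u):
    return math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)


def point_line_distance(p, a, d):
    """Distance from p to the line through a with non-zero direction d."""
    return _norm(_cross(_sub(p, a), d)) / _norm(d)


def ransac_line(points, tol=0.05, iterations=200, seed=0):
    """Return indices of the largest set of points within tol of one 3D line.

    Each iteration samples two points, uses them to define a line, and
    counts how many points fall within tol of it; the best set is kept.
    """
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        a, b = rng.sample(points, 2)
        d = _sub(b, a)
        if _norm(d) == 0.0:
            continue  # duplicate points cannot define a direction
        inliers = [i for i, p in enumerate(points)
                   if point_line_distance(p, a, d) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

Because the line is hypothesized from only two samples, a large share of noise or very uneven point density lowers the chance of drawing a good pair, which is consistent with the limitation noted above for these fitting methods.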

Deep Learning

In the past five years, deep learning networks have been extensively employed for processing three-dimensional point cloud data, owing to their robust generalization capability and high classification accuracy. Different deep learning methods have been proposed for specific application domains. In [43], existing methods are categorized as follows:

Multi-view-based methods: These techniques project the point cloud into multiple desired views and then process the resulting 2D images with deep learning to represent the 3D shape of objects. This approach is widely applied to the classification of 3D objects [44]. However, it faces challenges with large-scale scene data, as it struggles to fully use spatial information and to handle geometric relationships between structures.

Voxel-based methods: These approaches divide the original point cloud into uniformly discrete data using a regular 3D grid, generating voxel data in which each voxel contains a group of corresponding points. Multi-scale convolutions are then used to extract local features [45] and to handle relationships among voxels for classification and segmentation. Nevertheless, factors such as the choice of voxel grid size, potential empty areas in the scene, and varying scales of 3D shapes greatly affect the results, making this method unsuitable for large-scale point cloud processing.

Point cloud-based methods: These methods directly process the point cloud coordinates, aggregating local and global features of discrete points to achieve classification and segmentation. They are not limited by structural scales and are therefore widely applied in large scene segmentation. Two prominent networks, PointNet++ and RandLA-Net [46,47], have demonstrated excellent performance in point cloud scene segmentation.
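
The regular-grid partitioning behind the voxel-based methods listed above can be sketched in a few lines. This is only the bucketing step, with no learning involved; the function name and the sparse dictionary layout are illustrative assumptions rather than the data structure of any cited network.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group point indices by the regular 3D grid cell that contains them.

    Returns a dict mapping integer voxel coordinates (ix, iy, iz) to the
    list of indices of the points falling inside that voxel.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    voxels = defaultdict(list)
    for index, (x, y, z) in enumerate(points):
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append(index)
    return dict(voxels)
```

Only occupied cells are stored here, which sidesteps the empty-area problem for storage, but the sensitivity to `voxel_size` remains: a cell small enough to resolve a rail head produces a very large number of voxels along a long corridor.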
In practical railroad point cloud segmentation, however, PointNet++ tends to lose global information when it partitions the point cloud into local regions, and its Farthest Point Sampling (FPS) algorithm is inefficient in large-scale scenes. RandLA-Net, on the other hand, handles large-scale point clouds by combining random sampling with local feature aggregation, which yields faster processing and more comprehensive global information [48,49] and makes it better suited to point cloud segmentation in railroad environments. Some progress has already been made in the semantic segmentation of complex railroad scenes [50]. That work successfully segments key structures such as “Rails, Background, Informative Signs,” and other large-scale components. However, relying solely on deep learning for railroad scene segmentation makes noise difficult to handle, and the resulting model might not transfer readily to other scenarios. Therefore, to ensure segmentation quality, deep learning can be used as a semi-automatic segmentation method that replaces part of the manual labor, while dedicated segmentation algorithms for specific structures should also be considered.
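
The efficiency gap between the two sampling strategies mentioned above can be seen from a plain implementation of each. The sketch below is a textbook version written for illustration, not code from PointNet++ or RandLA-Net; the function names and the fixed seed are assumptions.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_sampling(points, k, start=0):
    """Pick k point indices so that each new pick is farthest from those chosen.

    Every pick rescans all N points, so the cost grows as O(N * k).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Squared distance from every point to its nearest selected point.
    nearest = [squared_distance(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = squared_distance(p, points[nxt])
            if d < nearest[i]:
                nearest[i] = d
    return selected


def random_sampling(points, k, seed=0):
    """Pick k distinct point indices uniformly at random, ignoring geometry."""
    return random.Random(seed).sample(range(len(points)), k)
```

FPS gives even spatial coverage at the price of repeated passes over the whole cloud, whereas random sampling never inspects the coordinates and can drop sparse but important structures, which is why it is paired with local feature aggregation in the approach described above.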