1. Introduction
1.1. Background
Underground sewerage systems (USSs) are a vital part of public infrastructure that contributes to collecting wastewater or stormwater from various sources and conveying it to storage tanks or sewer treatment facilities. A healthy USS with proper functionality can effectively prevent urban waterlogging and play a positive role in the sustainable development of water resources. However, sewer defects caused by influence factors such as age and material lead directly to the degradation of pipeline conditions. Previous studies reported that the conditions of USSs in some places are unsatisfactory and deteriorate over time. For example, a considerable proportion (20.8%) of Canadian sewers are graded as poor or very poor. These USSs will need rehabilitation in the following decade in order to ensure normal operations and services on a continuing basis
[1]. Currently, the maintenance and management of USSs have become challenging problems for municipalities worldwide due to the huge economic costs
[2]. In 2019, a report in the United States of America (USA) estimated that utilities spent more than USD 3 billion on wastewater pipe replacements and repairs, which addressed 4692 miles of pipeline
[3].
1.2. Defect Inspection Framework
Since it was first introduced in the 1960s
[4], computer vision (CV) has become a mature technology that is used to realize promising automation for sewer inspections. In order to meet the increasing demands on USSs, a CV-based defect inspection system is required to identify, locate, or segment the varied defects prior to the rehabilitation process. As illustrated in
Figure 1, an efficient defect inspection framework for underground sewer pipelines should cover five stages. In the data acquisition stage, there are many available techniques such as closed-circuit television (CCTV), sewer scanner and evaluation technology (SSET), and totally integrated sonar and camera systems (TISCITs)
[5]. CCTV-based inspections rely on a remotely controlled tractor or robot with a mounted CCTV camera
[6]. An SSET is a type of method that acquires the scanned data from a suite of sensor devices
[7]. The TISCIT system utilizes sonar and CCTV cameras to obtain a 360° view of the sewer conditions
[5]. As mentioned in several studies
[6,8,9,10], CCTV-based inspections are the most widely used methods due to the advantages of economics, safety, and simplicity. Nevertheless, the performance of CCTV-based inspections is limited by the quality of the acquired data. Therefore, image-based learning methods require pre-processing algorithms to remove noise and enhance the resolution of the collected images. Many studies on sewer inspections have recently applied image pre-processing before examining the defects [11,12,13].
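Most of the surveyed studies do not specify their pre-processing in detail; as a rough illustration only (a pure-NumPy sketch with hypothetical parameters, not the pipeline of any particular paper), the snippet below shows two typical steps applied to a grayscale CCTV frame: median filtering for noise removal and linear contrast stretching for enhancement.

```python
import numpy as np

def median_filter3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter: a common de-noising step for noisy CCTV frames."""
    padded = np.pad(img, 1, mode="edge")
    # Stack the 9 shifted views of each pixel's neighborhood, then take the median.
    stack = np.stack([padded[r:r + img.shape[0], c:c + img.shape[1]]
                      for r in range(3) for c in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)

def contrast_stretch(img: np.ndarray) -> np.ndarray:
    """Linear min-max stretch to the full 0-255 range (simple enhancement)."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros_like(img)
    return ((img.astype(np.float64) - lo) / (hi - lo) * 255).astype(np.uint8)
```

Real inspection pipelines would typically use library implementations (and often super-resolution models rather than a linear stretch), but the intent of the stage is the same: suppress sensor noise and improve the visibility of defects before learning.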
Figure 1. There are five stages in the defect inspection framework, which include (a) the data acquisition stage based on various sensors (CCTV, sonar, or scanner), (b) the data processing stage for the collected data, (c) the defect inspection stage containing different algorithms (defect classification, detection, and segmentation), (d) the risk assessment for detected defects using image post-processing, and (e) the final report generation stage for the condition evaluation.
2. Defect Inspection
In this section, several classic algorithms are illustrated, and the research trends are analyzed.
Figure 2 provides a brief description of the algorithms in each category. In order to comprehensively analyze these studies, the publication time, title, utilized methodology, advantages, and disadvantages for each study are covered. Moreover, the specific proportion of each inspection algorithm is computed in
Figure 3. Defect classification clearly accounts for the largest percentage among all the investigated studies.
Figure 2. The classification map of the existing algorithms for each category. The dotted boxes represent the main stages of the algorithms.
Figure 3. Proportions of the investigated studies using different inspection algorithms.
2.1. Defect Classification
Due to the recent advancements in ML, both the scientific community and industry have attempted to apply ML-based pattern recognition in various areas, such as agriculture
[14], resource management [15], and construction [16]. At present, many types of defect classification algorithms have been presented for both binary and multi-class classification tasks.
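To illustrate the difference between the two task formulations, the sketch below (with a hypothetical label set, not taken from any surveyed study) contrasts a single multi-class decision with a set of per-class binary decisions over the same score vector; the latter allows one image to carry several defect labels.

```python
import numpy as np

CLASSES = ["crack", "deposit", "root", "normal"]  # hypothetical label set

def multiclass_decision(scores: np.ndarray) -> str:
    """Multi-class classification: assign the single highest-scoring class."""
    return CLASSES[int(np.argmax(scores))]

def binary_decisions(scores: np.ndarray, threshold: float = 0.5) -> list:
    """Binary (per-class) classification: one independent yes/no decision
    per defect type, so an image may receive several defect labels."""
    return [c for c, s in zip(CLASSES, scores)
            if s >= threshold and c != "normal"]
```

A frame containing both a crack and a deposit would receive only one label under the multi-class rule, but both labels under the per-class binary rule.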
2.2. Defect Detection
Rather than the classification algorithms that merely offer each defect a class type, object detection is conducted to locate and classify the objects among the predefined classes using rectangular bounding boxes (BBs) as well as confidence scores (CSs). In recent studies, object detection technology has been increasingly applied in several fields, such as intelligent transportation
[17,18,19], smart agriculture [20,21,22], and autonomous construction [23,24,25]. Generic object detection consists of one-stage approaches and two-stage approaches. The classic regression-based one-stage detectors include YOLO [26], SSD [27], CornerNet [28], and RetinaNet [29]. The two-stage detectors are based on region proposals and include Fast R-CNN [30], Faster R-CNN [31], and R-FCN [32].
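Both families of detectors output bounding boxes (BBs) with confidence scores (CSs), which are conventionally de-duplicated by non-maximum suppression (NMS) based on intersection over union (IoU). A minimal framework-free sketch of these two standard operations:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any remaining box that overlaps a kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)
    return keep
```

In practice the detection frameworks cited above ship their own (much faster) NMS implementations; this sketch only makes the box-filtering logic explicit.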
2.3. Defect Segmentation
Defect segmentation algorithms can predict defect categories and pixel-level location information with exact shapes, which is becoming increasingly significant for the research on sewer condition assessment by re-coding the exact defect attributes and analyzing the specific severity of each defect. The previous segmentation methods were mainly based on mathematical morphology
[33,34]. However, the morphology-based segmentation approaches are inefficient compared to the DL-based segmentation methods. As a result, defect segmentation methods based on DL have recently been explored in various fields.
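As a rough illustration of the mathematical-morphology family mentioned above (a generic sketch, not the exact operators of [33,34]), a morphological gradient, i.e., dilation minus erosion, outlines the boundary of a binary defect mask:

```python
import numpy as np

def dilate(mask):
    """Binary dilation with a 3x3 structuring element (pure-NumPy sketch)."""
    p = np.pad(mask, 1)
    return np.logical_or.reduce([p[r:r + mask.shape[0], c:c + mask.shape[1]]
                                 for r in range(3) for c in range(3)])

def erode(mask):
    """Binary erosion with the same 3x3 structuring element."""
    p = np.pad(mask, 1)
    return np.logical_and.reduce([p[r:r + mask.shape[0], c:c + mask.shape[1]]
                                  for r in range(3) for c in range(3)])

def morph_gradient(mask):
    """Dilation minus erosion: the boundary of a segmented defect region."""
    return dilate(mask) & ~erode(mask)
```

DL-based segmenters predict such masks per class directly, whereas the morphology approaches derive them from thresholded intensities, which is why the latter degrade on noisy, low-contrast sewer imagery.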
3. Dataset and Evaluation Metric
The performances of all the algorithms were tested and reported on specific datasets using specific metrics. As a result, datasets and protocols are the two primary determining factors in the algorithm evaluation process. The evaluation results are not convincing if the dataset is not representative or the metric is poorly chosen. It is also challenging to judge which method is the state of the art (SOTA) because the existing methods in sewer inspections use different datasets and protocols. Therefore, benchmark datasets and standard evaluation protocols need to be established for future studies.
3.1. Dataset
3.1.1. Dataset Collection
Currently, many data collection robotic systems have emerged that are capable of assisting workers with sewer inspection and spot repair.
Table 1 lists the latest advanced robots along with their respective information, including the robot’s name, company, pipe diameter, camera feature, country, and main strong points.
Figure 4 introduces several representative robots that are widely utilized to acquire images or videos from underground infrastructures. As shown in
Figure 4a, LETS 6.0 is a versatile and powerful inspection system that can be quickly set up to operate in 150 mm or larger pipes. A representative work (Robocam 6) of the Korean company TAP Electronics is shown in
Figure 4b. According to the manufacturer, Robocam 6 increases inspection performance without the considerable cost of replacing existing equipment.
Figure 4c shows the X5-HS robot, developed in China, which is a typical robotic crawler with a high-definition camera. In
Figure 4d, Robocam 3000, sold in Japan, is the only large-scale system specially devised for inspecting pipes ranging from 250 mm to 3000 mm; it used to be unrealistic to apply a crawler in such huge pipelines in Korea.
Figure 4. Representative inspection robots for data acquisition. (a) LETS 6.0, (b) Robocam 6, (c) X5-HS, and (d) Robocam 3000.
Table 1. The detailed information of the latest robots for sewer inspection.

| Name | Company | Pipe Diameter | Camera Feature | Country | Strong Point |
|---|---|---|---|---|---|
| CAM160 (https://goolnk.com/YrYQob accessed on 20 February 2022) | Sewer Robotics | 200–500 mm | NA | USA | Auto horizon adjustment; intensity-adjustable LED lighting; multifunctional |
| LETS 6.0 (https://ariesindustries.com/products/ accessed on 20 February 2022) | ARIES INDUSTRIES | 150 mm or larger | Self-leveling lateral camera or a pan-and-tilt camera | USA | Slim tractor profile; superior lateral camera; simultaneously acquires mainline and lateral videos |
| Wolverine® 2.0 | ARIES INDUSTRIES | 150–450 mm | NA | USA | Powerful crawler to maneuver obstacles; minimum set-up time; camera with lens cleaning technique |
| X5-HS (https://goolnk.com/Rym02W accessed on 20 February 2022) | EASY-SIGHT | 300–3000 mm | ≥2 million pixels | China | High definition; free choice of wireless or wired connection and control; displays and saves videos in real time |
| Robocam 6 (https://goolnk.com/43pdGA accessed on 20 February 2022) | TAP Electronics | 600 mm or more | Sony 1.3-megapixel Exmor 1/3-inch CMOS | Korea | High resolution; all-in-one subtitle system |
| RoboCam Innovation | TAP Electronics | 600 mm or more | Sony 1.3-megapixel Exmor 1/3-inch CMOS | Korea | Best digital record performance; super white LED lighting; cableless |
| Robocam 3000 | TAP Electronics’ Japanese subsidiary | 250–3000 mm | Sony 1.3-megapixel Exmor CMOS color | Japan | Can be utilized in huge pipelines; optical 10× zoom performance |
3.1.2. Benchmarked Dataset
Open-source sewer defect data is necessary for academia to promote fair comparisons in automatic multi-defect classification tasks. In this survey, a publicly available benchmark dataset called Sewer-ML
[35] for vision-based defect classification is introduced. The Sewer-ML dataset, acquired from Danish companies, contains 1.3 million images labeled by sewer experts with rich experience.
Figure 5 shows some sample images from the Sewer-ML dataset, and each image includes one or more classes of defects. The recorded text in the images was redacted using a Gaussian blur kernel to protect private information. In addition, detailed information on the datasets used in recent papers is described in
Table 2. This paper summarizes 32 datasets from different countries in the world, of which the USA has 12 datasets, accounting for the largest proportion. The largest dataset contains 2,202,582 images, whereas the smallest dataset has only 32 images. Since the images were acquired by various types of equipment, the collected images have varied resolutions ranging from 64 × 64 to 4000 × 46,000.
Figure 5. Sample images from the Sewer-ML dataset that has a wide diversity of materials and shapes.
Table 2. Research datasets for sewer defects in recent studies.

| ID | Defect Type | Image Resolution | Equipment | Number of Images | Country | Ref. |
|---|---|---|---|---|---|---|
| 1 | Broken, crack, deposit, fracture, hole, root, tap | NA | NA | 4056 | Canada | [9] |
| 2 | Connection, crack, debris, deposit, infiltration, material change, normal, root | 1440 × 720–320 × 256 | RedZone® Solo CCTV crawler | 12,000 | USA | [36] |
| 3 | Attached deposit, defective connection, displaced joint, fissure, infiltration, ingress, intruding connection, porous, root, sealing, settled deposit, surface | 1040 × 1040 | Front-facing and back-facing camera with a 185° wide lens | 2,202,582 | The Netherlands | [37] |
| 4 | Dataset 1: defective, normal; Dataset 2: barrier, deposit, disjunction, fracture, stagger, water | NA | NA | Dataset 1: 40,000; Dataset 2: 15,000 | China | [38] |
| 5 | Broken, deformation, deposit, other, joint offset, normal, obstacle, water | 1435 × 1054–296 × 166 | NA | 18,333 | China | [39] |
| 6 | Attached deposits, collapse, deformation, displaced joint, infiltration, joint damage, settled deposit | NA | NA | 1045 | China | [40] |
| 7 | Circumferential crack, longitudinal crack, multiple crack | 320 × 240 | NA | 335 | USA | [11] |
| 8 | Debris, joint faulty, joint open, longitudinal, protruding, surface | NA | Robo Cam 6 with a 1/3-in. SONY Exmor CMOS camera | 48,274 | South Korea | [41] |
| 9 | Broken, crack, debris, joint faulty, joint open, normal, protruding, surface | 1280 × 720 | Robo Cam 6 with a megapixel Exmor CMOS sensor | 115,170 | South Korea | [42] |
| 10 | Crack, deposit, else, infiltration, joint, root, surface | NA | Remote cameras | 2424 | UK | [43] |
| 11 | Broken, crack, deposit, fracture, hole, root, tap | NA | NA | 1451 | Canada | [44] |
| 12 | Crack, deposit, infiltration, root | 1440 × 720–320 × 256 | RedZone® Solo CCTV crawler | 3000 | USA | [45] |
| 13 | Connection, fracture, root | 1507 × 720–720 × 576 | Front-facing CCTV cameras | 3600 | USA | [46] |
| 14 | Crack, deposit, root | 928 × 576–352 × 256 | NA | 3000 | USA | [47] |
| 15 | Crack, deposit, root | 512 × 256 | NA | 1880 | USA | [48] |
| 16 | Crack, infiltration, joint, protruding | 1073 × 749–296 × 237 | NA | 1106 | China | [49] |
| 17 | Crack, non-crack | 64 × 64 | NA | 40,810 | NA | [65] |
| 18 | Crack, normal, spalling | 4000 × 46,000–3168 × 4752 | Canon EOS, tripods and stabilizers | 294 | China | [51] |
| 19 | Collapse, crack, root | NA | SSET system | 239 | USA | [52] |
| 20 | Clean pipe, collapsed pipe, eroded joint, eroded lateral, misaligned joint, perfect joint, perfect lateral | NA | SSET system | 500 | USA | [53] |
| 21 | Cracks, joint, reduction, spalling | 512 × 512 | CCTV or Aqua Zoom camera | 1096 | Canada | [54] |
| 22 | Defective, normal | NA | CCTV (fisheye) | 192 | USA | [55] |
| 23 | Deposits, normal, root | 1507 × 720–720 × 576 | Front-facing CCTV cameras | 3800 | USA | [56] |
| 24 | Crack, non-crack | 240 × 320 | CCTV | 200 | South Korea | [57] |
| 25 | Faulty, normal | NA | CCTV | 8000 | UK | [58] |
| 26 | Blur, deposition, intrusion, obstacle | NA | CCTV | 12,000 | NA | [59] |
| 27 | Crack, deposit, displaced joint, ovality | NA | CCTV (fisheye) | 32 | Qatar | [60] |
| 28 | NA | NA | NA | NA | Australia | [50] |
| 29 | Crack, non-crack | 320 × 240–20 × 20 | CCTV | 100 | NA | [61] |
| 30 | Barrier, deposition, distortion, fraction, inserted | 600 × 480 | CCTV and quick-view (QV) cameras | 10,000 | China | [62] |
| 31 | Fracture | NA | CCTV | 2100 | USA | [66] |
| 32 | Broken, crack, fracture, joint open | NA | CCTV | 291 | China | [64] |
3.2. Evaluation Metric
Reported performances are ambiguous and unreliable without a suitable metric. In order to present a comprehensive evaluation, numerous metrics have been proposed in recent studies. Detailed descriptions of the different evaluation metrics are given in
Table 3.
Table 4 presents the performances of the investigated algorithms on different datasets in terms of different metrics.
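Most of the classification metrics in Table 3 follow directly from the binary confusion counts. A minimal sketch of these standard definitions (using one common definition of FAR, false positives among actual negatives, since the surveyed papers vary in how they normalize it):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1-score, and FAR from binary
    confusion counts (true/false positives and negatives)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # One common FAR definition: false alarms among actual negatives.
        "far": fp / (fp + tn),
    }
```

AUROC and AUPR are then obtained by sweeping the decision threshold and integrating the resulting (recall, FAR) and (recall, precision) curves, respectively.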
Table 3. Overview of the evaluation metrics in the recent studies.

| Metric | Description | Ref. |
|---|---|---|
| Accuracy (%) | The proportion of correct predictions in all prediction samples | [36] |
| Precision | The proportion of positive samples in all positive prediction samples | [9] |
| Recall | The proportion of positive prediction samples in all positive samples | [36] |
| F1-score | Harmonic mean of precision and recall | [38] |
| True accuracy | The proportion of all predictions excluding the missed defective images among the entire actual images | [65] |
| FAR | False alarm rate in all prediction samples | [55] |
| AUROC | Area under the receiver operator characteristic (ROC) curve | [37] |
| AUPR | Area under the precision–recall curve | [37] |
| mAP | mAP first calculates the average precision values for different recall values for one class, and then takes the average of all classes | [9] |
| Detection rate | The ratio of the number of detected defects to the total number of defects | [57] |
| Error rate | The ratio of the number of mistakenly detected defects to the number of non-defects | [57] |
| PA | Pixel accuracy calculating the overall accuracy of all pixels in the image | [48] |
| mPA | The average of pixel accuracy over all categories | [48] |
| mIoU | The ratio of intersection and union between predictions and GTs | [48] |
| fwIoU | Frequency-weighted IoU measuring the mean IoU value weighted by the pixel frequency of each class | [48] |

Table 4. Performances of different algorithms in terms of different evaluation metrics.

| ID | Classes | Algorithm | Task | Performance | Processing Speed | Ref. |
|---|---|---|---|---|---|---|
| 1 | 3 classes | Multiple binary CNNs | Classification | Accuracy: 86.2; Precision: 87.7; Recall: 90.6 | NA | [36] |
| 2 | 12 classes | Single CNN | Classification | AUROC: 87.1; AUPR: 6.8 | NA | [37] |
| 3 | Dataset 1: 2 classes; Dataset 2: 6 classes | Two-level hierarchical CNNs | Classification | Dataset 1: Accuracy: 94.5; Precision: 96.84; Recall: 92; F1-score: 94.36. Dataset 2: Accuracy: 94.96; Precision: 85.13; Recall: 84.61; F1-score: 84.86 | 1.109 h for 200 videos | [38] |
| 4 | 8 classes | Deep CNN | Classification | Accuracy: 64.8 | NA | [39] |
| 5 | 6 classes | CNN | Classification | Accuracy: 96.58 | NA | [41] |
| 6 | 8 classes | CNN | Classification | Accuracy: 97.6 | 0.15 s/image | [42] |
| 7 | 7 classes | Multi-class random forest | Classification | Accuracy: 71 | 25 FPS | [43] |
| 8 | 7 classes | SVM | Classification | Accuracy: 84.1 | NA | [40] |
| 9 | 3 classes | SVM | Classification | Recall: 90.3; Precision: 90.3 | 10 FPS | [42] |
| 10 | 3 classes | CNN | Classification | Accuracy: 96.7; Precision: 99.8; Recall: 93.6; F1-score: 96.6 | 15 min for 30 images | [50] |
| 11 | 3 classes | RotBoost and statistical feature vector | Classification | Accuracy: 89.96 | 1.5 s/image | [44] |
| 12 | 7 classes | Neuro-fuzzy classifier | Classification | Accuracy: 91.36 | NA | [45] |
| 13 | 4 classes | Multi-layer perceptions | Classification | Accuracy: 98.2 | NA | [46] |
| 14 | 2 classes | Rule-based classifier | Classification | Accuracy: 87; FAR: 18; Recall: 89 | NA | [55] |
| 15 | 2 classes | OCSVM | Classification | Accuracy: 75 | NA | [58] |
| 16 | 4 classes | CNN | Classification | Recall: 88; Precision: 84; Accuracy: 85 | NA | [59] |
| 17 | 2 classes | Rule-based classifier | Classification | Accuracy: 84; FAR: 21; True accuracy: 95 | NA | [65] |
| 18 | 4 classes | RBN | Classification | Accuracy: 95 | NA | [51] |
| 19 | 7 classes | YOLOv3 | Detection | mAP: 85.37 | 33 FPS | [52] |
| 20 | 4 classes | Faster R-CNN | Detection | mAP: 83 | 9 FPS | [53] |
| 21 | 3 classes | Faster R-CNN | Detection | mAP: 77 | 110 ms/image | [46] |
| 22 | 3 classes | Faster R-CNN | Detection | Precision: 88.99; Recall: 87.96; F1-score: 88.21 | 110 ms/image | [47] |
| 23 | 2 classes | CNN | Detection | Accuracy: 96; Precision: 90 | 0.2782 s/image | [50] |
| 24 | 3 classes | Faster R-CNN / SSD / YOLOv3 | Detection | mAP: 71.8 / 69.5 / 53 | 110 / 57 / 33 ms/image | [63] |
| 25 | 2 classes | Rule-based detector | Detection | Detection rate: 89.2; Error rate: 4.44 | 1 FPS | [57] |
| 26 | 2 classes | GA and CNN | Detection | Detection rate: 92.3 | NA | [61] |
| 27 | 5 classes | SRPN | Detection | mAP: 50.8; Recall: 82.4 | 153 ms/image | [62] |
| 28 | 1 class | CNN and YOLOv3 | Detection | AP: 71 | 65 ms/image | [66] |
| 29 | 3 classes | DilaSeg-CRF | Segmentation | PA: 98.69; mPA: 91.57; mIoU: 84.85; fwIoU: 97.47 | 107 ms/image | [48] |
| 30 | 4 classes | PipeUNet | Segmentation | mIoU: 76.37 | 32 FPS | [49] |
As shown in
Table 4, accuracy is the most commonly used metric in the classification tasks [36,38,39,40,41,42,43,51,52,53,54,55,58,59,65]. In addition, other subsidiary metrics such as precision [11,36,38,51,59], recall [11,36,38,51,55,59], and F1-score [38,51] are also well supported. Furthermore, AUROC and AUPR are calculated in [37] to measure the classification results, and FAR is used in [55,65] to check the false alarm rate in all the predictions. In contrast to classification, mAP is a principal metric for detection tasks [9,45,46,62,63]. In another study [47], precision, recall, and F1-score are reported in conjunction to provide a comprehensive estimation for defect detection. Heo et al. [57] assessed the model performance based on the detection rate and the error rate. Kumar and Abraham [66] reported the average precision (AP), which is similar to the mAP but computed for a single class. For the segmentation tasks, the mIoU is considered an important metric and is used in many studies [48,49]. Apart from the mIoU, the per-class pixel accuracy (PA), mean pixel accuracy (mPA), and frequency-weighted IoU (fwIoU) are applied to evaluate the segmented results at the pixel level.
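All four pixel-level segmentation metrics (PA, mPA, mIoU, fwIoU) can be derived from a single pixel-level confusion matrix; a minimal sketch of these standard definitions (rows are ground-truth classes, columns are predicted classes):

```python
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """PA, mPA, mIoU, and fwIoU from a pixel-level confusion matrix
    (rows = ground truth, columns = prediction)."""
    tp = np.diag(conf).astype(float)          # correctly labeled pixels per class
    gt = conf.sum(axis=1).astype(float)       # ground-truth pixels per class
    pred = conf.sum(axis=0).astype(float)     # predicted pixels per class
    iou = tp / (gt + pred - tp)               # per-class intersection over union
    total = conf.sum()
    return {
        "PA": tp.sum() / total,               # overall pixel accuracy
        "mPA": (tp / gt).mean(),              # mean per-class pixel accuracy
        "mIoU": iou.mean(),                   # mean IoU over classes
        "fwIoU": ((gt / total) * iou).sum(),  # IoU weighted by class frequency
    }
```

The sketch assumes every class occurs at least once in the ground truth; real evaluation code must also handle absent classes (e.g., by masking zero-denominator entries).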