Moreover, micro-mobility solutions, which were introduced as a last-mile option to overcome the difficulties of increasing traffic, have created new challenges in traffic flow. For example, e-scooter users exhibit many unsafe riding behaviors, such as careless parking, riding on pedestrian paths, and abrupt lane changes, that endanger traffic flow. In short, disabled people, pedestrians, and autonomous vehicles all share the same traffic flow, so solutions should be proposed for the problems that this coexistence will bring. The United Nations General Assembly has committed to halving global deaths and injuries from road traffic accidents by 2030 [2]. In line with this goal, researchers need to find solutions for problematic areas in traffic networks and to adapt these networks to current and future technologies.
2. Automatic Detection of Pedestrian Crosswalk with Faster R-CNN and YOLOv7
Object detection, and in particular object detection with deep neural networks, is a developing field [8].
Crossing a street is difficult not only for visually impaired individuals but also for other road users. Effective pedestrian crosswalk detection can offer partial solutions to this problem. Some of the relevant studies in the literature are briefly summarized below.
Se [9] first grouped crosswalk lines and then proposed a crosswalk detection method based on a vanishing point constraint. However, the high computational complexity of this model considerably limits its efficiency in use. Huang and Lin [10] identified areas with alternating black and white stripes using bipolarity segmentation and achieved good performance in standard lane zones. Zebra crossings were detected with a similar approach in [11,12]. Chen et al. [13] proposed a crosswalk detection method based on Sobel edge extraction and the Hough transform, which provides a good balance between accuracy and speed. Cao et al. [14] achieved high accuracy in recognizing roads for the disabled and pedestrian crosswalks, with the aim of helping visually impaired people orient themselves and perceive their environment. Their lightweight semantic segmentation network segments both pedestrian crosswalks and paths for the disabled, and it uses depthwise separable convolution as its basic module to reduce the number of model parameters and increase the speed of semantic segmentation. In addition, an atrous spatial pyramid pooling module was used to improve the network accuracy. Finally, a dataset was collected from a natural environment to verify the effectiveness of the proposed method. The proposed approach was observed to give better or similar results compared with other methods.
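The parameter saving attributed to depthwise separable convolution, and the wider coverage of atrous (dilated) kernels, can be illustrated with simple arithmetic. The following is a minimal sketch and not the network of [14]: the layer sizes and dilation rates are hypothetical values chosen for illustration, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k standard convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Extent covered by a k x k atrous (dilated) kernel along one axis."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 128, 256
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

# Hypothetical dilation rates for the parallel branches of a pooling pyramid.
rates = (6, 12, 18)
extents = [effective_kernel_size(k, r) for r in rates]

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
print(f"effective kernel extents: {extents}")
```

For these assumed sizes, the separable layer needs roughly 11.5 percent of the weights of the standard layer, while each dilated 3 x 3 kernel covers a much larger extent without adding any weights.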
Ma et al. [15] emphasized the difficulties that people with disabilities face in obtaining external information and provided some suggestions for overcoming them. Their main goal was to investigate the effectiveness of tactile paving at pedestrian crosswalks. Data were collected using an unmanned aerial vehicle (UAV) and a three-axis accelerometer. A before–after comparative analysis of the quantitative index results revealed that tactile paving helps people with visual impairments maintain a straight path, avoid directional deviations, reduce crossing times, and improve gait patterns and symmetry.
Romić et al. [16] proposed a method that detects pedestrian crossings by analyzing the column and row structure of an image, with the aim of making crossing easier. The technique was also tested on real input data, and its performance was found to depend on the image quality and resolution of the dataset.
Tian et al. [17] proposed a new system for understanding dynamic crosswalk scenes; detecting key objects, such as pedestrian crosswalks, vehicles, and pedestrians; and determining the status of pedestrian traffic lights. The system was implemented on a head-worn device that transmits scene information to disabled individuals via an audio signal, and it was shown to be beneficial.
Tümen and Ergen [18] emphasized that crossroads, intersections, and pedestrian crosswalks are critical areas for autonomous vehicles and advanced driver assistance systems because the probability of traffic accidents in these areas is relatively high. In this context, they proposed a deep learning-based approach that uses real images to provide instant information to drivers and autonomous vehicles. The approach used the CNN-based VggNet, AlexNet, and LeNet architectures to classify the data. High classification accuracy was achieved, and the proposed method was shown to be a practical structure that can be used in many areas.
Dow et al. [19] designed an image processing-based human recognition system for pedestrian crossings. The system aims to reduce accidents and the likelihood of accidents and to increase the level of safety. A dual-camera mechanism was proposed to improve the accuracy of pedestrian detection and reduce the system's error rate. The experimental results showed that the resulting prototype system performs well.
Pedestrian crosswalks are an essential component of urban transportation, mainly because they are the road sections where accidents between pedestrians and vehicles occur most frequently. Developing countries in particular experience many problems in these regions [20]. Many parameters, such as driver–pedestrian behavior profiles, mutual respect, a low tendency to obey the rules, penalties, and flexibility in the rules, are cited as the reasons for these problems.
Beyond the detection of pedestrian crosswalks, object detection is now applied in many fields. It plays an active role in sectors such as health [21], safety [22], transportation [23,24], and agriculture [25,26]. To achieve high detection accuracy, many researchers enlarge their datasets and experiment with variations in the network structure of their detection models. Studies even extend to the detection of two-dimensional materials [27]. A few of the existing studies in different fields are detailed in this section, and an in-depth analysis framework is presented.
Several studies [28,29,30,31,32] have addressed the detection of helmets worn during dangerous work, such as construction. SSD, Faster R-CNN, and YOLO models were used in these studies. Among them, [28] had the highest mAP value, at 96 percent. In recent years, object detection has also made work considerably easier in the agricultural and livestock sectors. Fast counting of animals on a farm and detection of dead animals increase efficiency. Different detection models were used for duck counting in [33]. YOLOv7 achieved a better detection rate than the other models, with a mAP value of 97.57 percent.
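Because the studies in this section are compared by their mAP values, it may help to recall how this metric is typically computed. The following is a minimal sketch under stated assumptions: it handles a single image, uses an IoU threshold of 0.5, matches detections greedily to unmatched ground-truth boxes, and integrates the precision envelope over recall. The cited studies may use different evaluation protocols, and all boxes, scores, and function names here are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP of one class; detections are (score, box) pairs."""
    if not ground_truth:
        return 0.0
    matched, hits = set(), []
    # Visit detections from the most to the least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j not in matched and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(1)   # true positive
        else:
            hits.append(0)   # false positive
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))
    # Replace each precision by the maximum precision at any higher rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical ground truth and detections for two classes.
gt = {"crosswalk": [(0, 0, 10, 10), (20, 20, 30, 30)],
      "pedestrian": [(0, 0, 5, 5)]}
det = {"crosswalk": [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)),
                     (0.7, (20, 20, 30, 31))],
       "pedestrian": [(0.6, (0, 0, 5, 5))]}

ap_per_class = {c: average_precision(det[c], gt[c]) for c in gt}
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(ap_per_class, mean_ap)
```

In this toy case, the false positive ranked second lowers the crosswalk AP to 5/6, whereas the pedestrian class reaches an AP of 1, so the mAP is 11/12. This illustrates why a single mAP figure depends strongly on the dataset and on the classes being detected.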
Weed detection and product counting provide great opportunities for the marketing side of the agricultural sector. In the study in [25], the YOLOv7 model reached a mAP value of 61 percent when detecting weeds in a field. This value may vary depending on the dataset and the detected object. Because weeds are small objects and a training dataset is difficult to collect, detection accuracy decreases.
Work on driver-assistance systems is ongoing. Using a phone or drinking a beverage while driving distracts the driver, causes accidents, and endangers traffic safety. In the study in [34], YOLOv7 was used to detect distracted driving behaviors. Four classes were detected: danger, drinking, phone usage, and safety. A mAP value of 73.62 percent was obtained. Creating a dataset that covers four driver states across different drivers is a difficult and time-consuming process. The accuracy is likely to increase if the variety of the data is increased.
As the use of object detection matures, it offers very useful solutions. For example, an early smoke warning system can help protect forests from fires, which cause ecological problems. A study on such a smoke warning system was carried out to prevent the spread of fires [35]. The dataset was defined at three different distance scales. With YOLOv5x, an accuracy rate of 96.8 percent was achieved, which is high given the irregularity of the smoke distribution data.
As stated above, object detection continues to provide solutions to existing problems. Other object detection studies in the literature include cancer polyp detection [21], internal canthus temperature detection in the elderly [36], construction waste detection [37], in situ sea cucumber detection [38], ship detection in satellite images [39], and citrus orchard detection [40].
All the studies examined show that the accuracy rate is directly related to the dataset used. The size of a detected object, its state in the image, and its variability can reduce the detection accuracy. To mitigate this, the number and diversity of data samples should be increased, although this process is quite difficult. In addition, different detection models should be compared to select the one most suitable for the dataset. In this study, the detected object is large, so the accuracy rate is high. Local municipalities in particular should use object detection applications more frequently. Using these models, most of which are open source, in urban transportation applications would be appropriate within the scope of smart cities. These systems, which offer quick solutions to problems such as parking areas, penalties for parking violations, and red light violations, should be supported by politicians.