Rural Single-Vehicle Crashes: Comparison
Please note this is a comparison between Version 1 by Zhenggan Cai and Version 2 by Conner Chen.

The impact of traffic crashes on sustainable development cannot be ignored because crashes cause significant property damage and personal injury. Among the various crash types, single-vehicle (SV) crashes are disproportionately deadly: according to the National Highway Traffic Safety Administration, SV crashes accounted for 16% of all crashes but 36.9% of all crash fatalities. SV crashes therefore pose a serious challenge to road safety. The problem is particularly pronounced in rural areas of China, where road infrastructure and medical assistance are often inadequate.

  • traffic safety
  • latent class clustering
  • spatial correlation
  • single-vehicle crashes
  • heterogeneity

1. Safety Covariates of Rural Single-Vehicle Crashes

Various risk factors affect the severity of rural SV crashes, and a body of research has sought to clarify the relationship between these factors and crash severity [1][2][8,9]. The risk factors can be roughly grouped into three components: driver characteristics, crash characteristics, and environmental characteristics.
The significant correlation between driver characteristics and rural SV crash severity is widely recognized by transportation professionals. Several driver-related risk factors, such as gender, age, seatbelt usage, drunk driving, and speeding, have been investigated in the existing literature. For example, a positive correlation between male drivers and serious crashes was revealed [3][6]. This phenomenon is related to the aggressive driving behavior of male drivers. The study also noted that older and younger drivers were significantly more likely to have severe SV crashes than middle-aged drivers. Older and younger drivers are susceptible to distracted driving [4][19]. The impact of distracted driving on crash risk was analyzed, and a positive correlation between distracted driving and crash risk was found [5][20]. Further, the importance of using seatbelts and maintaining a safe speed has been widely emphasized, as both can effectively prevent drivers from being involved in fatal crashes [6][7][14,21]. A latent class logit function was established, and three risk variables (seatbelt not used, driving under the influence of alcohol, and fatigued driving) are expected to result in a significant increase in the probability of severe crashes [8][22].
Crash characteristics mainly include two components: crash type and vehicle type. Specifically, three crash types (collisions with fixed objects, run-off-road crashes, and collisions with animals) have an important impact on the severity of SV crashes [9][23]. It was found that these variables are negatively correlated with crash severity. However, this finding was not supported by Haq et al. [10][24]. A hierarchical Bayesian random intercept approach was established to link collisions with fixed objects and crash severity. The regression results show that (1) a collision with a fixed object significantly increases the possibility of a serious crash and (2) one of the serious consequences of a collision with a fixed object is rollover, which increases the risk of a serious crash to some extent [11][25].
In addition, existing studies have found that the severity of rural SV crashes varies across vehicle types. Three vehicle types (passenger vehicle, motorcycle, and large vehicle) were classified, and the large vehicle was used as the reference category to establish an ordered logistic regression. The results indicate that, compared with large-vehicle crashes, the risk of a serious crash was significantly lower for passenger vehicles (odds ratio of 0.553) and significantly higher for motorcycles (odds ratio of 3.907) [12][26]. Considering the substantial differences between vehicle types, modeling each vehicle type separately was recommended. For example, separate crash severity regression functions were established for motorcycle crashes and truck crashes to explore the risk factors [13][14][27,28]. Establishing a separate regression model for each vehicle type can yield targeted findings and improve the model fit; this may be because a crash dataset containing a single vehicle type is more homogeneous than a crash dataset containing all vehicle types [15][29].
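To make the reported odds ratios easier to read, the sketch below converts an odds ratio into a probability for a given baseline. The baseline probability of 10% for large vehicles is a hypothetical value chosen only for illustration; it is not a figure from the cited study. The function name is likewise illustrative, and the sketch treats "serious crash" as a single binary threshold, which is a simplification of the ordered model.

```python
def probability_from_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: assume 10% of large-vehicle crashes are serious.
# This number is an illustrative assumption, not a result from the literature.
BASELINE = 0.10

# Odds ratios as reported in the text, with large vehicles as the reference.
passenger = probability_from_odds_ratio(BASELINE, 0.553)
motorcycle = probability_from_odds_ratio(BASELINE, 3.907)

print(f"passenger vehicle: {passenger:.3f}")
print(f"motorcycle:        {motorcycle:.3f}")
```

Under this assumed baseline, an odds ratio below 1 lowers the probability of a serious crash and an odds ratio above 1 raises it, but the change in probability is not proportional to the odds ratio itself.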
Environmental characteristics include many significant risk variables [16][30], such as road surface, weather, light condition, and the presence of an intersection. Previous studies have found a negative correlation between adverse road surface conditions and crash severity [9][23]. This suggests that the reduction in crash severity under adverse conditions is due to drivers being more cautious. A more detailed finding on adverse road surfaces was reported by Yu et al. [17][10]. Compared with a dry road surface, the probability of a fatal crash on wet, snowy, and icy surfaces is reduced by 2.73%, 0.85%, and 2.78%, respectively. A severity prediction function was established by Hou et al. [18][31] to investigate the influence of environmental factors on crash severity. It was found that the probability of severe injury is expected to be significantly reduced when visibility is less than 200 m and when the road surface is wet. Meanwhile, both roads with intermediate barriers and dark unlighted conditions significantly increase the tendency toward serious crashes. Further, illumination conditions were divided into five categories to establish a logistic regression. Compared with the daylight condition, the probability of a serious collision under dusk, dawn, dark lighted roadway, and dark unlighted roadway conditions increased by 1.57 times, 2.02 times, 1.63 times, and 2.61 times, respectively [19][32]. This suggests that the risk of serious injury at dawn is of particular concern and that providing street lighting can reduce the occurrence of serious traffic collisions.

2. Statistical Techniques for Unobserved Heterogeneity and Spatial Correlation

Unobserved heterogeneity has been recognized as a primary issue in crash severity analysis [20][21][22][33,34,35] because several risk factors affecting the possibility or severity of a crash cannot be observed [23][24][36,37]. This fact has driven the development of statistical techniques to identify unobserved heterogeneity. To accommodate the discrete nature of crash severity (no injury, slight injury, serious injury, and fatality), various regression approaches have been widely recommended due to their high flexibility [31][32][33][43,44,45]: the random parameters logit (RP-logit) model [25][26][38,39], the random parameters probit model [27][40], the random intercept logit model [28][41], the latent class logit model [17][10], and the finite mixture random parameters model [29][30][16,42]. Alternatively, the random parameters ordered logit model [34][46] and the random parameters ordered probit model [35][47] were applied to handle the intuitive ordering of crash severity. For example, Wu et al. [1][8] established the RP-logit model to analyze the risk factors of single- and multi-vehicle crash severity on rural highways. The results show that the RP-logit model can accommodate unobserved heterogeneity satisfactorily, and some substantial differences in the risk factors between urban and rural areas were clarified.
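For orientation, one common textbook form of the RP-logit outcome probability is given below. The notation is an assumption introduced here for illustration and is not taken from the cited studies: $X_{ik}$ denotes the covariates of crash $i$ for severity level $k$, $\beta_k$ the coefficient vector, and $f(\beta \mid \varphi)$ the mixing density (often normal) with parameters $\varphi$.

```latex
P_i(k) = \int \frac{\exp\left(\beta_k^{\top} X_{ik}\right)}
                   {\sum_{j} \exp\left(\beta_j^{\top} X_{ij}\right)}
         \, f(\beta \mid \varphi) \, d\beta
```

When $f$ collapses to a single point, the expression reduces to the standard fixed-parameter multinomial logit, which is why a nonzero spread in $f$ is read as evidence of unobserved heterogeneity. The integral has no closed form and is usually approximated by simulation.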
Rural SV crash data from the United States were extracted to calibrate the RP-logit model and the latent class logit model. The regression results indicate that both statistical models can effectively identify risk factors, but the latent class logit model fits slightly better than the RP-logit model [36][5]. Similar findings were obtained by Cerwick et al. [37][48]. A hierarchical random intercept function was proposed to accommodate cross-level interaction in crash data. The regression results indicate that the fitting performance of the proposed function is comparable to that of the RP-logit model, but its generalizability is expected to be limited by its high complexity [38][49]. Meanwhile, a random effects tobit model and a random parameters tobit model were established to accommodate unobserved heterogeneity in crash rate analysis. The importance of capturing unobserved heterogeneity and the superiority of the random parameters function were confirmed [39][50].
In addition, most crash severity functions were developed using the whole dataset; however, this modeling approach has some drawbacks. A crash analysis model can identify unobserved heterogeneity, but it cannot reduce it, and extensive unobserved heterogeneity may cause the regression results to deviate from the real situation [21][40][34,51]. An alternative is a two-step approach. First, the whole dataset is divided into several sub-datasets (clustering) so that both the homogeneity within each sub-dataset and the heterogeneity between sub-datasets are maximized. Second, a severity prediction function is established for each sub-dataset. This two-step modeling approach is expected to mitigate the impact of unobserved heterogeneity on model estimation.
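The first step of such a two-step approach can be sketched as follows. This is a minimal illustration only: it fits a latent class model with binary indicators by expectation-maximization in plain Python, and the three indicators (dark, wet surface, no seatbelt) and the eight toy records are hypothetical. The cited studies use richer covariates and select the number of classes with information criteria, which is not reproduced here.

```python
def fit_latent_classes(rows, n_classes=2, n_iter=50):
    """EM for a latent class model with binary (0/1) indicators.

    rows: list of equal-length lists of 0/1 values, one list per crash.
    Returns (class_weights, item_probs, assignments).
    """
    n_items = len(rows[0])
    weights = [1.0 / n_classes] * n_classes
    # Deterministic, asymmetric starting values so the classes can separate.
    probs = [[(c + 1) / (n_classes + 1)] * n_items for c in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership for every crash record.
        resp = []
        for row in rows:
            joint = []
            for c in range(n_classes):
                like = weights[c]
                for x, p in zip(row, probs[c]):
                    like *= p if x else (1.0 - p)
                joint.append(like)
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update class sizes and item-response probabilities.
        for c in range(n_classes):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / len(rows)
            for k in range(n_items):
                hits = sum(r[c] * row[k] for r, row in zip(resp, rows))
                # Clamp away from 0 and 1 to keep likelihoods positive.
                probs[c][k] = min(max(hits / mass, 1e-6), 1.0 - 1e-6)
    assignments = [max(range(n_classes), key=lambda c: r[c]) for r in resp]
    return weights, probs, assignments


# Hypothetical crash records: [dark, wet_surface, no_seatbelt].
crashes = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1],  # mostly adverse conditions
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],  # mostly benign conditions
]
weights, probs, assignments = fit_latent_classes(crashes)

# Split the whole dataset into sub-datasets, one per latent class.
sub_datasets = {c: [r for r, a in zip(crashes, assignments) if a == c]
                for c in set(assignments)}
print(assignments)
```

In the second step, a separate severity model would be estimated on each entry of `sub_datasets`; that step is omitted here because it depends on the chosen severity model.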
The two-step analysis approach is currently receiving increasing attention. A latent class clustering (LCC) model and an RP-logit model were combined to investigate motorcycle crash severity. It was found that the LCC model is an efficient technique for clustering crash data and can effectively reduce the unobserved heterogeneity within each sub-dataset [41][52]. Based on crash data from New Mexico, Li et al. [42][43][53,54] modeled not only intersection-related crashes but also SV crashes and found that the two-step approach can improve the fitting performance of the severity prediction function. The severities of rain-related rural SV crashes and of head-on crashes were investigated separately, and both studies suggested that the two-step analysis approach can reveal more valuable risk factors [36][44][5,55]. A similar conclusion can be found in Yu et al. [17][10].
Further, the severity prediction functions mentioned above can not only provide valuable risk factors but also effectively identify unobserved heterogeneity in crash data. However, these functions can hardly identify spatial correlation across crashes. The existence of spatial correlation is plausible because some characteristics, such as road width and road surface conditions, may be shared by adjacent crashes. This phenomenon has been confirmed by many studies [45][46][47][56,57,58]. Yang et al. [48][59] considered the spatial correlation of crashes in road safety assessment; the study focused on investigating the interrelationship among crash frequencies. The results show that the model simultaneously considering both spatial correlation and unobserved heterogeneity outperforms the model considering only unobserved heterogeneity. Risk factors of freeway crashes were explored using a spatial generalized ordered logit model. It was found that the model fit can be improved by accounting for the spatial effect and that the spatial error term can be effectively estimated by introducing the Gaussian conditional autoregressive (CAR) technique [49][60]. Klassen et al. [50][61] reached the same conclusion by proposing a spatial random intercept logit model and pointed out that ignoring spatial correlation may lead to biased inferences. To examine pedestrian injury severity in bicyclist–pedestrian crashes, a Gaussian CAR spatial Poisson-lognormal model was adopted. The regression results highlight the effectiveness of jointly modeling multiple crash severities to improve fit performance [51][62].
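As a reference point for the Gaussian CAR technique mentioned above, one widely used form (the intrinsic CAR prior) specifies each spatial error term conditionally on its neighbors. The notation is an assumption introduced here for illustration: $\phi_i$ is the spatial error term for site $i$, $w_{ij}$ is a non-negative adjacency weight between sites $i$ and $j$, and $\sigma_\phi^2$ is a variance parameter. The cited studies may use a different parameterization.

```latex
\phi_i \mid \phi_{-i} \sim
\mathcal{N}\!\left(
  \frac{\sum_{j \neq i} w_{ij}\,\phi_j}{\sum_{j \neq i} w_{ij}},\;
  \frac{\sigma_\phi^{2}}{\sum_{j \neq i} w_{ij}}
\right)
```

Under this form, the expected spatial effect at a site is the weighted average of its neighbors' effects, and the conditional variance shrinks as the total neighbor weight grows, so sites with many neighbors borrow more strength from their surroundings.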
The spatial functions mentioned above evolved from traditional regression models, including the multinomial logit model, the ordered logit model, and the random intercept logit model, by adding a spatial error term. Spatial functions that evolved from random parameters models are less common. Several studies have shown that the random parameters model outperforms the fixed-parameter model and the random intercept model, so identifying spatial correlation by extending the random parameters model is expected to exhibit satisfactory fitting performance. Recently, a spatial random parameters Poisson-lognormal model, a spatial random parameters tobit model, and a spatial random parameters Poisson-lognormal model with a mixture component were proposed to capture the spatial correlation across crashes [52][53][54][63,64,65]. These three advanced crash prediction functions are employed to predict crash count, crash rate, and crash frequency, respectively. However, no spatial function that evolved from the random parameters model, especially the RP-logit model, was found for predicting crash severity. This is a methodological gap, and corresponding studies should be conducted to supplement existing severity prediction functions.