Human-Likeness of Artificial Driver Models: History

Several applications of artificially modeled drivers, such as autonomous vehicles (AVs) or surrounding traffic in driving simulations, aim to provide not only functional but also human-like behavior. The development of human-like AVs is expected to improve the interaction between AVs and humans in traffic, whereas, in a driving simulation, the objective is to create realistic replicas of real driving scenarios to investigate various research questions under safe and reproducible conditions. There is no unique definition of human-likeness; thus, various related research areas, such as psychology or computer science, provide different approaches to quantify, define, or evaluate behavior.

  • traffic data
  • driver models
  • autonomous driving

1. Introduction

There are various applications for artificially modeled road users. In driving simulations, for example, the objective is to recreate a realistic driving scenario including artificially modeled drivers. In Driver-in-the-Loop (DiL) applications, the aim is to achieve a high degree of presence for participants during the experiment by providing realistic interactions between the driver and artificially modeled drivers [1]. Software-in-the-Loop (SiL) applications, on the other hand, require human-like behavior of the surrounding traffic to investigate individual research questions involving interactions between an AV function and other road users. The constantly growing field of autonomous driving also ultimately faces the challenge of replacing the human driver with a model that can safely and meaningfully handle the complex task of driving. The development of human-like driving capabilities in AVs is expected to enhance the ability of surrounding drivers to understand and anticipate the behavior of AVs, resulting in more natural interactions [2]. As a result, AVs are required to mimic human-like driving behavior [3]. Accordingly, human-like behavior should be considered in automated driving design, as mentioned by Hang et al. [4], and in driver models, as noted by Lindorfer et al. [5]. Due to the complexity of these tasks, a single solution is unlikely to fit all applications, and there is still a long development journey ahead. Therefore, meaningful evaluation strategies are important to determine the capabilities and limitations of developed models.
In urban traffic, driving behavior is highly dependent on the situation, which introduces even more complexity to both modeling and evaluation. The resulting behavior, whether produced by a driver model or by a planning or prediction module, can be described by its spatiotemporal movement, the trajectory. However, current metrics for assessing the quality of trajectories rarely consider situational context and are often bound to specific ground truth (GT) data for comparison. Common evaluation strategies for trajectory prediction models, for example, usually rely on spatiotemporal distance measures that compare GT and artificially generated trajectories to quantify model performance [6,7]. Previously published research identified cases in which behavior deviates from the real trajectory but is still plausible [8]. In some cases, for example, the error value was large because the model chose a longer time gap than the individual human in a right-of-way situation; neither trajectory led to a collision or was critical. Thus, when comparing artificial trajectories to any individual human-driven trajectory, the result may show large error values but still be plausible, and vice versa. This can be remedied by evaluating a trajectory detached from individual behavior, using a general metric that provides insight into how well the artificial trajectory fits within the range of human behavior in similar situations.
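To make the limitation of such distance measures concrete, the following sketch computes two widely used spatiotemporal metrics, average and final displacement error (ADE/FDE), between a GT trajectory and a predicted one. The coordinates are invented toy data for illustration, not taken from the entry; note that a model choosing a slightly different but equally plausible maneuver would still accumulate a large error under these metrics.

```python
import math

def ade_fde(gt, pred):
    """Average and final displacement error between two trajectories.

    gt, pred: equal-length lists of (x, y) positions sampled at the
    same timestamps (hypothetical toy data for illustration).
    """
    dists = [math.hypot(gx - px, gy - py)
             for (gx, gy), (px, py) in zip(gt, pred)]
    # ADE: mean pointwise distance; FDE: distance at the final timestep
    return sum(dists) / len(dists), dists[-1]

# Toy example: the "prediction" drifts laterally from the GT lane
gt   = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
ade, fde = ade_fde(gt, pred)
```

A displacement error of this kind says only how far apart the two trajectories are; it cannot distinguish a dangerous deviation from a merely different but safe driving style.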

2. Human-Likeness of Artificial Driver Models

Common evaluation strategies can be categorized into objective and subjective approaches. In the area of AV development, most objective metrics rely on the direct comparison of modeled driving data to a single driving sample using some distance measure [10]. Common metrics for evaluating prediction or planning frameworks employ displacement errors, measured, for example, as the distance between the actual and predicted trajectories [7]. Such metrics indicate how accurately the predicted trajectory matches the individual, human-driven trajectory. However, in the case of larger displacement errors, no conclusions can be drawn as to whether the trajectory was still plausible and only the behavior deviated, with regard to safety for example, or whether the trajectory exhibited functional problems. Therefore, in some individual cases, more sophisticated evaluation strategies are applied, e.g., taking into account functional errors such as road violations [11] or unrealistic headways [12]. To quantify the similarity between driver models and human traffic behavior in driving simulation, macroscopic analyses are performed. With the help of endurance tests, synthetic data are generated and compared with real traffic data in typical highway scenarios, such as cut-in maneuvers [13]. Typical indicators to describe human behavior in related works are average and maximal velocity, frequency and extent of speeding [14], acceleration and headway [15,16], as well as Time-to-Collision (TTC) and longitudinal distance [17]. Based on such parameters, the relative validity of the macroscopic behavior of driver models can be determined [18,19]. Such methods compare observed macroscopic parameters of artificial vehicles with a distribution of respective parameters among real vehicles. For comparing distributions, statistical approaches are applied, such as the Kolmogorov–Smirnov test in the study conducted by Wang et al. [20] or the Kullback–Leibler divergence in the research by Kuefler et al. [6].
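As a minimal sketch of such a distribution comparison, the code below computes the two-sample Kolmogorov–Smirnov statistic, the maximum vertical distance between the empirical CDFs of two samples, for hypothetical headway measurements from real and simulated traffic. The samples are invented for illustration; in practice, library routines such as SciPy's `ks_2samp` would be used on much larger datasets, and the Kullback–Leibler divergence would be an alternative comparison.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    def ecdf(sample, x):
        # Fraction of the sample that is <= x
        return sum(1 for v in sample if v <= x) / len(sample)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)

# Hypothetical headway samples in seconds (real vs. simulated traffic)
real = [1.2, 1.5, 1.8, 2.0, 2.2, 2.5]
sim  = [1.3, 1.6, 1.9, 2.1, 2.3, 2.6]
d = ks_statistic(real, sim)
```

A small statistic indicates that the simulated headway distribution closely tracks the real one; the test itself says nothing about whether individual trajectories are plausible in context, which is exactly the limitation discussed next.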
However, most approaches focus on highway traffic and do not consider contextual influences, which in turn raises doubts about the applicability of such methods to more complex urban traffic, since driving behavior there is affected by various external influences [21]. Subjective approaches, on the other hand, measure human-likeness using questionnaires, interviews, or surveys that implicitly consider behavior within its individual context. The underlying assumption of such methods is that behavior which is perceived as real, or which cannot be distinguished from human behavior, defines human-likeness. Y. Zhang et al., for example, adapted the Turing test and asked participants to classify the driving behavior of another vehicle as either artificial or human-driven [22]. Similarly, Dumbuya et al. asked subjects to rate how realistic they perceived a drive completed by different driver models and how likely it was that the drive was conducted by a real human driver [23]. Further research investigates the human-likeness of driver models and the extent to which the perceived realism of a virtual environment (VE) is affected by the behavior [1,24]. Since subjective methods always require an experiment involving participants, they are not suitable for an iterative development process due to the effort associated with the evaluation.
The demand for mobility in urban environments is increasing due to trends such as urbanization, and driver models for urban traffic are therefore gaining in importance. Since meaningful evaluation strategies are the indispensable basis for future developments, current research aims to overcome the aforementioned challenges and addresses the question of how to quantify the human-likeness of artificial models' behavior in urban environments.

This entry is adapted from the peer-reviewed paper 10.3390/app131810218
