Trajectory Prediction of Vehicle–Pedestrian Interactions

The conflict between pedestrians and vehicles is a major safety issue, not only in the USA but throughout the world, and it is even more severe in developing countries. Road accidents claim over 1.3 million lives annually, which translates to more than two deaths every minute, and around ninety percent of these deaths occur in countries with limited resources.

  • trajectory prediction
  • pedestrian behavior prediction
  • vehicle–pedestrian interaction
  • autonomous vehicle
  • connected vehicle

1. Pedestrian Trajectory Prediction Methods

Over the last few years, numerous techniques and algorithms have emerged for predicting pedestrian trajectories, because such predictions are essential to creating a safe environment for autonomous vehicles and other applications. Research on this topic can be broadly classified into three groups [36,37]:
  • Physics-based models.
  • Planning-based models.
  • Pattern-based models.

1.1. Physics-Based Models

Physics-based models predict future movements by applying physical laws to motion properties such as speed and location. For example, Kim et al. combined a Kalman filter with a machine learning-based approach that uses velocity-space reasoning to compute pedestrians' desired velocities, achieving good performance [38]. Zanlungo et al. proposed a social force-based model that predicts pedestrian locations by modeling walking behavior with the social force paradigm and physical constraints; however, the model's performance tended to suffer when pedestrian density was low [39]. Martinelli et al. proposed a pedestrian dead-reckoning method based on step-length estimation [40]: walking behavior is classified, and the resulting estimate of an individual's step length is used to infer their position. Similarly, Kang et al. demonstrated a smartphone-based method that infers pedestrian position from estimated step length; the authors found that it was effective in indoor environments but accumulated errors over long distances [41]. Additionally, Gao et al. developed a probabilistic method for indoor position estimation that relies on Wi-Fi fingerprints and smartphone signals, improving accuracy and coping with signal variations [42]. However, most physics-based models rely on manually specified parameters or rules, which restricts them to constrained scenarios such as predicting trajectories in a closed space. In contrast, our proposed model (HSTGA) learns trajectory patterns from historical trajectory profiles without relying on manually specified parameter values.
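To make the physics-based idea concrete, the following is a minimal sketch of a classical constant-velocity Kalman filter that smooths a pedestrian's observed 2-D positions and then extrapolates the estimated state forward. It is not the method of Kim et al. [38], which adds velocity-space reasoning and learning; the function names, time step `dt`, and noise levels `q` and `r` are illustrative assumptions. These hand-set values are exactly the kind of manually specified parameters that the paragraph above identifies as a limitation.

```python
import numpy as np

def make_cv_model(dt: float, q: float, r: float):
    """Constant-velocity model with state [x, y, vx, vy]; parameters are hand-set."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])   # only positions are measured
    Q = q * np.eye(4)                      # simplified isotropic process noise
    R = r * np.eye(2)                      # measurement noise
    return F, H, Q, R

def filter_and_predict(observations, horizon, dt=0.4, q=0.01, r=0.05):
    """Filter observed (x, y) positions, then extrapolate `horizon` steps ahead."""
    F, H, Q, R = make_cv_model(dt, q, r)
    x = np.array([observations[0][0], observations[0][1], 0.0, 0.0])  # start at rest
    P = np.eye(4)
    for z in observations[1:]:
        x = F @ x                                   # predict step
        P = F @ P @ F.T + Q
        innovation = np.asarray(z, dtype=float) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
        x = x + K @ innovation                      # update with the new measurement
        P = (np.eye(4) - K @ H) @ P
    future = []
    for _ in range(horizon):                        # open-loop extrapolation, no measurements
        x = F @ x
        future.append((float(x[0]), float(x[1])))
    return future

# Hypothetical pedestrian walking along x at 1.25 m/s, sampled every 0.4 s.
obs = [(0.5 * k, 0.0) for k in range(8)]
pred = filter_and_predict(obs, horizon=12)
```

Because the motion model is fixed in advance, such a predictor cannot bend a path around a vehicle or another pedestrian; it simply continues the estimated velocity.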

1.2. Planning-Based Models

In pedestrian trajectory prediction, planning-based models are typically geared toward reaching a specific destination. Ziebart et al. [43] devised a planning-based model that incorporates a distribution of destinations and uses a Markov decision process to plan and predict trajectories. Their model outperformed a variable-length Markov model in predicting 3-second trajectories [44]. Deo and Trivedi implemented a probabilistic framework based on the variational Gaussian mixture model (VGMM) [45] that uses trajectory clustering to predict pedestrian paths; their model outperformed a monolithic VGMM. Rehder et al. used deep neural networks in their planning-based approach, inferring a mixture density function over possible destinations to perform goal-directed planning [46]. However, this method may not perform well over long prediction horizons. Dendorfer et al. proposed a two-phase strategy called goal-GAN, which first estimates goals and then generates the predicted trajectories [47]. Yao et al. improved the performance of their model by using a bidirectional multi-modal setting to condition pedestrian trajectory prediction on goal estimation [48]. Tran et al. separated their model into two sub-processes, a goal process and a movement process, achieving good performance in long-term trajectory prediction [49]. However, because these models rely on guessing a pedestrian's future goal, their performance may degrade over longer horizons. Our proposed model, by contrast, does not speculate about future goals or destinations, thereby improving prediction accuracy and generalization ability.
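The planning-based idea can be illustrated with a deliberately simplified sketch: the pedestrian is treated as an agent in a grid-world Markov decision process with a single known goal, value iteration computes the cost-to-go of every cell, and the predicted path follows the best action from each cell. This is a stand-in for illustration, not Ziebart et al.'s method [43], which learns rewards and reasons over a distribution of destinations. The grid size, unit step cost, obstacle layout, and function names below are hypothetical.

```python
def value_iteration(width, height, goal, obstacles, step_cost=1.0, iters=200):
    """Cost-to-go for every free cell of a 4-connected grid with one absorbing goal."""
    INF = float("inf")
    cost = {(x, y): INF for x in range(width) for y in range(height)
            if (x, y) not in obstacles}
    cost[goal] = 0.0
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(iters):
        changed = False
        for cell in cost:
            if cell == goal:
                continue
            # Bellman update: cheapest reachable neighbour plus the cost of one step
            best = min((cost[(cell[0] + dx, cell[1] + dy)] for dx, dy in moves
                        if (cell[0] + dx, cell[1] + dy) in cost), default=INF)
            if step_cost + best < cost[cell]:
                cost[cell] = step_cost + best
                changed = True
        if not changed:          # converged
            break
    return cost

def plan_path(start, goal, cost, max_steps=100):
    """Greedy rollout of the optimal policy implied by the cost-to-go table."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    path, cell = [start], start
    while cell != goal and len(path) <= max_steps:
        neighbours = [(cell[0] + dx, cell[1] + dy) for dx, dy in moves]
        cell = min((n for n in neighbours if n in cost), key=lambda n: cost[n])
        path.append(cell)
    return path

walls = {(2, 0), (2, 1), (2, 2)}        # hypothetical obstacle, e.g., a parked car
cost = value_iteration(5, 4, goal=(4, 0), obstacles=walls)
path = plan_path((0, 0), (4, 0), cost)
```

The predicted path detours around the obstacle only because the goal is assumed known, which is the dependence on goal inference that the paragraph above points to.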

1.3. Pattern-Based Models

In recent years, pattern-based models have gained popularity thanks to advances in deep learning. Most studies have focused on modules that learn the social features of and interactions among pedestrians, which directly shape individuals' movements. One notable model is social LSTM, proposed by Alahi et al., which predicts human trajectories in crowded spaces with high accuracy [50]. It captures social interactions with a social pooling strategy, assuming that interactions among pedestrians can be captured by pooling layers in the model's architecture. In a similar manner, the authors of [51] implemented a distinct scaling technique to capture the influence of the surroundings on a particular pedestrian. Gupta et al. introduced social GAN, which uses generative adversarial networks (GANs) to learn interaction patterns among pedestrians and predict their trajectories [52]; the model generates multiple plausible future trajectories and chooses the best one. Zhang et al. proposed SR-LSTM, a state refinement module that decodes implicit social behaviors among pedestrians [53], whereas Zhao et al. proposed the multi-agent tensor fusion model (MATF) to identify social and interactive relationships [54]. MATF aligns the spatial encoding of the scene with the encoding of each agent in it and then uses a GAN to learn patterns and make predictions. Nikhil and Morris presented a CNN-based model that is computationally efficient and enables fast parallel processing while achieving competitive performance [55]. Huang et al. extended the temporal correlation concept to produce more socially plausible trajectories [56]. Xu et al. devised a deep neural network that exploits the structure of social behaviors to anticipate pedestrian movements [57], using encoding schemes to distinguish how strongly different social interactions influence pedestrians' trajectories. Song et al. developed a deep convolutional LSTM network [58] that represents environmental features as tensors and uses a specially designed convolutional LSTM to predict trajectory sequences. Quan et al. proposed an LSTM-based trajectory forecasting model [59] whose distinctive LSTM mechanism identifies pedestrians' intentions and generates corresponding trajectory predictions. Existing models require information from all pedestrians in the scene but do not consider the impact of surrounding vehicles, or of vehicle–pedestrian interactions, on pedestrian trajectory prediction. Our approach considers these factors while using minimal information and a decentralized method: it uses only the trajectory profile of the pedestrian for whom the prediction is being made. The model assumes that all other factors affecting the pedestrian's movement are unknown or uncertain, and it learns to adapt accordingly. This decentralized approach allows our model to provide high-quality predictions in various environments, not just crowded spaces, making it well suited to practical pedestrian safety applications.
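The practice of generating several futures and keeping the best one, mentioned above for social GAN [52], is usually made precise with a best-of-K displacement error. The sketch below computes the average and final displacement errors (ADE/FDE) for K sampled trajectories and returns the sample closest to the ground truth. The trajectories are made-up numbers, the function names are illustrative, and the generator itself is not shown. In S-GAN, a minimum over samples of this kind also serves as the training ("variety") loss.

```python
import math

def ade(pred, truth):
    """Average Euclidean displacement over all future time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Euclidean displacement at the final time step."""
    return math.dist(pred[-1], truth[-1])

def best_of_k(samples, truth):
    """Return (index, ADE, FDE) of the sample with the lowest ADE."""
    scores = [ade(s, truth) for s in samples]
    k = min(range(len(samples)), key=scores.__getitem__)
    return k, scores[k], fde(samples[k], truth)

truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
samples = [
    [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)],   # drifts sideways
    [(1.0, 0.0), (2.0, 0.1), (3.0, 0.2)],   # nearly correct
    [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0)],   # too slow
]
k, min_ade, min_fde = best_of_k(samples, truth)
```

Here the second sample is selected, with an ADE of about 0.1 and an FDE of about 0.2.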

2. Vehicle–Pedestrian Interaction

Vehicle–pedestrian interactions are a critical concern in urban environments and transportation research, and pedestrian safety has become a pressing matter as cities grow. Academic studies have examined many aspects of these interactions, including pedestrian behavior, driver awareness, and the impact of the built environment on interaction patterns, using observational studies, simulation, and data-driven analyses. These studies have highlighted factors such as pedestrian visibility, crossing behavior, and driver response time as determinants of the safety outcomes of such encounters. Understanding these interactions is essential for devising effective strategies to reduce collisions and improve pedestrian safety in cities. As autonomous vehicles become more prevalent, ensuring safe interactions between them and pedestrians becomes paramount. Researchers have investigated the challenges of developing algorithms that accurately predict pedestrian behavior and adapt to dynamic urban environments. Sensor technologies such as LiDAR and computer vision have given autonomous vehicles better perception, enabling them to discern their surroundings and anticipate pedestrian actions. However, the complexity and diversity of pedestrian behavior continue to pose significant obstacles, and researchers have applied machine learning and reinforcement learning techniques to improve pedestrian detection, recognition, and trajectory prediction. The future of autonomous vehicle–pedestrian interaction depends on successfully integrating advanced AI technologies with comprehensive research insights to achieve a safer and more efficient transportation system.
The presence of a moving vehicle near a pedestrian has been shown to substantially influence the pedestrian's dynamics, so it is a critical consideration in pedestrian trajectory modeling and prediction [26,60]. The literature has modeled vehicle–pedestrian interaction in diverse ways, depending on whether the underlying trajectory generation model is expert-driven or data-driven [60]. Broadly, these approaches fall into two categories: explicit and implicit interaction modeling.

2.1. Explicit Interaction Modeling

In explicit interaction modeling approaches, the influence of a vehicle on a pedestrian's dynamics is incorporated directly through explicit terms in the formulation of the pedestrian's movement [61,62,63]. For example, the social force model represents the vehicle's effect on the pedestrian's trajectory as an explicit force [64,65]. The authors of [66] categorized explicit modeling approaches into four methods: repulsive forces, the social force model (SFM) combined with other collision-avoidance strategies, direct coupling of motions, and other methods.
The repulsive-force method builds on the original social force model (SFM) proposed by Helbing and Molnar [67], which focused on social interactions among pedestrians. Subsequent work extended this model to pedestrian–vehicle interactions by adding forces that account for them [68,69]. In these extended models, each vehicle pushes pedestrians away with a force that depends on their relative proximity and direction. The effect of the relative interaction distance is captured by what is commonly called the decaying function [70], which is typically chosen as an exponential decay in distance [71,72]. Some SFM formulations also include an anisotropy function [70,73], which accounts for how the direction of interaction affects the strength of the repulsive force; for example, a pedestrian approaching a vehicle experiences a greater effect than a pedestrian moving away from it [70,73]. Certain works represent vehicles as circles, as pedestrians are represented in the SFM, but with a much larger radius [64]. Other models define a danger zone around the vehicle that shapes the interaction force experienced by pedestrians. Some use an ellipse with one focus at the rear of the vehicle and the other displaced forward by an amount that depends on the vehicle's speed [71]. Others use a fixed ellipse or a rectangular contour enclosing the vehicle, with the magnitude of the repulsive force adjusted according to the distance and orientation of the pedestrian [70].
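A minimal sketch of the repulsive-force idea follows. It assumes a circular vehicle footprint, an exponential decaying function of the surface-to-surface gap, and the anisotropy weight λ + (1 − λ)(1 + cos φ)/2 found in common SFM variants, where φ is the angle between the pedestrian's heading and the direction to the vehicle. The strength `A`, range `B`, radii, and `lam` are illustrative values, not calibrated parameters from [64,70,71,72,73], and the function name is hypothetical.

```python
import math

def vehicle_repulsion(ped_pos, ped_vel, veh_pos, A=5.0, B=1.5,
                      r_ped=0.3, r_veh=2.0, lam=0.3):
    """Repulsive force on a pedestrian from a vehicle modeled as a large circle.

    Assumes the pedestrian and vehicle centers do not coincide.
    """
    dx, dy = ped_pos[0] - veh_pos[0], ped_pos[1] - veh_pos[1]
    dist = math.hypot(dx, dy)
    n = (dx / dist, dy / dist)                    # unit vector from vehicle to pedestrian
    gap = dist - (r_ped + r_veh)                  # surface-to-surface distance
    magnitude = A * math.exp(-gap / B)            # exponential decaying function
    speed = math.hypot(*ped_vel)
    if speed > 0:
        e = (ped_vel[0] / speed, ped_vel[1] / speed)
        cos_phi = -(e[0] * n[0] + e[1] * n[1])    # +1 when walking straight at the vehicle
    else:
        cos_phi = 0.0                             # no heading: neutral weight
    weight = lam + (1 - lam) * (1 + cos_phi) / 2  # anisotropy function
    return (magnitude * weight * n[0], magnitude * weight * n[1])

# Same position, opposite headings: approaching vs. moving away from the vehicle.
approach = vehicle_repulsion((5.0, 0.0), (-1.0, 0.0), (0.0, 0.0))
recede = vehicle_repulsion((5.0, 0.0), (1.0, 0.0), (0.0, 0.0))
```

With these settings the approaching pedestrian receives the full force, while the receding one receives only the fraction `lam` of it, which reproduces the asymmetry described above.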
The second explicit method combines the social force model with other collision-avoidance strategies to handle potential collisions and conflicts [61,64,72,74]. In [61,72], a long-range collision-avoidance method was proposed that predicts conflicts by projecting the pedestrian's shadow and calculating the minimum change in speed and direction needed to avoid a collision. In [74], the authors defined a force that keeps the pedestrian in a safe zone by modeling the tendency to walk parallel to the vehicle. In [64], a decision model based on the time-to-collision parameter was used alongside the SFM to determine actions for different types of interactions with a vehicle. The SFM's ability to link perception directly with action was applied in [62,75,76] to handle simple reactive interactions. To handle more intricate interactions that require choosing among multiple alternative actions, however, an additional game-theoretic layer was introduced on top of the SFM.
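Time to collision (TTC), used by the decision model in [64], can be computed in closed form when both agents are assumed to move at constant velocity and are treated as discs whose combined radius is `radius`. The sketch below solves for the earliest time at which the gap closes and then maps TTC to a coarse action. The radius, the threshold values, and the action labels are hypothetical choices for illustration, not those of [64].

```python
import math

def time_to_collision(p_ped, v_ped, p_veh, v_veh, radius=2.0):
    """Earliest t >= 0 at which two constant-velocity agents come within `radius`."""
    px, py = p_ped[0] - p_veh[0], p_ped[1] - p_veh[1]   # relative position
    vx, vy = v_ped[0] - v_veh[0], v_ped[1] - v_veh[1]   # relative velocity
    # Solve |p + v t|^2 = radius^2, i.e. a t^2 + b t + c = 0
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - radius * radius
    if c <= 0:
        return 0.0                       # already in conflict
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return math.inf                  # no relative motion, or the paths never get close
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t >= 0 else math.inf     # negative roots: the agents are separating

def choose_action(ttc, stop_below=2.0, yield_below=4.0):
    """Hypothetical threshold rule mapping TTC to a pedestrian action."""
    if ttc < stop_below:
        return "stop"
    if ttc < yield_below:
        return "slow_down"
    return "continue"

# Vehicle 10 m away approaching a standing pedestrian at 5 m/s.
ttc = time_to_collision((10.0, 0.0), (0.0, 0.0), (0.0, 0.0), (5.0, 0.0))
```

The 8 m gap (10 m minus the 2 m combined radius) closes at 5 m/s, giving a TTC of 1.6 s, so this rule returns "stop".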
The third explicit method is direct coupling of motions, in which the motion equations of both agents are coupled so that the effect of one agent's actions on the other's motion decisions is taken into account directly. Zhang et al. used a constant turn rate and velocity (CTRV) model to represent the vehicle's motion and related the pedestrian's state to the ego vehicle's coordinate system [77]. Other approaches also explicitly consider the vehicle's influence on pedestrians' future states. In [78], the pedestrian's speed and direction are selected at each time step to ensure a collision-free trajectory when the pedestrian's path intersects the vehicle's. In [79], the vehicle's effect on the pedestrian's velocity is modeled through an assessment of collision risk. In [64,65], time to collision (TTC) was used together with the social force model to track vehicle–pedestrian interactions. In [80], a pedestrian collaboration factor was introduced as an explicit interaction term describing the relationship between a pedestrian and a nearby vehicle.
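The CTRV model mentioned for [77] propagates a vehicle state (x, y, heading θ, speed v, yaw rate ω) with a closed-form update: for ω ≠ 0, x' = x + (v/ω)(sin(θ + ωΔt) − sin θ) and y' = y + (v/ω)(cos θ − cos(θ + ωΔt)), falling back to straight-line motion as ω approaches 0. The sketch implements only this motion model; the coupling with the pedestrian state used in [77] is not reproduced, and the class name, time step, and example values are assumptions.

```python
import math
from typing import NamedTuple

class CTRVState(NamedTuple):
    x: float
    y: float
    theta: float   # heading [rad]
    v: float       # speed [m/s], constant under CTRV
    omega: float   # yaw rate [rad/s], constant under CTRV

def ctrv_step(s: CTRVState, dt: float, eps: float = 1e-6) -> CTRVState:
    """Propagate one step of the constant turn rate and velocity model."""
    if abs(s.omega) > eps:
        theta_new = s.theta + s.omega * dt
        x = s.x + s.v / s.omega * (math.sin(theta_new) - math.sin(s.theta))
        y = s.y + s.v / s.omega * (math.cos(s.theta) - math.cos(theta_new))
    else:  # straight-line limit as the yaw rate tends to zero
        theta_new = s.theta
        x = s.x + s.v * math.cos(s.theta) * dt
        y = s.y + s.v * math.sin(s.theta) * dt
    return CTRVState(x, y, theta_new, s.v, s.omega)

def rollout(s: CTRVState, dt: float, steps: int):
    """Predict a sequence of future vehicle states."""
    out = []
    for _ in range(steps):
        s = ctrv_step(s, dt)
        out.append(s)
    return out

# Hypothetical vehicle turning left at 90 degrees per second while driving at 5 m/s.
start = CTRVState(0.0, 0.0, 0.0, 5.0, math.pi / 2)
path = rollout(start, dt=0.1, steps=10)
```

After one second, the vehicle has completed a quarter circle of radius v/ω and ends at (v/ω, v/ω) facing +y.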

2.2. Implicit Interaction Modeling

Conversely, implicit interaction modeling feeds the vehicle's trajectory into the model as an additional input alongside the target pedestrian's trajectory data [30,81]. These models are usually trained on real-world datasets, from which they learn vehicle–pedestrian interactions. Various approaches have been proposed for integrating the trajectories of different agents within the interaction module, including pooling mechanisms and graph neural networks. Some papers that predict the trajectory of a single pedestrian from the egocentric view of a moving vehicle account for the interaction between the pedestrian and the ego vehicle by feeding motion features of the vehicle into the data-driven prediction model [59,82,83,84,85]. Based on the literature [59,66,82,83,85], implicit vehicle–pedestrian interaction modeling can be divided into three models: the pooling model, the graph neural network model, and the ego vehicle–pedestrian interaction model. The interaction formulation of each is discussed below.
A. Pooling Models
In [81,86,87], an occupancy grid map centered on the target vehicle's or pedestrian's position is used to aggregate the hidden states of adjacent agents: the hidden states of all agents located in the same grid cell are aggregated. This builds a tensor that encodes information about all neighboring agents that can influence the future trajectory of the pedestrian under consideration. This tensor, together with the spatial latent state of the target agent, then serves as the primary input to the LSTM network used for trajectory prediction. In [86], Cheng et al. introduced a circular polar occupancy representation, which uses the orientation and distance of agents relative to the target pedestrian to define which cells are occupied. In [88], a more comprehensive version of these spatial feature maps was proposed, in which the bird's-eye view of the scene is partitioned into grid cells and each agent's feature representation is placed into a tensor at the location corresponding to the agent's position. At each time step, this two-dimensional tensor is fed into a convolutional neural network (CNN), while a separate LSTM models the temporal dependencies among the spatial maps as they evolve. In [27], a dual-map approach was proposed for each agent, using a horizon map that encodes prioritized interactions and a neighbor map that encodes neighboring agents' embeddings. These maps are processed by convolutional neural networks, and their outputs are combined with the target agent's embedding to predict the agent's future trajectory [27,28].
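The occupancy-grid pooling described above can be sketched as follows. The sketch assumes a square grid of `grid_size × grid_size` cells, each `cell` meters wide, centered on the target agent, with neighbor hidden states supplied as plain vectors. As in social pooling [50], hidden states of agents falling in the same cell are summed. The grid dimensions and the function name are illustrative; in a full model the resulting tensor would be embedded and fed to the target agent's LSTM, which is not shown.

```python
import numpy as np

def social_pooling(target_pos, neighbour_pos, neighbour_hidden, grid_size=4, cell=1.0):
    """Sum neighbour hidden states into a grid centred on the target agent.

    Returns an array of shape (grid_size, grid_size, hidden_dim).
    """
    hidden_dim = neighbour_hidden.shape[1]
    pooled = np.zeros((grid_size, grid_size, hidden_dim))
    half = grid_size * cell / 2.0
    for pos, h in zip(neighbour_pos, neighbour_hidden):
        # Offset relative to the target, shifted so the target sits at the grid centre
        rel_x = pos[0] - target_pos[0] + half
        rel_y = pos[1] - target_pos[1] + half
        if 0 <= rel_x < 2 * half and 0 <= rel_y < 2 * half:   # ignore distant agents
            m, n = int(rel_x // cell), int(rel_y // cell)     # m indexes x, n indexes y
            pooled[m, n] += h
    return pooled

# Two neighbours share a cell; the third lies outside the neighbourhood.
neighbour_pos = [(0.5, 0.5), (0.6, 0.7), (10.0, 10.0)]
neighbour_hidden = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
pooled = social_pooling((0.0, 0.0), neighbour_pos, neighbour_hidden)
```

The grid extent is a fixed hyperparameter, which is why performance in these methods can depend on the neighborhood size chosen.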
B. Graph Neural Network Model
In graph neural networks (GNNs), spatial edges model the interaction between agents and its effect on their future positions, with message passing and attention mechanisms encoding the importance of connected edges. Message propagation refers to extracting information from the nodes connected to a given node and using it to enrich that node's representation; in GNNs, it defines the influence of interacting entities on a target pedestrian's dynamics. These frameworks usually employ an attention mechanism to capture the relative importance of each connected edge for the agent of interest. In [30,89,90], a widely used criterion based on spatial separation was introduced: two agents are linked by a spatial edge when the distance between them falls below a specified threshold. While some articles use such a fixed criterion to determine connected edges, others start from a fully connected graph that includes every agent in the scene [91]. In [92], reinforcement learning was used to decide whether edges exist between agents: actions toggle each edge on or off, and rewards are based on the overall accuracy of the trajectory predictions associated with the resulting graph links. Several studies have used directed rather than undirected graphs to capture interaction asymmetry [89,91,93,94,95]. The authors of [89] used encoded interactions in a graph-based context to predict agents' short-term intentions as a probability distribution; this predicted intention is then combined with the graph structure to predict each agent's future trajectory. Several researchers have applied the graph convolutional network (GCN) directly to these graphs.
They formulate an adjacency matrix to represent connections within the graph, where the matrix’s weights reflect the reciprocals of agents’ relative speeds or distances [30,94,96,97]. Other researchers have proposed alternative GNN techniques that utilize recurrent neural networks (RNNs), such as LSTMs, to capture the time-evolving characteristics of the edges within the graph [93,98,99].
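The graph construction and GCN aggregation described above can be sketched in a few lines. The sketch assumes that agents are linked when closer than a distance threshold, that edge weights are the reciprocal of their distance (one of the weightings mentioned in [30,94,96,97]), and a single propagation step with the symmetric normalization of Kipf and Welling's GCN. The threshold, feature sizes, random weights, and function names are placeholders, and no training is shown.

```python
import numpy as np

def build_adjacency(positions, threshold=3.0):
    """Weighted adjacency: edge weight 1/distance if agents are within `threshold` meters."""
    n = len(positions)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                d = np.linalg.norm(positions[i] - positions[j])
                if 0 < d < threshold:
                    A[i, j] = 1.0 / d
    return A

def gcn_layer(A, H, W):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])               # self-loops keep each node's own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

positions = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])   # e.g., velocities
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))                      # placeholder for learned weights
A = build_adjacency(positions)
H_next = gcn_layer(A, features, W)
```

The distant fourth agent has no edges, so its updated representation depends only on its own features.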
C. Ego Vehicle–Pedestrian Interaction Model
These interactions are typically represented by combining certain attributes of the ego vehicle's motion with the pedestrian's position sequence. A common attribute is the ego vehicle's speed, which can significantly influence the choices and movement of the interacting pedestrian. In [59,82,83,84,85], the ego vehicle's speed was used to anticipate the pedestrian's subsequent actions within the camera image. Some works use a separate network to forecast the ego vehicle's future speed, which is then used to predict pedestrian trajectories [82,83]. Other studies combine the motion attributes of other pedestrians with features such as the pedestrian's distance from the ego vehicle [100] or the geographic coordinates of the host vehicle [101]. Kim et al. extended this approach by incorporating the pedestrian's viewpoint [102], considering interaction aspects such as the relative positions of the pedestrian and the vehicle, the orientation of the pedestrian's head relative to the vehicle, and the vehicle's speed. However, because the scene is observed from the ego vehicle, the pedestrian motion sequences in all of these works are expressed in coordinates relative to the moving vehicle. Incorporating vehicle attributes as an additional model input therefore also compensates for the effects of the moving reference frame, rather than serving exclusively as an interaction factor.
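The moving-frame effect noted above can be seen in a small example: a pedestrian standing still appears to move in ego-vehicle coordinates simply because the vehicle moves. The sketch maps ego-frame positions back to a fixed world frame with a 2-D rigid transform. It assumes that the ego pose (x, y, yaw) at each frame is known, for instance from odometry; the scenario values and the function name are hypothetical. Models that see only ego-relative coordinates must instead infer this motion from inputs such as ego speed.

```python
import math

def ego_to_world(point_ego, ego_pose):
    """Map a point from ego-vehicle coordinates to world coordinates.

    ego_pose = (x, y, yaw) of the vehicle in the world frame (assumed known).
    """
    ex, ey, yaw = ego_pose
    px, py = point_ego
    c, s = math.cos(yaw), math.sin(yaw)
    return (ex + c * px - s * py, ey + s * px + c * py)

# Hypothetical scenario: the ego vehicle drives along +x at 10 m/s, sampled every 0.5 s,
# while a pedestrian stands still at world position (20, 3).
ego_poses = [(10.0 * 0.5 * k, 0.0, 0.0) for k in range(4)]
ped_world_true = (20.0, 3.0)
# With zero yaw, ego-frame coordinates are a simple offset from the vehicle position.
ped_in_ego = [(ped_world_true[0] - x, ped_world_true[1] - y) for x, y, _ in ego_poses]
# In the ego frame the pedestrian seems to approach at 10 m/s ...
recovered = [ego_to_world(p, pose) for p, pose in zip(ped_in_ego, ego_poses)]
# ... but mapping back with the ego pose recovers a stationary pedestrian.
```

In this example, the pedestrian's apparent 15 m displacement in the ego frame is entirely due to the vehicle's own motion.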
In brief, modeling interactions between vehicles and pedestrians is an intricate undertaking, and the difficulty grows on roads lacking well-defined lanes, crosswalks, and strict traffic rules [103,104]. The authors of [105] found substantial differences in pedestrian movement patterns between structured and unstructured roads. Limited research has addressed vehicle–pedestrian interaction in trajectory prediction on unstructured roads. Previous work has mostly focused on social interactions among pedestrians [50,52,106] and interactions with the environment [51,107,108], yet the interaction between pedestrians and vehicles is an equally important factor. Some researchers have included vehicle information in pedestrian trajectory prediction, but their methods have limitations. Eiffert et al. [26] improved pedestrian trajectory prediction by encoding interactions between pedestrians and a single vehicle with a feature learning network called the “Graph pedestrian–vehicle Attention Network”; however, this method considers only a single vehicle on the road, not multiple vehicles. Chandra et al. [27,28,29] and Carrasco et al. [30] proposed models that predict the trajectories of heterogeneous traffic agents, including pedestrians, but their primary focus was on vehicles and motorcycles rather than pedestrians. More research is therefore needed on vehicle–pedestrian interaction in trajectory prediction.

3. Intelligent Vehicle Trajectory Prediction

In vehicle trajectory prediction, it has become increasingly evident that a more comprehensive approach than extrapolating an agent's own history is needed. Perception systems, cameras, and intelligent vehicular systems have made it easier to acquire data on both driving agents and the environment. Nevertheless, relying solely on a traffic agent's trajectory history can lead to prediction errors, particularly in intricate scenarios. Real-life driving situations are inherently complex, and classical methods for predicting intelligent vehicle trajectories struggle to capture the many ways vehicles interact with their surroundings, especially with other road users such as pedestrians, cyclists, and other drivers. Modeling these diverse interactions is therefore vital for accurate trajectory prediction. Interaction-aware approaches, which account for inter-agent dynamics and behavioral dependencies, improve prediction accuracy [109] by exploiting information about the behaviors and intentions of the various road users. Building on interaction-aware prediction, graph-based interaction reasoning uses graphs to capture the intricate relationships and interdependencies between road users more effectively. This is particularly valuable where conventional prediction models fall short, such as complex intersections, unstructured road environments, and busy urban settings with a mix of user behaviors. Following [109], intelligent vehicle trajectory prediction models can be categorized into two primary types: interaction-aware trajectory prediction and graph-based interaction reasoning.
We follow this categorization with the aim of improving the fidelity, precision, and adaptability of these models.

3.1. Interaction-Aware Trajectory Prediction

Numerous studies have sought to improve the interaction awareness of trajectory prediction approaches by modeling inter-agent correlations among all agents in a driving scene. Early work on interaction awareness used classical machine learning models such as Hidden Markov Models (HMMs), Support Vector Machines (SVMs), and Bayesian networks [110,111,112,113]. However, these conventional methods perform poorly in long-term prediction, particularly in intricate scenarios, and are ill-suited to real-time analysis [114].
Deep learning models, specifically recurrent neural networks (RNNs), temporal convolutional neural networks (CNNs), and graph neural networks (GNNs), have attracted researchers' interest owing to their effectiveness and versatility across research fields, notably in predicting vehicle trajectories in complex settings. The literature also proposes various techniques for modeling inter-agent interactions in vehicle trajectory prediction. One approach explicitly feeds the trajectory histories of the Target Agent (TA) and its Surrounding Agents (SAs) into the model [115,116,117,118,119,120] to account for the influence of the SAs. For instance, Dai et al. [115] proposed a two-group LSTM-based RNN approach that models the interactions between the TA and each of its neighbors and then predicts the TA's future trajectory from its trajectory history. In TrafficPredict, introduced by Ma et al. [116], an architecture with two layers of LSTM recurrent units captures the motion patterns of traffic participants and identifies similar behavior within groups of the same type, such as vehicles or bicycles. However, these methods do not account for the effect of the environment and traffic regulations on the TA's behavior.
An alternative strategy for modeling social interactions among many traffic participants in a scene is a social pooling mechanism [50,52,121], which lets the LSTM units of neighboring agents share information. Alahi et al. [50] proposed S-LSTM, in which a pooling layer between the LSTM cells connects the recurrent units associated with the SAs, and hidden states are pooled across all agents within an occupancy map. To represent the interactions among all SAs in a scene, Gupta et al. [52] introduced S-GAN, whose pooling module combines a multi-layer perceptron (MLP) with max pooling. It computes a pooling vector for each TA from the relative coordinates between the TA and its SAs and from their hidden states. In related work, Deo et al. [121] introduced CS-LSTM, an encoder framework for vehicle trajectory prediction that applies convolution and max pooling over a spatial grid representing the TA's surroundings. However, the resulting vehicle representations are still not integrated with the vehicles' individual states, leading to inefficient localized computations. Messaoud et al. addressed this problem with Multi-Head Attention (MHA) pooling [122,123]: an LSTM-based encoder generates a vector representation of each vehicle, and an MHA framework then evaluates the interconnections between the target vehicle and its SAs within a defined spatial map.
Experiments have shown that MHA effectively reduces the workload of localized computations. A significant drawback of these methods, however, is that they handle complex spatio-temporal correlations among traffic participants inefficiently. In addition, their performance can depend on the distance used to generate the occupancy grid or on the number of SAs considered.
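The attention-based pooling described above can be sketched with NumPy. The target agent's encoding forms the query, the surrounding agents' encodings form the keys and values, and each head computes scaled dot-product attention weights over the SAs. This is a generic multi-head attention sketch, not Messaoud et al.'s exact architecture [122,123]. The random projection matrices stand in for learned weights, the encodings are assumed to come from an LSTM encoder that is not shown, the usual output projection is omitted, and the function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mha_pool(target_enc, sa_enc, n_heads=2, d_head=4, seed=0):
    """Multi-head attention of one target agent over its surrounding agents.

    target_enc: (d,) encoding of the TA; sa_enc: (n_sa, d) encodings of the SAs.
    Returns the pooled context (n_heads * d_head,) and the weights (n_heads, n_sa).
    """
    rng = np.random.default_rng(seed)
    d = target_enc.shape[0]
    # Placeholder projections; in a trained model these are learned parameters.
    Wq = rng.normal(size=(n_heads, d, d_head))
    Wk = rng.normal(size=(n_heads, d, d_head))
    Wv = rng.normal(size=(n_heads, d, d_head))
    q = np.einsum("d,hdk->hk", target_enc, Wq)          # (heads, d_head)
    k = np.einsum("nd,hdk->hnk", sa_enc, Wk)            # (heads, n_sa, d_head)
    v = np.einsum("nd,hdk->hnk", sa_enc, Wv)            # (heads, n_sa, d_head)
    scores = np.einsum("hk,hnk->hn", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                  # attention over SAs, per head
    context = np.einsum("hn,hnk->hk", weights, v)       # weighted sum of SA values
    return context.reshape(-1), weights

target = np.array([1.0, 0.0, 0.5])
neighbours = np.array([[0.9, 0.1, 0.4], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
context, weights = mha_pool(target, neighbours)
```

Each head produces its own distribution over the surrounding agents, so different heads can emphasize different neighbors before their outputs are concatenated.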

3.2. Graph-Based Interaction Reasoning

Recently, graph-based interaction reasoning has attracted growing interest in trajectory prediction as a way to address the limitations of the interaction-aware methods discussed in the previous section. Graph-based approaches model the interactions between agents in a driving scene as graphs, where nodes represent agents and edges represent inter-agent interactions, which allows spatial and temporal inter-agent correlations to be considered simultaneously. Diehl et al. used a directed graph to model a highway driving scenario and compared the effectiveness of graph attention networks (GATs) and GCNs for traffic prediction, considering a predetermined number of nearby vehicles [124]. However, their homogeneous graph overlooks crucial factors such as vehicle dynamics and types. Li et al. used a homogeneous undirected graph to capture inter-vehicle interactions, applying graph convolutions to uncover essential features in the data [125], with an LSTM-based decoder predicting the vehicles' future trajectories; this technique, however, shares the limitation noted above. Azadani et al. used undirected spatio-temporal graphs to model inter-vehicle interactions and analyzed the trajectory histories of target vehicles and their surrounding vehicles with graph and temporal gated convolutions [126]; the vehicles' future trajectories are then predicted by applying temporal convolutions to the extracted latent representations. More recently, Wu et al. [127] proposed an encoder–decoder architecture that captures temporal interdependencies with Multi-Head Attention (MHA) and spatial interactions with GAT modules.
The resulting outputs from these separate modules are then aggregated and fed into a Long Short-Term Memory (LSTM)-based decoder. Similarly, Li et al. [90] introduced the STG-DAT system, which comprises three key modules, namely feature extraction using a multi-layer perceptron (MLP), representation extraction using a GAT as an encoder, and path generation employing Gated Recurrent Units (GRU) while considering the kinematic constraints.
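The GAT modules used by several of these models (e.g., [90,127]) are built around the attention coefficient of Veličković et al.: e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalized with a softmax over each node's neighbors. The sketch implements one single-head layer with random placeholder parameters. The example graph, the feature sizes, the inclusion of self-loops, and the use of tanh as the output nonlinearity are assumptions for illustration.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, adj, W, a):
    """Single-head graph attention layer.

    H: (n, f_in) node features; adj: (n, n) 0/1 adjacency (self-loops added here);
    W: (f_in, f_out) projection; a: (2 * f_out,) attention vector.
    """
    n = H.shape[0]
    Wh = H @ W                                             # (n, f_out)
    f_out = Wh.shape[1]
    # a^T [Wh_i || Wh_j] splits into a term for node i plus a term for neighbour j
    e = leaky_relu((Wh @ a[:f_out])[:, None] + (Wh @ a[f_out:])[None, :])
    mask = (adj + np.eye(n)) > 0
    e = np.where(mask, e, -np.inf)                         # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)                   # stabilise the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)      # softmax over each node's neighbours
    return np.tanh(alpha @ Wh), alpha

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 3))                                # e.g., encoded agent states
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
W = rng.normal(size=(3, 5))
a = rng.normal(size=10)
H_next, alpha = gat_layer(H, adj, W, a)
```

Unlike the fixed inverse-distance weights of a GCN, the coefficients `alpha` depend on the node features themselves, so the importance of a neighbor is learned rather than prescribed.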
Moreover, Mo et al. recently introduced a directed graph model that distinguishes between groups of agents in a driving scenario [95], using distinct encoders for the different agent types in the scene because each type's specific behavior strongly influences its future trajectory patterns. Following a comparable approach, Sheng et al. developed a distance-dependent weighted graph to represent the TA and its neighboring vehicles [128], analyzing this spatial graph with GCNs and using GRU units to predict the vehicles' future trajectories. Gao et al. took an alternative approach, constructing separate sub-graphs for individual agents and a high-order graph to capture inter-agent interactions [129]; however, this method's dense graph generation fails to account for crucial spatial and edge features among all agents. These recent advances in modeling temporal and spatial interactions among agents have shown promising results in predicting future trajectories in complex environments.

This entry is adapted from the peer-reviewed paper 10.3390/s23177361
