Multi-object tracking (MOT) is a high-level vision task in computer vision and is essential for understanding the autonomous driving environment. Owing to the strong performance of deep learning in visual object tracking, many state-of-the-art multi-object tracking algorithms have been developed.
Tracking Algorithm Framework | Principle | Advantage | Disadvantage |
---|---|---|---|
TBD | All objects of interest are detected in each video frame and then associated with the objects detected in the previous frame to achieve tracking | Simple structure and strong interpretability | Over-reliance on object detector performance; bloated algorithm design |
JDT | End-to-end trainable detection paradigm that jointly learns detection and appearance features | Multi-module joint learning; weight sharing | Local receptive field; tracking degrades when the object is occluded |
Transformer-based | Transformer encoder-decoder architecture that captures global and rich contextual interdependencies for tracking | Parallel computing; rich global and contextual information; tracking accuracy and precision are greatly improved, with great potential in the field of computer vision | Large number of parameters and high computational overhead; Transformer-based networks have not yet been fully adapted to the field of computer vision |
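The association step described for the TBD (tracking-by-detection) framework can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, integer track IDs, and a greedy IoU matcher; the function names `iou` and `associate` and the threshold of 0.3 are hypothetical choices for illustration. Practical trackers usually add a motion model and an optimal assignment solver, which this sketch omits.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(prev_tracks, detections, iou_threshold=0.3):
    """Greedily match current detections to previous-frame tracks by IoU.

    prev_tracks: dict mapping integer track ID -> box in the previous frame.
    detections: list of boxes detected in the current frame.
    Returns a dict mapping track ID -> box for the current frame.
    Unmatched detections start new tracks with fresh IDs.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in prev_tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    current, used = {}, set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little
        if tid in current or j in used:
            continue  # track or detection already assigned
        current[tid] = detections[j]
        used.add(j)
    # Detections without a matching track become new tracks.
    next_id = max(prev_tracks, default=-1) + 1
    for j, det in enumerate(detections):
        if j not in used:
            current[next_id] = det
            next_id += 1
    return current
```

Because every track ID is produced by matching detector output, a missed or poorly localized detection propagates directly into the tracking result, which is the detector dependence listed as the disadvantage of TBD.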
The datasets used for visual multi-object tracking tasks are listed in Table 2. Most of the detection and tracking targets in these datasets are vehicles and pedestrians, which makes them well suited to advancing autonomous driving.
Ref. | Datasets | Year | Feature | DOI/URL |
---|---|---|---|---|
[20,21,27] | MOT15, 16, 17, 20 | 2016–2020 | Sub-datasets containing multiple camera angles and scenes | https://doi.org/10.48550/arXiv.1504.01942 https://motchallenge.net/ |
[22,23] | KITTI-Tracking | 2012 | Provides annotations for cars and pedestrians; scene objects are sparse | https://doi.org/10.1177/0278364913491297 https://www.cvlibs.net/datasets/kitti/eval_tracking.php |
[24] | NuScenes | 2019 | Dense traffic and challenging driving conditions | https://doi.org/10.48550/arXiv.1903.11027 https://www.nuscenes.org/ |
[25] | Waymo | 2020 | Diversified driving environment; dense label information | https://doi.org/10.48550/arXiv.1912.04838 https://waymo.com/open/data/motion/tfexample |
Realistic and accurate evaluation metrics are essential for comparing visual multi-object tracking algorithms in an unbiased and fair manner. Multi-object tracking evaluation rests on three criteria: whether object detection is performed in real time, whether the predicted position matches the actual position, and whether each object maintains a distinct ID [28]. The MOT Challenge provides widely recognized MOT evaluation metrics.
MOTA (Multi-Object Tracking Accuracy): counts the errors accumulated during tracking, covering both the number of tracked objects and whether they are matched correctly:

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}$$

where FN (False Negative) is the number of ground-truth boxes that no predicted box matches; FP (False Positive) is the number of predicted boxes that match no ground-truth box; IDSW (ID Switch) is the number of times an object's ID changes; GT (Ground Truth) is the number of ground-truth objects; and the subscript $t$ is the frame index.
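The MOTA calculation can be sketched in a few lines. The per-frame dictionary layout and the function name `mota` are assumptions made for illustration, not part of any benchmark toolkit.

```python
def mota(frames):
    """Compute MOTA = 1 - sum(FN + FP + IDSW) / sum(GT) over all frames.

    frames: iterable of dicts with integer counts under the keys
    "fn", "fp", "idsw", and "gt" (hypothetical layout).
    """
    frames = list(frames)
    errors = sum(f["fn"] + f["fp"] + f["idsw"] for f in frames)
    total_gt = sum(f["gt"] for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - errors / total_gt


# Two frames with 5 ground-truth objects each and 3 errors in total.
example = [
    {"fn": 1, "fp": 0, "idsw": 0, "gt": 5},
    {"fn": 0, "fp": 1, "idsw": 1, "gt": 5},
]
score = mota(example)  # 1 - 3/10
```

Since the error count is not bounded by the number of ground-truth objects, MOTA can become negative when a tracker produces many false positives.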
MOTP (Multi-Object Tracking Precision): the precision of multi-object tracking, which evaluates how accurately object positions are localized:

$$\mathrm{MOTP} = \frac{\sum_{t,i} B_t(i)}{\sum_t C_t}$$

where $C_t$ is the number of matches between ground-truth objects and predicted objects in the $t$-th frame, and $B_t(i)$ is the distance between the position of the $i$-th matched object in the $t$-th frame and its predicted position, also known as the matching error.
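As a minimal sketch, MOTP is the mean matching error over all matched pairs in all frames. The nested-list input and the name `motp` are hypothetical; the sketch assumes the matching errors have already been computed by a matcher.

```python
def motp(matching_errors_per_frame):
    """Average matching error over all matches in all frames.

    matching_errors_per_frame: list with one inner list per frame t,
    holding the matching error B_t(i) of every matched pair i.
    The number of matches C_t is the length of the inner list.
    """
    total_error = sum(sum(frame) for frame in matching_errors_per_frame)
    total_matches = sum(len(frame) for frame in matching_errors_per_frame)
    if total_matches == 0:
        raise ValueError("MOTP is undefined without any matches")
    return total_error / total_matches


# Frame 1 has two matches, frame 2 has one match.
value = motp([[0.1, 0.3], [0.2]])  # (0.1 + 0.3 + 0.2) / 3
```

With a distance as the matching error, lower values indicate better localization. Evaluations that use box overlap as the matching score instead reverse this direction.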
AMOTA (Average Multi-Object Tracking Accuracy): summarizes MOTA over all object confidence thresholds instead of using a single threshold. Similar to mAP for object detection, it evaluates the overall accuracy of the tracking algorithm across all thresholds, which improves the robustness of the evaluation. AMOTA is calculated by integrating MOTA over the recall curve; to simplify the calculation, the integral is approximated by interpolation:

$$\mathrm{AMOTA} = \frac{1}{L} \sum_{r \in \left\{\frac{1}{L}, \frac{2}{L}, \ldots, 1\right\}} \mathrm{MOTA}_r$$

where $L$ is the number of recall values (integration confidence thresholds); the higher $L$ is, the more accurate the approximation of the integral. $\mathrm{MOTA}_r$ is the multi-object tracking accuracy at a specific recall value $r$.
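The interpolation can be sketched as an average of MOTA sampled at L evenly spaced recall values. The callable `mota_at_recall` is a hypothetical stand-in: in a real evaluation, each value would come from rerunning the MOTA computation with the confidence threshold that yields recall r.

```python
def amota(mota_at_recall, num_recall_values):
    """Approximate the integral of MOTA over recall with L sample points.

    mota_at_recall: callable mapping a recall value r in (0, 1] to MOTA_r
    (hypothetical interface).
    num_recall_values: L, the number of evenly spaced recall values.
    """
    if num_recall_values < 1:
        raise ValueError("at least one recall value is required")
    L = num_recall_values
    recalls = [k / L for k in range(1, L + 1)]  # 1/L, 2/L, ..., 1
    return sum(mota_at_recall(r) for r in recalls) / L


# Toy curve in which MOTA falls linearly as recall rises.
approx = amota(lambda r: 1.0 - r, 4)  # (0.75 + 0.5 + 0.25 + 0.0) / 4
```

Increasing `num_recall_values` samples the curve more finely, which matches the statement that a larger L gives a more accurate approximation.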
AMOTP (Average Multi-Object Tracking Precision): calculated in the same way as AMOTA, with recall as the abscissa and MOTP as the ordinate; AMOTP is obtained by interpolation.
IDF1 (ID F1 score): measures how well the predicted IDs agree with the correct IDs, expressed as the F1 score of identity precision and identity recall.
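The text does not give a formula for IDF1. A minimal sketch under the commonly used definition, IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), is shown below. The counts IDTP, IDFP, and IDFN (identity true positives, false positives, and false negatives) do not appear in the text and are assumed here to come from an identity-level matching of predicted and ground-truth trajectories.

```python
def idf1(idtp, idfp, idfn):
    """ID F1 score from identity-level counts (assumed inputs).

    idtp: detections whose predicted ID agrees with the matched true ID.
    idfp: predicted detections with a wrong or unmatched ID.
    idfn: ground-truth detections not covered by the correct ID.
    """
    denominator = 2 * idtp + idfp + idfn
    if denominator == 0:
        return 0.0  # no detections at all; defined as 0 in this sketch
    return 2 * idtp / denominator


score_idf1 = idf1(idtp=8, idfp=2, idfn=2)  # 16 / 20
```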
MT (Mostly Tracked): the percentage of ground-truth trajectories that are successfully tracked for at least 80% of their length.
ML (Mostly Lost): the percentage of ground-truth trajectories that are successfully tracked for at most 20% of their length.
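Both ratios follow from the fraction of frames in which each ground-truth trajectory is tracked. The sketch below assumes these fractions are already available and treats the 80% and 20% boundaries as inclusive; the function name `mostly_tracked_lost` is hypothetical.

```python
def mostly_tracked_lost(tracked_ratios, mt_threshold=0.8, ml_threshold=0.2):
    """Return (MT, ML) as fractions of all ground-truth trajectories.

    tracked_ratios: one value per ground-truth trajectory, giving the
    fraction of its length during which it was successfully tracked.
    """
    n = len(tracked_ratios)
    if n == 0:
        raise ValueError("at least one trajectory is required")
    mt = sum(r >= mt_threshold for r in tracked_ratios) / n
    ml = sum(r <= ml_threshold for r in tracked_ratios) / n
    return mt, ml


# Four trajectories: two mostly tracked, one partially tracked, one mostly lost.
mt, ml = mostly_tracked_lost([0.9, 0.85, 0.5, 0.1])
```

Trajectories between the two thresholds count toward neither metric, so MT and ML do not have to sum to one.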
FM (Fragmentation): evaluates tracking integrity. A fragmentation is counted whenever a trajectory changes its state from tracked to untracked and the same trajectory is tracked again at a later point in time.
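The counting rule for a single ground-truth trajectory can be sketched as follows, assuming a per-frame list of booleans that marks whether the trajectory is tracked. The function name `count_fragmentations` is hypothetical, and the dataset-level FM would be the sum over all trajectories.

```python
def count_fragmentations(tracked_flags):
    """Count interruptions that are followed by a resumption of tracking.

    tracked_flags: per-frame booleans for one ground-truth trajectory,
    True where the trajectory is tracked.
    """
    fragments = 0
    was_tracked = False  # the trajectory has been tracked at least once
    in_gap = False       # currently untracked after having been tracked
    for tracked in tracked_flags:
        if tracked:
            if in_gap:
                fragments += 1  # tracking resumed after an interruption
                in_gap = False
            was_tracked = True
        elif was_tracked:
            in_gap = True
    return fragments


# Tracked, lost for two frames, recovered, lost again, recovered again.
fm = count_fragmentations([True, True, False, False, True, False, True])
```

A trajectory that is lost and never recovered adds nothing to the count, because the rule requires that the same trajectory be tracked again later.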
HOTA (Higher Order Tracking Accuracy): a higher-order metric for evaluating MOT proposed by [29]. Previous metrics overemphasized the importance of either detection or association. HOTA explicitly balances the effects of accurate detection, association, and localization in a single unified metric for comparing trackers. HOTA scores are also more consistent with human visual evaluation.

$$\mathrm{HOTA}_\alpha = \sqrt{\frac{\sum_{c \in \{\mathrm{TP}\}} A(c)}{|\mathrm{TP}| + |\mathrm{FN}| + |\mathrm{FP}|}}, \qquad A(c) = \frac{|\mathrm{TPA}(c)|}{|\mathrm{TPA}(c)| + |\mathrm{FNA}(c)| + |\mathrm{FPA}(c)|}$$

where $\alpha$ is the IoU threshold, and $c$ is a positive sample, that is, a true positive match between a predicted detection and a ground-truth detection. In an object tracking experiment, there are predicted trajectories and ground-truth trajectories. For the pair of trajectories that passes through $c$, the detections in the intersection of the two trajectories are the true positive associations (TPA), the detections of the predicted trajectory outside the intersection are the false positive associations (FPA), and the detections of the ground-truth trajectory outside the intersection are the false negative associations (FNA).
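A minimal sketch of the HOTA score at a single IoU threshold is given below. It assumes that matching has already been done and that, for every true positive match, the TPA, FNA, and FPA counts are available; the function names and the tuple layout are hypothetical.

```python
import math


def association_score(tpa, fna, fpa):
    """A(c) = |TPA| / (|TPA| + |FNA| + |FPA|) for one true positive match c."""
    return tpa / (tpa + fna + fpa)


def hota_alpha(tp_association_counts, num_fn, num_fp):
    """HOTA at one IoU threshold.

    tp_association_counts: list of (TPA, FNA, FPA) counts, one tuple per
    true positive match c. TPA is at least 1 because c counts itself.
    num_fn, num_fp: numbers of false negative and false positive detections.
    """
    num_tp = len(tp_association_counts)
    denominator = num_tp + num_fn + num_fp
    if denominator == 0:
        return 0.0  # nothing to evaluate; defined as 0 in this sketch
    total = sum(association_score(*c) for c in tp_association_counts)
    return math.sqrt(total / denominator)


# Two matches whose trajectories each overlap on half of their detections,
# plus one missed object and one spurious detection.
h = hota_alpha([(1, 1, 0), (1, 1, 0)], num_fn=1, num_fp=1)  # sqrt(1 / 4)
```

The denominator penalizes detection errors while the per-match scores penalize association errors, which is how a single value reflects both aspects.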