Optical Flow Tracking in Visual Odometry
  • Release Date: 2025-04-11
  • feature extraction
  • deep learning for machine vision
  • visual odometry
  • image registration
  • visual slam
Video Introduction

This video is adapted from the article with DOI 10.3390/math13071087.

Visual odometry (VO), which comprises keypoint detection, correspondence establishment, and pose estimation, is a crucial technique for determining motion in machine vision, with significant applications in augmented reality (AR), autonomous driving, and visual simultaneous localization and mapping (SLAM). For feature-based VO, the repeatability of keypoints affects the accuracy of pose estimation. Convolutional neural network (CNN)-based detectors extract high-level features from images and are therefore robust to viewpoint and illumination changes. Compared with descriptor matching, optical flow tracking offers better real-time performance. However, mainstream CNN-based detectors rely on the "joint detection and description" framework to establish matches, making them incompatible with optical flow tracking. To obtain keypoints suitable for optical flow tracking, we propose OFPoint, a self-supervised detector based on transfer learning that jointly computes pixel-level positions and confidences. We use the descriptor-based detector Simple Learned Keypoints (SiLK) as the pre-trained model and fine-tune it to avoid training from scratch. To achieve multi-scale feature fusion in detection, we integrate a multi-scale attention mechanism. Furthermore, we introduce a maximum discriminative probability loss term that enforces the grayscale consistency and local stability of keypoints. OFPoint achieves a balance between accuracy and real-time performance when establishing correspondences on HPatches. We further demonstrate its effectiveness in VO and its potential for graphics applications such as AR.
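To make the pipeline described above concrete, the sketch below shows a minimal feature-based VO front end in which a learned detector's keypoints are propagated between frames by pyramidal Lucas-Kanade optical flow rather than by descriptor matching, and the relative pose is recovered from the essential matrix. This is an illustrative assumption of how such a front end is typically assembled, not the paper's implementation: `detect_keypoints` is a hypothetical stand-in for OFPoint (replaced here by Shi-Tomasi corners so the snippet runs), while the `cv2` calls are standard OpenCV functions.

```python
import cv2
import numpy as np

def detect_keypoints(gray):
    # Placeholder for OFPoint (hypothetical): returns pixel positions as an
    # N x 1 x 2 float32 array. Shi-Tomasi corners keep this sketch runnable.
    pts = cv2.goodFeaturesToTrack(gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=8)
    return pts.astype(np.float32)

def vo_step(prev_gray, curr_gray, K):
    # 1. Keypoint detection on the previous frame.
    p0 = detect_keypoints(prev_gray)
    # 2. Correspondence establishment by pyramidal Lucas-Kanade optical flow
    #    tracking (no descriptor computation or matching required).
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
    ok = status.ravel() == 1
    good0 = p0[ok].reshape(-1, 2)
    good1 = p1[ok].reshape(-1, 2)
    # 3. Relative pose (rotation R, unit-scale translation t) from the
    #    essential matrix with RANSAC outlier rejection.
    E, inliers = cv2.findEssentialMat(good0, good1, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=inliers)
    return R, t
```

Because tracking reuses local grayscale information instead of computing and matching high-dimensional descriptors per frame, the correspondence step is cheaper, which is the real-time advantage the video attributes to optical flow tracking; the quality of the result then hinges on detecting keypoints whose appearance is stable enough to track, which is what OFPoint is designed to provide.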