Visual odometry (VO), which comprises keypoint detection, correspondence establishment, and pose estimation, is a crucial technique for determining motion in machine vision, with significant applications in augmented reality (AR), autonomous driving, and visual simultaneous localization and mapping (SLAM). For feature-based VO, the repeatability of keypoints directly affects the accuracy of pose estimation. Convolutional neural network (CNN)-based detectors extract high-level features from images and are therefore robust to viewpoint and illumination changes. Compared with descriptor matching, optical flow tracking offers better real-time performance. However, mainstream CNN-based detectors rely on the “joint detection and descriptor” framework to establish matches, making them incompatible with optical flow tracking. To obtain keypoints suitable for optical flow tracking, we propose OFPoint, a self-supervised detector based on transfer learning that jointly predicts pixel-level keypoint positions and confidences. We use the descriptor-based detector Simple Learned Keypoints (SiLK) as the pre-trained model and fine-tune it to avoid training from scratch. To achieve multi-scale feature fusion in detection, we integrate a multi-scale attention mechanism. Furthermore, we introduce a maximum discriminative probability loss term, which ensures the grayscale consistency and local stability of keypoints. OFPoint achieves a balance between accuracy and real-time performance when establishing correspondences on HPatches. Additionally, we demonstrate its effectiveness in VO and its potential for graphics applications such as AR.
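To make the detect-then-track pipeline concrete, the following is a minimal sketch, not the authors' OFPoint implementation: a toy CNN head that predicts a per-pixel keypoint confidence map, followed by pyramidal Lucas-Kanade optical flow tracking with OpenCV, with no descriptors computed or matched. The names `ConfidenceHead`, `detect_keypoints`, and `track`, and all layer sizes and thresholds, are illustrative assumptions; the real OFPoint fine-tunes SiLK and adds multi-scale attention and a maximum discriminative probability loss, all omitted here.

```python
# Illustrative sketch of detect-then-track (not the authors' code):
# a CNN confidence head + cv2.calcOpticalFlowPyrLK tracking.
import cv2
import numpy as np
import torch
import torch.nn as nn

class ConfidenceHead(nn.Module):
    """Toy stand-in for a detector that jointly outputs pixel-level
    keypoint confidences (OFPoint's SiLK backbone, multi-scale
    attention, and losses are omitted in this sketch)."""
    def __init__(self, in_ch: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one confidence logit per pixel
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(img))  # (B, 1, H, W) in [0, 1]

def detect_keypoints(gray: np.ndarray, model: nn.Module,
                     conf_thresh: float = 0.5,
                     max_pts: int = 500) -> np.ndarray:
    """Run the confidence head and keep the highest-scoring pixels."""
    with torch.no_grad():
        t = torch.from_numpy(gray).float()[None, None] / 255.0
        conf = model(t)[0, 0].numpy()
    ys, xs = np.where(conf > conf_thresh)
    order = np.argsort(conf[ys, xs])[::-1][:max_pts]
    # calcOpticalFlowPyrLK expects float32 points shaped (N, 1, 2).
    pts = np.stack([xs[order], ys[order]], axis=1).astype(np.float32)
    return pts.reshape(-1, 1, 2)

def track(prev_gray: np.ndarray, next_gray: np.ndarray,
          pts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Track keypoints with pyramidal Lucas-Kanade optical flow;
    only confidences were needed at detection time, no descriptors."""
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    return pts[good], nxt[good]

if __name__ == "__main__":
    model = ConfidenceHead()  # untrained, for demonstration only
    frame0 = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
    frame1 = np.roll(frame0, 2, axis=1)  # fake 2-pixel horizontal motion
    p0 = detect_keypoints(frame0, model)
    src, dst = track(frame0, frame1, p0)
    print(f"tracked {len(dst)}/{len(p0)} keypoints")
```

The tracked correspondences `(src, dst)` would then feed a standard pose-estimation stage (e.g., essential-matrix estimation with RANSAC), which is where keypoint repeatability and local stability pay off.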