Sequential Spacecraft Depth Completion

Recently proposed spacecraft three-dimensional (3D) structure recovery methods based on optical images and LIDAR have extended the working distance of a spacecraft's 3D perception system.

  • depth completion
  • sequential depth completion
  • multi-modal fusion

1. Introduction

The number of satellites in Earth's orbit has surged in recent years. However, as many satellites encounter malfunctions or deplete their fuel reserves, the need for on-orbit maintenance [1] or the recovery of critical components [2] has become imperative. During the execution of on-orbit maintenance tasks, acquiring a target's precise three-dimensional (3D) point cloud data is paramount, as these data play a pivotal role in various aspects of space operations, such as navigation [3], 3D reconstruction, pose estimation [4][5], component identification and localization [6], and decision-making. Consequently, the acquisition of precise 3D point cloud data from a target object has emerged as a critical and fundamental requirement for the successful execution of numerous space missions conducted in the dynamic and challenging space environment.
To date, different sensor options have been proposed to obtain point cloud data efficiently and accurately, and they can be categorized into multi-camera vision systems [7], time-of-flight (TOF) cameras [8], and techniques that combine monocular cameras with LIDAR [9]. Among these, multi-camera solutions use the triangulation principle to recover the depths of extracted feature points, though they struggle with smooth surfaces and repetitive textures. Furthermore, a binocular camera's baseline severely limits such a system's working distance, making it difficult to meet the requirements of on-orbit tasks. Unlike binocular systems, TOF cameras determine depths accurately by measuring the times of flight of laser pulses. Although capable of obtaining precise, high-density depths, TOF cameras generally work at distances of less than 10 m, hindering their use in practical applications. Recently, combined monocular-LIDAR systems have been proposed; they use optical images and sparse ranging information to restore a spacecraft's dense depths. Compared with binocular systems and TOF cameras, combining a monocular camera with LIDAR can effectively increase a system's working distance and reduce its sensitivity to lighting conditions and surface materials, which makes it more suitable for practical applications in space. Therefore, this paper aims to reconstruct a spacecraft's detailed depth using images obtained with an optical camera and sparse depths obtained via LIDAR.
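To make the baseline limitation concrete, the sketch below applies the standard triangulation relation z = f·B/d used by binocular systems, together with its first-order error model, which shows that for a fixed baseline B the depth error caused by a one-pixel disparity error grows quadratically with range. The focal length and baseline values are illustrative assumptions, not parameters of any system discussed here.

```python
# Minimal sketch of stereo triangulation (hypothetical camera parameters).
# Depth follows z = f * B / d, so for a fixed baseline B the same one-pixel
# disparity error corresponds to a much larger depth error at long range.

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from disparity via the standard pinhole/triangulation model."""
    return focal_px * baseline_m / disparity_px

def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_err_px: float = 1.0) -> float:
    """First-order depth uncertainty: dz = z^2 / (f * B) * d_err."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px

if __name__ == "__main__":
    f, B = 1200.0, 0.5          # assumed focal length (px) and baseline (m)
    for z in (5.0, 20.0, 100.0):
        print(f"z = {z:6.1f} m  ->  1 px disparity error gives about "
              f"{depth_error(f, B, z):.2f} m of depth error")
```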
Numerous learning-based depth completion algorithms have been proposed in recent years, tailored to the demands of diverse applications that rely on depth information. Existing methods can be roughly categorized into early and late fusion models, depending on the layers at which the multimodal data are fused. Early fusion models [10][11][12][13] concatenated the visible images and the depth maps directly and fed them into a U-Net-like network to regress dense depths. Late fusion models [9][14][15][16][17] adopted multiple sub-networks to extract the unimodal features contained in the optical images and the LIDAR data separately. The extracted unimodal features were fused through various fusion modules and fed into a decoder to regress dense depths.
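As a minimal illustration of the early fusion idea described above, the following PyTorch-style sketch treats the sparse depth map as a fourth input channel and regresses dense depth with a single encoder-decoder. The class name, layer sizes, and channel counts are illustrative assumptions and do not reproduce any of the cited architectures.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Illustrative early-fusion model (hypothetical): concatenate RGB and
    sparse depth, then regress dense depth with one encoder-decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb: torch.Tensor, sparse_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, sparse_depth], dim=1)   # early fusion: RGB-D as 4 channels
        return self.decoder(self.encoder(x))

# usage: dense = EarlyFusionNet()(torch.rand(1, 3, 128, 128), torch.rand(1, 1, 128, 128))
```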

2. Sequential Spacecraft Depth Completion

LIDAR- and monocular-based depth completion aims to reconstruct pixel-wise depths from sparse ranging information obtained via LIDAR under the guidance of an optical image, and it has received considerable research interest due to its significance in different applications. Early research works [18][19][20][21] generally utilized traditional image-processing techniques (such as bilateral filters [22], global optimization [23], etc.) to generate dense depth maps. More recently, neural networks' powerful feature extraction capabilities have propelled learning-based methods to outperform conventional techniques in both accuracy and efficiency. According to their LIDAR/monocular fusion strategies, learning-based depth completion methods can be roughly classified into early fusion models and late fusion models. Early fusion models [10][11][12][13] treated sparse depths as additional channels and fed the concatenated RGB-D data into a U-Net-like network to predict dense depths. For instance, sparse-to-dense [10] employed a regression network to predict pixel-wise depths with RGB-D data as input. Although the structure of such methods is simple and easy to implement, it is challenging to exhaustively exploit the complementary information of the different modalities due to the lack of adequate guidance, leading to blurry depth predictions. Therefore, various spatial propagation networks (SPNs) [24][25][26][27][28][29] have been introduced to improve the quality of the depth maps derived from early fusion models. Specifically, the SPN [24] learned an affinity matrix that captures pairwise interactions within an image and established a three-way connection to facilitate spatial propagation. The CSPN [25] replaced the three-way connection propagation with recurrent convolution operations, overcoming the limitation that an SPN cannot consider all local neighbors simultaneously. On this basis, more and more variants (such as learning adaptive kernel sizes and adjusting the number of iterations for each pixel [26], applying non-local propagation [27], making use of non-linear propagation [28], etc.) were proposed and yielded better depth completion results.

Late fusion models employed two parallel neural network branches to concurrently extract features from RGB images and depth data. Parallel neural network architectures have seen extensive adoption across diverse image-processing tasks, such as image classification [30], multi-sensor data fusion [31][32], object detection [33], etc. Their widespread usage verifies their versatility and effectiveness in the realm of multi-source data processing. In depth completion tasks, the extracted image features are generally incorporated into the depth features through various finely designed fusion modules and are finally fed into a decoder to generate dense depth information. Specifically, FusionNet [14] adopted 2D and 3D convolutions to extract 2D and 3D features, respectively. The 3D features were then projected into the 2D space. Finally, the composite representations were generated by adding the 2D features and the projected 3D features. Inspired by guided filtering [22], GuideNet [15] proposed a guided unit to predict content-dependent kernels, which were then leveraged to extract depth features. FCFRNet [16] combined RGB-D features by employing channel shuffling and energy-based fusion operations.
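The late fusion designs above share a common two-branch pattern. The following PyTorch-style sketch illustrates that pattern with separate image and depth encoders and a deliberately simple concatenation-plus-1×1-convolution fusion step, which stands in for the guided, energy-based, or attention-based modules used by the cited methods; the class name and layer sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    """Illustrative two-branch late-fusion model (hypothetical): separate
    encoders extract image and depth features, a simple fusion module mixes
    them, and a decoder regresses dense depth."""
    def __init__(self, feat: int = 64):
        super().__init__()
        self.rgb_encoder = nn.Sequential(
            nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, feat, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Simple fusion: concatenate both modalities and mix with a 1x1 conv;
        # published models replace this with guided kernels, attention, etc.
        self.fusion = nn.Conv2d(2 * feat, feat, 1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor, sparse_depth: torch.Tensor) -> torch.Tensor:
        f_rgb = self.rgb_encoder(rgb)
        f_depth = self.depth_encoder(sparse_depth)
        fused = self.fusion(torch.cat([f_rgb, f_depth], dim=1))  # fuse unimodal features
        return self.decoder(fused)
```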
SDCNet [9] proposed an attention-based feature fusion module, facilitating the aggregation of complementary information from diverse inputs. In addition to single-frame depth completion methods, a few works were dedicated to sequential depth completion [34][35][36] tasks. Giang et al. [34] performed feature warping by utilizing the relative poses between frames and incorporated warped features into current features through a confidence-based integration module. Nguyen et al. [35] directly fed the prediction results of FusionNet [14] into recurrent neural networks to investigate temporal information, helping mitigate the mismatch between frames. Moreover, Chen et al. [36] utilized CoarseNet, PoseNet, and DepthNet to predict coarse dense maps, relative poses between frames, and final depth maps, respectively.
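To show how the relative pose between frames can bring previous-frame information into the current view, the sketch below implements generic pose-based backward feature warping with bilinear resampling. It assumes known camera intrinsics and a coarse current-frame depth estimate, and the function name and interfaces are hypothetical; it is a simplified illustration of the warping step rather than the confidence-based pipeline of Giang et al. [34].

```python
import torch
import torch.nn.functional as F

def warp_previous_features(prev_feat: torch.Tensor,
                           cur_depth: torch.Tensor,
                           K: torch.Tensor,
                           T_prev_cur: torch.Tensor) -> torch.Tensor:
    """Backward-warp previous-frame features into the current view (sketch).

    prev_feat:  (B, C, H, W) features extracted from the previous frame.
    cur_depth:  (B, 1, H, W) coarse depth estimate of the current frame.
    K:          (B, 3, 3) camera intrinsics.
    T_prev_cur: (B, 4, 4) relative pose mapping current-frame points into the
                previous camera frame.
    """
    B, _, H, W = prev_feat.shape
    dev = prev_feat.device
    v, u = torch.meshgrid(torch.arange(H, device=dev, dtype=torch.float32),
                          torch.arange(W, device=dev, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1).expand(B, -1, -1)

    # Back-project current pixels to 3D, move them into the previous camera,
    # and re-project to find where to sample the previous features.
    cam = torch.linalg.inv(K) @ pix * cur_depth.reshape(B, 1, -1)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=dev)], dim=1)
    cam_prev = (T_prev_cur @ cam_h)[:, :3]
    proj = K @ cam_prev
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Normalise pixel coordinates to [-1, 1] and resample with bilinear sampling.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(prev_feat, grid, align_corners=True)
```

The warped features can then be merged with the current-frame features, for example through a confidence- or attention-weighted combination, before decoding the dense depth.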

References

  1. Santos, R.; Rade, D.; Fonseca, D. A machine learning strategy for optimal path planning of space robotic manipulator in on-orbit servicing. Acta Astronaut. 2022, 191, 41–54.
  2. Henshaw, C. The darpa phoenix spacecraft servicing program: Overview and plans for risk reduction. In Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (I-SAIRAS), Montreal, QC, Canada, 17–19 June 2014.
  3. Liu, Y.; Xie, Z.; Liu, H. Three-line structured light vision system for non-cooperative satellites in proximity operations. Chin. J. Aeronaut. 2020, 33, 1494–1504.
  4. Guo, J.; He, Y.; Qi, X.; Wu, G.; Hu, Y.; Li, B.; Zhang, J. Real-time measurement and estimation of the 3D geometry and motion parameters for spatially unknown moving targets. Aerosp. Sci. Technol. 2020, 97, 105619.
  5. Liu, X.; Wang, H.; Chen, X.; Chen, W.; Xie, Z. Position Awareness Network for Noncooperative Spacecraft Pose Estimation Based on Point Cloud. IEEE Trans. Aerosp. Electron. Syst. 2022, 59, 507–518.
  6. Wei, Q.; Jiang, Z.; Zhang, H. Robust spacecraft component detection in point clouds. Sensors 2018, 18, 933.
  7. De, J.; Jordaan, H.; Van, D. Experiment for pose estimation of uncooperative space debris using stereo vision. Acta Astronaut. 2020, 168, 164–173.
  8. Jacopo, V.; Andreas, F.; Ulrich, W. Pose tracking of a noncooperative spacecraft during docking maneuvers using a time-of-flight sensor. In Proceedings of the AIAA Guidance, Navigation, and Control Conference (GNC), San Diego, CA, USA, 4–8 January 2016.
  9. Liu, X.; Wang, H.; Yan, Z.; Chen, Y.; Chen, X.; Chen, W. Spacecraft depth completion based on the gray image and the sparse depth map. IEEE Trans. Aerosp. Electron. Syst. 2023, in press.
  10. Ma, F.; Karaman, S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In Proceedings of the International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018.
  11. Imran, S.; Long, Y.; Liu, X.; Morris, D. Depth coefficients for depth completion. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
  12. Teixeira, L.; Oswald, M.; Pollefeys, M.; Chli, M. Aerial single-view depth completion with image-guided uncertainty estimation. IEEE Robot. Autom. Lett. 2020, 5, 1055–1062.
  13. Luo, Z.; Zhang, F.; Fu, G.; Xu, J. Self-Guided Instance-Aware Network for Depth Completion and Enhancement. In Proceedings of the International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021.
  14. Chen, Y.; Yang, B.; Liang, M.; Urtasun, R. Learning joint 2d-3d representations for depth completion. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019.
  15. Tang, J.; Tian, F.; Feng, W.; Li, J.; Tan, P. Learning guided convolutional network for depth completion. IEEE Trans. Image Process. 2020, 30, 1116–1129.
  16. Liu, L.; Song, X.; Lyu, X.; Diao, J.; Wang, M.; Liu, Y.; Zhang, L. Fcfr-net: Feature fusion based coarse-to-fine residual learning for depth completion. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vancouver, BC, Canada, 2–9 February 2021.
  17. Yan, Z.; Wang, K.; Li, X.; Zhang, Z.; Xu, B.; Li, J.; Yang, J. RigNet: Repetitive image guided network for depth completion. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022.
  18. Yang, Q.; Yang, R.; Davis, J.; Nister, D. Spatial-depth super resolution for range images. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 17–22 June 2007.
  19. Kopf, J.; Cohen, M.; Lischinski, D.; Uyttendaele, M. Joint bilateral upsampling. ACM Trans. Graph. 2007, 26, 96–101.
  20. Ferstl, D.; Reinbacher, C.; Ranftl, R.; Ruther, M.; Bischof, H. Image guided depth upsampling using anisotropic total generalized variation. In Proceedings of the International Conference on Computer Vision (ICCV), Sydney, NSW, Australia, 1–8 December 2013.
  21. Barron, J.; Poole, B. The fast bilateral solver. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016.
  22. He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409.
  23. Lee, H.; Soohwan, S.; Sungho, J. 3D reconstruction using a sparse laser scanner and a single camera for outdoor autonomous vehicle. In Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016.
  24. Liu, S.; Mello, D.; Gu, J.; Zhong, G.; Yang, M.; Kautz, J. Learning affinity via spatial propagation networks. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  25. Cheng, X.; Wang, P.; Yang, R. Learning depth with convolutional spatial propagation network. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2361–2379.
  26. Cheng, X.; Wang, P.; Guan, C.; Yang, R. Cspn++: Learning context and resource aware convolutional spatial propagation networks for depth completion. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020.
  27. Park, J.; Joo, K.; Hu, Z.; Liu, C.; So, K. Non-local spatial propagation network for depth completion. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, Scotland, UK, 23–28 August 2020.
  28. Lin, Y.; Cheng, T.; Zhong, Q.; Zhou, W.; Yang, H. Dynamic spatial propagation network for depth completion. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Virtual, 22 February–1 March 2022.
  29. Hu, M.; Wang, S.; Li, B.; Ning, S.; Fan, L.; Gong, X. Penet: Towards precise and efficient image guided depth completion. In Proceedings of the International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021.
  30. Li, Y.; Xu, Q.; Li, W.; Nie, J. Automatic clustering-based two-branch CNN for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7803–7816.
  31. Yang, J.; Zhao, Y.; Chan, J. Hyperspectral and multispectral image fusion via deep two-branches convolutional neural network. Remote Sens. 2018, 10, 800.
  32. Fu, Y.; Wu, X. A dual-branch network for infrared and visible image fusion. In Proceedings of the International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021.
  33. Li, Y.; Xu, Q.; He, Z.; Li, W. Progressive Task-based Universal Network for Raw Infrared Remote Sensing Imagery Ship Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
  34. Giang, K.; Song, S.; Kim, D.; Choi, S. Sequential Depth Completion with Confidence Estimation for 3D Model Reconstruction. IEEE Robot. Autom. Lett. 2020, 6, 327–334.
  35. Nguyen, T.; Yoo, M. Dense-depth-net: A spatial-temporal approach on depth completion task. In Proceedings of the Region 10 Symposium (TENSYMP), Jeju, Korea, 23–25 August 2021.
  36. Chen, Y.; Zhao, S.; Ji, W.; Gong, M.; Xie, L. MetaComp: Learning to Adapt for Online Depth Completion. arXiv 2022, arXiv:2207.10623.