Image-based refined 3D reconstruction relies on high-resolution, multi-angle images of the scene. Multi-rotor drones equipped with gimbals make acquiring such images considerably more convenient.
There is an increasing demand for 3D reconstruction of large scenes in areas such as urban planning, autonomous driving, virtual reality and gaming. By data source, current 3D reconstruction methods can be divided into two types: laser scanning-based and image-based. Generally, laser scanning-based methods are the most costly, and the reconstructed models lack texture. Image-based methods, on the other hand, are less expensive and remain effective even with only a monocular camera. Many fine-grained reconstruction efforts are carried out using image-based methods. These works mainly revolve around SfM (Structure from Motion) and MVS (Multi-View Stereo), observing the target from different perspectives and recovering its 3D information to obtain better reconstruction results. The core idea of image-based 3D reconstruction is the efficient use of the geometric information contained in multi-view images.
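As a minimal illustration of how geometry is recovered from multiple views, the sketch below triangulates one 3D point from two camera centres and their viewing rays toward the same image feature. It uses the simple midpoint method, chosen here only for clarity; production SfM pipelines generally use more robust triangulation and refine the result with bundle adjustment. All function and helper names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between two viewing rays.

    Ray i is c_i + s * d_i, where c_i is the camera centre and d_i the
    viewing direction toward the matched feature (hypothetical inputs).
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays: no baseline, so depth cannot be recovered.
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter of closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of closest point on ray 2
    p1 = _add(c1, _scale(d1, s))
    p2 = _add(c2, _scale(d2, t))
    return _scale(_add(p1, p2), 0.5)
```

With noisy feature matches the two rays rarely intersect exactly, so the method returns the point halfway between them; parallel rays are rejected, which is one reason images taken from sufficiently different angles are needed.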
To obtain high-quality models, current research focuses on two aspects. The first is improving the accuracy and computational performance of reconstruction algorithms. Many mature algorithms and software packages can already achieve high-quality image-based 3D reconstruction: open-source options include Colmap, OpenMVG and VisualSfM, while commercial software includes Context Capture, MetaShape, Reality Capture and Pix4D. In addition, studies such as MVSNet and R-MVSNet have used end-to-end, deep learning-based depth estimation frameworks that estimate depth directly from images to obtain dense 3D point clouds, improving accuracy in scenes with repeated or missing textures and drastic illumination changes. Reconstruction algorithms have thus reached a relatively mature stage. The second aspect is improving the quality of the reconstructed model by acquiring or selecting high-quality images. Unlike algorithm optimization, this applies mainly to the data acquisition phase, so that high-quality images are fed into the reconstruction system. Image quality determines the quality of the reconstructed model, while the number and resolution of the images determine the time cost of reconstruction. Insufficient coverage can cause mismatches between images or holes in the reconstructed model. Conversely, excessively redundant images increase the time and computational cost of both image acquisition and reconstruction, and can even degrade reconstruction quality. Image collection is therefore becoming an increasingly important issue in 3D reconstruction.
UAVs are widely used in image acquisition for 3D reconstruction to capture images from multiple views, that is, from different orientations and positions. However, in practice most UAV flight paths are flown under manual control or in predefined flight modes, which tends to produce a large number of images and thus causes redundancy and long acquisition times. An efficient path planning solution that lets UAVs capture images autonomously while ensuring flight safety and reconstructability is therefore urgently needed. This research focuses on planning UAV photographic viewpoints and paths to achieve high-quality image acquisition and, ultimately, high-quality model reconstruction. A multi-rotor UAV with an RTK (Real-Time Kinematic) module and a gimbal provides the hardware basis for high-quality image collection, ensuring a one-to-one correspondence between captured images and planned viewpoints.
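To illustrate how a planned viewpoint might be turned into UAV heading and gimbal commands, the sketch below computes the yaw and pitch that aim the camera from a viewpoint position at a target point on the scene. The local East-North-Up frame, the yaw convention (degrees clockwise from north), the sign of pitch and the function name are all assumptions; real flight controllers and gimbal SDKs define their own conventions.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (east, north, up) in metres, assumed frame


def gimbal_angles(camera: Point, target: Point) -> Tuple[float, float]:
    """Return (yaw, pitch) in degrees aiming the camera at the target.

    Yaw is measured clockwise from north (+y) in [0, 360); pitch is
    negative when the camera looks down (hypothetical convention).
    """
    dx = target[0] - camera[0]  # east offset
    dy = target[1] - camera[1]  # north offset
    dz = target[2] - camera[2]  # vertical offset
    yaw = math.degrees(math.atan2(dx, dy)) % 360.0
    pitch = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return yaw, pitch
```

For example, a viewpoint 10 m above the origin aimed at a point 10 m to the east gives a yaw of 90 degrees and a pitch of about -45 degrees.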
2. Prior Geometry Proxy
Depending on whether an initial proxy is required, current viewpoint selection methods for UAV path planning fall into two categories: iteratively estimating viewpoints in an unknown environment, and determining viewpoints from an initial coarse model. The former estimates new viewpoints through iterative computation to increase information gain without prior knowledge. For example, one study dynamically estimated 3D bounding boxes of buildings to guide online path planning for scene exploration and building observation, and another estimated building heights within a SLAM framework and captured close-up images to reveal architectural details. This approach, generally referred to as next-best-view planning, struggles to meet the full-coverage requirement of refined reconstruction and depends on the real-time computing power of the UAV.
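A toy sketch of the next-best-view idea on a 2D grid: the map starts mostly unknown, only cells that have already been observed are eligible as the next viewpoint, and each step moves to the candidate whose sensor footprint would reveal the most unknown cells. The square footprint, obstacle-free grid, stopping rule and all names are hypothetical simplifications; practical systems typically estimate gain from a 3D occupancy map using ray casting.

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]


def cells_in_range(center: Cell, radius: int, width: int, height: int) -> Set[Cell]:
    """Grid cells inside a square sensor footprint (Chebyshev radius) of center."""
    cx, cy = center
    return {(x, y)
            for x in range(max(0, cx - radius), min(width, cx + radius + 1))
            for y in range(max(0, cy - radius), min(height, cy + radius + 1))}


def explore_nbv(width: int, height: int, start: Cell, radius: int,
                min_gain: int = 1) -> List[Cell]:
    """Greedy next-best-view exploration of an initially unknown grid.

    Expected information gain of a candidate = number of still-unknown
    cells in its footprint. Requires radius >= 1 to make progress.
    """
    known = cells_in_range(start, radius, width, height)
    path = [start]
    while True:
        best: Optional[Cell] = None
        best_gain = 0
        for cand in sorted(known):  # sorted for deterministic tie-breaking
            gain = len(cells_in_range(cand, radius, width, height) - known)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None or best_gain < min_gain:
            return path  # nothing (or too little) left to discover
        path.append(best)
        known |= cells_in_range(best, radius, width, height)
```

Every step rescans all known cells to re-evaluate their gain, a small-scale view of why online next-best-view planners depend on onboard computing power.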
The latter approach, also called prior proxy-based viewpoint selection and often referred to as explore-then-exploit, requires an initial model, on which analysis and planning are carried out to identify viewpoints that satisfy the reconstruction requirements. The proxy can be existing 3D data of the scene with height information, or a low-precision model obtained from an initial flight. One study used a 2D map of the scene to estimate building heights from their shadows and then generated a 2.5D coarse model. Another took a reconstructed dense point cloud as the initial model and, after preprocessing and quality evaluation of the point cloud, determined viewpoints covering the whole scene. Some studies used a 3D mesh as the proxy and planned photographic viewpoints based on its surface geometry. Considering safety and robustness, this research adopts the latter approach, using a rough proxy of the scene to be reconstructed to plan the flight path. For generality, a triangular mesh is used as the proxy; any 3D data that can be converted to one can also be used, including a DEM (Digital Elevation Model), point cloud, BIM model or 3D mesh of a large scene.
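Since the proxy is a triangular mesh, other data must be converted first. For a DEM stored as a regular height grid, one straightforward conversion, sketched below, splits each grid cell into two triangles to produce a 2.5D mesh. It assumes a uniform cell size and ignores no-data values and georeferencing; the function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def dem_to_mesh(heights: Sequence[Sequence[float]],
                cell_size: float) -> Tuple[List[Vertex], List[Face]]:
    """Triangulate a regular-grid DEM into a 2.5D triangle mesh.

    heights[r][c] is the elevation at grid row r, column c. Each cell is
    split into two triangles wound counter-clockwise when seen from above,
    so face normals point upward (+z).
    """
    rows, cols = len(heights), len(heights[0])
    vertices = [(c * cell_size, r * cell_size, float(heights[r][c]))
                for r in range(rows) for c in range(cols)]
    faces: List[Face] = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i00 = r * cols + c  # (x=c,   y=r)
            i01 = i00 + 1       # (x=c+1, y=r)
            i10 = i00 + cols    # (x=c,   y=r+1)
            i11 = i10 + 1       # (x=c+1, y=r+1)
            faces.append((i00, i01, i11))
            faces.append((i00, i11, i10))
    return vertices, faces
```

A grid of R x C samples yields R*C vertices and 2*(R-1)*(C-1) triangles; a DEM cannot represent overhangs or facades, which is why it only serves as a coarse 2.5D proxy.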
3. Viewpoint Optimization
To meet the requirements of 3D reconstruction while keeping UAV operation efficient, the viewpoint set must be continually optimized so that fewer viewpoints suffice for image acquisition and 3D reconstruction. One study selected viewpoints covering the whole scene while accounting for visibility and self-occlusion, but did not consider the observation of particular parts or limits on the number of viewpoints and flight time. Other studies applied submodular optimization to select candidate viewpoints, considering factors such as the number of viewpoints, camera angle, flight time and obstacle avoidance, aiming to obtain the most information with the fewest viewpoints under the given constraints. Another study applied a reconstruction heuristic to plan the positions and orientations of photographic viewpoints as a continuous optimization problem, aiming for a more accurate and complete reconstruction with fewer images. These methods require many iterations, each of which traverses every viewpoint. In addition, some methods generate a large number of viewpoints, resulting in extremely long optimization times and a tendency to fall into local optima.
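Submodular viewpoint selection of this kind is commonly solved greedily. The sketch below is a generic example, not the formulation of any cited work: each candidate viewpoint sees a set of proxy faces, a face contributes to the objective until it has been seen by `required_views` images (multi-view reconstruction needs at least two), and up to `budget` viewpoints are picked by largest marginal gain. The visibility sets, parameter names and tie-breaking rule are assumptions, and constraints such as flight time and obstacle avoidance are not modelled.

```python
from typing import Dict, FrozenSet, List


def greedy_select(visibility: Dict[str, FrozenSet[int]], budget: int,
                  required_views: int = 2) -> List[str]:
    """Greedy maximization of sum over faces of min(#views seeing face, required_views).

    visibility maps a hypothetical viewpoint id to the ids of the proxy
    faces it observes. The objective is monotone submodular.
    """
    counts: Dict[int, int] = {}  # face id -> number of selected views seeing it

    def gain(view: str) -> int:
        # Marginal gain: faces of this view that still need more observations.
        return sum(1 for face in visibility[view]
                   if counts.get(face, 0) < required_views)

    selected: List[str] = []
    remaining = set(visibility)
    while remaining and len(selected) < budget:
        # Scan every remaining candidate; break ties by smallest id.
        best = min(remaining, key=lambda v: (-gain(v), v))
        if gain(best) == 0:
            break  # no candidate improves coverage any further
        selected.append(best)
        remaining.remove(best)
        for face in visibility[best]:
            counts[face] = counts.get(face, 0) + 1
    return selected
```

Each round re-evaluates the marginal gain of every remaining candidate, which mirrors the cost noted above: with many candidates and many rounds, the number of evaluations grows quickly. For monotone submodular objectives under a cardinality budget, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.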