1000/1000
Hot
Most Recent
There is no comment~
More
In computer vision, pattern recognition, and robotics, point set registration, also known as point cloud registration or scan matching, is the process of finding a spatial transformation (e.g., scaling, rotation and translation) that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model (or coordinate frame), and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging. As a special case, registration of two point sets that only differ by a 3D rotation (i.e., there is no scaling and translation), is called the Wahba Problem and also related to the orthogonal procrustes problem.
The problem may be summarized as follows:[1] Let
The output of a point set registration algorithm is therefore the optimal transformation
where
where
When the correspondences (i.e.,
Given two point sets, rigid registration yields a rigid transformation which maps one point set to the other. A rigid transformation is defined as a transformation that does not change the distance between any two points. Typically such a transformation consists of translation and rotation.[2] In rare cases, the point set may also be mirrored. In robotics and computer vision, rigid registration has the most applications.
Given two point sets, non-rigid registration yields a non-rigid transformation which maps one point set to the other. Non-rigid transformations include affine transformations such as scaling and shear mapping. However, in the context of point set registration, non-rigid registration typically involves nonlinear transformation. If the eigenmodes of variation of the point set are known, the nonlinear transformation may be parametrized by the eigenvalues.[3] A nonlinear transformation may also be parametrized as a thin plate spline.[3][4]
Some approaches to point set registration use algorithms that solve the more general graph matching problem.[1] However, the computational complexity of such methods tend to be high and they are limited to rigid registrations. In this article, we will only consider algorithms for rigid registration, where the transformation is assumed to contain 3D rotations and translations (possibly also including a uniform scaling).
The PCL (Point Cloud Library) is an open-source framework for n-dimensional point cloud and 3D geometry processing. It includes several point registration algorithms.[5]
Correspondence-based methods assume the putative correspondences
In the simplest case, one can assume that all the correspondences are correct, meaning that the points
where
Note that when the scaling factor is 1 and the translation vector is zero, then the optimization recovers the formulation of the Wahba problem. Despite the non-convexity of the optimization (cb.2) due to non-convexity of the set
More recently, Briales and Gonzalez-Jimenez have developed a semidefinite relaxation using Lagrangian duality, for the case where the model set
The least squares formulation (cb.2) is known to perform arbitrarily bad in the presence of outliers. An outlier correspondence is a pair of measurements
where if the
Next, we describe several common paradigms for robust registration.
Maximum consensus seeks to find the largest set of correspondences that are consistent with the generative model (cb.1) for some choice of spatial transformation
where
Although solving consensus maximization exactly is hard, there exist efficient heuristics that perform quite well in practice. One of the most popular heuristics is the Random Sample Consensus (RANSAC) scheme.[16] RANSAC is an iterative hypothesize-and-verity method. At each iteration, the method first randomly samples 3 out of the total number of
To fill the gap between the fast but inexact RANSAC scheme and the exact but exhaustive BnB optimization, recent researches have developed deterministic approximate methods to solve consensus maximization.[11][12][13][17]
Outlier removal methods seek to pre-process the set of highly corrupted correspondences before estimating the spatial transformation. The motivation of outlier removal is to significantly reduce the number of outlier correspondences, while maintaining inlier correspondences, so that optimization over the transformation becomes easier and more efficient (e.g., RANSAC works poorly when the outlier ratio is above
Parra et al. have proposed a method called Guaranteed Outlier Removal (GORE) that uses geometric constraints to prune outlier correspondences while guaranteeing to preserve inlier correspondences.[10] GORE has been shown to be able to drastically reduce the outlier ratio, which can significantly boost the performance of consensus maximization using RANSAC or BnB. Yang and Carlone have proposed to build pairwise translation-and-rotation-invariant measurements (TRIMs) from the original set of measurements and embed TRIMs as the edges of a graph whose nodes are the 3D points. Since inliers are pairwise consistent in terms of the scale, they must form a clique within the graph. Therefore, using efficient algorithms for computing the maximum clique of a graph can find the inliers and effectively prune the outliers.[18] The maximum clique based outlier removal method is also shown to be quite useful in real-world point set registration problems.[9] Similar outlier removal ideas were also proposed by Parra et al..[19]
M-estimation replaces the least squares objective function in (cb.2) with a robust cost function that is less sensitive to outliers. Formally, M-estimation seeks to solve the following problem:
where
Graduated non-convexity (GNC) is a general-purpose framework for solving non-convex optimization problems without initialization. It has achieved success in early vision and machine learning applications.[25][26] The key idea behind GNC is to solve the hard non-convex problem by starting from an easy convex problem. Specifically, for a given robust cost function
Black and Rangarajan proved that the objective function of each optimization (cb.6) can be dualized into a sum of weighted least squares and a so-called outlier process function on the weights that determine the confidence of the optimization in each pair of measurements.[25] Using Black-Rangarajan duality and GNC tailored for the Geman-McClure function, Zhou et al. developed the fast global registration algorithm that is robust against about
Almost none of the robust registration algorithms mentioned above (except the BnB algorithm that runs in exponential-time in the worst case) comes with performance guarantees, which means that these algorithms can return completely incorrect estimates without notice. Therefore, these algorithms are undesirable for safety-critical applications like autonomous driving.
Very recently, Yang et al. has developed the first certifiably robust registration algorithm, named Truncated least squares Estimation And SEmidefinite Relaxation (TEASER).[9] For point cloud registration, TEASER not only outputs an estimate of the transformation, but also quantifies the optimality of the given estimate. TEASER adopts the following truncated least squares (TLS) estimator:
which is obtained by choosing the TLS robust cost function
However, solving (cb.7) is quite challenging due to its combinatorial nature. TEASER solves (cb.7) as follows : (i) It builds invariant measurements such that the estimation of scale, rotation and translation can be decoupled and solved separately, a strategy that is inspired by the original Horn's method; (ii) The same TLS estimation is applied for each of the three sub-problems, where the scale TLS problem can be solved exactly using an algorithm called adaptive voting, the rotation TLS problem can relaxed to a semidefinite program (SDP) where the relaxation is exact in practice,[22] even with large amount of outliers; the translation TLS problem can solved using component-wise adaptive voting. A fast implementation leveraging GNC is open-sourced here. In practice, TEASER can tolerate more than
In addition to developing TEASER, Yang et al. also prove that, under some mild conditions on the point cloud data, TEASER's estimated transformation has bounded errors from the ground-truth transformation.[9]
The iterative closest point (ICP) algorithm was introduced by Besl and McKay.[28] The algorithm performs rigid registration in an iterative fashion by alternating in (i) given the transformation, finding the closest point in
algorithm ICP(M, S) θ := θ0 while not registered: X := ∅ for mi ∊ T(M, θ): ŝj := closest point in S to mi X := X + ⟨mi, ŝj⟩ θ := least_squares(X) return θ
Here, the function least_squares
performs least squares optimization to minimize the distance in each of the
Because the cost function of registration depends on finding the closest point in
Robust point matching (RPM) was introduced by Gold et al.[31] The method performs registration using deterministic annealing and soft assignment of correspondences between point sets. Whereas in ICP the correspondence generated by the nearest-neighbour heuristic is binary, RPM uses a soft correspondence where the correspondence between any two points can be anywhere from 0 to 1, although it ultimately converges to either 0 or 1. The correspondences found in RPM is always one-to-one, which is not always the case in ICP.[4] Let
The problem is then defined as: Given two point sets
The matrix
subject to
for some regularization parameter
The RPM method optimizes the cost function using the Softassign algorithm. The 1D case will be derived here. Given a set of variables
this is known as the softmax function. As
where
This is straightforward, except that now the constraints on
algorithm RPM2D[math]\displaystyle{ (\mathcal{M}, \mathcal{S}) }[/math] t := 0 a, θ b, c := 0 β := β0 [math]\displaystyle{ \hat{\mu}_{ij} := 1 + \epsilon }[/math] while β < βf: while μ has not converged: // update correspondence parameters by softassign [math]\displaystyle{ Q_{ij} := -\frac{\partial \operatorname{cost}}{\partial \mu_{ij}} }[/math] [math]\displaystyle{ \mu^0_{ij} := \exp(\beta Q_{ij}) }[/math] // apply Sinkhorn's method while [math]\displaystyle{ \hat{\mu} }[/math] has not converged: // update [math]\displaystyle{ \hat{\mu} }[/math] by normalizing across all rows: [math]\displaystyle{ \hat{\mu}^1_{ij} := \frac{\hat{\mu}^0_{ij}}{\sum_{i=1}^{M+1} \hat{\mu}^0_{ij}} }[/math] // update [math]\displaystyle{ \hat{\mu} }[/math] by normalizing across all columns: [math]\displaystyle{ \hat{\mu}^0_{ij} := \frac{\hat{\mu}^1_{ij}}{\sum_{j=1}^{N+1} \hat{\mu}^1_{ij}} }[/math] // update pose parameters by coordinate descent update θ using analytical solution update t using analytical solution update a, b, c using Newton's method [math]\displaystyle{ \beta := \beta_r \beta }[/math] [math]\displaystyle{ \gamma := \frac{\gamma}{\beta_r} }[/math] return a, b, c, θ and t
where the deterministic annealing control parameter
The algorithm can also be extended for point sets in 3D or higher dimensions. The constraints on the correspondence matrix
The thin plate spline robust point matching (TPS-RPM) algorithm by Chui and Rangarajan augments the RPM method to perform non-rigid registration by parametrizing the transformation as a thin plate spline.[4] However, because the thin plate spline parametrization only exists in three dimensions, the method cannot be extended to problems involving four or more dimensions.
The kernel correlation (KC) approach of point set registration was introduced by Tsin and Kanade.[29] Compared with ICP, the KC algorithm is more robust against noisy data. Unlike ICP, where, for every model point, only the closest scene point is considered, here every scene point affects every model point.[29] As such this is a multiply-linked registration algorithm. For some kernel function
The kernel function
The logarithm of KC of a point set is proportional, within a constant factor, to the information entropy. Observe that the KC is a measure of a "compactness" of the point set—trivially, if all points in the point set were at the same location, the KC would evaluate to a large value. The cost function of the point set registration algorithm for some transformation parameter
Some algebraic manipulation yields:
The expression is simplified by observing that
The kernel density estimates are defined as:
The cost function can then be shown to be the correlation of the two kernel density estimates:
Having established the cost function, the algorithm simply uses gradient descent to find the optimal transformation. It is computationally expensive to compute the cost function from scratch on every iteration, so a discrete version of the cost function Equation (kc.6) is used. The kernel density estimates
Compared to ICP and EM-ICP for noisy 2D and 3D point sets, the KC algorithm is less sensitive to noise and results in correct registration more often.[29]
The kernel density estimates are sums of Gaussians and may therefore be represented as Gaussian mixture models (GMM).[32] Jian and Vemuri use the GMM version of the KC registration algorithm to perform non-rigid registration parametrized by thin plate splines.
Coherent point drift (CPD) was introduced by Myronenko and Song.[3][33] The algorithm takes a probabilistic approach to aligning point sets, similar to the GMM KC method. Unlike earlier approaches to non-rigid registration which assume a thin plate spline transformation model, CPD is agnostic with regard to the transformation model used. The point set
Let there be M points in
where, in D dimensions,
The membership probabilities
The GMM centroids are re-parametrized by a set of parameters
where it is assumed that the data is independent and identically distributed. The correspondence probability between two points
The expectation maximization (EM) algorithm is used to find
Ignoring constants independent of
where
with
Minimizing the cost function in Equation (cpd.5) necessarily decreases the negative log-likelihood function E in Equation (cpd.3) unless it is already at a local minimum.[3] Thus, the algorithm can be expressed using the following pseudocode, where the point sets
algorithm CPD[math]\displaystyle{ (\mathcal{M}, \mathcal{S}) }[/math] θ := θ0 initialize 0 ≤ w ≤ 1 [math]\displaystyle{ \sigma^2 := \frac{1}{DNM}\sum_{j=1}^N \sum_{i=1}^M \lVert s_j - m_i \rVert^2 }[/math] while not registered: // E-step, compute P for i ∊ [1, M] and j ∊ [1, N]: [math]\displaystyle{ p_{ij} := \frac {\exp \left( -\frac{1}{2\sigma^2} \lVert s_j - T(m_i, \theta)\rVert^2 \right)} {\sum_{k=1}^{M} \exp \left( -\frac{1}{2\sigma^2} \lVert s_j - T(m_k, \theta)\rVert^2 \right) + (2\pi \sigma^2)^\frac{D}{2} \frac{w}{1-w} \frac{M}{N}} }[/math] // M-step, solve for optimal transformation {θ, σ2} := solve(S, M, P) return θ
where the vector solve
function differs by the type of registration performed. For example, in rigid registration, the output is a scale a, a rotation matrix
which is initialized to one, the identity matrix, and a column vector of zeroes:
The aligned point set is:
The solve_rigid
function for rigid registration can then be written as follows, with derivation of the algebra explained in Myronenko's 2010 paper.[3]
solve_rigid(S, M, P) NP := 1TP1 [math]\displaystyle{ \mu_s:=\frac{1}{N_\mathbf{P}}\mathbf{S}^T\mathbf{P}^T\mathbf{1} }[/math] [math]\displaystyle{ \mu_m:=\frac{1}{N_\mathbf{P}}\mathbf{M}^T\mathbf{P}\mathbf{1} }[/math] [math]\displaystyle{ \hat{\mathbf{S}}:=\mathbf{S} - \mathbf{1}\mu_s^T }[/math] [math]\displaystyle{ \hat{\mathbf{M}}:=\mathbf{M} - \mathbf{1}\mu_m^T }[/math] [math]\displaystyle{ \mathbf{A}:=\hat{\mathbf{S}^T}\mathbf{P}^T\hat{\mathbf{M}} }[/math] U, V := svd(A) // the singular value decomposition of A = UΣVT C := diag(1, …, 1, det(UVT)) // diag(ξ)is the diagonal matrix formed from vector ξ R := UCVT [math]\displaystyle{ a := \frac{\operatorname{tr}(\mathbf{A}^T\mathbf{R})}{\operatorname{tr}(\mathbf{\hat{\mathbf{M}}^T \operatorname{diag}(\mathbf{P}\mathbf{1})\hat{\mathbf{M}}})} }[/math] // tr is the trace of a matrix t := μs − aRμm [math]\displaystyle{ \sigma^2:=\frac{1}{N_\mathbf{P} D}(\operatorname{tr}(\mathbf{\hat{\mathbf{S}}^T \operatorname{diag}(\mathbf{P}^T\mathbf{1})\hat{\mathbf{S}}})-a\operatorname{tr}(\mathbf{A}^T\mathbf{R})) }[/math] return {a, R, t}, σ2
For affine registration, where the goal is to find an affine transformation instead of a rigid one, the output is an affine transformation matrix
The solve_affine
function for rigid registration can then be written as follows, with derivation of the algebra explained in Myronenko's 2010 paper.[3]
solve_affine(S, M, P) NP := 1TP1 [math]\displaystyle{ \mu_s := \frac{1}{N_\mathbf{P}}\mathbf{S}^T\mathbf{P}^T\mathbf{1} }[/math] [math]\displaystyle{ \mu_m := \frac{1}{N_\mathbf{P}}\mathbf{M}^T\mathbf{P}\mathbf{1} }[/math] [math]\displaystyle{ \hat{\mathbf{S}} := \mathbf{S} - \mathbf{1}\mu_s^T }[/math] [math]\displaystyle{ \hat{\mathbf{M}} := \mathbf{M} - \mathbf{1}\mu_s^T }[/math] [math]\displaystyle{ \mathbf{B} := (\hat{\mathbf{S} }^T\mathbf{P}^T\hat{\mathbf{M} })(\hat{\mathbf{M} }^T\operatorname{diag}(\mathbf{P}\mathbf{1})\hat{\mathbf{M} })^{-1} }[/math] t := μs − Bμm [math]\displaystyle{ \sigma^2 := \frac{1}{N_\mathbf{P} D}(\operatorname{tr}(\hat{\mathbf{S} }^T \operatorname{diag}(\mathbf{P}^T\mathbf{1})\hat{\mathbf{S} })-\operatorname{tr}(\hat{\mathbf{S} }^T\mathbf{P}^T\hat{\mathbf{M} }\mathbf{B}^T)) }[/math] return {B, t}, σ2
It is also possible to use CPD with non-rigid registration using a parametrization derived using calculus of variations.[3]
Sums of Gaussian distributions can be computed in linear time using the fast Gauss transform (FGT).[3] Consequently, the time complexity of CPD is
A variant of coherent point drift, called Bayesian coherent point drift (BCPD), was derived through a Bayesian formulation of point set registration. [34] BCPD has several advantages over CPD, e.g., (1) nonrigid and rigid registrations can be performed in a single algorithm, (2) the algorithm can be accelerated regardless of the Gaussianity of a Gram matrix to define motion coherence, (3) the algorithm is more robust against outliers because of a more reasonable definition of an outlier distribution. Additionally, in the Bayesian formulation, motion coherence was introduced through a prior distribution of displacement vectors, providing a clear difference between tuning parameters that control motion coherence.
This algorithm was introduced in 2013 by H. Assalih to accommodate sonar image registration.[35] These types of images tend to have high amounts of noise, so it is expected to have many outliers in the point sets to match. SCS delivers high robustness against outliers and can surpass ICP and CPD performance in the presence of outliers. SCS doesn't use iterative optimization in high dimensional space and is neither probabilistic nor spectral. SCS can match rigid and non-rigid transformations, and performs best when the target transformation is between three and six degrees of freedom.