N-view triangulation: DLT methods 2 and 3 (2).

multiple-view-geometry triangulation

Continuing with DLT methods for N-view triangulation, I was most familiar with the triangulation methods presented in Hartley and Zisserman’s Multiple View Geometry in Computer Vision book (second edition) MVG2004. There, in Section 12.2 a homogenous and inhomogenous method are presented. Going back to the papers, those methods are also described in Hartley and Sturm’s paper titled, ‘Triangulation’ HS1997, where

Linear-Eigen Method \(\mapsto\) Homogenous method (DLT) from section 12.2 of MVG2004

and

Linear-LS Method \(\mapsto\) inhomogenous method from section 12.2 of MVG2004.

In the paper HS1997, there is an interesting discussion about under which conditions either of these methods could be used (section 5.1). The 1997 paper references as the source for the DLT method an earlier paper, “Stereo from uncalibrated cameras,” by Richard Hartley, Rajiv Gupta, and Tom Chang that was at CVPR 1992 HGT1992, the relevant section is 2.2, on page 4. A 4.5-page CVPR paper, it can happen!

Formulation

Ok, after all of that history, here’s the essentials of the method.

Let

  • \(\mathbf{P}\): \(3 \times 4\) camera calibration matrix.
  • \(\mathbf{P}^{iT}\): \(1 \times 4\) \(i\)th row of \(\mathbf{P}\).
  • \(\mathbf{X}\) (world point): column vector, size \(4\).
  • \(\mathbf{x} = \begin{bmatrix} u\\ v\\ 1\end{bmatrix}\) (image point): column vector, size \(3\).
  • \(k\) is the unknown scale factor of image points, \(\mathbf{x} = k\begin{bmatrix} u\\ v\\ 1\end{bmatrix}\).

Like on Page 1, the projection of world point \(\mathbf{X}\) to the image point \(\mathbf{x}\) is \(\mathbf{x} = \mathbf{PX}\). Substituting in the unknown scale factor, and multiplying everything out, we have,

\[\mathbf{x} = \mathbf{PX}\] \[k\begin{bmatrix} u\\ v\\ 1\end{bmatrix} = \mathbf{PX}\] \[ku = \mathbf{P}^{1T}\mathbf{X}\] \[kv = \mathbf{P}^{2T}\mathbf{X}\] \[k = \mathbf{P}^{3T}\mathbf{X}\]

From here, we do a little bit of algebra, \(k\) is in common amoung the three equations, so for instance

\[k = \mathbf{P}^{3T}\mathbf{X} = \frac{\mathbf{P}^{1T}\mathbf{X}}{u}\]

and then we can eliminate \(k\),

\[u\mathbf{P}^{3T}\mathbf{X} = \mathbf{P}^{1T}\mathbf{X}\]

or

\[u\mathbf{P}^{3T}\mathbf{X} - \mathbf{P}^{1T}\mathbf{X} = 0\]

and similarly,

\[v\mathbf{P}^{3T}\mathbf{X} - \mathbf{P}^{2T}\mathbf{X} = 0\]

So for N-view triangulation, you’ll use \(n\) of these image points and camera calibration matrices to create a \((2n) \times 4\) matrix we’ll call \(\mathbf{A}\),

\[\mathbf{A} = \begin{bmatrix} u_1\mathbf{P}_1^{3T} - \mathbf{P}_1^{1T} \\ v_1\mathbf{P}_1^{3T} - \mathbf{P}_1^{2T} \\ u_2\mathbf{P}_2^{3T} - \mathbf{P}_2^{1T} \\ v_2\mathbf{P}_2^{3T} - \mathbf{P}_2^{2T} \\ \vdots \\ u_n\mathbf{P}_n^{3T} - \mathbf{P}_n^{1T} \\ v_n\mathbf{P}_n^{3T} - \mathbf{P}_n^{2T} \end{bmatrix}\]

And multiplying this \(\mathbf{A}\) matrix by the world point should be zero \(\mathbf{A}\mathbf{X} = \mathbf{0}\), from the equations where we eliminated \(k\).

Homogenous method (DLT 2 – Linear-Eigen)

The homogenous method is simple from here – use SVD, the last column of the V matrix from SVD is the solution that minimizes \(||\mathbf{A}\mathbf{X}||\) with \(||\mathbf{X}|| = 1\). Like from Page 1 DLT method 1, you’ll need to normalize the solution such that the fourth element of \(\mathbf{X}\) is 1.

\[\mathbf{X} = \begin{bmatrix} X\\ Y\\ Z\\ 1 \end{bmatrix}\]

Inhomogenous method (DLT 3 – Linear-LS)

The inhomogenous method seems not as popular, and it took me a minute to figure out the derivation, which is mostly presented descriptively in text.

Starting from the essentials section and the matrix \(\mathbf{A}\), the assumption is that the world point is inhomogenous, or the last value is set to 1.

\[\begin{bmatrix} u_1\mathbf{P}_1^{3T} - \mathbf{P}_1^{1T} \\ v_1\mathbf{P}_1^{3T} - \mathbf{P}_1^{2T} \\ u_2\mathbf{P}_2^{3T} - \mathbf{P}_2^{1T} \\ v_2\mathbf{P}_2^{3T} - \mathbf{P}_2^{2T} \\ \vdots \\ u_n\mathbf{P}_n^{3T} - \mathbf{P}_n^{1T} \\ v_n\mathbf{P}_n^{3T} - \mathbf{P}_n^{2T} \end{bmatrix} \begin{bmatrix} X_w\\ Y_w\\ Z_w\\ 1\end{bmatrix} = \mathbf{0}\]

(there are too many x’s when working with multiple-vew geometry … \(X_w\) is the world x-coordinate.)

Let \(\mathbf{A}_{1:3}\) be a submatrix of matrix A, the first three columns, and \(\mathbf{A}_{4}\) be another submatrix, the last column of \(\mathbf{A}\). Multiplying everything out, we can represent the problem as

\[\mathbf{A}_{1:3} \begin{bmatrix} X_w\\ Y_w\\ Z_w \end{bmatrix} + \mathbf{A}_{4} = 0\]

This is starting to look like a least squares problem! Ok, rearranging,

\[\mathbf{A}_{1:3} \begin{bmatrix} X_w\\ Y_w\\ Z_w \end{bmatrix} = -\mathbf{A}_{4}\] \[\begin{bmatrix} X_w\\ Y_w\\ Z_w \end{bmatrix} = -(\mathbf{A}_{1:3}^T\mathbf{A}_{1:3})^{-1}\mathbf{A}_{1:3}^T\mathbf{A}_{4}\]

EDIT: The above method uses the normal equations, and isn’t the optimal method for solving linear least squares minimization. See this TIL for details.

There is a discussion of normalization relevant to this method on Page 3.

Listing of N-view Triangulation posts.

References

[HS1997] Richard Hartley and Peter Sturm, “Triangulation,” 1997. Computer Vision and Image Understanding, vol. 68, issue 2, pages 146–157. doi 10.1006/cviu.1997.0547. link.

[HGC1992] Richard Hartley, Rajiv Gupta, and Tom Chang, “Stereo from uncalibrated cameras,” 1992. CVPR 1992. doi 10.1109/CVPR.1992.223179. link.

[MVG2004] Richard Hartley and Andrew Zisserman, Multiple View Geometry in Computer Vision, 2nd edition. 2004. Cambridge University Press, ISBN: 0521540518. link.

© Amy Tabb 2018 - 2023. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.