Linear Least Squares with Equality Constraints: the Lagrangian Method
27 Feb 2022

This post is an annotated version of a Linear Least Squares with Equality constraints (LSE for short) solution from Matrix Computations, Gene H. Golub and Charles F. Van Loan, 4th edition, 2013, ISBN 9781421407944.
I wasn’t able to find this solution on the web, so I decided to write a post where I’ll provide some extra detail and implementation snippets for the Golub and Van Loan solution in section 6.2.4, p. 316, “LSE solution using the augmented system,” a Lagrange multiplier method. (Note that the text provides other ways to solve this problem, but I am familiar with the method of Lagrange multipliers, so that’s where I started.)
Problem definition
First, the LSE problem is a constrained least squares problem in the following form:
\[\min_{Bx = d} \| Ax - b \|_2\]
where
- \(A \in \mathbb{R}^{m_1 \times n}\), \(m_1 \geq n\),
- \(B \in \mathbb{R}^{m_2 \times n}\), \(m_2 < n\),
- \(b \in \mathbb{R}^{m_1}\),
- \(d \in \mathbb{R}^{m_2}\).
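To make the shapes concrete, here is a small example of my own (not from the text): fit a line \(y(t) = x_1 + x_2 t\) to \(m_1\) data points \((t_i, y_i)\) in the least squares sense, while forcing the line to pass exactly through a fixed point \((t_0, y_0)\). This is an LSE problem with \(n = 2\) and \(m_2 = 1\):
\[A = \begin{bmatrix} 1 & t_1\\ \vdots & \vdots\\ 1 & t_{m_1} \end{bmatrix}, \quad b = \begin{bmatrix} y_1\\ \vdots\\ y_{m_1} \end{bmatrix}, \quad B = \begin{bmatrix} 1 & t_0 \end{bmatrix}, \quad d = \begin{bmatrix} y_0 \end{bmatrix}\]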
The solution to this problem in section 6.2.4 of Golub and Van Loan uses Lagrange multipliers. Form the Lagrangian function,
\[f(x, \lambda) = \frac{1}{2} \| Ax - b \|^2_2 + \lambda^T(d - Bx) \label{l-function}\]
Note: in the method of Lagrange multipliers, the function to minimize is \(\| Ax - b \|_2\), and the constraint function is \(g(x) = Bx - d\) or, equivalently, \(g(x) = d - Bx\), with \(g(x) = 0\) in the traditional setup. The Lagrangian uses \(\frac{1}{2} \| Ax - b \|^2_2\) instead, which has the same minimizer and is easier to differentiate.
Expanding Eq. \eqref{l-function}, we have
\[f(x, \lambda) = \frac{1}{2} x^{T} A^TAx - x^TA^Tb + \frac{1}{2} b^Tb - \lambda^TBx + \lambda^Td \label{l-function-expanded}\]
The next step is to differentiate \(f(x, \lambda)\) with respect to \(x\) and set it to zero. Using the identities \(\nabla_x (\frac{1}{2} x^T M x) = Mx\) for symmetric \(M\) and \(\nabla_x (c^T x) = c\), this gives
\[\frac{\partial f}{\partial x} = A^TAx - A^Tb - B^T\lambda = 0 \label{dl-function}\]
Differentiating with respect to \(\lambda\) and setting the result to zero recovers the constraint \(d - Bx = 0\), i.e. \(g(x) = 0\). So at an extremizer (minimizer or maximizer), the stationarity condition Eq. \eqref{dl-function} holds together with the constraint. Our constraints are:
- \[A^TAx - A^Tb - B^T\lambda = 0 \label{list-constraint0}\]
- \[Bx = d \label{list-constraint1}\]
From there, Golub and Van Loan introduce the residual \(r = b - Ax\) and combine it with the two constraints above to get a linear system,
\[\begin{bmatrix} 0 & A^T & B^T\\ A & I & 0\\ B & 0 & 0 \end{bmatrix} \begin{bmatrix} x\\ r\\ \lambda \end{bmatrix} = \begin{bmatrix} 0\\ b\\ d \end{bmatrix} \label{linear-sys1}\]
An important note: the coefficient matrix in Eq. \eqref{linear-sys1} is nonsingular when \(A\) has full column rank and \(B\) has full row rank.
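A side note of my own, not from this part of the text: if we use \(r = b - Ax\) to eliminate \(r\) from the system above, we recover the smaller KKT system familiar from constrained optimization (cf. [NW06]):
\[\begin{bmatrix} A^TA & -B^T\\ B & 0 \end{bmatrix} \begin{bmatrix} x\\ \lambda \end{bmatrix} = \begin{bmatrix} A^Tb\\ d \end{bmatrix}\]
The augmented form in Eq. \eqref{linear-sys1} avoids forming \(A^TA\) explicitly, which matters numerically when \(A\) is ill-conditioned.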
Annotating the step from the constraints to the linear system
If you’re happy with that jump from Eqs. \eqref{list-constraint0}–\eqref{list-constraint1} to Eq. \eqref{linear-sys1}, no need to read any further. It took me a minute, so here it is in detail. The three equations are:
\[A^TAx - A^Tb - B^T\lambda = 0 \label{con1}\]
\[Bx = d \label{con2}\]
\[r = b - Ax \label{con3}\]
First, multiply the first constraint by \(-1\) on both sides:
\[-A^TAx + A^Tb + B^T\lambda = 0 \label{con1-neg}\]
Eq. \eqref{con1-neg} can be simplified using Eq. \eqref{con3}:
\[\begin{align} A^T(b - Ax) + B^T\lambda &= 0 \\ A^Tr + B^T\lambda &= 0 \label{line1} \end{align}\]
The first line of the linear system in Eq. \eqref{linear-sys1} is exactly Eq. \eqref{line1}. The second line comes from \(r = b - Ax\), which can be rearranged to \(Ax + Ir = b\) (where \(I\) is the \(m_1 \times m_1\) identity matrix).
The third line of Eq. \eqref{linear-sys1} is the \(Bx = d\) constraint, Eq. \eqref{con2}.
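Putting the three rows together, multiplying out the block system reproduces all three equations at once:
\[\begin{bmatrix} 0 & A^T & B^T\\ A & I & 0\\ B & 0 & 0 \end{bmatrix} \begin{bmatrix} x\\ r\\ \lambda \end{bmatrix} = \begin{bmatrix} A^Tr + B^T\lambda\\ Ax + r\\ Bx \end{bmatrix} = \begin{bmatrix} 0\\ b\\ d \end{bmatrix}\]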
Implementation details
First, I’ll mark up the linear system of equations with the dimensions on the top and side:
\[\begin{array}{c c c c c} & \begin{array}{c c c} n & m_1 & m_2 \end{array} \\ \begin{array}{c} n \\ m_1\\ m_2 \end{array} & \left[ \begin{array}{c c c} 0 & A^T & B^T \\ A & I & 0 \\ B & 0 & 0 \end{array} \right] & \left[ \begin{array}{c} x \\ r\\ \lambda \end{array} \right] & = & \left[ \begin{array}{c} 0\\ b\\ d \end{array} \right] \end{array}\]
Given this, the following C++ snippet gives an outline of the code to create the constraint matrix and solve the system (using Eigen, and Eigen’s dynamic submatrix access via block(), described here).
#include <Eigen/Dense>
using namespace Eigen;

// Dimensions m1, n, m2 are known from the data (m1 >= n, m2 < n).
MatrixXd A(m1, n);
MatrixXd b(m1, 1);
MatrixXd B(m2, n);
MatrixXd d(m2, 1);
A.setZero(); B.setZero();
// fill in A, b, B, d values from data
...
...
// construct the (n + m1 + m2) x (n + m1 + m2) constraint matrix -- LHS is 'left-hand side'.
MatrixXd LHS(n + m1 + m2, n + m1 + m2);
LHS.setZero();
// RHS is 'right-hand side'
MatrixXd RHS(n + m1 + m2, 1); RHS.setZero();
// first block row: [0  A^T  B^T]
LHS.block(0, n, n, m1) = A.transpose();
LHS.block(0, n + m1, n, m2) = B.transpose();
// second block row: [A  I  0]
LHS.block(n, 0, m1, n) = A;
LHS.block(n, n, m1, m1) = MatrixXd::Identity(m1, m1);
// third block row: [B  0  0]
LHS.block(n + m1, 0, m2, n) = B;
RHS.block(n, 0, m1, 1) = b;
RHS.block(n + m1, 0, m2, 1) = d;
// solve the square system; the solution stacks x (n), r (m1), lambda (m2)
JacobiSVD<MatrixXd> svd(LHS, ComputeThinU | ComputeThinV);
MatrixXd resultVector = svd.solve(RHS);
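The solution \(x\) is the first \(n\) entries of resultVector, followed by \(r\) (\(m_1\) entries) and \(\lambda\) (\(m_2\) entries). A note on the solver: since the augmented matrix is square and (under the rank assumptions above) nonsingular, a direct factorization such as Eigen’s FullPivLU would also work and is typically cheaper than the SVD. Here is a minimal self-contained sketch, using made-up data from the line-through-a-point example above (my example, not from the text):
#include <Eigen/Dense>
#include <iostream>
using namespace Eigen;

int main() {
  const int m1 = 4, n = 2, m2 = 1;
  // Fit y = x1 + x2 * t to four points, constrained to pass through (t0, y0) = (0, 1).
  MatrixXd A(m1, n);
  VectorXd b(m1);
  A << 1, 0,
       1, 1,
       1, 2,
       1, 3;
  b << 1.1, 1.9, 3.2, 3.9;
  MatrixXd B(m2, n);
  VectorXd d(m2);
  B << 1, 0;  // row [1, t0] with t0 = 0
  d << 1.0;   // y0 = 1
  // Assemble the augmented system exactly as in the snippet above.
  MatrixXd LHS = MatrixXd::Zero(n + m1 + m2, n + m1 + m2);
  VectorXd RHS = VectorXd::Zero(n + m1 + m2);
  LHS.block(0, n, n, m1) = A.transpose();
  LHS.block(0, n + m1, n, m2) = B.transpose();
  LHS.block(n, 0, m1, n) = A;
  LHS.block(n, n, m1, m1) = MatrixXd::Identity(m1, m1);
  LHS.block(n + m1, 0, m2, n) = B;
  RHS.segment(n, m1) = b;
  RHS.segment(n + m1, m2) = d;
  VectorXd sol = FullPivLU<MatrixXd>(LHS).solve(RHS);
  VectorXd x = sol.head(n);
  std::cout << "x = " << x.transpose() << "\n";
  std::cout << "constraint residual ||Bx - d|| = " << (B * x - d).norm() << "\n";
  return 0;
}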
More reading.
[GVL13] Gene H. Golub and Charles F. Van Loan. 2013. Matrix Computations, 4th edition. ISBN 9781421407944.
[NW06] Jorge Nocedal and Stephen J. Wright. 2006. Numerical Optimization. ISBN 9780387303031; section 17.3 covers Lagrange multipliers.
[Wiki] Wikipedia: Lagrange multipliers.
Listing of Numerical Linear Algebra posts.