Linear Least Squares with Equality constraints, the Lagrangian method.

c-plus-plus eigen least-squares numerical-analysis optimization

This post is an annotated version of a Linear Least Squares with Equality constraints (LSE for short) solution from Matrix Computations by Gene H. Golub and Charles F. Van Loan, 4th edition, 2013, ISBN 9781421407944.

I wasn’t able to find this solution on the web, so I decided to write a post where I’ll provide some extra detail and implementation snippets for the Golub and Van Loan solution in section 6.2.4, p. 316, “LSE solution using the augmented system,” a Lagrange multiplier method. (Note that the text provides other ways to solve this problem, but I am familiar with the method of Lagrange multipliers, so that’s where I started.)

Problem definition

First, the LSE problem is a constrained least squares problem in the following form:

\[\min_{Bx = d} \| Ax - b \|_2\]


  • \(A \in \mathbb{R}^{m_1 \times n}\), \(m_1 \geq n\),
  • \(B \in \mathbb{R}^{m_2 \times n}\), \(m_2 < n\),
  • \(b \in \mathbb{R}^{m_1}\),
  • \(d \in \mathbb{R}^{m_2}\).
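As a concrete instance (my own illustrative example, not from Golub and Van Loan): fitting a line \(y = x_1 + x_2 t\) to three points \((t_i, y_i)\), while requiring the line to pass through \((t, y) = (1, 1)\), gives

\[A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad d = \begin{bmatrix} 1 \end{bmatrix},\]

so \(m_1 = 3\), \(m_2 = 1\), with two unknowns.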

The 6.2.4 solution to this problem in Golub and Van Loan uses Lagrange multipliers. Form the Lagrangian function,

\[f(x, \lambda) = \frac{1}{2} \| Ax - b \|^2_2 + \lambda^T(d - Bx) \label{l-function}\]

Note: from the method of Lagrange multipliers, the function to minimize is \(\| Ax - b \|_2\) and the constraint function is \(g(x) = Bx - d\), or equivalently \(g(x) = d - Bx\); in the traditional Lagrange multiplier setup, \(g(x) = 0\).

Expanding Eq. \eqref{l-function}, we have

\[f(x, \lambda) = \frac{1}{2} x^{T} A^TAx - x^TA^Tb + \frac{1}{2} b^Tb - \lambda^TBx + \lambda^Td \label{l-function-expanded}\]
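To see where each term comes from, expand the squared norm directly:

\[\frac{1}{2}\| Ax - b \|^2_2 = \frac{1}{2}(Ax - b)^T(Ax - b) = \frac{1}{2} x^TA^TAx - x^TA^Tb + \frac{1}{2} b^Tb,\]

using the fact that \(b^TAx = x^TA^Tb\) (both are scalars), and \(\lambda^T(d - Bx) = \lambda^Td - \lambda^TBx\).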

The next step is to differentiate \(f(x, \lambda)\) with respect to \(x\) and set it to zero:

\[\nabla_x f(x, \lambda) = A^TAx - A^Tb - B^T\lambda = 0 \label{dl-function}\]

In the method of Lagrange multipliers, Eq. \eqref{dl-function} together with the \(g(x) = 0\) constraint characterizes an extremizer (minimizer or maximizer). Our constraints are:

  • \[A^TAx - A^Tb - B^T\lambda = 0 \label{list-constraint0}\]
  • \[Bx = d \label{list-constraint1}\]

The Golub and Van Loan approach goes from there to stating that we can combine these two constraints with the equation \(r=b-Ax\) to get a linear system,

\[\begin{bmatrix} 0 & A^T & B^T\\ A & I & 0\\ B & 0 & 0 \end{bmatrix} \begin{bmatrix} x\\ r\\ \lambda \end{bmatrix} = \begin{bmatrix} 0\\ b\\ d \end{bmatrix} \label{linear-sys1}\]

An important note: the coefficient matrix in Eq. \eqref{linear-sys1} is nonsingular if \(A\) has full column rank and \(B\) has full row rank.

Annotation of constraint step to the linear system of equations step

If you’re happy with that jump from Eq.s \eqref{list-constraint0}-\eqref{list-constraint1} to Eq. \eqref{linear-sys1}, no need to read any further. It took me a minute, so here is the derivation in more detail. The constraints are:

\[A^TAx - A^Tb - B^T\lambda = 0 \label{con1}\] \[Bx = d \label{con2}\] \[r = b - Ax \label{con3}\]

First, multiply both sides of the first constraint by \(-1\):

\[-A^TAx + A^Tb + B^T\lambda = 0 \label{con1-neg}\]

Eq. \eqref{con1-neg} can be simplified using Eq. \eqref{con3}:

\[\begin{align} A^T(b - Ax) + B^T\lambda = 0 \\ A^Tr + B^T\lambda = 0 \label{line1} \end{align}\]

The second line of the linear system in Eq. \eqref{linear-sys1} comes from \(r = b - Ax\), which can be rearranged to \(Ax + Ir = b\) (where \(I\) is the \(m_1 \times m_1\) identity matrix).

The third line of Eq. \eqref{linear-sys1} is the \(Bx = d\) constraint.

Implementation details

First, I’ll mark up the linear system of equations with the dimensions on the top and side:

\[\begin{array}{c c c c c} & \begin{array}{c c c} n & m_1 & m_2 \end{array} \\ \begin{array}{c} n \\ m_1\\ m_2 \end{array} & \left[ \begin{array}{c c c} 0 & A^T & B^T \\ A & I & 0 \\ B & 0 & 0 \end{array} \right] & \left[ \begin{array}{c} x \\ r\\ \lambda \end{array} \right] & = & \left[ \begin{array}{c} 0\\ b\\ d \end{array} \right] \end{array}\]

Given this, the following C++ snippet gives an outline of the code to create the constraint matrix and solve the system (using Eigen, and Eigen’s dynamic submatrix access, described here).

   // Assumes #include <Eigen/Dense> and using namespace Eigen;
   // m1, m2, n are set from the problem data.
   MatrixXd A(m1, n);
   MatrixXd b(m1, 1);

   MatrixXd B(m2, n);
   MatrixXd d(m2, 1);

   A.setZero(); B.setZero();
   // fill in A, b, B, d values from data

   // construct the constraint matrix -- LHS is 'left-hand side'.
   MatrixXd LHS(n + m1 + m2, n + m1 + m2);
   LHS.setZero();  // blocks not assigned below must be zero
   // RHS is 'right-hand side'
   MatrixXd RHS(n + m1 + m2, 1); RHS.setZero();

   // first block row: [0  A^T  B^T]
   LHS.block(0, n, n, m1) = A.transpose();
   LHS.block(0, n + m1, n, m2) = B.transpose();

   // second block row: [A  I  0]
   LHS.block(n, 0, m1, n) = A;
   LHS.block(n, n, m1, m1) = MatrixXd::Identity(m1, m1);

   // third block row: [B  0  0]
   LHS.block(n + m1, 0, m2, n) = B;

   RHS.block(n, 0, m1, 1) = b;
   RHS.block(n + m1, 0, m2, 1) = d;

   // solve the square system; the solution stacks x, r, lambda.
   JacobiSVD<MatrixXd> svd(LHS, ComputeThinU | ComputeThinV);
   MatrixXd resultVector = svd.solve(RHS);

More reading.

[GVL13] Gene H. Golub and Charles F. Van Loan. 2013. Matrix Computations, 4th edition. ISBN 9781421407944.

[NW06] Jorge Nocedal and Stephen Wright. 2006. Numerical Optimization. ISBN 9780387303031, section 17.3 covers Lagrange multipliers.

[Wiki] Wikipedia: Lagrange multipliers.


© Amy Tabb 2018 - 2023. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.