Joel T. Kaardal, Ph.D.
Machine learning research engineer
Introduction

The augmented Lagrangian method solves constrained minimization problems of the form

$$ \begin{align} \small \min_{\mathbf{x}} f(\mathbf{x}) \small \ \text{ subject to } \ & \small \mathbf{g}(\mathbf{x}) \geq \mathbf{0} \text{, } \ \mathbf{h}(\mathbf{x}) = \mathbf{0} \\ & \small \mathbf{x}_\mathrm{lb} \leq \mathbf{x} \leq \mathbf{x}_\mathrm{ub} \end{align} $$

where \( \mathbf{x} \) is a weight vector, \( f(\mathbf{x}) \) is a scalar objective function, \( \mathbf{g} \) is a vector of inequality constraints, \( \mathbf{h} \) is a vector of equality constraints, and \( \mathbf{x}_\mathrm{lb} \) and \( \mathbf{x}_\mathrm{ub} \) are lower and upper bounds on \( \mathbf{x} \) (box constraints), respectively. The purpose of the augmented Lagrangian method is to improve on the simpler penalty method by using the method of Lagrange multipliers to find a feasible local optimum for finite penalty. The algorithm minimizes the augmented Lagrangian for a given strength of the penalty function and then, if the Karush-Kuhn-Tucker (KKT) conditions are not satisfied, the strength of the penalty function is increased until the KKT conditions are satisfied.

The main advantage of this algorithm over the interior-point method is that it uses a first-order method (projected gradient descent) to solve for a local minimum. The disadvantages are that the augmented Lagrangian is considerably more sensitive to its hyperparameter settings (e.g. the rate at which the strength of the penalty function increases) and does not certify that a stationary point is a local minimum. The latter makes the augmented Lagrangian algorithm best suited to convex optimization problems.

Example

Solve the following constrained maximization problem,

$$ \begin{align} \small \max_{x,y} f(x,y) = & \small \max_{x,y} \left(x + y\right) \\ & \small \text{ subject to } \ x^2 + y^2 = 1 \end{align} $$

which has a global maximum at \(x = \frac{\sqrt{2}}{2} \) and \(y = \frac{\sqrt{2}}{2} \).

The first thing to note here is that this problem is a maximization problem while the augmented Lagrangian is designed to solve minimization problems. This is easily rectified by negation; e.g. \( f(x,y) \leftarrow -f(x,y) \). The equality constraint also needs to be transformed into cannonical form by subtracting one from both sides such that \( h(x,y) = x^2 + y^2 - 1 = 0 \). Next, we need to determine the gradient of the (negated) objective function and the Jacobian of the equality constraint which are

$$ \small \boldsymbol{\nabla} f(x,y) = \left[\begin{array}{c} -1 \\ -1 \end{array}\right], \quad \boldsymbol{\nabla} h(x,y) = \left[\begin{array}{c} 2x \\ 2y \end{array}\right]. $$

The problem statement is passed in as a struct to the solver. In this example, the problem struct looks like this:

% problem struct
problem = struct( ...
    'f',   @(x)(-sum(x)), ...     % objective function
    'df',  @(x)([-1; -1]), ...    % gradient of objective function
    'ce',  @(x)(sum(x.^2)-1), ... % equality constraint
    'dce', @(x)(2*x), ...         % Jacobian of equality constraints
);

where @(x)() are anonymous functions. The problem weights, \( x \) and \( y \), are initialized to uniform random numbers on the range [0, 1] and the Lagrange multiplier is initialized to zero before passing everything to the solver function:

% initialization of weights and the Lagrange multiplier
x0 = rand(2, 1);
lambda0 = 0.0;

% find a feasible stationary point (hopefully a minimum)
[x, fval, lambda, kkt] = almSolve(problem, x0, lambda0);

which returns the weights at the solution (x), the objective function value at the solution (fval), the Lagrange multiplier at the solution (lambda), and the KKT conditions evaluated at the solution (kkt). Using the default hyperparameters, a feasible stationary point was found to \( 1.0 \cdot 10^{-4} \) precision in the norm of the KKT conditions after three iterations at \( x = 0.7071 \) and \( y = 0.7071 \) which is approximately equal to the feasible global minimum of the problem. This indicates that the algorithm has succeeded.