Gauss-Newton (BHHH)

The Gauss-Newton optimizer is an alternative to the standard FOCE/FOCEI approach that exploits the nonlinear-least-squares structure of the FOCE objective. It uses the BHHH (Berndt-Hall-Hall-Hausman) approximation to the Hessian and typically converges in 10-30 iterations, compared to 100+ evaluations for gradient-based methods like SLSQP.

This approach mirrors NONMEM's modified Gauss-Newton algorithm.

Variants

MethodKeyDescription
Pure Gauss-NewtongnFast convergence for well-conditioned problems
GN + FOCEI hybridgn_hybridRuns GN first, then polishes with FOCEI for robustness

Algorithm Overview

BHHH Hessian Approximation

The FOCE objective has the form:

\[ \text{OFV} = 2 \sum_{i=1}^{N} \text{NLL}_i \]

The BHHH approximation uses the outer product of per-subject gradients as an approximate Hessian:

\[ H_{\text{BHHH}} = 4 \sum_{i=1}^{N} g_i , g_i^T \]

where \( g_i = \nabla_x \text{NLL}_i \) is the gradient of the negative log-likelihood for subject \( i \) with respect to the packed parameter vector. Per-subject gradients are computed via central finite differences.

This approximation is equivalent to the Gauss-Newton Hessian for log-likelihood objectives and is guaranteed positive semi-definite.

Levenberg-Marquardt Damping

To improve stability away from the minimum, the Hessian is regularized:

\[ H_{\text{LM}} = H_{\text{BHHH}} + \lambda , \text{diag}(H_{\text{BHHH}}) \]

The damping factor \( \lambda \) is adapted during optimization:

  • On successful step (OFV decreases): \( \lambda \leftarrow 0.3 , \lambda \) (minimum \( 10^{-6} \))
  • On rejected step (OFV increases): \( \lambda \leftarrow 10 , \lambda \)
  • If \( \lambda > 10^6 \), the algorithm stops

The Newton step is obtained by solving:

\[ H_{\text{LM}} , \delta = -\nabla \text{OFV} \]

via Cholesky decomposition. If the Cholesky fails (singular Hessian), \( \lambda \) is increased and the iteration is retried.

Each step uses backtracking line search (up to 15 halvings). At each trial point, EBEs are re-estimated via the inner loop (warm-started from the previous EBEs), ensuring the FOCE objective is evaluated consistently.

Convergence

The algorithm converges when:

  • Relative OFV change \( < 10^{-6} \) for at least 3 iterations, or
  • Maximum iterations are reached

Hybrid Mode (GN + FOCEI)

When method = gn_hybrid, the algorithm runs in two phases:

  1. GN phase: Run Gauss-Newton iterations to quickly find the basin of the minimum
  2. FOCEI polish: Run up to 100 iterations of standard FOCEI optimization (via NLopt SLSQP) warm-started from the GN result

The final result is whichever phase produced the lower OFV. This combines the fast convergence of Gauss-Newton with the refined accuracy of FOCEI.

Configuration

[fit_options]
  method     = gn          # or gn_hybrid
  maxiter    = 100         # max GN iterations
  covariance = true

The initial Levenberg-Marquardt damping factor defaults to 0.01 and adapts automatically.

When to Use

Use gn when:

  • You want fast convergence and the model is well-conditioned
  • You are iterating on model development and need quick feedback

Use gn_hybrid when:

  • You want robustness against local minima
  • The model is complex (many parameters or random effects)
  • You want the speed of GN with a safety net

Stick with foce/focei when:

  • The model has convergence issues with GN (rare)
  • You need exact compatibility with standard FOCE/FOCEI results