SAEM
Stochastic Approximation Expectation-Maximization (SAEM) is an alternative estimation method that uses MCMC sampling for random effects instead of MAP optimization. It is more robust to local minima and can handle complex random effect structures.
Algorithm Overview
SAEM replaces the deterministic inner loop of FOCE with stochastic sampling, following the Monolix convention with a two-phase step-size schedule.
References
- Delyon, Lavielle, Moulines (1999). Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics 27(1):94--128.
- Kuhn & Lavielle (2004). Coupling a stochastic approximation version of EM with an MCMC procedure. ESAIM: Probability and Statistics 8:115--131.
Two-Phase Schedule
Phase 1: Exploration (iterations 1 to K1)
Step size \( \gamma_k = 1 \). The algorithm explores the parameter space rapidly, with the sufficient statistics being fully replaced each iteration. This allows fast movement toward the basin of the MLE.
Default: 150 iterations.
Phase 2: Convergence (iterations K1+1 to K1+K2)
Step size \( \gamma_k = 1/(k - K_1) \). The algorithm performs a decreasing-weight average, which guarantees almost-sure convergence to the MLE under regularity conditions.
Default: 250 iterations.
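The two-phase schedule above reduces to a one-line step-size function. A minimal sketch (function name illustrative, K1 defaulting to the 150 exploration iterations above):

```python
def gamma(k, k1=150):
    """SAEM step size for iteration k (1-based).

    Exploration (k <= K1): gamma = 1, so sufficient statistics are
    fully replaced each iteration.
    Convergence (k > K1): gamma = 1/(k - K1), a decreasing-weight
    average that yields almost-sure convergence under regularity
    conditions.
    """
    return 1.0 if k <= k1 else 1.0 / (k - k1)
```

With the defaults, gamma(200) = 1/50 = 0.02 and gamma(400) = 1/250 = 0.004, matching the gamma values in the example log output in the Output section.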
Per-Iteration Steps
Each SAEM iteration consists of:
1. E-Step: Metropolis-Hastings Sampling
For each subject, run n_mh_steps Metropolis-Hastings iterations to sample from the conditional distribution of random effects:
\[ p(\eta_i | y_i, \theta, \Omega, \sigma) \]
Proposal: symmetric random walk in deviation space, \( \eta_{\text{prop}} = \eta_{\text{current}} + \delta_i \cdot L \cdot z \), with \( z \sim N(0, I) \) and \( L = \text{chol}(\Omega) \). The schedule is identical across both phases — only the SA step size \( \gamma_k \) changes between exploration and convergence.
The MH kernel is symmetric in \( \eta \), so the proposal density cancels and the acceptance log-ratio is the difference of individual_nll values, which encodes the prior \( N(0, \Omega) \) plus the observation likelihood.
Acceptance: Accept with probability \( \min(1, \exp(\text{NLL}_{\text{current}} - \text{NLL}_{\text{prop}})) \).
The MH sampling is parallelized across subjects using Rayon.
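A per-subject E-step along these lines can be sketched as follows (illustrative Python, not the parallel Rust implementation; the `nll` callable stands in for individual_nll, and the function name and return of the acceptance rate are assumptions for the adaptation step later):

```python
import numpy as np

def mh_update_etas(eta, delta, L, nll, n_steps, rng):
    """Run n_steps random-walk Metropolis-Hastings moves on one
    subject's random effects.

    nll(eta) is the individual joint negative log-likelihood
    (the N(0, Omega) prior plus the observation likelihood).
    The proposal eta + delta * L @ z with z ~ N(0, I) is symmetric,
    so the proposal density cancels and the log acceptance ratio is
    just NLL_current - NLL_proposed.
    """
    cur_nll = nll(eta)
    accepted = 0
    for _ in range(n_steps):
        prop = eta + delta * (L @ rng.standard_normal(eta.shape))
        prop_nll = nll(prop)
        if np.log(rng.uniform()) < cur_nll - prop_nll:
            eta, cur_nll = prop, prop_nll
            accepted += 1
    return eta, accepted / n_steps
```

The returned acceptance fraction is what the adaptive step-size rule in step 5 consumes.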
2. Stochastic Approximation Update
Update the sufficient statistic for \( \Omega \):
\[ S_2 \leftarrow (1 - \gamma_k) \cdot S_2 + \gamma_k \cdot \frac{1}{N} \sum_{i=1}^{N} \eta_i \eta_i^T \]
3. M-Step for Omega (Closed Form)
\[ \Omega_k = S_2 \]
with structurally-zero entries (cross-block off-diagonals, standalone-vs-block off-diagonals, and all off-diagonals in a fully-diagonal Ω) zeroed out — the SA accumulator \( (1/N) \sum_i \eta_i \eta_i^T \) is dense by construction, but the model declares which entries are free parameters. Without this projection the chain feeds spurious sampling correlations into the next iteration's MH proposal Cholesky and Ω drifts toward rank-deficiency.
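Steps 2 and 3 together are a few lines of linear algebra. A sketch, assuming a boolean `free_mask` as a hypothetical stand-in for the model's declaration of which Omega entries are free parameters:

```python
import numpy as np

def update_omega(S2, etas, gamma_k, free_mask):
    """SA update of the Omega sufficient statistic, then the
    closed-form M-step with structural zeros projected out.

    The empirical accumulator (1/N) sum_i eta_i eta_i^T is dense by
    construction; np.where zeroes the entries the model does not
    declare as free, so spurious sampling correlations never reach
    the next iteration's proposal Cholesky.
    """
    n = len(etas)
    emp = sum(np.outer(e, e) for e in etas) / n
    S2 = (1 - gamma_k) * S2 + gamma_k * emp
    omega = np.where(free_mask, S2, 0.0)
    return S2, omega
```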
4. M-Step for Theta and Sigma (Optimization)
Minimize the conditional observation negative log-likelihood with ETAs held fixed:
\[ \sum_{i=1}^{N} \sum_{j=1}^{n_i} \left[ \frac{1}{2} \log V_{ij} + \frac{1}{2} \frac{(y_{ij} - f_{ij})^2}{V_{ij}} \right] \]
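The objective handed to the optimizer is a direct transcription of this sum. A minimal sketch with the observations flattened into arrays (constant terms dropped, as in the formula above):

```python
import numpy as np

def cond_obs_nll(y, f, v):
    """Conditional observation negative log-likelihood with ETAs
    held fixed: sum of 0.5*log(V_ij) + 0.5*(y_ij - f_ij)^2 / V_ij
    over all subjects and observations."""
    return float(np.sum(0.5 * np.log(v) + 0.5 * (y - f) ** 2 / v))
```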
When mu_referencing = true (the default), ferx detects lognormal parameters from the [individual_parameters] block and applies the closed-form EM update for those thetas instead of running NLopt:
\[ \log \theta_j \leftarrow \log \theta_j + \gamma_k \cdot \overline{\eta_j} \]
where \( \overline{\eta_j} = (1/N) \sum_i \eta_{i,j} \) is the empirical mean of the post-MH random effects for the eta paired with \( \theta_j \). For a log-mu-referenced model where \( \log P_i = \log \theta + \eta_i \) with \( \eta_i \sim N(0, \omega^2) \), this is exactly the M-step that maximises the complete-data log-likelihood, scaled by the SA step size \( \gamma_k \). After the update the etas are re-centred by the same shift so they remain deviations from the new \( \log \theta_j \).
NLopt still runs for any remaining thetas (non-mu-referenced) and for sigma — the closed-form-updated thetas are pinned at their new values for the NLopt call.
When mu_referencing = false, the full NLopt M-step runs for all thetas as before.
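For a single mu-referenced pair, the closed-form update and the eta re-centering amount to one shift applied in two places. An illustrative sketch (not the ferx implementation):

```python
import numpy as np

def mu_ref_theta_update(log_theta, etas_j, gamma_k):
    """Closed-form EM update for a log-mu-referenced theta:
    shift log(theta_j) by gamma_k times the empirical mean of its
    paired post-MH etas, then re-center those etas by the same
    shift so they remain deviations from the new log(theta_j)."""
    shift = gamma_k * np.mean(etas_j)
    return log_theta + shift, etas_j - shift
```

With \( \gamma_k = 1 \) the re-centered etas have exactly zero mean, which is the full M-step for the model \( \log P_i = \log \theta + \eta_i \).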
The number of NLopt evaluations saved is stored in FitResult::saem_mu_ref_m_step_evals_saved, accumulated across SAEM iterations as 2 × mstep_maxiter × n_mu_ref_pairs per outer step (one finite-difference probe pair per pinned mu-ref dimension, capped at mstep_maxiter NLopt gradient requests). The field is None when mu-referencing is off or method ≠ SAEM.
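The accumulation is simple arithmetic; a sketch for the total over a run (function name illustrative):

```python
def mu_ref_evals_saved(n_iters, mstep_maxiter, n_mu_ref_pairs):
    """Total NLopt evaluations avoided by the closed-form mu-ref
    update: 2 * mstep_maxiter * n_mu_ref_pairs per outer SAEM
    iteration (one finite-difference probe pair per pinned
    dimension, capped at mstep_maxiter gradient requests)."""
    return n_iters * 2 * mstep_maxiter * n_mu_ref_pairs
```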
5. Adaptive MH Step Sizes
Every adapt_interval iterations, the per-subject step sizes \( \delta_i \) are adjusted:
- If acceptance rate > 40%: increase \( \delta_i \) by 10% (up to 5.0)
- If acceptance rate < 40%: decrease \( \delta_i \) by 10% (down to 0.01)
This targets an acceptance rate around 40%, balancing exploration and mixing.
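The adaptation rule is a clamped multiplicative update. A minimal sketch of the rule described above:

```python
def adapt_delta(delta, acc_rate, target=0.40, lo=0.01, hi=5.0):
    """Adjust one subject's MH step size toward ~40% acceptance:
    grow delta by 10% when the observed rate exceeds the target,
    shrink it by 10% otherwise, clamped to [0.01, 5.0]."""
    if acc_rate > target:
        return min(delta * 1.1, hi)
    return max(delta * 0.9, lo)
```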
Post-SAEM Finalization
After the SAEM iterations complete:
- EBE Refinement: Run the standard FOCE inner loop (BFGS optimization) warm-started from the SAEM ETAs to obtain final empirical Bayes estimates
- FOCE OFV: Compute the objective function using the FOCE/Laplace approximation, so AIC and BIC are directly comparable with FOCE results
- Covariance Step: Optionally compute standard errors via finite-difference Hessian (same method as FOCE)
- Diagnostics: Compute PRED, IPRED, CWRES, IWRES for each subject
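The covariance step can be sketched as a central finite-difference Hessian of the objective at the final estimates (an illustrative sketch, assuming a smooth scalar negative log-likelihood; the exact scaling of the covariance depends on the OFV convention):

```python
import numpy as np

def fd_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar objective f at x.
    Inverting the Hessian of the negative log-likelihood gives the
    asymptotic covariance; square roots of its diagonal are the
    reported standard errors."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.eye(n)[i] * h
            e_j = np.eye(n)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H
```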
Configuration
[fit_options]
method = "saem"
n_exploration = 150 # Phase 1 iterations
n_convergence = 250 # Phase 2 iterations
n_mh_steps = 3 # MH steps per subject per iteration
adapt_interval = 50 # Step-size adaptation frequency
seed = 12345 # RNG seed for reproducibility
covariance = true # Compute standard errors
Tuning Guide
Not Converging
- Increase n_exploration (e.g., 300) to give more time for basin finding
- Increase n_convergence (e.g., 500) for a longer averaging window
- Increase n_mh_steps (e.g., 5-10) for better mixing in the E-step
Slow Convergence
- Decrease n_exploration and n_convergence if parameters stabilize early
- Use adapt_interval = 25 for faster step-size adaptation
Reproducibility
- Always set seed for reproducible results
- Different seeds will produce slightly different estimates due to the stochastic nature of the algorithm
Output
The SAEM iteration progress is printed to stderr:
SAEM: 10 subjects, 3 ETAs, 400 total iter (150 explore + 250 converge)
SAEM iter 1/400 [explore] gamma=1.000 condNLL=95.244
SAEM iter 50/400 [explore] gamma=1.000 condNLL=56.705
SAEM iter 150/400 [explore] gamma=1.000 condNLL=46.071
SAEM iter 200/400 [converge] gamma=0.020 condNLL=36.799
SAEM iter 400/400 [converge] gamma=0.004 condNLL=38.096
SAEM iterations complete. Computing final EBEs and OFV...
The condNLL is the conditional observation negative log-likelihood (not the final OFV). It should generally decrease during the exploration phase and stabilize during convergence.