Environmental linkages and priors: a formula-driven API

Overview

Environmental drivers (temperature, prey density, climate indices) enter stock assessments through several distinct process modules: recruitment, natural mortality, growth, catchability, selectivity. Rceattle exposes a long-format linkage table where each row corresponds to exactly one estimated coefficient that ties an environmental term (or any user-specified design column) to a process parameter. Users describe the linkages they want with linkage_spec() – a formula-driven helper – and pass the result into the relevant build_*() function. Internally, fit_mod() pools every spec into a single data.frame with one row per beta, paired with a shared design matrix X. The TMB template iterates the table once, accumulates per-process offsets, and adds them to the underlying linear predictors.

This vignette covers the user-facing API, the formula syntax it supports, prior and bound options, and the per-species variations. The linkage path is wired for growth (von Bertalanffy and Richards), natural mortality (log_M1), and recruitment (log_R0, log_alpha, log_beta).

A first example: temperature on `K`

Suppose we want sex- and species-specific von Bertalanffy growth, with the von Bertalanffy K parameter responding to a temperature index by species:

library(Rceattle)
data(whamGrowthData)
plot_data(whamGrowthData)

# Use logistic selectivity for all fleets in this example.
whamGrowthData$fleet_control$Selectivity <- "Logistic"
whamGrowthData$fleet_control$Selectivity_dimension <- "Length"

# 1. Attach an env_data table to the data list. Must include `Year`
#    plus whatever covariates show up in your formulas. Here `temp` is
#    a placeholder; in real use it would be a standardised SST series,
#    PDO index, etc.
yrs <- whamGrowthData$styr:whamGrowthData$endyr
whamGrowthData$env_data <- data.frame(
  Year = yrs,
  temp = rnorm(length(yrs), 0, 0.1)
)

# 2. Build a growth specification with one linkage per parameter that
#    you want to vary. linkage_spec() captures (formula, by, link,
#    init, bounds, priors) for one process parameter.
growth_spec <- build_growth(
  fun = "vonBertalanffy",
  linkages = list(
    log_K = linkage_spec(
      formula = ~ temp,             # 2 columns: (Intercept) and `temp`
      by      = ~ species + sex,    # one beta per (species, sex, column)
      priors  = list(temp = normal(0, 0.1)) # Tight prior
    )
  )
)

# 3. Fit as usual.
fit <- fit_mod(
  data_list   = whamGrowthData,
  growthFun   = growth_spec,
  estimateMode = 0, 
  msmMode     = 0,
  fit_control = fit_control(verbose = 0)
)
summary(fit)

The `linkage_spec()` API

linkage_spec(
  formula,
  param     = NULL,
  by        = ~ species,
  species   = NULL,
  sex       = NULL,
  link      = c("identity", "log", "logit"),
  init      = NULL,
  bounds    = NULL,
  priors    = NULL,
  re_group  = NA_character_,
  est_phase = 1L
)

formula – a one-sided R formula that describes the linear predictor for the target parameter. Anything model.matrix() understands works: ~ 1, ~ temp, ~ temp + PDO, ~ temp:PDO, ~ poly(temp, 2), factor predictors, etc.
param – the target parameter name (e.g. "log_K"). When the spec is registered inside a build_*() linkages list keyed by parameter name, param is filled in automatically from the list key.
by – a one-sided formula naming the stratifying factors that should each get their own coefficients. Allowed names are species, sex, and age_bin. The default is ~species (one coefficient set per species). Pass ~species + sex for per-(species, sex) coefficients, ~species + age_bin for an age-binned M response, or NULL to share a single coefficient set across every species/sex.
species – optional integer vector of 1-based species ids this spec applies to. NULL (default) means every species in the model. Use this to register different formulas for different species under the same target parameter – see Per-species formulas below.
sex – optional vector of sex ids this spec applies to. May be supplied as integers (1L = female, 2L = male) or as the strings "Females"/"Males" (case-insensitive; "female", "male", "f", "m" also accepted). NULL (default) means every sex in the model. Only meaningful when by includes sex; otherwise the filter is a no-op. Use this to register different formulas or priors per sex under the same target parameter – see Per-sex formulas below.
link – the link function applied to the predictor (NOT YET IMPLEMENTED)
init, bounds – optional named lists keyed by design-matrix column name. init = list(temp = 0.05) sets the temp slope’s initial value to 0.05; bounds = list(temp = c(-2, 2)) constrains it during optimisation.
priors – a named list of prior objects keyed by design-matrix column. Each entry is either a single Rceattle_prior (applied to every species/sex row that uses that column) or a named list of priors keyed by species id (one prior per species). Inside linkage_spec() (and only there) the unprefixed short-names normal(), lognormal(), gamma(), and beta() resolve to the corresponding prior_* constructors via a private data mask – see Priors below.
re_group – random-effects grouping label (currently a placeholder).
est_phase – estimation phase ordinal. 0 fixes the coefficient at its initial value; 1+ makes it estimable. Phased fitting through fit_control(phase = TRUE) honors the values you pick.

Priors

Five families are available:

prior_normal(mean, sd)
prior_lognormal(meanlog, sdlog)
prior_gamma(shape, rate)
prior_beta(shape1, shape2)
# Family "none" is implicit -- no prior contribution.

Inside linkage_spec(priors = ...) the constructors are available without the prior_ prefix so user code stays close to mathematical notation:

linkage_spec(
  formula = ~ temp + PDO,
  by      = ~ species,
  priors  = list(
    temp = normal(0, 1),     # informative prior, ~95% in (-2, 2)
    PDO  = normal(0, 5)      # weak prior
  )
)

The unprefixed forms are scoped to the priors = ... argument and do not mask base::gamma() or base::beta() at the package level. Outside that argument you can still call the math functions directly:

gamma(5)       # 24, the math gamma function
beta(2, 3)     # 1/12, the math beta function

linkage_spec(
  formula = ~ temp,
  priors  = list(temp = gamma(2, 3))   # interpreted as prior_gamma(shape=2, rate=3)
)

For programmatic prior assembly (e.g. species-specific priors built in a loop) call the prior_* constructors directly:

priors <- lapply(species_specific_sds, function(s) prior_normal(0, s))
names(priors) <- c("temp", "PDO", "salinity")
linkage_spec(formula = ~ temp + PDO + salinity, priors = priors)

Each prior’s negative log-density contributes to slot 20 of the joint NLL (fit$quantities$jnll_comp[20, ] in R):

fit$quantities$jnll_comp["Linkage-table priors", ]

Putting a prior on the parameter itself

Bayesian-style informative priors can be applied directly to the base growth parameters (log_k, log_L1, log_Linf, log_m), natural mortality (log_M1), or recruitment parameters (log_R0, log_alpha, log_beta).

A prior attached to an (Intercept) row is re-targeted to the base parameter. The same syntax is used for any other column.

# Example: prior on log_K for von Bertalanffy growth
growth_spec <- build_growth(
  fun = "vonBertalanffy",
  linkages = list(
    log_K = linkage_spec(
      formula = ~ 1,                              # intercept only -- no env effects
      by      = ~ species + sex,
      init    = list(`(Intercept)` = log(0.3)),   # start at K = 0.3
      priors  = list(`(Intercept)` = normal(log(0.3), 0.1))
    )
  )
)

The same pattern works for any linkage target – log_Linf, log_L1, log_m (Richards), log_M1, log_R0, log_alpha, log_beta. You can also key the prior by species id for species- specific priors:

rec_spec <- build_srr(
  srr_fun = 0,
  linkages = list(
    log_R0 = linkage_spec(
      formula = ~ 1,
      by      = ~ species,
      init    = list(`(Intercept)` = log(1000)),
      priors  = list(`(Intercept)` = list(
        `1` = normal(log(1000), 0.2),
        `2` = normal(log(1500), 0.3),
        `3` = normal(log(800),  0.25)
      ))
    )
  )
)

The prior contribution appears in slot 20 of the joint NLL:

fit$quantities$jnll_comp["Linkage-table priors", ]

You can inspect the linked parameter values via the linkage table, the estimated slope coefficients, and the base-parameter array:

fit$data_list$linkage_table[fit$data_list$linkage_table$param == "log_K", ]
fit$estimated_params$beta_linkage          # (Intercept) rows are 0; slopes are estimated
fit$estimated_params$log_growth_pars       # base log_K / log_L1 / log_Linf / log_m

The same approach works for log_M1 when you want a prior on natural mortality itself:

data("GOApollock") # Using GOA pollock as an example

# One-species model, normal(log(0.4), 0.1) prior on log_M1
# applied via the linkage intercept. The structural M1_model is
# kept at "sex_age_invariant" (= 1).
M1_spec <- build_M1(
  M1_model = "sex_age_invariant",
  linkages = list(
    log_M1 = linkage_spec(
      formula = ~ 1,
      by      = ~ species,
      init    = list(`(Intercept)` = log(0.34)),
      priors  = list(`(Intercept)` = normal(log(0.34), 0.01))
    )
  )
)

# Fit the model - no manual mapping needed
fit <- fit_mod(
  data_list = GOApollock,
  M1Fun     = M1_spec,
  estimateMode = 0,
  fit_control = fit_control(phase = TRUE)
)

The linkage prior is applied verbatim, with no lognormal bias-correction term.

Species- and (species, sex)-specific priors

Each entry under priors = list(...) may be:

a single prior – applies to every species/sex row that uses that design column,
a named list of priors keyed by species id – one prior per specified species, or
a named list of priors keyed by species id and sex – one prior per specified species and sex, or
a two-level nested list keyed first by specified species, then by specified sex id.

# Species-only keying
linkage_spec(
  formula = ~ temp,
  by      = ~ species,
  priors  = list(
    temp = list(
      `1` = normal(0, 0.1),    # tight prior for species 1
      #`2` = normal(0, 0.5),   # No prior species 2
      `3` = normal(0, 1.0)     # weakest for species 3
      # any species not listed gets no prior
    )
  )
)

For sexually dimorphic stocks, the same temp column can take different priors per (species, sex). At each level the value can be either an Rceattle_prior (applies to all sub-strata) or a list that drills down further:

linkage_spec(
  formula = ~ temp,
  by      = ~ species + sex,
  priors  = list(
    temp = list(
      # Species 1: different tightness per sex.
      `1` = list(`1` = normal(0, 0.1),    # sp 1, sex 1 (e.g. females)
                 `2` = normal(0, 0.2)),   # sp 1, sex 2 (e.g. males)
      # Species 2: scalar applies to every sex.
      `2` = normal(0, 0.5)
      # Any (species, sex) combination not listed gets no prior.
    )
  )
)

The same syntax works for any prior family. This is the recommended pattern when a covariate has different prior strength across stocks (e.g. when one species has prior literature on its temp sensitivity but another doesn’t), or when within a species the sexes are expected to respond differently. Built programmatically, it’s just:

sd_by_species <- c(`1` = 0.1, `2` = 0.5, `3` = 1.0)
priors_per_sp <- lapply(sd_by_species, function(s) prior_normal(0, s))

linkage_spec(
  formula = ~ temp,
  by      = ~ species,
  priors  = list(temp = priors_per_sp)
)

Per-species formulas

If species have qualitatively different env relationships – e.g. species 1 responds only to temperature, species 2 also to PDO – register multiple specs against the same parameter and use the species argument on each to scope it to a subset of stocks:

build_growth(
  fun = "vonBertalanffy",
  linkages = list(
    log_K = list(
      linkage_spec(
        formula = ~ temp,
        by      = ~ species,
        species = 1L,                                    # only sp 1
        priors  = list(temp = normal(0, 0.5))
      ),
      linkage_spec(
        formula = ~ temp + PDO,
        by      = ~ species,
        species = 2L,                                    # only sp 2
        priors  = list(temp = normal(0, 1),
                       PDO  = normal(0, 1))
      )
    )
  )
)

Linkages can accepts both forms under each parameter key:

linkages = list(log_K = single_spec)             # one formula for all sp
linkages = list(log_K = list(spec_a, spec_b))    # different formulas per sp

Inspecting the result, fit$linkages$log_K is either an Rceattle_linkage_spec or a list of them, and fit$data_list$linkage_table shows the materialized rows with species and X_col.

This pattern composes with species-specific priors – each spec carries its own priors, and within a spec the priors can themselves be species-keyed if you want even finer control:

linkage_spec(
  formula = ~ temp,
  by      = ~ species + sex,
  species = c(1L, 2L),
  priors  = list(
    temp = list(
      `1` = normal(0, 0.3),
      `2` = normal(0, 0.7)
    )
  )
)

Per-sex formulas

The sex argument on linkage_spec() mirrors species: it scopes a spec to a subset of sex ids. Combined with the multi-spec list form, this lets you register separate specs per sex against the same parameter – typical use cases are sex-dimorphic M priors or distinct env responses for females vs. males:

# Female and male M1 each get their own intercept-only prior.
M1_spec <- build_M1(
  M1_model = "sex_specific",
  linkages = list(
    log_M1 = list(
      linkage_spec(
        formula = ~ 1,
        by      = ~ species + sex,
        sex     = "Females",                              # or sex = 1L
        init    = list(`(Intercept)` = log(0.12)),
        priors  = list(`(Intercept)` = normal(log(0.12), 0.0001))
      ),
      linkage_spec(
        formula = ~ 1,
        by      = ~ species + sex,
        sex     = "Males",                                # or sex = 2L
        init    = list(`(Intercept)` = log(0.20)),
        priors  = list(`(Intercept)` = normal(log(0.20), 0.0001))
      )
    )
  )
)

The sex filter is a no-op when by does not include sex. If you want different formulas per sex (rather than just different priors or inits), use the multi-spec form – one spec with sex = 1L and its own RHS, another with sex = 2L and a different RHS.

If you only need different priors per (species, sex) for a single shared formula, the nested-priors form (see Species- and (species, sex)-specific priors above) is simpler – one spec, one set of design columns, prior values keyed by species then sex. Reach for the per-sex multi-spec form when you need different RHS terms, different init/bounds, or different est_phase between sexes.

Polynomial and basis-expansion formulas

The formula passed to linkage_spec() is fed straight to stats::model.matrix(), so any expression model.matrix() understands works – polynomial bases, splines, interactions, and arithmetic transformations.

Polynomials. Use poly() for orthogonal polynomial bases (the default; numerically well-conditioned), or poly(..., raw = TRUE) for raw monomials. Each basis order shows up as a separate design column with its own coefficient:

# 4th-order orthogonal polynomial in temp, by species.
linkage_spec(
  formula = ~ poly(temp, 4),
  by      = ~ species
)
# design columns produced:
#   (Intercept), poly(temp, 4)1, poly(temp, 4)2,
#   poly(temp, 4)3, poly(temp, 4)4

poly() orthogonalises the basis against the rows of env_data, so the columns are zero-mean over the model years and the linear slope, quadratic curvature, etc. are estimated independently – generally the right choice if you don’t have a specific scientific reason to prefer raw monomials.

Arbitrary arithmetic. Use I() to insert a literal expression (otherwise temp^2 is interpreted as a formula crossing operator and reduces to temp):

# Linear + raw quadratic in temp.
linkage_spec(
  formula = ~ temp + I(temp^2),
  by      = ~ species,
  priors  = list(`I(temp^2)` = normal(0, 0.1))   # tight prior on curvature
)

The design column name is the deparsed call (I(temp^2)), so when you key init, bounds, or priors against it you have to use the same string – backticks let you write it as a name.

Splines. Natural cubic splines via splines::ns() work out-of-the-box and are often a better choice than high-order polynomials when you want flexibility without strong tail behaviour:

linkage_spec(
  formula = ~ splines::ns(temp, df = 4),
  by      = ~ species
)

Interactions. temp:PDO adds a single product column; temp * PDO is shorthand for temp + PDO + temp:PDO:

linkage_spec(
  formula = ~ temp * PDO,
  by      = ~ species,
  priors  = list(`temp:PDO` = normal(0, 0.5))
)

Mixing forms across species. Combined with the per-species formula machinery, you can give one species a polynomial response and another a linear one without duplicating columns:

build_growth(
  fun = "vonBertalanffy",
  linkages = list(
    log_K = list(
      linkage_spec(formula = ~ poly(temp, 3),
                   by = ~ species, species = 1L),
      linkage_spec(formula = ~ temp,
                   by = ~ species, species = 2L)
    )
  )
)

A practical caution: same-named columns must be numerically identical across specs that use them. With deterministic formulas evaluated against the same env_data this is automatic; it can only fail if you’ve built specs against different env_data frames.

What happens internally

When you call fit_mod() with linkages:

build_growth(fun, linkages) returns a list whose linkages element is a named list of Rceattle_linkage_spec objects (or, for per-species formulas, lists of them).
fit_mod() stashes growth_fun / growth_linkages and M1_linkages on data_list, runs data_check(), then calls pool_linkages(spec_groups = list(growth = ..., M = ...)).
pool_linkages() materialises every spec against data_list$env_data and the species/sex/age strata, unions the per-spec design columns by name into a single shared design matrix X, and remaps each row’s local X_col to the global column index. The resulting Rceattle_linkage_table and linkage_X matrix are stored on data_list.
encode_linkage_for_tmb() turns the table into the parallel integer/numeric vectors that the TMB template consumes (linkage_process, linkage_param, linkage_species, linkage_sex, linkage_age_bin, linkage_X_col, linkage_link, linkage_prior_family, linkage_prior_p1, linkage_prior_p2).
The TMB template iterates the linkage rows once and builds per-process offset tensors:
- growth_linkage_offset of shape [nspp, max_sex, n_growth_params, nyrs] via rceattle_apply_linkages() in src/TMB/linkage.hpp, added to growth_parameters on the log scale.
- M_linkage_offset of shape [nspp, max_sex, max_age, nyrs] via rceattle_apply_M_linkages(), added to ln_M1 inside the M1_at_age compute.
- recruitment_linkage_offset of shape [nspp, n_rec_params, nyrs] via rceattle_apply_recruitment_linkages(), multiplied against (R0, exp(rec_pars(*, alpha)), exp(rec_pars(*, beta))) each year at every calculate_recruitment() call site (hindcast, BRPs, dynamic BRPs, projections, expected R). Year 0 is treated like any other year – the (Intercept) row is held at zero, so the offset is purely the slope contribution.
All three accumulators read from the same encoded table and the same beta_linkage parameter vector; rows dispatch by their process column. With no linkages the offsets stay at zero so existing fits are identical-by-construction.
Priors and bounds flow through the same table: build_bounds() reads each row’s lower / upper into bounds$lower$beta_linkage / bounds$upper$beta_linkage, and the cpp adds -d<family>(beta, p1, p2, log = TRUE) to slot 19 of the joint NLL for every row whose prior_family is not "none".

Natural mortality

build_M1() supports the same linkages argument. The valid parameter key is log_M1; the offset enters additively (on the log scale) inside the M1_at_age compute, so a coefficient of 0.3 on a normalised temperature index multiplies M1 by exp(0.3 * temp[yr]) each year.

Why log_M1 and not log_M? CEATTLE decomposes total natural mortality into M = M1 + M2. M1 is residual / non-predation mortality and is a parameter (ln_M1 in the cpp, with optional random-effect deviates and a direct prior via M_prior / M_prior_sd). M2 is predation mortality and is a derived quantity each iteration – a function of predator abundance, suitability, and predator ration. There is no scalar ln_M2 parameter to attach a linkage coefficient to. Environmental effects on predation propagate naturally via upstream parameters: a temp effect on predator growth (log_K), a temp effect on recruitment (forthcoming), or env-driven inputs to the bioenergetics ration calculation. The linkage system therefore exposes log_M1 as the single M-side target, with M2 covered by upstream propagation.

yrs <- GOApollock$styr:GOApollock$endyr
GOApollock$env_data <- data.frame(
  Year = yrs,
  temp = rnorm(length(yrs), 0, 0.1)
)

M1_spec <- build_M1(
  M1_model = "sex_age_invariant",            # or M1_model = 1 (back-compat)
  linkages = list(
    log_M1 = linkage_spec(
      formula = ~ temp,
      by      = ~ species,
      priors  = list(temp = normal(0, 0.1))
    )
  )
)

fit <- fit_mod(
  data_list   = GOApollock,
  M1Fun       = M1_spec,
  estimateMode = 0,
  msmMode     = 0
)

The structural M choices (M1_model) and the random-effects choices (M1_re) accept either integer codes or readable strings interchangeably:

build_M1(M1_model = "sex_age_specific", M1_re = "ar1_age")   # = 3, 4
build_M1(M1_model = 3,                  M1_re = 4)            # equivalent

linkages on M is age-aware: by default a row’s age_bin is NA and the offset broadcasts across every age. To pin an env effect to a specific age slice, set by = ~ species + age_bin and supply strata$age_bin when calling materialize directly (fit_mod() provides this automatically). For most operational uses the default broadcast is what you want.

The new linkage path coexists with the legacy M1_indices = c(...) column-index argument for M1_model = 4 / 5: both add to ln_M1 on the log scale and either may be used. New code should prefer linkages – it composes naturally with priors, bounds, and per-species formulas.

Inspecting the result:

fit$data_list$linkage_table         # rows have process == "M"
fit$quantities$M_linkage_offset     # [nspp, max_sex, max_age, nyrs]
fit$quantities$jnll_comp["Linkage-table priors", ]

Growth and M linkages share the same table – one fit can carry linkage rows for both processes, and the underlying coefficient vector beta_linkage indexes both.

Recruitment

build_srr() accepts the same linkages argument with three allowed parameter keys:

log_R0 – offset on log(R0). Meaningful for any srr_fun; the offset is added to the equilibrium / mean recruitment on the log scale, including the first model year. This is the right target if you want “recruitment responds to temp” and aren’t fitting an SRR.
log_alpha – offset on the alpha parameter (productivity) of a Beverton-Holt or Ricker SRR. Only does work when srr_fun %in% c(2, 3, 4, 5) – the SRRs that consume alpha.
log_beta – offset on the beta parameter (density-dependence) of BH or Ricker. Same applicability as log_alpha.

# Mean-recruitment with a temperature linkage on log_R0.
rec_spec <- build_srr(
  srr_fun  = "mean",                       # = 0
  linkages = list(
    log_R0 = linkage_spec(
      formula = ~ temp,
      priors  = list(temp = normal(0, 0.5))
    )
  )
)

# Beverton-Holt with env effects on alpha (and species-keyed
# priors so each stock has its own sensitivity).
rec_spec_bh <- build_srr(
  srr_fun  = "BevertonHolt",               # = 2
  linkages = list(
    log_alpha = linkage_spec(
      formula = ~ temp,
      priors  = list(
        temp = list(`1` = normal(0, 0.3),
                    `2` = normal(0, 0.7))
      )
    )
  )
)

fit <- fit_mod(
  data_list   = data_list,
  recFun      = rec_spec,
  estimateMode = 0,
  msmMode     = 0
)

The structural choice (srr_fun) accepts either integer codes or readable strings interchangeably:

build_srr(srr_fun = "mean")           # = 0
build_srr(srr_fun = "BevertonHolt")   # = 2
build_srr(srr_fun = "Ricker")         # = 4

Inspecting the result:

fit$data_list$linkage_table             # rows have process == "recruitment"
fit$quantities$recruitment_linkage_offset  # [nspp, n_rec_params, nyrs]
                                           # n_rec_params = 3 (R0, alpha, beta)

Practical note: a linkage row’s age_bin is ignored for recruitment (recruitment is at age 0 by definition). The sex_ratio data input partitions recruits across sexes downstream.

A multi-process example

The same pattern composes when more than one parameter is linked:

growth_spec <- build_growth(
  fun = "Richards",
  linkages = list(
    log_K    = linkage_spec(
      formula = ~ temp,
      by      = ~ species + sex,
      priors  = list(temp = normal(0, 1))
    ),
    log_Linf = linkage_spec(
      formula = ~ temp + PDO,
      by      = ~ species,
      priors  = list(temp = normal(0, 0.5),
                     PDO  = normal(0, 0.5))
    ),
    log_m    = linkage_spec(
      formula = ~ 1,             # intercept only -- no env effect
      by      = ~ species
    )
  )
)

The pooler will:

compute one shared X = model.matrix(~ temp + PDO, env_data) of three columns,
emit 4 (sp × sex) + 3 (Linf cols) × 3 (sp) + 3 (m intercepts) = 16 rows in the linkage table,
match same-named columns ((Intercept), temp) across specs so there’s no duplication in X.

Per-process logic (vB has no m; only Richards does) is enforced at build time:

build_growth(
  fun = "vonBertalanffy",
  linkages = list(log_m = linkage_spec(~ temp))
)
#> Error: linkages$log_m is only valid with fun = 'Richards'; ...

Growth, M, and recruitment can all be linked in the same fit – each build_*() output carries its own linkages list and fit_mod() pools them against a single shared design matrix:

fit <- fit_mod(
  data_list   = data_list,
  growthFun   = build_growth(
    fun      = "vonBertalanffy",
    linkages = list(
      log_K = linkage_spec(formula = ~ temp)
    )
  ),
  M1Fun       = build_M1(
    M1_model = "sex_age_invariant",
    linkages = list(
      log_M1 = linkage_spec(formula = ~ temp)
    )
  ),
  recFun      = build_srr(
    srr_fun  = "mean",
    linkages = list(
      log_R0 = linkage_spec(formula = ~ temp)
    )
  ),
  estimateMode = 0,
  msmMode     = 0
)

# Linkage table contains rows for all three processes:
table(fit$data_list$linkage_table$process)
#>      growth           M  recruitment
#>           4           4            4

Inspecting and debugging

A few diagnostics worth knowing:

# The materialised table -- one row per estimated coefficient.
fit$data_list$linkage_table

# The shared design matrix -- one column per unique RHS term.
fit$data_list$linkage_X

# Per-(species, sex, growth_param, year) offsets actually applied to
# growth_parameters this iteration. Useful for verifying the linkage
# fired as you intended.
str(fit$quantities$growth_linkage_offset)
str(fit$quantities$M_linkage_offset)
str(fit$quantities$recruitment_linkage_offset)

# Per-row coefficient values in the same order as `linkage_table`.
fit$estimated_params$beta_linkage

# The prior NLL contribution from slot 19 of jnll_comp.
fit$quantities$jnll_comp["Linkage-table priors", ]

Behind the scens: Automatic base-parameter handling

Any pooled linkage table is stored on the fitted model:

fit$data_list$linkage_table

For three species × two sexes × ~ temp (intercept + slope) you get 12 rows: an (Intercept) beta and a temp-slope beta per (species, sex) combination. The (Intercept) beta is fixed at 0 (mapped out) and the base parameter (log_K, log_M1, log_R0, …) carries the level. The slope rows give the year-by-year offset that multiplies the base. See Automatic base-parameter handling below.

The shared design matrix is stored alongside:

head(fit$data_list$linkage_X)

The base parameter (log_R0, log_M1, log_K, …) and the linkage table cooperate cleanly: any priors or initial values are applied to the base parameter, and the linkage rows hold year-by-year offsets that is added to the base parameter on the log scale. The exact behavior depends on whether the formula carries an intercept:

Formula	Base parameter	Slope rows
`~ 1` (intercept only)	estimable; init from spec or default	n/a
`~ temp` (intercept + slope)	estimable; init from spec or default	estimated
`~ 0 + temp` (slope only)	mapped NA, value from spec or default	estimated

Intercept-bearing formulas (~ 1, ~ temp) – the base parameter remains estimable; for example log_R0 for species s lives on rec_pars[s, 1] exactly as it does without any linkages. The (Intercept) row of the table exists for bookkeeping and as a hook for init/prior (see below) but its beta_linkage slot is held at 0 and not estimated. The recruitment / mortality / growth modules read base + offset(yr) each year – with the intercept fixed at zero, the offset is exactly the slope contribution.

Slope-only formulas (~ 0 + temp) – there is no (Intercept) row, so the base parameter is mapped out (NA) and stays at the default initial value, or the value supplied by the inits argument in fit_mod, or the value supplied by init = list("(Intercept)"= ...) on the spec.

`init` and the base parameter

init = list((Intercept)= X) on the linkage spec flows to the base parameter rather than to the linkage row. The linkage row’s beta_linkage slot stays at 0; the base parameter starts at X. This is the recommended way to set a sensible starting value for intercept-bearing linkages – it composes naturally with phasing (the base parameter goes through the existing per-process phase logic) and keeps the optimiser grounded.

# Start log_M1 at log(0.06) without estimating an extra intercept beta.
build_M1(
  M1_model = "sex_age_invariant",
  linkages = list(
    log_M1 = linkage_spec(
      formula = ~ 1,
      init    = list(`(Intercept)` = log(0.06))
    )
  )
)

Stratification

Stratification (by) determines which base-parameter slots are affected: by = ~ species produces one (Intercept) row per species; by = NULL (a fully shared linkage) produces a single shared row; and by = ~ species + sex produces one per (species, sex). Each (Intercept) row that the user supplies an init for pushes that init into the matching base-parameter slot.

This rule applies uniformly to every linkage target (log_K, log_L1, log_Linf, log_m, log_M1, log_R0, log_alpha, log_beta).

Single-sex models and `by = ~ ... + sex`

If your model is single-sex (nsex = 1 for every species in scope), asking for by = ~ species + sex produces a warning: there is only one sex level available, so the sex stratum collapses to a single shared level. The result is equivalent to by = ~ species. Models with mixed nsex (some single-sex, some two-sex species) emit one sex level per species automatically – single-sex species get one row per parameter, two-sex species get two.

Pooling

Inside pool_linkages() each spec is materialized independently and the resulting rows are concatenated into a single linkage table. The shared design matrix unions all RHS terms across specs, so species 1’s rows reference only (Intercept) and temp, while species 2’s rows can additionally reference PDO – there’s no duplication of the temp column.

pool_linkages() unions the design columns by name so the global matrix X contains both temp and the three poly(temp, 3)* columns; species 1’s rows reference the polynomial columns, species 2’s only the linear temp.

On the roadmap

Random-effects grouping – the re_group column on the linkage table is stored end-to-end; the cpp will grow a hierarchical penalty so coefficients sharing a re_group value are partially pooled.
Age-dependent linkages on processes other than M – the age_bin stratum is consumed by the M accumulator today (a row’s age_bin == NA broadcasts across ages; specific values pin the offset to that age slice). The schema is reserved for other processes that may eventually vary by age.
Selectivity / catchability linkages – the encoder already reserves process = "sel" and process = "q" codes, but no cpp accumulator consumes them yet.
Legacy retirement (v4.4.0) – srr_indices, M1_indices, srr_fun in c(1, 3, 5), and M1_model in c(4, 5) retire in the next minor release; see Scheduled removal (v4.4.0) in NEWS.md for the migration table.