# References and methodology
Comprehensive documentation of all equations, algorithms, software
dependencies, and literature references used across the `jwspecfit`,
`jwspecmcmc`, and `jwspecabund` packages.
---
## Software dependencies
| Package | Version | Role | Reference |
|---------|---------|------|-----------|
| NumPy | >= 1.24 | Array operations | Harris et al. (2020) |
| SciPy | >= 1.10 | Optimisation, special functions | Virtanen et al. (2020) |
| Astropy | >= 5.0 | Units, FITS I/O | Astropy Collaboration (2022) |
| matplotlib | >= 3.6 | Static plotting | Hunter (2007) |
| plotly | >= 5.0 | Interactive plotting | Plotly Technologies Inc. |
| lmfit | >= 1.2 | Parameter management | Newville et al. (2014) |
| tqdm | >= 4.60 | Progress bars | da Costa-Luis (2019) |
| joblib | >= 1.2 | Parallel bootstrap | Joblib Development Team |
| emcee | >= 3.1 | Affine-invariant MCMC | Foreman-Mackey et al. (2013) |
| nautilus | >= 1.0 | Importance nested sampling | Lange (2023) |
| dynesty | >= 2.0 | Dynamic nested sampling | Speagle (2020); Koposov et al. (2022) |
| PyNEB | >= 1.1.25 | Atomic physics, emissivities | Luridiana et al. (2015) |
| corner | >= 2.2 | Corner plots | Foreman-Mackey (2016) |
---
## 1. Emission-line fitting (`jwspecfit`)
### 1.1 Line profiles
Emission lines are modelled as Gaussians bin-averaged over each
spectral pixel using the error function:
$$
\bar{G}_i = \frac{A}{2\,\Delta\lambda_i}
\left[
\operatorname{erf}\!\left(\frac{\lambda_{i+1} - \mu}{\sigma\sqrt{2}}\right)
- \operatorname{erf}\!\left(\frac{\lambda_i - \mu}{\sigma\sqrt{2}}\right)
\right]
$$
where *A* is the integrated flux (erg s−1 cm−2),
*μ* is the line centroid (Å), *σ* is the Gaussian width (Å), and
λi, λi+1 are the pixel edges. This avoids
sampling bias when lines are narrower than pixels (prism regime,
R ∼ 30).
### 1.2 Spectral resolution
The instrumental line spread function is assumed Gaussian with
σinst = λ / (R × 2.3548). The resolving power *R* is
determined as follows:
| Source | R(λ) | Reference |
|--------|------|-----------|
| PRISM/CLEAR | R = 50 + 50(λ − 1) + 15(λ − 1)2 (µm), clipped to [30, 300] | Heuristic fit to NIRSpec instrument model |
| G140M, G235M, G395M | R = 1000 | NIRSpec documentation |
| G140H, G235H, G395H | R = 2700 | NIRSpec documentation |
| Pixel estimate | R ≈ λ / (2Δλ) | Fallback when grating unknown |
### 1.3 Continuum subtraction
Iterative sigma-clipped polynomial fit:
1. Mask pixels within ±6σinst of every expected
emission line and blueward of NV 1239 Å (Lyman break region).
2. Fit weighted polynomial of degree *d* (default *d* = 2) to
unmasked pixels.
3. Compute residuals ri = fi − Ci.
4. Reject pixels with ri > k σ̂ (default *k* = 2.5)
to iteratively clip positive outliers (emission).
5. Repeat steps 2–4 for 5 iterations.
### 1.4 Optimisation
The model is fit using `scipy.optimize.least_squares` with the
Trust Region Reflective (TRF) algorithm, Jacobian-based parameter
scaling (`x_scale='jac'`), and box constraints on all parameters.
### 1.5 Parameter constraints
| Constraint | Implementation | Reference |
|------------|---------------|-----------|
| [NII] doublet ratio | A(6549) = A(6585) / 2.96 | Storey & Zeippen (2000) |
| Balmer–OIII width tying | σv(Hα, Hβ, Hγ, Hδ, [NII]) = σv([OIII] 5007) | Assumed common NLR kinematics |
| Centroid tying | Same lines, velocity-space centroid tied to [OIII] | |
| Broad Balmer centroids | Tied to narrow counterpart | |
### 1.6 Broad Balmer component detection
BIC-based model selection compares four models:
| Model | Components |
|-------|-----------|
| narrow | Narrow lines only |
| broad1 | + intermediate broad (FWHM ∼ 500–2000 km s−1) |
| broad2 | + very broad / BLR (FWHM ∼ 2000–5000 km s−1) |
| both | + both broad components |
The Bayesian Information Criterion is:
$$
\mathrm{BIC} = k \ln n - 2 \ln \hat{\mathcal{L}}
$$
where *k* is the number of free parameters, *n* is the number of
data points, and L̂ is the maximised likelihood.
A more complex model is accepted if ΔBIC > 6
(default threshold). Robust comparison uses bootstrapped BIC
distributions (nboot,BIC = 100).
### 1.7 Bootstrap uncertainties
1. Perturb flux: f′i = fi + σi × εi,
where εi ∼ 𝒩(0, 1).
2. Refit with `least_squares` using the best-fit parameters as
initial guess.
3. Repeat Nboot times (default 1000).
4. Report σ̂(flux) from the bootstrap distribution for each line.
Parallelised via `joblib` (`n_jobs = -1` uses all cores).
### 1.8 Lyman-alpha modelling
Lyα is modelled as a skewed Gaussian (Owen 1956;
`scipy.stats.skewnorm`) attenuated by mean IGM transmission:
$$
F_{\mathrm{Ly}\alpha}(\lambda) = S(\lambda;\, A,\, \mu,\, \sigma,\, \alpha) \times T_\mathrm{IGM}(\lambda,\, z)
$$
where *S* is the bin-averaged skewed Gaussian (positive skew =
red tail) and TIGM is the mean IGM transmission.
**IGM transmission** follows Inoue et al. (2014), accounting for
Lyman-series line absorption (LAF) and damped Lyman-alpha system
(DLA) continuum absorption. Coefficients are loaded from their
Table 2 (shipped as `data/inoue2014_table2.txt`).
### 1.9 Line database
Rest-frame vacuum wavelengths from NIST Atomic Spectra Database,
Morton (2003), and Storey & Zeippen (2000). 34 lines from Lyα
(1215.67 Å) through [SII] 6732 (6734.53 Å).
---
## 2. MCMC fitting (`jwspecmcmc`)
### 2.1 Likelihood
Gaussian log-likelihood identical to the least-squares objective:
$$
\ln \mathcal{L} = -\frac{1}{2} \sum_i
\left(\frac{f_i - C_i - M_i(\boldsymbol{\theta})}{\sigma_i}\right)^2
$$
where fi is the observed flux, Ci the continuum,
Mi the emission-line model, and σi the flux
uncertainty.
### 2.2 Priors
Default priors are uniform within the same bounds used by
`least_squares`. Per-parameter overrides are supported:
| Prior type | PDF | Use case |
|-----------|-----|----------|
| `UniformPrior(lo, hi)` | π(θ) ∝ 1 for θ ∈ [lo, hi] | Default for all parameters |
| `GaussianPrior(mean, std, lo, hi)` | Truncated Gaussian | Informative priors |
| `LogUniformPrior(lo, hi)` | π(θ) ∝ 1/θ for θ ∈ [lo, hi] | Scale-invariant parameters |
### 2.3 Samplers
**emcee** (Foreman-Mackey et al. 2013): affine-invariant ensemble
sampler with stretch moves. Walkers are initialised from a
tight ball around the least-squares MLE.
**nautilus** (Lange 2023): importance nested sampler using neural
network-based proposal distributions. Uses `n_eff` target
effective samples.
### 2.4 Convergence diagnostics
| Diagnostic | Criterion | Reference |
|-----------|-----------|-----------|
| Gelman–Rubin R̂ | R̂ < 1.05 for all parameters | Gelman & Rubin (1992) |
| Effective sample size (ESS) | ESS > 100 for all parameters | Via FFT autocorrelation |
### 2.5 Posterior summaries
Point estimates: medians. Uncertainties: (16th, 84th) percentile
half-widths, reported as asymmetric (−δlo, +δhi)
credible intervals.
---
## 3. Chemical abundances (`jwspecabund`)
### 3.1 Overview of methods
Three independent methods for gas-phase metallicity:
| Method | When used | Lines required | Uncertainty propagation |
|--------|----------|---------------|----------------------|
| Direct Te | [OIII] 4363 detected (SNR ≥ 3) | [OIII] 4363, 5007, 4959, Hβ, + Balmer pair for dust | MC perturbation or MCMC posterior propagation through PyNEB |
| Bayesian forward model | User-selected (`method="forward"`) | Any subset of optical CELs + Hβ | Full MCMC/nested sampling of ionic abundances, Te, ne |
| Strong-line calibrations | No auroral line | [OIII] 5007, Hβ, optionally [OII], Hα | MC perturbation including calibration scatter |
### 3.2 Dust correction
Dust correction is applied to all emission-line fluxes before
abundance calculations (for the direct and strong-line methods).
AV is derived from the Balmer decrement when not supplied.
#### 3.2.1 AV from the Balmer decrement
$$
A_V = \frac{2.5 \log_{10}(R_\mathrm{obs} / R_\mathrm{int})}{f(\lambda_\mathrm{den}) - f(\lambda_\mathrm{num})}
$$
where Robs is the observed Balmer ratio (Hα/Hβ or
Hγ/Hβ), Rint is the intrinsic Case B ratio, and
f(λ) = A(λ) / AV is the normalised attenuation curve.
**Intrinsic Balmer ratios** (Case B, Te = 104 K,
ne = 100 cm−3; Osterbrock & Ferland 2006):
| Ratio | Value |
|-------|-------|
| Hα/Hβ | 2.86 |
| Hγ/Hβ | 0.468 |
| Hδ/Hβ | 0.259 |
**Note on temperature dependence:** The intrinsic Balmer ratios
are temperature-dependent. At Te = 2.5 × 104 K
the Hα/Hβ ratio is ∼2.72, not 2.86. The current implementation
uses fixed ratios at 104 K for the external dust correction.
See Section 3.5 (forward model) for self-consistent treatment where
AV is a free parameter and Balmer ratios are recomputed at
each model Te.
#### 3.2.2 Salim+18 / Noll+09 attenuation (default)
Modified Calzetti et al. (2000) attenuation with a UV bump and
power-law slope deviation (Noll et al. 2009; Salim et al. 2018):
$$
A(\lambda) = E(B{-}V) \left[ k'_\mathrm{Calz}(\lambda) \left(\frac{\lambda}{0.55\,\mu\mathrm{m}}\right)^\delta + B \cdot D(\lambda) \cdot R_V \right]
$$
where:
- k′Calz(λ) is the Calzetti et al. (2000) starburst curve,
- D(λ) is a Drude profile at 2175 Å modelling the UV bump:
D(λ) = (λγ)2 / [(λ2 − λ02)2 + (λγ)2],
- δ is the slope deviation (default −0.35),
- *B* is the UV bump strength (default 2.27),
- RV = AV / E(B−V) (default 3.15).
**Default parameters** (RV = 3.15, δ = −0.35, *B* = 2.27)
follow the median values for star-forming galaxies from
Salim et al. (2018).
**References:**
- Calzetti, D. et al. 2000, ApJ, 533, 682
- Noll, S. et al. 2009, A&A, 507, 1793
- Salim, S. et al. 2018, ApJ, 859, 11
#### 3.2.3 Cardelli+89 extinction
Milky Way extinction curve (Cardelli, Clayton & Mathis 1989):
$$
A(\lambda) = A_V \left[ a(x) + \frac{b(x)}{R_V} \right]
$$
where x = 1/λ (µm−1) and a(x), b(x) are piecewise
polynomials for the IR (0.3 ≤ x < 1.1), optical/NIR (1.1 ≤ x < 3.3),
and UV (3.3 ≤ x < 8.0) regimes. Default RV = 3.1.
**Reference:** Cardelli, J. A., Clayton, G. C. & Mathis, J. S.
1989, ApJ, 345, 245
#### 3.2.4 Flux correction
Observed fluxes are corrected as:
$$
F_\mathrm{corr} = F_\mathrm{obs} \times 10^{0.4\,A(\lambda)}
$$
Errors are scaled by the same factor. When MCMC posteriors are
available, each posterior sample is corrected individually.
---
### 3.3 Direct Te method
The direct method uses the temperature-sensitive auroral-to-nebular
line ratio to determine electron temperature, breaking the
degeneracy between temperature and ionic abundance inherent in
strong-line methods. This is the gold standard for gas-phase
metallicity when the auroral line is detected.
#### 3.3.1 Two-zone ionisation model
A two-zone model is assumed:
| Zone | Traced by | Temperature | Lines |
|------|----------|-------------|-------|
| High ionisation | O2+, Ne2+ | Te(high) = Te([OIII]) | [OIII] 4363, 4959, 5007; [NeIII] 3869 |
| Low ionisation | O+, N+, S+ | Te(low) = Te([OII]) | [OII] 3726, 3729; [NII] 6585; [SII] 6718, 6732 |
Intermediate ions (S2+, Ar2+) use
Tmid = (Thigh + Tlow) / 2.
The contribution of O3+/H+ to the total oxygen abundance
is assumed negligible, following Berg et al. (2021).
#### 3.3.2 Electron temperature: Te([OIII])
Determined from the [OIII] auroral-to-nebular ratio using PyNEB:
$$
R_{\mathrm{[OIII]}} = \frac{I(4363)}{I(5007) + I(4959)}
$$
PyNEB solves for Te numerically using atomic data for the
O2+ ion. We use `getTemDen()` with
`to_eval="(L(4363))/(L(5007)+L(4959))"`, `log=True`, `start_x=3.0`,
`end_x=5.0` for robust root-finding across the range
103–105 K (essential for metal-poor galaxies at
z > 4 where Te > 25 000 K is common).
**Underlying physics:** For a collisionally excited line (CEL) the
emissivity depends on Te through the Boltzmann factor. For a
two-level simplification (Osterbrock & Ferland 2006, Chapter 5), the
emissivity of a transition from upper level *j* to lower level *i* is:
$$
\varepsilon_{ji} = n_j A_{ji} h\nu_{ji}
$$
where Aji is the spontaneous transition probability and
nj is the population of level *j*, determined by solving the
statistical equilibrium equations. In the low-density limit
(ne ≪ ncrit), every collisional excitation is
followed by radiative de-excitation, and the emissivity ratio of the
auroral (1S0 → 1D2, 4363 Å)
to nebular (1D2 → 3P1,2,
4959+5007 Å) transitions depends primarily on Te:
$$
R_\text{[OIII]} = \frac{j_{4363}}{j_{4959} + j_{5007}}
= \frac{A_{4363}}{A_{4959} + A_{5007}}
\frac{\Omega(^1D_2,\, ^1S_0)}{\Omega(^3P,\, ^1D_2)}
\exp\!\left(-\frac{\Delta E_{43}}{k_B T_e}\right)
$$
where Ω are the collision strengths,
ΔE43 = E(1S0) − E(1D2) = 3.29 eV
is the energy gap between the auroral and nebular upper levels, and
kB is the Boltzmann constant. This gives the strong
exponential sensitivity to Te.
PyNEB solves the full multi-level (typically 5-level) statistical
equilibrium problem numerically, accounting for all collisional and
radiative transitions and the actual density dependence. The
5-level model for O2+ includes the ground term
3P0,1,2 and excited terms 1D2
and 1S0. The statistical equilibrium equations are:
$$
\sum_{j \neq i} n_j \left[ A_{ji} + n_e q_{ji}(T_e) \right]
= n_i \sum_{j \neq i} \left[ A_{ij} + n_e q_{ij}(T_e) \right]
$$
for each level *i*, where qij is the collisional rate
coefficient:
$$
q_{ij}(T_e) = \frac{8.629 \times 10^{-6}}{g_i\, T_e^{1/2}}\, \Omega_{ij}(T_e) \quad \text{cm}^3\,\text{s}^{-1}
$$
with gi the statistical weight of level *i* and
Ωij(Te) the thermally averaged (Maxwellian)
collision strength. PyNEB interpolates tabulated
Ωij(Te) from the atomic data files.
#### 3.3.3 Electron temperature: Te([NII])
When [NII] 5756 is detected, Te(N+) is measured
directly from the [NII] 5756/6585 ratio via PyNEB, providing an
independent low-ionisation zone temperature:
$$
R_\text{[NII]} = \frac{I(5756)}{I(6549) + I(6585)}
$$
The same 5-level statistical equilibrium approach applies, with
the N+ energy gap
ΔE = E(1S0) − E(1D2) = 4.05 eV
(the [NII] auroral line lies at shorter wavelength than [OIII] 4363
because the energy gap is larger). This ratio is less commonly
detected because [NII] 5756 is intrinsically fainter than [OIII] 4363.
#### 3.3.4 Te–Te relations
When Te([OII]) cannot be measured directly (the usual case),
the low-ionisation zone temperature is derived from the high-ionisation
zone temperature via empirical relations.
**Available relations:**
| Name | Equation | Reference | Notes |
|------|---------|-----------|-------|
| `"desi"` | Te(low) = 0.648 Te(high) + 3270 | DESI DR2 (arXiv:2601.02463) | Calibrated on the DESI Early Data Release galaxy sample |
| `"classical"` | Te(low) = 0.7 Te(high) + 3000 | Garnett (1992) | Widely used; original form from Campbell, Terlevich & Melnick (1986) |
The Te–Te relation is a significant source of
systematic uncertainty. Other relations in the literature include:
| Equation | Reference |
|---------|-----------|
| T2 = 0.7 T3 + 1500 | Campbell et al. (1986) / Garnett (1992), corrected by Andrews & Martini (2013) |
| Various parametric forms | Izotov et al. (2006); Pilyugin et al. (2009); Nicholls et al. (2014) |
| T2 = 1.0807 T3 − 0.0846 T32 (T in 104 K) | Pagel et al. (1992) |
The ∼1000–2000 K discrepancy between theoretically and
empirically calibrated relations (Andrews & Martini 2013; Curti
et al. 2017) arises partly from the difference between individual
H II regions and galaxy-integrated spectra, where flux originates
from a mixture of H II regions with varying properties plus diffuse
ionised gas.
#### 3.3.5 Electron density
ne is derived from density-sensitive doublet ratios via PyNEB:
| Doublet | Lines | Diagnostic ratio |
|---------|-------|-----------------|
| [SII] | 6718, 6732 | I(6718) / I(6732) |
| [OII] | 3726, 3729 | I(3726) / I(3729) |
[SII] is preferred when available; [OII] is used as a fallback.
If neither doublet is detected, ne = 100 cm−3 is
assumed (abundance diagnostics are insensitive to ne at
ne ≲ 104 cm−3; e.g. Isobe et al. 2023).
**Physical basis:** The [SII] and [OII] doublets arise from
transitions between levels within the ground-term fine-structure
splitting (4S3/2, 2D3/2,5/2
for S+; analogous for O+). The two lines in each
doublet have different critical densities, so their ratio depends on
ne:
$$
\frac{I(\lambda_1)}{I(\lambda_2)}
= \frac{n_{j_1}\,A_{j_1 \to i_1}\,\nu_1}{n_{j_2}\,A_{j_2 \to i_2}\,\nu_2}
$$
At low density (ne ≪ ncrit) the ratio approaches
the ratio of collision strengths; at high density
(ne ≫ ncrit) it approaches the ratio of statistical
weights. The transition occurs around
ncrit ∼ 103–104 cm−3,
making the doublet ratio a sensitive density probe in this regime.
PyNEB uses the full multi-level solution to extract ne from
the observed ratio at the adopted Te.
#### 3.3.6 Ionic abundances
Ionic abundances Xi+/H+ are computed via PyNEB's
`getIonAbundance()` method for Te ≤ 30 000 K. For
Te > 30 000 K (beyond the Storey & Hummer 1995 H I
recombination tables bundled with PyNEB), we compute abundances
manually:
$$
\frac{X^{i+}}{\mathrm{H}^+} = \frac{I_\mathrm{line}}{I_{\mathrm{H}\beta}} \times \frac{\varepsilon_{\mathrm{H}\beta}(T_e, n_e)}{\varepsilon_\mathrm{line}(T_e, n_e)}
$$
where εline is obtained from PyNEB's CEL emissivity tables
(valid at any temperature) and εHβ uses the Aller (1984)
formula (Section 3.5.3) which has no upper temperature limit.
The CEL emissivity is computed from the level populations obtained
by solving the statistical equilibrium equations (Section 3.3.2):
$$
\varepsilon_\mathrm{line}(T_e, n_e)
= n_{X^{i+}} \, n_e \, q_{ij}(T_e) \, \frac{A_{ji}}{A_{ji} + n_e\,q_{ji}(T_e)} \, h\nu_{ji}
$$
In the low-density limit this simplifies to
ε ∝ nXi+ ne qij(Te) hνji,
and the ionic abundance reduces to:
$$
\frac{X^{i+}}{\mathrm{H}^+}
= \frac{I_\mathrm{line}}{I_{\mathrm{H}\beta}}
\times \frac{\alpha_{\mathrm{H}\beta}^\mathrm{eff}(T_e)}{q_{ij}(T_e)\,h\nu_{ji} / (h\nu_{\mathrm{H}\beta})}
$$
where αeffHβ is the effective recombination
coefficient for Hβ (see Section 3.5.3). PyNEB handles the full
density-dependent multi-level solution internally.
**Ions computed:**
| Ion | Line(s) | Zone | PyNEB atom |
|-----|---------|------|-----------|
| O2+/H+ | [OIII] 5007 | High | `Atom("O", 3)` |
| O+/H+ | [OII] 3726+3729 | Low | `Atom("O", 2)` |
| N+/H+ | [NII] 6585 | Low | `Atom("N", 2)` |
| S+/H+ | [SII] 6718+6732 | Low | `Atom("S", 2)` |
| S2+/H+ | [SIII] 9069 | Mid | `Atom("S", 3)` |
| Ne2+/H+ | [NeIII] 3869 | High | `Atom("Ne", 3)` |
| Ar2+/H+ | [ArIII] 7136 | Mid | `Atom("Ar", 3)` |
#### 3.3.7 Atomic data
PyNEB (Luridiana et al. 2015) is used for all atomic physics
calculations. The default atomic and collisional data shipped with
PyNEB ≥ 1.1.25 include:
- **Transition probabilities:** Froese Fischer & Tachiev (2004)
for most ions; Storey & Zeippen (2000) for [NII], [OIII].
- **Collision strengths:**
- O+: Kisielius et al. (2009)
- O2+: Aggarwal & Keenan (1999) / Storey, Sochi & Badnell (2014)
- N+: Tayal (2011)
- S+: Tayal & Zatsarinny (2010)
- **H I recombination:** Storey & Hummer (1995), valid to 30 000 K.
Users requiring specific atomic data versions (e.g. to match a
particular paper) should configure PyNEB's data files directly via
`pn.atomicData.setDataFile()`.
#### 3.3.8 Total abundances and ICFs
**Oxygen:**
$$
\frac{\mathrm{O}}{\mathrm{H}} = \frac{\mathrm{O}^+}{\mathrm{H}^+} + \frac{\mathrm{O}^{2+}}{\mathrm{H}^+}
$$
No ICF is applied for higher ionisation states; the O3+
contribution is negligible for typical H II regions (Berg et al. 2021).
**Nitrogen:**
$$
\frac{\mathrm{N}}{\mathrm{O}} = \mathrm{ICF}_\mathrm{N} \times \frac{\mathrm{N}^+/\mathrm{H}^+}{\mathrm{O}^+/\mathrm{H}^+}
$$
where ICFN is from Izotov et al. (2006, eq. 18):
$$
\mathrm{ICF}_\mathrm{N} = -0.825\,x^2 + 0.718\,x + 0.853, \qquad x = \mathrm{O}^+ / \mathrm{O}_\mathrm{total}
$$
clamped to ICFN ≥ 1. For many applications the standard
assumption N/O ≈ N+/O+ (ICF = 1) is used,
justified by the similar ionisation potentials of O+ and
N+ (e.g. Nava et al. 2006; Amayo et al. 2021).
**Sulfur:**
$$
\mathrm{S/O} = \mathrm{ICF}_\mathrm{S} \times (\mathrm{S}^+ + \mathrm{S}^{2+}) / \mathrm{O}
$$
ICF from Izotov et al. (2006, eq. 20), accounting for S3+:
$$
\mathrm{ICF}_\mathrm{S} = 0.013 + 5.986\,x - 21.085\,x^2 + 26.462\,x^3 - 11.282\,x^4
$$
clamped to ICFS ≥ 1.
**Neon:**
$$
\mathrm{Ne/O} = \mathrm{ICF}_\mathrm{Ne} \times \mathrm{Ne}^{2+}/\mathrm{O}^{2+}
$$
ICF from Izotov et al. (2006, eq. 19):
$$
\mathrm{ICF}_\mathrm{Ne} = -0.385\,x^2 + 0.405\,x + 0.335
$$
**Argon:**
$$
\mathrm{Ar/O} = \mathrm{ICF}_\mathrm{Ar} \times \mathrm{Ar}^{2+}/\mathrm{O}^{2+}
$$
ICF from Izotov et al. (2006, eqs. 22/23):
$$
\mathrm{ICF}_\mathrm{Ar} = 0.157 + 3.119\,x - 6.185\,x^2 + 4.517\,x^3
$$
clamped to ICFAr ≥ 1.
In all ICF expressions, x = O+ / Ototal is the
fractional ionic abundance of O+.
**Reporting convention:**
$$
12 + \log(\mathrm{O/H}) = 12 + \log_{10}\!\left(\frac{\mathrm{O}^+/\mathrm{H}^+ + \mathrm{O}^{2+}/\mathrm{H}^+}{1}\right)
$$
$$
\log(\mathrm{N/O}) = \log_{10}\!\left(\mathrm{ICF}_\mathrm{N} \times \frac{\mathrm{N}^+/\mathrm{H}^+}{\mathrm{O}^+/\mathrm{H}^+}\right)
$$
The solar oxygen abundance is
12 + log(O/H)☉ = 8.69 (Asplund et al. 2009).
**Reference:** Izotov, Y. I., Stasinska, G., Meynet, G., Guseva, N. G.
& Thuan, T. X. 2006, A&A, 448, 955
#### 3.3.9 Uncertainty propagation
**Bootstrap / least-squares results:** Gaussian Monte Carlo
perturbation (NMC = 1000 by default):
1. For each iteration, perturb every line flux:
F′line = max(Fline + σline × ε, 10−50),
where ε ∼ 𝒩(0, 1).
2. Recompute Te, ionic abundances, totals through the full
PyNEB pipeline.
3. Report σ̂ of the resulting 12+log(O/H) and log(N/O) distributions.
**MCMC results:** Each posterior sample's line fluxes are propagated
through the full PyNEB pipeline. Posteriors are thinned to
Nposterior samples (default 1000) via random subsampling
to keep runtime manageable (∼7 ms per PyNEB evaluation).
---
### 3.4 Strong-line calibrations (Sanders+25)
Used when the [OIII] 4363 auroral line is not detected.
Strong-line methods use empirical calibrations between bright
emission-line ratios and metallicity, calibrated on samples of
galaxies with direct Te measurements.
#### 3.4.1 Diagnostic ratios
$$
\mathrm{O3} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007}{\mathrm{H}\beta}\right), \qquad
\mathrm{O2} = \log_{10}\!\left(\frac{[\mathrm{OII}]}{\mathrm{H}\beta}\right)
$$
$$
\mathrm{R23} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007 + [\mathrm{OII}]}{\mathrm{H}\beta}\right), \qquad
\mathrm{O32} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007}{[\mathrm{OII}]}\right)
$$
[OII] denotes the total [OII] λλ3726,3729 flux (resolved or
unresolved).
#### 3.4.2 Calibration polynomials
Each diagnostic ratio *R* is modelled as a polynomial in
Z = 12 + log(O/H):
$$
R_\mathrm{model}(Z) = \sum_{k=0}^{n} c_k \,(Z - Z_\mathrm{ref})^k, \qquad Z_\mathrm{ref} = 8.0
$$
**Coefficients (Sanders et al. 2025, Table 3):**
| Ratio | c0 | c1 | c2 | c3 | c4 | Zmin | Zmax |
|-------|-------|-------|-------|-------|-------|-------|-------|
| O3 | 0.852 | −0.162 | −1.149 | −0.553 | — | 7.3 | 8.6 |
| O2 | 0.172 | 0.954 | −0.832 | — | — | 7.3 | 8.6 |
| R23 | 0.998 | 0.053 | −0.141 | −0.493 | −0.774 | 7.3 | 8.6 |
| O32 | 0.697 | −1.245 | −0.869 | — | — | 7.3 | 8.6 |
#### 3.4.3 Simultaneous chi-squared fit
The best-fit metallicity Zbest is found by minimising
the simultaneous chi-squared across all available diagnostics:
$$
\chi^2(Z) = \sum_{m \in \mathrm{diag}} \frac{(R_\mathrm{obs}^{(m)} - R_\mathrm{model}^{(m)}(Z))^2}{\sigma_{\mathrm{obs},m}^2 + \sigma_{\mathrm{cal},m}^2}
$$
where σobs is the measurement error on log10
of the ratio (Gaussian error propagation) and σcal
is the total calibration scatter in ratio space:
$$
\sigma_\mathrm{cal} = \sqrt{\sigma_{R,\mathrm{fit}}^2 + \sigma_{R,\mathrm{int}}^2}
$$
Minimisation uses `scipy.optimize.minimize_scalar` with method
`"bounded"` on Z ∈ [6.5, 9.0].
#### 3.4.4 Three scatter components
Sanders+25 report three distinct scatter values for each diagnostic:
| Symbol | Meaning | Role in code |
|--------|---------|-------------|
| σR,fit | RMS residual of the polynomial fit to the calibration sample | Combined into σcal |
| σR,int | Intrinsic scatter in the ratio at fixed Z (physical scatter from galaxy-to-galaxy variations) | Combined into σcal |
| σO/H,int | Intrinsic scatter in 12+log(O/H) at fixed ratio (the actual metallicity uncertainty floor per diagnostic) | Stored for reference |
**Values:**
| Ratio | σR,fit | σR,int | σcal | σO/H,int |
|-------|------|------|------|------|
| O3 | 0.04 | 0.13 | 0.136 | 0.14 |
| O2 | 0.03 | 0.25 | 0.252 | 0.22 |
| R23 | 0.03 | 0.07 | 0.076 | 0.13 |
| O32 | 0.09 | 0.29 | 0.304 | 0.25 |
#### 3.4.5 Uncertainty propagation
**Bootstrap / least-squares results:** Monte Carlo perturbation
(NMC = 1000 by default). Each iteration:
1. Perturb each observed log-ratio by
𝒩(0, √(σobs2 + σcal2))
— both measurement and calibration scatter are included.
2. Re-minimise the chi-squared to find a perturbed Z.
3. Report (16th, 84th) percentiles of the Z distribution.
**MCMC results:** Each posterior sample provides a set of fluxes.
For each sample:
1. Compute line ratios from the posterior fluxes.
2. Perturb each ratio by 𝒩(0, σcal) to propagate
calibration scatter.
3. Minimise to find Z for that sample.
The MCMC posterior spread captures measurement uncertainty, and
the σcal perturbation adds the calibration scatter on top.
Posteriors are thinned to Nposterior samples (default 1000).
**Reference:** Sanders, R. L. et al. 2025, ApJ (submitted)
---
### 3.5 Bayesian forward model (Cullen+25 approach)
The forward model simultaneously constrains ionic abundances,
electron temperature, and electron density by predicting emission-line
flux ratios and comparing to observations via MCMC or nested sampling.
#### 3.5.1 Free parameters and priors
| Parameter | Symbol | Range | Prior | Unit |
|-----------|--------|-------|-------|------|
| Electron temperature | log10 Te | [3.5, 5.0] | Uniform | K |
| Electron density | log10 ne | [0.0, 5.0] | Uniform | cm−3 |
| Per-ion abundance | log10(Xi+/H+) | [−7, −2] (oxygen); [−8, −3] (N, Ne); [−9, −4] (S, Ar) | Uniform in log (= log-uniform in linear) | — |
Only ions with at least one detected emission line are included as
free parameters. Typically 4–8 free parameters depending on the
available lines.
#### 3.5.2 Model prediction
For each parameter vector **θ**, the predicted line/Hβ flux ratio is:
$$
\frac{F_\mathrm{line}}{F_{\mathrm{H}\beta}} = \frac{X^{i+}}{\mathrm{H}^+} \times \frac{\varepsilon_\mathrm{line}(T_e, n_e)}{\varepsilon_{\mathrm{H}\beta}(T_e, n_e)}
$$
where εline is the collisionally excited line (CEL)
emissivity from PyNEB's `Atom.getEmissivity()`, and εHβ is
the Hβ recombination emissivity.
For lines consisting of multiple transitions (e.g. [OII] doublet),
the emissivities are summed.
#### 3.5.3 Hβ emissivity
For Te ≤ 30 000 K, the Storey & Hummer (1995) H I
recombination tables in PyNEB are used via
`RecAtom("H", 1).getEmissivity(Te, ne, wave=4861)`.
For Te > 30 000 K, the Aller (1984) Case B formula is used:
$$
\alpha_{\mathrm{H}\beta}^{\mathrm{eff}} = 3.03 \times 10^{-14} \left(\frac{T_e}{10^4\,\mathrm{K}}\right)^{-0.874} \;\mathrm{cm}^3\,\mathrm{s}^{-1}
$$
$$
\varepsilon_{\mathrm{H}\beta} = \alpha_{\mathrm{H}\beta}^{\mathrm{eff}} \times h\nu_{\mathrm{H}\beta}
$$
This formula has no upper temperature limit, enabling abundance
calculations for the extremely metal-poor, high-Te galaxies
found at z > 4 with JWST.
**References:**
- Aller, L. H. 1984, *Physics of Thermal Gaseous Nebulae* (Reidel)
- Storey, P. J. & Hummer, D. G. 1995, MNRAS, 272, 41
#### 3.5.4 Likelihood
Gaussian log-likelihood on line/Hβ flux ratios:
$$
\ln \mathcal{L}(\boldsymbol{\theta}) = -\frac{1}{2} \sum_{\mathrm{lines}\, l}
\frac{(R_{\mathrm{obs},l} - R_{\mathrm{model},l}(\boldsymbol{\theta}))^2}{\sigma_{R,l}^2}
$$
where Robs,l = Fl / FHβ and
σR,l is the error on the ratio propagated from the flux
uncertainties.
#### 3.5.5 Sampling
| Sampler | Algorithm | Key settings | Reference |
|---------|----------|-------------|-----------|
| emcee | Affine-invariant ensemble MCMC | 32 walkers, 5000 steps, 1000 burn-in | Foreman-Mackey et al. (2013) |
| dynesty | Dynamic nested sampling | 500 live points | Speagle (2020); Koposov et al. (2022) |
Walkers are initialised around a MAP estimate found by
`scipy.optimize.differential_evolution`.
#### 3.5.6 Total abundances from posteriors
$$
12 + \log(\mathrm{O/H}) = 12 + \log_{10}\!\left(10^{\log(\mathrm{O}^+/\mathrm{H}^+)} + 10^{\log(\mathrm{O}^{2+}/\mathrm{H}^+)}\right)
$$
computed sample-by-sample from the posterior chains.
N/O is derived as log(N+/H+) − log(O+/H+)
(= log(N+/O+), the ICF = 1 assumption).
#### 3.5.7 Current limitations and differences from Cullen+25
The current implementation differs from the full Cullen et al. (2025)
/ Scholte et al. (2025) EXCELS methodology in several ways:
1. **AV is not a free parameter.** Dust correction is applied
externally before the forward model, using fixed Case B Balmer
ratios at 104 K. Cullen+25 treat AV as a 6th
free parameter and recompute temperature-dependent Balmer ratios at
each likelihood evaluation, providing self-consistent dust
correction.
2. **Single Te zone in the forward model.** The forward model
uses a single `log_Te` parameter for all CEL emissivities.
Cullen+25 apply a Te–Te relation to derive
Te([OII]) from Te([OIII]) for the
low-ionisation zone. In practice this difference is absorbed into
the fitted ionic abundances when both O+ and O2+
are free parameters.
3. **Prior bounds differ** (see Section 3.5.1). Cullen+25 use
log Te ∈ [3.0, 4.52] and log ne ∈ [1.0, 3.0].
---
### 3.6 Broad Balmer component handling
When a `BroadFitResult` or `MCMCBroadFitResult` is passed to
`compute_abundances()`, broad Balmer component fluxes (Hα, Hβ,
Hγ, Hδ) are automatically **summed** with the narrow component
fluxes. This ensures the total hydrogen flux is used for the Balmer
decrement and abundance normalisation
(Iline / IHβ).
For MCMC results, broad and narrow posteriors are summed
sample-by-sample to preserve correlations.
---
## 4. Full bibliography
### Atomic physics and emissivities
- Aggarwal, K. M. & Keenan, F. P. 1999, ApJS, 123, 311 — O2+ collision strengths
- Aller, L. H. 1984, *Physics of Thermal Gaseous Nebulae* (Reidel) — Hβ emissivity formula
- Froese Fischer, C. & Tachiev, G. 2004, ADNDT, 87, 1 — Transition probabilities
- Kisielius, R. et al. 2009, ApJ, 694, 1209 — O+ collision strengths
- Luridiana, V., Morisset, C. & Shaw, R. A. 2015, A&A, 573, A42 — PyNEB
- Storey, P. J. & Hummer, D. G. 1995, MNRAS, 272, 41 — H I recombination
- Storey, P. J., Sochi, T. & Badnell, N. R. 2014, MNRAS, 441, 3028 — O2+ collision strengths
- Storey, P. J. & Zeippen, C. J. 2000, MNRAS, 312, 813 — [NII], [OIII] transition probabilities
- Tayal, S. S. 2011, ApJS, 195, 12 — N+ collision strengths
- Tayal, S. S. & Zatsarinny, O. 2010, ApJS, 188, 32 — S+ collision strengths
### Dust attenuation and extinction
- Calzetti, D. et al. 2000, ApJ, 533, 682 — Starburst attenuation law
- Cardelli, J. A., Clayton, G. C. & Mathis, J. S. 1989, ApJ, 345, 245 — MW extinction
- Noll, S. et al. 2009, A&A, 507, 1793 — UV bump + slope modification
- Osterbrock, D. E. & Ferland, G. J. 2006, *Astrophysics of Gaseous Nebulae and Active Galactic Nuclei*, 2nd ed. — Case B Balmer ratios
- Salim, S. et al. 2018, ApJ, 859, 11 — Modified Calzetti parameters
### Electron temperature and Te–Te relations
- Andrews, B. H. & Martini, P. 2013, ApJ, 765, 140 — Te–Te correction
- Campbell, A., Terlevich, R. & Melnick, J. 1986, MNRAS, 223, 811 — Original Te–Te relation
- Curti, M. et al. 2017, MNRAS, 465, 1384 — Te–Te discrepancy
- DESI Collaboration 2025, arXiv:2601.02463 — DESI DR2 Te–Te relation
- Garnett, D. R. 1992, AJ, 103, 1330 — Classical Te–Te relation
- Nicholls, D. C. et al. 2014, ApJ, 786, 155 — Te–Te relations
- Pagel, B. E. J. et al. 1992, MNRAS, 255, 325 — Te–Te relation
- Pilyugin, L. S. et al. 2009, A&A, 508, 1043 — Te–Te relation
- Pilyugin, L. S. et al. 2010, ApJ, 720, 1738 — Te–Te discrepancy
### Electron density
- Isobe, Y. et al. 2023, ApJ, 956, 139 — Density insensitivity at low ne
### Ionic abundances and ICFs
- Amayo, A. et al. 2021, MNRAS, 505, 2361 — N/O = N+/O+ justification
- Asplund, M. et al. 2009, ARA&A, 47, 481 — Solar abundances
- Berg, D. A. et al. 2021, ApJ, 922, 170 — O3+ contribution negligible
- Garnett, D. R. et al. 1997, ApJ, 489, 63 — C/O ionisation correction factor
- Izotov, Y. I. et al. 2006, A&A, 448, 955 — ICFs (N, Ne, S, Ar)
- Nava, A. et al. 2006, ApJ, 645, 1076 — N/O = N+/O+ justification
### Strong-line calibrations
- Sanders, R. L. et al. 2025, ApJ (submitted) — Simultaneous O3/O2/R23/O32 calibrations
### Forward modelling
- Cullen, F. et al. 2025, MNRAS (EXCELS) — Forward modelling approach
- Scholte, D. et al. 2025, MNRAS (EXCELS) — Methodology section 4.3
### MCMC and sampling
- Foreman-Mackey, D. et al. 2013, PASP, 125, 306 — emcee
- Foreman-Mackey, D. 2016, JOSS, 1, 24 — corner.py
- Gelman, A. & Rubin, D. B. 1992, Statistical Science, 7, 457 — R̂
- Koposov, S. et al. 2022, doi:10.5281/zenodo.7388523 — dynesty
- Lange, J. U. 2023, MNRAS, 525, 3181 — nautilus
- Speagle, J. S. 2020, MNRAS, 493, 3132 — dynesty
### IGM and Lyman-alpha
- Inoue, A. K. et al. 2014, MNRAS, 442, 1805 — IGM transmission model
### Line database
- Morton, D. C. 2003, ApJS, 149, 205 — UV line wavelengths
- NIST Atomic Spectra Database — Rest-frame vacuum wavelengths
### General
- Harris, C. R. et al. 2020, Nature, 585, 357 — NumPy
- Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90 — matplotlib
- Virtanen, P. et al. 2020, Nature Methods, 17, 261 — SciPy
- Astropy Collaboration 2022, ApJ, 935, 167 — Astropy