# References and methodology Comprehensive documentation of all equations, algorithms, software dependencies, and literature references used across the `jwspecfit`, `jwspecmcmc`, and `jwspecabund` packages. --- ## Software dependencies | Package | Version | Role | Reference | |---------|---------|------|-----------| | NumPy | >= 1.24 | Array operations | Harris et al. (2020) | | SciPy | >= 1.10 | Optimisation, special functions | Virtanen et al. (2020) | | Astropy | >= 5.0 | Units, FITS I/O | Astropy Collaboration (2022) | | matplotlib | >= 3.6 | Static plotting | Hunter (2007) | | plotly | >= 5.0 | Interactive plotting | Plotly Technologies Inc. | | lmfit | >= 1.2 | Parameter management | Newville et al. (2014) | | tqdm | >= 4.60 | Progress bars | da Costa-Luis (2019) | | joblib | >= 1.2 | Parallel bootstrap | Joblib Development Team | | emcee | >= 3.1 | Affine-invariant MCMC | Foreman-Mackey et al. (2013) | | nautilus | >= 1.0 | Importance nested sampling | Lange (2023) | | dynesty | >= 2.0 | Dynamic nested sampling | Speagle (2020); Koposov et al. (2022) | | PyNEB | >= 1.1.25 | Atomic physics, emissivities | Luridiana et al. (2015) | | corner | >= 2.2 | Corner plots | Foreman-Mackey (2016) | --- ## 1. Emission-line fitting (`jwspecfit`) ### 1.1 Line profiles Emission lines are modelled as Gaussians bin-averaged over each spectral pixel using the error function: $$ \bar{G}_i = \frac{A}{2\,\Delta\lambda_i} \left[ \operatorname{erf}\!\left(\frac{\lambda_{i+1} - \mu}{\sigma\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{\lambda_i - \mu}{\sigma\sqrt{2}}\right) \right] $$ where *A* is the integrated flux (erg s−1 cm−2), *μ* is the line centroid (Å), *σ* is the Gaussian width (Å), and λi, λi+1 are the pixel edges. This avoids sampling bias when lines are narrower than pixels (prism regime, R ∼ 30). ### 1.2 Spectral resolution The instrumental line spread function is assumed Gaussian with σinst = λ / (R × 2.3548). The resolving power *R* is determined as follows: | Source | R(λ) | Reference | |--------|------|-----------| | PRISM/CLEAR | R = 50 + 50(λ − 1) + 15(λ − 1)2 (µm), clipped to [30, 300] | Heuristic fit to NIRSpec instrument model | | G140M, G235M, G395M | R = 1000 | NIRSpec documentation | | G140H, G235H, G395H | R = 2700 | NIRSpec documentation | | Pixel estimate | R ≈ λ / (2Δλ) | Fallback when grating unknown | ### 1.3 Continuum subtraction Iterative sigma-clipped polynomial fit: 1. Mask pixels within ±6σinst of every expected emission line and blueward of NV 1239 Å (Lyman break region). 2. Fit weighted polynomial of degree *d* (default *d* = 2) to unmasked pixels. 3. Compute residuals ri = fi − Ci. 4. Reject pixels with ri > k σ̂ (default *k* = 2.5) to iteratively clip positive outliers (emission). 5. Repeat steps 2–4 for 5 iterations. ### 1.4 Optimisation The model is fit using `scipy.optimize.least_squares` with the Trust Region Reflective (TRF) algorithm, Jacobian-based parameter scaling (`x_scale='jac'`), and box constraints on all parameters. ### 1.5 Parameter constraints | Constraint | Implementation | Reference | |------------|---------------|-----------| | [NII] doublet ratio | A(6549) = A(6585) / 2.96 | Storey & Zeippen (2000) | | Balmer–OIII width tying | σv(Hα, Hβ, Hγ, Hδ, [NII]) = σv([OIII] 5007) | Assumed common NLR kinematics | | Centroid tying | Same lines, velocity-space centroid tied to [OIII] | | | Broad Balmer centroids | Tied to narrow counterpart | | ### 1.6 Broad Balmer component detection BIC-based model selection compares four models: | Model | Components | |-------|-----------| | narrow | Narrow lines only | | broad1 | + intermediate broad (FWHM ∼ 500–2000 km s−1) | | broad2 | + very broad / BLR (FWHM ∼ 2000–5000 km s−1) | | both | + both broad components | The Bayesian Information Criterion is: $$ \mathrm{BIC} = k \ln n - 2 \ln \hat{\mathcal{L}} $$ where *k* is the number of free parameters, *n* is the number of data points, and L̂ is the maximised likelihood. A more complex model is accepted if ΔBIC > 6 (default threshold). Robust comparison uses bootstrapped BIC distributions (nboot,BIC = 100). ### 1.7 Bootstrap uncertainties 1. Perturb flux: f′i = fi + σi × εi, where εi ∼ 𝒩(0, 1). 2. Refit with `least_squares` using the best-fit parameters as initial guess. 3. Repeat Nboot times (default 1000). 4. Report σ̂(flux) from the bootstrap distribution for each line. Parallelised via `joblib` (`n_jobs = -1` uses all cores). ### 1.8 Lyman-alpha modelling Lyα is modelled as a skewed Gaussian (Owen 1956; `scipy.stats.skewnorm`) attenuated by mean IGM transmission: $$ F_{\mathrm{Ly}\alpha}(\lambda) = S(\lambda;\, A,\, \mu,\, \sigma,\, \alpha) \times T_\mathrm{IGM}(\lambda,\, z) $$ where *S* is the bin-averaged skewed Gaussian (positive skew = red tail) and TIGM is the mean IGM transmission. **IGM transmission** follows Inoue et al. (2014), accounting for Lyman-series line absorption (LAF) and damped Lyman-alpha system (DLA) continuum absorption. Coefficients are loaded from their Table 2 (shipped as `data/inoue2014_table2.txt`). ### 1.9 Line database Rest-frame vacuum wavelengths from NIST Atomic Spectra Database, Morton (2003), and Storey & Zeippen (2000). 34 lines from Lyα (1215.67 Å) through [SII] 6732 (6734.53 Å). --- ## 2. MCMC fitting (`jwspecmcmc`) ### 2.1 Likelihood Gaussian log-likelihood identical to the least-squares objective: $$ \ln \mathcal{L} = -\frac{1}{2} \sum_i \left(\frac{f_i - C_i - M_i(\boldsymbol{\theta})}{\sigma_i}\right)^2 $$ where fi is the observed flux, Ci the continuum, Mi the emission-line model, and σi the flux uncertainty. ### 2.2 Priors Default priors are uniform within the same bounds used by `least_squares`. Per-parameter overrides are supported: | Prior type | PDF | Use case | |-----------|-----|----------| | `UniformPrior(lo, hi)` | π(θ) ∝ 1 for θ ∈ [lo, hi] | Default for all parameters | | `GaussianPrior(mean, std, lo, hi)` | Truncated Gaussian | Informative priors | | `LogUniformPrior(lo, hi)` | π(θ) ∝ 1/θ for θ ∈ [lo, hi] | Scale-invariant parameters | ### 2.3 Samplers **emcee** (Foreman-Mackey et al. 2013): affine-invariant ensemble sampler with stretch moves. Walkers are initialised from a tight ball around the least-squares MLE. **nautilus** (Lange 2023): importance nested sampler using neural network-based proposal distributions. Uses `n_eff` target effective samples. ### 2.4 Convergence diagnostics | Diagnostic | Criterion | Reference | |-----------|-----------|-----------| | Gelman–Rubin R̂ | R̂ < 1.05 for all parameters | Gelman & Rubin (1992) | | Effective sample size (ESS) | ESS > 100 for all parameters | Via FFT autocorrelation | ### 2.5 Posterior summaries Point estimates: medians. Uncertainties: (16th, 84th) percentile half-widths, reported as asymmetric (−δlo, +δhi) credible intervals. --- ## 3. Chemical abundances (`jwspecabund`) ### 3.1 Overview of methods Three independent methods for gas-phase metallicity: | Method | When used | Lines required | Uncertainty propagation | |--------|----------|---------------|----------------------| | Direct Te | [OIII] 4363 detected (SNR ≥ 3) | [OIII] 4363, 5007, 4959, Hβ, + Balmer pair for dust | MC perturbation or MCMC posterior propagation through PyNEB | | Bayesian forward model | User-selected (`method="forward"`) | Any subset of optical CELs + Hβ | Full MCMC/nested sampling of ionic abundances, Te, ne | | Strong-line calibrations | No auroral line | [OIII] 5007, Hβ, optionally [OII], Hα | MC perturbation including calibration scatter | ### 3.2 Dust correction Dust correction is applied to all emission-line fluxes before abundance calculations (for the direct and strong-line methods). AV is derived from the Balmer decrement when not supplied. #### 3.2.1 AV from the Balmer decrement $$ A_V = \frac{2.5 \log_{10}(R_\mathrm{obs} / R_\mathrm{int})}{f(\lambda_\mathrm{den}) - f(\lambda_\mathrm{num})} $$ where Robs is the observed Balmer ratio (Hα/Hβ or Hγ/Hβ), Rint is the intrinsic Case B ratio, and f(λ) = A(λ) / AV is the normalised attenuation curve. **Intrinsic Balmer ratios** (Case B, Te = 104 K, ne = 100 cm−3; Osterbrock & Ferland 2006): | Ratio | Value | |-------|-------| | Hα/Hβ | 2.86 | | Hγ/Hβ | 0.468 | | Hδ/Hβ | 0.259 | **Note on temperature dependence:** The intrinsic Balmer ratios are temperature-dependent. At Te = 2.5 × 104 K the Hα/Hβ ratio is ∼2.72, not 2.86. The current implementation uses fixed ratios at 104 K for the external dust correction. See Section 3.5 (forward model) for self-consistent treatment where AV is a free parameter and Balmer ratios are recomputed at each model Te. #### 3.2.2 Salim+18 / Noll+09 attenuation (default) Modified Calzetti et al. (2000) attenuation with a UV bump and power-law slope deviation (Noll et al. 2009; Salim et al. 2018): $$ A(\lambda) = E(B{-}V) \left[ k'_\mathrm{Calz}(\lambda) \left(\frac{\lambda}{0.55\,\mu\mathrm{m}}\right)^\delta + B \cdot D(\lambda) \cdot R_V \right] $$ where: - k′Calz(λ) is the Calzetti et al. (2000) starburst curve, - D(λ) is a Drude profile at 2175 Å modelling the UV bump: D(λ) = (λγ)2 / [(λ2 − λ02)2 + (λγ)2], - δ is the slope deviation (default −0.35), - *B* is the UV bump strength (default 2.27), - RV = AV / E(B−V) (default 3.15). **Default parameters** (RV = 3.15, δ = −0.35, *B* = 2.27) follow the median values for star-forming galaxies from Salim et al. (2018). **References:** - Calzetti, D. et al. 2000, ApJ, 533, 682 - Noll, S. et al. 2009, A&A, 507, 1793 - Salim, S. et al. 2018, ApJ, 859, 11 #### 3.2.3 Cardelli+89 extinction Milky Way extinction curve (Cardelli, Clayton & Mathis 1989): $$ A(\lambda) = A_V \left[ a(x) + \frac{b(x)}{R_V} \right] $$ where x = 1/λ (µm−1) and a(x), b(x) are piecewise polynomials for the IR (0.3 ≤ x < 1.1), optical/NIR (1.1 ≤ x < 3.3), and UV (3.3 ≤ x < 8.0) regimes. Default RV = 3.1. **Reference:** Cardelli, J. A., Clayton, G. C. & Mathis, J. S. 1989, ApJ, 345, 245 #### 3.2.4 Flux correction Observed fluxes are corrected as: $$ F_\mathrm{corr} = F_\mathrm{obs} \times 10^{0.4\,A(\lambda)} $$ Errors are scaled by the same factor. When MCMC posteriors are available, each posterior sample is corrected individually. --- ### 3.3 Direct Te method The direct method uses the temperature-sensitive auroral-to-nebular line ratio to determine electron temperature, breaking the degeneracy between temperature and ionic abundance inherent in strong-line methods. This is the gold standard for gas-phase metallicity when the auroral line is detected. #### 3.3.1 Two-zone ionisation model A two-zone model is assumed: | Zone | Traced by | Temperature | Lines | |------|----------|-------------|-------| | High ionisation | O2+, Ne2+ | Te(high) = Te([OIII]) | [OIII] 4363, 4959, 5007; [NeIII] 3869 | | Low ionisation | O+, N+, S+ | Te(low) = Te([OII]) | [OII] 3726, 3729; [NII] 6585; [SII] 6718, 6732 | Intermediate ions (S2+, Ar2+) use Tmid = (Thigh + Tlow) / 2. The contribution of O3+/H+ to the total oxygen abundance is assumed negligible, following Berg et al. (2021). #### 3.3.2 Electron temperature: Te([OIII]) Determined from the [OIII] auroral-to-nebular ratio using PyNEB: $$ R_{\mathrm{[OIII]}} = \frac{I(4363)}{I(5007) + I(4959)} $$ PyNEB solves for Te numerically using atomic data for the O2+ ion. We use `getTemDen()` with `to_eval="(L(4363))/(L(5007)+L(4959))"`, `log=True`, `start_x=3.0`, `end_x=5.0` for robust root-finding across the range 103–105 K (essential for metal-poor galaxies at z > 4 where Te > 25 000 K is common). **Underlying physics:** For a collisionally excited line (CEL) the emissivity depends on Te through the Boltzmann factor. For a two-level simplification (Osterbrock & Ferland 2006, Chapter 5), the emissivity of a transition from upper level *j* to lower level *i* is: $$ \varepsilon_{ji} = n_j A_{ji} h\nu_{ji} $$ where Aji is the spontaneous transition probability and nj is the population of level *j*, determined by solving the statistical equilibrium equations. In the low-density limit (ne ≪ ncrit), every collisional excitation is followed by radiative de-excitation, and the emissivity ratio of the auroral (1S01D2, 4363 Å) to nebular (1D23P1,2, 4959+5007 Å) transitions depends primarily on Te: $$ R_\text{[OIII]} = \frac{j_{4363}}{j_{4959} + j_{5007}} = \frac{A_{4363}}{A_{4959} + A_{5007}} \frac{\Omega(^1D_2,\, ^1S_0)}{\Omega(^3P,\, ^1D_2)} \exp\!\left(-\frac{\Delta E_{43}}{k_B T_e}\right) $$ where Ω are the collision strengths, ΔE43 = E(1S0) − E(1D2) = 3.29 eV is the energy gap between the auroral and nebular upper levels, and kB is the Boltzmann constant. This gives the strong exponential sensitivity to Te. PyNEB solves the full multi-level (typically 5-level) statistical equilibrium problem numerically, accounting for all collisional and radiative transitions and the actual density dependence. The 5-level model for O2+ includes the ground term 3P0,1,2 and excited terms 1D2 and 1S0. The statistical equilibrium equations are: $$ \sum_{j \neq i} n_j \left[ A_{ji} + n_e q_{ji}(T_e) \right] = n_i \sum_{j \neq i} \left[ A_{ij} + n_e q_{ij}(T_e) \right] $$ for each level *i*, where qij is the collisional rate coefficient: $$ q_{ij}(T_e) = \frac{8.629 \times 10^{-6}}{g_i\, T_e^{1/2}}\, \Omega_{ij}(T_e) \quad \text{cm}^3\,\text{s}^{-1} $$ with gi the statistical weight of level *i* and Ωij(Te) the thermally averaged (Maxwellian) collision strength. PyNEB interpolates tabulated Ωij(Te) from the atomic data files. #### 3.3.3 Electron temperature: Te([NII]) When [NII] 5756 is detected, Te(N+) is measured directly from the [NII] 5756/6585 ratio via PyNEB, providing an independent low-ionisation zone temperature: $$ R_\text{[NII]} = \frac{I(5756)}{I(6549) + I(6585)} $$ The same 5-level statistical equilibrium approach applies, with the N+ energy gap ΔE = E(1S0) − E(1D2) = 4.05 eV (the [NII] auroral line lies at shorter wavelength than [OIII] 4363 because the energy gap is larger). This ratio is less commonly detected because [NII] 5756 is intrinsically fainter than [OIII] 4363. #### 3.3.4 Te–Te relations When Te([OII]) cannot be measured directly (the usual case), the low-ionisation zone temperature is derived from the high-ionisation zone temperature via empirical relations. **Available relations:** | Name | Equation | Reference | Notes | |------|---------|-----------|-------| | `"desi"` | Te(low) = 0.648 Te(high) + 3270 | DESI DR2 (arXiv:2601.02463) | Calibrated on the DESI Early Data Release galaxy sample | | `"classical"` | Te(low) = 0.7 Te(high) + 3000 | Garnett (1992) | Widely used; original form from Campbell, Terlevich & Melnick (1986) | The Te–Te relation is a significant source of systematic uncertainty. Other relations in the literature include: | Equation | Reference | |---------|-----------| | T2 = 0.7 T3 + 1500 | Campbell et al. (1986) / Garnett (1992), corrected by Andrews & Martini (2013) | | Various parametric forms | Izotov et al. (2006); Pilyugin et al. (2009); Nicholls et al. (2014) | | T2 = 1.0807 T3 − 0.0846 T32 (T in 104 K) | Pagel et al. (1992) | The ∼1000–2000 K discrepancy between theoretically and empirically calibrated relations (Andrews & Martini 2013; Curti et al. 2017) arises partly from the difference between individual H II regions and galaxy-integrated spectra, where flux originates from a mixture of H II regions with varying properties plus diffuse ionised gas. #### 3.3.5 Electron density ne is derived from density-sensitive doublet ratios via PyNEB: | Doublet | Lines | Diagnostic ratio | |---------|-------|-----------------| | [SII] | 6718, 6732 | I(6718) / I(6732) | | [OII] | 3726, 3729 | I(3726) / I(3729) | [SII] is preferred when available; [OII] is used as a fallback. If neither doublet is detected, ne = 100 cm−3 is assumed (abundance diagnostics are insensitive to ne at ne ≲ 104 cm−3; e.g. Isobe et al. 2023). **Physical basis:** The [SII] and [OII] doublets arise from transitions between levels within the ground-term fine-structure splitting (4S3/2, 2D3/2,5/2 for S+; analogous for O+). The two lines in each doublet have different critical densities, so their ratio depends on ne: $$ \frac{I(\lambda_1)}{I(\lambda_2)} = \frac{n_{j_1}\,A_{j_1 \to i_1}\,\nu_1}{n_{j_2}\,A_{j_2 \to i_2}\,\nu_2} $$ At low density (ne ≪ ncrit) the ratio approaches the ratio of collision strengths; at high density (ne ≫ ncrit) it approaches the ratio of statistical weights. The transition occurs around ncrit ∼ 103–104 cm−3, making the doublet ratio a sensitive density probe in this regime. PyNEB uses the full multi-level solution to extract ne from the observed ratio at the adopted Te. #### 3.3.6 Ionic abundances Ionic abundances Xi+/H+ are computed via PyNEB's `getIonAbundance()` method for Te ≤ 30 000 K. For Te > 30 000 K (beyond the Storey & Hummer 1995 H I recombination tables bundled with PyNEB), we compute abundances manually: $$ \frac{X^{i+}}{\mathrm{H}^+} = \frac{I_\mathrm{line}}{I_{\mathrm{H}\beta}} \times \frac{\varepsilon_{\mathrm{H}\beta}(T_e, n_e)}{\varepsilon_\mathrm{line}(T_e, n_e)} $$ where εline is obtained from PyNEB's CEL emissivity tables (valid at any temperature) and ε uses the Aller (1984) formula (Section 3.5.3) which has no upper temperature limit. The CEL emissivity is computed from the level populations obtained by solving the statistical equilibrium equations (Section 3.3.2): $$ \varepsilon_\mathrm{line}(T_e, n_e) = n_{X^{i+}} \, n_e \, q_{ij}(T_e) \, \frac{A_{ji}}{A_{ji} + n_e\,q_{ji}(T_e)} \, h\nu_{ji} $$ In the low-density limit this simplifies to ε ∝ nXi+ ne qij(Te) hνji, and the ionic abundance reduces to: $$ \frac{X^{i+}}{\mathrm{H}^+} = \frac{I_\mathrm{line}}{I_{\mathrm{H}\beta}} \times \frac{\alpha_{\mathrm{H}\beta}^\mathrm{eff}(T_e)}{q_{ij}(T_e)\,h\nu_{ji} / (h\nu_{\mathrm{H}\beta})} $$ where αeff is the effective recombination coefficient for Hβ (see Section 3.5.3). PyNEB handles the full density-dependent multi-level solution internally. **Ions computed:** | Ion | Line(s) | Zone | PyNEB atom | |-----|---------|------|-----------| | O2+/H+ | [OIII] 5007 | High | `Atom("O", 3)` | | O+/H+ | [OII] 3726+3729 | Low | `Atom("O", 2)` | | N+/H+ | [NII] 6585 | Low | `Atom("N", 2)` | | S+/H+ | [SII] 6718+6732 | Low | `Atom("S", 2)` | | S2+/H+ | [SIII] 9069 | Mid | `Atom("S", 3)` | | Ne2+/H+ | [NeIII] 3869 | High | `Atom("Ne", 3)` | | Ar2+/H+ | [ArIII] 7136 | Mid | `Atom("Ar", 3)` | #### 3.3.7 Atomic data PyNEB (Luridiana et al. 2015) is used for all atomic physics calculations. The default atomic and collisional data shipped with PyNEB ≥ 1.1.25 include: - **Transition probabilities:** Froese Fischer & Tachiev (2004) for most ions; Storey & Zeippen (2000) for [NII], [OIII]. - **Collision strengths:** - O+: Kisielius et al. (2009) - O2+: Aggarwal & Keenan (1999) / Storey, Sochi & Badnell (2014) - N+: Tayal (2011) - S+: Tayal & Zatsarinny (2010) - **H I recombination:** Storey & Hummer (1995), valid to 30 000 K. Users requiring specific atomic data versions (e.g. to match a particular paper) should configure PyNEB's data files directly via `pn.atomicData.setDataFile()`. #### 3.3.8 Total abundances and ICFs **Oxygen:** $$ \frac{\mathrm{O}}{\mathrm{H}} = \frac{\mathrm{O}^+}{\mathrm{H}^+} + \frac{\mathrm{O}^{2+}}{\mathrm{H}^+} $$ No ICF is applied for higher ionisation states; the O3+ contribution is negligible for typical H II regions (Berg et al. 2021). **Nitrogen:** $$ \frac{\mathrm{N}}{\mathrm{O}} = \mathrm{ICF}_\mathrm{N} \times \frac{\mathrm{N}^+/\mathrm{H}^+}{\mathrm{O}^+/\mathrm{H}^+} $$ where ICFN is from Izotov et al. (2006, eq. 18): $$ \mathrm{ICF}_\mathrm{N} = -0.825\,x^2 + 0.718\,x + 0.853, \qquad x = \mathrm{O}^+ / \mathrm{O}_\mathrm{total} $$ clamped to ICFN ≥ 1. For many applications the standard assumption N/O ≈ N+/O+ (ICF = 1) is used, justified by the similar ionisation potentials of O+ and N+ (e.g. Nava et al. 2006; Amayo et al. 2021). **Sulfur:** $$ \mathrm{S/O} = \mathrm{ICF}_\mathrm{S} \times (\mathrm{S}^+ + \mathrm{S}^{2+}) / \mathrm{O} $$ ICF from Izotov et al. (2006, eq. 20), accounting for S3+: $$ \mathrm{ICF}_\mathrm{S} = 0.013 + 5.986\,x - 21.085\,x^2 + 26.462\,x^3 - 11.282\,x^4 $$ clamped to ICFS ≥ 1. **Neon:** $$ \mathrm{Ne/O} = \mathrm{ICF}_\mathrm{Ne} \times \mathrm{Ne}^{2+}/\mathrm{O}^{2+} $$ ICF from Izotov et al. (2006, eq. 19): $$ \mathrm{ICF}_\mathrm{Ne} = -0.385\,x^2 + 0.405\,x + 0.335 $$ **Argon:** $$ \mathrm{Ar/O} = \mathrm{ICF}_\mathrm{Ar} \times \mathrm{Ar}^{2+}/\mathrm{O}^{2+} $$ ICF from Izotov et al. (2006, eqs. 22/23): $$ \mathrm{ICF}_\mathrm{Ar} = 0.157 + 3.119\,x - 6.185\,x^2 + 4.517\,x^3 $$ clamped to ICFAr ≥ 1. In all ICF expressions, x = O+ / Ototal is the fractional ionic abundance of O+. **Reporting convention:** $$ 12 + \log(\mathrm{O/H}) = 12 + \log_{10}\!\left(\frac{\mathrm{O}^+/\mathrm{H}^+ + \mathrm{O}^{2+}/\mathrm{H}^+}{1}\right) $$ $$ \log(\mathrm{N/O}) = \log_{10}\!\left(\mathrm{ICF}_\mathrm{N} \times \frac{\mathrm{N}^+/\mathrm{H}^+}{\mathrm{O}^+/\mathrm{H}^+}\right) $$ The solar oxygen abundance is 12 + log(O/H) = 8.69 (Asplund et al. 2009). **Reference:** Izotov, Y. I., Stasinska, G., Meynet, G., Guseva, N. G. & Thuan, T. X. 2006, A&A, 448, 955 #### 3.3.9 Uncertainty propagation **Bootstrap / least-squares results:** Gaussian Monte Carlo perturbation (NMC = 1000 by default): 1. For each iteration, perturb every line flux: F′line = max(Fline + σline × ε, 10−50), where ε ∼ 𝒩(0, 1). 2. Recompute Te, ionic abundances, totals through the full PyNEB pipeline. 3. Report σ̂ of the resulting 12+log(O/H) and log(N/O) distributions. **MCMC results:** Each posterior sample's line fluxes are propagated through the full PyNEB pipeline. Posteriors are thinned to Nposterior samples (default 1000) via random subsampling to keep runtime manageable (∼7 ms per PyNEB evaluation). --- ### 3.4 Strong-line calibrations (Sanders+25) Used when the [OIII] 4363 auroral line is not detected. Strong-line methods use empirical calibrations between bright emission-line ratios and metallicity, calibrated on samples of galaxies with direct Te measurements. #### 3.4.1 Diagnostic ratios $$ \mathrm{O3} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007}{\mathrm{H}\beta}\right), \qquad \mathrm{O2} = \log_{10}\!\left(\frac{[\mathrm{OII}]}{\mathrm{H}\beta}\right) $$ $$ \mathrm{R23} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007 + [\mathrm{OII}]}{\mathrm{H}\beta}\right), \qquad \mathrm{O32} = \log_{10}\!\left(\frac{[\mathrm{OIII}]\,5007}{[\mathrm{OII}]}\right) $$ [OII] denotes the total [OII] λλ3726,3729 flux (resolved or unresolved). #### 3.4.2 Calibration polynomials Each diagnostic ratio *R* is modelled as a polynomial in Z = 12 + log(O/H): $$ R_\mathrm{model}(Z) = \sum_{k=0}^{n} c_k \,(Z - Z_\mathrm{ref})^k, \qquad Z_\mathrm{ref} = 8.0 $$ **Coefficients (Sanders et al. 2025, Table 3):** | Ratio | c0 | c1 | c2 | c3 | c4 | Zmin | Zmax | |-------|-------|-------|-------|-------|-------|-------|-------| | O3 | 0.852 | −0.162 | −1.149 | −0.553 | — | 7.3 | 8.6 | | O2 | 0.172 | 0.954 | −0.832 | — | — | 7.3 | 8.6 | | R23 | 0.998 | 0.053 | −0.141 | −0.493 | −0.774 | 7.3 | 8.6 | | O32 | 0.697 | −1.245 | −0.869 | — | — | 7.3 | 8.6 | #### 3.4.3 Simultaneous chi-squared fit The best-fit metallicity Zbest is found by minimising the simultaneous chi-squared across all available diagnostics: $$ \chi^2(Z) = \sum_{m \in \mathrm{diag}} \frac{(R_\mathrm{obs}^{(m)} - R_\mathrm{model}^{(m)}(Z))^2}{\sigma_{\mathrm{obs},m}^2 + \sigma_{\mathrm{cal},m}^2} $$ where σobs is the measurement error on log10 of the ratio (Gaussian error propagation) and σcal is the total calibration scatter in ratio space: $$ \sigma_\mathrm{cal} = \sqrt{\sigma_{R,\mathrm{fit}}^2 + \sigma_{R,\mathrm{int}}^2} $$ Minimisation uses `scipy.optimize.minimize_scalar` with method `"bounded"` on Z ∈ [6.5, 9.0]. #### 3.4.4 Three scatter components Sanders+25 report three distinct scatter values for each diagnostic: | Symbol | Meaning | Role in code | |--------|---------|-------------| | σR,fit | RMS residual of the polynomial fit to the calibration sample | Combined into σcal | | σR,int | Intrinsic scatter in the ratio at fixed Z (physical scatter from galaxy-to-galaxy variations) | Combined into σcal | | σO/H,int | Intrinsic scatter in 12+log(O/H) at fixed ratio (the actual metallicity uncertainty floor per diagnostic) | Stored for reference | **Values:** | Ratio | σR,fit | σR,int | σcal | σO/H,int | |-------|------|------|------|------| | O3 | 0.04 | 0.13 | 0.136 | 0.14 | | O2 | 0.03 | 0.25 | 0.252 | 0.22 | | R23 | 0.03 | 0.07 | 0.076 | 0.13 | | O32 | 0.09 | 0.29 | 0.304 | 0.25 | #### 3.4.5 Uncertainty propagation **Bootstrap / least-squares results:** Monte Carlo perturbation (NMC = 1000 by default). Each iteration: 1. Perturb each observed log-ratio by 𝒩(0, √(σobs2 + σcal2)) — both measurement and calibration scatter are included. 2. Re-minimise the chi-squared to find a perturbed Z. 3. Report (16th, 84th) percentiles of the Z distribution. **MCMC results:** Each posterior sample provides a set of fluxes. For each sample: 1. Compute line ratios from the posterior fluxes. 2. Perturb each ratio by 𝒩(0, σcal) to propagate calibration scatter. 3. Minimise to find Z for that sample. The MCMC posterior spread captures measurement uncertainty, and the σcal perturbation adds the calibration scatter on top. Posteriors are thinned to Nposterior samples (default 1000). **Reference:** Sanders, R. L. et al. 2025, ApJ (submitted) --- ### 3.5 Bayesian forward model (Cullen+25 approach) The forward model simultaneously constrains ionic abundances, electron temperature, and electron density by predicting emission-line flux ratios and comparing to observations via MCMC or nested sampling. #### 3.5.1 Free parameters and priors | Parameter | Symbol | Range | Prior | Unit | |-----------|--------|-------|-------|------| | Electron temperature | log10 Te | [3.5, 5.0] | Uniform | K | | Electron density | log10 ne | [0.0, 5.0] | Uniform | cm−3 | | Per-ion abundance | log10(Xi+/H+) | [−7, −2] (oxygen); [−8, −3] (N, Ne); [−9, −4] (S, Ar) | Uniform in log (= log-uniform in linear) | — | Only ions with at least one detected emission line are included as free parameters. Typically 4–8 free parameters depending on the available lines. #### 3.5.2 Model prediction For each parameter vector **θ**, the predicted line/Hβ flux ratio is: $$ \frac{F_\mathrm{line}}{F_{\mathrm{H}\beta}} = \frac{X^{i+}}{\mathrm{H}^+} \times \frac{\varepsilon_\mathrm{line}(T_e, n_e)}{\varepsilon_{\mathrm{H}\beta}(T_e, n_e)} $$ where εline is the collisionally excited line (CEL) emissivity from PyNEB's `Atom.getEmissivity()`, and ε is the Hβ recombination emissivity. For lines consisting of multiple transitions (e.g. [OII] doublet), the emissivities are summed. #### 3.5.3 Hβ emissivity For Te ≤ 30 000 K, the Storey & Hummer (1995) H I recombination tables in PyNEB are used via `RecAtom("H", 1).getEmissivity(Te, ne, wave=4861)`. For Te > 30 000 K, the Aller (1984) Case B formula is used: $$ \alpha_{\mathrm{H}\beta}^{\mathrm{eff}} = 3.03 \times 10^{-14} \left(\frac{T_e}{10^4\,\mathrm{K}}\right)^{-0.874} \;\mathrm{cm}^3\,\mathrm{s}^{-1} $$ $$ \varepsilon_{\mathrm{H}\beta} = \alpha_{\mathrm{H}\beta}^{\mathrm{eff}} \times h\nu_{\mathrm{H}\beta} $$ This formula has no upper temperature limit, enabling abundance calculations for the extremely metal-poor, high-Te galaxies found at z > 4 with JWST. **References:** - Aller, L. H. 1984, *Physics of Thermal Gaseous Nebulae* (Reidel) - Storey, P. J. & Hummer, D. G. 1995, MNRAS, 272, 41 #### 3.5.4 Likelihood Gaussian log-likelihood on line/Hβ flux ratios: $$ \ln \mathcal{L}(\boldsymbol{\theta}) = -\frac{1}{2} \sum_{\mathrm{lines}\, l} \frac{(R_{\mathrm{obs},l} - R_{\mathrm{model},l}(\boldsymbol{\theta}))^2}{\sigma_{R,l}^2} $$ where Robs,l = Fl / F and σR,l is the error on the ratio propagated from the flux uncertainties. #### 3.5.5 Sampling | Sampler | Algorithm | Key settings | Reference | |---------|----------|-------------|-----------| | emcee | Affine-invariant ensemble MCMC | 32 walkers, 5000 steps, 1000 burn-in | Foreman-Mackey et al. (2013) | | dynesty | Dynamic nested sampling | 500 live points | Speagle (2020); Koposov et al. (2022) | Walkers are initialised around a MAP estimate found by `scipy.optimize.differential_evolution`. #### 3.5.6 Total abundances from posteriors $$ 12 + \log(\mathrm{O/H}) = 12 + \log_{10}\!\left(10^{\log(\mathrm{O}^+/\mathrm{H}^+)} + 10^{\log(\mathrm{O}^{2+}/\mathrm{H}^+)}\right) $$ computed sample-by-sample from the posterior chains. N/O is derived as log(N+/H+) − log(O+/H+) (= log(N+/O+), the ICF = 1 assumption). #### 3.5.7 Current limitations and differences from Cullen+25 The current implementation differs from the full Cullen et al. (2025) / Scholte et al. (2025) EXCELS methodology in several ways: 1. **AV is not a free parameter.** Dust correction is applied externally before the forward model, using fixed Case B Balmer ratios at 104 K. Cullen+25 treat AV as a 6th free parameter and recompute temperature-dependent Balmer ratios at each likelihood evaluation, providing self-consistent dust correction. 2. **Single Te zone in the forward model.** The forward model uses a single `log_Te` parameter for all CEL emissivities. Cullen+25 apply a Te–Te relation to derive Te([OII]) from Te([OIII]) for the low-ionisation zone. In practice this difference is absorbed into the fitted ionic abundances when both O+ and O2+ are free parameters. 3. **Prior bounds differ** (see Section 3.5.1). Cullen+25 use log Te ∈ [3.0, 4.52] and log ne ∈ [1.0, 3.0]. --- ### 3.6 Broad Balmer component handling When a `BroadFitResult` or `MCMCBroadFitResult` is passed to `compute_abundances()`, broad Balmer component fluxes (Hα, Hβ, Hγ, Hδ) are automatically **summed** with the narrow component fluxes. This ensures the total hydrogen flux is used for the Balmer decrement and abundance normalisation (Iline / I). For MCMC results, broad and narrow posteriors are summed sample-by-sample to preserve correlations. --- ## 4. Full bibliography ### Atomic physics and emissivities - Aggarwal, K. M. & Keenan, F. P. 1999, ApJS, 123, 311 — O2+ collision strengths - Aller, L. H. 1984, *Physics of Thermal Gaseous Nebulae* (Reidel) — Hβ emissivity formula - Froese Fischer, C. & Tachiev, G. 2004, ADNDT, 87, 1 — Transition probabilities - Kisielius, R. et al. 2009, ApJ, 694, 1209 — O+ collision strengths - Luridiana, V., Morisset, C. & Shaw, R. A. 2015, A&A, 573, A42 — PyNEB - Storey, P. J. & Hummer, D. G. 1995, MNRAS, 272, 41 — H I recombination - Storey, P. J., Sochi, T. & Badnell, N. R. 2014, MNRAS, 441, 3028 — O2+ collision strengths - Storey, P. J. & Zeippen, C. J. 2000, MNRAS, 312, 813 — [NII], [OIII] transition probabilities - Tayal, S. S. 2011, ApJS, 195, 12 — N+ collision strengths - Tayal, S. S. & Zatsarinny, O. 2010, ApJS, 188, 32 — S+ collision strengths ### Dust attenuation and extinction - Calzetti, D. et al. 2000, ApJ, 533, 682 — Starburst attenuation law - Cardelli, J. A., Clayton, G. C. & Mathis, J. S. 1989, ApJ, 345, 245 — MW extinction - Noll, S. et al. 2009, A&A, 507, 1793 — UV bump + slope modification - Osterbrock, D. E. & Ferland, G. J. 2006, *Astrophysics of Gaseous Nebulae and Active Galactic Nuclei*, 2nd ed. — Case B Balmer ratios - Salim, S. et al. 2018, ApJ, 859, 11 — Modified Calzetti parameters ### Electron temperature and Te–Te relations - Andrews, B. H. & Martini, P. 2013, ApJ, 765, 140 — Te–Te correction - Campbell, A., Terlevich, R. & Melnick, J. 1986, MNRAS, 223, 811 — Original Te–Te relation - Curti, M. et al. 2017, MNRAS, 465, 1384 — Te–Te discrepancy - DESI Collaboration 2025, arXiv:2601.02463 — DESI DR2 Te–Te relation - Garnett, D. R. 1992, AJ, 103, 1330 — Classical Te–Te relation - Nicholls, D. C. et al. 2014, ApJ, 786, 155 — Te–Te relations - Pagel, B. E. J. et al. 1992, MNRAS, 255, 325 — Te–Te relation - Pilyugin, L. S. et al. 2009, A&A, 508, 1043 — Te–Te relation - Pilyugin, L. S. et al. 2010, ApJ, 720, 1738 — Te–Te discrepancy ### Electron density - Isobe, Y. et al. 2023, ApJ, 956, 139 — Density insensitivity at low ne ### Ionic abundances and ICFs - Amayo, A. et al. 2021, MNRAS, 505, 2361 — N/O = N+/O+ justification - Asplund, M. et al. 2009, ARA&A, 47, 481 — Solar abundances - Berg, D. A. et al. 2021, ApJ, 922, 170 — O3+ contribution negligible - Garnett, D. R. et al. 1997, ApJ, 489, 63 — C/O ionisation correction factor - Izotov, Y. I. et al. 2006, A&A, 448, 955 — ICFs (N, Ne, S, Ar) - Nava, A. et al. 2006, ApJ, 645, 1076 — N/O = N+/O+ justification ### Strong-line calibrations - Sanders, R. L. et al. 2025, ApJ (submitted) — Simultaneous O3/O2/R23/O32 calibrations ### Forward modelling - Cullen, F. et al. 2025, MNRAS (EXCELS) — Forward modelling approach - Scholte, D. et al. 2025, MNRAS (EXCELS) — Methodology section 4.3 ### MCMC and sampling - Foreman-Mackey, D. et al. 2013, PASP, 125, 306 — emcee - Foreman-Mackey, D. 2016, JOSS, 1, 24 — corner.py - Gelman, A. & Rubin, D. B. 1992, Statistical Science, 7, 457 — R̂ - Koposov, S. et al. 2022, doi:10.5281/zenodo.7388523 — dynesty - Lange, J. U. 2023, MNRAS, 525, 3181 — nautilus - Speagle, J. S. 2020, MNRAS, 493, 3132 — dynesty ### IGM and Lyman-alpha - Inoue, A. K. et al. 2014, MNRAS, 442, 1805 — IGM transmission model ### Line database - Morton, D. C. 2003, ApJS, 149, 205 — UV line wavelengths - NIST Atomic Spectra Database — Rest-frame vacuum wavelengths ### General - Harris, C. R. et al. 2020, Nature, 585, 357 — NumPy - Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90 — matplotlib - Virtanen, P. et al. 2020, Nature Methods, 17, 261 — SciPy - Astropy Collaboration 2022, ApJ, 935, 167 — Astropy