Gravitational Wave SBI

Project Overview

This project addresses source parameter estimation for simulated binary black hole gravitational wave signals. The goal is to infer the physical properties of the source from noisy time-domain waveforms, using a simulation-based inference pipeline rather than a classical likelihood-based approach.

The pipeline generates the two waveform polarizations, h+ and h×, with PyCBC and trains a BayesFlow model to learn full posterior distributions over the source parameters. This turns the project from a point-prediction task into an uncertainty-aware Bayesian inference problem.

Problem Statement

When two black holes orbit and merge, they emit gravitational waves whose frequency and amplitude encode information about the masses, spins, distance, and inclination of the binary system. In realistic conditions, the signal is weak and contaminated by noise, which makes parameter recovery challenging.

Main objective: learn the posterior distribution of binary black hole source parameters from noisy simulated gravitational wave strain data.

Simulation Setup

Waveforms were generated using PyCBC with the aligned-spin IMRPhenomD waveform model. Each sample contains two time-domain polarizations and is processed into a fixed-length two-channel time series.

Waveform model: IMRPhenomD aligned-spin binary black hole waveform.
Input channels: h+ and h× polarizations.
Length: 4096 time steps per sample.
Sampling: Δt = 1/2048 s (2048 Hz sampling rate), so each 4096-sample window spans 2 s; lower frequency cutoff f_lower = 20 Hz.
Augmentation: amplitude jitter and light white noise.
Dataset: 12,000 simulations split into train, validation, and test sets.
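The fixed-length preprocessing and augmentation can be sketched in plain Python so it runs without PyCBC. This is a minimal sketch under stated assumptions: the two polarizations arrive as lists (for example converted from the pair of TimeSeries returned by PyCBC's `get_td_waveform`), waveforms are right-aligned so the merger sits near the end of the window, and the jitter and noise levels are hypothetical values rather than the project's settings.

```python
import random

N_SAMPLES = 4096          # fixed number of time steps per channel
DELTA_T = 1.0 / 2048      # seconds per sample (2048 Hz)


def fit_length(series, n=N_SAMPLES):
    """Right-align a series in a window of n samples (assumed convention).

    Longer series keep their last n samples, so the merger end survives;
    shorter series are zero-padded at the start.
    """
    series = list(series)
    if len(series) >= n:
        return series[-n:]
    return [0.0] * (n - len(series)) + series


def augment(h_plus, h_cross, rng, jitter=0.05, noise_frac=0.01):
    """Amplitude jitter plus light white noise (hypothetical levels).

    One shared gain is applied to both channels so the h+/hx amplitude
    ratio is preserved; the noise sd is a fraction of the peak amplitude.
    """
    gain = 1.0 + rng.uniform(-jitter, jitter)
    peak = max(abs(x) for x in h_plus + h_cross) or 1.0
    sigma = noise_frac * peak
    return [[gain * x + rng.gauss(0.0, sigma) for x in ch] for ch in (h_plus, h_cross)]


def to_two_channel(h_plus, h_cross, rng):
    """Return 4096 rows of [h+, hx] values, i.e. a (4096, 2) layout."""
    hp, hc = augment(fit_length(h_plus), fit_length(h_cross), rng)
    return [[a, b] for a, b in zip(hp, hc)]


rng = random.Random(0)
sample = to_two_channel([0.0, 1.0, -1.0] * 2000, [1.0, 0.0, -1.0] * 2000, rng)
print(len(sample), len(sample[0]), N_SAMPLES * DELTA_T)  # 4096 2 2.0
```

Sharing one gain across both channels is a design choice of this sketch: it keeps the h+/h× amplitude ratio intact while still perturbing the overall amplitude.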

Parameterization

The simulator uses physical parameters such as component masses, aligned spins, luminosity distance, and inclination. For training, the target space was re-parameterized to improve learning stability and match known gravitational-wave degeneracies.

Target Parameter	Description	Reason
m1	Primary black hole mass	Affects frequency evolution and merger timing
m2	Secondary black hole mass	Constrained through chirp-mass and mass-ratio information
log10 D	Log luminosity distance	Stabilizes distance learning across a broad range
χeff	Effective aligned spin	Captures the dominant spin imprint in the waveform phase
χa	Spin asymmetry	Represents antisymmetric spin information
cosι	Cosine of inclination	Improves geometry representation and numerical stability
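A sketch of the mapping between simulator inputs and training targets, with its inverse for feeding posterior draws back into the simulator. χeff uses the standard mass-weighted definition; the form χa = (χ1 − χ2)/2 is an assumption, since the text does not define χa, and the function names are illustrative. The chirp-mass helper shows the mass combination that governs the leading-order inspiral phasing, i.e. the chirp-mass information mentioned in the table.

```python
import math


def to_targets(m1, m2, chi1, chi2, distance_mpc, inclination):
    """Map simulator parameters to the six training targets.

    chi1, chi2 are aligned (z-component) spins. chi_a = (chi1 - chi2) / 2
    is an ASSUMED definition of the spin asymmetry.
    """
    total = m1 + m2
    chi_eff = (m1 * chi1 + m2 * chi2) / total
    chi_a = 0.5 * (chi1 - chi2)
    return [m1, m2, math.log10(distance_mpc), chi_eff, chi_a, math.cos(inclination)]


def from_targets(m1, m2, log10_d, chi_eff, chi_a, cos_iota):
    """Invert to_targets so posterior draws can be re-simulated."""
    total = m1 + m2
    chi2 = chi_eff - 2.0 * m1 * chi_a / total
    chi1 = chi2 + 2.0 * chi_a
    return m1, m2, chi1, chi2, 10.0 ** log10_d, math.acos(cos_iota)


def chirp_mass(m1, m2):
    """Chirp mass (m1 m2)^(3/5) / (m1 + m2)^(1/5)."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


targets = to_targets(36.0, 29.0, 0.3, -0.1, 400.0, 0.5)
print([round(x, 4) for x in targets])
print(round(chirp_mass(36.0, 29.0), 2))
```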

Model Architecture

The model follows an encoder-plus-flow design. A TimeSeriesNetwork summarizes the two-channel waveform into a compact representation, and a conditional Flow Matching network transforms a base distribution into posterior samples over the six-dimensional parameter vector.

Summary Network

BayesFlow TimeSeriesNetwork for 4096 × 2 waveform inputs.
9 convolutional layers with 200 filters.
Kernel size 12 and pooling factor 2.
MLP layers [256, 256, 128].
128-dimensional summary representation.
Dropout = 0.20 for regularization.
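A quick shape check for the convolutional stack, assuming each of the 9 convolutional layers (kernel 12, stride 1, "same" padding) is followed by a pooling step of factor 2. That layer ordering is an assumption about how the TimeSeriesNetwork was configured, not something the text states. Under it, the time axis shrinks from 4096 to 8 steps and the receptive field of the last layer exceeds the full 2 s window.

```python
def conv_pool_stack(length, n_layers=9, kernel=12, pool=2):
    """Track sequence length and receptive field through conv + pool layers.

    ASSUMES 'same'-padded stride-1 convolutions, each followed by a
    non-overlapping pooling layer whose window and stride equal `pool`.
    """
    receptive, jump = 1, 1  # receptive field and input spacing of one unit
    for _ in range(n_layers):
        receptive += (kernel - 1) * jump  # convolution widens the field
        receptive += (pool - 1) * jump    # pooling widens it further
        jump *= pool                      # and doubles the spacing
        length //= pool
    return length, receptive


length, receptive = conv_pool_stack(4096)
print(length, receptive)  # 8 6133
```

Under this assumption the convolutional output is an 8-step, 200-channel feature map; how it is reduced to the 128-dimensional summary depends on the network's internal layers.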

Inference Network

BayesFlow FlowMatching inference network.
10 flow blocks.
Conditional posterior sampling for six target parameters.
Flexible posterior representation for correlated and skewed parameters.
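The training objective behind a flow-matching inference network can be sketched as below. This is the generic conditional flow-matching loss with a straight-line interpolation path and a fixed-step Euler sampler, not BayesFlow's implementation: the time distribution, any optimal-transport coupling, and the ODE solver used by `FlowMatching` may differ. `velocity(x, t, summary)` is a hypothetical callable standing in for the learned network conditioned on the 128-dimensional summary.

```python
import random


def flow_matching_loss(velocity, theta, summary, rng):
    """Single-draw conditional flow-matching loss on a straight-line path.

    Generic formulation, not BayesFlow's code. A base draw z ~ N(0, I) sits
    at t = 0 and the target parameters theta at t = 1; the network must
    predict the path velocity theta - z at x_t = (1 - t) z + t theta.
    """
    t = rng.random()  # t in [0, 1)
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    x_t = [(1.0 - t) * zi + t * ti for zi, ti in zip(z, theta)]
    target = [ti - zi for zi, ti in zip(z, theta)]
    pred = velocity(x_t, t, summary)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(theta)


def sample_posterior(velocity, summary, dim, rng, steps=100):
    """One posterior draw: Euler-integrate dx/dt = v(x, t, summary) from t = 0 to 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt, summary)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def exact_velocity(x, t, summary):
    """Toy field: exact velocity when the 'posterior' is a point mass.

    Here `summary` simply carries that point; in the real model it would be
    the learned 128-d embedding of the waveform.
    """
    return [(si - xi) / (1.0 - t) for xi, si in zip(x, summary)]


rng = random.Random(1)
theta_star = [0.5, -0.2]
print(flow_matching_loss(exact_velocity, theta_star, theta_star, rng))  # ~0 up to rounding
print(sample_posterior(exact_velocity, theta_star, 2, rng))             # ~[0.5, -0.2]
```

In training, the loss would be averaged over minibatches of (parameter, waveform-summary) pairs and minimized with respect to the network weights; the toy field only shows that a correct velocity gives zero loss and transports base draws onto the target.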

Training Setup

The model was trained with Adam and monitored using validation loss. Early stopping and learning-rate reduction were used to stabilize convergence and avoid overfitting.

Optimizer: Adam.
Learning rate: 5 × 10⁻⁴.
Batch size: 32.
Training budget: 55 epochs.
Callbacks: EarlyStopping and ReduceLROnPlateau.
Split: 90% train, 5% validation, 5% test.
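These settings imply 10,800 / 600 / 600 simulations per split and 338 optimizer steps per epoch at batch size 32 (the last batch is partial). The sketch below checks that arithmetic and mimics, in simplified form, the plateau logic behind EarlyStopping and ReduceLROnPlateau. The patience, factor, and min-delta values are hypothetical because the text does not report them, and the class does not call Keras.

```python
import math
import random

N_SIMS, BATCH, EPOCHS = 12_000, 32, 55


def split_indices(n, fractions=(0.90, 0.05, 0.05), seed=0):
    """Shuffle indices and cut them into train / validation / test partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


class PlateauMonitor:
    """Simplified EarlyStopping + ReduceLROnPlateau logic on validation loss.

    Every hyperparameter default here is a hypothetical placeholder.
    """

    def __init__(self, lr=5e-4, factor=0.5, lr_patience=3, stop_patience=8, min_delta=1e-4):
        self.lr, self.factor, self.min_delta = lr, factor, min_delta
        self.lr_patience, self.stop_patience = lr_patience, stop_patience
        self.best = math.inf
        self.wait = 0     # epochs since the last improvement
        self.lr_wait = 0  # epochs since the last improvement or LR cut

    def update(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best, self.wait, self.lr_wait = val_loss, 0, 0
            return False
        self.wait += 1
        self.lr_wait += 1
        if self.lr_wait >= self.lr_patience:
            self.lr *= self.factor  # reduce the learning rate on a plateau
            self.lr_wait = 0
        return self.wait >= self.stop_patience


train, val, test = split_indices(N_SIMS)
steps_per_epoch = math.ceil(len(train) / BATCH)  # final batch is partial
print(len(train), len(val), len(test), steps_per_epoch, steps_per_epoch * EPOCHS)
# 10800 600 600 338 18590
```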

Results

The model performed very strongly on parameters that are well informed by the waveform, especially luminosity distance and inclination. Mass and spin parameters showed different levels of difficulty, reflecting known physical degeneracies.

cosι: R² = 0.995, recovered with excellent accuracy.
log10 D: R² = 0.976, strongly constrained.
χeff: R² = 0.802, learned well due to its strong phasing imprint.
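A likely reason distance and inclination are recovered so well here is that the network sees h+ and h× as separate channels. For a waveform containing only the dominant (2,2) mode, as IMRPhenomD does, the polarization amplitudes scale as A+ ∝ (1 + cos²ι)/(2D) and A× ∝ |cosι|/D with a common mass- and frequency-dependent prefactor. Their ratio fixes |cosι| independently of distance, and the overall amplitude then fixes D; a single detector, which records only a projection of the two polarizations, would mix them and reintroduce the well-known distance-inclination degeneracy. The sketch illustrates this idealized, noise-free relation only; it ignores the sign of cosι, which is carried by whether h× leads or lags h+.

```python
import math


def amplitude_ratio(cos_iota):
    """A_x / A_+ for a (2,2)-mode-only waveform: 2|c| / (1 + c^2), idealized and noise-free."""
    c = abs(cos_iota)
    return 2.0 * c / (1.0 + c * c)


def abs_cos_iota_from_ratio(r):
    """Invert r = 2c / (1 + c^2), taking the root with 0 <= c <= 1."""
    if r <= 0.0:
        return 0.0
    return (1.0 - math.sqrt(max(0.0, 1.0 - r * r))) / r


def relative_distance(a_plus, cos_iota):
    """Distance up to the common prefactor, from A_+ proportional to (1 + c^2) / (2 D)."""
    return (1.0 + cos_iota ** 2) / (2.0 * a_plus)


for c in (0.0, 0.3, 0.9, 1.0):
    r = amplitude_ratio(c)
    print(c, round(r, 4), round(abs_cos_iota_from_ratio(r), 4))
```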

Raw Test Performance

Parameter	R²	68% coverage	90% coverage	Interpretation
m1	0.428	68.2%	92.3%	Moderate recovery due to mass degeneracy.
m2	0.728	69.0%	90.3%	Good recovery through chirp-mass information.
log10 D	0.976	73.0%	93.0%	Very strong distance estimation; slight over-coverage.
χeff	0.802	64.0%	88.3%	Good spin recovery with slight under-coverage.
χa	0.171	71.2%	90.2%	Hardest parameter due to weak information in the signal.
cosι	0.995	85.5%	97.7%	Excellent point recovery, but intervals too wide (clear over-coverage).
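Both metrics in the table can be computed from posterior draws. The sketch below uses the posterior mean as the point estimate for R² and central quantile intervals for coverage; both choices are assumptions, as the text does not say which estimator or interval type was used.

```python
def r_squared(truth, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1.0 - frac) + sorted_xs[hi] * frac


def coverage(truth, posterior_draws, level):
    """Fraction of test cases whose true value lies in the central `level` interval.

    posterior_draws[i] holds the draws of one parameter for test case i.
    Central quantile intervals are an ASSUMED choice.
    """
    alpha = (1.0 - level) / 2.0
    hits = 0
    for t, draws in zip(truth, posterior_draws):
        s = sorted(draws)
        if quantile(s, alpha) <= t <= quantile(s, 1.0 - alpha):
            hits += 1
    return hits / len(truth)


def evaluate(truth, posterior_draws):
    """R² of posterior means (assumed point estimate) plus 68% and 90% coverage."""
    means = [sum(d) / len(d) for d in posterior_draws]
    return (r_squared(truth, means),
            coverage(truth, posterior_draws, 0.68),
            coverage(truth, posterior_draws, 0.90))
```

Coverage above the nominal level, as for cosι (85.5% at 68%), means intervals are too wide; coverage below it, as for χeff (64.0%), means they are too narrow.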

Diagnostics

The project used several diagnostics to evaluate whether the posterior distributions were meaningful, not only whether posterior means were accurate. This included training curves, recovery plots, PIT histograms, Simulation-Based Calibration, and posterior predictive checks.

Simulation-Based Calibration

SBC was used to check whether the posteriors are calibrated: across many simulated test cases, the rank of each true parameter among its posterior draws should be uniformly distributed if the posterior widths are correct. The z-score summaries showed generally well-calibrated uncertainty, with parameter-specific issues such as over-coverage for cosι and mild under-coverage for χeff.
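The quantities behind SBC, PIT histograms, and the z-score summaries can be sketched with their common definitions: the SBC rank is the number of posterior draws below the true value, PIT is that rank divided by the number of draws, the posterior z-score is (posterior mean − truth)/posterior std, and contraction is 1 − posterior variance/prior variance. The exact variants behind the project's plots are an assumption. The toy model (standard normal prior, unit-variance Gaussian noise, exact posterior N(y/2, 1/2)) only shows what a calibrated rank histogram looks like.

```python
import random
import statistics


def sbc_rank(truth, draws):
    """Number of posterior draws strictly below the true value (0..len(draws))."""
    return sum(d < truth for d in draws)


def pit(truth, draws):
    """Rank normalised to [0, 1]; uniform across test cases if calibrated."""
    return sbc_rank(truth, draws) / len(draws)


def z_score(truth, draws):
    """Posterior z-score (common definition, assumed here)."""
    return (statistics.fmean(draws) - truth) / statistics.stdev(draws)


def contraction(draws, prior_var):
    """Posterior contraction: 1 - posterior variance / prior variance."""
    return 1.0 - statistics.variance(draws) / prior_var


def rank_histogram(ranks, n_draws, n_bins=10):
    """Bin SBC ranks (0..n_draws); roughly flat counts indicate calibration."""
    counts = [0] * n_bins
    for r in ranks:
        counts[min(r * n_bins // (n_draws + 1), n_bins - 1)] += 1
    return counts


# Toy calibrated model: theta ~ N(0, 1), y = theta + N(0, 1),
# so the exact posterior for one y is N(y / 2, 1 / 2).
rng = random.Random(0)
ranks = []
for _ in range(2000):
    theta = rng.gauss(0.0, 1.0)
    y = theta + rng.gauss(0.0, 1.0)
    draws = [rng.gauss(y / 2.0, 0.5 ** 0.5) for _ in range(99)]
    ranks.append(sbc_rank(theta, draws))
print(rank_histogram(ranks, 99))  # about 200 per bin
```

A hump-shaped rank histogram (true values ranked mostly near the middle) signals posteriors that are too wide, the pattern expected for cosι; a U shape signals posteriors that are too narrow, as with the mild under-coverage for χeff.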

Posterior Predictive Checks

Posterior predictive checks were performed by drawing posterior samples, re-simulating waveforms, and comparing them to the observed test waveforms. Posterior mean waveforms were smoother than the noisy observed data, as expected, because averaging over posterior draws suppresses noise and high-frequency variation.
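The posterior predictive loop can be sketched with a toy simulator. `toy_waveform` is a hypothetical, noise-free stand-in for the PyCBC call (a damped sinusoid, not a physical chirp), the posterior draws are faked around the true values, and residual RMS and mean squared first difference are just two possible discrepancy and smoothness measures.

```python
import math
import random


def toy_waveform(amp, freq, n=256, dt=1.0 / 256):
    """Hypothetical noise-free stand-in for the simulator (a damped sinusoid)."""
    return [amp * math.exp(-k * dt) * math.sin(2.0 * math.pi * freq * k * dt) for k in range(n)]


def posterior_predictive_mean(samples, simulator):
    """Re-simulate every posterior draw and average the resulting waveforms."""
    sims = [simulator(*theta) for theta in samples]
    return [sum(col) / len(sims) for col in zip(*sims)]


def rms(xs):
    """Root-mean-square value of a series."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def roughness(xs):
    """Mean squared first difference: larger for jagged, noisy series."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (len(xs) - 1)


rng = random.Random(0)
observed = [x + rng.gauss(0.0, 0.2) for x in toy_waveform(1.0, 5.0)]
# Stand-in posterior draws of (amp, freq), concentrated near the truth (1.0, 5.0).
samples = [(rng.gauss(1.0, 0.05), rng.gauss(5.0, 0.05)) for _ in range(200)]
ppc_mean = posterior_predictive_mean(samples, toy_waveform)
residual = [o - p for o, p in zip(observed, ppc_mean)]
print(round(rms(residual), 3), roughness(ppc_mean) < roughness(observed))
```

A residual RMS close to the injected noise level (0.2 here) indicates that the re-simulated waveforms account for the signal, and the averaged prediction is far less jagged than the noisy observation.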

Post-Training Calibration

Lightweight post-training calibration was applied without retraining the model. The goal was to improve posterior uncertainty quality while keeping posterior means intact.

log10 D: a scale-only adjustment (rescaling posterior spread about the posterior mean), fitted separately for each SNR tertile, narrowed the intervals and moved coverage closer to nominal.
χeff: a q-binned Beta-PIT warp improved PIT uniformity and slightly improved 68% coverage.
cosι: diagnostics indicated over-coverage (85.5% empirical at the 68% level), meaning the posterior intervals were too wide.
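A scale-only recalibration in the spirit of the log10 D adjustment can be sketched as follows. Within each bin (here labelled as SNR tertiles), every posterior is stretched or shrunk about its mean by one factor s, found by bisection so that the empirical 68% coverage on calibration data matches the nominal level. The bisection search, the single 68% target, nearest-rank central intervals, and the toy data are all assumptions; the text does not describe the fitting procedure.

```python
import random


def central_interval(draws, level):
    """Central credible interval from nearest-rank empirical quantiles."""
    s = sorted(draws)
    n = len(s)
    return s[int((1.0 - level) / 2.0 * (n - 1))], s[int((1.0 + level) / 2.0 * (n - 1))]


def rescale(draws, s):
    """Stretch (s > 1) or shrink (s < 1) draws about their mean; the mean is unchanged."""
    m = sum(draws) / len(draws)
    return [m + s * (d - m) for d in draws]


def summarize(posteriors, level):
    """Per-posterior (mean, lower, upper). Rescaling preserves order, so the
    rescaled endpoints are m + s * (q - m) without re-sorting."""
    stats = []
    for draws in posteriors:
        lo, hi = central_interval(draws, level)
        stats.append((sum(draws) / len(draws), lo, hi))
    return stats


def coverage_scaled(truths, stats, s):
    """Empirical coverage after rescaling every posterior by s."""
    hits = sum(m + s * (lo - m) <= t <= m + s * (hi - m) for t, (m, lo, hi) in zip(truths, stats))
    return hits / len(truths)


def fit_scale(truths, posteriors, level=0.68, lo=0.1, hi=10.0, iters=30):
    """ASSUMED procedure: bisection for s at which coverage reaches `level`.

    Coverage is non-decreasing in s whenever each posterior mean lies
    inside its own central interval.
    """
    stats = summarize(posteriors, level)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if coverage_scaled(truths, stats, mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy calibration data: the truth scatters around each posterior centre with
# sd 1, but the reported draws are `factor` times too wide in every bin.
FACTORS = {"low SNR": 2.0, "mid SNR": 1.5, "high SNR": 1.2}
rng = random.Random(0)
bins = {}
for name, factor in FACTORS.items():
    truths, posts = [], []
    for _ in range(600):
        centre = rng.gauss(0.0, 1.0)
        truths.append(centre + rng.gauss(0.0, 1.0))
        posts.append([rng.gauss(centre, factor) for _ in range(200)])
    bins[name] = (truths, posts)

scales = {name: fit_scale(*data) for name, data in bins.items()}
print({name: round(s, 2) for name, s in scales.items()})  # each near 1 / factor
```

Because draws are only rescaled about their mean, posterior means, and hence R², are unchanged.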

Limitations and Future Work

The project intentionally used a simplified simulation setting to focus on implementing and validating a working BayesFlow pipeline. The main limitations are related to physical realism and signal complexity.

Only simulated data were used, not real detector data.
The noise model was light white noise, not colored non-stationary detector noise.
No multi-detector response was included.
The waveform family did not include precession or higher-order modes.
χa remained difficult to infer because the signal contains limited information about antisymmetric spin.

Future work could include multi-detector observations, whitening and bandpassing, richer waveform models, and calibration-aware training objectives.

Outcome

This project strengthened my understanding of simulation-based inference, amortized Bayesian inference, posterior diagnostics, scientific simulation, and uncertainty calibration. It also showed the importance of validating posterior distributions with calibration diagnostics, not only relying on point accuracy.
