Project Overview
This project addresses source parameter estimation for simulated binary black hole gravitational
wave signals. The goal is to infer the physical properties of the source from noisy time-domain
waveforms, using a simulation-based inference pipeline rather than a classical likelihood-based approach.
The pipeline generates two waveform polarizations, h+ and h×, with PyCBC and trains a BayesFlow
model to learn full posterior distributions over the source parameters. The project is therefore an
uncertainty-aware Bayesian inference problem rather than a pure point-prediction task.
Problem Statement
When two black holes orbit and merge, they emit gravitational waves whose frequency and amplitude
encode information about the masses, spins, distance, and inclination of the binary system. In realistic
conditions, the signal is weak and contaminated by noise, which makes parameter recovery challenging.
Main objective: learn the posterior distribution of binary black hole source parameters
from noisy simulated gravitational wave strain data.
Simulation Setup
Waveforms were generated using PyCBC with the aligned-spin IMRPhenomD waveform model. Each sample
contains two time-domain polarizations and is processed into a fixed-length two-channel time series.
- Waveform model: IMRPhenomD aligned-spin binary black hole waveform.
- Input channels: h+ and h× polarizations.
- Length: 4096 time steps per sample.
- Sampling: Δt = 1/2048 s with f_lower = 20 Hz.
- Augmentation: amplitude jitter and light white noise.
- Dataset: 12,000 simulations split into train, validation, and test sets.
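The setup above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names, the choice to keep the last 4096 samples and left-pad with zeros, and the jitter and noise magnitudes are all assumptions. Only the `get_td_waveform` call with `IMRPhenomD`, Δt = 1/2048 s, and f_lower = 20 Hz comes from the description.

```python
import numpy as np

N_SAMPLES = 4096        # time steps per sample
DELTA_T = 1.0 / 2048    # sampling interval in seconds, so one sample spans 2 s
F_LOWER = 20.0          # starting frequency in Hz


def generate_polarizations(m1, m2, chi1, chi2, distance, inclination):
    """Return (h_plus, h_cross) as NumPy arrays using PyCBC's IMRPhenomD."""
    # Imported lazily so the helpers below also work without PyCBC installed.
    from pycbc.waveform import get_td_waveform

    hp, hc = get_td_waveform(
        approximant="IMRPhenomD",
        mass1=m1, mass2=m2,
        spin1z=chi1, spin2z=chi2,
        distance=distance, inclination=inclination,
        delta_t=DELTA_T, f_lower=F_LOWER,
    )
    return hp.numpy(), hc.numpy()


def to_fixed_length(h_plus, h_cross, n=N_SAMPLES):
    """Stack both polarizations into an (n, 2) array.

    Assumption: keep the last n samples (where the merger sits) and
    left-pad shorter waveforms with zeros.
    """
    out = np.zeros((n, 2), dtype=np.float32)
    for channel, h in enumerate((h_plus, h_cross)):
        tail = np.asarray(h, dtype=np.float32)[-n:]
        out[n - len(tail):, channel] = tail
    return out


def augment(x, rng, jitter=0.05, noise_std=0.01):
    """Apply amplitude jitter and additive white Gaussian noise.

    jitter and noise_std are illustrative values, not the project's settings;
    noise_std is relative to the peak absolute amplitude of x.
    """
    scale = 1.0 + rng.uniform(-jitter, jitter)
    peak = float(np.max(np.abs(x))) or 1.0
    noise = rng.normal(0.0, noise_std * peak, size=x.shape)
    return (scale * x + noise).astype(np.float32)


# Usage with toy sinusoids standing in for PyCBC output.
rng = np.random.default_rng(0)
t = np.arange(3000) * DELTA_T
toy_plus, toy_cross = np.sin(2 * np.pi * 30 * t), np.cos(2 * np.pi * 30 * t)
x = augment(to_fixed_length(toy_plus, toy_cross), rng)
print(x.shape)  # (4096, 2)
```

With PyCBC installed, the toy sinusoids would be replaced by the output of `generate_polarizations`.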
Parameterization
The simulator uses physical parameters such as component masses, aligned spins, luminosity distance,
and inclination. For training, the target space was re-parameterized to improve learning stability and
match known gravitational-wave degeneracies.
| Target Parameter | Description | Reason |
| --- | --- | --- |
| m1 | Primary black hole mass | Affects frequency evolution and merger timing |
| m2 | Secondary black hole mass | Constrained through chirp-mass and mass-ratio information |
| log10 D | Log luminosity distance | Stabilizes distance learning across a broad range |
| χeff | Effective aligned spin | Captures the dominant spin imprint in the waveform phase |
| χa | Spin asymmetry | Represents antisymmetric spin information |
| cosι | Cosine of inclination | Improves geometry representation and numerical stability |
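A small sketch of how simulator parameters could be mapped to the six targets is given below. The function names and dictionary keys are hypothetical. The χeff formula is the standard mass-weighted aligned spin; the definition χa = (χ1 − χ2)/2 and the use of Mpc for distance are assumptions, since the summary does not state them.

```python
import math


def to_targets(m1, m2, chi1, chi2, distance_mpc, inclination):
    """Map simulator parameters to the six training targets."""
    chi_eff = (m1 * chi1 + m2 * chi2) / (m1 + m2)  # mass-weighted aligned spin
    chi_a = 0.5 * (chi1 - chi2)                    # assumed definition of spin asymmetry
    return {
        "m1": m1,
        "m2": m2,
        "log10_D": math.log10(distance_mpc),
        "chi_eff": chi_eff,
        "chi_a": chi_a,
        "cos_iota": math.cos(inclination),
    }


def chirp_mass(m1, m2):
    """Chirp mass, the mass combination that drives the inspiral phase evolution."""
    return (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2


def mass_ratio(m1, m2):
    """Mass ratio q = m2 / m1, which is at most 1 when m1 is the primary."""
    return m2 / m1


targets = to_targets(36.0, 29.0, 0.3, -0.1, 400.0, math.pi / 3)
print(targets)
print(chirp_mass(36.0, 29.0), mass_ratio(36.0, 29.0))
```

The chirp mass and mass ratio are not targets themselves, but they help explain why m1 and m2 are recovered with different accuracy: the waveform constrains the combination more tightly than either component mass.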
Model Architecture
The model follows an encoder-plus-flow design. A TimeSeriesNetwork summarizes the two-channel waveform
into a compact representation, and a conditional Flow Matching network transforms a base distribution
into posterior samples over the six-dimensional parameter vector.
Summary Network
- BayesFlow TimeSeriesNetwork for 4096 × 2 waveform inputs.
- 9 convolutional layers with 200 filters.
- Kernel size 12 and pooling factor 2.
- MLP layers [256, 256, 128].
- 128-dimensional summary representation.
- Dropout = 0.20 for regularization.
Inference Network
- BayesFlow FlowMatching inference network.
- 10 flow blocks.
- Conditional posterior sampling for six target parameters.
- Flexible posterior representation for correlated and skewed parameters.
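To make the inference network's role concrete, here is a NumPy sketch of a linear-path flow matching objective and Euler sampling. It illustrates the general technique only. BayesFlow's FlowMatching implementation may differ in details such as time sampling, sample pairing, and the ODE solver, and every function name below is hypothetical. In the real model the velocity field is a neural network conditioned on the 128-dimensional summary; here a closed-form toy field stands in for it.

```python
import numpy as np


def flow_matching_batch(theta, rng):
    """Build one training batch for a linear-path flow matching objective.

    theta: (batch, dim) array of true parameters drawn from the simulator.
    Returns the interpolated points, their times, and the target velocities.
    """
    theta0 = rng.standard_normal(theta.shape)   # base (noise) samples
    t = rng.uniform(size=(theta.shape[0], 1))   # one time per example
    theta_t = (1.0 - t) * theta0 + t * theta    # point on the straight path
    target_velocity = theta - theta0            # constant along that path
    return theta_t, t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


def euler_sample(velocity_fn, summary, n_draws, dim, rng, n_steps=50):
    """Draw samples by integrating d(theta)/dt = v(theta, t, summary) from t=0 to t=1."""
    theta = rng.standard_normal((n_draws, dim))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((n_draws, 1), k * dt)
        theta = theta + dt * velocity_fn(theta, t, summary)
    return theta


def point_mass_velocity(theta, t, summary):
    """Toy field whose flow carries every base sample to `summary` at t=1."""
    return (summary - theta) / (1.0 - t)


rng = np.random.default_rng(0)
theta = rng.standard_normal((8, 6))          # stand-in for six target parameters
theta_t, t, u = flow_matching_batch(theta, rng)
target = np.arange(6, dtype=float)           # toy "posterior" concentrated at one point
draws = euler_sample(point_mass_velocity, target, n_draws=16, dim=6, rng=rng)
print(np.abs(draws - target).max())
```

During training, a network would be fitted so that `flow_matching_loss(network(theta_t, t, summary), u)` is small; at inference time the trained network replaces `point_mass_velocity`.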
Training Setup
The model was trained with Adam and monitored using validation loss. Early stopping and learning-rate
reduction were used to stabilize convergence and avoid overfitting.
- Optimizer: Adam.
- Learning rate: 5 × 10⁻⁴.
- Batch size: 32.
- Training budget: 55 epochs.
- Callbacks: EarlyStopping and ReduceLROnPlateau.
- Split: 90% train, 5% validation, 5% test.
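The split sizes and the number of optimizer steps implied by these settings can be checked with a few lines. The shuffling seed and the helper name are assumptions; the 12,000 simulations, the 90/5/5 split, and the batch size of 32 come from the setup described above.

```python
import math
import random

N_SIMULATIONS = 12_000
BATCH_SIZE = 32


def split_indices(n, fractions=(0.90, 0.05, 0.05), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


train, val, test = split_indices(N_SIMULATIONS)
steps_per_epoch = math.ceil(len(train) / BATCH_SIZE)
print(len(train), len(val), len(test), steps_per_epoch)  # 10800 600 600 338
```

At 338 steps per epoch, the 55-epoch budget corresponds to at most 18,590 gradient updates, fewer if early stopping triggers.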
Results
The model performed very strongly on parameters that are well informed by the waveform, especially
luminosity distance and inclination. Mass and spin parameters showed different levels of difficulty,
reflecting known physical degeneracies.
- cosι: R² = 0.995, recovered with excellent accuracy.
- log10 D: R² = 0.976, strongly constrained.
- χeff: R² = 0.802, learned well due to its strong phasing imprint.
Raw Test Performance
| Parameter | R² | Coverage 68% | Coverage 90% | Interpretation |
| --- | --- | --- | --- | --- |
| m1 | 0.428 | 68.2% | 92.3% | Moderate recovery due to mass degeneracy. |
| m2 | 0.728 | 69.0% | 90.3% | Good recovery through chirp-mass information. |
| log10 D | 0.976 | 73.0% | 93.0% | Very strong distance estimation. |
| χeff | 0.802 | 64.0% | 88.3% | Good spin recovery with slight under-coverage. |
| χa | 0.171 | 71.2% | 90.2% | Hardest parameter due to weak information. |
| cosι | 0.995 | 85.5% | 97.7% | Excellent point recovery, but intervals slightly wide. |
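The two kinds of metric in this table can be computed from posterior samples roughly as follows. This sketch assumes that R² is evaluated on posterior means and that coverage refers to central credible intervals; the summary does not state either choice explicitly. The toy Gaussian problem is only there to make the example runnable.

```python
import numpy as np


def r_squared(truth, estimate):
    """Coefficient of determination of point estimates against the truth."""
    ss_res = np.sum((truth - estimate) ** 2)
    ss_tot = np.sum((truth - np.mean(truth)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def central_coverage(truth, samples, level):
    """Fraction of test cases whose truth lies in the central credible interval.

    truth: (n_cases,), samples: (n_cases, n_draws), level: e.g. 0.68 or 0.90.
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(samples, alpha, axis=1)
    upper = np.quantile(samples, 1.0 - alpha, axis=1)
    return float(np.mean((truth >= lower) & (truth <= upper)))


# Toy problem with a known exact posterior:
# theta ~ N(0, 1), x = theta + N(0, 1)  =>  theta | x ~ N(x / 2, 1 / 2).
rng = np.random.default_rng(0)
theta = rng.standard_normal(2000)
x = theta + rng.standard_normal(2000)
samples = x[:, None] / 2 + np.sqrt(0.5) * rng.standard_normal((2000, 500))

r2 = r_squared(theta, samples.mean(axis=1))
cov68 = central_coverage(theta, samples, 0.68)
cov90 = central_coverage(theta, samples, 0.90)
print(r2, cov68, cov90)
```

In this toy case the exact posterior gives an R² near 0.5 while coverage stays near nominal, which mirrors the χa row: a low R² reflects weak information in the data and does not by itself indicate poor calibration.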
Diagnostics
The project used several diagnostics to evaluate whether the posterior distributions were meaningful,
not only whether posterior means were accurate. This included training curves, recovery plots, PIT
histograms, Simulation-Based Calibration, and posterior predictive checks.
Simulation-Based Calibration
SBC was used to check whether the true parameters fall within the posterior credible intervals as often as the nominal levels imply.
The z-score summaries showed generally well-calibrated uncertainty, with parameter-specific issues such
as over-coverage for cosι and mild under-coverage for χeff.
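A rank-based version of this check can be sketched as below. It is a generic SBC illustration on a toy Gaussian problem with a known posterior, not the project's diagnostic code, and the function names are hypothetical. For a calibrated posterior the ranks of the true values among the posterior draws are uniform; posteriors that are too wide pile ranks up in the middle of the histogram.

```python
import numpy as np


def sbc_ranks(truth, samples):
    """Rank of each true value among its posterior draws (0..n_draws)."""
    return np.sum(samples < truth[:, None], axis=1)


def rank_histogram(ranks, n_draws, n_bins=10):
    """Histogram of ranks; roughly flat counts indicate calibration."""
    counts, _ = np.histogram(ranks, bins=n_bins, range=(0, n_draws + 1))
    return counts


# Toy problem: theta ~ N(0, 1), x = theta + N(0, 1)  =>  theta | x ~ N(x / 2, 1 / 2).
rng = np.random.default_rng(1)
n_cases, n_draws = 2000, 99
theta = rng.standard_normal(n_cases)
x = theta + rng.standard_normal(n_cases)
noise = rng.standard_normal((n_cases, n_draws))

exact = x[:, None] / 2 + np.sqrt(0.5) * noise        # calibrated posterior
too_wide = x[:, None] / 2 + 2 * np.sqrt(0.5) * noise  # over-dispersed posterior

flat_counts = rank_histogram(sbc_ranks(theta, exact), n_draws)
wide_counts = rank_histogram(sbc_ranks(theta, too_wide), n_draws)
print(flat_counts)
print(wide_counts)
```

The second histogram shows the pattern associated with over-coverage: most ranks land in the central bins and few in the outer ones.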
Posterior Predictive Checks
Posterior predictive checks were performed by drawing posterior samples, re-simulating waveforms, and
comparing them to the observed test waveforms. Posterior mean waveforms were smoother than noisy data,
as expected, because averaging posterior draws reduces noise and high-frequency variation.
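The smoothing effect can be reproduced with a toy example. The sinusoid simulator below is a hypothetical stand-in for the PyCBC simulator, and the posterior draws are synthetic; only the procedure (draw, re-simulate, average, compare) follows the description above.

```python
import numpy as np

T = np.arange(4096) / 2048.0  # 2 s at the project's sampling rate


def toy_simulator(freq, amp):
    """Hypothetical stand-in for the PyCBC simulator: a plain sinusoid."""
    return amp * np.sin(2 * np.pi * freq * T)


def posterior_predictive_mean(posterior_draws):
    """Re-simulate one waveform per posterior draw and average them."""
    return np.mean([toy_simulator(f, a) for f, a in posterior_draws], axis=0)


def roughness(x):
    """Standard deviation of first differences, a crude high-frequency measure."""
    return float(np.std(np.diff(x)))


rng = np.random.default_rng(2)
observed = toy_simulator(30.0, 1.0) + 0.2 * rng.standard_normal(T.size)

# Synthetic posterior draws scattered around the true (freq, amp).
draws = np.column_stack([
    30.0 + 0.02 * rng.standard_normal(200),
    1.0 + 0.02 * rng.standard_normal(200),
])
ppc_mean = posterior_predictive_mean(draws)

residual_rms = float(np.sqrt(np.mean((observed - ppc_mean) ** 2)))
print(residual_rms, roughness(observed), roughness(ppc_mean))
```

A residual RMS close to the noise level suggests the re-simulated waveforms explain the signal part of the observation, while the lower roughness of the mean reflects the absence of noise in the re-simulated waveforms.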
Post-Training Calibration
Lightweight post-training calibration was applied without retraining the model. The goal was to improve
posterior uncertainty quality while keeping posterior means intact.
- log10 D: scale-only adjustment by SNR tertile reduced interval widths and moved coverage closer to nominal.
- χeff: a q-binned Beta-PIT warp improved PIT uniformity and slightly improved 68% coverage.
- cosι: diagnostic results suggested mild over-coverage, meaning posterior intervals were too wide.
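A scale-only adjustment of the kind used for log10 D can be sketched as follows. This is a simplified single-bin version under assumed details: the grid search, the function names, and the toy over-dispersed posterior are illustrative. A per-tertile variant would fit one scale inside each SNR bin instead of one global scale.

```python
import numpy as np


def rescale_about_mean(samples, scale):
    """Shrink or stretch each posterior around its own mean; means are unchanged."""
    mean = samples.mean(axis=1, keepdims=True)
    return mean + scale * (samples - mean)


def coverage(truth, samples, level=0.68):
    """Empirical coverage of the central credible interval at the given level."""
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(samples, alpha, axis=1)
    upper = np.quantile(samples, 1.0 - alpha, axis=1)
    return float(np.mean((truth >= lower) & (truth <= upper)))


def fit_scale(truth, samples, level=0.68, grid=np.linspace(0.5, 2.0, 61)):
    """Pick the scale whose empirical coverage is closest to the nominal level."""
    errors = [
        abs(coverage(truth, rescale_about_mean(samples, s), level) - level)
        for s in grid
    ]
    return float(grid[int(np.argmin(errors))])


# Toy over-dispersed posterior: correct means, standard deviation 1.5x too large.
rng = np.random.default_rng(3)
theta = rng.standard_normal(2000)
x = theta + rng.standard_normal(2000)
samples = x[:, None] / 2 + 1.5 * np.sqrt(0.5) * rng.standard_normal((2000, 400))

scale = fit_scale(theta, samples)
calibrated = rescale_about_mean(samples, scale)
before, after = coverage(theta, samples), coverage(theta, calibrated)
print(scale, before, after)
```

In practice the scale would be fitted on held-out calibration data (for example the validation split) and then applied to test posteriors, so the reported coverage is not tuned on the data it is evaluated on.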
Limitations and Future Work
The project intentionally used a simplified simulation setting to focus on implementing and validating
a working BayesFlow pipeline. The main limitations are related to physical realism and signal complexity.
- Only simulated data were used, not real detector data.
- The noise model was light white noise, not colored non-stationary detector noise.
- No multi-detector response was included.
- The waveform family did not include precession or higher-order modes.
- χa remained difficult to infer because the signal contains limited information about antisymmetric spin.
Future work could include multi-detector observations, whitening and bandpassing, richer waveform models,
and calibration-aware training objectives.
Outcome
This project strengthened my understanding of simulation-based inference, amortized Bayesian inference,
posterior diagnostics, scientific simulation, and uncertainty calibration. It also showed the importance of
validating posterior distributions with calibration diagnostics rather than relying on point accuracy alone.
Keywords: Simulation-Based Inference, BayesFlow, PyCBC, Flow Matching, Gravitational Waves, Posterior Calibration, SBC, PPC, Deep Learning