Add offline decoding from Stim detector-sample files #83
Merged
Introduces a file-based path that decodes Stim detector samples (samples_{X,Z}.dets + metadata_{X,Z}.json) with the same Ising pre-decoder + PyMatching pipeline used in-memory, and a generator workflow for reproducible reference inputs.
New surface area:
- qec/surface_code/stim_sample_io.py: on-disk Stim sample contract, structural + optional noise-fingerprint validation against a rebuilt memory circuit.
- data/predecoder_transform.py: canonical Stim detectors -> (trainX, x_syn_diff, z_syn_diff); shared by file-based datapipe and the buffer-fused GPU module (parity test enforces this).
- workflows/run.py: workflow.task=generate_stim_data and PREDECODER_DECODE_MODE={pymatching_only, ising_decoding_pymatching}.
- scripts/offline_smoketest.sh + tests/test_offline_stim_decoding.py: end-to-end smoke + IO/transform/decoder coverage.
- README + cookbook: BYO-samples contract, generator usage, strict-validation semantics.
Plumbing in data/datapipe_stim.py, data/factory.py, evaluation/{inference,logical_error_rate}.py, export/generate_test_data.py, scripts/local_run.sh; opt-in [Inference Summary] JSON marker via PREDECODER_EMIT_INFERENCE_SUMMARY=1.
Smoke-tested at d=7/n_rounds=7 (Fast, R=9) and d=13/n_rounds=13 (Accurate, R=13); LERs and PyMatching speedup match the reference example in README.md.
Signed-off-by: kvmto <kmato@nvidia.com>
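The opt-in `[Inference Summary]` marker is meant for downstream tooling. A minimal sketch of how a log scraper might pick it up follows. It assumes the marker is followed on the same line by a single JSON object; that payload layout, the function name, and the keys in the usage note are illustrative assumptions, not the repository's actual format.

```python
import json

MARKER = "[Inference Summary]"  # marker text from the PR; payload layout is assumed


def parse_inference_summaries(log_text):
    """Return the JSON payload of every marker line found in log_text."""
    summaries = []
    for line in log_text.splitlines():
        _, found, payload = line.partition(MARKER)
        if not found:
            continue  # line does not carry the marker
        try:
            summaries.append(json.loads(payload.strip()))
        except json.JSONDecodeError:
            # Skip marker lines whose payload is not valid JSON.
            continue
    return summaries
```

For example, feeding it a log containing `[Inference Summary] {"basis": "X"}` (a hypothetical payload) would yield a one-element list holding that dictionary.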
Run yapf --style=.style.yapf (based_on_style=google, column_limit=100, dedent_closing_brackets=true, split_before_closing_bracket=true) over the Python files touched by the previous commit. No semantic changes.
Signed-off-by: kvmto <kmato@nvidia.com>
bmhowe23
reviewed
May 28, 2026
Collaborator
bmhowe23
left a comment
Thanks, Kevin. Most of my comments are about code duplication.
ivanbasov
reviewed
May 28, 2026
ivanbasov
reviewed
May 28, 2026
- predecoder_transform.py: short-circuit the T==1 / T>=2 detector-index selection so the T==1 path no longer materializes a wrong-length intermediate. Per @ivanbasov.
- stim_sample_io.py: make schema_version load-bearing: reject values newer than SCHEMA_VERSION (and non-int garbage); default missing key to v1 so legacy files keep loading. Per @ivanbasov.
- offline_smoketest.sh: stop reprinting the per-basis table that code/evaluation/inference.py already emits. Keep the [Inference Summary] JSON-marker parse and print one headline line for CI scrapers. README example updated. Per @bmhowe23.
Signed-off-by: kvmto <kmato@nvidia.com>
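The schema_version rule in the commit above can be sketched roughly as follows. This is not the code in stim_sample_io.py: the function name, the value `SCHEMA_VERSION = 1`, and the error messages are assumptions made for illustration. Only the three behaviours (default a missing key to v1, reject non-int values, reject values newer than `SCHEMA_VERSION`) come from the commit message.

```python
SCHEMA_VERSION = 1  # assumed current version, for illustration only


def resolve_schema_version(metadata):
    """Return the schema version of a metadata dict, or raise ValueError."""
    # A missing key is treated as v1 so legacy files keep loading.
    version = metadata.get("schema_version", 1)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"schema_version must be an int, got {version!r}")
    if version > SCHEMA_VERSION:
        raise ValueError(
            f"schema_version {version} is newer than supported {SCHEMA_VERSION}"
        )
    return version
```

Rejecting newer versions outright, rather than attempting a best-effort load, keeps a reader from silently misinterpreting fields it does not know about.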
Extract the buffer-fused dets -> (trainX, x_syn_diff, z_syn_diff)
transform into `_predecoder_transform_core` in
`code/data/predecoder_transform.py`. Both the file-datapipe entry point
(`dets_to_predecoder_inputs`) and the GPU/ONNX export path
(`PreDecoderMemoryEvalModule._batch_to_trainx_and_syndromes`) now
delegate to it.
- The helper builds the perms/grids ad-hoc per call from
`compute_stab*_to_data_index_map` and `normalized_weight_mapping_*`,
then casts syndromes to int32 at the return boundary to preserve its
public contract.
- The eval module forwards its pre-registered buffers straight through;
`__init__` is unchanged. The ONNX export graph is byte-identical
before and after (383 nodes / 34697 bytes for X, 388 nodes / 35074
bytes for Z).
Drops the two parity tests that locked the previous duplicates
together and corrects the cross-check oracle's docstring count
("fourth" -> "second"). The `predecoder_transform.py` module docstring
no longer claims a separate GPU implementation; it points at the
shared core.
Signed-off-by: kvmto <kmato@nvidia.com>
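The delegation pattern described in the commit above (one entry point that builds its index maps per call, one module that forwards pre-registered ones, both calling a single core) might look like the toy sketch below. Every name and the even/odd column split are hypothetical stand-ins; the real core operates on tensors and uses `compute_stab*_to_data_index_map` and `normalized_weight_mapping_*`, which are not reproduced here.

```python
def _transform_core(dets, x_idx, z_idx):
    """Toy stand-in for the shared core: split each shot's detectors by index."""
    x_part = [[row[i] for i in x_idx] for row in dets]
    z_part = [[row[i] for i in z_idx] for row in dets]
    return x_part, z_part


def _build_indices(num_detectors):
    """Hypothetical index builder: even columns -> X, odd columns -> Z."""
    return list(range(0, num_detectors, 2)), list(range(1, num_detectors, 2))


def file_entry_point(dets):
    """Builds the indices on every call, then delegates to the core."""
    x_idx, z_idx = _build_indices(len(dets[0]))
    return _transform_core(dets, x_idx, z_idx)


class EvalModule:
    """Caches the indices once, then forwards them to the same core."""

    def __init__(self, num_detectors):
        self.x_idx, self.z_idx = _build_indices(num_detectors)

    def forward(self, dets):
        return _transform_core(dets, self.x_idx, self.z_idx)
```

Because both callers share one core, there is no second copy of the transform to drift, which is why tests that only compared the former duplicates become redundant.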
Append ++model_id=${MODEL_ID} to the ising_decoding_pymatching phase's
EXTRA_PARAMS when MODEL_ID is set, so running the smoketest with the
Accurate checkpoint (R=13) no longer requires manually overriding
hydra params.
Signed-off-by: kvmto <kmato@nvidia.com>
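The smoketest itself is a shell script; the conditional-append logic it describes is shown here in Python purely as an illustration. The function name and the `base_params` argument are hypothetical, and treating an empty `MODEL_ID` as unset is an assumption.

```python
import os


def build_extra_params(base_params, env=None):
    """Append ++model_id=<value> only when MODEL_ID is set and non-empty."""
    env = os.environ if env is None else env
    params = list(base_params)  # copy so the caller's list is not mutated
    model_id = env.get("MODEL_ID", "")
    if model_id:
        params.append(f"++model_id={model_id}")
    return params
```

With `MODEL_ID=4` in the environment, the returned list gains a trailing `++model_id=4`; without it, the base parameters pass through unchanged.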
ivanbasov
reviewed
May 29, 2026
Member
ivanbasov
left a comment
Re-reviewed after the latest push. The schema_version and T==1 fixes are both clean (replied on those threads), and the eval-module dedup in 9059d78 is the right call — nicely done verifying the ONNX graph is byte-identical.
One concern remains about the second deleted test — details inline.
`QCDataPipePreDecoder_Memory_inference._precompute_transformations` is a separate, pre-existing implementation of the dets → (trainX, x_syn_diff, z_syn_diff) transform that was not folded into `_predecoder_transform_core` in 9059d78. Locking it against the canonical helper keeps in-memory and file inference from silently drifting.
Restore `test_in_memory_datapipe_matches_canonical_helper_on_its_own_dets` and the `QCDataPipePreDecoder_Memory_inference` import, and bump the `_reference_tensors_from_measurements` docstring from "second" to "third" implementation to reflect the actual count: (1) canonical core, (2) in-memory XOR-diff datapipe, (3) this oracle.
Signed-off-by: kvmto <kmato@nvidia.com>
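A parity test of the kind restored above pins two independent implementations to the same output on shared inputs. The sketch below uses a toy round-to-round XOR difference as the mapping; both functions and the test are hypothetical stand-ins and do not reproduce the repository's transform or its test.

```python
import random


def xor_diff_canonical(rounds):
    """Toy canonical version: XOR each round with the previous (first vs. zeros)."""
    prev = [0] * len(rounds[0])
    out = []
    for cur in rounds:
        out.append([a ^ b for a, b in zip(prev, cur)])
        prev = cur
    return out


def xor_diff_independent(rounds):
    """Independent toy implementation of the same mapping, via a zero-padded copy."""
    width = len(rounds[0])
    padded = [[0] * width] + rounds
    return [
        [padded[t][i] ^ padded[t + 1][i] for i in range(width)]
        for t in range(len(rounds))
    ]


def test_implementations_agree(trials=50, seed=0):
    """Fail if the two implementations ever disagree on random binary inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n_rounds, width = rng.randint(1, 6), rng.randint(1, 8)
        rounds = [[rng.randint(0, 1) for _ in range(width)] for _ in range(n_rounds)]
        assert xor_diff_canonical(rounds) == xor_diff_independent(rounds)
```

A test like this does not prove either implementation correct; it only guarantees that a change to one which is not mirrored in the other is caught.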
Summary
Adds a file-based path that decodes Stim detector samples (`samples_{X,Z}.dets` + `metadata_{X,Z}.json`) with the same Ising pre-decoder + PyMatching pipeline used in-memory, plus a generator workflow for reproducible reference inputs.

What's new
- `qec/surface_code/stim_sample_io.py`: on-disk Stim sample contract, with structural and optional noise-fingerprint validation against a rebuilt memory circuit before decoding.
- `data/predecoder_transform.py`: canonical Stim detectors → `(trainX, x_syn_diff, z_syn_diff)` transform; shared by the file-based datapipe and the buffer-fused GPU module. A parity test in `tests/test_offline_stim_decoding.py` enforces algorithmic equality.
- `workflows/run.py`: new `workflow.task=generate_stim_data` and `PREDECODER_DECODE_MODE` controls: `pymatching_only` (baseline; loads `torch.nn.Identity()`, no checkpoint required) and `ising_decoding_pymatching` (Ising pre-decoder + PyMatching).
- `scripts/offline_smoketest.sh` + `tests/test_offline_stim_decoding.py`: end-to-end smoke and IO/transform/decoder coverage.
- `README.md` + `cookbook/predecoder.ipynb`: file contract docs, BYO-samples instructions, generator usage, strict-validation semantics.

Plumbing through `data/datapipe_stim.py`, `data/factory.py`, `evaluation/{inference,logical_error_rate}.py`, `export/generate_test_data.py`, `scripts/local_run.sh`; opt-in `[Inference Summary]` JSON marker via `PREDECODER_EMIT_INFERENCE_SUMMARY=1` for downstream tooling.

Validation behaviour
[...] `p_error`, `noise_model_sha256`) are fatal by default; downgrade to warnings with `PREDECODER_STIM_STRICT_NOISE=0` (for p_error sweeps / calibration studies).

Smoke test
Ran end-to-end against the shipped checkpoints at both supported windows. Numbers line up with the reference example in `README.md`.
- d=7, n_rounds=7, O1, Fast (R=9, model_id=1), 262,144 shots/basis
- d=13, n_rounds=13, O1, Accurate (R=13, model_id=4), 262,144 shots/basis

Test plan
- `bash code/scripts/offline_smoketest.sh` (d=7, Fast)
- `[Inference Summary]` JSON marker parses for both decode modes
- `unit-tests` (covers new `test_offline_stim_decoding.py`)
- `spdx-header-check` on all 4 new files
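The strict-versus-lenient behaviour controlled by `PREDECODER_STIM_STRICT_NOISE` can be illustrated with the sketch below. The environment variable and the two field names come from the PR text; the function name, its signature, and the exact comparison are assumptions, and the repository's validator may differ.

```python
import os
import warnings

NOISE_KEYS = ("p_error", "noise_model_sha256")  # field names taken from the PR text


def check_noise_fingerprint(metadata, expected, env=None):
    """Raise on a noise mismatch by default; warn instead when strictness is off."""
    env = os.environ if env is None else env
    # Strict unless the variable is explicitly set to "0" (assumed semantics).
    strict = env.get("PREDECODER_STIM_STRICT_NOISE", "1") != "0"
    mismatched = [k for k in NOISE_KEYS if metadata.get(k) != expected.get(k)]
    if mismatched:
        message = "noise fingerprint mismatch in: " + ", ".join(mismatched)
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    return mismatched
```

Defaulting to a hard failure means a sample file generated under a different noise model cannot be decoded by accident; the opt-out exists for deliberate mismatches such as p_error sweeps.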