Corruption-pattern survey tool for FEC design #85
Merged
Follow-up A from #83. Adds per-path RSSI / EVM / SNR to every <devourer-stream> line so corruption_analysis.py can correlate BER with link quality on a per-frame basis instead of relying on aggregated statistics.

* demo/main.cpp: <devourer-stream>rate=R len=L crc_err=X icv_err=Y rssi=A,B evm=A,B snr=A,B body=HEX. Same source as the Tier-2 diagnostics in <devourer-body>; no new RX-status fields, just surfacing what FrameParser already populates.
* tools/precoder/corruption_analysis.py: parses the new fields and reports
  - SNR distribution (min/p25/med/p75/max) for chip-clean vs chip-corrupt populations
  - BER per 5-dB SNR bucket

  Uses max(snr_A, snr_B) as the "effective" SNR. On single-antenna 1T1R sticks path B reads 0 (no signal, not "0 dB"), so a naive min would always report 0 and the bucket view would collapse; max picks the active path on 1T1R and the stronger path on 2T2R single-stream operation.
* stream_rx.py / tun_p2p.py / precoder_stream_roundtrip.py: regex updated to tolerate the new optional rssi/evm/snr fields (none read them yet; pass-through compatibility only).

Verification

Hardware (500 frames at default TX power, RTL8812AU → T2U Plus RTL8821AU, ch 6):

    phy SNR (stronger path, dB):
      chip-clean   : n=467  min=0  p25=30  med=33  p75=38  max=51
      chip-corrupt : n=0

    BER by SNR bucket (stronger path, 5-dB buckets):
      bucket     frames  bits-cmp  bit-err  BER
      0-5 dB          1       192        0  0.000e+00
      20-25 dB       11      2112        0  0.000e+00
      25-30 dB       76     14592        0  0.000e+00
      30-35 dB      178     34176        0  0.000e+00
      35-40 dB      122     23424        0  0.000e+00
      40-45 dB       55     10560        0  0.000e+00
      45-50 dB       19      3648        0  0.000e+00
      50-55 dB        5       960        0  0.000e+00

The bench link is too clean for chip-corrupt events even at the SNR tails, which matches the post-PR investigation finding for #83: at bench distance the loss is at PHY sync, not FCS. The analyser is ready for noisier deployments / range-extended captures (follow-up B).
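A minimal sketch of the per-population SNR summary described above, assuming each parsed frame has already been reduced to a hypothetical `(snr_a, snr_b, chip_corrupt)` tuple. It applies the same max(snr_A, snr_B) stronger-path rule. The quartile method (`statistics.quantiles` with `method="inclusive"`) is an assumption; the PR does not say how corruption_analysis.py computes p25/p75.

```python
import statistics

def five_number_summary(values):
    """min/p25/med/p75/max plus n; None for an empty population."""
    vals = sorted(values)
    if not vals:
        return None  # e.g. the chip-corrupt population when n=0
    if len(vals) == 1:
        # statistics.quantiles needs >= 2 points before Python 3.13.
        v = vals[0]
        return dict(n=1, min=v, p25=v, med=v, p75=v, max=v)
    # ASSUMPTION: inclusive (linear-interpolation) quartiles.
    p25, med, p75 = statistics.quantiles(vals, n=4, method="inclusive")
    return dict(n=len(vals), min=vals[0], p25=p25, med=med, p75=p75, max=vals[-1])

def summarize_by_population(frames):
    """frames: iterable of hypothetical (snr_a, snr_b, chip_corrupt) tuples."""
    pops = {"chip-clean": [], "chip-corrupt": []}
    for snr_a, snr_b, corrupt in frames:
        # Stronger path: path B reads 0 on 1T1R sticks, so max() picks path A.
        pops["chip-corrupt" if corrupt else "chip-clean"].append(max(snr_a, snr_b))
    return {name: five_number_summary(v) for name, v in pops.items()}
```

An empty population comes back as None rather than a row of zeros, which mirrors the `chip-corrupt : n=0` line in the hardware run.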
Offline smoke (synthetic injection of 5 clean frames at 28 dB + 5 corrupt frames at 5 dB) correctly buckets BER=0 in the 25-30 dB bucket and BER=1.04e-2 in the 5-10 dB bucket, so the per-bucket correlation works as designed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
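The 5-dB bucketing can be sketched the same way. This is an illustrative reimplementation, not the code in corruption_analysis.py: the `Frame` record and the `ber_by_bucket` helper are hypothetical, and buckets are taken as half-open `[lo, lo + 5)` intervals on the stronger-path SNR. The usage at the bottom rebuilds the offline smoke, assuming 192 compared bits per frame (as in the hardware table) and two flipped bits per corrupt frame, which yields 10 / 960 ≈ 1.04e-2 in the 5-10 dB bucket.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    snr_a: float        # path-A SNR in dB
    snr_b: float        # path-B SNR in dB; reads 0 on 1T1R sticks (no signal)
    bits_compared: int  # payload bits compared against the known reference
    bit_errors: int     # how many of those bits differed

def effective_snr(f: Frame) -> float:
    # Stronger path: max() picks the live path on 1T1R and the better one on 2T2R.
    return max(f.snr_a, f.snr_b)

def ber_by_bucket(frames, width: int = 5):
    """{bucket_lo_dB: (frames, bits_compared, bit_errors, ber)}, buckets [lo, lo+width)."""
    acc = defaultdict(lambda: [0, 0, 0])
    for f in frames:
        lo = math.floor(effective_snr(f) / width) * width
        acc[lo][0] += 1
        acc[lo][1] += f.bits_compared
        acc[lo][2] += f.bit_errors
    return {
        lo: (n, bits, errs, errs / bits if bits else float("nan"))
        for lo, (n, bits, errs) in sorted(acc.items())
    }

# Rebuild the offline smoke: 5 clean frames at 28 dB, 5 corrupt frames at 5 dB.
smoke = [Frame(28.0, 0.0, 192, 0)] * 5 + [Frame(5.0, 0.0, 192, 2)] * 5
for lo, (n, bits, errs, ber) in ber_by_bucket(smoke).items():
    print(f"{lo}-{lo + 5} dB  {n:>6} {bits:>9} {errs:>8}  {ber:.3e}")
```

Pooling bits per bucket (rather than averaging per-frame BERs) keeps a bucket with one long frame and a bucket with many short frames on the same footing.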
Follow-up B from #83 (and depends on #84's phy soft metrics): adds a chip-side DEVOURER_RX_DUMP_ALL env var that emits a <devourer-corrupt-any> line for every RX frame, plus an aggregate analyser that turns those into FEC-design-grade statistics.

* demo/main.cpp: DEVOURER_RX_DUMP_ALL=1 emits one body-less line per frame with len + chip-flag bits + rate + per-path rssi/evm/snr. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); pkt_len + flags + phy is all the aggregate report needs.
* tools/precoder/corruption_survey.py: parses the new lines and reports
  - headline chip-clean / chip-corrupt counts
  - corruption rate broken down by DESC_RATE (the CCK vs OFDM split; without it the headline number is dominated by always-clean CCK ACKs and beacons and underestimates what OFDM data faces)
  - frame-size distribution for chip-clean vs chip-corrupt
  - phy-metric stats (rssi/evm/snr) per population, filtered to frames where the chip actually populated phy stats (CCK and short mgmt frames report 0/0; we treat those as "no measurement" instead of "0 dB" so the bucket views don't collapse)
  - per-SNR-bucket corruption rate (where measurable)
  - temporal clustering (when running live for >1 s; skipped on file/pipe input, where all lines arrive at once)

  Output ends with a heuristic FEC recommendation based on median-vs-peak corruption rate.
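A minimal parser for the survey lines plus the per-DESC_RATE breakdown might look like the following. It assumes space-separated `key=value` fields as shown in the PR description, integer values (a `0x` prefix read as hex), and that any nonzero `crc_err`/`icv_err` marks a frame chip-corrupt. `parse_line`, `rate_table`, `is_corrupt` and `has_phy` are hypothetical names; `has_phy` encodes the "0/0 means no measurement" rule by looking only at the SNR pair.

```python
from collections import Counter

# ASSUMED line shape, per the PR description:
# <devourer-corrupt-any>len=L crc_err=X icv_err=Y rate=R rssi=A,B evm=A,B snr=A,B
TAG = "<devourer-corrupt-any>"

def _num(tok: str) -> int:
    # ASSUMPTION: integer fields; a 0x prefix means hex (e.g. rate=0x04).
    return int(tok, 16) if tok.lower().startswith("0x") else int(tok)

def parse_line(line: str):
    """Return a dict of fields (per-path values as (A, B) tuples), or None."""
    start = line.find(TAG)
    if start < 0:
        return None
    rec = {}
    for field in line[start + len(TAG):].split():
        key, sep, val = field.partition("=")
        if not sep:
            continue  # ignore anything that isn't key=value
        rec[key] = tuple(_num(v) for v in val.split(",")) if "," in val else _num(val)
    return rec

def is_corrupt(rec) -> bool:
    # ASSUMPTION: any nonzero chip flag marks the frame chip-corrupt.
    return bool(rec.get("crc_err", 0) or rec.get("icv_err", 0))

def has_phy(rec) -> bool:
    # CCK and short mgmt frames report 0/0: "no measurement", not 0 dB.
    return any(rec.get("snr", (0, 0)))

def rate_table(records):
    """{rate: (count, corrupt, corrupt_fraction)} sorted by DESC_RATE index."""
    total, corrupt = Counter(), Counter()
    for rec in records:
        total[rec["rate"]] += 1
        corrupt[rec["rate"]] += is_corrupt(rec)
    return {r: (total[r], corrupt[r], corrupt[r] / total[r]) for r in sorted(total)}
```

Keeping the rate as the grouping key is what separates the CCK and OFDM populations; a single headline rate would blend them.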
Bench finding (60 s ch6 capture, busy office environment near several APs):

    === corruption survey (2266 frames, file/pipe) ===
    chip-clean        : 1663 ( 73.4%)
    chip-corrupt      :  603 ( 26.6%)
    corruption rate   : 26.61%
    no-phy-measurement: 2103 (CCK/short frames, chip reports 0/0)

    Corruption rate by DESC_RATE:
      idx   name       count      %  corrupt    rate
      0x00  1M CCK      2075  91.6%      412   19.9%
      0x02  5.5M CCK       2   0.1%        2  100.0%
      0x03  11M CCK        1   0.0%        1  100.0%
      0x04  6M OFDM       17   0.8%       17  100.0%
      0x05  9M OFDM       19   0.8%       19  100.0%
      0x06  12M OFDM      20   0.9%       20  100.0%
      0x07  18M OFDM      31   1.4%       31  100.0%
      0x08  24M OFDM      22   1.0%       22  100.0%
      0x09  36M OFDM      30   1.3%       30  100.0%
      0x0a  48M OFDM      31   1.4%       31  100.0%
      0x0b  54M OFDM      18   0.8%       18  100.0%

The FEC-design takeaway: 1M CCK is comparatively robust (~20% corrupt) because the modulation is simple; every OFDM rate is 100% corrupt because we're hearing distant APs at marginal SNR. The PoC's 6M OFDM stream link works only because TX and RX are co-located. At any real range the chip will surface FCS failures at a high rate, and the stream layer needs inter-frame parity (Reed-Solomon / Raptor) to recover, not just per-frame FEC. The tool gives FEC designers the concrete inputs (rate distribution, SNR distribution, time clustering) to size the parity block and overhead.

Builds on #83 (chip-level filter open) and #84 (phy soft metrics).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
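As a rough illustration of how the survey's numbers could feed parity sizing, the sketch below finds the smallest repair-symbol count R for a K-symbol block under an idealized erasure model: each symbol travels in its own frame, frames are lost independently at the measured corruption rate, and the block decodes once any K of its K + R symbols arrive. Both assumptions are optimistic. RaptorQ occasionally needs a symbol or two beyond K, and real losses cluster in time (which the survey's temporal-clustering view is meant to expose), so treat the result as a lower bound. `block_failure_prob` and `min_repair` are hypothetical helpers.

```python
from math import exp, lgamma, log

def block_failure_prob(k: int, r: int, p_loss: float) -> float:
    """P(fewer than k of the k + r symbols arrive), each lost i.i.d. with p_loss.

    Idealized MDS model: ANY k received symbols recover the block.
    """
    if p_loss <= 0.0:
        return 0.0
    if p_loss >= 1.0:
        return 1.0
    n = k + r
    log_q, log_p = log(1.0 - p_loss), log(p_loss)
    total = 0.0
    for i in range(k):  # i = symbols that arrive; fewer than k means failure
        # Binomial term C(n, i) q^i p^(n-i), in log space so large n can't overflow.
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log_q + (n - i) * log_p)
        total += exp(log_term)
    return min(total, 1.0)

def min_repair(k: int, p_loss: float, target_fail: float = 1e-3, r_max: int = 1024) -> int:
    """Smallest R with block_failure_prob(k, R, p_loss) <= target_fail."""
    for r in range(r_max + 1):
        if block_failure_prob(k, r, p_loss) <= target_fail:
            return r
    raise ValueError(f"no R <= {r_max} reaches failure <= {target_fail} at p_loss={p_loss}")

# Rates from the bench capture: 1M CCK (19.9%) and the headline (26.61%).
for label, p in (("1M CCK", 0.199), ("headline", 0.2661)):
    for k in (8, 16):
        r = min_repair(k, p)
        print(f"{label:8s} p={p:.3f} K={k:2d} -> R={r:3d} (R/K={r / k:.2f})")
```

Under this model `min_repair` raises at p_loss = 1.0: at the 100% corruption seen on the bench's OFDM rates no amount of parity helps, so the link itself has to improve before FEC can recover anything.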
josephnef added a commit that referenced this pull request on Jun 7, 2026
## Summary

The corruption survey in #85 showed real-range OFDM frames on this link will see **30–70% loss**. tun_p2p.py's blind `--repeat N` is a fixed-cost workaround that can't compose to handle the tail; this PR ships a real erasure code on top of the existing stream framing.

## Library

`raptorq` from cberner (Rust+PyO3 binding to the RFC 6330 reference port). MIT, manylinux abi3 wheels on PyPI, ~26 Gbps enc / ~7 Gbps dec at K=1000 on commodity x86. `uv add raptorq` is the only install step.

## Wire format

The existing `stream.py` framing stays untouched. FEC is an **inner envelope** living inside `StreamFrame.payload`:

```
FEC_MAGIC     (2)   = 0xF52E
VERSION/FLAGS (1)   = 0
K             (1)   = source symbols per block
KREAL         (1)   = real source symbols in this block (≤ K).
                      Trailing (K - KREAL) decoded symbols are zero-pad to discard.
SYMBOL_SIZE   (2)   = LE u16
BLOCK_ID      (2)   = LE u16, wraps
RAPTORQ_PKT   (var) = lib-managed SBN+ESI+symbol

inner overhead = 9 B + raptorq's 4 B SBN/ESI = 13 B
```

Source symbols are themselves concatenations of length-prefixed IP packets:

```
[u16 len_a][packet_a]…[u16 len_b][packet_b]…[zero pad to SYMBOL_SIZE]
```

So small packets (ACK floods) share symbols instead of each burning a whole symbol's worth of airtime.

## Files

- `tools/precoder/pyproject.toml`: add `raptorq>=2`.
- `tools/precoder/stream_fec.py`: `FecConfig`, `FecEncoder` (concatenation packing + block encoding), `FecDecoder` (block-incremental decode + late-symbol drop + block expiry).
- `tools/precoder/test_stream_fec.py`: 19 unit tests covering round-trip, loss tolerance 0/20/40% at R/K=1, 50% at R/K=2, unrecoverable-block bookkeeping at 70%, concatenation, partial flush, block-id wrap, MTU enforcement, and garbage envelopes.
- `tools/precoder/tun_p2p.py`: new `--fec-k`/`--fec-overhead`/`--fec-symbol-size`/`--fec-flush-ms`/`--fec-block-expire-ms` flags. tx_thread feeds packets through the encoder; a parallel `fec_flush_thread` force-encodes partial blocks every flush-ms so sparse traffic doesn't stall. rx_thread feeds payloads through the decoder; decoded IP packets go to TUN. Outer `SeqWindow` dedup is forced OFF when FEC is on (RaptorQ symbols self-dedup via SBN+ESI). New `fec=[...]` segment in the periodic stderr report. Docstring extended.

## Hardware verification

Two-netns single-host bench (RTL8812AU `0x8812` + TP-Link Archer T2U Plus / RTL8821AU `2357:0120`, ch 6, no `--repeat`, `ping -c 30 -i 1`):

| Config | RTT min/avg/max | Loss | DUP | Blocks ok/lost |
|---|---|---:|---:|---:|
| `--fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50` | 121 / **160** / 207 ms | 0% | 0 | 30 / 1 (startup) |
| `--fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20` | 73 / **95** / 145 ms | 0% | 0 | 30 / 1 (startup) |

The K=8 config trades a bit of recovery margin for a 65 ms drop in average RTT (160 → 95 ms). Both decode 100% of source packets on a healthy link; the survey's noisier regimes are what motivates `--fec-overhead > 1`.

For comparison, PR #82's earlier numbers (same bench, byte mode):

| Mode | Loss | Avg RTT |
|---|---:|---:|
| Byte mode `--repeat 1` | 10% | 7 ms |
| Byte mode `--repeat 4` + dedup | 0% | 10 ms (with up to 25 DUPs per ping eaten by dedup) |
| **FEC K=8 R/K=1 flush=20** | **0%** | **95 ms** |

FEC moves us from "blind redundancy + dedup" to "real erasure code". The latency cost is the K-source-symbol encode buffer; the win is that the codec scales gracefully to higher loss rates by raising `--fec-overhead` instead of running out at `--repeat=∞`.

## Test plan

- [x] `cd tools/precoder && uv run pytest` → 87 passed (31 pipeline + 37 stream + 19 fec)
- [x] `python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py` → 8 passed
- [x] tun_p2p.py --help parses cleanly (incl. all FEC flags)
- [x] Bench: K=16/R=1 and K=8/R=1, both 30/30 ping with 0% loss and 0 DUPs

## Open caveats (documented in script)

- Strict block boundaries: no cross-block FEC, no Raptor carousel. Good enough at K=8–16 + 20–50 ms flush; revisit if the latency budget tightens further.
- No rateless dynamic overhead: R/K is fixed at construction. A future PR could let RX hint TX to send more repair symbols via a reverse-channel feedback envelope.
- Patent note: Qualcomm's patents covering RFC 6330 have largely expired in the primary jurisdictions as of 2026; cberner's MIT lib explicitly notes this.

Builds on #82 (TUN bridge, merged), #83 (corrupted-frame surfacing, merged), #84 (phy soft metrics, open), #85 (corruption survey, open).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
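To make the envelope concrete, here is a stdlib-only sketch of the 9-byte header and the length-prefixed symbol packing; it is not the code in stream_fec.py. Assumptions, flagged in the comments: every multi-byte field is little-endian (the description states LE only for SYMBOL_SIZE and BLOCK_ID), packets never straddle a symbol boundary, and a zero length prefix marks the start of padding. RAPTORQ_PKT is treated as opaque bytes produced by the library, and all helper names are hypothetical.

```python
import struct

FEC_MAGIC = 0xF52E
# Layout: magic u16, version/flags u8, K u8, KREAL u8, SYMBOL_SIZE u16, BLOCK_ID u16.
# ASSUMPTION: little-endian for every multi-byte field, including FEC_MAGIC.
HDR = struct.Struct("<HBBBHH")
assert HDR.size == 9  # matches the 9 B envelope overhead in the description
LEN = struct.Struct("<H")  # ASSUMPTION: little-endian u16 length prefix

def pack_envelope(k: int, kreal: int, symbol_size: int, block_id: int, raptorq_pkt: bytes) -> bytes:
    if not 0 < kreal <= k <= 0xFF:
        raise ValueError("need 0 < KREAL <= K <= 255")
    # BLOCK_ID wraps at 16 bits.
    return HDR.pack(FEC_MAGIC, 0, k, kreal, symbol_size, block_id & 0xFFFF) + raptorq_pkt

def unpack_envelope(buf: bytes):
    """Return (K, KREAL, SYMBOL_SIZE, BLOCK_ID, raptorq_pkt) or raise ValueError."""
    if len(buf) < HDR.size:
        raise ValueError("short envelope")
    magic, ver, k, kreal, symbol_size, block_id = HDR.unpack_from(buf)
    if magic != FEC_MAGIC or ver != 0 or not 0 < kreal <= k:
        raise ValueError("not an FEC envelope")
    return k, kreal, symbol_size, block_id, bytes(buf[HDR.size:])

def pack_symbols(packets, symbol_size: int) -> list:
    """Concatenate [u16 len][packet] records into zero-padded symbols."""
    symbols, cur = [], bytearray()
    for pkt in packets:
        if not pkt:
            raise ValueError("empty packet would be read back as padding")
        rec = LEN.pack(len(pkt)) + bytes(pkt)
        if len(rec) > symbol_size:
            raise ValueError("packet exceeds symbol MTU")
        if len(cur) + len(rec) > symbol_size:  # ASSUMPTION: no straddling
            symbols.append(bytes(cur.ljust(symbol_size, b"\0")))
            cur = bytearray()
        cur += rec
    if cur:
        symbols.append(bytes(cur.ljust(symbol_size, b"\0")))
    return symbols

def unpack_symbols(symbols) -> list:
    out = []
    for sym in symbols:
        off = 0
        while off + LEN.size <= len(sym):
            (n,) = LEN.unpack_from(sym, off)
            if n == 0:
                break  # zero prefix: the rest of the symbol is padding
            end = off + LEN.size + n
            if end > len(sym):
                raise ValueError("length prefix runs past the symbol")
            out.append(bytes(sym[off + LEN.size:end]))
            off = end
    return out
```

Because a zero length prefix ends parsing, the (K - KREAL) all-zero pad symbols decode to no packets, which matches the "zero-pad to discard" rule. It also means a genuinely empty packet can't be represented, so the sketch rejects one up front.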
Summary
Follow-up B from #83 (depends on #84's phy soft metrics): a chip-side DEVOURER_RX_DUMP_ALL=1 env var that emits one line per RX frame with the chip's full integrity + phy soft-metric vector, plus an aggregate analyser that turns those into FEC-design-grade statistics.

The previous work showed that the chip-corrupt pipeline now reaches the application layer (#83) and that per-frame phy metrics let the analyser correlate BER with SNR (#84). This PR is the third leg: a long-capture survey tool that characterises the corruption-pattern distribution real-world deployments actually face, so an FEC layer on top of the stream link can be sized empirically rather than guessed.
Changes
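- `demo/main.cpp`: new `DEVOURER_RX_DUMP_ALL=1` knob emits `<devourer-corrupt-any>len=L crc_err=X icv_err=Y rate=R rssi=A,B evm=A,B snr=A,B`. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); the aggregate report only needs length + flags + phy.
- `tools/precoder/corruption_survey.py`: new tool that reads those lines and reports headline chip-clean/chip-corrupt counts, corruption rate by DESC_RATE, frame-size and phy-metric distributions per population, per-SNR-bucket corruption rate, temporal clustering, and a closing heuristic FEC recommendation.

Bench finding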
60-second ch6 capture in a busy office environment with several APs in range:
Reading the result
The FEC-design takeaway:
Follow-ups (for whoever picks up the FEC layer)
Builds on #83 (chip-level filter open, merged) and #84 (phy soft metrics, open).
🤖 Generated with Claude Code