Corruption-pattern survey tool for FEC design by josephnef · Pull Request #85 · OpenIPC/devourer

josephnef · 2026-06-07T13:40:02Z

Summary

Follow-up B from #83 (depends on #84's phy soft metrics): a chip-side DEVOURER_RX_DUMP_ALL=1 env var that emits one line per RX frame with the chip's full integrity + phy soft-metric vector, plus an aggregate analyser that turns those into FEC-design-grade statistics.

The previous work showed that the chip-corrupt pipeline now reaches the application layer (#83) and that per-frame phy metrics let the analyser correlate BER with SNR (#84). This PR is the third leg: a long-capture survey tool that characterises the actual corruption-pattern distribution real-world deployments face, so a FEC layer on top of the stream link can be sized empirically rather than guessed.

Changes

demo/main.cpp — new DEVOURER_RX_DUMP_ALL=1 knob emits <devourer-corrupt-any>len=L crc_err=X icv_err=Y rate=R rssi=A,B evm=A,B snr=A,B. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); the aggregate report only needs length + flags + phy.
tools/precoder/corruption_survey.py — new tool that reads those lines and reports:
- headline chip-clean vs chip-corrupt counts
- corruption rate broken down by DESC_RATE (the CCK-vs-OFDM split — without this the headline is dominated by always-clean CCK ACKs/beacons and underestimates what OFDM data faces)
- frame-size distribution for each population
- phy-metric stats per population, filtered to frames where the chip populated phy stats (CCK reports 0/0; we treat as "no measurement" instead of "0 dB" so the buckets don't collapse)
- per-SNR-bucket corruption rate (where measurable)
- temporal clustering (live captures only)
- a heuristic FEC recommendation based on median-vs-peak corruption rate

Bench finding

60-second ch6 capture in a busy office environment with several APs in range:

=== corruption survey (2266 frames, file/pipe) ===
chip-clean       :   1663 ( 73.4%)
chip-corrupt     :    603 ( 26.6%)
corruption rate  : 26.61%
no-phy-measurement:  2103  (CCK/short frames, chip reports 0/0)

Corruption rate by DESC_RATE:
   idx name            count      %    corrupt    rate
  0x00 1M CCK           2075  91.6%        412  19.9%
  0x02 5.5M CCK            2   0.1%          2 100.0%
  0x03 11M CCK             1   0.0%          1 100.0%
  0x04 6M OFDM            17   0.8%         17 100.0%
  0x05 9M OFDM            19   0.8%         19 100.0%
  0x06 12M OFDM           20   0.9%         20 100.0%
  0x07 18M OFDM           31   1.4%         31 100.0%
  0x08 24M OFDM           22   1.0%         22 100.0%
  0x09 36M OFDM           30   1.3%         30 100.0%
  0x0a 48M OFDM           31   1.4%         31 100.0%
  0x0b 54M OFDM           18   0.8%         18 100.0%

Reading the result

1M CCK loses ~20% even at this location — CCK is robust but background interference still nukes one in five ACKs/beacons.
Every OFDM rate above CCK is 100% corrupt because we're hearing distant APs at marginal SNR — the chip detects them, decodes them, fails the FCS, and now (with Surface CRC/ICV-corrupted RX frames + analysis tool #83's RCR change) surfaces them.

The FEC-design takeaway:

The PoC's 6M OFDM stream link only works because TX and RX are co-located. At any real range the chip will surface FCS failures at high rate.
The stream layer needs inter-frame parity (Reed-Solomon over N frames + K parity, Raptor, etc.) to recover from blocks of lost frames, not just per-frame FEC.
For a P2P link's typical "moderate range" use case (e.g. OpenIPC long-range video), expect frame loss rates in the 30–70% range. FEC overhead has to be sized accordingly — at 50% loss you need K/N ≈ 0.5 to be reliable.

Follow-ups (for whoever picks up the FEC layer)

Pick a parity scheme (Reed-Solomon is simplest, Raptor scales better) and parametrise N, K against captures from realistic ranges.
Decide where parity rides: in-band on the same SA (current TX path) vs. on a dedicated SA / frame type. In-band keeps the link simple but eats stream airtime.
Consider degrading rate gracefully (rateless codes) so the receiver can decode at whatever fraction of N+K frames it actually receives.

Builds on #83 (chip-level filter open, merged) and #84 (phy soft metrics, open).

🤖 Generated with Claude Code

Follow-up A from #83. Adds per-path RSSI / EVM / SNR to every <devourer-stream> line so corruption_analysis.py can correlate BER with link quality on a per-frame basis instead of relying on aggregated statistics. * demo/main.cpp: <devourer-stream>rate=R len=L crc_err=X icv_err=Y rssi=A,B evm=A,B snr=A,B body=HEX. Same source as the Tier-2 diagnostics in <devourer-body>; no new RX-status fields, just surfacing what FrameParser already populates. * tools/precoder/corruption_analysis.py: parses the new fields, reports - SNR distribution (min/p25/med/p75/max) for chip-clean vs chip-corrupt populations - BER per 5-dB SNR bucket Uses max(snr_A, snr_B) as the "effective" SNR — on single-antenna 1T1R sticks path B reads 0 (no signal, not "0 dB"), so a naive min would always report 0 and the bucket view collapses; max picks the active path on 1T1R and the stronger path on 2T2R single-stream operation. * stream_rx.py / tun_p2p.py / precoder_stream_roundtrip.py: regex updated to tolerate the new optional rssi/evm/snr fields (none read them yet — pass-through compatibility). Verification Hardware (500 frames at default TX power, RTL8812AU → T2U Plus RTL8821AU, ch 6): phy SNR (stronger path, dB): chip-clean : n=467 min=0 p25=30 med=33 p75=38 max=51 chip-corrupt : n=0 BER by SNR bucket (stronger path, 5-dB buckets): bucket frames bits-cmp bit-err BER 0-5 dB 1 192 0 0.000e+00 20-25 dB 11 2112 0 0.000e+00 25-30 dB 76 14592 0 0.000e+00 30-35 dB 178 34176 0 0.000e+00 35-40 dB 122 23424 0 0.000e+00 40-45 dB 55 10560 0 0.000e+00 45-50 dB 19 3648 0 0.000e+00 50-55 dB 5 960 0 0.000e+00 Bench link is too clean for chip-corrupt events even at the SNR tails, which matches the post-PR-investigation finding for #83: at bench distance the loss is at PHY sync, not FCS. The analyser is ready for noisier deployments / range-extended captures (follow-up B). Offline smoke (synthetic 5-clean@28dB + 5-corrupt@5dB injection) correctly buckets BER=0 in the 25-30 dB bucket and BER=1.04e-2 in the 5-10 dB bucket — the per-bucket correlation works as designed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Follow-up B from #83 (and depends on #84's phy soft metrics): adds a chip-side DEVOURER_RX_DUMP_ALL env var that emits a <devourer-corrupt-any> line for every RX frame, plus an aggregate analyser that turns those into FEC-design-grade statistics. * demo/main.cpp: DEVOURER_RX_DUMP_ALL=1 emits one body-less line per frame with len + chip-flag bits + rate + per-path rssi/evm/snr. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); pkt_len + flags + phy is what the aggregate report needs. * tools/precoder/corruption_survey.py: parses the new lines and reports - headline chip-clean / chip-corrupt counts - corruption rate broken down by DESC_RATE (the CCK vs OFDM split — without this the headline number is dominated by always-clean CCK ACKs and beacons and underestimates what OFDM data faces) - frame-size distribution for chip-clean vs chip-corrupt - phy-metric stats (rssi/evm/snr) per population, filtered to frames where the chip actually populated phy stats (CCK and short mgmt frames report 0/0; we treat those as "no measurement" instead of "0 dB" so the bucket views don't collapse) - per-SNR-bucket corruption rate (where measurable) - temporal clustering (when running live for >1 s; skipped on file/pipe input where all lines arrive at once) Output ends with a heuristic FEC recommendation based on median-vs-peak corruption rate. Bench finding (60 s ch6 capture, busy office environment near several APs): === corruption survey (2266 frames, file/pipe) === chip-clean : 1663 ( 73.4%) chip-corrupt : 603 ( 26.6%) corruption rate : 26.61% no-phy-measurement: 2103 (CCK/short frames, chip reports 0/0) Corruption rate by DESC_RATE: idx name count % corrupt rate 0x00 1M CCK 2075 91.6% 412 19.9% 0x02 5.5M CCK 2 0.1% 2 100.0% 0x03 11M CCK 1 0.0% 1 100.0% 0x04 6M OFDM 17 0.8% 17 100.0% 0x05 9M OFDM 19 0.8% 19 100.0% 0x06 12M OFDM 20 0.9% 20 100.0% 0x07 18M OFDM 31 1.4% 31 100.0% 0x08 24M OFDM 22 1.0% 22 100.0% 0x09 36M OFDM 30 1.3% 30 100.0% 0x0a 48M OFDM 31 1.4% 31 100.0% 0x0b 54M OFDM 18 0.8% 18 100.0% The FEC-design takeaway: 1M CCK is robust at ~20% loss because the modulation is simple; every OFDM rate is 100% corrupt because we're hearing distant APs at marginal SNR. The PoC's 6M OFDM stream link works only because TX and RX are co-located — at any real range the chip will surface FCS failures at high rate and the stream layer needs inter-frame parity (Reed-Solomon / Raptor) to recover, not just per-frame FEC. The tool gives FEC designers the concrete inputs (rate distribution, snr distribution, time clustering) to size the parity block and overhead. Builds on #83 (chip-level filter open) and #84 (phy soft metrics). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

## Summary The corruption survey in #85 showed real-range OFDM frames on this link will see **30–70% loss**. tun_p2p.py's blind `--repeat N` is a fixed-cost workaround that can't compose to handle the tail; this PR ships a real erasure code on top of the existing stream framing. ## Library `raptorq` from cberner (Rust+PyO3 binding to the RFC 6330 reference port). MIT, manylinux abi3 wheels on PyPI, ~26 Gbps enc / ~7 Gbps dec at K=1000 on commodity x86. `uv add raptorq` is the only install step. ## Wire format The existing `stream.py` framing stays untouched. FEC is an **inner envelope** living inside `StreamFrame.payload`: ``` FEC_MAGIC (2) = 0xF52E VERSION/FLAGS (1) = 0 K (1) = source symbols per block KREAL (1) = real source symbols in this block (≤ K). Trailing (K - KREAL) decoded symbols are zero-pad to discard. SYMBOL_SIZE (2) = LE u16 BLOCK_ID (2) = LE u16 wraps RAPTORQ_PKT (var) = lib-managed SBN+ESI+symbol inner overhead = 9 B + raptorq's 4 B SBN/ESI = 13 B ``` Source symbols are themselves concatenations of length-prefixed IP packets: ``` [u16 len_a][packet_a]…[u16 len_b][packet_b]…[zero pad to SYMBOL_SIZE] ``` So small packets (ACK floods) share symbols instead of each burning a whole symbol's worth of airtime. ## Files - `tools/precoder/pyproject.toml` — add `raptorq>=2`. - `tools/precoder/stream_fec.py` — `FecConfig`, `FecEncoder` (concatenation packing + block encoding), `FecDecoder` (block-incremental decode + late-symbol drop + block expiry). - `tools/precoder/test_stream_fec.py` — 19 unit tests: round-trip, loss tolerance 0/20/40% at R/K=1, 50% at R/K=2, unrecoverable-block bookkeeping at 70%, concatenation, partial flush, block-id wrap, MTU enforcement, garbage envelopes. - `tools/precoder/tun_p2p.py` — new `--fec-k`/`--fec-overhead`/`--fec-symbol-size`/`--fec-flush-ms`/`--fec-block-expire-ms` flags. tx_thread feeds packets through the encoder; a parallel `fec_flush_thread` force-encodes partial blocks every flush-ms (sparse traffic doesn't stall). rx_thread feeds payloads through the decoder; decoded IP packets go to TUN. Outer `SeqWindow` dedup is forced OFF when FEC is on (RaptorQ symbols self-dedup via SBN+ESI). New `fec=[...]` segment in the periodic stderr report. Docstring extended. ## Hardware verification Two-netns single-host bench (RTL8812AU `0x8812` + TP-Link Archer T2U Plus / RTL8821AU `2357:0120`, ch 6, no `--repeat`, `ping -c 30 -i 1`): | Config | RTT min/avg/max | Loss | DUP | Blocks ok/lost | |---|---|---:|---:|---:| | `--fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50` | 121 / **160** / 207 ms | 0% | 0 | 30 / 1 (startup) | | `--fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20` | 73 / **95** / 145 ms | 0% | 0 | 30 / 1 (startup) | The K=8 config trades a bit of recovery margin for a 65 ms drop in median RTT. Both decode 100% of source packets on a healthy link; the survey's noisier regimes are what motivates `--fec-overhead > 1`. For comparison from PR #82's earlier numbers (same bench, byte mode): | Mode | Loss | Avg RTT | |---|---:|---:| | Byte mode `--repeat 1` | 10% | 7 ms | | Byte mode `--repeat 4` + dedup | 0% | 10 ms (with up to 25 DUPs per ping eaten by dedup) | | **FEC K=8 R/K=1 flush=20** | **0%** | **95 ms** | FEC moves us from "blind redundancy + dedup" to "real erasure code". The latency cost is the K-source-symbol encode buffer; the win is that the codec scales gracefully to higher loss rates by raising `--fec-overhead` instead of running out at `--repeat=∞`. ## Test plan - [x] `cd tools/precoder && uv run pytest` → 87 passed (31 pipeline + 37 stream + 19 fec) - [x] `python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py` → 8 passed - [x] tun_p2p.py --help parses cleanly (incl. all FEC flags) - [x] Bench: K=16/R=1 and K=8/R=1, both 30/30 ping with 0% loss and 0 DUPs ## Open caveats (documented in script) - Strict block boundaries — no cross-block FEC, no Raptor carousel. Good enough at K=8–16 + 20–50 ms flush; revisit if the latency budget tightens further. - No rateless dynamic overhead — R/K is fixed at construction. A future PR could let RX hint TX to send more repair symbols via a reverse-channel feedback envelope. - Patent note: RFC 6330 has Qualcomm patents largely expired in primary jurisdictions by 2026; cberner's MIT lib explicitly notes this. Builds on #82 (TUN bridge, merged), #83 (corrupted-frame surfacing, merged), #84 (phy soft metrics, open), #85 (corruption survey, open). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

josephnef and others added 2 commits June 7, 2026 16:31

josephnef mentioned this pull request Jun 7, 2026

RaptorQ (RFC 6330) FEC layer for the stream link #86

Merged

4 tasks

josephnef merged commit 1f5c843 into master Jun 7, 2026
5 checks passed

josephnef deleted the corruption-survey branch June 7, 2026 14:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Corruption-pattern survey tool for FEC design#85

Corruption-pattern survey tool for FEC design#85
josephnef merged 2 commits into
masterfrom
corruption-survey

josephnef commented Jun 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

josephnef commented Jun 7, 2026

Summary

Changes

Bench finding

Reading the result

Follow-ups (for whoever picks up the FEC layer)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant