Surface CRC/ICV-corrupted RX frames + analysis tool#83
Merged
Previously, devourer's FrameParser dropped every RX frame whose chip
flagged CRC or ICV error (`if (crc_err || icv_err) break;`), AND broke
out of the loop entirely — so a single corrupted frame in a USB
aggregate threw away every subsequent frame in the same aggregate too.
The application saw a clean-or-missing-only "packet-erasure" channel,
with no way to know what the corruption looked like.
This PR:
* `src/FrameParser.cpp`: reorder so the pkt_len sanity check (needed to
find the next aggregate boundary) runs first; on crc/icv error we now
log + surface the packet with the flag bits intact on `RxAtrib`
instead of breaking. Consumers can still filter (existing behaviour
if they ignore the flags) or analyse the corruption pattern.
* `demo/main.cpp`: `<devourer-stream>` lines now include
`crc_err=0/1 icv_err=0/1`. Keeping corrupted frames is opt-in via
`DEVOURER_RX_KEEP_CORRUPTED=1` so a stream-mode consumer
(stream_rx.py / tun_p2p.py) doesn't accidentally feed garbage into
the IP stack: the byte-stream pipeline still drops corrupted frames
by default, and the analysis tool opts in explicitly.
* `tools/precoder/corruption_analysis.py`: new tool that reconstructs
the expected TX-side bodies from a source file and compares them
byte-by-byte and bit-by-bit against captured RX frames (clean OR
chip-corrupt). Reports chip-clean vs chip-corrupt counts, total bit
errors / BER, per-frame error stats, and a byte-position histogram —
useful for spotting whether corruption is uniform across the body,
clustered near the SERVICE-field offset, or concentrated in the
trailing OFDM symbols where the 802.11 FCS lives.
* Python regex helpers (`stream_rx.py`, `tun_p2p.py`, the harness)
accept the new optional `crc_err=` / `icv_err=` fields without
breaking on existing logs.
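The behavioural change in the aggregate loop can be sketched with a small Python model. This is only an illustration of the control flow described above, not the C++ in `src/FrameParser.cpp`: `Frame`, `parse_old`, `parse_new`, and the `MAX_PKT_LEN` bound are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

MAX_PKT_LEN = 4096  # assumed sanity bound, not taken from the driver


@dataclass
class Frame:
    pkt_len: int
    crc_err: bool = False
    icv_err: bool = False


def parse_old(aggregate):
    """Old behaviour: the first flagged frame ends the whole aggregate."""
    out = []
    for frame in aggregate:
        if frame.crc_err or frame.icv_err:
            break  # drops this frame AND everything after it
        if not 0 < frame.pkt_len <= MAX_PKT_LEN:
            break
        out.append(frame)
    return out


def parse_new(aggregate):
    """New behaviour: length check first, flagged frames are surfaced."""
    out = []
    for frame in aggregate:
        if not 0 < frame.pkt_len <= MAX_PKT_LEN:
            break  # cannot locate the next aggregate boundary
        out.append(frame)  # crc_err / icv_err stay intact for the consumer
    return out


aggregate = [Frame(100), Frame(200, crc_err=True), Frame(300)]
old_frames = parse_old(aggregate)
new_frames = parse_new(aggregate)
```

In this model the old loop returns one frame and the new loop returns all three, with the middle one still carrying `crc_err=True` so a consumer can filter or analyse it.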
Verification
* Offline synthetic smoke: inject 5 corrupted + 5 clean bodies into a
fake `<devourer-stream>` log, run corruption_analysis.py against the
known source. Reports 5/5 chip-clean, 5/5 chip-corrupt, 10 byte
errors at positions {10, 15} matching the injected XOR pattern,
BER 5.2e-3, all matched seqs recovered correctly.
* On-air run (channel 6, RTL8812AU TX → T2U Plus / RTL8821AU RX,
500 frames at `--repeat 1`): 461 frames captured, 0 chip-corrupt.
The bench link is too clean to produce real FCS failures (the ~8%
loss is sync-level — frames never reached the decoder, not
corrupted-but-recoverable). The fix is for noisier real-world
deployments where chip-corrupt frames will now surface for analysis
or, eventually, FEC-style recovery.
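The byte-by-byte and bit-by-bit comparison behind the smoke test can be sketched as follows. This is not the code in `corruption_analysis.py`: `compare_bodies` and `summarise` are hypothetical helpers, and the 24-byte body and single-bit XOR mask are assumptions chosen so the synthetic totals land in the same range as the smoke run above.

```python
from collections import Counter


def compare_bodies(expected: bytes, received: bytes):
    """Return (bit_errors, differing byte positions) over the common length."""
    n = min(len(expected), len(received))
    bit_errors = 0
    bad_positions = []
    for i in range(n):
        diff = expected[i] ^ received[i]
        if diff:
            bit_errors += bin(diff).count("1")  # popcount of the XOR
            bad_positions.append(i)
    return bit_errors, bad_positions


def summarise(pairs):
    """Aggregate bit errors, BER, and a byte-position histogram."""
    total_bits = 0
    total_bit_errors = 0
    histogram = Counter()
    for expected, received in pairs:
        bit_errors, positions = compare_bodies(expected, received)
        total_bits += 8 * min(len(expected), len(received))
        total_bit_errors += bit_errors
        histogram.update(positions)
    ber = total_bit_errors / total_bits if total_bits else 0.0
    return total_bit_errors, ber, dict(histogram)


# Synthetic mix: 5 clean bodies plus 5 with one bit flipped at bytes 10 and 15.
body = bytes(range(24))  # assumed 24-byte body
corrupt = bytearray(body)
corrupt[10] ^= 0x01
corrupt[15] ^= 0x01
pairs = [(body, body)] * 5 + [(body, bytes(corrupt))] * 5
errors, ber, hist = summarise(pairs)
```

A histogram concentrated on a few positions, as here, points at a structured error source; a flat histogram would suggest uniform channel noise.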
Follow-ups (not in this PR)
* Surface phy-level soft metrics (per-stream EVM/SNR) alongside the
flag so the analyser can correlate corruption with link quality.
* Real-world capture at extended range to characterise actual error
distributions and feed a FEC layer on top of the stream framing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…knob
The parser change alone was insufficient: the chip's WMAC drops
CRC32-error and ICV-error frames BEFORE they hit the RX descriptor
when RCR_ACRC32 / RCR_AICV are off. RadioManagementModule's
monitor-mode RCR config left both bits clear, so the FrameParser's
new "surface corrupted frame" path never fired — corrupted frames
never made it past the chip filter to the host in the first place.
Now `hw_var_set_monitor` reads `DEVOURER_RX_KEEP_CORRUPTED` (same
env var as the demo's filter) and adds `RCR_ACRC32 | RCR_AICV` when
set. Default behaviour is unchanged.
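The gating logic amounts to OR-ing two bits into the monitor-mode RCR when the env var is set. A minimal Python model, assuming the var is compared against `"1"`: the bit positions below are placeholders for illustration (the real values live in the driver's register headers), and `monitor_rcr` is a hypothetical helper, not the signature of `hw_var_set_monitor`.

```python
import os

# Placeholder bit positions; consult the driver headers for the real ones.
RCR_ACRC32 = 1 << 8
RCR_AICV = 1 << 9


def monitor_rcr(base_rcr: int, env=os.environ) -> int:
    """Return the RCR value to program for monitor mode."""
    if env.get("DEVOURER_RX_KEEP_CORRUPTED") == "1":
        # Ask the WMAC to pass CRC32-error and ICV-error frames to the host.
        return base_rcr | RCR_ACRC32 | RCR_AICV
    return base_rcr  # default: chip filter keeps dropping corrupted frames


default_rcr = monitor_rcr(0x0F, env={})
keep_rcr = monitor_rcr(0x0F, env={"DEVOURER_RX_KEEP_CORRUPTED": "1"})
```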
Also adds `DEVOURER_TX_POWER` to StreamTxDemo (default 40 unchanged)
for stress-testing the receive-error path at attenuated SNR.
Verified on the bench: with `DEVOURER_RX_KEEP_CORRUPTED=1` (both RCR bits set), a 25-second
capture surfaced **746 real `crc_err=1` events** from background ch6
traffic, e.g.:
<devourer>RX corrupted frame surfaced: crc_err=1 icv_err=0 pkt_len=364
<devourer>RX corrupted frame surfaced: crc_err=1 icv_err=0 pkt_len=488
<devourer>RX corrupted frame surfaced: crc_err=1 icv_err=0 pkt_len=547
These are 802.11 data frames in the wild whose FCS failed; the
unmodified parser would have dropped every one of them, plus likely
many more in the same USB aggregate after each `break`. The
canonical-SA stream itself stayed clean (8812 → T2U Plus bench link is
too short-range to produce real FCS failures on a 6M-OFDM payload), so
the analyser-vs-known-source path still relies on the synthetic smoke
for end-to-end validation; the chip-level fix is what makes the parser
path actually fire in real-world deployments.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Jun 7, 2026
josephnef added a commit that referenced this pull request Jun 7, 2026
## Summary

Follow-up A from #83. Adds per-path RSSI / EVM / SNR to every `<devourer-stream>` line so `corruption_analysis.py` can correlate BER with link quality on a per-frame basis instead of aggregated-only statistics.

## Changes

- **`demo/main.cpp`**: `<devourer-stream>rate=R len=L crc_err=X icv_err=Y rssi=A,B evm=A,B snr=A,B body=HEX`. Same source as the Tier-2 diagnostics in `<devourer-body>`; no new RX-status fields, just surfacing what `FrameParser` already populates on `RxAtrib`.
- **`tools/precoder/corruption_analysis.py`**: parses the new fields, reports two new sections:
  - SNR distribution (min/p25/med/p75/max) for chip-clean vs chip-corrupt populations
  - BER per 5-dB SNR bucket

  Uses `max(snr_A, snr_B)` as the "effective" SNR. On single-antenna 1T1R sticks path B reads 0 (no signal, not "0 dB"), so a naive `min` would collapse the bucket view; `max` picks the active path on 1T1R and the stronger path on 2T2R single-stream operation.
- **`stream_rx.py` / `tun_p2p.py` / `precoder_stream_roundtrip.py`**: regex updated to tolerate the new optional `rssi=`/`evm=`/`snr=` fields. None of them use the metrics yet (pass-through compatibility).

## Hardware verification

500 frames at default TX power, RTL8812AU → T2U Plus RTL8821AU, ch 6:

```
phy SNR (stronger path, dB):
  chip-clean   : n=467 min=0 p25=30 med=33 p75=38 max=51
  chip-corrupt : n=0

BER by SNR bucket (stronger path, 5-dB buckets):
  bucket     frames  bits-cmp  bit-err  BER
  0-5 dB          1       192        0  0.000e+00
  20-25 dB       11      2112        0  0.000e+00
  25-30 dB       76     14592        0  0.000e+00
  30-35 dB      178     34176        0  0.000e+00
  35-40 dB      122     23424        0  0.000e+00
  40-45 dB       55     10560        0  0.000e+00
  45-50 dB       19      3648        0  0.000e+00
  50-55 dB        5       960        0  0.000e+00
```

Bench link is too clean for chip-corrupt events even at the SNR tails. This is the same finding as the post-PR-investigation in #83 (loss is at PHY sync, not FCS). The analyser is ready for noisier deployments / range-extended captures (follow-up B).

## Offline analyser smoke

Synthetic 5-clean@28dB + 5-corrupt@5dB injection. Analyser correctly buckets:

```
BER by SNR bucket (stronger path, 5-dB buckets):
  bucket     frames  bits-cmp  bit-err  BER
  5-10 dB         5       960       10  1.042e-02
  25-30 dB        5       960        0  0.000e+00
```

The per-bucket correlation works as designed: corrupted samples land in the 5-10 dB bucket at 1.04×10⁻² BER, clean samples land at high SNR with BER 0.

Builds on #83 (merged). Next: follow-up B, which characterises real-world background corruption patterns (burst-length distribution, byte-position distribution) to inform stream-layer FEC design.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
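Parsing the extended `<devourer-stream>` line and bucketing by effective SNR could look roughly like the sketch below. It is not the regex shipped in the tools: `STREAM_RE`, `parse_stream_line`, and `snr_bucket` are hypothetical names, and the pre-#83 line shape (`rate=… len=… body=…` with no flag fields) is an assumption.

```python
import re

# Flag and phy fields are optional so older logs still match.
STREAM_RE = re.compile(
    r"<devourer-stream>rate=(?P<rate>\S+) len=(?P<len>\d+)"
    r"(?: crc_err=(?P<crc_err>[01]) icv_err=(?P<icv_err>[01]))?"
    r"(?: rssi=(?P<rssi>\S+) evm=(?P<evm>\S+)"
    r" snr=(?P<snr_a>-?\d+),(?P<snr_b>-?\d+))?"
    r" body=(?P<body>[0-9a-fA-F]*)"
)


def parse_stream_line(line):
    """Return a dict for one stream line, or None if it does not match."""
    m = STREAM_RE.search(line)
    if m is None:
        return None
    snr = None
    if m.group("snr_a") is not None:
        snr = (int(m.group("snr_a")), int(m.group("snr_b")))
    return {
        "len": int(m.group("len")),
        "crc_err": m.group("crc_err") == "1",  # absent field reads as False
        "icv_err": m.group("icv_err") == "1",
        "snr": snr,
        "body": bytes.fromhex(m.group("body")),
    }


def snr_bucket(snr, width=5):
    """Bucket label using max(path A, path B) as the effective SNR."""
    effective = max(snr)  # path B reads 0 on 1T1R sticks, so min would collapse
    low = (effective // width) * width
    return f"{low}-{low + width} dB"


new_line = ("<devourer-stream>rate=4 len=2 crc_err=1 icv_err=0 "
            "rssi=-60,0 evm=-20,0 snr=7,0 body=abcd")
old_line = "<devourer-stream>rate=4 len=2 body=abcd"
parsed_new = parse_stream_line(new_line)
parsed_old = parse_stream_line(old_line)
```

With the synthetic values from the smoke above, an SNR pair of `(28, 0)` falls in the `25-30 dB` bucket and `(5, 0)` in `5-10 dB`.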
josephnef added a commit that referenced this pull request Jun 7, 2026
## Summary

Follow-up B from #83 (depends on #84's phy soft metrics): a chip-side `DEVOURER_RX_DUMP_ALL=1` env var that emits one line per RX frame with the chip's full integrity + phy soft-metric vector, plus an aggregate analyser that turns those into FEC-design-grade statistics.

The previous work showed that the **chip-corrupt** pipeline now reaches the application layer (#83) and that per-frame phy metrics let the analyser correlate BER with SNR (#84). This PR is the third leg: a **long-capture survey** tool that characterises the actual corruption-pattern distribution real-world deployments face, so a FEC layer on top of the stream link can be sized empirically rather than guessed.

## Changes

- **`demo/main.cpp`**: new `DEVOURER_RX_DUMP_ALL=1` knob emits `<devourer-corrupt-any>len=L crc_err=X icv_err=Y rate=R rssi=A,B evm=A,B snr=A,B`. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); the aggregate report only needs length + flags + phy.
- **`tools/precoder/corruption_survey.py`**: new tool that reads those lines and reports:
  - headline chip-clean vs chip-corrupt counts
  - **corruption rate broken down by DESC_RATE** (the CCK-vs-OFDM split; without this the headline is dominated by always-clean CCK ACKs/beacons and underestimates what OFDM data faces)
  - frame-size distribution for each population
  - phy-metric stats per population, filtered to frames where the chip populated phy stats (CCK reports 0/0; we treat as "no measurement" instead of "0 dB" so the buckets don't collapse)
  - per-SNR-bucket corruption rate (where measurable)
  - temporal clustering (live captures only)
  - a heuristic FEC recommendation based on median-vs-peak corruption rate

## Bench finding

60-second ch6 capture in a busy office environment with several APs in range:

```
=== corruption survey (2266 frames, file/pipe) ===
chip-clean        : 1663 ( 73.4%)
chip-corrupt      : 603 ( 26.6%)
corruption rate   : 26.61%
no-phy-measurement: 2103 (CCK/short frames, chip reports 0/0)

Corruption rate by DESC_RATE:
  idx   name       count  %      corrupt  rate
  0x00  1M CCK     2075   91.6%  412      19.9%
  0x02  5.5M CCK   2      0.1%   2        100.0%
  0x03  11M CCK    1      0.0%   1        100.0%
  0x04  6M OFDM    17     0.8%   17       100.0%
  0x05  9M OFDM    19     0.8%   19       100.0%
  0x06  12M OFDM   20     0.9%   20       100.0%
  0x07  18M OFDM   31     1.4%   31       100.0%
  0x08  24M OFDM   22     1.0%   22       100.0%
  0x09  36M OFDM   30     1.3%   30       100.0%
  0x0a  48M OFDM   31     1.4%   31       100.0%
  0x0b  54M OFDM   18     0.8%   18       100.0%
```

## Reading the result

- **1M CCK loses ~20%** even at this location. CCK is robust but background interference still nukes one in five ACKs/beacons.
- **Every OFDM rate above CCK is 100% corrupt** because we're hearing distant APs at marginal SNR: the chip detects them, decodes them, fails the FCS, and now (with #83's RCR change) surfaces them.

The FEC-design takeaway:

- The PoC's 6M OFDM stream link only works because TX and RX are co-located. At any real range the chip will surface FCS failures at high rate.
- The stream layer needs **inter-frame parity** (Reed-Solomon over N frames + K parity, Raptor, etc.) to recover from blocks of lost frames, not just per-frame FEC.
- For a P2P link's typical "moderate range" use case (e.g. OpenIPC long-range video), expect frame loss rates in the 30–70% range. FEC overhead has to be sized accordingly: at 50% loss you need K/N ≈ 0.5 to be reliable.

## Follow-ups (for whoever picks up the FEC layer)

- Pick a parity scheme (Reed-Solomon is simplest, Raptor scales better) and parametrise N, K against captures from realistic ranges.
- Decide where parity rides: in-band on the same SA (current TX path) vs. on a dedicated SA / frame type. In-band keeps the link simple but eats stream airtime.
- Consider degrading rate gracefully (rateless codes) so the receiver can decode at whatever fraction of N+K frames it actually receives.

Builds on #83 (chip-level filter open, merged) and #84 (phy soft metrics, open).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
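The per-rate breakdown in the survey is essentially a grouped count over `<devourer-corrupt-any>` lines. A rough sketch, not the code in `corruption_survey.py`: `ANY_RE` and `corruption_by_rate` are hypothetical names, and the rate token is treated as an opaque string because its exact formatting is not shown in the log spec above.

```python
import re
from collections import defaultdict

# Only the leading fields are needed for the per-rate split.
ANY_RE = re.compile(
    r"<devourer-corrupt-any>len=(\d+) crc_err=([01]) icv_err=([01]) rate=(\S+)"
)


def corruption_by_rate(lines):
    """Map rate token -> (frames, corrupt frames, corruption rate)."""
    counts = defaultdict(lambda: [0, 0])  # rate -> [frames, corrupt]
    for line in lines:
        m = ANY_RE.search(line)
        if m is None:
            continue  # skip unrelated log output
        _length, crc_err, icv_err, rate = m.groups()
        counts[rate][0] += 1
        if crc_err == "1" or icv_err == "1":
            counts[rate][1] += 1
    return {rate: (n, bad, bad / n) for rate, (n, bad) in counts.items()}


sample = [
    "<devourer-corrupt-any>len=14 crc_err=0 icv_err=0 rate=0x00 rssi=0,0 evm=0,0 snr=0,0",
    "<devourer-corrupt-any>len=14 crc_err=0 icv_err=0 rate=0x00 rssi=0,0 evm=0,0 snr=0,0",
    "<devourer-corrupt-any>len=14 crc_err=0 icv_err=0 rate=0x00 rssi=0,0 evm=0,0 snr=0,0",
    "<devourer-corrupt-any>len=90 crc_err=1 icv_err=0 rate=0x00 rssi=0,0 evm=0,0 snr=0,0",
    "<devourer-corrupt-any>len=364 crc_err=1 icv_err=0 rate=0x04 rssi=-80,0 evm=-5,0 snr=3,0",
    "<devourer-corrupt-any>len=488 crc_err=1 icv_err=0 rate=0x04 rssi=-82,0 evm=-4,0 snr=2,0",
    "some other log line",
]
by_rate = corruption_by_rate(sample)
```

Splitting by rate before computing a headline number is what keeps a large population of mostly clean 1M CCK frames from masking the OFDM figures.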
josephnef added a commit that referenced this pull request Jun 7, 2026
## Summary

The corruption survey in #85 showed real-range OFDM frames on this link will see **30–70% loss**. tun_p2p.py's blind `--repeat N` is a fixed-cost workaround that can't compose to handle the tail; this PR ships a real erasure code on top of the existing stream framing.

## Library

`raptorq` from cberner (Rust+PyO3 binding to the RFC 6330 reference port). MIT, manylinux abi3 wheels on PyPI, ~26 Gbps enc / ~7 Gbps dec at K=1000 on commodity x86. `uv add raptorq` is the only install step.

## Wire format

The existing `stream.py` framing stays untouched. FEC is an **inner envelope** living inside `StreamFrame.payload`:

```
FEC_MAGIC     (2)   = 0xF52E
VERSION/FLAGS (1)   = 0
K             (1)   = source symbols per block
KREAL         (1)   = real source symbols in this block (≤ K).
                      Trailing (K - KREAL) decoded symbols are
                      zero-pad to discard.
SYMBOL_SIZE   (2)   = LE u16
BLOCK_ID      (2)   = LE u16 wraps
RAPTORQ_PKT   (var) = lib-managed SBN+ESI+symbol

inner overhead = 9 B + raptorq's 4 B SBN/ESI = 13 B
```

Source symbols are themselves concatenations of length-prefixed IP packets:

```
[u16 len_a][packet_a]…[u16 len_b][packet_b]…[zero pad to SYMBOL_SIZE]
```

So small packets (ACK floods) share symbols instead of each burning a whole symbol's worth of airtime.

## Files

- `tools/precoder/pyproject.toml`: add `raptorq>=2`.
- `tools/precoder/stream_fec.py`: `FecConfig`, `FecEncoder` (concatenation packing + block encoding), `FecDecoder` (block-incremental decode + late-symbol drop + block expiry).
- `tools/precoder/test_stream_fec.py`: 19 unit tests: round-trip, loss tolerance 0/20/40% at R/K=1, 50% at R/K=2, unrecoverable-block bookkeeping at 70%, concatenation, partial flush, block-id wrap, MTU enforcement, garbage envelopes.
- `tools/precoder/tun_p2p.py`: new `--fec-k`/`--fec-overhead`/`--fec-symbol-size`/`--fec-flush-ms`/`--fec-block-expire-ms` flags. tx_thread feeds packets through the encoder; a parallel `fec_flush_thread` force-encodes partial blocks every flush-ms (sparse traffic doesn't stall). rx_thread feeds payloads through the decoder; decoded IP packets go to TUN. Outer `SeqWindow` dedup is forced OFF when FEC is on (RaptorQ symbols self-dedup via SBN+ESI). New `fec=[...]` segment in the periodic stderr report. Docstring extended.

## Hardware verification

Two-netns single-host bench (RTL8812AU `0x8812` + TP-Link Archer T2U Plus / RTL8821AU `2357:0120`, ch 6, no `--repeat`, `ping -c 30 -i 1`):

| Config | RTT min/avg/max | Loss | DUP | Blocks ok/lost |
|---|---|---:|---:|---:|
| `--fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50` | 121 / **160** / 207 ms | 0% | 0 | 30 / 1 (startup) |
| `--fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20` | 73 / **95** / 145 ms | 0% | 0 | 30 / 1 (startup) |

The K=8 config trades a bit of recovery margin for a 65 ms drop in average RTT. Both decode 100% of source packets on a healthy link; the survey's noisier regimes are what motivates `--fec-overhead > 1`.

For comparison from PR #82's earlier numbers (same bench, byte mode):

| Mode | Loss | Avg RTT |
|---|---:|---:|
| Byte mode `--repeat 1` | 10% | 7 ms |
| Byte mode `--repeat 4` + dedup | 0% | 10 ms (with up to 25 DUPs per ping eaten by dedup) |
| **FEC K=8 R/K=1 flush=20** | **0%** | **95 ms** |

FEC moves us from "blind redundancy + dedup" to "real erasure code". The latency cost is the K-source-symbol encode buffer; the win is that the codec scales gracefully to higher loss rates by raising `--fec-overhead` instead of running out at `--repeat=∞`.

## Test plan

- [x] `cd tools/precoder && uv run pytest` → 87 passed (31 pipeline + 37 stream + 19 fec)
- [x] `python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py` → 8 passed
- [x] tun_p2p.py --help parses cleanly (incl. all FEC flags)
- [x] Bench: K=16/R=1 and K=8/R=1, both 30/30 ping with 0% loss and 0 DUPs

## Open caveats (documented in script)

- Strict block boundaries: no cross-block FEC, no Raptor carousel. Good enough at K=8–16 + 20–50 ms flush; revisit if the latency budget tightens further.
- No rateless dynamic overhead: R/K is fixed at construction. A future PR could let RX hint TX to send more repair symbols via a reverse-channel feedback envelope.
- Patent note: RFC 6330 has Qualcomm patents largely expired in primary jurisdictions by 2026; cberner's MIT lib explicitly notes this.

Builds on #82 (TUN bridge, merged), #83 (corrupted-frame surfacing, merged), #84 (phy soft metrics, open), #85 (corruption survey, open).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
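The 9-byte inner envelope and the length-prefixed symbol packing from the wire-format section can be sketched with `struct`. This is an illustration under stated assumptions, not the code in `stream_fec.py`: the helper names are hypothetical, FEC_MAGIC and the `u16 len` prefixes are assumed little-endian like the other u16 fields, a zero length prefix is assumed to mark the start of padding, and packets are packed within a single symbol for simplicity. The RaptorQ packet itself is treated as opaque bytes.

```python
import struct

FEC_MAGIC = 0xF52E
# magic, version/flags, K, KREAL, symbol_size, block_id -> 9 bytes
HEADER = struct.Struct("<HBBBHH")


def pack_envelope(k, kreal, symbol_size, block_id, raptorq_pkt: bytes) -> bytes:
    if not 0 < kreal <= k <= 255:
        raise ValueError("need 0 < KREAL <= K <= 255")
    # block_id wraps at 16 bits
    header = HEADER.pack(FEC_MAGIC, 0, k, kreal, symbol_size, block_id & 0xFFFF)
    return header + raptorq_pkt


def unpack_envelope(payload: bytes):
    """Return (k, kreal, symbol_size, block_id, raptorq_pkt) or None."""
    if len(payload) < HEADER.size:
        return None
    magic, _flags, k, kreal, symbol_size, block_id = HEADER.unpack_from(payload)
    if magic != FEC_MAGIC or kreal > k:
        return None  # garbage envelope
    return k, kreal, symbol_size, block_id, payload[HEADER.size:]


def pack_symbol(packets, symbol_size: int) -> bytes:
    """Concatenate [u16 len][packet] records and zero-pad to symbol_size."""
    out = bytearray()
    for packet in packets:
        out += struct.pack("<H", len(packet)) + packet
    if len(out) > symbol_size:
        raise ValueError("packets do not fit in one symbol")
    return bytes(out) + bytes(symbol_size - len(out))


def unpack_symbol(symbol: bytes):
    packets, pos = [], 0
    while pos + 2 <= len(symbol):
        (length,) = struct.unpack_from("<H", symbol, pos)
        if length == 0:  # assumed: zero length means padding starts here
            break
        packets.append(symbol[pos + 2 : pos + 2 + length])
        pos += 2 + length
    return packets


envelope = pack_envelope(8, 3, 64, 0x10005, b"xyz")
symbol = pack_symbol([b"ab", b"cde"], 16)
```

Sharing a symbol between small packets is what keeps ACK-sized traffic from costing a full symbol of airtime each.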
Summary
Previously, devourer's RX path silently dropped every frame whose chip flagged CRC or ICV error: first at the chip's WMAC filter (`RCR_ACRC32` / `RCR_AICV` both cleared in monitor-mode setup), then at FrameParser (`if (crc_err || icv_err) break;`, which threw out the bad frame AND every subsequent frame in the same USB aggregate). The application saw a clean-or-missing erasure channel with no way to inspect or recover from corruption.

This PR opens both gates behind a single env var (`DEVOURER_RX_KEEP_CORRUPTED=1`), keeping default behaviour unchanged for IP-stack consumers, and ships an analysis tool that quantifies the corruption pattern against a known TX source.

Changes
* `src/RadioManagementModule.cpp`: `hw_var_set_monitor` adds `RCR_ACRC32 | RCR_AICV` to the monitor-mode RCR when `DEVOURER_RX_KEEP_CORRUPTED` is set. The chip's WMAC filter would otherwise drop corrupted frames before they reach the host at all; this was the silent gating bug that made the parser change a no-op on its own.
* `src/FrameParser.cpp`: pkt_len sanity check moves before the crc/icv check (still needed to find the next aggregate boundary). On `crc_err || icv_err` the parser now logs + surfaces the packet with `RxAtrib.crc_err` / `icv_err` intact and continues processing the rest of the aggregate, instead of dropping it AND its aggregate-mates.
* `demo/main.cpp`: `<devourer-stream>` lines now include `crc_err=0/1 icv_err=0/1`. Corrupted bodies are gated behind the same `DEVOURER_RX_KEEP_CORRUPTED=1` flag, in lockstep with the chip filter.
* `txdemo/stream_tx_demo/main.cpp`: `DEVOURER_TX_POWER` env var (default 40 unchanged), useful for stress-testing the receive path at attenuated SNR.
* `tools/precoder/corruption_analysis.py`: reconstructs expected TX bodies from a source file, compares byte- and bit-wise against captured RX frames (clean or chip-corrupt), reports chip-clean vs chip-corrupt counts, total bit errors / BER, per-frame error distribution, and a byte-position histogram.
* `stream_rx.py`, `tun_p2p.py`, and the roundtrip harness: accept the new optional `crc_err=` / `icv_err=` fields without breaking older logs.

Verification
On-air, real `crc_err=1` events through the new path (RTL8821AU / TP-Link Archer T2U Plus `2357:0120`, channel 6, `DEVOURER_RX_KEEP_CORRUPTED=1`, ~25 s of background-traffic capture):

746 frames whose chip-FCS check failed were surfaced through `FrameParser::recvbuf2recvframe`. The unmodified parser would have dropped every one of them, plus their USB-aggregate-mates (each `break` discards the rest of the aggregate, typically 4–8 frames). The real-world deployment value of the fix is exactly this kind of traffic: frames the chip could tell us about but the old path threw on the floor.

Where the controlled stream's missing frames went (post-review verification):
We confirmed that the canonical-SA TX→RX stream itself stays clean even with `DEVOURER_TX_POWER=1`, by enabling a debug mode that dumps the first 30 header bytes of every corrupted frame regardless of SA match:

So the 51 missing frames in `500 sent → 449 received` are lost at PHY sync, not at FCS: they never reach the chip's decoder so no descriptor is produced. The 10% loss in the earlier `tun_p2p` `--repeat 1` ping result is the same phenomenon. The bench link is too clean for FCS failures on the controlled stream; the value of this PR is for noisier real-world deployments (and for the 746 background events captured above, which prove the path works on live traffic).

Offline analyser validation (synthetic 5-clean + 5-corrupt mix injected into `<devourer-stream>` log, run through `corruption_analysis.py`):

Exact counts, exact positions: the analyser correctly identifies what was corrupted, where, and how badly.
Follow-ups (not in this PR)
Builds on #82 (TUN p2p bridge), which is on master.
🤖 Generated with Claude Code