Phy-level soft metrics on stream lines + BER-vs-SNR analyser by josephnef · Pull Request #84 · OpenIPC/devourer

josephnef · 2026-06-07T13:31:24Z

Summary

Follow-up A from #83. Adds per-path RSSI / EVM / SNR to every <devourer-stream> line so corruption_analysis.py can correlate BER with link quality on a per-frame basis instead of aggregated-only statistics.

Changes

- demo/main.cpp: <devourer-stream> lines now have the form <devourer-stream>rate=R len=L crc_err=X icv_err=Y rssi=A,B evm=A,B snr=A,B body=HEX. The values come from the same source as the Tier-2 diagnostics in <devourer-body>; no new RX-status fields are parsed, this only surfaces what FrameParser already populates on RxAtrib.
- tools/precoder/corruption_analysis.py: parses the new fields and reports two new sections:
  - SNR distribution (min/p25/med/p75/max) for the chip-clean and chip-corrupt populations
  - BER per 5-dB SNR bucket

  The "effective" SNR is max(snr_A, snr_B). On single-antenna 1T1R sticks path B reads 0 (meaning no signal, not "0 dB"), so a naive min would always report 0 and collapse every frame into the lowest bucket; max picks the active path on 1T1R and the stronger path on 2T2R single-stream operation.
- stream_rx.py / tun_p2p.py / precoder_stream_roundtrip.py: regexes updated to tolerate the new optional rssi=/evm=/snr= fields. None of these scripts use the metrics yet (pass-through compatibility).
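A minimal sketch of how a consumer might parse the extended line and derive the effective SNR. The regex, the `StreamLine` dataclass and the helper names are hypothetical (the PR does not show the project's actual pattern). Unknown key=value fields before `body=` are skipped so the parser keeps working when more fields are added, and lines without rssi/evm/snr still parse.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Pair = Tuple[int, int]  # (path A, path B)

# Hypothetical pattern: mandatory fields, then optional rssi/evm/snr pairs,
# then any other key=value fields (skipped), then the hex body.
STREAM_RE = re.compile(
    r"<devourer-stream>rate=(?P<rate>\S+) len=(?P<len>\d+)"
    r" crc_err=(?P<crc>\d+) icv_err=(?P<icv>\d+)"
    r"(?: rssi=(?P<rssi>-?\d+,-?\d+))?"
    r"(?: evm=(?P<evm>-?\d+,-?\d+))?"
    r"(?: snr=(?P<snr>-?\d+,-?\d+))?"
    r"(?: \S+=\S+)*?"
    r" body=(?P<body>[0-9a-fA-F]*)"
)

@dataclass
class StreamLine:
    rate: str
    length: int
    crc_err: bool
    icv_err: bool
    rssi: Optional[Pair]  # None on lines from older builds
    evm: Optional[Pair]
    snr: Optional[Pair]
    body: bytes

def _pair(text: Optional[str]) -> Optional[Pair]:
    if text is None:
        return None
    a, b = text.split(",")
    return int(a), int(b)

def parse_stream_line(line: str) -> Optional[StreamLine]:
    m = STREAM_RE.search(line)
    if m is None:
        return None
    return StreamLine(
        rate=m["rate"], length=int(m["len"]),
        crc_err=m["crc"] != "0", icv_err=m["icv"] != "0",
        rssi=_pair(m["rssi"]), evm=_pair(m["evm"]), snr=_pair(m["snr"]),
        body=bytes.fromhex(m["body"]),
    )

def effective_snr(snr: Pair) -> int:
    # Path B reads 0 on 1T1R sticks (no signal, not "0 dB"): min() would
    # always return 0, max() returns the active or stronger path.
    return max(snr)
```

Making the optional groups greedy means a present rssi/evm/snr field is always captured by its named group rather than swallowed by the skip group.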

Hardware verification

500 frames at default TX power, RTL8812AU → T2U Plus RTL8821AU, ch 6:

phy SNR (stronger path, dB):
  chip-clean    : n=467 min=0 p25=30 med=33 p75=38 max=51
  chip-corrupt  : n=0
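The min/p25/med/p75/max summary per population can be sketched as below. The analyser's quartile method is not stated, so `method="inclusive"` in `statistics.quantiles` is an assumption, as are the names `five_number` and `summarize`.

```python
import statistics
from typing import Dict, Optional, Sequence, Tuple

FiveNum = Tuple[float, float, float, float, float]

def five_number(values: Sequence[float]) -> FiveNum:
    """min / p25 / median / p75 / max of one population."""
    data = sorted(values)
    if not data:
        raise ValueError("empty population")
    if len(data) == 1:  # quantiles() needs at least two points on older Pythons
        v = data[0]
        return (v, v, v, v, v)
    p25, med, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], p25, med, p75, data[-1])

def summarize(frames: Sequence[Tuple[float, bool]]) -> Dict[str, Tuple[int, Optional[FiveNum]]]:
    """frames: (effective_snr_db, chip_corrupt) pairs -> {population: (n, summary or None)}."""
    pops = {
        "chip-clean": [snr for snr, bad in frames if not bad],
        "chip-corrupt": [snr for snr, bad in frames if bad],
    }
    return {name: (len(v), five_number(v) if v else None) for name, v in pops.items()}
```

When no frame is chip-corrupt, as in the bench run, that population reports n=0 and no summary.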

BER by SNR bucket (stronger path, 5-dB buckets):
  bucket       frames   bits-cmp   bit-err    BER
       0-5 dB        1        192        0   0.000e+00
     20-25 dB       11       2112        0   0.000e+00
     25-30 dB       76      14592        0   0.000e+00
     30-35 dB      178      34176        0   0.000e+00
     35-40 dB      122      23424        0   0.000e+00
     40-45 dB       55      10560        0   0.000e+00
     45-50 dB       19       3648        0   0.000e+00
     50-55 dB        5        960        0   0.000e+00

The bench link is too clean to produce chip-corrupt events even at the SNR tails: all 467 frames that reached the analyser were chip-clean. This matches the post-PR investigation in #83, which found that at bench distance frames are lost at PHY sync, not at FCS. The analyser is ready for noisier deployments and range-extended captures (follow-up B).

Offline analyser smoke

Synthetic injection of 5 clean frames at 28 dB and 5 corrupt frames at 5 dB. The analyser buckets them correctly:

BER by SNR bucket (stronger path, 5-dB buckets):
  bucket       frames   bits-cmp   bit-err    BER
      5-10 dB        5        960       10   1.042e-02
     25-30 dB        5        960        0   0.000e+00

The per-bucket correlation works as designed: the corrupted samples land in the 5-10 dB bucket at a BER of 1.04×10⁻² (10 bit errors in 960 compared bits), and the clean samples land in the 25-30 dB bucket with a BER of 0.
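The per-bucket BER computation can be sketched as follows, assuming bit errors are counted by XOR-ing the expected and received bodies over their common length. The helper names are hypothetical, and the 24-byte body is chosen only because 960 bits over 5 frames implies 192 compared bits per frame in the smoke output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, bytes, bytes]  # (effective SNR dB, expected body, received body)

def bit_errors(expected: bytes, received: bytes) -> int:
    """Differing bits over the common prefix of the two bodies."""
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, received))

def ber_by_bucket(samples: Iterable[Sample], width: int = 5) -> Dict[int, Tuple[int, int, int, float]]:
    """Map bucket lower edge (dB) -> (frames, bits compared, bit errors, BER)."""
    acc: Dict[int, List[int]] = defaultdict(lambda: [0, 0, 0])
    for snr, expected, received in samples:
        row = acc[int(snr // width) * width]
        row[0] += 1
        row[1] += 8 * min(len(expected), len(received))
        row[2] += bit_errors(expected, received)
    return {lo: (f, b, e, e / b if b else 0.0) for lo, (f, b, e) in sorted(acc.items())}

def smoke_samples() -> List[Sample]:
    """Rebuild the synthetic injection: 5 clean @ 28 dB, 5 corrupt @ 5 dB."""
    ref = bytes(range(24))  # 24 bytes = 192 compared bits per frame
    bad = bytearray(ref)
    bad[0] ^= 0x01          # two flipped bits per corrupt frame
    bad[10] ^= 0x80
    return [(28.0, ref, ref)] * 5 + [(5.0, ref, bytes(bad))] * 5

if __name__ == "__main__":
    for lo, (frames, bits, errs, ber) in ber_by_bucket(smoke_samples()).items():
        print(f"{lo:>5}-{lo + 5} dB {frames:>8} {bits:>10} {errs:>9}   {ber:.3e}")
```

Run as a script, it reports 10 bit errors over 960 bits (BER 1.042e-02) for the 5-10 dB bucket and zero errors for 25-30 dB, the same totals as the smoke run.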

Builds on #83 (merged). Next: follow-up B — characterise real-world background corruption patterns (burst-length distribution, byte-position distribution) to inform stream-layer FEC design.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

## Summary

Follow-up B from #83 (depends on #84's phy soft metrics): a chip-side `DEVOURER_RX_DUMP_ALL=1` env var that emits one line per RX frame with the chip's full integrity + phy soft-metric vector, plus an aggregate analyser that turns those lines into FEC-design-grade statistics.

The previous work showed that the **chip-corrupt** pipeline now reaches the application layer (#83) and that per-frame phy metrics let the analyser correlate BER with SNR (#84). This PR is the third leg: a **long-capture survey** tool that characterises the corruption-pattern distribution real-world deployments actually face, so that a FEC layer on top of the stream link can be sized empirically rather than guessed.

## Changes

- **`demo/main.cpp`**: a new `DEVOURER_RX_DUMP_ALL=1` knob emits `<devourer-corrupt-any>len=L crc_err=X icv_err=Y rate=R rssi=A,B evm=A,B snr=A,B`. Body bytes are deliberately omitted (a hot survey would inflate the log past usable size); the aggregate report only needs length + flags + phy.
- **`tools/precoder/corruption_survey.py`**: a new tool that reads those lines and reports:
  - headline chip-clean vs chip-corrupt counts
  - **corruption rate broken down by DESC_RATE** (the CCK-vs-OFDM split; without it the headline is dominated by largely clean CCK ACKs/beacons and underestimates what OFDM data faces)
  - frame-size distribution for each population
  - phy-metric stats per population, filtered to frames where the chip populated phy stats (CCK reports 0/0; we treat that as "no measurement" instead of "0 dB" so the buckets don't collapse)
  - per-SNR-bucket corruption rate (where measurable)
  - temporal clustering (live captures only)
  - a heuristic FEC recommendation based on median-vs-peak corruption rate

## Bench finding

60-second ch 6 capture in a busy office environment with several APs in range:

```
=== corruption survey (2266 frames, file/pipe) ===
  chip-clean        : 1663 ( 73.4%)
  chip-corrupt      :  603 ( 26.6%)
  corruption rate   : 26.61%
  no-phy-measurement: 2103 (CCK/short frames, chip reports 0/0)

Corruption rate by DESC_RATE:
  idx   name        count      %   corrupt    rate
  0x00  1M CCK       2075   91.6%      412   19.9%
  0x02  5.5M CCK        2    0.1%        2  100.0%
  0x03  11M CCK         1    0.0%        1  100.0%
  0x04  6M OFDM        17    0.8%       17  100.0%
  0x05  9M OFDM        19    0.8%       19  100.0%
  0x06  12M OFDM       20    0.9%       20  100.0%
  0x07  18M OFDM       31    1.4%       31  100.0%
  0x08  24M OFDM       22    1.0%       22  100.0%
  0x09  36M OFDM       30    1.3%       30  100.0%
  0x0a  48M OFDM       31    1.4%       31  100.0%
  0x0b  54M OFDM       18    0.8%       18  100.0%
```

## Reading the result

- **1M CCK loses ~20%** even at this location: CCK is robust, but background interference still nukes one in five ACKs/beacons.
- **Every OFDM rate above CCK is 100% corrupt** because we're hearing distant APs at marginal SNR: the chip detects them, decodes them, fails the FCS, and now (with #83's RCR change) surfaces them.

The FEC-design takeaway:

- The PoC's 6M OFDM stream link only works because TX and RX are co-located. At any real range the chip will surface FCS failures at a high rate.
- The stream layer needs **inter-frame parity** (Reed-Solomon over N frames + K parity, Raptor, etc.) to recover from blocks of lost frames, not just per-frame FEC.
- For a P2P link's typical "moderate range" use case (e.g. OpenIPC long-range video), expect frame loss rates in the 30–70% range. FEC overhead has to be sized accordingly: at 50% loss only half of the N + K transmitted frames arrive, so even an ideal erasure code needs K ≥ N (parity making up at least half of what is sent, K/(N+K) ≥ 0.5), plus margin for loss variance.

## Follow-ups (for whoever picks up the FEC layer)

- Pick a parity scheme (Reed-Solomon is simplest, Raptor scales better) and parametrise N, K against captures from realistic ranges.
- Decide where parity rides: in-band on the same SA (current TX path) vs. on a dedicated SA / frame type. In-band keeps the link simple but eats stream airtime.
- Consider degrading rate gracefully (rateless codes) so the receiver can decode at whatever fraction of N+K frames it actually receives.

Builds on #83 (chip-level filter open, merged) and #84 (phy soft metrics, open).
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
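The FEC sizing argument in #85 can be checked numerically. This sketch computes the probability that a block of N source + K parity frames is unrecoverable under independent per-frame loss, assuming an ideal MDS erasure code (Reed-Solomon is MDS; RaptorQ needs a few extra symbols). Independent loss is an optimistic assumption, since the survey points to bursty loss that needs more margin. `block_failure_prob` and `min_parity` are hypothetical names.

```python
from math import comb

def block_failure_prob(n_src: int, k_par: int, loss: float) -> float:
    """P(fewer than n_src of the n_src + k_par frames arrive), i.i.d. loss.

    Assumes an ideal MDS erasure code: any n_src received frames decode the block.
    """
    n = n_src + k_par
    keep = 1.0 - loss
    # Sum the binomial probabilities of receiving r = 0 .. n_src-1 frames.
    return sum(comb(n, r) * keep**r * loss ** (n - r) for r in range(n_src))

def min_parity(n_src: int, loss: float, target_fail: float, max_parity: int = 1024) -> int:
    """Smallest parity count whose block failure probability meets target_fail.

    Plain float arithmetic is fine for small blocks (tens of frames);
    very large blocks would need log-space sums to avoid overflow.
    """
    for k in range(max_parity + 1):
        if block_failure_prob(n_src, k, loss) <= target_fail:
            return k
    raise ValueError("target not reachable within max_parity")

if __name__ == "__main__":
    for loss in (0.3, 0.5, 0.7):
        k = min_parity(8, loss, 1e-3)
        print(f"N=8 loss={loss:.0%}: K>={k} (parity share {k / (8 + k):.2f})")
```

Under this model, N=8 with K=N at 50% loss still fails about 40% of blocks, so the parity share has to sit well above one half rather than at it.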

## Summary

The corruption survey in #85 showed that real-range OFDM frames on this link will see **30–70% loss**. tun_p2p.py's blind `--repeat N` is a fixed-cost workaround that can't scale to cover the loss tail; this PR ships a real erasure code on top of the existing stream framing.

## Library

`raptorq` from cberner (Rust+PyO3 binding to the RFC 6330 reference port). MIT, manylinux abi3 wheels on PyPI, ~26 Gbps enc / ~7 Gbps dec at K=1000 on commodity x86. `uv add raptorq` is the only install step.

## Wire format

The existing `stream.py` framing stays untouched. FEC is an **inner envelope** living inside `StreamFrame.payload`:

```
FEC_MAGIC     (2)   = 0xF52E
VERSION/FLAGS (1)   = 0
K             (1)   = source symbols per block
KREAL         (1)   = real source symbols in this block (≤ K).
                      Trailing (K - KREAL) decoded symbols are zero-pad to discard.
SYMBOL_SIZE   (2)   = LE u16
BLOCK_ID      (2)   = LE u16, wraps
RAPTORQ_PKT   (var) = lib-managed SBN+ESI+symbol

inner overhead = 9 B + raptorq's 4 B SBN/ESI = 13 B
```

Source symbols are themselves concatenations of length-prefixed IP packets:

```
[u16 len_a][packet_a]…[u16 len_b][packet_b]…[zero pad to SYMBOL_SIZE]
```

So small packets (ACK floods) share symbols instead of each burning a whole symbol's worth of airtime.

## Files

- `tools/precoder/pyproject.toml`: add `raptorq>=2`.
- `tools/precoder/stream_fec.py`: `FecConfig`, `FecEncoder` (concatenation packing + block encoding), `FecDecoder` (block-incremental decode + late-symbol drop + block expiry).
- `tools/precoder/test_stream_fec.py`: 19 unit tests covering round-trip, loss tolerance 0/20/40% at R/K=1, 50% at R/K=2, unrecoverable-block bookkeeping at 70%, concatenation, partial flush, block-id wrap, MTU enforcement, and garbage envelopes.
- `tools/precoder/tun_p2p.py`: new `--fec-k`/`--fec-overhead`/`--fec-symbol-size`/`--fec-flush-ms`/`--fec-block-expire-ms` flags. tx_thread feeds packets through the encoder; a parallel `fec_flush_thread` force-encodes partial blocks every flush-ms so sparse traffic doesn't stall. rx_thread feeds payloads through the decoder; decoded IP packets go to TUN. The outer `SeqWindow` dedup is forced OFF when FEC is on (RaptorQ symbols self-dedup via SBN+ESI). New `fec=[...]` segment in the periodic stderr report. Docstring extended.

## Hardware verification

Two-netns single-host bench (RTL8812AU `0x8812` + TP-Link Archer T2U Plus / RTL8821AU `2357:0120`, ch 6, no `--repeat`, `ping -c 30 -i 1`):

| Config | RTT min/avg/max | Loss | DUP | Blocks ok/lost |
|---|---|---:|---:|---:|
| `--fec-k 16 --fec-overhead 1.0 --fec-flush-ms 50` | 121 / **160** / 207 ms | 0% | 0 | 30 / 1 (startup) |
| `--fec-k 8 --fec-overhead 1.0 --fec-flush-ms 20` | 73 / **95** / 145 ms | 0% | 0 | 30 / 1 (startup) |

The K=8 config trades a bit of recovery margin for a 65 ms drop in average RTT (160 → 95 ms). Both decode 100% of source packets on a healthy link; the survey's noisier regimes are what motivate `--fec-overhead > 1`.

For comparison, PR #82's earlier numbers (same bench, byte mode):

| Mode | Loss | Avg RTT |
|---|---:|---:|
| Byte mode `--repeat 1` | 10% | 7 ms |
| Byte mode `--repeat 4` + dedup | 0% | 10 ms (with up to 25 DUPs per ping eaten by dedup) |
| **FEC K=8 R/K=1 flush=20** | **0%** | **95 ms** |

FEC moves us from "blind redundancy + dedup" to "real erasure code". The latency cost is the K-source-symbol encode buffer; the win is that the codec scales gracefully to higher loss rates by raising `--fec-overhead` instead of running out at `--repeat=∞`.

## Test plan

- [x] `cd tools/precoder && uv run pytest` → 87 passed (31 pipeline + 37 stream + 19 fec)
- [x] `python -m pytest tests/precoder_smoke.py tests/precoder_stream_smoke.py` → 8 passed
- [x] tun_p2p.py --help parses cleanly (incl. all FEC flags)
- [x] Bench: K=16/R=1 and K=8/R=1, both 30/30 ping with 0% loss and 0 DUPs

## Open caveats (documented in script)

- Strict block boundaries: no cross-block FEC, no Raptor carousel. Good enough at K=8–16 + 20–50 ms flush; revisit if the latency budget tightens further.
- No rateless dynamic overhead: R/K is fixed at construction. A future PR could let RX hint TX to send more repair symbols via a reverse-channel feedback envelope.
- Patent note: Qualcomm's RFC 6330 patents have largely expired in primary jurisdictions by 2026; cberner's MIT lib explicitly notes this.

Builds on #82 (TUN bridge, merged), #83 (corrupted-frame surfacing, merged), #84 (phy soft metrics, open), #85 (corruption survey, open).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
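The #86 envelope header and the source-symbol packing can be sketched with the standard `struct` module. The field sizes and the little-endian SYMBOL_SIZE/BLOCK_ID come from the wire-format block; little-endian for FEC_MAGIC and for the u16 record lengths is an assumption, and so is the simplification that a record never straddles two symbols. All function names are hypothetical, not the `stream_fec.py` API.

```python
import struct
from typing import List, Sequence, Tuple

FEC_MAGIC = 0xF52E
# magic, version/flags, K, KREAL, SYMBOL_SIZE, BLOCK_ID: 2+1+1+1+2+2 = 9 bytes.
# SYMBOL_SIZE and BLOCK_ID are LE per the PR; LE for the magic is assumed.
HEADER = struct.Struct("<HBBBHH")

def pack_header(k: int, kreal: int, symbol_size: int, block_id: int, flags: int = 0) -> bytes:
    if not 0 < kreal <= k <= 255:
        raise ValueError("need 0 < KREAL <= K <= 255")
    return HEADER.pack(FEC_MAGIC, flags, k, kreal, symbol_size, block_id & 0xFFFF)  # BLOCK_ID wraps

def unpack_header(buf: bytes) -> Tuple[int, int, int, int, int, bytes]:
    """Return (flags, K, KREAL, SYMBOL_SIZE, BLOCK_ID, raptorq_packet)."""
    if len(buf) < HEADER.size:
        raise ValueError("short envelope")
    magic, flags, k, kreal, sym, block = HEADER.unpack_from(buf)
    if magic != FEC_MAGIC:
        raise ValueError("not a FEC envelope")
    return flags, k, kreal, sym, block, bytes(buf[HEADER.size:])

def pack_symbols(packets: Sequence[bytes], symbol_size: int) -> List[bytes]:
    """Concatenate [u16 len][packet] records into zero-padded symbols."""
    symbols: List[bytes] = []
    cur = bytearray()
    for pkt in packets:
        rec = struct.pack("<H", len(pkt)) + pkt
        if len(rec) > symbol_size:
            raise ValueError("packet does not fit in one symbol")
        if len(cur) + len(rec) > symbol_size:  # start a new symbol
            symbols.append(bytes(cur.ljust(symbol_size, b"\0")))
            cur = bytearray()
        cur += rec
    if cur:
        symbols.append(bytes(cur.ljust(symbol_size, b"\0")))
    return symbols

def unpack_symbols(symbols: Sequence[bytes]) -> List[bytes]:
    """Inverse of pack_symbols; a zero length marks the start of padding."""
    out: List[bytes] = []
    for sym in symbols:
        off = 0
        while off + 2 <= len(sym):
            (n,) = struct.unpack_from("<H", sym, off)
            if n == 0:
                break
            out.append(bytes(sym[off + 2 : off + 2 + n]))
            off += 2 + n
    return out
```

Because length 0 doubles as the padding marker, zero-length packets can't be carried; IP packets are never empty, so that costs nothing here.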


## Summary

Both fields are already on the RX descriptor: `seq_num` is parsed at `FrameParser.cpp:98`, and `tsfl` was one commented-out line at line 129. The FEC layer (#86 / #87) and any latency-measurement consumer want both visible; this PR surfaces what the chip already gives us.

## Changes

- **`src/FrameParser.h`**: add `uint32_t tsfl` to `rx_pkt_attrib` alongside the existing `seq_num`.
- **`src/FrameParser.cpp`**: uncomment the TSFL parser and drop the bogus `(byte)` cast (the macro reads all 32 bits of `pdesc+20` as a u32, not a byte; verified against `rtl8812a_recv.h`):

  ```diff
  - /* pattrib.tsfl=(byte)GET_RX_STATUS_DESC_TSFL_8812(pdesc); */
  + pattrib.tsfl = GET_RX_STATUS_DESC_TSFL_8812(pdesc);
  ```

- **`demo/main.cpp`**: extend the `<devourer-stream>` printf with `seq=%u tsfl=%u`. These are optional fields; PR #84's regex pattern in `stream_rx.py` / `tun_p2p.py` / `corruption_analysis.py` already tolerates them via the same pass-through approach used for rssi/evm/snr.

## What this enables (out of scope for this PR; just data surfacing)

- FEC RX side can dedup by chip-side seq before feeding the codec, so air-level retransmissions stop double-counting at the codec.
- One-way latency measurement by diffing TSF against the host clock at TX time: a building block for the F5 TX-RPT goodput numbers and any adaptive `--fec-overhead` loop.

## Test plan

- [x] `cmake --build build -j` clean
- [x] `<devourer-stream>` lines on master now carry `seq` + `tsfl` fields; existing Python consumers tolerate the additions via their existing regex pass-through (no Python-side change required).
- [x] `tests/regress.py` 4-cell matrix byte-identical
- [ ] Reviewer to run an existing tun_p2p bench and confirm the new fields appear without disturbing throughput / loss numbers.

Second in the five-feature C++ series. Followed by:

- F3: selectable stream-carrier rate/BW (uses F1's HT-MCS unlock + this PR's seq/tsfl plumbing for dup detection)
- F5: C2H TX-RPT parser + REG_FIFOPAGE_INFO queue-depth poll
- F2: BB-dbgport per-subcarrier IQ spike (research)

Predecessor: F1 (#88).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
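The two consumers #89 enables can be sketched in a few lines: a sliding-window duplicate filter keyed on the sequence number, and a wrap-safe TSF difference. This assumes `seq_num` is the 12-bit 802.11 sequence number of a single transmitter and that `tsfl` is the low 32 bits of the microsecond TSF timer; real 802.11 duplicate detection also keys on the transmitter address and fragment number. `SeqDedup` and `tsf_delta_us` are hypothetical helpers, not project code.

```python
from collections import deque

SEQ_MOD = 4096     # 802.11 sequence numbers are 12 bits wide
TSF_MOD = 1 << 32  # tsfl: low 32 bits of the microsecond TSF timer

def tsf_delta_us(earlier: int, later: int) -> int:
    """Microseconds between two tsfl readings, surviving one wrap (~71.6 min)."""
    return (later - earlier) % TSF_MOD

class SeqDedup:
    """Reject a seq already seen among the last `window` accepted ones."""

    def __init__(self, window: int = 64) -> None:
        self.window = window
        self.recent: deque = deque()
        self.seen: set = set()

    def accept(self, seq: int) -> bool:
        seq %= SEQ_MOD
        if seq in self.seen:
            return False  # duplicate, e.g. an air-level retransmission
        if len(self.recent) == self.window:
            self.seen.discard(self.recent.popleft())  # age out the oldest entry
        self.recent.append(seq)
        self.seen.add(seq)
        return True
```

Keeping the window far smaller than 4096 means a sequence number that legitimately comes around again after a wrap has long aged out and is accepted.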

This was referenced Jun 7, 2026:
- Corruption-pattern survey tool for FEC design #85 (Merged)
- RaptorQ (RFC 6330) FEC layer for the stream link #86 (Merged)

josephnef merged commit fa71838 into master on Jun 7, 2026. 5 checks passed.

josephnef deleted the phy-soft-metrics branch on June 7, 2026 at 14:20.

josephnef mentioned this pull request on Jun 7, 2026:
- F4: Surface RX seq_num + TSF low on <devourer-stream> #89 (Merged, 3 tasks)
