feat(verification): deterministic audio identity verification engine (whisper.cpp) by kevinheneveld · Pull Request #728 · Listenarrs/Listenarr

kevinheneveld · 2026-07-02T22:51:05Z

What

Slice 1 of audio identity verification: transcribe a book's opening/closing windows locally with whisper.cpp and check what the audio says it is against the record's metadata.
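As a rough illustration of the clipping step, the sketch below builds ffmpeg argument lists for an opening and a closing window. The commit message in this PR mentions 16 kHz mono WAV clips produced by a bundled ffmpeg; everything else here (the function name `clip_command`, the window lengths, the file names) is an assumption for illustration, not the PR's actual code.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000  # 16 kHz mono WAV, per the commit message


def clip_command(src: str, dst: str, window_s: int, from_end: bool = False,
                 ffmpeg: str = "ffmpeg") -> List[str]:
    """Build an ffmpeg argv that writes one 16 kHz mono PCM WAV clip.

    from_end=False clips the first window_s seconds of src;
    from_end=True clips the last window_s seconds using -sseof.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    # -sseof is an input option taking a negative offset from end of file.
    seek = ["-sseof", f"-{window_s}"] if from_end else ["-ss", "0"]
    return [
        ffmpeg, "-hide_banner", "-nostdin", "-y",
        *seek,
        "-i", src,
        "-t", str(window_s),         # clip duration in seconds
        "-vn",                       # ignore embedded cover-art video streams
        "-ac", "1",                  # mono
        "-ar", str(SAMPLE_RATE_HZ),  # resample to 16 kHz
        "-c:a", "pcm_s16le",         # 16-bit PCM WAV
        dst,
    ]


# Hypothetical window lengths; the PR makes these configurable settings.
opening = clip_command("book.m4b", "opening.wav", 90)
closing = clip_command("book.m4b", "closing.wav", 60, from_end=True)
```

Either list can be handed to `subprocess.run(cmd, check=True)`; building the argv as a list avoids shell quoting problems with file names that contain spaces.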

Engine: DeterministicIdentityVerifier + TranscriptMatcher (phonetic-tolerant field matching) + SpokenCreditsExtractor (what the credits claim — seeds later remediation flows) + completeness estimation vs. catalog runtime (partial sets and mega-pack imports downgrade to Uncertain rather than fake confidence).
Verdict semantics (deliberate change from the original design, learned from live use): absence of credits is not evidence of wrong content. A confident Mismatch requires the transcript to actually announce something that contradicts the metadata; plenty-of-speech-but-no-credits yields a new NoSpokenCredits outcome → AgentUnverifiable status (neutral badge, own filter, never lands in needs-review). Music junk that announces "performed by …" still flags.
Runtime: whisper.cpp built from source in a Docker stage (pinned instruction-set flags documented inline; base.en model baked; MIT notices included), env-overridable for dev (LISTENARR_WHISPER_BIN/_MODEL). Stoppable background jobs, SignalR progress, optional low-CPU priority, auto-verify on import (setting, default on).
UI: verification card on the book page (per-field scores, transcript toggle, Verify Audio action), grid shield badges, library filter.
One consolidated migration (verdict columns + settings, hand-set defaults). Manual states are sticky: agent passes never overwrite a human verdict.
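To make the field-matching idea concrete, here is a minimal sketch of scoring a metadata field against a transcript with normalized token windows. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure; the PR's TranscriptMatcher is described as using phonetic plus edit-distance matching, which this sketch does not reproduce. The function names and the sample transcript are assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_window_score(transcript: str, field: str) -> float:
    """Best similarity in [0, 1] between `field` and any transcript window
    that has the same number of tokens as the field."""
    needle = normalize(field)
    hay = normalize(transcript)
    if not needle or not hay:
        return 0.0
    target = " ".join(needle)
    width = len(needle)
    best = 0.0
    # Slide a window of `width` tokens across the transcript.
    for start in range(max(1, len(hay) - width + 1)):
        window = " ".join(hay[start:start + width])
        best = max(best, SequenceMatcher(None, window, target).ratio())
    return best


heard = "This is Audible. The Hobbit, by J. R. R. Tolkien, narrated by Rob Ingles."
title_score = best_window_score(heard, "The Hobbit")
narrator_score = best_window_score(heard, "Rob Inglis")  # transcript misspells the name
unrelated_score = best_window_score(heard, "Moby Dick")
```

A tolerant score matters mostly for names: speech-to-text often spells a narrator's name plausibly but wrongly, so an exact string comparison would miss credits that were clearly spoken.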

Slice 2 (separate PR) adds the remediation flows: relabel to the heard book, wrong-content rejection, re-verify after file transfers.

Tests

92 new tests (matcher, verifier, extractor, completeness, queue, plus the three NoSpokenCredits behavior pins). Full suite passes 1107/1107; vue-tsc (pinned 3.3.4), eslint, prettier, and vitest are clean.

🤖 Generated with Claude Code

…(whisper.cpp)

ADR-0001 Tier 1: fully-offline identity verification. Clips a book's opening/closing windows via bundled ffmpeg (16 kHz mono WAV), transcribes them with a whisper.cpp CLI baked into the Docker image at build time (x86-64-v3 flags pinned; base.en model; env-overridable via LISTENARR_WHISPER_BIN / LISTENARR_WHISPER_MODEL), and fuzzy-matches the transcript against stored metadata: normalized token windows for titles, plus phonetic + edit-distance matching for author/narrator names, with timestamped decoding and a silent-gap re-probe so music-overlaid credits aren't skipped.

Verdicts (status, confidence, per-field scores, heard credits, completeness vs catalog runtime, transcript) persist on the audiobook; one consolidated hand-written migration adds the columns and settings (sample windows, low-CPU priority, verify-on-import default on). Jobs run through an in-memory channel queue + background worker (stoppable mid-transcription, SignalR progress on the settings hub, serial processing because whisper saturates CPU alone). Manual states are sticky; agents only flag. Auto-verify enqueues after import, and the pre-ingest track-shape check rejects music albums before files commit.

UI: Verify Audio action + Audio Verification card on the detail page, grid/list badges, and needs-review / no-spoken-credits filters.

DELIBERATE DIVERGENCE from the original implementation: "plenty of speech with no trace of title or author" is no longer a confident Mismatch. Live use showed it as the top false-flag: many books open with pure narration, and absence of credits is not evidence of wrong content. A gross mismatch now also requires announcement-shaped content in the transcript (extracted claims or credit markers); without any, the verdict is the new NoSpokenCredits outcome → AgentUnverifiable status: a neutral, agent-writable state with its own badge and filter bucket, excluded from needs-review.
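The verdict rule described above can be sketched as a small decision function. This is only an illustration of the stated semantics (a gross mismatch needs announcement-shaped content, otherwise the outcome is NoSpokenCredits); the `Evidence` fields, the thresholds, and the ordering of the checks are assumptions and are not taken from DeterministicIdentityVerifier.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MATCH = "Match"
    MISMATCH = "Mismatch"
    NO_SPOKEN_CREDITS = "NoSpokenCredits"
    UNCERTAIN = "Uncertain"


@dataclass(frozen=True)
class Evidence:
    word_count: int          # words of speech in the sampled windows
    best_field_score: float  # 0..1, best title/author/narrator match
    has_announcement: bool   # extracted claims or credit markers present


# Hypothetical thresholds, chosen only for illustration.
MIN_WORDS = 40
MATCH_SCORE = 0.85
GROSS_MISMATCH_SCORE = 0.35


def decide(e: Evidence) -> Outcome:
    if e.best_field_score >= MATCH_SCORE:
        return Outcome.MATCH
    if e.word_count < MIN_WORDS:
        return Outcome.UNCERTAIN  # too little speech to judge either way
    if e.best_field_score <= GROSS_MISMATCH_SCORE:
        # Absence of credits is not evidence of wrong content: only flag
        # when the transcript actually announces something.
        if e.has_announcement:
            return Outcome.MISMATCH
        return Outcome.NO_SPOKEN_CREDITS
    return Outcome.UNCERTAIN


pure_narration = decide(Evidence(word_count=300, best_field_score=0.1,
                                 has_announcement=False))
wrong_credits = decide(Evidence(word_count=300, best_field_score=0.1,
                                has_announcement=True))
```

Under these assumptions a book that opens with pure narration lands in NoSpokenCredits, while audio that announces "performed by …" with no metadata match still flags as Mismatch.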
Supporting changes: IProcessRunner gains a priority-class overload (idle-priority whisper/ffmpeg); IFfmpegService gains ffmpeg-binary install/promotion alongside ffprobe (FfmpegService partial); DownloadImportService extracts per-file metadata up front and reuses it as the import loop's cache (pre-ingest inspection costs no extra ffprobe calls); AudioSampleSet cleanup is injected by infrastructure so the application layer holds no filesystem code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
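The "extract metadata up front and reuse it as the cache" idea can be sketched as follows: probe each file once, run the pre-ingest inspection on the cached results, then let the import loop read from the same cache. The function names, the `duration_s` key, and the track-shape rule are hypothetical; the PR does not spell out how its check is defined.

```python
from typing import Callable, Dict, Iterable

Metadata = Dict[str, object]


def probe_once(paths: Iterable[str],
               probe: Callable[[str], Metadata]) -> Dict[str, Metadata]:
    """Probe each distinct path exactly once; return a path -> metadata cache."""
    cache: Dict[str, Metadata] = {}
    for path in paths:
        if path not in cache:
            cache[path] = probe(path)
    return cache


def looks_like_music_album(cache: Dict[str, Metadata],
                           max_track_s: float = 600.0,
                           min_tracks: int = 8) -> bool:
    """Hypothetical track-shape rule: many files, every one of them short."""
    durations = [float(m["duration_s"]) for m in cache.values()]
    return len(durations) >= min_tracks and all(d <= max_track_s for d in durations)


# Stand-in for an ffprobe call; records how often it is invoked.
calls = []

def fake_probe(path: str) -> Metadata:
    calls.append(path)
    return {"duration_s": 200.0}


paths = [f"track{i:02d}.mp3" for i in range(10)]
cache = probe_once(paths, fake_probe)
rejected = looks_like_music_album(cache)   # inspection reads the cache
imported = [cache[p] for p in paths]       # import loop reads the same cache
```

Because both the inspection and the import loop read from `cache`, the number of probe calls stays at one per file regardless of how many later stages consult the metadata.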

kevinheneveld and others added 2 commits July 2, 2026 14:49

fix(verification): restore truncated .monitored-badge CSS block

31fe1aa

vite build fails with 'Unclosed block'. vue-tsc does not parse <style> sections, so this only surfaced at bundle time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

kevinheneveld wants to merge 2 commits into Listenarrs:canary from kevinheneveld:feat/audio-verification-engine
