
feat(verification): deterministic audio identity verification engine (whisper.cpp) #728

Draft
kevinheneveld wants to merge 2 commits into Listenarrs:canary from kevinheneveld:feat/audio-verification-engine

@kevinheneveld

Contributor

What

Slice 1 of audio identity verification: transcribe a book's opening and closing windows locally with whisper.cpp, then compare what the audio says the book is against the record's metadata.

  • Engine: DeterministicIdentityVerifier + TranscriptMatcher (phonetic-tolerant field matching) + SpokenCreditsExtractor (captures what the spoken credits claim; seeds later remediation flows) + completeness estimation vs. catalog runtime (partial sets and mega-pack imports downgrade to Uncertain rather than reporting false confidence).
  • Verdict semantics (a deliberate change from the original design, learned from live use): absence of credits is not evidence of wrong content. A confident Mismatch requires the transcript to actually announce something that contradicts the metadata. A transcript with plenty of speech but no credits yields a new NoSpokenCredits outcome → AgentUnverifiable status (neutral badge, its own filter, never lands in needs-review). Music junk that announces "performed by …" is still flagged.
  • Runtime: whisper.cpp built from source in a Docker stage (pinned instruction-set flags documented inline; base.en model baked into the image; MIT notices included), with binary and model env-overridable for dev (LISTENARR_WHISPER_BIN/_MODEL). Stoppable background jobs, SignalR progress, optional low-CPU priority, and auto-verify on import (setting, default on).
  • UI: verification card on the book page (per-field scores, transcript toggle, Verify Audio action), grid shield badges, library filter.
  • One consolidated migration (verdict columns + settings, hand-set defaults). Manual states are sticky: agent passes never overwrite a human verdict.
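The completeness rule from the Engine bullet can be sketched as a runtime-ratio check. This is a minimal sketch and rests on several assumptions. Local and catalog runtimes are assumed known in seconds. The 0.90 to 1.15 tolerance band is hypothetical. Both confident verdicts are assumed to be capped the same way. The names `estimate_completeness`, `Completeness` and `cap_verdict` are illustrative and are not the PR's code. Only the outcomes Match, Mismatch and Uncertain come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tolerance band: local audio within this ratio of the catalog
# runtime counts as a complete copy. The PR does not publish its thresholds.
LOW_RATIO = 0.90
HIGH_RATIO = 1.15


@dataclass(frozen=True)
class Completeness:
    ratio: Optional[float]  # local / catalog runtime; None when unknown
    label: str              # "complete", "partial", "oversized" or "unknown"


def estimate_completeness(local_seconds: float,
                          catalog_seconds: Optional[float]) -> Completeness:
    """Compare the summed duration of the local files to the catalog runtime."""
    if not catalog_seconds or catalog_seconds <= 0:
        return Completeness(None, "unknown")
    ratio = local_seconds / catalog_seconds
    if ratio < LOW_RATIO:
        return Completeness(ratio, "partial")    # e.g. a partial set of files
    if ratio > HIGH_RATIO:
        return Completeness(ratio, "oversized")  # e.g. a mega-pack import
    return Completeness(ratio, "complete")


def cap_verdict(verdict: str, completeness: Completeness) -> str:
    """Downgrade a confident verdict to Uncertain when the copy is known to be
    partial or oversized (assumption: Match and Mismatch are capped alike)."""
    if verdict in ("Match", "Mismatch") and completeness.label in ("partial", "oversized"):
        return "Uncertain"
    return verdict
```

For example, a 5-hour import of a 10-hour catalog entry gets ratio 0.5. It is labelled partial, so a Match verdict would be reported as Uncertain.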

Slice 2 (separate PR) adds the remediation flows: relabeling to the book actually heard, rejecting wrong content, and re-verifying after file transfers.

Tests

92 new tests (matcher, verifier, extractor, completeness, queue, plus three behavior pins for NoSpokenCredits). Full suite passes 1107/1107; vue-tsc (pinned 3.3.4), eslint, prettier, and vitest are clean.

🤖 Generated with Claude Code

kevinheneveld and others added 2 commits July 2, 2026 14:49
…(whisper.cpp)

ADR-0001 Tier 1: fully-offline identity verification. Clips a book's
opening/closing windows via bundled ffmpeg (16 kHz mono WAV), transcribes
them with a whisper.cpp CLI baked into the Docker image at build time
(x86-64-v3 flags pinned; base.en model; env-overridable via
LISTENARR_WHISPER_BIN / LISTENARR_WHISPER_MODEL), and fuzzy-matches the
transcript against stored metadata: normalized token windows for titles,
plus phonetic + edit-distance matching for author/narrator names, with
timestamped decoding and a silent-gap re-probe so music-overlaid credits
aren't skipped. Verdicts (status, confidence, per-field scores, heard
credits, completeness vs catalog runtime, transcript) persist on the
audiobook — one consolidated hand-written migration adds the columns and
settings (sample windows, low-CPU priority, verify-on-import default on).
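The transcript matching described above can be sketched as a normalized token-window score for titles plus phonetic and edit-distance matching for names. This is a minimal sketch with two swapped-in choices. American Soundex stands in for the phonetic code, since the commit does not name the algorithm. The "one edit per five characters" tolerance is hypothetical. The helpers `tokens`, `title_score` and `name_score` are illustrative and are not the TranscriptMatcher API.

```python
import re
from typing import List

# American Soundex digit classes; vowels, h, w and y map to no digit.
_SOUNDEX = {c: d for d, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                                    ("4", "l"), ("5", "mn"), ("6", "r"))
            for c in letters}


def tokens(text: str) -> List[str]:
    """Normalize: lowercase, drop apostrophes, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower().replace("'", ""))


def soundex(word: str) -> str:
    """American Soundex code (first letter + three digits)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            out.append(code)
            if len(out) == 4:
                break
        if c not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return "".join(out).ljust(4, "0")


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def title_score(title: str, transcript: str) -> float:
    """Best fraction of title tokens found in any same-length transcript window."""
    want, heard = tokens(title), tokens(transcript)
    if not want or not heard:
        return 0.0
    n = len(want)
    best = 0.0
    for start in range(max(1, len(heard) - n + 1)):
        window = heard[start:start + n]
        best = max(best, sum(1 for w in want if w in window) / n)
    return best


def name_token_matches(expected: str, heard: str) -> bool:
    if expected == heard:
        return True
    code = soundex(expected)
    if code and code == soundex(heard):  # phonetic match ("Tymczak" ~ "timchak")
        return True
    # Hypothetical tolerance: roughly one edit per five characters.
    return levenshtein(expected, heard) <= max(1, len(expected) // 5)


def name_score(name: str, transcript: str) -> float:
    """Fraction of name tokens that are heard somewhere, allowing mishearings."""
    want, heard = tokens(name), tokens(transcript)
    if not want:
        return 0.0
    return sum(1 for w in want if any(name_token_matches(w, h) for h in heard)) / len(want)
```

Soundex is deliberately loose (for instance, "smith" and "snead" share S530). A matcher built this way would lean on the title window and the edit-distance check as corroborating evidence rather than trust a single phonetic hit.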

Jobs run through an in-memory channel queue and a background worker
(stoppable mid-transcription, SignalR progress on the settings hub,
serial processing, since a single whisper run saturates the CPU).
Manual states are sticky; agents only flag. Auto-verify enqueues after
import, and the pre-ingest track-shape check rejects music albums before
files commit.
UI: Verify Audio action + Audio Verification card on the detail page,
grid/list badges, and needs-review / no-spoken-credits filters.
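The queue semantics (single consumer, stoppable mid-job, sticky manual states) can be sketched with asyncio. This sketch only mirrors the described behavior and does not port the PR's in-memory channel worker. It assumes Python 3.8+ and an async `verify` callback. The status names `ManualVerified`, `ManualRejected`, `AgentVerified`, `AgentFlagged` and `Unverified` are hypothetical, as is the `VerificationQueue` class.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical manual status names; the PR only says manual states are sticky.
MANUAL_STATES = {"ManualVerified", "ManualRejected"}


def apply_agent_status(current: str, proposed: str) -> str:
    """An agent result never overwrites a human verdict."""
    return current if current in MANUAL_STATES else proposed


class VerificationQueue:
    """One worker task drains the queue, so at most one verification (and
    therefore one CPU-heavy transcription) runs at a time."""

    def __init__(self, verify: Callable[[int], Awaitable[str]]) -> None:
        # Construct inside a running event loop (see the usage note).
        self._verify = verify
        self._queue: "asyncio.Queue[int]" = asyncio.Queue()
        self._worker: Optional["asyncio.Task[None]"] = None
        self.statuses: Dict[int, str] = {}

    def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    def enqueue(self, book_id: int) -> None:
        self._queue.put_nowait(book_id)

    async def drain(self) -> None:
        """Wait until every enqueued job has finished."""
        await self._queue.join()

    async def _run(self) -> None:
        while True:
            book_id = await self._queue.get()
            try:
                proposed = await self._verify(book_id)
                current = self.statuses.get(book_id, "Unverified")
                self.statuses[book_id] = apply_agent_status(current, proposed)
            except Exception:
                pass  # a failed job leaves the stored status untouched
            finally:
                self._queue.task_done()

    async def stop(self) -> None:
        """Cancel the worker, interrupting any in-flight verification."""
        if self._worker is None:
            return
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass
        self._worker = None
```

Usage: create the queue inside `async def main()` and call `start()`. Then `enqueue()` book ids and `await drain()`. Cancelling the worker task is this sketch's analogue of "stoppable mid-transcription". A real worker would also need to kill the child whisper process when cancelled.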

DELIBERATE DIVERGENCE from the original implementation: "plenty of
speech with no trace of title or author" is no longer a confident
Mismatch. Live use showed it to be the most common false flag: many
books open with pure narration, and absence of credits is not evidence
of wrong content. A gross mismatch now also requires announcement-shaped
content in the transcript (extracted claims or credit markers); without
any, the verdict is the new NoSpokenCredits outcome → AgentUnverifiable
status: a neutral, agent-writable state with its own badge and filter
bucket, excluded from needs-review.
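The revised verdict rule can be sketched as a small decision function over per-field scores. This is a minimal sketch under several assumptions. The matcher is assumed to have already produced the scores. The thresholds, the minimum-speech word count and the credit-marker list are hypothetical. Taking the better of the title and author scores is also an assumption. Only the outcomes Match, Mismatch, Uncertain and NoSpokenCredits, the AgentUnverifiable status, and the "performed by" example come from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical cut-offs; the PR does not publish its thresholds.
MATCH_THRESHOLD = 0.8
MISMATCH_THRESHOLD = 0.3
MIN_SPEECH_WORDS = 40

# Illustrative "announcement-shaped" phrases.
CREDIT_MARKERS = ("written by", "narrated by", "read by", "performed by")

# Verdict -> persisted status, for the outcome named in the PR.
STATUS_FOR_VERDICT: Dict[str, str] = {"NoSpokenCredits": "AgentUnverifiable"}


@dataclass
class Evidence:
    transcript: str
    title_score: float
    author_score: float
    extracted_claims: List[str] = field(default_factory=list)


def has_announcement(ev: Evidence) -> bool:
    """True when the audio announces something: extracted claims or credit markers."""
    text = " ".join(ev.transcript.lower().split())
    return bool(ev.extracted_claims) or any(m in text for m in CREDIT_MARKERS)


def decide(ev: Evidence) -> str:
    best = max(ev.title_score, ev.author_score)
    if best >= MATCH_THRESHOLD:
        return "Match"
    if len(ev.transcript.split()) < MIN_SPEECH_WORDS:
        return "Uncertain"  # too little speech to judge either way
    if best <= MISMATCH_THRESHOLD:
        # Plenty of speech but nothing matches: only a confident Mismatch when
        # the audio actually announces something that contradicts the metadata.
        return "Mismatch" if has_announcement(ev) else "NoSpokenCredits"
    return "Uncertain"
```

A book that opens straight into chapter narration lands in NoSpokenCredits and maps to AgentUnverifiable. A music track that repeats "performed by …" still has announcement-shaped content, so it is flagged as Mismatch.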

Supporting changes: IProcessRunner gains a priority-class overload
(idle-priority whisper/ffmpeg); IFfmpegService gains ffmpeg-binary
install/promotion alongside ffprobe (FfmpegService partial);
DownloadImportService extracts per-file metadata up front and reuses it
as the import loop's cache (pre-ingest inspection costs no extra ffprobe
calls); AudioSampleSet cleanup is injected by infrastructure so the
application layer holds no filesystem code.
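The idle-priority process launch and the clip/transcribe commands might look like this. This is a minimal sketch, assuming that "idle priority" maps to `os.nice(19)` on POSIX and `IDLE_PRIORITY_CLASS` on Windows. `LISTENARR_WHISPER_BIN` and `LISTENARR_WHISPER_MODEL` come from the PR. The fallback paths, the bare `ffmpeg` name, and the helpers `resolve_whisper_paths`, `clip_args`, `whisper_args` and `run_low_priority` are hypothetical and are not the IProcessRunner or IFfmpegService API. The ffmpeg options and whisper.cpp's `-m`/`-f` are standard CLI flags.

```python
import os
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple


def resolve_whisper_paths() -> Tuple[str, str]:
    """Env overrides win; the fallback paths here are hypothetical."""
    binary = os.environ.get("LISTENARR_WHISPER_BIN", "/opt/whisper/whisper-cli")
    model = os.environ.get("LISTENARR_WHISPER_MODEL", "/opt/whisper/ggml-base.en.bin")
    return binary, model


def clip_args(src: str, dst: str, start_s: float, duration_s: float) -> List[str]:
    """ffmpeg argv for a 16 kHz mono 16-bit PCM WAV clip (whisper.cpp's input format)."""
    return ["ffmpeg", "-nostdin", "-y", "-ss", str(start_s), "-t", str(duration_s),
            "-i", src, "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", dst]


def whisper_args(binary: str, model: str, wav: str) -> List[str]:
    """whisper.cpp CLI: -m selects the model file, -f the input WAV."""
    return [binary, "-m", model, "-f", wav]


def run_low_priority(argv: Sequence[str],
                     timeout: Optional[float] = None) -> "subprocess.CompletedProcess[str]":
    """Run a child process at the lowest CPU priority the platform offers."""
    kwargs: dict = {}
    if sys.platform == "win32":
        kwargs["creationflags"] = subprocess.IDLE_PRIORITY_CLASS
    else:
        # Runs in the child between fork and exec; not safe in heavily threaded hosts.
        kwargs["preexec_fn"] = lambda: os.nice(19)
    return subprocess.run(list(argv), capture_output=True, text=True,
                          timeout=timeout, check=False, **kwargs)
```

Lowering priority in the child, rather than in the host process, keeps the rest of the app responsive while ffmpeg and whisper run. This matches the intent of a priority-class overload on the process runner.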

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vite build failed with 'Unclosed block'; vue-tsc does not parse <style>
sections, so the error only surfaced at bundle time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>