feat(verification): deterministic audio identity verification engine (whisper.cpp) #728
Draft
kevinheneveld wants to merge 2 commits into
…(whisper.cpp)

ADR-0001 Tier 1: fully-offline identity verification. Clips a book's opening/closing windows via bundled ffmpeg (16 kHz mono WAV), transcribes them with a whisper.cpp CLI baked into the Docker image at build time (x86-64-v3 flags pinned; base.en model; env-overridable via LISTENARR_WHISPER_BIN / LISTENARR_WHISPER_MODEL), and fuzzy-matches the transcript against stored metadata: normalized token windows for titles, plus phonetic + edit-distance matching for author/narrator names, with timestamped decoding and a silent-gap re-probe so music-overlaid credits aren't skipped.

Verdicts (status, confidence, per-field scores, heard credits, completeness vs catalog runtime, transcript) persist on the audiobook. One consolidated hand-written migration adds the columns and settings (sample windows, low-CPU priority, verify-on-import default on).

Jobs run through an in-memory channel queue + background worker (stoppable mid-transcription, SignalR progress on the settings hub, serial processing because whisper saturates the CPU on its own). Manual states are sticky; agents only flag. Auto-verify enqueues after import, and the pre-ingest track-shape check rejects music albums before files commit.

UI: Verify Audio action + Audio Verification card on the detail page, grid/list badges, and needs-review / no-spoken-credits filters.

DELIBERATE DIVERGENCE from the original implementation: "plenty of speech with no trace of title or author" is no longer a confident Mismatch. Live use showed it as the top false-flag: many books open with pure narration, and absence of credits is not evidence of wrong content. A gross mismatch now also requires announcement-shaped content in the transcript (extracted claims or credit markers); without any, the verdict is the new NoSpokenCredits outcome → AgentUnverifiable status: a neutral, agent-writable state with its own badge and filter bucket, excluded from needs-review.
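To make the divergence concrete, here is a minimal sketch of the decision rule as described above. It is not the PR's code: the `Evidence` fields, the `Match` outcome name, and all three thresholds are hypothetical and chosen only for illustration. Only the `Mismatch`, `Uncertain`, and `NoSpokenCredits` names come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    MATCH = "Match"  # assumed name; not stated in the PR
    UNCERTAIN = "Uncertain"
    MISMATCH = "Mismatch"
    NO_SPOKEN_CREDITS = "NoSpokenCredits"


@dataclass
class Evidence:
    title_score: float       # 0..1 fuzzy score of the stored title vs. transcript
    author_score: float      # 0..1 fuzzy score of the stored author vs. transcript
    spoken_words: int        # number of words in the transcript
    announcement_count: int  # extracted claims or credit markers found


# Hypothetical thresholds, for illustration only.
MATCH_AT = 0.8
GROSS_MISMATCH_BELOW = 0.2
MIN_WORDS = 40


def decide(e: Evidence) -> Outcome:
    best = max(e.title_score, e.author_score)
    if best >= MATCH_AT:
        return Outcome.MATCH
    if e.spoken_words < MIN_WORDS:
        # Too little speech to conclude anything.
        return Outcome.UNCERTAIN
    if best < GROSS_MISMATCH_BELOW:
        # Plenty of speech, no trace of title or author. This alone is no
        # longer a Mismatch: the transcript must also contain
        # announcement-shaped content that claims something else.
        if e.announcement_count == 0:
            return Outcome.NO_SPOKEN_CREDITS
        return Outcome.MISMATCH
    return Outcome.UNCERTAIN
```

Under these assumptions, `decide(Evidence(0.0, 0.05, 300, 0))` yields `NoSpokenCredits`, while the same scores with two credit markers yield `Mismatch`.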
Supporting changes: IProcessRunner gains a priority-class overload (idle-priority whisper/ffmpeg); IFfmpegService gains ffmpeg-binary install/promotion alongside ffprobe (FfmpegService partial); DownloadImportService extracts per-file metadata up front and reuses it as the import loop's cache (pre-ingest inspection costs no extra ffprobe calls); AudioSampleSet cleanup is injected by infrastructure so the application layer holds no filesystem code.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vite build fails with 'Unclosed block': vue-tsc does not parse <style> sections, so this only surfaced at bundle time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
Slice 1 of audio identity verification: transcribe a book's opening/closing windows locally with whisper.cpp and check what the audio says it is against the record's metadata.
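For readers unfamiliar with the clipping step, the sketch below builds an ffmpeg argument list that cuts an opening or closing window as 16 kHz mono WAV, the format the commit message names. The helper name `clip_args`, the file names, and the 90-second window are hypothetical; the PR's actual invocation lives in its C# services and may differ.

```python
from typing import List


def clip_args(src: str, dst: str, window_seconds: int, from_end: bool = False) -> List[str]:
    """Build an ffmpeg command that clips one sample window (hypothetical helper)."""
    args = ["ffmpeg", "-hide_banner", "-nostdin", "-y"]
    if from_end:
        # -sseof seeks relative to the end of the input, so it takes a negative offset.
        args += ["-sseof", f"-{window_seconds}"]
    else:
        args += ["-ss", "0"]
    args += [
        "-t", str(window_seconds),  # read at most this many seconds of input
        "-i", src,
        "-vn",                      # drop embedded cover art, if any
        "-ac", "1",                 # mono
        "-ar", "16000",             # 16 kHz sample rate
        "-c:a", "pcm_s16le",        # 16-bit PCM WAV
        dst,
    ]
    return args


opening = clip_args("book.m4b", "opening.wav", 90)
closing = clip_args("book.m4b", "closing.wav", 90, from_end=True)
```

The function only builds the argument list; running it (for example via `subprocess.run`) and handing the WAV to whisper.cpp are left out so the sketch stays independent of any installed binary.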
- `DeterministicIdentityVerifier` + `TranscriptMatcher` (phonetic-tolerant field matching) + `SpokenCreditsExtractor` (what the credits claim; seeds later remediation flows) + completeness estimation vs. catalog runtime (partial sets and mega-pack imports downgrade to Uncertain rather than fake confidence).
- `NoSpokenCredits` outcome → `AgentUnverifiable` status (neutral badge, own filter, never lands in needs-review). Music junk that announces "performed by …" still flags.
- whisper.cpp CLI built into the Docker image (`base.en` model baked; MIT notices included), env-overridable for dev (`LISTENARR_WHISPER_BIN` / `_MODEL`). Stoppable background jobs, SignalR progress, optional low-CPU priority, auto-verify on import (setting, default on).

Slice 2 (separate PR) adds the remediation flows: relabel to the heard book, wrong-content rejection, re-verify after file transfers.
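As an illustration of phonetic-tolerant name matching, the sketch below combines a phonetic key with edit distance over sliding token windows. The PR does not say which phonetic algorithm `TranscriptMatcher` uses, so American Soundex is a stand-in here; the function names and the one-edit tolerance are assumptions as well.

```python
import re

# Soundex digit table (American Soundex, used here as a stand-in).
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (out + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())


def token_match(a: str, b: str) -> bool:
    if a == b:
        return True
    key = soundex(a)
    if key and key == soundex(b):
        return True
    # Hypothetical tolerance: one edit, and only for tokens of 4+ characters.
    return min(len(a), len(b)) >= 4 and edit_distance(a, b) <= 1


def name_score(name: str, transcript: str) -> float:
    """Best fraction of name tokens matched in any same-length transcript window."""
    want, heard = tokens(name), tokens(transcript)
    if not want or len(heard) < len(want):
        return 0.0
    best = 0.0
    for start in range(len(heard) - len(want) + 1):
        window = heard[start:start + len(want)]
        hits = sum(token_match(w, h) for w, h in zip(want, window))
        best = max(best, hits / len(want))
    return best
```

With this sketch, a transcript that spells the narrator as "steven fry" still scores 1.0 against the stored name "Stephen Fry", because both first names share a Soundex key. A real matcher would likely weight fields and handle token reordering, which this deliberately omits.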
Tests
92 new (matcher/verifier/extractor/completeness/queue + the three NoSpokenCredits behavior pins). Full suite 1107/1107; vue-tsc (pinned 3.3.4) + eslint + prettier + vitest clean.
🤖 Generated with Claude Code