feat(library): duplicate record + duplicate copy detection #729
Draft
kevinheneveld wants to merge 4 commits into
When files actually belong to a different book the library already
tracks (two books' tracks imported onto one record, or a collection
being split into its real books), a "Move Files to Another Book" action
on the detail page opens a transfer dialog with per-file checkboxes and
a library search ranked by token overlap.
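Ranking by token overlap can be sketched as counting the tokens a query shares with each candidate. This is a minimal sketch that assumes a lowercase ASCII tokenizer and scores each candidate by the number of distinct query tokens found in its title or author. `LibraryBook`, `tokenize`, and `rankByTokenOverlap` are hypothetical names, not the PR's code.

```typescript
// Hypothetical record shape; the real library entity has many more fields.
interface LibraryBook {
  id: number;
  title: string;
  author: string;
}

// Lowercase and split on anything that isn't an ASCII letter or digit.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0),
  );
}

// Score = number of distinct query tokens found in the candidate's title or
// author. Zero-score candidates are dropped; ties fall back to id order so the
// ranking is deterministic.
function rankByTokenOverlap(query: string, books: LibraryBook[]): LibraryBook[] {
  const queryTokens = Array.from(tokenize(query));
  return books
    .map((book) => {
      const bookTokens = tokenize(`${book.title} ${book.author}`);
      const score = queryTokens.filter((t) => bookTokens.has(t)).length;
      return { book, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.book.id - b.book.id)
    .map((r) => r.book);
}
```

Common words such as "the" count toward the score in this sketch. A production ranking might weight or drop them.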
POST /library/{id}/files/transfer: DB ownership is always reassigned;
the physical file is moved into the destination folder on a best-effort
basis (a failed move leaves the file in place with a warning, and
Organize can relocate it later).
Collisions with a row the target already owns at the same path are
detected BEFORE any disk move and skipped with a warning, and any
per-file DB failure becomes a warning rather than aborting the whole
transfer. A source left without audio gets its legacy single-file
columns reset; history entries land on both records.
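The ordering that matters in the transfer loop can be sketched as follows: the collision check runs before any disk move, the move is best-effort, and per-file DB failures are downgraded to warnings. This is a minimal TypeScript sketch, not the PR's C# endpoint. `FileRow`, `TransferDeps`, and `transferFiles` are hypothetical, and the DB and disk operations are injected callbacks. It leaves out the legacy-column reset and history entries. It also does not roll back a completed disk move if the DB update for that file then fails.

```typescript
// Hypothetical row shape; the real endpoint works on AudiobookFile entities.
interface FileRow {
  id: number;
  path: string;
}

// Hypothetical injected operations standing in for the DB and filesystem.
interface TransferDeps {
  targetOwnedPaths(targetId: number): Set<string>; // paths the target already owns
  moveOnDisk(from: string, to: string): void; // throws on failure
  reassign(fileId: number, targetId: number, newPath: string): void; // throws on failure
}

function transferFiles(
  files: FileRow[],
  targetId: number,
  targetFolder: string,
  deps: TransferDeps,
): { reassigned: number[]; warnings: string[] } {
  const reassigned: number[] = [];
  const warnings: string[] = [];
  const owned = deps.targetOwnedPaths(targetId);
  for (const file of files) {
    const fileName = file.path.slice(file.path.lastIndexOf("/") + 1);
    const destPath = `${targetFolder}/${fileName}`;
    // Collision check runs BEFORE any disk move, so a skipped file is untouched.
    if (owned.has(destPath)) {
      warnings.push(`Skipped ${file.path}: target already owns ${destPath}`);
      continue;
    }
    // Best-effort move: on failure the file stays where it is.
    let finalPath = destPath;
    try {
      deps.moveOnDisk(file.path, destPath);
    } catch {
      finalPath = file.path;
      warnings.push(`Could not move ${file.path}; left in place`);
    }
    // A per-file DB failure becomes a warning instead of aborting the batch.
    try {
      deps.reassign(file.id, targetId, finalPath);
      owned.add(finalPath);
      reassigned.push(file.id);
    } catch {
      warnings.push(`Could not reassign file ${file.id}`);
    }
  }
  return { reassigned, warnings };
}
```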
Row reassignment uses a targeted IAudiobookFileRepository.ReassignAsync
(detached stub, only the reassigned columns marked modified) so bulk
transfers can't trip EF identity-map conflicts on overlapping
navigation graphs.
Warning/error toasts are now sticky by default (dismissed via the close
button) because transfer warnings carry information the user needs to
read and act on; info/success toasts keep the 5s auto-dismiss.
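The per-kind default can be sketched as a small lookup. `ToastKind`, `ToastOptions`, and the per-call `timeoutMs` override are assumptions for illustration. The description only fixes the defaults: warning/error sticky, info/success auto-dismissed after 5s.

```typescript
type ToastKind = "info" | "success" | "warning" | "error";

// Hypothetical options shape for a toast call.
interface ToastOptions {
  kind: ToastKind;
  message: string;
  // Assumed explicit override; when omitted the per-kind default applies.
  timeoutMs?: number;
}

const AUTO_DISMISS_MS = 5000;

// undefined means "sticky": the toast stays until the user clicks close.
function defaultTimeoutMs(kind: ToastKind): number | undefined {
  return kind === "warning" || kind === "error" ? undefined : AUTO_DISMISS_MS;
}

function resolveTimeoutMs(opts: ToastOptions): number | undefined {
  return opts.timeoutMs ?? defaultTimeoutMs(opts.kind);
}
```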
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…k detail page
Deleting duplicates or moving a subset of files one at a time doesn't
scale (e.g. a duplicated book: the same audio under two different
filename schemes on one record). Add a selection checkbox to each file
row (shift-click extends a range) and a toolbar with select-all, "Move
selected…" and "Delete selected". Bulk move hands the selection to
TransferFilesModal via its initialFileIds preselection; bulk delete
loops a per-file delete behind one confirm with a shared "also delete
from disk" option and a summary toast. Selection clears on reload.
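Shift-click range selection can be sketched as a small state holder. This is a minimal sketch built on several assumptions. `FileSelection` is a hypothetical name, and ids are kept in display order. Shift-click applies the clicked row's new checked state to every row between the last-clicked row (the anchor) and the clicked one. That is one common convention, not a confirmed detail of this PR.

```typescript
// Hypothetical selection state for the file list.
class FileSelection {
  private readonly orderedIds: number[];
  private readonly selected = new Set<number>();
  private anchor: number | null = null;

  constructor(orderedIds: number[]) {
    this.orderedIds = orderedIds;
  }

  // Plain click toggles one row; shift-click applies the clicked row's new
  // state to the whole range between the anchor and the clicked row.
  click(index: number, shiftKey: boolean): void {
    const select = !this.selected.has(this.orderedIds[index]);
    const useRange = shiftKey && this.anchor !== null;
    const from = useRange ? Math.min(this.anchor as number, index) : index;
    const to = useRange ? Math.max(this.anchor as number, index) : index;
    for (let i = from; i <= to; i++) {
      if (select) this.selected.add(this.orderedIds[i]);
      else this.selected.delete(this.orderedIds[i]);
    }
    this.anchor = index;
  }

  selectAll(): void {
    this.orderedIds.forEach((id) => this.selected.add(id));
  }

  // Called when the file list reloads.
  clear(): void {
    this.selected.clear();
    this.anchor = null;
  }

  // Selected ids in display order, e.g. for the initialFileIds preselection.
  ids(): number[] {
    return this.orderedIds.filter((id) => this.selected.has(id));
  }
}
```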
The per-file delete this rides on is new here too:
DELETE /library/{id}/files/{fileId}?deleteFromDisk= — removes the
AudiobookFile row (ownership guarded: the file must belong to the
addressed audiobook), optionally deletes the file from disk
(disk-delete failures surface as warnings; the row is still removed),
and records a "File Removed" history entry.
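The guard-then-delete order of the per-file DELETE can be sketched as follows. This is a TypeScript stand-in for the server handler that uses injected store, disk, and history callbacks. `AudiobookFile` here is a simplified hypothetical shape, and `DeleteDeps` and `deleteAudiobookFile` are hypothetical too. Answering a foreign file id with 404 is an assumption, since the description only says ownership is guarded.

```typescript
// Hypothetical, simplified row shape.
interface AudiobookFile {
  id: number;
  audiobookId: number;
  path: string;
}

// Hypothetical injected operations standing in for the DB, disk, and history log.
interface DeleteDeps {
  findFile(fileId: number): AudiobookFile | undefined;
  removeRow(fileId: number): void;
  deleteFromDisk(path: string): void; // throws on failure
  addHistory(audiobookId: number, event: string, detail: string): void;
}

type DeleteResult = { status: 404 } | { status: 200; warnings: string[] };

function deleteAudiobookFile(
  audiobookId: number,
  fileId: number,
  deleteFromDisk: boolean,
  deps: DeleteDeps,
): DeleteResult {
  const file = deps.findFile(fileId);
  // Ownership guard: a file that belongs to another audiobook is not found here.
  if (!file || file.audiobookId !== audiobookId) return { status: 404 };
  const warnings: string[] = [];
  if (deleteFromDisk) {
    try {
      deps.deleteFromDisk(file.path);
    } catch (err) {
      warnings.push(`Could not delete ${file.path} from disk: ${String(err)}`);
    }
  }
  // The row is removed even if the disk delete failed.
  deps.removeRow(file.id);
  deps.addHistory(audiobookId, "File Removed", file.path);
  return { status: 200, warnings };
}
```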
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s real books
A multi-book dump imported onto one record (live case: 773 files
spanning 34 books) previously required manual filesystem surgery. A new
"Split Collection" action clusters the record's files into per-book
groups and lets each group be moved to the record it belongs to,
deleted (a redundant duplicate copy), or left alone.
GET /library/{id}/split/preview (read-only): clusters by subdirectory
first (strongest signal), then embedded Album/Title tag for flat files
(a bulk-renamed collection has useless filenames but the tags still
name the real book), then filename stem with track/list numbering
stripped. The numbering STYLE stays part of a group's identity so two
copies of one book ("Title-NN" vs "Title (N)") yield two groups; a
trailing volume designator's number is kept so volumes don't collapse;
an unnumbered file merges into its numbered siblings only when
duration/size says it plausibly IS one track. Only the raw ffprobe runs
in parallel — higher-level metadata reads share a non-thread-safe
DbContext and probing them concurrently made clusters
non-deterministic. Each group gets a suggested destination: same-author
records first, longest normalized-title containment wins
(SplitDestinationSuggester + TitleMatcher.Normalize).
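The filename-stem tier, the last fallback after subdirectory and tag clustering, can be sketched as a signature made of the normalized stem plus the numbering style that was stripped. Under that rule, "Title-NN" and "Title (N)" land in different groups. This is a hypothetical sketch. `stemSignature`, `groupKey`, the three numbering patterns, and the volume-word list are all assumptions, and the sketch omits the duration/size check for unnumbered files.

```typescript
// Hypothetical signature: the normalized stem plus the numbering style that
// was stripped from it. Files group only when both parts match.
interface StemSignature {
  stem: string;
  style: string;
}

// Assumed numbering styles; each captures the title part in group 1.
const NUMBERING_STYLES: { style: string; pattern: RegExp }[] = [
  { style: "dash", pattern: /^(.*?)\s*-\s*\d+$/ }, // "Title-01", "Title - 01"
  { style: "paren", pattern: /^(.*?)\s*\(\d+\)$/ }, // "Title (1)"
  { style: "leading", pattern: /^\d+\s*[-_.]\s*(.*)$/ }, // "01 - Title"
];

// If stripping leaves a stem ending in a volume word, the number was a
// volume designator ("Series Vol-2"), not a track number, so it is kept.
const VOLUME_WORD_AT_END = /\b(book|vol|volume|part)\.?$/i;

function stemSignature(fileName: string): StemSignature {
  const base = fileName.replace(/\.[^.]+$/, "").trim(); // drop the extension
  for (const { style, pattern } of NUMBERING_STYLES) {
    const m = pattern.exec(base);
    if (!m) continue;
    const stem = m[1].trim();
    if (stem === "" || VOLUME_WORD_AT_END.test(stem)) continue;
    return { stem: stem.toLowerCase(), style };
  }
  return { stem: base.toLowerCase(), style: "none" };
}

function groupKey(sig: StemSignature): string {
  return `${sig.stem}|${sig.style}`;
}
```

In this sketch `groupKey(stemSignature("Dune-01.mp3"))` is `dune|dash` and `groupKey(stemSignature("Dune (1).mp3"))` is `dune|paren`, so two copies of one book surface as two groups.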
Applying is client-driven: a sequence of the existing file-transfer and
per-file-delete calls per group, with per-group progress and a summary
toast. FileClustering is pure and deterministic, covered by unit tests
with fixtures from the live collection that motivated the feature.
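The client-driven apply loop can be sketched as a sequential walk over the groups. Each step calls the transfer or per-file delete endpoint and reports progress when the group finishes. `GroupAction`, `SplitApi`, and `applySplit` are hypothetical names, and the API calls are injected. Continuing past a failed group, rather than stopping, is an assumption.

```typescript
// Hypothetical per-group decision handed over by the split dialog.
type GroupAction =
  | { kind: "move"; fileIds: number[]; targetId: number }
  | { kind: "delete"; fileIds: number[]; deleteFromDisk: boolean }
  | { kind: "skip" };

// Hypothetical wrappers around the existing transfer and per-file delete calls.
interface SplitApi {
  transferFiles(sourceId: number, fileIds: number[], targetId: number): Promise<{ warnings: string[] }>;
  deleteFile(sourceId: number, fileId: number, deleteFromDisk: boolean): Promise<{ warnings: string[] }>;
}

interface ApplySummary {
  transferred: number; // files submitted for transfer; per-file skips arrive as warnings
  deleted: number;
  failedGroups: number;
  warnings: string[];
}

async function applySplit(
  sourceId: number,
  groups: GroupAction[],
  api: SplitApi,
  onProgress: (done: number, total: number) => void,
): Promise<ApplySummary> {
  const summary: ApplySummary = { transferred: 0, deleted: 0, failedGroups: 0, warnings: [] };
  for (let i = 0; i < groups.length; i++) {
    const group = groups[i];
    try {
      if (group.kind === "move") {
        const res = await api.transferFiles(sourceId, group.fileIds, group.targetId);
        summary.transferred += group.fileIds.length;
        summary.warnings.push(...res.warnings);
      } else if (group.kind === "delete") {
        for (const fileId of group.fileIds) {
          const res = await api.deleteFile(sourceId, fileId, group.deleteFromDisk);
          summary.deleted++;
          summary.warnings.push(...res.warnings);
        }
      }
    } catch (err) {
      // A failing group is recorded and the remaining groups still run.
      summary.failedGroups++;
      summary.warnings.push(`Group ${i + 1} failed: ${String(err)}`);
    }
    onProgress(i + 1, groups.length);
  }
  return summary;
}
```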
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A library accretes two flavors of duplication: the same book tracked
twice (added from search and again from a scan, or two catalog paths to
one edition), and one record holding the same audio twice under two
filename schemes.
GET /library/duplicates (read-only sweep): duplicate RECORDS group by
identical ASIN first, then by normalized title + primary author.
Conservative by design: two distinct non-empty ASINs are treated as
different editions, and same-title records with different subtitles AND
different years stay untouched; false positives erode trust in the
list. Each group carries per-record file counts/sizes and a suggested
keeper (most files, then oldest id).
Duplicate COPIES reuse FileClustering signatures: a record is flagged
when two flat-file clusters share a stem under different numbering
styles, or two multi-file flat clusters cover near-identical total
runtime. Disc/part subfolders (CD1/CD2) never count; dir clusters are
excluded from this signal.
Settings → General gains a Duplicates section: on-demand scan, groups
with the keeper highlighted and record-only delete (files stay on disk),
and a duplicate-copies list linking to the book page, where multi-select
and Split Collection already handle the cleanup. The endpoint returns
actionable per-book rows so a future dashboard can surface the counts as
clickable categories.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
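The two-pass record grouping and keeper choice can be sketched as below. This is a minimal TypeScript sketch under stated assumptions. `BookRecord`, `findDuplicateGroups`, and `suggestKeeper` are hypothetical. `normalizeText` is a simple stand-in for TitleMatcher.Normalize, whose real rules aren't described here. The greedy pass that splits a title bucket into mutually compatible clusters is one way to apply the conservative exclusions, not necessarily the PR's.

```typescript
// Hypothetical record shape for the sweep.
interface BookRecord {
  id: number;
  asin?: string;
  title: string;
  subtitle?: string;
  year?: number;
  primaryAuthor: string;
  fileCount: number;
}

// Simple stand-in for TitleMatcher.Normalize (assumed rules).
function normalizeText(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Conservative exclusions: distinct non-empty ASINs never match, and
// different subtitles AND different years keep records apart.
function compatible(a: BookRecord, b: BookRecord): boolean {
  if (a.asin && b.asin && a.asin !== b.asin) return false;
  const subtitlesDiffer = normalizeText(a.subtitle ?? "") !== normalizeText(b.subtitle ?? "");
  const yearsDiffer = a.year !== undefined && b.year !== undefined && a.year !== b.year;
  return !(subtitlesDiffer && yearsDiffer);
}

function bucket(records: BookRecord[], keyOf: (r: BookRecord) => string | undefined): Map<string, BookRecord[]> {
  const buckets = new Map<string, BookRecord[]>();
  for (const r of records) {
    const key = keyOf(r);
    if (key === undefined) continue;
    const list = buckets.get(key) ?? [];
    list.push(r);
    buckets.set(key, list);
  }
  return buckets;
}

function findDuplicateGroups(records: BookRecord[]): BookRecord[][] {
  const groups: BookRecord[][] = [];
  // Pass 1: identical non-empty ASIN.
  bucket(records, (r) => r.asin || undefined).forEach((list) => {
    if (list.length > 1) groups.push(list);
  });
  const grouped = new Set<number>();
  groups.forEach((g) => g.forEach((r) => grouped.add(r.id)));

  // Pass 2: normalized title + primary author among the rest, split greedily
  // into clusters whose members are pairwise compatible.
  const rest = records.filter((r) => !grouped.has(r.id));
  bucket(rest, (r) => `${normalizeText(r.title)}|${normalizeText(r.primaryAuthor)}`).forEach((list) => {
    const clusters: BookRecord[][] = [];
    for (const r of list) {
      const home = clusters.find((c) => c.every((m) => compatible(m, r)));
      if (home) home.push(r);
      else clusters.push([r]);
    }
    clusters.filter((c) => c.length > 1).forEach((c) => groups.push(c));
  });
  return groups;
}

// Keeper: most files, then the oldest (lowest) id.
function suggestKeeper(group: BookRecord[]): BookRecord {
  return group.reduce((best, r) =>
    r.fileCount > best.fileCount || (r.fileCount === best.fileCount && r.id < best.id) ? r : best,
  );
}
```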
What
GET /library/duplicates (read-only sweep, no ffprobe) plus a Settings → General section.
Duplicate copies reuse FileClustering signatures: the same filename stem with a different numbering style, or two multi-file clusters with totals within 25%, flags a record as likely holding the same book twice (resolved on the book page via multi-select / split). Subdirectory clusters are excluded so CD1/CD2 layouts don't false-positive.
Per-book rows are returned (not just counts) so an upcoming dashboard can surface these as clickable health categories.
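The duplicate-copy flag can be sketched as a pairwise check over a record's flat clusters. `ClusterSummary`, `looksLikeDuplicateCopy`, and `totalsWithin` are hypothetical. The sketch assumes the 25% tolerance is measured relative to the larger total. Because the sweep doesn't run ffprobe, it also assumes cluster durations come from already-stored metadata.

```typescript
// Hypothetical cluster summary; "dir" clusters come from subdirectories.
interface ClusterSummary {
  kind: "dir" | "flat";
  stem: string;
  style: string;
  fileCount: number;
  totalSeconds: number;
}

// Assumption: the tolerance is relative to the larger of the two totals.
function totalsWithin(a: number, b: number, tolerance: number): boolean {
  const larger = Math.max(a, b);
  return larger > 0 && Math.abs(a - b) / larger <= tolerance;
}

function looksLikeDuplicateCopy(clusters: ClusterSummary[]): boolean {
  // Subdirectory clusters never count, so CD1/CD2 layouts don't trigger.
  const flat = clusters.filter((c) => c.kind === "flat");
  for (let i = 0; i < flat.length; i++) {
    for (let j = i + 1; j < flat.length; j++) {
      const a = flat[i];
      const b = flat[j];
      // Same stem under two numbering styles.
      if (a.stem === b.stem && a.style !== b.style) return true;
      // Two multi-file clusters with near-identical total runtime.
      if (a.fileCount > 1 && b.fileCount > 1 && totalsWithin(a.totalSeconds, b.totalSeconds, 0.25)) {
        return true;
      }
    }
  }
  return false;
}
```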
Tests
7 new workflow tests (group formation, both conservative exclusions, keeper choice, two-scheme copy detection, single-scheme non-flag). Full suite 1063/1063; vue-tsc (3.3.4) + vite build + eslint + prettier + vitest clean.
🤖 Generated with Claude Code