feat(library): "Split collection" — break a multi-book record into its real books#722

Draft
kevinheneveld wants to merge 3 commits into Listenarrs:canary from kevinheneveld:feat/split-collection


@kevinheneveld

Contributor

Stacked on #720 (which stacks on #719): the split's apply phase is a sequence of those PRs' file-transfer and per-file-delete calls. Only the top commit (feat(library): "Split collection"…) is new here; the diff will collapse to that commit alone once the base PRs merge.

What

A Split Collection action on the audiobook detail page for multi-book dumps imported onto one record (the motivating live case: 773 files spanning 34 books on a single record).

  • GET /library/{id}/split/preview (LibrarySplitPreviewWorkflow, read-only) clusters the record's files into per-book groups:
    1. Subdirectory: the strongest signal, which wins even over embedded tags (trusting a mis-tag shared across folders merged genuinely different books).
    2. Embedded Album/Title tag for flat files — a collection bulk-renamed to the parent record's name has useless filenames, but each file's tag still names the real book.
    3. Filename stem with track/list numbering stripped (friday-01_77.mp3 … friday-44_77.mp3 → one friday group). The numbering style stays part of a group's identity so two copies of one book (Title-NN vs Title (N)) yield two groups; a trailing volume designator's own number is preserved so Vol. 1/Vol. 2 don't collapse; an unnumbered file joins its numbered siblings only when duration/size says it plausibly is one track.
  • Each group gets a suggested destination: same-author records first, longest normalized-title containment wins (SplitDestinationSuggester + a new punctuation-tolerant TitleMatcher.Normalize).
  • Applying is client-driven: per group, move (existing transfer endpoint), delete (existing per-file delete, behind a danger confirm), or leave — with per-group progress and a summary toast.
  • Determinism: only the raw ffprobe calls run in parallel (bounded to 4 at a time). Higher-level metadata reads share a request-scoped, non-thread-safe DbContext; running them concurrently made the same file yield its tag on one run and a blank on the next, which produced non-deterministic clusters. If ffprobe is unavailable, the preview degrades gracefully to path/stem clustering.
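
For illustration only, the filename-stem rule (step 3) might look roughly like the sketch below. The PR's actual FileClustering code is not shown here, so `stemKey`, `clusterByStem`, and both regular expressions are hypothetical stand-ins. The unnumbered-sibling merge that relies on duration/size is deliberately left out.

```typescript
// Hypothetical names throughout: this only illustrates the stem rule
// as described in the list above, not the PR's real implementation.
type NumberingStyle = "dash" | "paren" | "none";

interface StemKey {
  stem: string; // lower-cased name with the track number removed
  style: NumberingStyle; // how the track number was written
}

function stemKey(fileName: string): StemKey {
  // Drop a short alphanumeric extension: "friday-01_77.mp3" -> "friday-01_77".
  const base = fileName.replace(/\.[A-Za-z0-9]{1,5}$/, "");

  // "Title (3)" style.
  const paren = /^(.*?)\s*\(\d+\)$/.exec(base);
  if (paren) {
    return { stem: (paren[1] ?? "").trim().toLowerCase(), style: "paren" };
  }

  // "Title-03" or "Title-03_77" (track N of M) style. Only the number
  // after the last dash is stripped, so "Vol. 1" stays in the stem.
  const dash = /^(.*?)\s*-\s*\d+(?:_\d+)?$/.exec(base);
  if (dash) {
    return { stem: (dash[1] ?? "").trim().toLowerCase(), style: "dash" };
  }

  return { stem: base.trim().toLowerCase(), style: "none" };
}

function clusterByStem(fileNames: readonly string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of fileNames) {
    const { stem, style } = stemKey(name);
    // The numbering style is part of the key, so "Title-NN" and
    // "Title (N)" copies of the same book land in separate groups.
    const key = `${stem}|${style}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(name);
    groups.set(key, bucket);
  }
  return groups;
}
```

With this sketch, `friday-01_77.mp3` and `friday-44_77.mp3` share the key `friday|dash`, while `Book Vol. 1-01.mp3` and `Book Vol. 2-01.mp3` stay apart because only the track number after the dash is removed.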

Tests

FileClustering is pure and deterministic; its unit tests use real filename shapes from the live collection that motivated the feature (subdirectory-wins, stacked markers, volume preservation, duplicate-copy separation, unnumbered-first-track merge, embedded-tag clustering). Full suite: 1056/1056 passing. vue-tsc + eslint + prettier clean.

🤖 Generated with Claude Code

kevinheneveld and others added 3 commits July 1, 2026 17:39
When files actually belong to a different book the library already
tracks (two books' tracks imported onto one record, or a collection
being split into its real books), a "Move Files to Another Book" action
on the detail page opens a transfer dialog with per-file checkboxes and
a library search ranked by token overlap.

POST /library/{id}/files/transfer: DB ownership is always reassigned;
the physical file is moved into the destination folder on a best-effort
basis (a failed move leaves the file in place with a warning, and
Organize can relocate it later).
Collisions with a row the target already owns at the same path are
detected BEFORE any disk move and skipped with a warning, and any
per-file DB failure becomes a warning rather than aborting the whole
transfer. A source left without audio gets its legacy single-file
columns reset; history entries land on both records.
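
A rough sketch of the "detect collisions before any disk move" ordering,
assuming a collision means the target already owns a row with the same
path. `FileRow`, `TransferPlan`, and `planTransfer` are hypothetical names
for illustration, not the endpoint's real types:

```typescript
// Hypothetical types: the real endpoint's models are not shown in this PR text.
interface FileRow {
  id: number;
  path: string;
}

interface TransferPlan {
  toReassign: FileRow[]; // safe to reassign (and then try to move on disk)
  warnings: string[]; // one entry per skipped file
}

// Decide which files may be transferred BEFORE touching the disk, so a
// collision never leaves a half-moved file behind.
function planTransfer(
  selected: readonly FileRow[],
  targetOwnedPaths: ReadonlySet<string>,
): TransferPlan {
  const toReassign: FileRow[] = [];
  const warnings: string[] = [];
  for (const file of selected) {
    if (targetOwnedPaths.has(file.path)) {
      // Skip with a warning instead of aborting the whole transfer.
      warnings.push(`Skipped ${file.path}: the target already owns a file at this path`);
      continue;
    }
    toReassign.push(file);
  }
  return { toReassign, warnings };
}
```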

Row reassignment uses a targeted IAudiobookFileRepository.ReassignAsync
(detached stub, only the reassigned columns marked modified) so bulk
transfers can't trip EF identity-map conflicts on overlapping
navigation graphs.

Warning/error toasts are now sticky by default (dismissed via the close
button) — transfer warnings carry information the user needs to read
and act on; info/success keep the 5s auto-dismiss.
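
The new default could be expressed as a small lookup like the one below.
`ToastLevel` and `defaultToastDurationMs` are hypothetical names, and
using `null` to mean "sticky" is an assumption of this sketch:

```typescript
// Hypothetical helper: the toast component's real API is not shown here.
type ToastLevel = "info" | "success" | "warning" | "error";

// Returns the auto-dismiss delay in milliseconds, or null for a sticky
// toast that stays until the user presses the close button.
function defaultToastDurationMs(level: ToastLevel): number | null {
  return level === "warning" || level === "error" ? null : 5000;
}
```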

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…k detail page

Deleting duplicates or moving a subset of files one at a time doesn't
scale (e.g. a duplicated book: the same audio under two different
filename schemes on one record). Add a selection checkbox to each file
row (shift-click extends a range) and a toolbar with select-all, "Move
selected…" and "Delete selected". Bulk move hands the selection to
TransferFilesModal via its initialFileIds preselection; bulk delete
loops a per-file delete behind one confirm with a shared "also delete
from disk" option and a summary toast. Selection clears on reload.
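
One way to model the checkbox behaviour is a pure function over the
current selection. This is only a sketch: `applyRowClick` and the "anchor"
(the previously clicked row) are assumptions, not the component's actual
code.

```typescript
// Hypothetical helper: returns the new selection after a click on one row.
function applyRowClick(
  orderedIds: readonly number[], // file ids in the order the rows are displayed
  selected: ReadonlySet<number>, // current selection
  clickedId: number,
  anchorId: number | null, // row clicked just before this one, if any
  shiftKey: boolean,
): Set<number> {
  const next = new Set<number>(selected);
  const clickedIndex = orderedIds.indexOf(clickedId);
  const anchorIndex = anchorId === null ? -1 : orderedIds.indexOf(anchorId);

  if (shiftKey && clickedIndex >= 0 && anchorIndex >= 0) {
    // Shift-click: add every row between the anchor and the clicked row.
    const from = Math.min(clickedIndex, anchorIndex);
    const to = Math.max(clickedIndex, anchorIndex);
    for (const id of orderedIds.slice(from, to + 1)) {
      next.add(id);
    }
    return next;
  }

  // Plain click: toggle just this row.
  if (next.has(clickedId)) {
    next.delete(clickedId);
  } else {
    next.add(clickedId);
  }
  return next;
}
```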

The per-file delete this rides on is new here too:
DELETE /library/{id}/files/{fileId}?deleteFromDisk= — removes the
AudiobookFile row (ownership guarded: the file must belong to the
addressed audiobook), optionally deletes the file from disk
(disk-delete failures surface as warnings; the row is still removed),
and records a "File Removed" history entry.
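
The order of checks can be sketched against an in-memory map. Everything
here (`AudiobookFile` as a plain interface, `deleteFileRow`, the injected
`unlink` callback, the result shape) is a hypothetical model of the
behaviour described above, not the server code:

```typescript
// In-memory model for illustration; the real handler works against the database.
interface AudiobookFile {
  id: number;
  audiobookId: number;
  path: string;
}

interface DeleteResult {
  removed: boolean;
  warnings: string[];
  history: string[];
}

function deleteFileRow(
  rows: Map<number, AudiobookFile>,
  audiobookId: number,
  fileId: number,
  deleteFromDisk: boolean,
  unlink: (path: string) => void, // assumed to throw when the disk delete fails
): DeleteResult {
  const row = rows.get(fileId);
  // Ownership guard: the file must belong to the addressed audiobook.
  if (!row || row.audiobookId !== audiobookId) {
    return { removed: false, warnings: [], history: [] };
  }

  const warnings: string[] = [];
  if (deleteFromDisk) {
    try {
      unlink(row.path);
    } catch (err) {
      // A disk failure is only a warning; the row is still removed below.
      warnings.push(`Could not delete ${row.path} from disk: ${String(err)}`);
    }
  }

  rows.delete(fileId);
  return { removed: true, warnings, history: ["File Removed"] };
}
```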

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s real books

A multi-book dump imported onto one record (live case: 773 files
spanning 34 books) previously required manual filesystem surgery. A new
"Split Collection" action clusters the record's files into per-book
groups and lets each group be moved to the record it belongs to,
deleted (a redundant duplicate copy), or left alone.

GET /library/{id}/split/preview (read-only): clusters by subdirectory
first (strongest signal), then embedded Album/Title tag for flat files
(a bulk-renamed collection has useless filenames but the tags still
name the real book), then filename stem with track/list numbering
stripped. The numbering STYLE stays part of a group's identity so two
copies of one book ("Title-NN" vs "Title (N)") yield two groups; a
trailing volume designator's number is kept so volumes don't collapse;
an unnumbered file merges into its numbered siblings only when
duration/size says it plausibly IS one track. Only the raw ffprobe runs
in parallel — higher-level metadata reads share a non-thread-safe
DbContext and probing them concurrently made clusters
non-deterministic. Each group gets a suggested destination: same-author
records first, longest normalized-title containment wins
(SplitDestinationSuggester + TitleMatcher.Normalize).
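
As a rough illustration of "same-author records first, longest
normalized-title containment wins", assuming containment may run in either
direction and using an ASCII-only normalizer. `normalizeTitle` and
`suggestDestination` are hypothetical stand-ins for TitleMatcher.Normalize
and SplitDestinationSuggester, whose real logic is not shown here:

```typescript
// Hypothetical sketch; names and scoring details are assumptions.
interface LibraryRecord {
  id: number;
  title: string;
  author: string;
}

// Punctuation-tolerant: lower-case, then collapse anything that is not an
// ASCII letter or digit into a single space.
function normalizeTitle(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function suggestDestination(
  groupName: string,
  sourceAuthor: string,
  candidates: readonly LibraryRecord[],
): LibraryRecord | null {
  const needle = normalizeTitle(groupName);
  if (needle.length === 0) return null;
  const author = normalizeTitle(sourceAuthor);

  let best: LibraryRecord | null = null;
  let bestSameAuthor = false;
  let bestLength = 0;

  for (const candidate of candidates) {
    const title = normalizeTitle(candidate.title);
    if (title.length === 0) continue;
    // Containment in either direction counts as a match.
    const contained = needle.indexOf(title) !== -1 || title.indexOf(needle) !== -1;
    if (!contained) continue;

    const sameAuthor = normalizeTitle(candidate.author) === author;
    const length = Math.min(title.length, needle.length);
    // Same-author matches outrank all others; within a tier the longest wins.
    const better =
      best === null ||
      (sameAuthor && !bestSameAuthor) ||
      (sameAuthor === bestSameAuthor && length > bestLength);
    if (better) {
      best = candidate;
      bestSameAuthor = sameAuthor;
      bestLength = length;
    }
  }
  return best;
}
```

Plain substring containment can over-match (a short title inside a longer
unrelated word), so a real suggester would likely want token-boundary
checks on top of this.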

Applying is client-driven: a sequence of the existing file-transfer and
per-file-delete calls per group, with per-group progress and a summary
toast. FileClustering is pure and deterministic, covered by unit tests
with fixtures from the live collection that motivated the feature.
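
The client-side sequence can be sketched as a pure planner that turns
per-group decisions into the list of calls to issue. The two paths come
from the endpoints named in this PR; the transfer body fields (`fileIds`,
`targetId`), the boolean `deleteFromDisk` value, and the names
`GroupDecision` and `planApplyCalls` are assumptions for illustration:

```typescript
// Hypothetical shapes: the real request payloads are not shown in this PR text.
type GroupDecision =
  | { action: "move"; fileIds: number[]; targetId: number }
  | { action: "delete"; fileIds: number[]; deleteFromDisk: boolean }
  | { action: "leave"; fileIds: number[] };

interface PlannedCall {
  method: "POST" | "DELETE";
  path: string;
  body?: { fileIds: number[]; targetId: number };
}

function planApplyCalls(
  sourceId: number,
  decisions: readonly GroupDecision[],
): PlannedCall[] {
  const calls: PlannedCall[] = [];
  for (const decision of decisions) {
    switch (decision.action) {
      case "move":
        // One transfer call per group.
        calls.push({
          method: "POST",
          path: `/library/${sourceId}/files/transfer`,
          body: { fileIds: decision.fileIds, targetId: decision.targetId },
        });
        break;
      case "delete":
        // The per-file delete endpoint is called once for each file.
        for (const fileId of decision.fileIds) {
          calls.push({
            method: "DELETE",
            path: `/library/${sourceId}/files/${fileId}?deleteFromDisk=${decision.deleteFromDisk}`,
          });
        }
        break;
      case "leave":
        // Nothing to send for groups the user leaves in place.
        break;
    }
  }
  return calls;
}
```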

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>