feat: media-use v2 (heygen-CLI-only media OS) + non-destructive studio crop#1923
Open
miguel-heygen wants to merge 43 commits into
Open
feat: media-use v2 (heygen-CLI-only media OS) + non-destructive studio crop#1923miguel-heygen wants to merge 43 commits into
miguel-heygen wants to merge 43 commits into
Conversation
Ordered, capability-based registry — search/generate/process slots, tried deterministically with heygen-CLI first. Disabled slots (fal aggregator, Iconify, voice TTS, local) carry their gate reason (B-Q1/B-Q2/U2) so enabling them later is a flag flip, not a refactor. resolve cascade now registry-driven; providers.mjs kept as a back-compat shim. Regenerated skills-manifest.
specs.mjs probes CPU/RAM/GPU/VRAM (Apple Silicon unified memory + nvidia-smi), injectable for tests. local-models.mjs is a declarative table of user-installed models per capability+tier (tts/asr/upscale) with size/needs/install/invoke; selectModel picks the highest runnable tier or recommends the CLI path. No license gate — models are user-installed, local-use-only.
fal aggregator + ElevenLabs/HeyGen-TTS voice providers built as real CLI adapters, registered default-OFF and flipped on by env flag (MEDIA_USE_ENABLE_*) read at call time. Bin's answer becomes a flag, not a code change. Tests prove off-by-default + flip-on ordering (heygen stays first).
runLocalModel(capability) picks the best tier for the machine (selectModel), checks the tool is on PATH, fills + runs its invoke template, returns the output path — or a clear install/CLI recommendation when not runnable. Injectable which/exec; never throws. Consumed by the process verb.
…ion (#11, #16) Auto-promote every fetched asset into the global content-addressed cache so it's reusable across all hyperframes projects (cachePut now dedups by sha; resolve promotes on register, non-fatal). search.mjs: paginated text search (no vector DB). usage.mjs: tag/partition assets by in-use vs unused (scans compositions) — powers the Studio Asset-tab filter.
resolve --from <file|direct-url> freezes a user-supplied asset + registers + auto-promotes it. isDirectMediaUrl rejects platform pages (no yt-dlp). brand resolve with no frame.md/design.md now upsells the HyperFrames design flow instead of a generic miss.
references/operations.md: local-tool recipes (ffmpeg cut/reframe/montage, auto-editor, scenedetect) + local-vs-HeyGen transform table (bg-removal/upscale/ lipsync/translate) with side-by-side guidance. Outputs register via --from (auto-promoted). Per OP1, media-use guides to the tools rather than re-wrapping ffmpeg/heygen as bespoke verbs.
Adds 'In use' / 'Unused' filter chips alongside the category filters, reusing the existing usedPaths (composition references) the in-use badge + used-first sort already compute. filterByUsage/countUsage extracted as pure functions + unit tested (6 vitest). Chips show only on a real used/unused split (no misleading 0s). Verified in agent-browser: Studio boots, project loads, Asset tab renders + filter path runs. Note: chip display binds to usePlayerStore.elements (same source as the existing badge); that store was empty in a synthetic dev project where the timeline is driven by the iframe __clipManifest — confirm/realign usedPaths source in real studio usage (follow-up; affects badge+sort+filter equally).
deriveUsedPaths extracted + hardened: element src is the raw authored value, so it can be relative (assets/x.png), ./-prefixed, the served /api/.../preview/ form, or carry a ?query — normalize all to the bare project path so it matches the asset-list entries (the in-use badge / sort / filter all key on this). The prior inline strip only handled the served form. Source stays usePlayerStore.elements (correct — hydrated from the timeline/__clipManifest); not switched. +2 vitest.
Premise: Bin approved B-Q1 (voice/TTS) + B-Q2 (third-party CLIs). fal (bgm/sfx/ image generate) and voice (ElevenLabs + heygen tts) flip from gated to live, marked paid. Cost guard (X4) ships with them: runProviders skips paid providers unless ctx.allowPaid; resolve gains --allow-paid (opt-in) + --local-only (block). Free heygen catalog stays first. Iconify stays gated (not named). SKILL providers table + flags documented. Tests updated (43 green). Skill consolidation/deletion (rest of #15) deliberately NOT done — separate, not implied by approval.
Move the audio engine (audio.mjs + lib + heygen-tts + wait-bgm + lyria-recipe + bundled SFX + references) as a self-contained subtree into skills/media-use/audio/ via git mv — zero internal path edits (HERE/../assets/sfx + lyria offsets preserved). CLI primitives still wrapped, not moved. Smoke test proves the bundled SFX library resolves from the new location. hyperframes-media now holds only SKILL.md (retired in U22).
…e/audio (U21) faceless-explainer / pr-to-video / product-launch-video wrappers: DEFAULT_ENGINE → media-use/audio/scripts/audio.mjs (engine resolves from each, verified). Fixed the relocated engine docs' stale self-paths (tts.md) + an inaccurate attribution comment. Grep gate: no skills/ reference to hyperframes-media/scripts remains (except general-video doc routing, swept in U22).
Delete the hyperframes-media skill (engine relocated in U20). Sweep all 32 referencing files → media-use: engine paths → media-use/audio/scripts, reference links → media-use/audio/references, skill-name/routing mentions → /media-use. Catalog dedup: drop the duplicate audio row from README/CLAUDE/skills.mdx (the single media-use row now covers resolve + audio), router + hyperframes-core domain list point to media-use, skill count 19→18. Manifest regenerated (18, hyperframes-media gone). Grep gate clean (only plan docs reference it as history). dist/ artifacts regenerate from source on build.
Merge the two /media-use rows in the hyperframes router capability map into one; fix the remaining 'all 19' install-count mentions in README/CLAUDE/skills.mdx → 18. Follow-on cleanup from the hyperframes-media retirement rename.
Frontmatter reframed to the full media OS (resolve/generate/operate/remember, incl. voice + the audio engine). New Audio engine section documents audio/scripts/audio.mjs, the request/meta contract, the auto-degrade switch, and the relocated per-topic references (tts/bgm/sfx/transcribe/remove-background/ captions). A reader can now do everything hyperframes-media offered from media-use alone.
…U24) Non-project-scoped GET /api/assets/global reads only ~/.media/manifest.jsonl and returns reusable records — the data behind the Studio Asset tab's cross-project view. Skips malformed lines (torn write won't 500). Registered in createStudioApi alongside the other route groups. 3 vitest. AssetsTab toggle next.
SKILL.md gains a 'What it owns' matrix mapping each HyperFrames media gap to its media-use entrypoint. coverage.test.mjs enforces every row — image/icon providers, voice + audio engine, consolidated engine + bundled SFX, ops reference, global cache + ingest, local-model tiers, every resolve type has a provider — so a claim can't silently rot. 7 tests.
This project / All projects toggle in the Asset tab. The global view is a self-contained GlobalAssetsView component (owns its /api/assets/global fetch + render) — keeps AssetsTab under the file-size cap and lowers its complexity. Lists the reusable cross-project cache (type + label); category/usage chips are local-only; search filters both. globalAssetRows pure helper + 3 vitest. Verified live in agent-browser: 'All projects' shows assets from ~/.media.
…cal-only M2: heygen.tts is free and first in the voice cascade; ElevenLabs is the paid fallback, so voice resolves by default without --allow-paid. M3: --local-only now skips every network provider, not just paid ones.
…m API M1: freezeUrl streams the body and aborts at the 256MB cap (and checks content-length first), so a hostile URL can't buffer past the cap into memory. m11: isDirectMediaUrl rejects loopback/private/link-local hosts on --from ingest. m13: /assets/global projects to public fields, no absolute cached_path to the browser.
- voice cascade in SKILL now reflects free HeyGen TTS first, ElevenLabs paid fallback - --local-only documented as 'skip every network provider', not just paid - drop the nonexistent 'organize --promote' CLI; promotion is automatic - point the local-model row at the actual runner (local-run.mjs) without overclaiming auto-cascade - operations.md: clip-path crop is a composition technique, not a shipped Studio feature - SFX library is 19 files, not 21; usage.mjs no longer claims it powers the Studio filter - clarify the faceless/pr-to-video audio adapters intentionally reuse the product-launch model
The voice provider guessed 'heygen voice tts --text', which the CLI rejects. Real command (verified against heygen v0.1.6 --help / --response-schema): 'heygen voice speech create --text <t> --voice-id <id>' returning data.audio_url + data.duration. Default voice-id = first starfish-engine voice (deterministic, overridable via ctx.voiceId). Verified live: a default voice resolve now produces a real audio file with no --allow-paid. Registry header no longer calls fal/voice flag-gated seams — they are live; Iconify is the only remaining gated provider.
Verified against the official docs (not guessed --help): - fal: use the genmedia CLI (genmedia run <model> --prompt .. --download .. --json -> downloaded_files[].path/url), not 'fal run --input'. The pip 'fal' is the serverless-deploy CLI and can't run hosted models. - ElevenLabs: the official @elevenlabs/cli is agents-only (no TTS). Use the community elevenlabs-cli (tts "<text>" --output <file>): positional text, file output. Both download a local file, returned as localPath for freezing. SKILL providers section now names the correct binaries + install/auth. Live generation still needs FAL_KEY / ELEVENLABS_API_KEY (absent here).
…mes CLI The 'When to use' line credited transcription and background removal to the audio engine; they actually come from the hyperframes CLI (transcribe, remove-background), as the audio-engine section already states. The engine itself does TTS / BGM / SFX / caption timing.
- Cost: paid generation is the agent's call, no user-confirmation prompt
(free-first still tries cache + heygen before anything paid).
- Local models: heygen free-usage is the default (TTS, bg-removal via the
heygen CLI); local open-source is the opt-out fallback only ('if user no,
then local'), not the headline answer. Coverage test description updated to
match while still asserting the fallback table is populated.
…nd --allow-paid Remove the third-party CLI adapters (fal via genmedia, ElevenLabs via elevenlabs-cli) and the paid-provider cost guard entirely; the heygen CLI is the only external CLI media-use shells. --local-only stays. A guard test asserts no fal/elevenlabs provider can silently return.
With fal/ElevenLabs gone and Iconify never enabled, every provider is always-on heygen (or the local design spec) — remove the envFlag/gated enablement layer entirely. The guard test now asserts only heygen / design_spec providers exist.
Crop a selected element visually: a Crop toggle in the Clip panel arms edge handles on the overlay; dragging live-previews clip-path inset on the iframe element and commits once through the existing inline-style patch path (undo/persist like any style edit). Per-side T/R/B/L inset fields join the Clip section; radius is preserved, insets clamped. Render-time only: the media file is untouched. Clip-path helpers move to clipPathHelpers.ts (re-exported) with 1- and 4-value inset parse/format support.
…-line handles, dim, hug Crop mode now behaves like an NLE crop tool: - Canvas toolbar gets a Crop button (shown whenever a croppable element is selected); crop mode lives in the player store so the toolbar, the Clip panel, and the overlay share one switch. - Double-click a selected element to enter crop mode (timestamp-based: selection re-keys the box on every click, so native dblclick never fires); Escape exits. - While cropping, the element's clip is lifted so the full content stays visible; the cropped-out region is dimmed and a crop frame with edge handles sits ON the crop lines (handles track the insets as you drag). - Drag end commits through the normal style path (one undo step per drag); leaving crop mode re-applies the committed crop. - Outside crop mode the selection box hugs the visible cropped region instead of the full element bounds.
The crop frame is now a drag surface: dragging it slides the whole visible region — opposing insets shift together so the crop size stays constant, clamped inside the element bounds. Same commit path as the edge handles (one undo step per drag).
…om the visible region Box-originated gestures (drag/resize/rotate) took their origin from the full-bounds overlayRect and wrote box.style directly, so after a drag the box sat at the element origin instead of hugging the crop. Pass the hugged rect (full OverlayRect shape) as the gesture's box geometry — the box now tracks the visible region through the whole gesture, and rotation centers on what's actually on screen.
…of moving it Gestures write the selection box's position directly during drags, so rendering the box at the hugged (cropped) rect made two writers disagree: after any drag the box parked at the full-bounds position (or shrank twice mid-drag). Keep the box at the element's full bounds — the basis the gesture machinery owns — and render the hug as the element's inset clip-path scaled into overlay space and applied to the box itself (the pre-existing pattern for clipped elements). The resize handle shifts to the visible region's corner so the clip doesn't swallow it.
…, rotate handle anchors to crop - The clip-on-box hug swallowed the box border everywhere the crop edge didn't touch the element edge (a heavy crop showed no outline at all). Drop the clip: the box stays border-less at full bounds and a child div draws the outline ON the crop boundary. - Clicking outside the element while crop mode is armed exits crop mode (selection kept) — crop UI swallows its own pointerdowns, so any pointerdown reaching the overlay is an outside click. - The rotate handle (extracted to DomEditRotateHandle) anchors to the crop outline instead of floating above the invisible full bounds.
The hover outline drew the element's full bounds, ignoring the crop. Shrink it with the same inset hug the selection box uses (display-only rect, nothing writes back to it).
Add toVisibleOverlayRect (projected rect shrunk by the element's inset crop) and use it wherever the overlay reasons about what's ON SCREEN: - snap targets: other elements' guides align to their visible edges - snap moving rect: dragging a cropped element snaps its visible edges - marquee: hit-test against the visible region, not cropped-out space - child dashes + group item rects (display; group commits use dx/dy) - off-canvas indicators: a crop that keeps the visible part on-canvas no longer flags the element as off-canvas The selection box keeps the full rect — it is the gesture coordinate basis; its hug is the crop-outline child.
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
| mkdirSync(dirname(destPath), { recursive: true }); | ||
| writeFileSync(destPath, bytes); | ||
| return bytes.length; | ||
| writeFileSync(destPath, Buffer.concat(chunks, total)); |
Two joins of capabilities the framework already ships: - transcript-cut: edit footage by editing its transcript. Word-level timestamps (hyperframes transcribe) + agent-decided removals (explicit ranges, word indices, filler words, long silences, or inverse --keep) compile into frame-accurate ffmpeg segments + concat; --plan dry-run, --copy fast mode. Output re-enters the ledger via resolve --from. - audio-duck: BGM ducking as declaration, not baking. Speech spans from audio_meta.json / transcribe.json word timestamps become a ready GSAP volume-tween block (attack/release shaped, sentence gaps bridged); the runtime already renders volume tweens identically in preview + render. - operations.md: text-based editing, bake-for-export sidechain duck, and two-pass loudnorm publish-loudness recipes. - SKILL.md gaps table + coverage test extended for both claims. Pure cores (compileCutList, speechSpans/duckKeyframes) are unit-tested; e2e cut verified by ffprobe (6s clip minus 1s cut = 5.02s output).
…rsing CodeQL flagged the inset() matcher (nested optional whitespace around a lazy group). Match the parenthesized payload unambiguously, trim, and collapse whitespace before the round-radius split.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two related tracks, cleanly separable by path (can split on request):
media-use v2 (
skills/media-use/)Implements the provider-CLIs decisions: one skill for every media need, heygen-CLI-only, free-first.
heygenCLI is the only external CLI media-use shells; no third-party CLIs, no--allow-paid.bgm,sfx,image,icon,voice(HeyGen TTS, free) + localbrand(frame.md/design.md tokens with design-flow upsell).hyperframes-mediaskill retired; the shared audio engine (TTS/BGM/SFX/captions) now lives undermedia-use/audio/. All catalog surfaces updated in lockstep (20 skills).~/.media), auto-promote, cache-on-first determinism,resolve --fromingest (local files + direct public URLs only, SSRF-guarded),--local-onlyoffline mode.resolve --from./api/assets/globalroute.Non-destructive Studio crop (
packages/studio/, decision OP2)Crop any element visually; persists as
clip-path: inset(...)through the normal style-commit path (one undo step per drag, original media untouched).toVisibleOverlayRectprimitive.Testing
Not in scope (parked by decision)