fix(design): correct kubevirt datamover restore workflow#2255
Conversation
Fix four issues in the qemu-img restore workflow validated against synthetic qcow2 files matching KubeVirt VMBackup output format: 1. Fix rebase order for 3+ checkpoints — each incremental must rebase onto its parent (chain), not all onto the full backup (flat). Flat rebase loses intermediate incremental changes (silent data loss). 2. Add required -F qcow2 backing format flag — modern qemu-img (v10.1+) fails without it. 3. Eliminate unnecessary dd step — qemu-img convert writes directly to block devices, removing the intermediate raw file and saving scratch space equal to the full virtual disk size. 4. Resolve FIXME on -u flag — unsafe rebase is correct because standalone qcow2 files from VMBackup have no backing file set, but the downloader must validate chain completeness and qcow2 headers before rebasing since -u skips all validation. Validated with test script at: https://gist.github.com/kaovilai/5ac5d2563d4d3c60be090475f2ac2c06 Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
Caution Review failedAn error occurred during the review process. Please try again later. Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
WalkthroughThe design document updates the KubeVirt datamover use case and rewrites the restore workflow to validate qcow2 chains, rebase incrementals in order, convert directly to the target block device, and delete scratch files. ChangesKubeVirt datamover documentation
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes 🚥 Pre-merge checks | ✅ 15✅ Passed checks (15 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Add inline references to the corrected restore workflow linking to: - KubeVirt VEP "Restore from Backup" section (chained rebase approach) - QEMU 6.1 changelog and qemu-img docs (-F flag requirement) - Validation gist with test evidence Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
- Specify that expected chain comes from per-VM manifest checkpointChain - Reword "valid qcow2 header" to "readable qcow2 image" to match what the qemu-img info check actually proves Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Replace vague "ensure not using CSI snapshots or fs-backup" with concrete guidance: set volume policy action to "kubevirt" for VM PVCs, which the BIA plugin uses to route volumes to the kubevirt datamover. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
| - Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental) | ||
| - Assuming that kubevirt incremental volume backups should be much faster than CSI snapshots followed by incremental kopia snapshot copy, the expectation is that users might run kubevirt datamover backups more frequently than they would for CSI-based backups. | ||
| - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup. | ||
| - Users should set the volume policy action to `"kubevirt"` for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`). |
There was a problem hiding this comment.
| - Users should set the volume policy action to `"kubevirt"` for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`). | |
| - Users should set the volume policy action to `"custom"` with `"datamover: kubevirt"` in the action parameters for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`). |
There was a problem hiding this comment.
The "kubevirt"->"custom" change is made elsewhere in the design doc in #2254
Apply Scott's review suggestion — volume policy action is "custom" with "datamover: kubevirt" in action parameters, not a bare "kubevirt" action. This aligns with the volume policy changes in PR openshift#2254. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
/lgtm |
|
/hold for more review time. will unhold in a bit. |
|
@kaovilai: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kaovilai, shubham-pampattiwar, sseago The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold cancel |
Summary
Fixes four issues in the kubevirt datamover design doc's qemu-img restore workflow, validated by manually testing with synthetic qcow2 files matching KubeVirt VMBackup output format (standalone files, no backing file references):
Fix rebase order for 3+ checkpoints (CRITICAL) — each incremental must rebase onto its parent in chain order (
inc1→full,inc2→inc1), not all onto the full backup. Flat rebase loses intermediate incremental changes (silent data loss).Add required
-F qcow2backing format flag — modern qemu-img (v10.1+ in CentOS Stream 9) fails without it:"backing format must be specified".Eliminate unnecessary
ddstep —qemu-img convertwrites directly to block devices, removing the intermediate raw file and saving scratch space equal to the full virtual disk size.Resolve FIXME on
-uflag — unsafe rebase is correct because standalone qcow2 files from VMBackup have no backing file set. However,-uskips all validation, so the downloader must validate chain completeness and qcow2 headers before rebasing.Validation
Test script and detailed findings: https://gist.github.com/kaovilai/5ac5d2563d4d3c60be090475f2ac2c06
Test plan
docs/design/kubevirt-datamover.md🤖 Generated with Claude Code
Summary by CodeRabbit
qemu-imgguidance.