Skip to content

fix(design): correct kubevirt datamover restore workflow#2255

Merged
openshift-merge-bot[bot] merged 5 commits into
openshift:oadp-devfrom
kaovilai:fix/kubevirt-datamover-restore-workflow
Jun 25, 2026
Merged

fix(design): correct kubevirt datamover restore workflow#2255
openshift-merge-bot[bot] merged 5 commits into
openshift:oadp-devfrom
kaovilai:fix/kubevirt-datamover-restore-workflow

Conversation

@kaovilai

@kaovilai kaovilai commented Jun 25, 2026

Copy link
Copy Markdown
Member

Summary

Fixes four issues in the kubevirt datamover design doc's qemu-img restore workflow, validated by manually testing with synthetic qcow2 files matching KubeVirt VMBackup output format (standalone files, no backing file references):

  1. Fix rebase order for 3+ checkpoints (CRITICAL) — each incremental must rebase onto its parent in chain order (inc1→full, inc2→inc1), not all onto the full backup. Flat rebase loses intermediate incremental changes (silent data loss).

  2. Add required -F qcow2 backing format flag — modern qemu-img (v10.1+ in CentOS Stream 9) fails without it: "backing format must be specified".

  3. Eliminate unnecessary dd stepqemu-img convert writes directly to block devices, removing the intermediate raw file and saving scratch space equal to the full virtual disk size.

  4. Resolve FIXME on -u flag — unsafe rebase is correct because standalone qcow2 files from VMBackup have no backing file set. However, -u skips all validation, so the downloader must validate chain completeness and qcow2 headers before rebasing.

Validation

Test script and detailed findings: https://gist.github.com/kaovilai/5ac5d2563d4d3c60be090475f2ac2c06

Test Result
VEP chain rebase (inc1→full, inc2→inc1) PASS — md5sums match
Design doc flat rebase (all→full) FAIL — intermediate changes lost
Direct convert to target (skip dd) PASS — dd step unnecessary
Missing file in chain Silent data loss — rebase -u succeeds but output wrong
Corrupt qcow2 header Not detected — rebase -u returns exit code 0

Test plan

  • Review updated restore workflow steps in docs/design/kubevirt-datamover.md
  • Verify rebase commands match KubeVirt VEP
  • Confirm validation requirements are documented before rebase step

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Updated the kubevirt datamover “DataDownload reconciler (restore)” workflow to use an explicit qcow2 chained restore approach.
    • Documented restoring by downloading required qcow2 files, pre-validating each intermediate in the chain, rebasing incrementals onto their immediate parent, and converting the top-of-chain qcow2 directly to the target device.
    • Clarified safer restore behavior to reduce risk of silent intermediary loss/corruption, including relevant qemu-img guidance.
    • Added a concrete Velero volume policy configuration for VM PVCs to prevent overlapping snapshot/backup actions.

Fix four issues in the qemu-img restore workflow validated against
synthetic qcow2 files matching KubeVirt VMBackup output format:

1. Fix rebase order for 3+ checkpoints — each incremental must rebase
   onto its parent (chain), not all onto the full backup (flat). Flat
   rebase loses intermediate incremental changes (silent data loss).

2. Add required -F qcow2 backing format flag — modern qemu-img (v10.1+)
   fails without it.

3. Eliminate unnecessary dd step — qemu-img convert writes directly to
   block devices, removing the intermediate raw file and saving scratch
   space equal to the full virtual disk size.

4. Resolve FIXME on -u flag — unsafe rebase is correct because
   standalone qcow2 files from VMBackup have no backing file set, but
   the downloader must validate chain completeness and qcow2 headers
   before rebasing since -u skips all validation.

Validated with test script at:
https://gist.github.com/kaovilai/5ac5d2563d4d3c60be090475f2ac2c06

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

An error occurred during the review process. Please try again later.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a4363f39-3338-4477-8bce-9d079e0d9724

📥 Commits

Reviewing files that changed from the base of the PR and between de99360 and ba71534.

📒 Files selected for processing (1)
  • docs/design/kubevirt-datamover.md
✅ Files skipped from review due to trivial changes (1)
  • docs/design/kubevirt-datamover.md

Walkthrough

The design document updates the KubeVirt datamover use case and rewrites the restore workflow to validate qcow2 chains, rebase incrementals in order, convert directly to the target block device, and delete scratch files.

Changes

KubeVirt datamover documentation

Layer / File(s) Summary
PVC policy guidance
docs/design/kubevirt-datamover.md
The use-case text adds a Velero volume policy action of custom with datamover: kubevirt for VM PVCs.
Chained qcow2 restore flow
docs/design/kubevirt-datamover.md
The restore procedure now validates checkpoint-chain qcow2 files, rebases incrementals onto their parents, converts the top qcow2 directly to the target block device, and deletes scratch files, with updated backing-format and validation references.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and accurately summarizes the main change: correcting the kubevirt datamover restore workflow.
Description check ✅ Passed The description covers the motivation, validation, and test plan, though it uses different headings than the template.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PASS: The PR touches a design doc only; that file contains no Ginkgo test titles or dynamic test-name patterns.
Test Structure And Quality ✅ Passed The PR only changes docs/design/kubevirt-datamover.md; no Ginkgo test code or test files are modified, so this test-quality check is not applicable.
Microshift Test Compatibility ✅ Passed PR only updates docs/design/kubevirt-datamover.md; no new Ginkgo e2e tests or MicroShift-sensitive APIs/features were added.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Docs-only change in docs/design/kubevirt-datamover.md; no new Ginkgo e2e tests were added, so SNO cluster assumptions are not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed Only docs/design/kubevirt-datamover.md changed; no manifests, operator code, or controllers were modified, so no topology-aware scheduling risk is introduced.
Ote Binary Stdout Contract ✅ Passed No stdout writes in process-level code; main/init/BeforeSuite use stderr/GinkgoWriter, and remaining fmt.Printf calls are inside spec/helper bodies.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PASS: The PR only edits docs/design/kubevirt-datamover.md; no Ginkgo e2e tests or runtime networking code were added, so the IPv6/disconnected check is not applicable.
No-Weak-Crypto ✅ Passed Changed doc adds qemu-img restore steps only; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons found.
Container-Privileges ✅ Passed Only docs/design/kubevirt-datamover.md changed; no container/K8s manifests or privilege settings were added.
No-Sensitive-Data-In-Logs ✅ Passed Only the design doc was updated; scans found no passwords/tokens/secrets/PII or log statements, just generic placeholder examples.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

kaovilai and others added 2 commits June 25, 2026 16:19
Add inline references to the corrected restore workflow linking to:
- KubeVirt VEP "Restore from Backup" section (chained rebase approach)
- QEMU 6.1 changelog and qemu-img docs (-F flag requirement)
- Validation gist with test evidence

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
- Specify that expected chain comes from per-VM manifest checkpointChain
- Reword "valid qcow2 header" to "readable qcow2 image" to match what
  the qemu-img info check actually proves

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@openshift-ci openshift-ci Bot requested review from mpryc and sseago June 25, 2026 20:26
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2026
Replace vague "ensure not using CSI snapshots or fs-backup" with
concrete guidance: set volume policy action to "kubevirt" for VM PVCs,
which the BIA plugin uses to route volumes to the kubevirt datamover.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Comment thread docs/design/kubevirt-datamover.md Outdated
- Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental)
- Assuming that kubevirt incremental volume backups should be much faster than CSI snapshots followed by incremental kopia snapshot copy, the expectation is that users might run kubevirt datamover backups more frequently than they would for CSI-based backups.
- If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup.
- Users should set the volume policy action to `"kubevirt"` for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Users should set the volume policy action to `"kubevirt"` for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`).
- Users should set the volume policy action to `"custom"` with `"datamover: kubevirt"` in the action parameters for VM PVCs in the backup's resource/volume policy configuration. This prevents Velero from also taking CSI snapshots or fs-backups of the same volumes (the BIA plugin skips any PVC whose action is not `"kubevirt"` or `"skip"`).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "kubevirt"->"custom" change is made elsewhere in the design doc in #2254

Apply Scott's review suggestion — volume policy action is "custom" with
"datamover: kubevirt" in action parameters, not a bare "kubevirt" action.
This aligns with the volume policy changes in PR openshift#2254.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@kaovilai

Copy link
Copy Markdown
Member Author

Note

Responses generated with Claude

Applied Scott's suggestion in ba71534 — updated to "custom" with "datamover: kubevirt" to align with the volume policy changes in #2254.

@sseago

sseago commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

/lgtm

@kaovilai

Copy link
Copy Markdown
Member Author

/hold for more review time. will unhold in a bit.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2026
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 25, 2026
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

@kaovilai: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, shubham-pampattiwar, sseago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kaovilai,shubham-pampattiwar,sseago]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kaovilai

Copy link
Copy Markdown
Member Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 4d450a0 into openshift:oadp-dev Jun 25, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants