Skip to content

fix(vm): allow disk updates when VM is stopped#2407

Open
hardcoretime wants to merge 1 commit into
mainfrom
fix/vm-allow-disk-update-when-vm-stopped
Open

fix(vm): allow disk updates when VM is stopped#2407
hardcoretime wants to merge 1 commit into
mainfrom
fix/vm-allow-disk-update-when-vm-stopped

Conversation

@hardcoretime
Copy link
Copy Markdown
Contributor

Description

Fixed the issue where disk changes on a stopped VirtualMachine with WaitForFirstConsumer (wffc) volume binding mode were not being applied to the KVVM (intvirtvm) resource.

Changes made to pkg/controller/kvbuilder/kvvm_utils.go:

  1. Added isVmRunning check to the paravirtualization condition in applyBlockDeviceRefs to allow new disks when VM is stopped
  2. Updated cleanupRemovedStaticDisks function to accept isVmRunning parameter and remove all disks unconditionally when VM is stopped (not just non-hotpluggable ones)

Added unit tests in pkg/controller/kvbuilder/kvvm_utils_test.go to verify:

  • When VM is stopped: all disks not in VM spec are removed (regardless of hotpluggable flag)
  • When VM is running: only non-hotpluggable disks not in VM spec are removed

Why do we need it, and what problem does it solve?

The DataExport e2e test was failing when using storage class with WaitForFirstConsumer (wffc) volume binding mode. The scenario:

  1. Create a VM with one disk (vd-data)
  2. Stop the VM
  3. Run DataExport which:
    • Removes the original disk (vd-data)
    • Creates new disks from snapshot/source (vd-restored-from-disk, vd-restored-from-snapshot)
  4. Start the VM

The VM got stuck in starting phase because:

  • Old disk (vd-data) was NOT removed from KVVM (because it was marked as hotpluggable)
  • New restored disks were NOT added to KVVM (because paravirtualization was enabled and VM had existing volumes)

This happened because the code was designed for running VMs with hotplug handler, not for stopped VMs where changes should be applied directly.

What is the expected result?

When a VM is stopped and disks are changed in the VM spec (disks added/removed), the KVVM resource should be updated accordingly. After the fix:

  • Old disks not in spec are removed from KVVM (when VM is stopped)
  • New disks in spec are added to KVVM (when VM is stopped)
  • For running VMs, the behavior remains unchanged (hotplug handler manages disk changes)

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: vm
type: fix
summary: "Fixed disk update for stopped VMs with WaitForFirstConsumer storage class. Previously, when a VM was stopped and disks were changed, the KVVM was not updated, causing the VM to get stuck in starting phase."

@hardcoretime hardcoretime requested review from loktev-d and removed request for Isteb4k and yaroslavborbat May 26, 2026 23:19
@hardcoretime hardcoretime added this to the v1.9.0 milestone May 26, 2026
When a VM with paravirtualization is stopped and disks are changed in
VM spec, the KVVM was not being updated. This caused the old disks
to remain and new disks to not be added.

Additionally, added wait for 'lsblk' command to be ready in the
block_device_hotplug e2e test to prevent intermittent failures.

Changes:
- Add isVmRunning check to paravirtualization condition in applyBlockDeviceRefs
  to allow new disks when VM is stopped
- Update cleanupRemovedStaticDisks to remove all disks unconditionally
  when VM is stopped
- Add unit tests for cleanupRemovedStaticDisks with isVmRunning parameter
- Add util.UntilGuestCommandsReady for lsblk in block_device_hotplug test

Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
@hardcoretime hardcoretime force-pushed the fix/vm-allow-disk-update-when-vm-stopped branch from 0622ef2 to 01e81fd Compare May 26, 2026 23:32
@hardcoretime hardcoretime added the e2e/run Run e2e test on cluster of PR author label May 27, 2026
@deckhouse-BOaTswain
Copy link
Copy Markdown
Contributor

Workflow has started.
Follow the progress here: Workflow Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

e2e/run Run e2e test on cluster of PR author

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants