
[WIP] Preserve UEFI firmware across hibernation in OpenHCL #3771

Open
mebersol wants to merge 16 commits into microsoft:main from mebersol:user/mebersol/openhcl-hibernation-firmware

Conversation

@mebersol

Collaborator

Summary

Adds support for preserving the exact UEFI firmware image across a hibernate/resume cycle for hibernation-enabled (non-isolated) VMs in OpenHCL. This guarantees a resumed guest sees an identical firmware-provided view (ACPI/SMBIOS/memory layout) even if the host's OpenHCL/UEFI changed between the original cold boot and the resume.

Motivation

A hibernating guest captures OS state that assumes a specific firmware-provided hardware view. If the firmware image differs after resume, that view can shift and break the resumed guest. Legacy HCL guards against this with a firmware-version "power token". This change goes further: it snapshots the actual firmware image into VMGS, so the resumed guest runs on a bit-for-bit identical image.

What it does

  • VMGS layout: Renames FileId 14 to HIBERNATION_TOKEN (an 8-byte little-endian token) and adds FileId 19, HIBERNATION_FIRMWARE, to hold the firmware image snapshot. vmgstool understands both names.
  • Cold boot: Snapshots the pristine UEFI firmware image (as loaded from IGVM, before the dynamic config blob is written into VTL0) from guest RAM into the HIBERNATION_FIRMWARE VMGS file.
  • Hibernate: Writes the hibernate token. The token records that the matching VHD is in a hibernated state and whether a firmware image was stored.
  • Resume: If the token indicates a firmware image was stored, restores it from VMGS into VTL0 guest RAM. The token is cleared on every resume so a later clean boot never restores a stale image.
  • vmgs_broker: Adds delete_file to the broker/client so the halt task and boot path can manage the token concurrently with the worker.
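The lifecycle above can be sketched end to end against an in-memory stand-in for VMGS. This is a minimal, hypothetical model and not the OpenHCL implementation. FakeVmgs, boot, and hibernate are illustrative names. Only the file IDs (14, 19) and the u64::MAX / 0x1 token values come from this PR. The sketch shows the intended property: after a hibernate, a resume on updated host firmware still receives the originally snapshotted image, and the token is consumed so the next clean boot uses the host image.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

// File IDs from this PR's VMGS layout change.
const HIBERNATION_TOKEN: u32 = 14;
const HIBERNATION_FIRMWARE: u32 = 19;
// Token values from the commit history: u64::MAX = image stored, 0x1 = hibernated without one.
const FIRMWARE_STORED: u64 = u64::MAX;
const HIBERNATED_NO_FIRMWARE: u64 = 0x1;

/// Hypothetical in-memory stand-in for a VMGS file store (not the vmgs crate API).
#[derive(Default)]
struct FakeVmgs {
    files: HashMap<u32, Vec<u8>>,
}

/// Boot path: returns the firmware image that would be placed in VTL0 guest RAM.
fn boot(vmgs: &mut FakeVmgs, host_firmware: &[u8]) -> Vec<u8> {
    // Read and consume the token in one step, so in this sketch no token survives a boot.
    let token = vmgs
        .files
        .remove(&HIBERNATION_TOKEN)
        .and_then(|buf| <[u8; 8]>::try_from(buf.as_slice()).ok())
        .map(u64::from_le_bytes);
    if token == Some(FIRMWARE_STORED) {
        // Resume: hand back the snapshot, not the (possibly newer) host image.
        // The PR treats a failed restore as a real error; this sketch just falls through.
        if let Some(saved) = vmgs.files.get(&HIBERNATION_FIRMWARE) {
            return saved.clone();
        }
    }
    // Otherwise, snapshot the pristine host image before any config is written into it.
    vmgs.files.insert(HIBERNATION_FIRMWARE, host_firmware.to_vec());
    host_firmware.to_vec()
}

/// Hibernate: write a token saying whether an image is actually present in VMGS.
fn hibernate(vmgs: &mut FakeVmgs, firmware_stored: bool) {
    let token = if firmware_stored { FIRMWARE_STORED } else { HIBERNATED_NO_FIRMWARE };
    vmgs.files.insert(HIBERNATION_TOKEN, token.to_le_bytes().to_vec());
}
```

Keying the restore on the token, rather than on whether file 19 exists, is what keeps a stale snapshot from being restored on an ordinary boot.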

Behavior / logging details

  • The 32 MB VMGS minimum-size gate is enforced; if the backing store is too small to hold the firmware image, hibernation simply won't preserve the firmware (logged as a warning, not an error).
  • On resume, whether to attempt a restore is gated on the hibernate token value:
    • Token indicates a firmware image was stored → restore; a failure here is a real error.
    • Token indicates no image was stored → skip the restore and log a warning.
  • The hibernate token is only written or cleared when hibernation is enabled. In that case, power off and reset clear it to NONE.
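The token rules above reduce to two small mappings. The following sketch uses hypothetical names for the values. The values themselves come from the commit messages: 0x0 (NONE), 0x1, and u64::MAX (FIRMWARE_STORED). The rule that "any non-zero token means resuming" is taken from the later commit that relaxed the resume check.

```rust
/// Hypothetical names for the token values given in this PR's commit messages.
const NONE: u64 = 0x0; // written on power off / reset
const HIBERNATED_NO_FIRMWARE: u64 = 0x1; // hibernated, no image in VMGS
const FIRMWARE_STORED: u64 = u64::MAX; // hibernated, image stored in VMGS

#[derive(Debug, PartialEq, Eq)]
enum BootAction {
    /// No token, or NONE: an ordinary cold boot.
    ColdBoot,
    /// Image was stored: restore it; a failure here is a real error.
    RestoreFirmware,
    /// Resuming (non-zero token) but no image stored: skip restore, log a warning.
    SkipRestoreWithWarning,
}

/// Token written on hibernate, based on whether an image is actually in VMGS.
fn hibernate_token(firmware_stored: bool) -> u64 {
    if firmware_stored {
        FIRMWARE_STORED
    } else {
        HIBERNATED_NO_FIRMWARE
    }
}

/// Maps the token read at boot to an action; any non-zero token means resuming.
fn boot_action(token: Option<u64>) -> BootAction {
    match token {
        None | Some(NONE) => BootAction::ColdBoot,
        Some(FIRMWARE_STORED) => BootAction::RestoreFirmware,
        Some(_) => BootAction::SkipRestoreWithWarning,
    }
}
```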

Scope / limitations

  • Targets non-isolated VMs. CVM/isolated support is deferred (requires additional measurement/attestation design).

Testing

  • cargo clippy --all-targets, cargo doc --no-deps, and cargo xtask fmt pass for the modified crates.

mebersol added 16 commits June 17, 2026 09:09
…N_FIRMWARE

Phase 0 of OpenHCL hibernation firmware support. Repurposes FileId 14 as an 8-byte power token (HIBERNATION_TOKEN) and introduces FileId 19 (HIBERNATION_FIRMWARE) to hold a snapshot of the UEFI firmware image. Updates vmgstool's parse_file_id accordingly.
Mirror the legacy HCL HclPowerServices behavior: write an 8-byte power token to VMGS FileId HIBERNATION_TOKEN on a power transition (consumed/cleared on resume). On hibernate, write u64::MAX when the VMGS backing store is >=32MB (large enough to hold a firmware snapshot) else 0x1; clear to 0x0 on power off/reset. Adds a device_size query to Vmgs, the vmgs broker RPC, and VmgsClient.
For hibernate-enabled VMs, query the VMGS size once at boot and save a flag indicating whether the backing store is large enough (>=32MB) to hold a firmware image snapshot. halt_task uses this saved flag to choose the power token value (u64::MAX vs 0x1) instead of querying on the power-off path. Adds a TODO to actually stash the firmware image.
…se 1)

At cold boot, for hibernate-enabled VMs whose VMGS is large enough, read the pristine UEFI firmware image out of VTL0 guest memory and store it in vmgs::FileId::HIBERNATION_FIRMWARE before any dynamic config is written into the firmware region (i.e. before write_uefi_config). If the snapshot fails or this is not a UEFI boot, the hibernation_firmware_stored flag is downgraded so the hibernate power token reflects that no firmware was stored.
On a cold boot that is resuming a hibernated guest (detected via the HIBERNATION_TOKEN power token == FIRMWARE_STORED), restore the previously snapshotted UEFI firmware image from vmgs::FileId::HIBERNATION_FIRMWARE back into VTL0 guest memory before write_uefi_config runs, so the resumed guest sees an identical firmware binary even if the host firmware changed. The token is consumed (set to NONE) after a successful restore so a later clean boot does not restore stale firmware. Normal boots continue to snapshot the pristine firmware (Phase 1).
Rename snapshot_firmware_to_vmgs -> store_firmware_to_vmgs and restore_firmware_from_vmgs -> load_firmware_from_vmgs. Move VMGS_FIRMWARE_THRESHOLD_BYTES out of the hibernation_token module to a standalone const, since it is not a token value.
Expose Vmgs::delete_file through the vmgs_broker VmgsBrokerRpc/VmgsClient. On hibernation resume, consume the power token by deleting the HIBERNATION_TOKEN file (mirroring legacy HCL dataStore->DeleteFile) instead of overwriting it with a NONE value.
…ection

store_firmware_to_vmgs now verifies the firmware image fits within the VMGS backing store before writing. The resume check treats any non-zero hibernation power token as resuming.
Rename the mod hibernation_token / write_hibernation_token / read_hibernation_token / delete_hibernation_token to power_token throughout and update comments to consistently call it the power token.
Revert the power_token naming to hibernate_token for the mod and helper functions, updating call sites and comments accordingly.
Previously the flag was pre-set from VMGS capacity (whether the store was large enough), which was misleading since it implied firmware was stored before any store/restore happened. Now first determine whether we are resuming or able to store the firmware, perform the operation, and only set hibernation_firmware_stored to true once an image is actually present in VMGS. The redundant VMGS_FIRMWARE_THRESHOLD_BYTES capacity heuristic is removed; store_firmware_to_vmgs already gates on the actual firmware size vs device size.
Re-add the VMGS_FIRMWARE_THRESHOLD_BYTES (32MB) minimum-size requirement inside store_firmware_to_vmgs. This is a required overall-size gate (room for the firmware image alongside other VMGS files), distinct from the tight per-image fit check. Keeping it in the store helper preserves the outcome-driven hibernation_firmware_stored: the caller downgrades the flag to false when the store bails.
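The two store-time checks from the last two commits, plus the outcome-driven flag, can be sketched as follows. This is a simplified illustration. check_store, try_store, and StoreError are hypothetical names. The per-image fit check here is a plain length comparison, whereas the real check may account for VMGS metadata. The write callback stands in for the actual VMGS write.

```rust
/// 32 MiB minimum overall VMGS size (the value used in this PR).
const VMGS_FIRMWARE_THRESHOLD_BYTES: u64 = 32 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
enum StoreError {
    /// Backing store is below the overall minimum size.
    StoreTooSmall { device_size: u64 },
    /// The image itself does not fit in the backing store.
    ImageTooLarge { image_len: u64, device_size: u64 },
}

/// Hypothetical gate mirroring the two checks described for store_firmware_to_vmgs.
fn check_store(device_size: u64, image_len: u64) -> Result<(), StoreError> {
    if device_size < VMGS_FIRMWARE_THRESHOLD_BYTES {
        return Err(StoreError::StoreTooSmall { device_size });
    }
    if image_len > device_size {
        return Err(StoreError::ImageTooLarge { image_len, device_size });
    }
    Ok(())
}

/// Outcome-driven flag: true only once the image was actually written.
fn try_store(device_size: u64, image: &[u8], write: impl FnOnce(&[u8]) -> bool) -> bool {
    match check_store(device_size, image.len() as u64) {
        Ok(()) => write(image),
        // The PR describes logging a warning (not an error) when the gate bails.
        Err(_) => false,
    }
}
```

Returning the flag from the operation itself, rather than precomputing it from capacity, is the distinction the "outcome-driven" commit draws.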
@mebersol mebersol requested a review from a team as a code owner June 17, 2026 21:53
Copilot AI review requested due to automatic review settings June 17, 2026 21:53
@smalis-msft

Contributor

Do we have vmm test coverage of this?

Copilot AI left a comment

Contributor


Pull request overview

This PR adds OpenHCL support for preserving the exact UEFI firmware image across a hibernate/resume cycle by introducing a VMGS “hibernation token” plus a dedicated VMGS file to store a firmware snapshot, and plumbing the required VMGS broker operations.

Changes:

  • Introduces FileId::HIBERNATION_TOKEN (14) and FileId::HIBERNATION_FIRMWARE (19), and updates vmgstool to recognize the new IDs.
  • Adds VMGS backing-store size querying (device_size) and a delete_file VMGS broker RPC to support token consumption/management.
  • Implements Underhill boot/halt logic to snapshot firmware on cold boot, restore on resume, and write/clear tokens across power transitions.
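The broker change follows a single-owner pattern. One task owns the VMGS store, and every other component holds a cloneable client that sends requests to it. This is what lets the halt task and boot path touch the token while the worker runs. The sketch below models that pattern with std::sync::mpsc and a thread. The real vmgs_broker uses its own async RPC types, so Request, Client, and spawn_broker are hypothetical. Having delete_file return whether the file existed is also an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request type; the real broker has its own RPC enum and transport.
enum Request {
    DeviceSize(Sender<u64>),
    WriteFile(u32, Vec<u8>, Sender<()>),
    DeleteFile(u32, Sender<bool>),
}

/// Cloneable handle: the halt task, boot path, and worker can each hold one.
#[derive(Clone)]
struct Client(Sender<Request>);

impl Client {
    /// Sends one request carrying a reply channel and waits for the answer.
    fn call<T>(&self, make: impl FnOnce(Sender<T>) -> Request) -> T {
        let (tx, rx) = channel();
        self.0.send(make(tx)).expect("broker stopped");
        rx.recv().expect("broker dropped the reply")
    }
    fn device_size(&self) -> u64 {
        self.call(Request::DeviceSize)
    }
    fn write_file(&self, id: u32, data: Vec<u8>) {
        self.call(|reply| Request::WriteFile(id, data, reply))
    }
    /// Returns whether the file existed before deletion.
    fn delete_file(&self, id: u32) -> bool {
        self.call(|reply| Request::DeleteFile(id, reply))
    }
}

/// Spawns a broker thread that exclusively owns the in-memory file store.
fn spawn_broker(device_size: u64) -> Client {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        let mut files: HashMap<u32, Vec<u8>> = HashMap::new();
        // Requests are handled one at a time, so callers never race on the store.
        // The loop ends once every Client clone has been dropped.
        for req in rx {
            match req {
                Request::DeviceSize(reply) => {
                    let _ = reply.send(device_size);
                }
                Request::WriteFile(id, data, reply) => {
                    files.insert(id, data);
                    let _ = reply.send(());
                }
                Request::DeleteFile(id, reply) => {
                    let _ = reply.send(files.remove(&id).is_some());
                }
            }
        }
    });
    Client(tx)
}
```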

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

  • vm/vmgs/vmgstool/src/main.rs: Adds parsing support for HIBERNATION_TOKEN and updates the HIBERNATION_FIRMWARE mapping.
  • vm/vmgs/vmgs/src/vmgs_impl.rs: Exposes the VMGS backing-store size via Vmgs::device_size().
  • vm/vmgs/vmgs_format/src/lib.rs: Defines new VMGS FileIds for the hibernation token and firmware snapshot.
  • vm/vmgs/vmgs_broker/src/client.rs: Adds client APIs for device_size() and delete_file().
  • vm/vmgs/vmgs_broker/src/broker.rs: Adds broker RPC variants/handlers for DeviceSize and DeleteFile.
  • openhcl/underhill_core/src/worker.rs: Adds firmware snapshot/restore and hibernate-token handling in the boot path and halt task.

Comment on lines +2272 to +2279
Err(err) => {
    tracing::error!(
        CVM_ALLOWED,
        error = err.as_ref() as &dyn std::error::Error,
        "failed to restore UEFI firmware image on hibernation resume"
    );
    false
}
Comment on lines +4088 to +4095
async fn read_hibernate_token(vmgs_client: &vmgs_broker::VmgsClient) -> Option<u64> {
    let buf = vmgs_client
        .read_file(vmgs::FileId::HIBERNATION_TOKEN)
        .await
        .ok()?;
    let bytes = <[u8; 8]>::try_from(buf.as_slice()).ok()?;
    Some(u64::from_le_bytes(bytes))
}
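read_hibernate_token collapses a failed read and a wrongly sized buffer into the same None. As a result, a missing token and a corrupt one look alike to the caller. The decoding step can be isolated as a pure function for unit testing. decode_hibernate_token and encode_hibernate_token below are hypothetical helper names, not functions in the PR.

```rust
use std::convert::TryFrom;

/// Hypothetical pure version of the decode step in read_hibernate_token:
/// exactly 8 bytes, little-endian; any other length yields None.
fn decode_hibernate_token(buf: &[u8]) -> Option<u64> {
    let bytes = <[u8; 8]>::try_from(buf).ok()?;
    Some(u64::from_le_bytes(bytes))
}

/// Inverse of decode_hibernate_token: the 8-byte on-disk form of a token.
fn encode_hibernate_token(token: u64) -> [u8; 8] {
    token.to_le_bytes()
}
```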
});
}

let mut firmware = vec![0u8; len as usize];
}
};

// Hibernation firmware compatibility (OpenHCL-only). Determine whether a

Contributor


Would this compatibility hazard not also apply to OpenVMM as the host? Should this code live somewhere more shareable?

/// snapshot for hibernation. This is a minimum overall size so that the
/// firmware image fits alongside the other VMGS files, not just a tight fit of
/// the image itself.
const VMGS_FIRMWARE_THRESHOLD_BYTES: u64 = 32 * 1024 * 1024;

Contributor


Does it make sense to have these VMGS threshold and hibernate token definitions in underhill worker? I figure this is a prototype in progress so probably doesn't matter now...

@github-actions



Labels

None yet

Projects

None yet


4 participants