[WIP] Preserve UEFI firmware across hibernation in OpenHCL #3771
mebersol wants to merge 16 commits into
…N_FIRMWARE
Phase 0 of OpenHCL hibernation firmware support. Repurposes FileId 14 as an 8-byte power token (HIBERNATION_TOKEN) and introduces FileId 19 (HIBERNATION_FIRMWARE) to hold a snapshot of the UEFI firmware image. Updates vmgstool's parse_file_id accordingly.
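A minimal sketch of the FileId changes and a name-or-number lookup in the spirit of vmgstool's parse_file_id. The FileId newtype and parse_file_id below are hypothetical stand-ins (the real vmgs_format::FileId may be defined differently); only the values 14 and 19 and the two names come from this commit.

```rust
/// Hypothetical stand-in for a VMGS file identifier; the real
/// vmgs_format::FileId may be defined differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(pub u32);

impl FileId {
    /// FileId 14, repurposed to hold the 8-byte power token.
    pub const HIBERNATION_TOKEN: FileId = FileId(14);
    /// FileId 19, holds the snapshot of the UEFI firmware image.
    pub const HIBERNATION_FIRMWARE: FileId = FileId(19);
}

/// Hypothetical parser: accepts a symbolic name or a raw numeric id.
pub fn parse_file_id(s: &str) -> Result<FileId, String> {
    match s {
        "HIBERNATION_TOKEN" => Ok(FileId::HIBERNATION_TOKEN),
        "HIBERNATION_FIRMWARE" => Ok(FileId::HIBERNATION_FIRMWARE),
        // Fall back to a plain number so unnamed ids stay addressable.
        other => other
            .parse::<u32>()
            .map(FileId)
            .map_err(|_| format!("unknown file id: {other}")),
    }
}
```

Keeping a numeric fallback means a tool can still address ids that have no symbolic name yet.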
Mirror the legacy HCL HclPowerServices behavior: on a power transition, write an 8-byte power token to VMGS FileId HIBERNATION_TOKEN (consumed/cleared on resume). On hibernate, write u64::MAX when the VMGS backing store is >=32MB (large enough to hold a firmware snapshot), else 0x1; on power off/reset, clear it to 0x0. Adds a device_size query to Vmgs, the VMGS broker RPC, and VmgsClient.
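The token rules in this commit reduce to a small pure function. This sketch assumes a hypothetical PowerTransition enum and takes the backing-store size as a parameter (in the real change it comes from the new device_size query); the 8-byte little-endian encoding matches read_hibernate_token in the diff.

```rust
/// Minimum VMGS backing-store size (32 MiB) for a firmware snapshot.
const VMGS_FIRMWARE_THRESHOLD_BYTES: u64 = 32 * 1024 * 1024;

/// Hypothetical power-transition kinds; the real halt path uses its own types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerTransition {
    Hibernate,
    PowerOff,
    Reset,
}

/// Token value written to FileId HIBERNATION_TOKEN for a given transition.
pub fn power_token_for(transition: PowerTransition, vmgs_size_bytes: u64) -> u64 {
    match transition {
        // Store is large enough for a firmware snapshot.
        PowerTransition::Hibernate if vmgs_size_bytes >= VMGS_FIRMWARE_THRESHOLD_BYTES => u64::MAX,
        // Hibernating, but no room for a snapshot.
        PowerTransition::Hibernate => 0x1,
        // Power off and reset clear the token.
        PowerTransition::PowerOff | PowerTransition::Reset => 0x0,
    }
}

/// Encodes the token as the 8 little-endian bytes stored in the VMGS file.
pub fn encode_token(token: u64) -> [u8; 8] {
    token.to_le_bytes()
}

/// Decodes the file contents; anything other than exactly 8 bytes is rejected.
pub fn decode_token(buf: &[u8]) -> Option<u64> {
    if buf.len() != 8 {
        return None;
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buf);
    Some(u64::from_le_bytes(bytes))
}
```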
For hibernate-enabled VMs, query the VMGS size once at boot and save a flag indicating whether the backing store is large enough (>=32MB) to hold a firmware image snapshot. halt_task uses this saved flag to choose the power token value (u64::MAX vs 0x1) instead of querying on the power-off path. Adds a TODO to actually stash the firmware image.
…se 1)
At cold boot, for hibernate-enabled VMs whose VMGS is large enough, read the pristine UEFI firmware image out of VTL0 guest memory and store it in vmgs::FileId::HIBERNATION_FIRMWARE before any dynamic config is written into the firmware region (i.e. before write_uefi_config). If the snapshot fails or this is not a UEFI boot, the hibernation_firmware_stored flag is downgraded so the hibernate power token reflects that no firmware was stored.
On a cold boot that is resuming a hibernated guest (detected via the HIBERNATION_TOKEN power token == FIRMWARE_STORED), restore the previously snapshotted UEFI firmware image from vmgs::FileId::HIBERNATION_FIRMWARE back into VTL0 guest memory before write_uefi_config runs, so the resumed guest sees an identical firmware binary even if the host firmware changed. The token is consumed (set to NONE) after a successful restore so a later clean boot does not restore stale firmware. Normal boots continue to snapshot the pristine firmware (Phase 1).
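The ordering constraint in Phases 1 and 2 (touch the firmware image before write_uefi_config) can be illustrated with a byte buffer standing in for the VTL0 firmware region and a HashMap standing in for VMGS. All names here are hypothetical, and the sketch assumes the stored and host images have the same length, which the real code need not.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for VMGS, keyed by raw file id.
type MockVmgs = HashMap<u32, Vec<u8>>;

const HIBERNATION_FIRMWARE: u32 = 19;

/// Stand-in for write_uefi_config: patches dynamic config into the region.
fn write_uefi_config(region: &mut [u8], config: &[u8]) {
    region[..config.len()].copy_from_slice(config);
}

/// Cold boot: snapshot the pristine image before any config is written.
fn cold_boot(vmgs: &mut MockVmgs, region: &mut [u8], config: &[u8]) {
    vmgs.insert(HIBERNATION_FIRMWARE, region.to_vec());
    write_uefi_config(region, config);
}

/// Resume: replace the (possibly newer) host image with the stored one,
/// then write config on top. Returns whether a restore happened.
fn resume_boot(vmgs: &MockVmgs, region: &mut [u8], config: &[u8]) -> bool {
    let restored = match vmgs.get(&HIBERNATION_FIRMWARE) {
        Some(image) if image.len() == region.len() => {
            region.copy_from_slice(image);
            true
        }
        // No snapshot, or a size mismatch this sketch does not handle.
        _ => false,
    };
    write_uefi_config(region, config);
    restored
}
```

Snapshotting before write_uefi_config is what keeps the stored image free of per-boot configuration, so a resume can layer fresh config over the original binary.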
Rename snapshot_firmware_to_vmgs -> store_firmware_to_vmgs and restore_firmware_from_vmgs -> load_firmware_from_vmgs. Move VMGS_FIRMWARE_THRESHOLD_BYTES out of the hibernation_token module to a standalone const, since it is not a token value.
Expose Vmgs::delete_file through the vmgs_broker VmgsBrokerRpc/VmgsClient. On hibernation resume, consume the power token by deleting the HIBERNATION_TOKEN file (mirroring legacy HCL dataStore->DeleteFile) instead of overwriting it with a NONE value.
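Exposing delete_file (and device_size) through the broker follows a request/reply pattern: each RPC carries a reply channel, and a single task owns the store. As an assumption for illustration, this sketch uses std threads and mpsc channels in place of the real async vmgs_broker VmgsBrokerRpc/VmgsClient; BrokerRpc, Client, and spawn_broker are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical broker requests; each carries its own reply channel.
enum BrokerRpc {
    DeviceSize(Sender<u64>),
    WriteFile(u32, Vec<u8>, Sender<()>),
    ReadFile(u32, Sender<Option<Vec<u8>>>),
    DeleteFile(u32, Sender<Result<(), String>>),
}

/// Hypothetical client handle; clones can be held by several tasks.
#[derive(Clone)]
pub struct Client {
    tx: Sender<BrokerRpc>,
}

/// Spawns a broker thread that exclusively owns the mock VMGS contents.
pub fn spawn_broker(device_size: u64) -> Client {
    let (tx, rx) = channel::<BrokerRpc>();
    thread::spawn(move || {
        let mut files: HashMap<u32, Vec<u8>> = HashMap::new();
        // The loop ends once every Client (and thus every Sender) is dropped.
        for rpc in rx {
            match rpc {
                BrokerRpc::DeviceSize(reply) => {
                    let _ = reply.send(device_size);
                }
                BrokerRpc::WriteFile(id, data, reply) => {
                    files.insert(id, data);
                    let _ = reply.send(());
                }
                BrokerRpc::ReadFile(id, reply) => {
                    let _ = reply.send(files.get(&id).cloned());
                }
                BrokerRpc::DeleteFile(id, reply) => {
                    let result = files
                        .remove(&id)
                        .map(|_| ())
                        .ok_or_else(|| format!("file {} not found", id));
                    let _ = reply.send(result);
                }
            }
        }
    });
    Client { tx }
}

impl Client {
    /// Sends one request and blocks until the broker replies.
    fn call<T>(&self, make: impl FnOnce(Sender<T>) -> BrokerRpc) -> T {
        let (reply_tx, reply_rx) = channel();
        self.tx.send(make(reply_tx)).expect("broker thread exited");
        reply_rx.recv().expect("broker dropped the reply")
    }

    pub fn device_size(&self) -> u64 {
        self.call(BrokerRpc::DeviceSize)
    }

    pub fn write_file(&self, id: u32, data: Vec<u8>) {
        self.call(|r| BrokerRpc::WriteFile(id, data, r))
    }

    pub fn read_file(&self, id: u32) -> Option<Vec<u8>> {
        self.call(|r| BrokerRpc::ReadFile(id, r))
    }

    pub fn delete_file(&self, id: u32) -> Result<(), String> {
        self.call(|r| BrokerRpc::DeleteFile(id, r))
    }
}
```

Because the broker thread serializes all requests, the halt task and the boot path can each hold a cloned Client without extra locking.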
…ection
store_firmware_to_vmgs now verifies the firmware image fits within the VMGS backing store before writing. The resume check treats any non-zero hibernation power token as resuming.
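Taken together with the delete-based consumption from the earlier commit, the resume check can be sketched as follows. The helpers mirror read_hibernate_token from the diff but operate on a hypothetical in-memory file map; consuming the token only after a successful restore follows the Phase 2 commit message.

```rust
use std::collections::HashMap;

const HIBERNATION_TOKEN: u32 = 14;

/// Reads the 8-byte little-endian power token; None if absent or malformed.
fn read_hibernate_token(files: &HashMap<u32, Vec<u8>>) -> Option<u64> {
    let buf = files.get(&HIBERNATION_TOKEN)?;
    if buf.len() != 8 {
        return None;
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(buf);
    Some(u64::from_le_bytes(bytes))
}

/// Any non-zero token means the previous run hibernated.
fn is_resuming(token: Option<u64>) -> bool {
    matches!(token, Some(t) if t != 0)
}

/// Consumes the token by deleting the file rather than writing a NONE value.
fn consume_hibernate_token(files: &mut HashMap<u32, Vec<u8>>) -> bool {
    files.remove(&HIBERNATION_TOKEN).is_some()
}

/// Restores only when resuming, and deletes the token only after the
/// restore succeeded, so a failed restore leaves the token in place.
fn handle_resume(files: &mut HashMap<u32, Vec<u8>>, restore: impl FnOnce() -> bool) -> bool {
    if !is_resuming(read_hibernate_token(files)) {
        return false;
    }
    let restored = restore();
    if restored {
        consume_hibernate_token(files);
    }
    restored
}
```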
Rename the mod hibernation_token / write_hibernation_token / read_hibernation_token / delete_hibernation_token to power_token throughout and update comments to consistently call it the power token.
Revert the power_token naming to hibernate_token for the mod and helper functions, updating call sites and comments accordingly.
Previously the flag was pre-set from VMGS capacity (whether the store was large enough), which was misleading since it implied firmware was stored before any store/restore happened. Now first determine whether we are resuming or able to store the firmware, perform the operation, and only set hibernation_firmware_stored to true once an image is actually present in VMGS. The redundant VMGS_FIRMWARE_THRESHOLD_BYTES capacity heuristic is removed; store_firmware_to_vmgs already gates on the actual firmware size vs device size.
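A sketch of the outcome-driven flag: the boot path performs the store or restore first and records success, and the halt path derives the hibernate token from that record rather than from VMGS capacity. Sharing the flag through Arc<AtomicBool> is an assumption for illustration; FirmwareStored, boot, and hibernate_token are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared flag: true only once a firmware image is actually present in VMGS.
#[derive(Clone, Default)]
pub struct FirmwareStored(Arc<AtomicBool>);

impl FirmwareStored {
    pub fn set(&self) {
        // Release pairs with the Acquire load in get().
        self.0.store(true, Ordering::Release);
    }

    pub fn get(&self) -> bool {
        self.0.load(Ordering::Acquire)
    }
}

/// Boot path: do the operation first, then record the outcome.
pub fn boot(
    flag: &FirmwareStored,
    resuming: bool,
    restore: impl FnOnce() -> bool,
    store: impl FnOnce() -> bool,
) {
    let present = if resuming { restore() } else { store() };
    if present {
        flag.set();
    }
}

/// Halt path: choose the hibernate token from the recorded outcome.
pub fn hibernate_token(flag: &FirmwareStored) -> u64 {
    if flag.get() { u64::MAX } else { 0x1 }
}
```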
Re-add the VMGS_FIRMWARE_THRESHOLD_BYTES (32MB) minimum-size requirement inside store_firmware_to_vmgs. This is a required overall-size gate (room for the firmware image alongside other VMGS files), distinct from the tight per-image fit check. Keeping it in the store helper preserves the outcome-driven hibernation_firmware_stored: the caller downgrades the flag to false when the store bails.
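The two gates described in this commit can be expressed as a pre-write check. This is a hypothetical helper, not the actual store_firmware_to_vmgs; it assumes the caller has already queried the device size and knows the image length, and that an Err leads the caller to leave hibernation_firmware_stored false.

```rust
/// Minimum overall VMGS size (32 MiB) before a snapshot is attempted.
const VMGS_FIRMWARE_THRESHOLD_BYTES: u64 = 32 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    /// The backing store is below the overall-size threshold.
    StoreTooSmall { device_size: u64 },
    /// The image itself does not fit in the backing store.
    ImageTooLarge { image_len: u64, device_size: u64 },
}

/// Hypothetical pre-write check: both gates must pass before writing.
pub fn check_store_firmware(device_size: u64, image_len: u64) -> Result<(), StoreError> {
    // Overall-size gate: leave room for the other VMGS files.
    if device_size < VMGS_FIRMWARE_THRESHOLD_BYTES {
        return Err(StoreError::StoreTooSmall { device_size });
    }
    // Per-image fit check.
    if image_len > device_size {
        return Err(StoreError::ImageTooLarge { image_len, device_size });
    }
    Ok(())
}
```

Returning a typed error rather than a bool lets the caller log which gate failed before downgrading the flag.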
Do we have vmm test coverage of this?
Pull request overview
This PR adds OpenHCL support for preserving the exact UEFI firmware image across a hibernate/resume cycle by introducing a VMGS “hibernation token” plus a dedicated VMGS file to store a firmware snapshot, and plumbing the required VMGS broker operations.
Changes:
- Introduces FileId::HIBERNATION_TOKEN (14) and FileId::HIBERNATION_FIRMWARE (19), and updates vmgstool to recognize the new IDs.
- Adds VMGS backing-store size querying (device_size) and a delete_file VMGS broker RPC to support token consumption/management.
- Implements Underhill boot/halt logic to snapshot firmware on cold boot, restore on resume, and write/clear tokens across power transitions.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vm/vmgs/vmgstool/src/main.rs | Adds parsing support for HIBERNATION_TOKEN and updates HIBERNATION_FIRMWARE mapping. |
| vm/vmgs/vmgs/src/vmgs_impl.rs | Exposes VMGS backing-store size via Vmgs::device_size(). |
| vm/vmgs/vmgs_format/src/lib.rs | Defines new VMGS FileIds for hibernation token and firmware snapshot. |
| vm/vmgs/vmgs_broker/src/client.rs | Adds client APIs for device_size() and delete_file(). |
| vm/vmgs/vmgs_broker/src/broker.rs | Adds broker RPC variants/handlers for DeviceSize and DeleteFile. |
| openhcl/underhill_core/src/worker.rs | Adds firmware snapshot/restore + hibernate-token handling in boot path and halt task. |
Err(err) => {
    tracing::error!(
        CVM_ALLOWED,
        error = err.as_ref() as &dyn std::error::Error,
        "failed to restore UEFI firmware image on hibernation resume"
    );
    false
}
/// Reads the 8-byte little-endian power token from VMGS. Returns None if the
/// file is missing, cannot be read, or is not exactly 8 bytes long.
async fn read_hibernate_token(vmgs_client: &vmgs_broker::VmgsClient) -> Option<u64> {
    let buf = vmgs_client
        .read_file(vmgs::FileId::HIBERNATION_TOKEN)
        .await
        .ok()?;
    let bytes = <[u8; 8]>::try_from(buf.as_slice()).ok()?;
    Some(u64::from_le_bytes(bytes))
}
let mut firmware = vec![0u8; len as usize];
// Hibernation firmware compatibility (OpenHCL-only). Determine whether a
Would this compatibility hazard not also apply to OpenVMM as the host? Should this code live somewhere more shareable?
/// snapshot for hibernation. This is a minimum overall size so that the
/// firmware image fits alongside the other VMGS files, not just a tight fit of
/// the image itself.
const VMGS_FIRMWARE_THRESHOLD_BYTES: u64 = 32 * 1024 * 1024;
Does it make sense to have these VMGS threshold and hibernate token definitions in underhill worker? I figure this is a prototype in progress so probably doesn't matter now...
Summary
Adds support for preserving the exact UEFI firmware image across a hibernate/resume cycle for hibernation-enabled (non-isolated) VMs in OpenHCL. This guarantees a resumed guest sees an identical firmware-provided view (ACPI/SMBIOS/memory layout) even if the host's OpenHCL/UEFI changed between the original cold boot and the resume.
Motivation
A hibernating guest captures OS state that assumes a specific firmware-provided hardware view. If the firmware image differs after resume, that view can shift and break the resumed guest. Legacy HCL guards against this with a firmware-version "power token"; this change goes further and snapshots the actual firmware image into VMGS so the resumed guest is restored bit-for-bit.
What it does
- … FileId 14 as HIBERNATION_TOKEN (an 8-byte little-endian token) and adds FileId 19 HIBERNATION_FIRMWARE to hold the firmware image snapshot. vmgstool understands both names.
- … HIBERNATION_FIRMWARE VMGS file.
- … delete_file to the broker/client so the halt task and boot path can manage the token concurrently with the worker.

Behavior / logging details
- … NONE) only in that case.

Scope / limitations
Testing
cargo clippy --all-targets, cargo doc --no-deps, and cargo xtask fmt pass for the modified crates.