55 changes: 55 additions & 0 deletions experiments/astraea-sandbox/README.md
@@ -0,0 +1,55 @@
# Astraea-in-Hyperlight Experiment

## Goal
Run Astraea's core comparison logic inside a Hyperlight micro-VM sandbox to:
1. Learn the Hyperlight guest development model
2. Discover API gaps / pain points to contribute back
3. Evaluate feasibility of sandboxed data comparison for ODV

## Architecture

```
Host (normal Rust binary)                  Guest (no_std + alloc, x86_64-hyperlight-none)
┌─────────────────────────┐                ┌──────────────────────────────────┐
│ 1. Parse CSV files      │                │                                  │
│ 2. Serialize rows to    │──map_region──> │ 3. Deserialize rows              │
│    shared memory        │                │ 4. Compare with tolerance        │
│                         │<──return────── │ 5. Return diff categories        │
│ 6. Read results         │                │                                  │
└─────────────────────────┘                └──────────────────────────────────┘
```

## Guest Functions

- `compare_rows(left_ptr, left_len, right_ptr, right_len, config_ptr, config_len) -> result_ptr`
  - Input: serialized row pairs + comparison config (simple binary format, not JSON)
  - Output: serialized diff result (match/mismatch per column)

## What Works in Guest (alloc available)
- Vec, String, BTreeMap (via alloc); HashMap is not part of alloc and needs a no_std crate such as hashbrown
- Custom allocator (hyperlight provides one)
- Float parsing, tolerance comparison
- Sorting, deduplication
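
The tolerance comparison can be sketched as a pure function that needs nothing beyond `core`, so it can be unit-tested on the host before it goes into the guest. This is a minimal sketch that mirrors the combined absolute + relative check used by the guest function in this change; the name `within_tolerance` is hypothetical.

```rust
/// Returns true when `l` and `r` agree within `tol`, either absolutely
/// (|l - r| <= tol) or relative to the larger magnitude.
/// `within_tolerance` is a hypothetical helper name, not part of Hyperlight.
fn within_tolerance(l: f64, r: f64, tol: f64) -> bool {
    let diff = (l - r).abs();
    let max_abs = l.abs().max(r.abs());
    // Any comparison involving NaN is false, so NaN never counts as a match.
    diff <= tol || (max_abs > 0.0 && diff / max_abs <= tol)
}
```

Because the relative check scales with magnitude, `1_000_000.0` and `1_000_000.5` match at a tolerance of `1e-6`, while `1.0` and `1.1` do not. Two NaN inputs are reported as a mismatch, which may or may not be the behaviour wanted for data comparison.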

## What Doesn't Work (no std)
- File I/O (csv crate) -- host parses, guest compares
- Full regex -- use regex-automata with alloc, or simple string matching
- Threads -- single-threaded only
- Networking -- obviously

## Data Passing Strategy
- Host serializes rows as length-prefixed byte arrays into a mapped region
- Guest reads from known GPA offset
- Guest writes results to scratch region
- Simple binary protocol (no serde_json in guest)
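
One possible shape for the length-prefixed format is sketched below. The layout (a 4-byte little-endian length followed by the row bytes, repeated) and the names `encode_rows` and `decode_rows` are assumptions for illustration; the README does not fix a wire format. The sketch uses `std`'s `Vec`; in the guest the same code would import `alloc::vec::Vec`.

```rust
/// Encode each row as a u32 little-endian length followed by its bytes.
/// Assumes every row is shorter than 4 GiB (the length is cast to u32).
fn encode_rows(rows: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::new();
    for row in rows {
        buf.extend_from_slice(&(row.len() as u32).to_le_bytes());
        buf.extend_from_slice(row);
    }
    buf
}

/// Decode a buffer produced by `encode_rows`.
/// Returns None if the buffer is truncated mid-prefix or mid-row.
fn decode_rows(mut buf: &[u8]) -> Option<Vec<&[u8]>> {
    let mut rows = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 {
            return None; // not enough bytes for a length prefix
        }
        let (len_bytes, rest) = buf.split_at(4);
        let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
        if rest.len() < len {
            return None; // declared length exceeds remaining data
        }
        let (row, tail) = rest.split_at(len);
        rows.push(row);
        buf = tail;
    }
    Some(rows)
}
```

The decoder borrows from the input buffer instead of copying, which suits a guest reading directly from a mapped region. It also validates every length before slicing, since the guest should not trust that the region contents are well formed.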

## Files to Create
1. `src/guests/astraea_guest/Cargo.toml` -- guest crate
2. `src/guests/astraea_guest/src/main.rs` -- guest entry point
3. `examples/astraea_sandbox.rs` -- host-side example

## Open Questions
- Is map_region the right way to pass large data? Or should we use the PEB scratch area?
- What's the max practical data size for a single comparison batch?
- Can we reuse the sandbox across multiple comparisons (multi-use sandbox)?
- What's the overhead vs calling Astraea directly? (benchmark needed)
9 changes: 9 additions & 0 deletions experiments/astraea-sandbox/guest/Cargo.toml
@@ -0,0 +1,9 @@
[package]
name = "astraea-guest"
version = "0.1.0"
edition = "2021"

[dependencies]
hyperlight-guest = { path = "../../../src/hyperlight_guest" }
hyperlight-guest-bin = { path = "../../../src/hyperlight_guest_bin" }
hyperlight-common = { path = "../../../src/hyperlight_common", default-features = false }
39 changes: 39 additions & 0 deletions experiments/astraea-sandbox/guest/src/main.rs
@@ -0,0 +1,39 @@
#![no_std]
#![no_main]

extern crate alloc;

use alloc::format;
use alloc::string::String;
use hyperlight_guest_bin::guest_function;

extern crate hyperlight_guest;

/// Compare two values, numerically within a tolerance when both parse as floats.
/// Returns "match", "mismatch|left|right|diff" for numeric values,
/// or "mismatch|left|right" for values compared as strings.
#[guest_function("CompareValues")]
fn compare_values(left: String, right: String, tolerance: String) -> String {
    // Fall back to a default tolerance if the argument does not parse.
    let tol: f64 = tolerance.parse().unwrap_or(1e-6);

    // Try numeric comparison first
    match (left.parse::<f64>(), right.parse::<f64>()) {
        (Ok(l), Ok(r)) => {
            let diff = (l - r).abs();
            let max_abs = l.abs().max(r.abs());
            // Combined absolute + relative tolerance
            if diff <= tol || (max_abs > 0.0 && diff / max_abs <= tol) {
                String::from("match")
            } else {
                format!("mismatch|{}|{}|{}", left, right, diff)
            }
        }
        _ => {
            // String comparison
            if left == right {
                String::from("match")
            } else {
                format!("mismatch|{}|{}", left, right)
            }
        }
    }
}
118 changes: 118 additions & 0 deletions scripts/KVM_TESTING.md
@@ -0,0 +1,118 @@
# KVM Testing on AWS

Hyperlight's CI runs on Azure with Hyper-V. These scripts let you test on **real KVM hardware** via AWS EC2 — useful for validating Linux/KVM-specific behavior that can't be caught in Hyper-V CI.

## Quick Start

```bash
# Online mode (instance has internet — simplest)
./scripts/kvm-test.sh

# Offline mode (air-gapped instance, pre-vendored deps)
# Step 1: Prepare the bucket (once, or when deps change)
./scripts/prepare-offline.sh s3://my-bucket

# Step 2: Run tests
VENDOR_BUCKET=s3://my-bucket ./scripts/kvm-test.sh --offline
```

One command → launches a KVM-capable instance → builds → tests → terminates. ~25 minutes, ~$0.15.

## Scripts

| Script | Purpose |
|--------|---------|
| `kvm-test.sh` | End-to-end: launch instance, install, build, test, terminate |
| `prepare-offline.sh` | Populate S3 bucket with toolchain + vendor for offline mode |
| `vendor-all.sh` | Create a complete vendor directory (handles multi-lockfile problem) |

### `kvm-test.sh`

```
Options:
  --offline           Use pre-vendored S3 bucket (no internet needed on instance)
  --ami AMI_ID        Skip install, use a pre-baked AMI
  --bake              Create an AMI after install for faster future runs
  --keep              Don't terminate instance (for debugging)
  --filter PATTERN    Run only matching tests (e.g. "map_region")
  --timeout MIN       Cost guard (default: 45 min)
  --instance-type     Override instance type (default: c8i.2xlarge)
  --region            Override region (default: us-east-1)
```

**Prerequisites:**
- AWS CLI v2 with valid credentials
- `session-manager-plugin` (`brew install --cask session-manager-plugin`)
- IAM permissions: EC2, SSM, IAM (and S3 if `--offline`)

### `prepare-offline.sh`

Populates an S3 bucket with everything `kvm-test.sh --offline` needs:

```bash
./scripts/prepare-offline.sh s3://my-bucket
./scripts/prepare-offline.sh s3://my-bucket --rust-version 1.89.0
```

Run once, then iterate with `kvm-test.sh --offline`. Re-run when:
- Rust version changes
- Dependencies change (new crates in Cargo.lock)
- You modify guest crate lockfiles

### `vendor-all.sh`

Creates a complete vendor directory for fully offline builds. Handles the tricky part: Hyperlight has **multiple independent lockfiles** (workspace root + 3 guest crates), and `cargo-hyperlight` builds the stdlib sysroot which needs crates pinned by the Rust toolchain's own lockfile.

```bash
./scripts/vendor-all.sh # vendor to ./vendor-all/
./scripts/vendor-all.sh /tmp/output # custom output path
```

## Why Vendoring is Hard

`cargo-hyperlight` uses `-Zbuild-std` to compile guest binaries for `x86_64-hyperlight-none`. This triggers a sysroot build that:

1. Uses the **guest crate's** `Cargo.lock` (not the workspace root's)
2. Needs stdlib crates at versions pinned by the **Rust toolchain's** lockfile
3. These versions often differ from what the workspace uses (e.g., `cfg-if 1.0.1` for stdlib vs `1.0.4` for the repo)

A naive `cargo vendor` only covers the root workspace. `vendor-all.sh` handles all three cases by downloading the exact versions needed from crates.io and placing them in a single vendor directory with Cargo's multi-version naming convention (e.g., `cfg-if` for 1.0.4 and `cfg-if-0` for 1.0.1).

## Instance Types

KVM nested virtualization requires:
- **Intel**: `c8i`, `c7i`, `m7i`, `r7i` families
- **AMD**: `c7a`, `m7a` families
- Must explicitly enable via `CpuOptions.NestedVirtualization`

## Cost

| Instance | vCPUs | RAM | $/hr | Typical run (25 min) |
|----------|-------|-----|------|---------------------|
| c8i.2xlarge | 8 | 16 GB | $0.34 | ~$0.15 |
| c8i.4xlarge | 16 | 32 GB | $0.68 | ~$0.28 |

The cost guard auto-terminates the instance after `--timeout` minutes (default 45), which caps spend at about $0.26 in the worst case on the default `c8i.2xlarge`.
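
The figures above follow from simple proration of the hourly rate. As a quick check, here is a minimal sketch using the rates from the table; `run_cost` is a hypothetical helper, and it ignores per-second billing minimums, storage, and data transfer.

```rust
/// Prorated on-demand cost in USD for a run of `minutes` at `usd_per_hour`.
/// Hypothetical helper; ignores storage, transfer, and billing minimums.
fn run_cost(usd_per_hour: f64, minutes: f64) -> f64 {
    usd_per_hour * minutes / 60.0
}
```

With the table's rates, `run_cost(0.34, 25.0)` is roughly 0.14 and `run_cost(0.34, 45.0)` is 0.255, matching the typical and worst-case figures for `c8i.2xlarge`. On `c8i.4xlarge` the same 45-minute cap works out to about $0.51.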

## Composability

The scripts compose to replicate the full workflow:

```
prepare-offline.sh    ─── populates S3 bucket (run once)
kvm-test.sh --offline ─── launches instance, installs from bucket, builds, tests
    ├── uses vendor-all.sh logic (embedded in all-vendor.tar.gz)
    ├── builds guest binaries via just build-rust-guests
    ├── runs cargo test --package hyperlight-host
    └── terminates instance
```

For iterating on code changes, update just the repo tarball:
```bash
COPYFILE_DISABLE=1 tar czf /tmp/hyperlight-repo.tar.gz --exclude='.git' --exclude='target' -C /path/to/repo .
aws s3 cp /tmp/hyperlight-repo.tar.gz s3://my-bucket/hyperlight-repo.tar.gz
VENDOR_BUCKET=s3://my-bucket ./scripts/kvm-test.sh --offline
```