Commit 42db218

VM Lifecycle page
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
1 parent 80d6617 commit 42db218

4 files changed

+284
-62
lines changed

docs/platform/lifecycle.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
# VM Lifecycle

Two things shape how you integrate with Slicer: how VMs are named, and how long they live. Both follow from a single primitive, the **host group**, defined in the daemon's config file.

## Host groups

A host group is a named pool of VMs that share the same hardware profile (vCPU, RAM, storage), network, and image. The daemon reads its host groups from its YAML config (typically produced by `slicer new`) at startup. You can define multiple groups in a single config. The common pattern is one group for a long-lived control plane plus one for ephemeral tenant or workload VMs:

```yaml
config:
  host_groups:
    # One persistent, always-on VM created at daemon start.
    - name: ctrl
      storage: image
      storage_size: 25G
      count: 1
      vcpu: 4
      ram_gb: 8
      userdata: |
        apt-get update -qy
        apt-get install -qy nginx postgresql
      network:
        bridge: brctrl0
        tap_prefix: ctrl
        gateway: 192.168.137.1/24

    # No pre-allocated VMs, everything is API-launched.
    # Isolated networking, firewalled subnet with egress controlled by allow/drop.
    - name: sbox
      storage: image
      storage_size: 25G
      count: 0
      vcpu: 2
      ram_gb: 4
      network:
        mode: "isolated"
        drop: []
        allow: ["0.0.0.0/0"]

  image: "ghcr.io/openfaasltd/slicer-systemd-min:6.1.90-x86_64-latest"
  hypervisor: firecracker
  api:
    port: 8080
    bind_address: "127.0.0.1"
    auth:
      enabled: true
```

Run `slicer new --help` for the full set of flags and `slicer new NAME > slicer.yaml` to generate a starter config.

Host groups must be defined in the YAML file before starting the daemon. They cannot be added dynamically at runtime. If you need that, consider the [Slicer per tenant](/platform/instance-per-tenant/) model, where each tenant gets its own daemon and its own host group config.

### What `count:` does

- **`count: N`**: Slicer creates and **protects** N VMs in that group at startup. They're persistent by construction: the daemon restores them after its own restart, and they come back automatically if the host reboots. Use this for the control plane, a shared database, or anything that must be present whenever the daemon is running.
- **`count: 0`**: no pre-allocated VMs. Callers create and delete VMs on demand through the API (`POST /hostgroup/NAME/nodes`). This is the right shape for sandboxes, per-job workers, and tenant workloads.

Both shapes can coexist in the same daemon. The split above, one persistent control-plane host group plus one on-demand sandbox host group, is how most multi-tenant deployments are structured.

### VM size at launch

The host group's `vcpu` / `ram_gb` are the **default and maximum** for VMs in that group. When you launch a VM through the API you can:

- Omit `cpus` / `ram_bytes` entirely: the VM gets the host group's defaults.
- Request **the same or less** than the defaults: honoured as-is.
- Request **more** than the defaults: rejected with `400`.

So if `sbox` is defined at 2 vCPU / 4 GiB, a client can legitimately launch a 1 vCPU / 1 GiB worker inside it but cannot launch an 8 vCPU / 16 GiB worker. To offer bigger VMs, define a separate host group with a bigger profile.
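
The sizing rule can also be checked on the client before calling the API, which avoids a request that would end in `400`. The following is a minimal sketch, assuming the caller already knows the host group's `vcpu` and `ram_gb` values; `gib_to_bytes` and `fits_profile` are hypothetical helper names, not part of Slicer.

```bash
# Hypothetical client-side helpers; not part of the Slicer CLI or API.

# gib_to_bytes GIB
# Converts whole GiB into bytes for the ram_bytes request field.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# fits_profile REQ_CPUS REQ_RAM_GIB MAX_VCPU MAX_RAM_GB
# Succeeds when the request does not exceed the host group's profile.
fits_profile() {
  [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}
```

For the `sbox` group above, `fits_profile 1 1 2 4` succeeds and `fits_profile 8 16 2 4` fails, matching the two cases in the text.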

## Naming: pets vs. cattle

You don't pick the hostname. When Slicer creates a VM in a host group, it assigns the name: `<hostgroup>-1`, `<hostgroup>-2`, `<hostgroup>-3`, and so on. Numbers increment per host group.

This is deliberate. Host groups are pools of interchangeable VMs, not a hand-managed machine register. No name collisions, no "that name is already taken" errors. Slicer tracks the real hostname internally; your application should track meaning through **tags**.

## Tags for stable identity

Tags are a free-form array of strings attached to each VM. Pass them at creation time:

```bash
curl -X POST http://127.0.0.1:8080/hostgroup/sbox/nodes \
  -H "Content-Type: application/json" \
  -d '{
    "tags": ["user=alice", "job=build-4821", "display=Alice dev environment"],
    "cpus": 2,
    "ram_bytes": 4294967296
  }'
```

Any string is valid. The convention that works well in practice is `key=value`: easy to filter, easy to render in a UI. Use it to carry whatever your application needs to reason about later, for example a sandbox expiry deadline: `expires_at=2026-04-14 08:28:00`.
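
When tags come from variables rather than a hand-written body, the JSON array has to be assembled somewhere. Here is one possible sketch of a hypothetical `tags_json` helper (not part of Slicer). It assumes tags contain no double quotes or backslashes and rejects any that do, rather than attempting to escape them.

```bash
# Hypothetical helper; not part of the Slicer CLI or API.

# tags_json TAG...
# Prints the tags as a JSON array of strings.
# Tags containing a double quote or backslash are rejected, not escaped.
tags_json() {
  out=""
  for t in "$@"; do
    case $t in
      *\"*|*\\*) echo "unsupported character in tag: $t" >&2; return 1 ;;
    esac
    out="${out:+$out, }\"$t\""
  done
  printf '[%s]\n' "$out"
}
```

The output can be spliced into the `tags` field of the `-d` body in the `curl` call above. A product that accepts arbitrary user text in tags would likely want a real JSON encoder instead.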

### Looking up a VM by tag

The list endpoint filters on either exact match or prefix:

```bash
# exact
GET /nodes?tag=user=alice

# prefix (matches any tag starting with "user=")
GET /nodes?tag_prefix=user=
```

Also available on a specific host group:

```bash
GET /hostgroup/sbox/nodes?tag_prefix=user=
```
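
The two scopes differ only in the path, so a client can build either URL from the same inputs. This is a rough sketch: `nodes_url` is a hypothetical helper, the base URL is taken from the earlier `curl` examples, and the prefix is passed through unencoded as in the requests above (values containing `&` or spaces would need percent-encoding first).

```bash
# Hypothetical helper; not part of the Slicer CLI or API.

# nodes_url BASE [HOSTGROUP] [TAG_PREFIX]
# Builds the list URL for the whole daemon or for one host group,
# optionally filtered by tag prefix.
nodes_url() {
  base=$1
  group=${2:-}
  prefix=${3:-}
  if [ -n "$group" ]; then
    url="$base/hostgroup/$group/nodes"
  else
    url="$base/nodes"
  fi
  if [ -n "$prefix" ]; then
    url="$url?tag_prefix=$prefix"
  fi
  printf '%s\n' "$url"
}

# Example (needs a running daemon):
#   curl "$(nodes_url http://127.0.0.1:8080 sbox user=)"
```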

### How to represent Slicer VMs in your product

Whether your product surfaces VMs to humans (a dashboard, a CLI, a support tool) or to other systems (a scheduler, a billing pipeline, an API), the shape is the same:

1. **Create** with tags carrying the display name, owner, and any internal IDs from your product like tenant, namespace, billing ID, environment, and so on.
2. **List / look up** VMs. On a shared daemon (Slicer per host) scope results with `tag_prefix=owner=` or similar. On a per-tenant daemon the unfiltered list already belongs to one tenant, so a plain `GET /nodes` is enough.
3. **Render** the tag value where an end user sees a VM; keep the auto-assigned hostname as the internal handle your product uses to address it.
4. **Manage** (start, stop, delete) via the real hostname, carried alongside the friendly tag in whatever record your product already stores.

This keeps Slicer's naming model out of your product's domain language while still giving you precise control over each VM.
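
The render step can be sketched as a lookup that prefers a `display=` tag and falls back to the hostname. This is an illustration only: `tag_value` and `display_name` are hypothetical helpers, and the sketch assumes the tags have already been extracted from the API response into separate arguments (the response shape is not shown here).

```bash
# Hypothetical helpers; not part of the Slicer CLI or API.

# tag_value KEY TAG...
# Prints the value of the first "KEY=value" tag; fails if none matches.
tag_value() {
  key=$1
  shift
  for t in "$@"; do
    case $t in
      "$key"=*) printf '%s\n' "${t#*=}"; return 0 ;;
    esac
  done
  return 1
}

# display_name HOSTNAME TAG...
# Prints the friendly name from the "display=" tag, or the hostname if absent.
display_name() {
  host=$1
  shift
  tag_value display "$@" || printf '%s\n' "$host"
}
```

With the tags from the creation example, `display_name sbox-3 user=alice "display=Alice dev environment"` prints the friendly name, while the hostname `sbox-3` stays the handle used for management calls.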

## Lifecycle

### Ephemeral is the default

VMs launched through the API (`POST /hostgroup/NAME/nodes`) are **ephemeral** by default. They run until one of three things happens, and in every case the disk is removed and there is no automatic restart:

- **DELETE via the API**: the VM stops and the disk is removed.
- **Guest exits on its own** (`sudo reboot`, kernel panic, and similar): the daemon's reaper notices and cleans up the record and the disk.
- **Daemon restart**: ephemeral VM records are not carried across, so the VMs are gone.

This is the right shape for code execution, CI jobs, batch processing, and anything where the VM is disposable.
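
Nothing in the list above deletes a sandbox on a timer, so if your application tags VMs with an `expires_at=` deadline as suggested under Tags, it has to decide when to issue the DELETE itself. A minimal sketch of that decision follows; `is_expired` is a hypothetical helper, and it assumes both timestamps use the fixed-width `YYYY-MM-DD HH:MM:SS` form in the same time zone.

```bash
# Hypothetical helper; not part of the Slicer CLI or API.

# is_expired EXPIRES_AT NOW
# Succeeds when NOW is at or past EXPIRES_AT.
# Returns 2 if either timestamp is not in "YYYY-MM-DD HH:MM:SS" form.
is_expired() {
  # Dropping the separators leaves a 14-digit number that sorts by time.
  deadline=$(printf '%s' "$1" | tr -cd '0-9')
  now=$(printf '%s' "$2" | tr -cd '0-9')
  [ "${#deadline}" -eq 14 ] && [ "${#now}" -eq 14 ] || return 2
  [ "$now" -ge "$deadline" ]
}

# Example: compare a tag value against the current time.
#   is_expired "2026-04-14 08:28:00" "$(date '+%Y-%m-%d %H:%M:%S')"
```

Storing the deadline in UTC on both sides avoids surprises around daylight-saving changes.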

### Persistent API-launched VMs

For VMs that should survive daemon restarts, such as long-running dev environments, tenant workspaces, and user-facing sandboxes, set `persistent: true` at creation:

```bash
curl -X POST http://127.0.0.1:8080/hostgroup/sbox/nodes \
  -H "Content-Type: application/json" \
  -d '{
    "persistent": true,
    "tags": ["user=alice", "purpose=dev"],
    "cpus": 2,
    "ram_bytes": 4294967296
  }'
```

Or with the CLI:

```bash
slicer vm launch sbox --persistent --tag user=alice
```

Persistent VMs:

- Survive daemon restarts. The daemon re-attaches to them on startup.
- Are **not** deleted when the VM stops. Their disk is retained.
- Can be stopped deliberately without losing state, via `POST /vm/HOSTNAME/shutdown` or `sudo reboot` inside the guest.
- Stay around until you explicitly `DELETE` them through the API or CLI. Delete removes the disk; there is no undelete.

Bring a stopped persistent VM back up with:

```bash
slicer vm relaunch HOSTNAME
```

or the equivalent REST call:

```bash
POST /vm/HOSTNAME/relaunch
```

Relaunch is the intended recovery path whenever a persistent VM has been shut down cleanly, whether by the API, the guest, or a host reboot. Config-declared VMs from `count: N` are a special case: the daemon re-launches them automatically whenever it starts, so no manual `relaunch` is needed for those.
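
Taken together, the three kinds of VM each have a different recovery step once they have stopped. The table-like sketch below summarises that under the rules described on this page; `recovery_action` and its three kind names are hypothetical labels chosen for illustration, not values the API returns.

```bash
# Hypothetical helper; the kind names are illustrative, not API values.

# recovery_action KIND
# KIND is one of: ephemeral, persistent, declared (count: N in the config).
recovery_action() {
  case $1 in
    ephemeral)  echo "recreate" ;;  # disk and record are gone; launch a new VM
    persistent) echo "relaunch" ;;  # disk retained; POST /vm/HOSTNAME/relaunch
    declared)   echo "none" ;;      # the daemon re-launches these at startup
    *) echo "unknown VM kind: $1" >&2; return 1 ;;
  esac
}
```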

## Where to go next

Most integrations land in one of the shapes below. Pick the row that's closest to what you're building and follow the deployment link.

| Use case | Lifecycle | Deployment | Networking |
| --- | --- | --- | --- |
| Code execution / agent sandbox | Ephemeral | [Slicer per tenant](/platform/instance-per-tenant/) | [Isolated](/reference/networking/) + allowlist |
| CI/CD job runners | Ephemeral | [Slicer per host](/platform/single-instance/) | [Isolated](/reference/networking/) |
| Batch processing | Ephemeral | [Slicer per host](/platform/single-instance/) | Bridge |
| Dev environments | Persistent | [Slicer per tenant](/platform/instance-per-tenant/) | Bridge |
| Tenant workspaces | Persistent | [Slicer per tenant](/platform/instance-per-tenant/) | [Isolated](/reference/networking/) |
| Named resources in your product | Persistent | [Slicer per tenant](/platform/instance-per-tenant/) | Bridge |
| Control plane / shared services | Persistent | [Slicer per host](/platform/single-instance/) | Bridge |

Networking choices are rules of thumb. If your tenants execute untrusted code, default to [isolated mode](/reference/networking/) with an explicit egress allowlist rather than bridge.

## See also

* [Slicer per host](/platform/single-instance/): one daemon, tenants share it, ownership tracked via tags.
* [Slicer per tenant](/platform/instance-per-tenant/): one daemon per tenant, isolated networking, Unix sockets.
* [Go SDK](/platform/go-sdk/)
* [TypeScript SDK](/platform/typescript-sdk/)
* [REST API reference](/reference/api/): exact request/response shapes.

docs/platform/overview.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Your data never leaves your network, there is no third-party control plane, and
 | Copy files to/from VM | `POST/GET /vm/HOSTNAME/cp` | Binary or tar mode, set uid/gid/permissions |
 | Agent health | `HEAD/GET /vm/HOSTNAME/health` | Check readiness, userdata completion |
 | VM stats | `GET /nodes/stats` | CPU, memory, disk, network usage |
-| Pause / resume | `POST /vm/HOSTNAME/pause\|resume` | Freeze CPU, resume instantly |
+| Pause / resume | `POST /vm/HOSTNAME/pause`<br>`POST /vm/HOSTNAME/resume` | Freeze CPU, resume instantly |
 | Shutdown / reboot | `POST /vm/HOSTNAME/shutdown` | Graceful shutdown or reboot |
 | Secrets | `POST/GET/DELETE /secrets` | Inject credentials into VMs |
 | Serial logs | `GET /vm/HOSTNAME/logs` | Boot logs and serial console output |
