
feat: add io_uring micro-benchmark playground #406

Closed
xgerman wants to merge 1 commit into documentdb:main from xgerman:developer/io-uring-benchmark


xgerman (Collaborator) commented Jun 16, 2026

What

Adds documentdb-playground/io-uring-benchmark/ — a reproducible cloud (AKS/EKS) playground that measures PostgreSQL 18 io_method = io_uring vs worker vs sync on DocumentDB, driving the documentdb/micro-benchmarks Locust read-query suite.

Why / proven feasibility

A spike on a local kind cluster (kernel 6.12, io_uring_disabled=0) established the full enable path:

| Finding | Evidence |
|---|---|
| The PG18 18-minimal-trixie image ships io_uring support | During initdb bootstrap, postgres attempts io_uring queue setup |
| The only blocker is the container seccomp profile RuntimeDefault | FATAL: could not setup io_uring queue: Operation not permitted |
| Relaxing seccomp turns it on | With Unconfined, PG18.3 starts, SHOW io_method returns io_uring, and pg_stat_io shows reads flowing |
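The spike's io_uring_disabled=0 precondition can be checked on each node before deploying. The helper below is a hypothetical sketch (io_uring_status is not part of the playground). It assumes Linux 6.6 or newer, where /proc/sys/kernel/io_uring_disabled reads 0 (enabled for everyone), 1 (restricted to CAP_SYS_ADMIN or members of kernel.io_uring_group), or 2 (disabled). It covers only the kernel side; the seccomp blocker in the table is a separate check.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with the playground.
# Classifies the kernel-side io_uring policy; the sysctl path can be overridden
# so the function can be exercised against a fixture file.
io_uring_status() {
  knob="${1:-/proc/sys/kernel/io_uring_disabled}"
  if [ ! -r "$knob" ]; then
    # Knob absent: kernel predates 6.6, or io_uring is not built in.
    echo "unknown: $knob not readable"
    return 2
  fi
  case "$(cat "$knob")" in
    0) echo "enabled"; return 0 ;;
    1) echo "restricted: only CAP_SYS_ADMIN or members of kernel.io_uring_group"; return 1 ;;
    2) echo "disabled"; return 1 ;;
    *) echo "unexpected value in $knob"; return 2 ;;
  esac
}
```

Because this sysctl is not namespaced, running the function inside any pod on a node (or a node debug pod started with `kubectl debug node/<node> -it --image=busybox`) should report that node's value.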

CNPG natively exposes spec.seccompProfile, but the DocumentDB CR has no seccomp field, so the playground relaxes seccomp on the CNPG postgres pods via a Kyverno mutate policy.
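A Kyverno mutate policy of the quick-start (Unconfined) kind might look roughly like the sketch below. It is not the PR's policy file, which is not shown here. It assumes CNPG labels its instance pods with cnpg.io/podRole: instance and names the database container postgres; the policy name and write_unconfined_policy helper are hypothetical. Depending on how the bootstrap job is configured, its pods may need the same treatment; this sketch targets only instance pods.

```shell
#!/bin/sh
# Hypothetical helper: writes an illustrative Kyverno ClusterPolicy that sets a
# container-level seccompProfile of Unconfined on the "postgres" container of
# pods labelled cnpg.io/podRole=instance (both label and name are assumptions).
write_unconfined_policy() {
  out="${1:?usage: write_unconfined_policy <output.yaml>}"
  cat > "$out" <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: cnpg-postgres-seccomp-unconfined
spec:
  rules:
    - name: unconfined-seccomp-for-postgres
      match:
        any:
          - resources:
              kinds:
                - Pod
              selector:
                matchLabels:
                  cnpg.io/podRole: instance
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # Conditional anchor: only patch the container named postgres.
              - (name): postgres
                securityContext:
                  seccompProfile:
                    type: Unconfined
EOF
}
```

Usage: `write_unconfined_policy /tmp/policy.yaml && kubectl apply -f /tmp/policy.yaml`. Mutations apply at admission, so existing pods keep their old profile until they are recreated. A container-level seccompProfile overrides a pod-level one. For the hardened variant, the same patch would use type: Localhost with a localhostProfile path relative to the kubelet's seccomp root (by default /var/lib/kubelet/seccomp), where the node-installer DaemonSet would have placed the profile.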

Contents

  • CR variants: manifests/documentdb-{sync,worker,io_uring}.yaml, which differ only in spec.postgres.parameters.io_method.
  • seccomp: Kyverno policies (Unconfined quick-start and hardened Localhost), the curated CNPG io_uring profile (from CNPG #10446), and a node-installer DaemonSet.
  • harness: an in-cluster Locust runner, a 00…99 script pipeline, and analyze.py (median and speedup-vs-sync table).
  • README: proven feasibility, the seccomp deep-dive, and an I/O-bound methodology (dataset larger than RAM, one variable changed at a time, repeated runs with the median reported, and pg_stat_io corroboration).
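The median and speedup-vs-sync summary that analyze.py produces can be approximated in POSIX shell and awk. This is a hypothetical stand-in, not the PR's script: it assumes an input file with one io_method,throughput row per repeat (for example Locust requests/s), takes the median per io_method so a single noisy repeat does not skew the result, and divides each median by the sync median.

```shell
#!/bin/sh
# Hypothetical stand-in for analyze.py. Input: CSV rows "io_method,throughput",
# one row per benchmark repeat. Output: median per io_method and speedup vs sync.
summarize() {
  # Group rows by method and order each group numerically, so the awk program
  # sees each method's values in ascending order.
  LC_ALL=C sort -t, -k1,1 -k2,2n "$1" | awk -F, '
    { vals[$1, ++n[$1]] = $2 }
    END {
      for (m in n) {
        c = n[m]
        if (c % 2) med[m] = vals[m, (c + 1) / 2]                      # odd count: middle value
        else med[m] = (vals[m, c / 2] + vals[m, c / 2 + 1]) / 2      # even count: mean of middle two
      }
      if (!("sync" in med)) { print "no sync baseline" > "/dev/stderr"; exit 1 }
      printf "%-10s %12s %10s\n", "io_method", "median", "vs_sync"
      split("sync worker io_uring", order, " ")
      for (i = 1; i <= 3; i++) {
        m = order[i]
        if (m in med) printf "%-10s %12.2f %9.2fx\n", m, med[m], med[m] / med["sync"]
      }
    }'
}
```

For three sync repeats of 100, 120 and 110 and two worker repeats of 150 and 130, the sync median is 110 and the worker median is 140, so worker is reported as 1.27x.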

How to run

```shell
cd documentdb-playground/io-uring-benchmark
./scripts/00-prereqs.sh && ./scripts/10-deploy-operator.sh && ./scripts/20-deploy-seccomp.sh
STORAGE_CLASS=<fast-ssd> ./scripts/30-deploy-documentdb.sh
DOCUMENT_COUNT=10000000 ./scripts/40-load-data.sh
./scripts/50-run-matrix.sh && ./scripts/60-collect.sh
```
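One way to structure a repeat-and-median matrix is to run every io_method once per round rather than all repeats of one method back to back, so slow drift (cache warmth, noisy neighbours) is spread across all variants. The sketch below is hypothetical and is not the contents of 50-run-matrix.sh; run_matrix and the REPEATS variable are illustrative names, and the runner command is assumed to switch the cluster to the given io_method and run one Locust pass.

```shell
#!/bin/sh
# Hypothetical sketch: interleave io_methods within each round.
# The runner is invoked as: <runner> <io_method> <round>
run_matrix() {
  runner="${1:?usage: run_matrix <runner-command>}"
  reps="${REPEATS:-3}"
  round=1
  while [ "$round" -le "$reps" ]; do
    for method in sync worker io_uring; do
      "$runner" "$method" "$round" || return 1   # stop the matrix on the first failed run
    done
    round=$((round + 1))
  done
}
```

Usage: `run_one() { echo "round $2: $1"; }; REPEATS=5 run_matrix run_one`. In PostgreSQL 18, io_method can only be set at server start, so a real runner would apply the matching manifests/documentdb-<method>.yaml variant and wait for the restart before each pass.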

Notes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds documentdb-playground/io-uring-benchmark/, a reproducible cloud (AKS/EKS)
playground that measures PostgreSQL 18 io_method=io_uring vs worker vs sync on
DocumentDB using the documentdb/micro-benchmarks Locust read suite.

Includes:
- Three DocumentDB CR variants driven by spec.postgres.parameters.io_method.
- Kyverno mutate policies (Unconfined quick-start + hardened Localhost) plus the
  CNPG io_uring seccomp profile and a node installer DaemonSet, since the
  DocumentDB CR has no seccomp field and CNPG defaults postgres to RuntimeDefault
  (which blocks the io_uring_* syscalls and crash-loops PG18).
- An in-cluster Locust runner, an 00..99 script pipeline, and analyze.py that
  emits a median + speedup-vs-sync summary table.
- A README documenting the proven feasibility (seccomp blocker + fix, verified on
  kind) and the I/O-bound benchmark methodology.

Depends on operator support for spec.postgres.parameters (PR documentdb#307).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: German Eichberger <geeichbe@microsoft.com>
documentdb-triage-tool (bot) added the documentation, ecosystem, and enhancement labels on Jun 17, 2026

🤖 Auto-triaged by documentdb-triage-tool.

Applied: ecosystem, documentation, enhancement
Project fields suggested: Component playground · Priority P3 · Effort XL · Status In Progress
Confidence: 0.85 (mixed)

Reasoning

component from path globs (playground, docs); effort from diff stats (2067+0 LOC, 23 files); LLM: Adds a new multi-file io_uring benchmarking playground (manifests, scripts, Kyverno policies, harness, README) scoped entirely to the playground component; it's a nice-to-have experimental feature with no functional or blocking impact.

If a label is wrong, remove it manually and ping @patty-chow so the rules can be tuned. The bot will not re-label items that already have component labels.

xgerman (Collaborator, Author) commented Jun 19, 2026

Superseded by #407, which makes io_uring a first-class operator-native opt-in feature gate (spec.featureGates.IOUring: true). The operator now sets io_method=io_uring and relaxes the postgres container seccomp profile via CNPG's native Cluster.Spec.SeccompProfile — no Kyverno policy required. The benchmark harness in this PR remains available on branch developer/io-uring-benchmark for reference; closing in favor of the productized feature.

xgerman closed this on Jun 19, 2026