feat(procedure): per-stage concurrency cap (max_concurrent) + wait_for_completion for batch deploys #1487

Open
djodjo02130 wants to merge 2 commits into moghtech:main from djodjo02130:feat/procedure-stage-max-concurrent
djodjo02130 commented Jun 19, 2026

Problem

Procedure stages run all their executions in parallel (join_all) with no limit, so a stage that fans out to many executions — a Batch* deploy across hundreds of resources, or a "run everything" procedure — launches them all at once and can saturate the host (CPU / RAM / network). There is no way to throttle this.

Two changes together give procedures proper worker-pool / task-queue behavior.

1. max_concurrent per stage

Optional field on ProcedureStage:

  • 0 (default) → unchanged: every execution runs at once (fully backwards compatible).
  • n > 0 → the stage runs as a worker pool of size n: only n executions run at a time, the rest are queued and started as running ones finish.

The implementation swaps join_all(futures) for stream::iter(futures).buffer_unordered(limit). All executions still complete before the stage returns, and the first error is still propagated.
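For illustration, the worker-pool behavior can be sketched with plain std threads. This is a thread-based analogue of the bounded stream, not the PR's code: `Job` and `run_stage` are hypothetical names, and the real executor is async. It keeps the same contract as described above: `0` means no cap, every job runs to completion, and the first error seen is what the stage returns.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for one execution in a stage.
type Job = Box<dyn FnOnce() -> Result<(), String> + Send>;

/// Runs every job with at most `max_concurrent` in flight (0 = no cap).
/// All jobs finish before this returns; the first error seen is returned.
fn run_stage(jobs: Vec<Job>, max_concurrent: usize) -> Result<(), String> {
    let workers = if max_concurrent == 0 {
        jobs.len()
    } else {
        max_concurrent.min(jobs.len())
    };
    let queue = Arc::new(Mutex::new(VecDeque::from(jobs)));
    let first_err = Arc::new(Mutex::new(None::<String>));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let first_err = Arc::clone(&first_err);
            thread::spawn(move || loop {
                // Take the next queued job; the lock is released before it runs.
                let job = queue.lock().unwrap().pop_front();
                let Some(job) = job else { break };
                if let Err(e) = job() {
                    // Keep only the first error; later ones are dropped.
                    first_err.lock().unwrap().get_or_insert(e);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let err = first_err.lock().unwrap().take();
    err.map_or(Ok(()), Err)
}

fn main() {
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let jobs: Vec<Job> = (0..8)
        .map(|_| {
            let (running, peak) = (Arc::clone(&running), Arc::clone(&peak));
            Box::new(move || {
                // Track how many jobs are in flight at the same time.
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(20));
                running.fetch_sub(1, Ordering::SeqCst);
                Ok(())
            }) as Job
        })
        .collect();

    let result = run_stage(jobs, 3);
    // The observed peak never exceeds the cap of 3.
    println!("result = {:?}, peak = {}", result, peak.load(Ordering::SeqCst));
}
```

With a cap of 3 and 8 jobs, the remaining 5 jobs wait in the queue and are picked up as workers free up, which is the queueing behavior `max_concurrent = n` is meant to give a stage.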

2. wait_for_completion on Deploy

max_concurrent caps concurrent executions. Most executions block until done, but a Deploy is fire-and-forget — it resolves as soon as the container is started, not when it exits. So for one-shot / batch containers the cap would throttle deploy issuance but not how many actually run.

wait_for_completion: bool (default false) makes the Deploy execution poll the container (InspectContainer) until it exits (24h safety cap) before resolving. Combined with max_concurrent, a stage of Deploy { wait_for_completion = true } becomes a real worker pool: max_concurrent = 10 ⇒ at most 10 containers running at once.

[[procedure.config.stage]]
name = "run-all-jobs"
max_concurrent = 10
executions = [
  { execution.type = "Deploy", execution.params.deployment = "job-01", execution.params.wait_for_completion = true },
  # ...
]
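The poll-until-exit step can be sketched roughly as follows. This is a simplified, synchronous illustration rather than the PR's implementation: `ContainerState`, `WaitOutcome`, and `wait_for_exit` are hypothetical names, the `inspect` closure stands in for the InspectContainer call, and what the real code does when the safety cap is hit is an assumption here.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, simplified container states; not the project's actual types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContainerState {
    Running,
    Exited,
    Gone,
}

#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Finished,
    TimedOut,
}

/// Polls `inspect` until the container has exited (or no longer exists),
/// giving up once `cap` has elapsed (the PR uses a 24h cap).
fn wait_for_exit(
    mut inspect: impl FnMut() -> ContainerState,
    interval: Duration,
    cap: Duration,
) -> WaitOutcome {
    let deadline = Instant::now() + cap;
    loop {
        match inspect() {
            ContainerState::Exited | ContainerState::Gone => return WaitOutcome::Finished,
            ContainerState::Running => {}
        }
        if Instant::now() >= deadline {
            return WaitOutcome::TimedOut;
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Fake inspector: reports Running twice, then Exited.
    let mut polls = 0;
    let outcome = wait_for_exit(
        || {
            polls += 1;
            if polls < 3 {
                ContainerState::Running
            } else {
                ContainerState::Exited
            }
        },
        Duration::from_millis(5),
        Duration::from_secs(1),
    );
    println!("{:?} after {} polls", outcome, polls); // prints "Finished after 3 polls"
}
```

Because the execution only resolves once this wait returns, each waiting Deploy keeps its slot in the stage's pool occupied for as long as its container is running.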

Notes

  • Both fields are opt-in and backwards compatible (serde defaults 0 / false).
  • wait_for_completion targets Server deployments; ignored for Swarm services.
  • Configurable via TOML and the procedure UI (stage "Max parallel" input + Deploy "Wait for completion" switch).
  • TS types regenerated via typeshare; docs updated.
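As a rough illustration of the opt-in defaults: a field marked `#[serde(default)]` falls back to `Default::default()` when it is missing, which is `0` for an integer and `false` for a bool. The sketch below uses only std and hypothetical mirror structs (`StageConfig`, `DeployParams`) plus a hypothetical `effective_limit` helper. Treating negative values like `0` is an assumption, not something this PR states.

```rust
/// Hypothetical mirrors of the two new fields; not the PR's actual structs.
#[derive(Debug, Default, PartialEq)]
struct StageConfig {
    max_concurrent: i64,
}

#[derive(Debug, Default, PartialEq)]
struct DeployParams {
    wait_for_completion: bool,
}

/// Resolves the number of executions allowed in flight for a stage
/// with `total` executions. 0 (and, by assumption, any negative value)
/// means "no cap", so the whole stage runs at once.
fn effective_limit(max_concurrent: i64, total: usize) -> usize {
    let all = total.max(1); // a bounded stream needs a limit of at least 1
    if max_concurrent <= 0 {
        all
    } else {
        usize::try_from(max_concurrent).unwrap_or(usize::MAX).min(all)
    }
}

fn main() {
    // A stage or Deploy written before these fields existed gets the defaults.
    let stage = StageConfig::default();
    let deploy = DeployParams::default();
    println!("defaults: {:?}, {:?}", stage, deploy);

    // Unset cap: all 200 executions run at once, as before.
    println!("limit = {}", effective_limit(stage.max_concurrent, 200)); // prints "limit = 200"
    // Explicit cap of 10.
    println!("limit = {}", effective_limit(10, 200)); // prints "limit = 10"
}
```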

Validation

cargo build, cargo test, and cargo fmt --all -- --check all pass.

…stage

Procedure stages run all their executions in parallel (`join_all`) with no limit,
so a stage that fans out to many executions (e.g. a Batch deploy across hundreds of
resources, or a "run all" procedure) can saturate the host's CPU/RAM/network.

Add an optional `max_concurrent` field to `ProcedureStage`:
  * `0` (default) -> unchanged: every execution runs at once (backwards compatible);
  * `n > 0`       -> the stage runs as a worker pool of size n: only n executions run
                     at a time, the rest are queued and started as running ones finish.

Implementation: replace `join_all(futures)` with
`stream::iter(futures).buffer_unordered(limit)` in the stage executor; all executions
still complete before the stage returns and the first error is propagated.

- entity: `ProcedureStage.max_concurrent: I64` (serde default 0)
- core: bounded-concurrency stage executor + 3 built-in procedure literals
- types: regenerated TS (`types.ts`, `types.d.ts`)
- ui: "Max parallel" NumberInput on the stage editor + newStage factory
- docs: procedures.md field + example
…iner exits

A `Deploy` execution is fire-and-forget: it resolves as soon as the container is
*started*, not when it *exits*. So a procedure stage's `max_concurrent` cap (added in
the previous commit) throttles how fast deploys are *issued*, but not how many
one-shot / batch containers actually *run* at once — they all end up started.

Add `wait_for_completion: bool` (default false) to `Deploy`. When true, after the
container is started the core polls its state (`InspectContainer`) until it exits
(or is gone), with a 24h safety cap, before the execution resolves. Combined with
`max_concurrent`, a stage of `Deploy { wait_for_completion = true }` executions then
runs as a true worker pool: `max_concurrent = 10` => at most 10 containers running
at a time, the rest queued until a slot frees.

- entity: `Deploy.wait_for_completion: bool` (#[serde(default)])
- core: poll-until-exit after deploy (Server target; ignored for Swarm services)
- types: regenerated via typeshare
- ui: "Wait for completion" switch on the Deploy execution
- docs: procedures.md note + example (max_concurrent + wait_for_completion)
djodjo02130 changed the title from "feat(procedure): cap parallel executions per stage with max_concurrent" to "feat(procedure): per-stage concurrency cap (max_concurrent) + wait_for_completion for batch deploys" on Jun 20, 2026