feat(procedure): per-stage concurrency cap (max_concurrent) + wait_for_completion for batch deploys #1487
Open
djodjo02130 wants to merge 2 commits into
…stage
Procedure stages run all their executions in parallel (`join_all`) with no limit,
so a stage that fans out to many executions (e.g. a Batch deploy across hundreds of
resources, or a "run all" procedure) can saturate the host's CPU/RAM/network.
Add an optional `max_concurrent` field to `ProcedureStage`:
* `0` (default) -> unchanged: every execution runs at once (backwards compatible);
* `n > 0` -> the stage runs as a worker pool of size n: only n executions run
at a time, the rest are queued and started as running ones finish.
Implementation: replace `join_all(futures)` with
`stream::iter(futures).buffer_unordered(limit)` in the stage executor; all executions
still complete before the stage returns and the first error is propagated.
- entity: `ProcedureStage.max_concurrent: I64` (serde default 0)
- core: bounded-concurrency stage executor + 3 built-in procedure literals
- types: regenerated TS (`types.ts`, `types.d.ts`)
- ui: "Max parallel" NumberInput on the stage editor + newStage factory
- docs: procedures.md field + example
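
The cap semantics described above can be sketched with a small worker pool. This is an illustration only, not the code from this commit: it uses OS threads from the standard library instead of the `buffer_unordered` stream the commit describes, and `run_stage` and `peak_concurrency` are hypothetical names introduced for this example. `max_concurrent == 0` is treated as "no cap", and all jobs run to completion before an error is returned.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stage runner: executes every job with at most `max_concurrent`
/// running at once. `0` means "no cap" (one worker per job).
/// All jobs finish before this returns; the error of the lowest-indexed
/// failing job is reported.
fn run_stage<F>(jobs: Vec<F>, max_concurrent: usize) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let total = jobs.len();
    let workers = if max_concurrent == 0 { total } else { max_concurrent.min(total) };
    // Shared FIFO queue: an idle worker takes the next pending job.
    let queue: Arc<Mutex<VecDeque<(usize, F)>>> =
        Arc::new(Mutex::new(jobs.into_iter().enumerate().collect()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut errors = Vec::new();
                loop {
                    // Hold the lock only long enough to pop one job.
                    let next = queue.lock().unwrap().pop_front();
                    let Some((index, job)) = next else { break };
                    if let Err(e) = job() {
                        errors.push((index, e));
                    }
                }
                errors
            })
        })
        .collect();

    // Joining every worker guarantees that all jobs have completed.
    let mut errors: Vec<(usize, String)> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker panicked"))
        .collect();
    errors.sort_by_key(|(index, _)| *index);
    match errors.into_iter().next() {
        Some((_, e)) => Err(e),
        None => Ok(()),
    }
}

/// Demo helper: runs `count` short sleeping jobs and returns the highest
/// number of jobs observed running at the same time.
fn peak_concurrency(count: usize, max_concurrent: usize) -> usize {
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let jobs: Vec<_> = (0..count)
        .map(|_| {
            let running = Arc::clone(&running);
            let peak = Arc::clone(&peak);
            move || -> Result<(), String> {
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(20));
                running.fetch_sub(1, Ordering::SeqCst);
                Ok(())
            }
        })
        .collect();
    run_stage(jobs, max_concurrent).unwrap();
    peak.load(Ordering::SeqCst)
}
```

Calling `peak_concurrency(8, 3)` should never report more than 3 jobs in flight, while `peak_concurrency(8, 0)` may report up to 8.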
…iner exits
A `Deploy` execution is fire-and-forget: it resolves as soon as the container is
*started*, not when it *exits*. So a procedure stage's `max_concurrent` cap (added in
the previous commit) throttles how fast deploys are *issued*, but not how many
one-shot / batch containers actually *run* at once — they all end up started.
Add `wait_for_completion: bool` (default false) to `Deploy`. When true, after the
container is started the core polls its state (`InspectContainer`) until it exits
(or is gone), with a 24h safety cap, before the execution resolves. Combined with
`max_concurrent`, a stage of `Deploy { wait_for_completion = true }` executions then
runs as a true worker pool: `max_concurrent = 10` => at most 10 containers running
at a time, the rest queued until a slot frees.
- entity: `Deploy.wait_for_completion: bool` (#[serde(default)])
- core: poll-until-exit after deploy (Server target; ignored for Swarm services)
- types: regenerated via typeshare
- ui: "Wait for completion" switch on the Deploy execution
- docs: procedures.md note + example (max_concurrent + wait_for_completion)
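
The poll-until-exit behavior described above might look roughly like the following. This is a sketch under stated assumptions, not the commit's code: `ContainerState`, `WaitOutcome`, and `wait_for_exit` are hypothetical names, the `inspect` closure stands in for the real `InspectContainer` call, and the commit message does not say what the core does when the safety cap is hit, so the sketch simply reports a timeout.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical container states for this sketch (not the project's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContainerState {
    Running,
    Exited,
    Gone, // the container no longer exists
}

/// Hypothetical result of waiting on a container.
#[derive(Debug, PartialEq, Eq)]
enum WaitOutcome {
    /// The container exited or disappeared after `polls` inspections.
    Finished { polls: u32 },
    /// The safety cap elapsed while the container was still running.
    TimedOut,
}

/// Calls `inspect` every `interval` until the container has exited or is gone,
/// giving up once `cap` has elapsed (the commit mentions a 24h cap).
fn wait_for_exit<F>(mut inspect: F, interval: Duration, cap: Duration) -> WaitOutcome
where
    F: FnMut() -> ContainerState,
{
    let deadline = Instant::now() + cap;
    let mut polls = 0;
    loop {
        polls += 1;
        match inspect() {
            ContainerState::Exited | ContainerState::Gone => {
                return WaitOutcome::Finished { polls };
            }
            ContainerState::Running => {}
        }
        if Instant::now() >= deadline {
            return WaitOutcome::TimedOut;
        }
        thread::sleep(interval);
    }
}
```

With the cap set to `Duration::from_secs(24 * 60 * 60)`, an execution built this way resolves only once its container is done, which is what lets a stage-level concurrency cap bound the number of running containers rather than the number of issued deploys.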
Problem

Procedure stages run all their executions in parallel (`join_all`) with no limit, so a stage that fans out to many executions (a `Batch*` deploy across hundreds of resources, or a "run everything" procedure) launches them all at once and can saturate the host (CPU / RAM / network). There is no way to throttle this.

Two changes together give procedures a proper worker-pool / task-queue behavior.

1. `max_concurrent` per stage

Optional field on `ProcedureStage`:
* `0` (default) → unchanged: every execution runs at once (fully backwards compatible).
* `n > 0` → the stage runs as a worker pool of size `n`: only `n` executions run at a time, the rest are queued and started as running ones finish.

Implementation swaps `join_all(futures)` for `stream::iter(futures).buffer_unordered(limit)`. All executions still complete before the stage returns and the first error is still propagated.

2. `wait_for_completion` on `Deploy`

`max_concurrent` caps concurrent executions. Most executions block until done, but a `Deploy` is fire-and-forget: it resolves as soon as the container is started, not when it exits. So for one-shot / batch containers the cap would throttle deploy issuance but not how many actually run.

`wait_for_completion: bool` (default false) makes the `Deploy` execution poll the container (`InspectContainer`) until it exits (24h safety cap) before resolving. Combined with `max_concurrent`, a stage of `Deploy { wait_for_completion = true }` becomes a real worker pool: `max_concurrent = 10` ⇒ at most 10 containers running at once.

Notes
* … `0`/`false`).
* `wait_for_completion` targets Server deployments; ignored for Swarm services.

Validation

`cargo build`, `cargo test`, and `cargo fmt --all -- --check` all pass.