feat: add STS web identity and stabilize live e2e#128

Merged
GatewayJ merged 1 commit into rustfs:main from GatewayJ:codex/sts-live-e2e-stability
May 22, 2026

@GatewayJ
Member

Type of Change

  • New Feature
  • Bug Fix
  • Documentation
  • Performance Improvement
  • Test/CI
  • Refactor
  • Other:

Related Issues

N/A

Summary of Changes

This PR adds the operator STS web identity path and stabilizes the live e2e workflow around it.

  • Add the namespaced PolicyBinding API, generated CRDs, RBAC, Helm/k8s-dev manifests, and an operator STS service endpoint.
  • Add STS request parsing, TokenReview identity validation, PolicyBinding lookup, session policy merging, RustFS admin/client calls, XML response rendering, and console runtime wiring.
  • Add STS unit, manifest, and live e2e coverage.
  • Make e2e-live-run repeatable by resetting Tenant/PVC/PV/hostPath fixtures before the suites run, while sts_functional reuses the Ready smoke Tenant instead of recreating storage.
  • Preload cert-manager images into Kind during live environment creation and verify cert-manager rollout before TLS live suites.
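The request-parsing step listed above can be illustrated with a minimal sketch. It assumes the endpoint accepts the AWS-style form-encoded `AssumeRoleWithWebIdentity` parameters (`Action`, `WebIdentityToken`, `DurationSeconds`, `Policy`); the type and function names below are hypothetical and are not taken from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of an STS web identity request (not the PR's type).
#[derive(Debug, PartialEq)]
pub struct WebIdentityRequest {
    pub token: String,
    pub duration_seconds: Option<u64>,
    pub session_policy: Option<String>,
}

/// Decode `application/x-www-form-urlencoded` text: `+` becomes a space and
/// `%XX` becomes the byte `XX`. Returns `None` on a malformed escape or when
/// the decoded bytes are not valid UTF-8.
fn form_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                let hex = bytes.get(i + 1..i + 3)?;
                if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                let hex = std::str::from_utf8(hex).ok()?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}

/// Parse a form-encoded body into a request, rejecting unknown actions,
/// a missing token, and a non-numeric duration.
pub fn parse_web_identity_request(body: &str) -> Result<WebIdentityRequest, String> {
    let mut params: HashMap<String, String> = HashMap::new();
    for pair in body.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        let key = form_decode(key).ok_or("malformed parameter name")?;
        let value = form_decode(value).ok_or("malformed parameter value")?;
        params.insert(key, value);
    }

    if params.get("Action").map(String::as_str) != Some("AssumeRoleWithWebIdentity") {
        return Err("unsupported Action".to_string());
    }

    let token = params
        .remove("WebIdentityToken")
        .filter(|t| !t.is_empty())
        .ok_or("missing WebIdentityToken")?;

    let duration_seconds = match params.get("DurationSeconds") {
        Some(raw) => Some(
            raw.parse::<u64>()
                .map_err(|_| "invalid DurationSeconds".to_string())?,
        ),
        None => None,
    };

    Ok(WebIdentityRequest {
        token,
        duration_seconds,
        session_policy: params.remove("Policy"),
    })
}
```

In this sketch the token is only extracted, not validated; in the flow described above, validation would happen afterwards through TokenReview.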

Checklist

  • I have read and followed the CONTRIBUTING.md guidelines
  • Passed make pre-commit (fmt-check + clippy + test + console-lint + console-fmt-check)
  • Added/updated necessary tests
  • Documentation updated (if needed)
  • CHANGELOG.md updated under [Unreleased] (if user-visible change)
  • CI/CD passed (if applicable)

Impact

  • Breaking change (CRD/API compatibility)
  • Requires doc/config/deployment update
  • Other impact: adds a new STS endpoint and live e2e environment reset behavior.

Verification

make pre-commit
make e2e-live-create
make e2e-live-run
make e2e-live-run

Additional Notes

The repeated make e2e-live-run verification checks that local PVs and hostPath data are reset between live runs and that STS reuses the smoke Tenant without destabilizing PVC binding.


Thank you for your contribution! Please ensure your PR follows the community standards (CODE_OF_CONDUCT.md) and sign the CLA if this is your first contribution.

@GatewayJ GatewayJ force-pushed the codex/sts-live-e2e-stability branch from 1a1645e to 84c5551 Compare May 19, 2026 04:32
@GatewayJ GatewayJ marked this pull request as ready for review May 22, 2026 02:47
@GatewayJ GatewayJ added this pull request to the merge queue May 22, 2026
Merged via the queue into rustfs:main with commit 0f8353e May 22, 2026
2 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84c5551c4c

Comment thread src/lib.rs
let sts_tls_config = crate::sts::tls::OperatorStsTlsConfig::from_env();
let tls_server_config = if sts_tls_config.enabled {
    let material =
        crate::sts::tls::load_or_create_sts_tls_material(&client, &sts_tls_config).await?;
P1: Keep controller running when STS bootstrap fails

The ? on this call makes STS bootstrap a hard prerequisite for the entire operator process. If STS TLS setup fails (for example, a missing or invalid sts-tls with OPERATOR_STS_TLS_AUTO=false), the function returns early and the reconcile controller never starts. Because STS is enabled by default, an STS-only misconfiguration can cause a full control-plane outage. Startup should instead degrade by disabling STS and continuing to start the controller.
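One possible shape for that degradation is sketched below. The types and function names (`StsTlsMaterial`, `StsMode`, `resolve_sts_mode`, `start_operator`) are hypothetical stand-ins, not the operator's actual API. The idea is to match on the bootstrap result instead of propagating it with `?`, so a failure only disables STS.

```rust
/// Hypothetical stand-in for the TLS material the STS server needs.
#[derive(Debug, Clone, PartialEq)]
pub struct StsTlsMaterial {
    pub cert_pem: String,
}

/// Whether STS will be served by this operator process.
#[derive(Debug, PartialEq)]
pub enum StsMode {
    Enabled(StsTlsMaterial),
    Disabled { reason: String },
}

/// Turn a bootstrap result into a mode instead of propagating the error.
/// A failure is logged and STS is disabled; the caller keeps starting up.
pub fn resolve_sts_mode(bootstrap: Result<StsTlsMaterial, String>) -> StsMode {
    match bootstrap {
        Ok(material) => StsMode::Enabled(material),
        Err(error) => {
            eprintln!("STS bootstrap failed, continuing without STS: {error}");
            StsMode::Disabled { reason: error }
        }
    }
}

/// Return the names of the components that would be started.
/// The reconcile controller is always included; STS is optional.
pub fn start_operator(bootstrap: Result<StsTlsMaterial, String>) -> Vec<&'static str> {
    let mut started = vec!["reconcile-controller"];
    if let StsMode::Enabled(_) = resolve_sts_mode(bootstrap) {
        started.push("sts-server");
    }
    started
}
```

A real implementation would likely also surface the disabled state through a status condition or metric so that the misconfiguration stays visible; that part is outside this sketch.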

Comment thread src/sts/server.rs
        error = %error.code(),
        "TokenReview denied STS request"
    );
    return xml_response(StatusCode::BAD_REQUEST, error.as_xml());
P2: Return 500 for TokenReview backend failures

The TokenReview path can produce StsError::InternalError (e.g., Kubernetes API failure or RBAC regression), but this branch always returns HTTP 400. That misclassifies server-side outages as client input errors, making failures harder to detect and potentially preventing correct retry behavior. This branch should map InternalError to 500 instead of always using 400.
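The suggested mapping could look roughly like the sketch below. Only `StsError::InternalError` and the `code()` accessor are named in this thread; the other variants, the code strings, and the `http_status` method are assumptions made for illustration.

```rust
/// Illustrative error type. Only `InternalError` is named in the review;
/// the other variants are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StsError {
    InvalidIdentityToken,
    AccessDenied,
    InternalError,
}

impl StsError {
    /// Error code string as it might appear in the XML error body.
    pub fn code(&self) -> &'static str {
        match self {
            StsError::InvalidIdentityToken => "InvalidIdentityToken",
            StsError::AccessDenied => "AccessDenied",
            StsError::InternalError => "InternalError",
        }
    }

    /// HTTP status for the response: client-side problems map to 4xx,
    /// backend failures (for example a Kubernetes API error) map to 500.
    pub fn http_status(&self) -> u16 {
        match self {
            StsError::InvalidIdentityToken => 400,
            StsError::AccessDenied => 403,
            StsError::InternalError => 500,
        }
    }
}
```

With a helper like this, the handler could derive the status from the error value rather than hard-coding `StatusCode::BAD_REQUEST`, which keeps server-side outages distinguishable from bad client input.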
