Skip to content

feat(CLDSRV-884): Add OpenTelemetry tracing instrumentation#6140

Merged
bert-e merged 4 commits into
development/9.4from
improvement/CLDSRV-884/otel-instrumentation
Jun 5, 2026
Merged

feat(CLDSRV-884): Add OpenTelemetry tracing instrumentation#6140
bert-e merged 4 commits into
development/9.4from
improvement/CLDSRV-884/otel-instrumentation

Conversation

@delthas
Copy link
Copy Markdown
Contributor

@delthas delthas commented Apr 2, 2026

Summary

Add OpenTelemetry tracing to cloudserver, gated behind ENABLE_OTEL=true. When the flag is unset, no @opentelemetry/* package is loaded — zero overhead off the OTEL path.

The bootstrap, trust-boundary host filter, health-probe filter, and per-handler span helper now live in arsenal's shared lib/tracing module (scality/Arsenal#2632, ARSN-586); cloudserver consumes it instead of carrying its own copy. Rebased onto development/9.4, leading with a prettier-only commit so the feature commits show logic, not formatting noise.

Commits

  1. chore: prettier-format files touched by CLDSRV-884 — isolates the pre-existing prettier reformat of lib/api/api.js + lib/server.js so the subsequent feat: commits carry only logic deltas.

  2. chore: add OpenTelemetry dependencies and consume arsenal tracing module — pin arsenal at the ARSN-586 branch (which carries the shared tracing module and the W3C trace-context stamping on MongoDB metadata writes, ARSN-572, so async consumers can continue the trace across the oplog boundary). Drop the SDK-core packages now that arsenal carries them as optionalDependencies, and keep the three instrumentation packages (instrumentation-http / -ioredis / -mongodb) here — the consumer owns and configures them. No auto-instrumentations-node bundle (which would pull ~36 unused instrumentations).

  3. feat: wire OpenTelemetry bootstrap and shutdown via arsenal

    • index.js initializes tracing via require('arsenal/build/lib/tracing') before server.js. The deep-require into build/lib/tracing is deliberate: require('arsenal') would eagerly pull ioredis/mongodb through the barrel and defeat require-in-the-middle patching. The call is gated on tracing.isEnabled() so the OTEL-off path never loads Config (and the arsenal barrel) early, and the cluster primary skips init (workers re-exec and init themselves). Cloudserver constructs the three instrumentations itself and passes the http one through tracing.makeHttpInstrumentationConfig({ healthPaths }) for the k8s liveness/readiness/metrics paths — which wires the trust-boundary requestHook (strips traceparent/tracestate on outbound to hosts outside OTEL_TRUSTED_HOSTS) plus an OPTIONS + health-path ignore hook.
    • S3Server.cleanUp() is async: it awaits tracing.close() after server.close() and before process.exit. close() is race-safe and bounded at 5 s via Promise.race with a .unref()'d timer, so an unreachable collector can't block past Kubernetes' 30 s grace. caughtExceptionShutdown() flushes on the non-cluster and cluster-worker paths; the cluster primary keeps its survive-and-log behavior.
    • Inbound trace-context extraction is intentionally not done manually: instrumentation-http already makes the server span a child of any remote traceparent. A manual extract would demote downstream api.<handler> spans to siblings.
  4. feat: instrument all S3 API handlers with OTEL spans (lib/api/api.js) — apply arsenal's instrumentApiMethod via an Object.entries(api) loop with a NON_INSTRUMENTED_KEYS skip set (callApiMethod, checkAuthResults, handleAuthorizationResults). Every S3 handler produces an api.<method> span (instrumentation scope arsenal.api) owning the whole handler execution; auto-instrumentation spans nest beneath. New handlers are auto-instrumented — no per-handler boilerplate. When OTEL is disabled, instrumentApiMethod returns the original function unchanged.

Configuration

Variable Default Description
ENABLE_OTEL unset Set to true to load the SDK. When unset, no @opentelemetry/* package is required — zero overhead.
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT — (required when enabled) Collector URL (OTLP/HTTP)
OTEL_SAMPLING_RATIO 0.01 Trace sampling ratio in [0, 1]. Use 1 to trace everything during testing.
OTEL_TRUSTED_HOSTS unset Comma-joined list of lowercase bare hostnames that may receive outbound traceparent. Loopback is always trusted. Unset ⇒ only loopback trusted, so cross-service traceparent is stripped until an operator sets this (safe default).
OTEL_SERVICE_NAME cloudserver Service name in spans
OTEL_SERVICE_VERSION package.json#version Service version in spans
OTEL_SERVICE_NAMESPACE scality Namespace label

Tracing-correctness coverage (bootstrap, trust boundary, health filter, span helper) lives with the code in arsenal's tests/unit/tracing/; the full cloudserver unit suite remains green.

Out of scope (follow-ups)

The trust-boundary enforcement (read OTEL_TRUSTED_HOSTS, strip traceparent on untrusted outbound) lives entirely in arsenal and is wired automatically. Populating that env var per deployment is routine operator config — no cloudserver-side work.

Related tickets

Issue: CLDSRV-884

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Apr 2, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Apr 2, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 89.43089% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.35%. Comparing base (9fc5974) to head (b16b268).
⚠️ Report is 16 commits behind head on development/9.4.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
lib/api/api.js 89.47% 10 Missing ⚠️
lib/server.js 89.28% 3 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
lib/server.js 82.62% <89.28%> (+3.01%) ⬆️
lib/api/api.js 90.87% <89.47%> (+0.19%) ⬆️

... and 1 file with indirect coverage changes

@@                 Coverage Diff                 @@
##           development/9.4    #6140      +/-   ##
===================================================
+ Coverage            85.30%   85.35%   +0.04%     
===================================================
  Files                  208      208              
  Lines                13919    13932      +13     
===================================================
+ Hits                 11874    11891      +17     
+ Misses                2045     2041       -4     
Flag Coverage Δ
file-ft-tests 68.81% <82.92%> (-0.06%) ⬇️
kmip-ft-tests 28.15% <69.10%> (+0.03%) ⬆️
mongo-v0-ft-tests 69.98% <82.92%> (-0.11%) ⬇️
mongo-v1-ft-tests 70.01% <82.92%> (+0.04%) ⬆️
multiple-backend 36.63% <72.35%> (+0.02%) ⬆️
sur-tests 35.47% <72.35%> (+0.02%) ⬆️
sur-tests-inflights 37.37% <72.35%> (+0.02%) ⬆️
unit 72.12% <82.11%> (+0.09%) ⬆️
utapi-v2-tests 34.80% <73.17%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread lib/otel.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
@delthas delthas force-pushed the improvement/CLDSRV-884/otel-instrumentation branch from 9c93a1d to 97e63fc Compare April 2, 2026 16:32
Comment thread lib/instrumentation/simple.js Outdated
Comment thread package.json Outdated
Comment thread lib/otel.js Outdated
Comment thread lib/otel.js Outdated
@delthas delthas force-pushed the improvement/CLDSRV-884/otel-instrumentation branch from 97e63fc to f978280 Compare April 2, 2026 16:35
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/otel.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
@delthas delthas force-pushed the improvement/CLDSRV-884/otel-instrumentation branch from f978280 to 06eea4e Compare April 2, 2026 16:49
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread package.json Outdated
@delthas delthas force-pushed the improvement/CLDSRV-884/otel-instrumentation branch from 06eea4e to 75c3afb Compare April 2, 2026 17:02
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread package.json Outdated
Comment thread lib/otel.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
@delthas delthas force-pushed the improvement/CLDSRV-884/otel-instrumentation branch from 75c3afb to 207b785 Compare April 13, 2026 13:05
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/tracing/trustedHosts.js Outdated
@delthas
Copy link
Copy Markdown
Contributor Author

delthas commented May 26, 2026

Comment thread lib/tracing/healthPaths.js Outdated
Comment thread lib/tracing/instrumentation.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/api/api.js
Comment thread lib/tracing/index.js Outdated
Comment thread lib/tracing/index.js Outdated
Comment thread lib/tracing/index.js Outdated
Comment thread lib/server.js Outdated
Copy link
Copy Markdown
Contributor

@DarkIsDude DarkIsDude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good for me. My main concern is about the package.json 🙏

Comment thread package.json Outdated
Comment thread index.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread lib/instrumentation/simple.js Outdated
Comment thread package.json Outdated
Comment thread lib/server.js
Comment thread package.json Outdated
Comment thread tests/unit/lib/tracing/instrumentation.spec.js Outdated
Comment thread package.json Outdated
Comment thread lib/tracing/instrumentation.js Outdated
Comment thread tests/unit/server.js
Comment thread package.json
Comment thread lib/tracing/instrumentation.js Outdated
@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Jun 1, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Pre-existing prettier debt on lib/api/api.js and lib/server.js — the
9.4 base did not run prettier on these. Extracted as a separate
commit so the subsequent CLDSRV-884 OTEL commits show only logic
changes, not formatting noise.

Issue: CLDSRV-884
@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Jun 3, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Comment thread package.json
Comment thread package.json
Comment thread package.json
Comment thread package.json Outdated
@claude
Copy link
Copy Markdown

claude Bot commented Jun 5, 2026

LGTM

Review by Claude Code

@delthas
Copy link
Copy Markdown
Contributor Author

delthas commented Jun 5, 2026

/approve

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Jun 5, 2026

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/9.4

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2
  • development/9.3

This pull request does not target the following hotfix branch(es) so they
will be left untouched:

  • hotfix/7.4.3
  • hotfix/7.10.0
  • hotfix/7.6.0
  • hotfix/7.4.6
  • hotfix/7.10.3
  • hotfix/7.7.0
  • hotfix/7.2.0
  • hotfix/7.10.1
  • hotfix/7.4.2
  • hotfix/7.4.10
  • hotfix/9.0.7
  • hotfix/7.4.5
  • hotfix/7.10.15
  • hotfix/9.0.32
  • hotfix/7.4.4
  • hotfix/7.10.2
  • hotfix/7.10.27
  • hotfix/7.4.7
  • hotfix/7.10.4
  • hotfix/7.4.1
  • hotfix/7.70.73
  • hotfix/7.70.45
  • hotfix/7.9.0
  • hotfix/7.10.49
  • hotfix/7.10.30
  • hotfix/8.8.45
  • hotfix/7.10.28
  • hotfix/7.4.9
  • hotfix/7.70.51
  • hotfix/7.4.0
  • hotfix/7.10.8
  • hotfix/6.4.7
  • hotfix/7.8.0
  • hotfix/9.2.24
  • hotfix/7.4.8
  • hotfix/7.70.21
  • hotfix/7.70.11

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Copy link
Copy Markdown
Contributor

bert-e commented Jun 5, 2026

I have successfully merged the changeset of this pull request
into targetted development branches:

  • ✔️ development/9.4

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2
  • development/9.3

Please check the status of the associated issue CLDSRV-884.

Goodbye delthas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants