A Rust tool that converts Bazel Build Event Protocol (BEP) streams into OpenTelemetry traces, enabling observability of Bazel builds in Datadog, Jaeger, and any OTLP-compatible backend.
Conduit intercepts Bazel's build event stream — either as a live gRPC Build Event Service (BES) backend or from saved JSON files — and produces a structured OTel trace where:
- 1 trace = 1 Bazel invocation (trace ID derived from Bazel's invocation UUID)
- Root span covers `BuildStarted` → `BuildFinished` with build metadata
- Target spans represent each configured target with timing, output files, tags, and success/failure
- Action spans represent build actions (compile, link, etc.) nested under their target
- Test spans capture test attempts with status, timing, caching, and execution strategy
- Spawn spans (from execution log) provide process-level detail: command lines, I/O metrics, timing breakdowns, cache hit/miss
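
The trace ID derivation in the first bullet can be pictured with a small sketch. It assumes the invocation UUID's 128 bits are used directly as the 16-byte OTel trace ID; `uuid_to_trace_id` is a hypothetical name, and conduit's actual conversion in `otel/trace_context.rs` may differ.

```rust
/// Hypothetical helper: turn a Bazel invocation UUID such as
/// "123e4567-e89b-12d3-a456-426614174000" into 16 trace-ID bytes.
fn uuid_to_trace_id(uuid: &str) -> Option<[u8; 16]> {
    // Drop the hyphens; a well-formed UUID leaves exactly 32 hex digits.
    let hex: String = uuid.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut bytes = [0u8; 16];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    // OTel treats the all-zero trace ID as invalid, so reject it here.
    if bytes == [0u8; 16] {
        None
    } else {
        Some(bytes)
    }
}
```

Because the mapping is deterministic, a trace can be looked up in the backend from nothing more than the invocation ID that Bazel prints.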
    bazel.invocation (root)
    ├── target //pkg:lib (TargetConfigured → TargetCompleted)
    │   ├── action CppCompile //pkg:lib
    │   │   └── spawn CppCompile lib/foo.cc (from exec log)
    │   └── test //pkg:lib_test
    │       └── test attempt 1 (PASSED)
    ├── target //external:zlib (synthetic, from exec log)
    │   └── spawn CppCompile zlib/adler32.c
    ├── fetches
    │   └── fetch https://...
    └── skipped targets
        └── target //pkg:skipped (ANALYSIS_FAILURE)
                     ┌─────────────────────────────┐
                     │         Bazel Build         │
                     └──────────┬──────────────────┘
                                │
                ┌───────────────┼───────────────┐
                ▼                               ▼
         BES gRPC stream               JSON file (NDJSON)
         (--bes_backend)            (--build_event_json_file)
                │                               │
                ▼                               ▼
        ┌────────────────┐              ┌────────────────┐
        │  gRPC Server   │              │  BEP Decoder   │
        │  (BES proto)   │              │ (JSON parser)  │
        └───────┬────────┘              └───────┬────────┘
                │ Proto-direct routing          │ JSON routing
                └───────────┬───────────────────┘
                            ▼
                     ┌──────────────┐
                     │ EventRouter  │
                     │  (dispatch)  │
                     └──────┬───────┘
                            │
                 ┌──────────┼──────────┐
                 ▼          ▼          ▼
            BuildState OtelMapper   ExecLog
            (tracking)   (spans) (enrichment)
                            │
                            ▼
                     ┌──────────────┐
                     │ OTLP Export  │──▶ Datadog / Jaeger / etc.
                     └──────────────┘
| Module | Purpose |
|---|---|
| `bep/decoder.rs` | Parses NDJSON BEP files into `BepJsonEvent` structs |
| `bep/router.rs` | Dispatches BEP events to the mapper; dual-path (JSON + proto-direct) |
| `otel/mapper.rs` | Creates and manages OTel spans for the build trace |
| `otel/trace_context.rs` | UUID→TraceID conversion, TracerProvider/LoggerProvider init |
| `otel/attributes.rs` | ~100 typed attribute key constants (`bazel.*` namespace) |
| `otel/redact.rs` | In-process scrubber for `--client_env=NAME=VALUE` style flags |
| `grpc/server.rs` | BES gRPC server implementation (PublishBuildEvent service) |
| `exec_log/tailer.rs` | Live tail-follows `--execution_log_compact_file`, decodes zstd, ships SpawnExecs to the mapper via mpsc |
| `exec_log/compact.rs` | Incremental decoder state (dedup table + SpawnExec reconstruction) shared by tailer and tests |
| `state/build_state.rs` | Build lifecycle tracking, action buffering, named set cache |
| `state/action_mode.rs` | Lightweight vs full action processing mode detection |
    bazel build //rust/conduit:conduit

    # Start conduit
    ./bazel-bin/rust/conduit/conduit --serve --port 8080 --export otlp --otlp-endpoint http://localhost:4317
    # Run Bazel with conduit as BES backend
    bazel build //your:target \
      --bes_backend=grpc://localhost:8080 \
      --build_event_publish_all_actions

    # Record BEP during a build
    bazel build //your:target --build_event_json_file=bep.ndjson
    # Process offline
    ./bazel-bin/rust/conduit/conduit --input bep.ndjson --export otlp

| Flag | Description | Default |
|---|---|---|
| `--input <FILE>` | Read BEP from NDJSON file | - |
| `--serve` | Start BES gRPC server | false |
| `--listen-addr <IP>` | Bind address for the BES gRPC server. Defaults to loopback; pass `0.0.0.0` only when external access is intentionally needed (conduit has no auth/TLS). | `127.0.0.1` |
| `--port <PORT>` | gRPC server port | 8080 |
| `--export <MODE>` | Export mode: `none`, `stdout`, `otlp` | `none` |
| `--otlp-endpoint <URL>` | OTLP endpoint | `http://localhost:4317` |
| `--log-level <LEVEL>` | Log level (trace/debug/info/warn/error) | info |
| `--no-redact` | Disable in-process scrubbing of `--client_env=NAME=VALUE` style flags | off (scrubbing on) |
| `--redact-name-pattern <SUBSTR>` | Replace the default sensitive-name list (repeatable) | built-in defaults |
| `--exec-log-max-message-mib <MIB>` | Per-message cap on length-delimited entries in Bazel's `--execution_log_compact_file`. Prevents a malformed varint length prefix from OOM'ing the parser. | 64 |
| `--exec-log-max-decompressed-mib <MIB>` | Cap on total decompressed bytes pulled from a compact execution log's zstd frame. Prevents a small malicious frame from inflating into GiB of in-memory state (zstd zip-bomb). | 2048 |

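
The per-message cap guards the step where a length prefix is turned into an allocation. A minimal sketch of that idea, assuming protobuf-style base-128 varint prefixes; `read_varint` and `read_delimited` are illustrative names, not conduit's actual functions:

```rust
use std::io::{self, Read};

/// Reads one base-128 varint length prefix (at most 10 bytes).
fn read_varint<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let mut byte = [0u8; 1];
        r.read_exact(&mut byte)?;
        value |= u64::from(byte[0] & 0x7f) << shift;
        if byte[0] & 0x80 == 0 {
            return Ok(value);
        }
    }
    Err(io::Error::new(io::ErrorKind::InvalidData, "varint too long"))
}

/// Reads one length-delimited message, refusing any length above
/// `max_bytes` before a buffer is allocated for it.
fn read_delimited<R: Read>(r: &mut R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let len = read_varint(r)?;
    if len > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("message of {len} bytes exceeds cap of {max_bytes}"),
        ));
    }
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```

The check sits between decoding the prefix and allocating, so a corrupted prefix that claims several GiB fails fast instead of reserving memory.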
Bazel surfaces environment variables and user-defined values on the command
line via flags like --client_env=NAME=VALUE, --action_env=, --test_env=,
--repo_env=, --host_action_env=, and --define=NAME=VALUE. Workspace
status entries produced by --workspace_status_command (STABLE_* keys)
routinely carry CI-injected tokens. Progress stdout/stderr can echo any of
the above. Without intervention all of these end up verbatim in the
bazel.command_line, bazel.explicit_cmd_line, bazel.startup_options,
bazel.action.command_line, bazel.spawn.command, bazel.workspace.*,
and bazel.progress.* span attributes — and from there into whatever
backend the OTLP exporter is wired to.
Conduit ships with an in-process scrubber (rust/conduit/src/otel/redact.rs)
that runs before any attribute is set on a span. Three scrubbing
surfaces are used:
- Argv tokens (`scrub_arg` / `scrub_args`): applied to `command_line`, `explicit_cmd_line`, `startup_options`, ActionExecuted command lines, and SpawnExec `command_args`. The `VALUE` half of a recognized `--flag=NAME=VALUE` is replaced with `***` when `NAME` matches.
- `(name, value)` pairs (`scrub_value_by_name`): applied to `workspaceStatus` entries. Replaces `VALUE` with `***` when the key name matches.
- Free-form text (`scrub_text`): applied to progress stdout/stderr. Tokenises on whitespace, runs each token through `scrub_arg`, and preserves whitespace runs.
The pass-through form --client_env=NAME (no =VALUE, inherits from
the parent process env) is left untouched because no value is on the wire.
Default sensitive-name substrings: TOKEN, SECRET, PASSWORD, PASSWD,
CREDENTIAL, COOKIE, APIKEY, API_KEY, ACCESS_KEY, PRIVATE_KEY,
AUTH. The list is intentionally narrow (e.g. plain KEY is excluded
because it would match MONKEY). Override it via --redact-name-pattern
(repeatable); the supplied list fully replaces the default, so repeat any
default entries you still want. Disable scrubbing entirely with --no-redact,
which is only safe when the receiving backend is fully trusted or has its
own scrubbing layer.
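
The argv-token rule can be sketched as follows. This is a simplified stand-in, not the code in `redact.rs`: the signature is hypothetical, the flag list is copied from the paragraph above, and case-insensitive substring matching is an assumption.

```rust
/// Flags whose value has the shape NAME=VALUE (taken from the list above).
const ENV_FLAGS: [&str; 6] = [
    "--client_env",
    "--action_env",
    "--test_env",
    "--repo_env",
    "--host_action_env",
    "--define",
];

/// Simplified stand-in for `scrub_arg`; the real signature may differ.
fn scrub_arg(arg: &str, patterns: &[&str]) -> String {
    // Split "--flag=NAME=VALUE" into "--flag" and "NAME=VALUE".
    let Some((flag, rest)) = arg.split_once('=') else {
        return arg.to_string();
    };
    if !ENV_FLAGS.contains(&flag) {
        return arg.to_string();
    }
    // The pass-through form "--flag=NAME" carries no value; leave it alone.
    let Some((name, _value)) = rest.split_once('=') else {
        return arg.to_string();
    };
    // Assumption: matching is a case-insensitive substring test on NAME.
    let upper = name.to_ascii_uppercase();
    if patterns.iter().any(|p| upper.contains(&p.to_ascii_uppercase())) {
        format!("{flag}={name}=***")
    } else {
        arg.to_string()
    }
}
```

With patterns `["TOKEN", "SECRET"]`, `--client_env=GH_TOKEN=abc123` becomes `--client_env=GH_TOKEN=***`, while `--client_env=GH_TOKEN` and `--define=MODE=opt` pass through unchanged.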
    # Default list
    ./conduit --serve --port 8080 --export otlp --otlp-endpoint http://localhost:4317
    # Custom list (replaces default)
    ./conduit --serve \
      --redact-name-pattern TOKEN --redact-name-pattern JIRA --redact-name-pattern AWS_
    # Disabled
    ./conduit --serve --no-redact

For belt-and-braces protection, also configure the Datadog Agent's
apm_config.replace_tags so a future call-site that forgets to route through
the scrubber still cannot leak. Example datadog.yaml:

    apm_config:
      replace_tags:
        - name: "bazel.command_line"
          pattern: "--client_env=([A-Z0-9_]*?(TOKEN|SECRET|PASSWORD|KEY|CREDENTIAL)[A-Z0-9_]*)=[^ ]+"
          repl: "--client_env=$1=***"
        - name: "bazel.explicit_cmd_line"
          pattern: "--client_env=([A-Z0-9_]*?(TOKEN|SECRET|PASSWORD|KEY|CREDENTIAL)[A-Z0-9_]*)=[^ ]+"
          repl: "--client_env=$1=***"

Equivalent functionality exists in the OpenTelemetry Collector's
`redactionprocessor` when traces transit a collector hop.

| Mode | Trigger | Behavior |
|---|---|---|
| Lightweight | Default (no flag) | Only failed actions create spans |
| Full | `--build_event_publish_all_actions` | Every action gets a span with accurate start/end timestamps |

When Bazel is run with --execution_log_compact_file=exec.compact, conduit
detects the flag in OptionsParsed, opens the file as it's being written,
and tail-follows it in a dedicated blocking thread for the rest of the
build. Each SpawnExec it decodes:
- Looks up its parent action span in conduit's cache by `(target_label, mnemonic, primary_output)`.
- On the first matching spawn for an action, backfills a curated set of attributes onto the action span (`bazel.action.spawn.runner`, `bazel.action.spawn.cache_hit`, the `SpawnMetrics.*_time_ms` breakdown). Subsequent spawns for the same action (retries, dynamic-exec races) only bump `bazel.action.spawn.count` and emit their own child span.
- Always emits a `spawn {mnemonic}` child span under the action span with the full `bazel.spawn.*` attribute set (command, digest, cache flags, per-stage timings, I/O metrics).

If a spawn arrives before its ActionExecuted BEP event (the compact
log flushes every ~128 KiB of uncompressed input, asynchronously to BEP),
it's buffered briefly and attached to the action span as soon as that
event arrives. Spawns that never match anything by the end of the build are
grouped by (label, mnemonic, primary_output) and emitted under
synthesised parent action spans (bazel.target.synthetic = true); action
spans with no matching spawn pick up bazel.action.spawn.missing = true.
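
The matching and buffering rules above can be sketched with two maps keyed by `(target_label, mnemonic, primary_output)`. `SpawnMatcher` and its methods are hypothetical names, and spans are reduced to counters and strings so the example stays self-contained:

```rust
use std::collections::HashMap;

/// Key shared by action spans and spawns:
/// (target_label, mnemonic, primary_output).
type ActionKey = (String, String, String);

#[derive(Default)]
struct SpawnMatcher {
    /// Actions already announced by BEP, with their spawn count.
    actions: HashMap<ActionKey, u32>,
    /// Spawns that arrived before their action.
    pending: HashMap<ActionKey, Vec<String>>,
}

impl SpawnMatcher {
    /// A spawn arrived from the exec log. Returns true if it attached
    /// to a known action, false if it had to be buffered.
    fn on_spawn(&mut self, key: ActionKey, spawn: String) -> bool {
        if let Some(count) = self.actions.get_mut(&key) {
            *count += 1;
            true
        } else {
            self.pending.entry(key).or_default().push(spawn);
            false
        }
    }

    /// An action arrived from BEP. Returns the buffered spawns that
    /// now attach to it.
    fn on_action(&mut self, key: ActionKey) -> Vec<String> {
        let flushed = self.pending.remove(&key).unwrap_or_default();
        *self.actions.entry(key).or_insert(0) += flushed.len() as u32;
        flushed
    }

    /// End of build: whatever is still pending needs a synthesised parent.
    fn finish(&mut self) -> Vec<(ActionKey, Vec<String>)> {
        self.pending.drain().collect()
    }
}
```

Actions whose count is still zero when `finish` runs correspond to the spans that would be tagged `bazel.action.spawn.missing = true`.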
Format support. Only --execution_log_compact_file (and its
--experimental_execution_log_compact_file alias) is consumed.
--execution_log_binary_file and --execution_log_json_file lack the
dedup table that makes streaming feasible — conduit logs a warning and
records a bazel.exec_log.unsupported_format event on the root span when
it sees them.
Streaming semantics (best-effort live tail). Bazel writes the compact
log through AsynchronousMessageOutputStream → ZstdOutputStream → BufferedOutputStream → FileOutputStream. ZstdOutputStream only emits a
compressed block once it has accumulated 128 KiB of uncompressed proto,
and Bazel doesn't call flush() until close() at the end of the build.
In practice this means the tailer sees a chunk every ~200–600 spawns on
big builds, and the final <128 KiB of input is only visible after
close(). Conduit waits up to 2 s for that final chunk to drain through
finish() before sealing the trace.
Why not stream from bytestream://? Bazel uploads the spawn log to the
remote cache only after close() (post-build), so the bytestream
endpoint cannot be used as a live source.
| Source | Live streaming? | Dedup table? | Conduit support |
|---|---|---|---|
| `--execution_log_compact_file=<path>` | Yes (best-effort, 128 KiB chunks) | Yes | Live tailed, default path |
| `--experimental_execution_log_compact_file=<path>` (alias) | Yes | Yes | Live tailed |
| `--execution_log_binary_file=<path>` | No (would need full buffer) | No | Warned, ignored |
| `--execution_log_json_file=<path>` | No | No | Warned, ignored |
| `bytestream://…/spawn_log` (remote cache) | No (uploaded post-`close()`) | Yes | Not consumed |

    # Hermetic test (minimal BEP sample, no live Bazel)
    bazel test //integration:trace_test
    # Full scenario (builds conduit, records BEP + exec log, analyzes trace)
    ./integration/run_full_trace_scenario.sh

This project was developed iteratively over ~13 sessions. The journey from initial prototype to the current state surfaced several non-obvious insights about Bazel's BEP, OpenTelemetry SDKs, and trace backend differences.
BEP only emits TargetConfigured/TargetCompleted/ActionCompleted events for directly requested targets. If you build //my:app and it depends on @zlib//:zlib, the zlib actions won't appear in BEP at all. The compact execution log (--execution_log_compact_file) is the only way to get full coverage of what Bazel actually spawned.
This is why conduit live-tails the compact log: spawns that don't match any BEP-visible action are buffered and emitted at finish() under synthesised parent action spans tagged bazel.target.synthetic = true.
Datadog uses the span name field for operation grouping in its flame graph, while Jaeger uses operationName (which maps to the OTel span's display name). When all spans had SpanKind::Internal, Datadog showed them all as "Internal" — flattening the hierarchy visually even though parent-child relationships were correct.
Datadog also uses temporal containment for flame graph nesting: a child span must start no earlier than its parent starts and end no later than its parent ends. This required adding clamp_time_range() to ensure spawn child spans never escape their parent action's time bounds.
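
A clamp of this kind might look like the sketch below. It is illustrative only: conduit's own `clamp_time_range` may use different types and a different signature.

```rust
use std::time::SystemTime;

/// Illustrative clamp; forces a child's (start, end) to lie inside
/// the parent's (start, end).
fn clamp_time_range(
    child: (SystemTime, SystemTime),
    parent: (SystemTime, SystemTime),
) -> (SystemTime, SystemTime) {
    let (parent_start, parent_end) = parent;
    // Pull the child's start into [parent_start, parent_end].
    let start = child.0.max(parent_start).min(parent_end);
    // Pull the end into [start, parent_end] so the span never runs backwards.
    let end = child.1.min(parent_end).max(start);
    (start, end)
}
```

A child that lies entirely outside its parent collapses to a zero-length span at the nearest parent boundary instead of breaking the nesting.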
The start_time / start_time_millis in BuildStarted can hold the Bazel server start time (potentially weeks old on long-lived daemons), not the current invocation start. Conduit prefers the BES-level event_time (when the event was actually emitted) as a more reliable fallback for root span start time in gRPC mode.
Bazel sometimes sends start_time_nanos = 0 on action events, with end_time_nanos set to a duration value rather than an absolute timestamp. Without validation, this produces spans starting at Unix epoch (1970) with multi-day durations. Conduit rejects timestamps below a MIN_ABSOLUTE_NANOS threshold (~year 2001) to catch these.
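
A sketch of such a guard follows. The threshold value is an assumption: 10^18 ns after the epoch falls on 2001-09-09, which fits the "~year 2001" description, but conduit's actual constant is not shown here. The `end >= start` check and the `validated_range` name are also additions for illustration.

```rust
/// Assumed threshold: 10^18 ns after the Unix epoch (2001-09-09T01:46:40Z).
const MIN_ABSOLUTE_NANOS: u64 = 1_000_000_000_000_000_000;

/// Returns the range only if both ends look like absolute timestamps
/// and the range does not run backwards.
fn validated_range(start_nanos: u64, end_nanos: u64) -> Option<(u64, u64)> {
    let plausible = |t: u64| t >= MIN_ABSOLUTE_NANOS;
    if plausible(start_nanos) && plausible(end_nanos) && end_nanos >= start_nanos {
        Some((start_nanos, end_nanos))
    } else {
        None
    }
}
```

When this returns `None`, a caller would have to fall back to some other time source, such as the time the event was received.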
The default OpenTelemetry BatchSpanProcessor queue size is 2048 spans. Large Bazel builds (thousands of actions + spawns) overflow this easily. Conduit configures both the span and log batch processors with queue 65,536, batch 512, scheduled delay 200 ms, and per-export timeout 2 s (see rust/conduit/src/otel/trace_context.rs).
Prost-generated code for google.protobuf.Duration may produce different Rust types depending on the proto path (prost_types::Duration vs a generated duration_proto::Duration). The solution is to access raw fields (d.seconds, d.nanos) rather than relying on typed parameters.
BEP events use @@repo//:target (double-at canonical form) while the execution log uses @repo//:target (single-at). Label normalization (trim_start_matches('@')) is essential for cross-format matching.
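
A minimal sketch of that normalization; `normalize_label` is a hypothetical wrapper name around the `trim_start_matches('@')` call mentioned above:

```rust
/// Strips every leading '@' so canonical (@@repo) and apparent (@repo)
/// label forms compare equal.
fn normalize_label(label: &str) -> &str {
    label.trim_start_matches('@')
}
```

Both `@@zlib//:zlib` and `@zlib//:zlib` normalize to `zlib//:zlib`, while a main-repo label such as `//pkg:lib` is returned unchanged.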
In gRPC serve mode, router.finish() emits OTel log records to the batch processor and returns. We deliberately do not call force_flush() from the route worker: both BatchSpanProcessor::force_flush and BatchLogProcessor::force_flush use a sync block_on against their worker channel, which deadlocks the Tokio runtime under load. Drain instead happens via the 200 ms scheduled_delay ticker; explicit shutdown_providers() is called from main only at process exit.
The execution log only records spawned processes (actual execve calls). Internal Bazel actions like FileWrite, TemplateExpand, and SymlinkTree are not spawns and won't appear. This means some BEP actions will never have matching exec log entries — by design. Action spans for those carry bazel.action.spawn.missing = true at the end of the build.
Bazel's compact exec log goes through ZstdOutputStream (zstd-jni) which buffers 128 KiB of uncompressed input before emitting a compressed block, and Bazel never calls flush(). The only way to live-tail is to keep the reader blocked on EOF rather than propagating Ok(0) to the zstd decoder, and accept that the trailing <128 KiB of input is only visible after close(). Conduit's tailer (exec_log/tailer.rs) sleeps 100 ms on EOF and resumes when more bytes appear; finish() budgets up to 2 s for the post-close() final chunk.
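
The "block on EOF" idea can be sketched as a `Read` wrapper placed between the file and the zstd decoder. This is an assumption-laden illustration using only the standard library: `TailReader` and its `closed` flag are hypothetical, and the 2 s drain budget is not modelled.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Wraps a reader so that EOF means "wait for more" until `closed` is set.
struct TailReader<R: Read> {
    inner: R,
    /// Set by whoever learns that the writer has finished.
    closed: Arc<AtomicBool>,
    /// How long to sleep between polls (100 ms in the text above).
    poll: Duration,
}

impl<R: Read> Read for TailReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        loop {
            let n = self.inner.read(buf)?;
            if n > 0 || buf.is_empty() {
                return Ok(n);
            }
            // Writer is done: one last read picks up bytes written just
            // before the flag was set, then a real EOF is reported.
            if self.closed.load(Ordering::Acquire) {
                return self.inner.read(buf);
            }
            thread::sleep(self.poll);
        }
    }
}
```

Because the decoder only ever sees `Ok(0)` after `closed` is set, a partially written zstd frame is never mistaken for a truncated one.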
The children field on Progress events is for BEP DAG ordering (announcing which events will follow), not content correlation. The stderr/stdout text in a Progress event is not necessarily related to the child events it declares.
- Must use `bazel build`, not `cargo build`: The project uses Bazel as its build system with `rules_rust` and `rules_rs` (crate_universe). Cargo is only present for IDE support (`Cargo.toml`).
- Rust edition 2024 / rustc 1.93+: OpenTelemetry 0.28 crates require a recent Rust toolchain. The Bazel toolchain is pinned to 1.93.0.
- Protobuf compilation: BEP proto has deep import chains (`failure_details.proto` → `descriptor.proto`). Proto targets use `rules_rust_prost` with `rules_rs` for crate resolution.

- Co-location requirement: Exec log enrichment requires conduit to run on the same machine as Bazel (it tail-follows the local file path from `OptionsParsed`; the `bytestream://` upload that goes to the remote cache only happens after `close()` and isn't a live source).
- Non-standard OTLP ports: The Datadog Agent often uses `14317`/`14318` instead of the OTel-default `4317`/`4318`. Configure `--otlp-endpoint` accordingly.
- API key vs environment variable: `dd-auth` sets `DD_API_KEY` as an env var, but the Datadog Agent reads its `api_key` from its config file at startup. These are separate mechanisms.

- Action-level durations may be unreliable: Some BEP events report zero-length or negative durations due to clock skew. Target-level durations (derived from action buffering) are more reliable.
- Proto3 JSON defaults: In proto3 JSON, `success: true` is often omitted (it's the default). Conduit defaults `success` to `true` when absent from `TargetCompleted` to avoid false negatives.
- Progress stderr/stdout can be empty: Bazel sends many Progress events with no content. Conduit skips these.

All OTel span attributes follow the bazel.<component>.<field> naming convention. See rust/conduit/src/otel/attributes.rs for the full list (~100 constants), organized by:
- Trace-level: `bazel.invocation_id`, `bazel.command`, `bazel.exit_code`, `bazel.patterns`, ...
- Target spans: `bazel.target.label`, `bazel.target.kind`, `bazel.target.success`, `bazel.target.output_files`, ...
- Action spans: `bazel.action.mnemonic`, `bazel.action.exit_code`, `bazel.action.cached`, `bazel.action.runner`, ...
- Spawn spans: `bazel.spawn.runner`, `bazel.spawn.cache_hit`, `bazel.spawn.command`, `bazel.spawn.execution_wall_time_ms`, ...
- Test spans: `bazel.test.status`, `bazel.test.attempt`, `bazel.test.cached_locally`, `bazel.test.strategy`, ...
- Build metrics: `bazel.metrics.wall_time_ms`, `bazel.metrics.cpu_time_ms`, `bazel.metrics.cache_hits`, ...

- No distributed / remote build support: Conduit is a local sidecar. It does not support remote BES endpoints or multi-machine builds.
- No phase spans: Loading, analysis, and execution phases are not modeled as separate spans (metrics for these are captured as attributes on the root span via `BuildMetrics`).
- No span links / DAG representation: BEP's DAG structure (secondary parents) is not represented as OTel span links. All relationships are parent-child.
- No sampling policies: All qualifying events produce spans. There is no configurable top-N-slowest or probabilistic sampling.
- Single invocation at a time: The gRPC server processes one build stream at a time (sequential invocations via `Arc<Mutex<EventRouter>>`).
- Exec log streaming is best-effort, not "live per spawn": Bazel flushes the compact log every ~128 KiB of uncompressed input (200–600 spawns on big builds), and the final <128 KiB is only visible after `close()` at `BuildCompleteEvent`. Spawn enrichment lands on action spans in those chunk-sized bursts, with a final post-`close()` drain at `finish()`.
- `TestProgress` and `ExecRequest` are no-ops: These event types are received but not mapped to any OTel construct.

    bazel_conduit/
    ├── MODULE.bazel                  # Bazel module (rules_rust, rules_rs, prost)
    ├── BUILD.bazel                   # Top-level build targets
    ├── BEP_TO_OTEL_DESIGN.md         # Original design document
    ├── RUST_CONDUIT_PLAN.md          # Development plan
    ├── NOTES.md                      # BEP event analysis notes
    ├── proto/
    │   ├── build_event_stream/       # Bazel BEP proto (vendored)
    │   ├── spawn/                    # SpawnExec proto (execution log)
    │   └── google/                   # Google API protos (BES, well-known types)
    ├── rust/conduit/
    │   ├── BUILD.bazel               # rust_binary, rust_library, rust_test targets
    │   ├── Cargo.toml / Cargo.lock   # IDE support
    │   ├── src/
    │   │   ├── main.rs               # CLI entry (clap)
    │   │   ├── lib.rs                # Crate root, proto re-exports
    │   │   ├── bep/                  # BEP decoder + event router
    │   │   ├── otel/                 # OTel mapper, trace context, attributes
    │   │   ├── grpc/                 # BES gRPC server
    │   │   ├── exec_log/             # Compact exec log live-tail + enrichment
    │   │   └── state/                # Build state tracking
    │   └── tests/                    # Unit tests (decoder, action_mode, router)
    ├── integration/                  # Integration tests + full scenario script
    ├── toolchain/                    # Prost toolchain config
    └── docs/                         # Additional documentation