feat(M1/T1): integrity hashing (ssdeep + mismatch + degraded-coverage alert) and crossbeam removal #190
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add the pure-Rust, MIT-licensed fuzzyhash 0.2.2 crate (ssdeep/CTPH) as an exact-pinned workspace dependency, exposed through a new daemoneye-lib fuzzy-hashes feature (on by default). ssdeep is a non-cryptographic similarity hash and is deliberately kept off the MultiAlgorithmHasher cryptographic path. Passes cargo deny (licenses/bans/advisories/sources ok). Advances R2 AC7 (ssdeep fuzzy hash recording). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
New integrity::fuzzy module computes ssdeep/CTPH digests on a dedicated non-cryptographic path (bytes + streaming-reader entry points) and compares two digests for 0-100 similarity. FuzzyConfig carries a named default threshold (DEFAULT_SSDEEP_SIMILARITY_THRESHOLD = 80) with a validate() bound rejecting [0] and [100] so a misconfiguration cannot silently disable or saturate the binary-change observation. Feature-gated on fuzzy-hashes with compiling stubs when disabled. Kept entirely off MultiAlgorithmHasher so the HashResult cryptographic-only invariant is untouched. Advances R2 AC7 (ssdeep fuzzy hash recording). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
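The threshold bound described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the field name `similarity_threshold`, the error type `FuzzyConfigError`, and the rejection of values above 100 are assumptions; only `FuzzyConfig`, `validate()`, and `DEFAULT_SSDEEP_SIMILARITY_THRESHOLD = 80` come from the commit message.

```rust
/// Default similarity threshold named in the commit message.
pub const DEFAULT_SSDEEP_SIMILARITY_THRESHOLD: u8 = 80;

/// Hypothetical shape of the config; the real `FuzzyConfig` may carry more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FuzzyConfig {
    /// Assumed field name: alert when similarity falls below this value.
    pub similarity_threshold: u8,
}

impl Default for FuzzyConfig {
    fn default() -> Self {
        Self {
            similarity_threshold: DEFAULT_SSDEEP_SIMILARITY_THRESHOLD,
        }
    }
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum FuzzyConfigError {
    ThresholdOutOfRange(u8),
}

impl FuzzyConfig {
    /// Accepts only 1..=99. A threshold of 0 could never be undercut by a
    /// 0-100 score (the check is silently disabled), and 100 would flag every
    /// pair that is not a perfect match (the check saturates). Values above
    /// 100 are rejected too, which is an assumption of this sketch.
    pub fn validate(&self) -> Result<(), FuzzyConfigError> {
        match self.similarity_threshold {
            1..=99 => Ok(()),
            other => Err(FuzzyConfigError::ThresholdOutOfRange(other)),
        }
    }
}
```

Validating once at construction time means downstream consumers can compare a score against the threshold without re-checking the bounds.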
…contract Add three typed fields to the protobuf ProcessRecord (ssdeep_hash=15, on_disk_mismatch=16, ssdeep_degraded=17) so the agent reads them directly. On the procmond side the signals ride ProcessEvent.platform_metadata (via typed set_integrity_signals / ssdeep_hash / on_disk_mismatch / ssdeep_degraded helpers) rather than new struct fields, keeping the 180+ ProcessEvent literal construction sites stable; the IPC conversion lifts them onto the proto record. ssdeep_hash is decoupled from the (executable_hash, hash_algorithm) paired invariant. Unrelated proto-record literals default the new fields. Advances R2 AC6/AC7 and the degraded-coverage alert requirement. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
populate_hashes now computes an ssdeep fuzzy hash alongside SHA-256 for each authorized executable, streaming a clone of the same authorized fd on the blocking pool (no second open, no new TOCTOU window). The digest is stamped via ProcessEvent::set_ssdeep_signal. When SHA-256 succeeds but ssdeep fails the event is flagged ssdeep_degraded and a new HashPassStats.ssdeep_failures counter increments; a disabled fuzzy-hashes feature is NOT counted as degraded. ssdeep failure never fails the pass or enumeration. Replaces the monolithic set_integrity_signals with composable set_ssdeep_signal and set_on_disk_mismatch setters so the hash pass and the collector (U6) write independently without clobbering each other. Advances R2 AC7 and the degraded-coverage alert requirement. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
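A rough sketch of why composable setters avoid clobbering: each setter writes only its own metadata keys, so the hash pass and the collector can stamp the same event in either order. The string-map representation of `platform_metadata` and the key names are assumptions made for illustration; the PR text does not show the real type.

```rust
use std::collections::HashMap;

// Assumed metadata keys; the real keys are not shown in the PR text.
const KEY_SSDEEP_HASH: &str = "ssdeep_hash";
const KEY_SSDEEP_DEGRADED: &str = "ssdeep_degraded";
const KEY_ON_DISK_MISMATCH: &str = "on_disk_mismatch";

/// Simplified stand-in for the procmond event type.
#[derive(Debug, Default)]
pub struct ProcessEvent {
    /// Assumption: a plain string map stands in for the real metadata type.
    pub platform_metadata: HashMap<String, String>,
}

impl ProcessEvent {
    /// Written by the hash pass. Touches only the two ssdeep keys.
    pub fn set_ssdeep_signal(&mut self, hash: Option<&str>, degraded: bool) {
        match hash {
            Some(h) => {
                self.platform_metadata
                    .insert(KEY_SSDEEP_HASH.to_owned(), h.to_owned());
            }
            None => {
                // Clearing a stale digest keeps a previous value from leaking through.
                self.platform_metadata.remove(KEY_SSDEEP_HASH);
            }
        }
        self.platform_metadata
            .insert(KEY_SSDEEP_DEGRADED.to_owned(), degraded.to_string());
    }

    /// Written by the collector. Touches only the mismatch key.
    pub fn set_on_disk_mismatch(&mut self, mismatch: bool) {
        self.platform_metadata
            .insert(KEY_ON_DISK_MISMATCH.to_owned(), mismatch.to_string());
    }

    pub fn ssdeep_hash(&self) -> Option<&str> {
        self.platform_metadata
            .get(KEY_SSDEEP_HASH)
            .map(String::as_str)
    }

    pub fn ssdeep_degraded(&self) -> bool {
        self.flag(KEY_SSDEEP_DEGRADED)
    }

    pub fn on_disk_mismatch(&self) -> bool {
        self.flag(KEY_ON_DISK_MISMATCH)
    }

    // A missing key reads as false, matching the "default false" behaviour.
    fn flag(&self, key: &str) -> bool {
        self.platform_metadata.get(key).map(String::as_str) == Some("true")
    }
}
```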
The post-enumeration hash pass previously ran on whatever budget enumeration left over (collection_timeout minus elapsed, with a skip when exhausted), so a slow enumeration starved hashing and hashing competed with the R1 enumeration deadline. Enumeration already completes and produces its events before the hash pass, so the pass now runs on its own independent budget (collection_timeout in ProcessEventSource, CYCLE_BUDGET in the actor collector). Hashing latency no longer shortens or extends the enumeration deadline; cache reuse via the shared engine keeps steady-state cost low; inaccessible executables remain non-fatal. Adds ssdeep_failures to the completion telemetry. Advances R2 AC4 (async hashing outside the enumeration deadline). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
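The budget change can be illustrated with two small functions. The function names are hypothetical and the real code applies the budget around async work; this sketch only contrasts how the available time is derived before and after the change.

```rust
use std::time::Duration;

/// Previous behaviour as described: the hash pass only gets what enumeration
/// left over, and is skipped (None) when nothing remains.
pub fn leftover_budget(
    collection_timeout: Duration,
    enumeration_elapsed: Duration,
) -> Option<Duration> {
    collection_timeout
        .checked_sub(enumeration_elapsed)
        .filter(|remaining| !remaining.is_zero())
}

/// New behaviour as described: the hash pass runs on its own budget, so the
/// time enumeration took no longer affects it.
pub fn independent_budget(
    collection_timeout: Duration,
    _enumeration_elapsed: Duration,
) -> Duration {
    collection_timeout
}
```

With a 10 s timeout and a 10 s enumeration, the first function yields no hashing time at all, while the second still grants the full 10 s.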
The Linux collector now classifies the /proc/<pid>/exe symlink target: when the kernel appends the trailing " (deleted)" suffix (the backing executable was unlinked or replaced while the process runs), it strips the suffix from the stored path and records the on-disk-vs-running mismatch via ProcessEvent::set_on_disk_mismatch (R2 AC6). The match is anchored to the trailing token so a path legitimately containing the substring mid-string is not flagged. macOS/Windows default the flag to false (Linux is primary for this signal). Helper classify_exe_target is unit-tested. Advances R2 AC6 (on-disk-vs-running mismatch recorded as distinct metadata). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
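A minimal sketch of the suffix classification described above. The signature is an assumption: the real `classify_exe_target` likely works on `Path` or `OsStr` values rather than `&str`, and its return type is not shown in the PR text.

```rust
/// Suffix the Linux kernel appends to the /proc/<pid>/exe link target when
/// the backing file has been unlinked.
const DELETED_SUFFIX: &str = " (deleted)";

/// Returns the path to store and whether an on-disk mismatch was observed.
/// Only a trailing suffix counts, so a path that contains " (deleted)"
/// somewhere in the middle is left alone.
pub fn classify_exe_target(target: &str) -> (&str, bool) {
    match target.strip_suffix(DELETED_SUFFIX) {
        Some(stripped) if !stripped.is_empty() => (stripped, true),
        _ => (target, false),
    }
}
```

Using `strip_suffix` rather than a substring search is what anchors the match to the trailing token.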
procmond sets per-process integrity flags on the wire but cannot emit alerts (no AlertManager, no network). The new integrity_alerts module in daemoneye-agent reads the proto ProcessRecord flags before native conversion and raises alerts: ssdeep_degraded -> integrity.coverage.degraded (Medium), on_disk_mismatch -> integrity.disk_mismatch (High). The alerts are folded into the existing execute_rules alert stream so they share dedup, rate-limiting, and delivery. A record may raise both (distinct dedup keys); clean records raise none. Implements the operator degraded-coverage alert requirement and surfaces R2 AC6 mismatch as an alert. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
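The flag-to-alert mapping above could look roughly like this. The struct shapes and the function signature are simplified assumptions; only the rule ids and severities are taken from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Medium,
    High,
}

/// Simplified stand-in for the protobuf record; only the fields this sketch needs.
#[derive(Debug, Default)]
pub struct ProcessRecord {
    pub executable_path: String,
    pub ssdeep_degraded: bool,
    pub on_disk_mismatch: bool,
}

/// Hypothetical alert shape for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct IntegrityAlert {
    pub rule_id: &'static str,
    pub severity: Severity,
    pub executable_path: String,
}

/// Reads the flags off the wire record and returns zero, one, or two alerts.
pub fn detect_integrity_alerts(record: &ProcessRecord) -> Vec<IntegrityAlert> {
    let mut alerts = Vec::new();
    if record.ssdeep_degraded {
        alerts.push(IntegrityAlert {
            rule_id: "integrity.coverage.degraded",
            severity: Severity::Medium,
            executable_path: record.executable_path.clone(),
        });
    }
    if record.on_disk_mismatch {
        alerts.push(IntegrityAlert {
            rule_id: "integrity.disk_mismatch",
            severity: Severity::High,
            executable_path: record.executable_path.clone(),
        });
    }
    alerts
}
```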
Adds a session-scoped BinaryChangeTracker to the agent integrity bridge. It holds the last ssdeep digest per executable path; when a process's current ssdeep similarity to its previously recorded value falls below the configured threshold it raises an integrity.binary_change alert (Medium). The first observation for a path seeds the baseline only; a comparison failure or a missing ssdeep is skipped without a false alert. The "previously recorded value" is the agent's last in-memory value, reading no storage and adding no storage schema (within the ticket's no-new-storage-logic boundary). The threshold comes from the validated FuzzyConfig so a misconfiguration cannot silently disable it. Advances R2 AC7 (binary-change observation below a configurable threshold). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
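The tracker's seed, compare, and skip behaviour can be sketched as below. Several things here are assumptions: the `observe` signature, the choice to update the baseline on every observation, and above all `toy_similarity`, which is a deliberately crude stand-in so the example is self-contained. It is not ssdeep and not the fuzzyhash crate's comparison.

```rust
use std::collections::HashMap;

/// Stand-in similarity function (NOT ssdeep): percentage of positions that
/// match, or None for empty input to model a comparison failure.
pub fn toy_similarity(a: &str, b: &str) -> Option<u8> {
    if a.is_empty() || b.is_empty() {
        return None;
    }
    let same = a.chars().zip(b.chars()).filter(|(x, y)| x == y).count();
    let longest = a.chars().count().max(b.chars().count());
    Some((same * 100 / longest) as u8)
}

/// Session-scoped, in-memory only: maps executable path to last seen digest.
pub struct BinaryChangeTracker {
    threshold: u8,
    baselines: HashMap<String, String>,
    compare: fn(&str, &str) -> Option<u8>,
}

impl BinaryChangeTracker {
    pub fn new(threshold: u8, compare: fn(&str, &str) -> Option<u8>) -> Self {
        Self {
            threshold,
            baselines: HashMap::new(),
            compare,
        }
    }

    /// Returns Some(score) when a binary-change alert should be raised.
    pub fn observe(&mut self, path: &str, ssdeep: Option<&str>) -> Option<u8> {
        // Missing digest: skip without touching the baseline.
        let current = ssdeep?;
        // First observation: `insert` returns None, which only seeds the baseline.
        let previous = self.baselines.insert(path.to_owned(), current.to_owned())?;
        // Comparison failure: skip rather than raise a false alert.
        let score = (self.compare)(&previous, current)?;
        (score < self.threshold).then_some(score)
    }
}
```

A short usage trace: the first `observe` for a path returns `None`, a repeat of the same digest returns `None`, and a digest that scores below the threshold returns the score.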
Extend the integrity criterion suite with ssdeep-only benchmarks across the representative sizes (1 KiB / 256 KiB / 4 MiB) and a combined SHA-256 + ssdeep benchmark matching what procmond's hash pass now computes per executable. Comparing these against the existing SHA-256-only baseline quantifies the ssdeep overhead for the R2 AC4 sustained-CPU budget. Baselines are recorded via the criterion CLI (cargo bench --baseline previous). Advances R2 AC4 (hashing impact baselines). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The R14 AC4 no-regression gate compares the daemoneye-eventbus in-process broker delivery path against a pre-migration baseline, but no benchmark measured that path: throughput.rs is publish-only, ipc_performance.rs is socket-based, and procmond's eventbus_benchmarks.rs measures the WAL connector. This new bench constructs a broker with no transport server bound and measures publish -> in-process subscriber receive (tokio mpsc fan-out) — the path the agent runs. Record the baseline with: cargo bench -p daemoneye-eventbus --bench broker_inprocess_latency -- --save-baseline pre-migration. Advances R14 AC4 (broker end-to-end no-regression baseline). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
…ndency Gate decision (R14 AC4/AC7): the crossbeam HighPerformanceEventBus was export-only dead code with zero runtime consumers (instantiated only in its own unit tests), so its removal is pure dead-code elimination with no possible performance regression — the in-process delivery path the agent actually runs is the daemoneye-eventbus broker, now covered by the broker_inprocess_latency benchmark (U10). Deletes high_performance_event_bus.rs, drops its re-exports from collector-core, and removes the crossbeam dependency from both manifests (absent from Cargo.lock). Fixes a stale doc comment claiming LocalEventBus uses crossbeam (it uses tokio channels) and updates the AGENTS.md tech-stack entry. No dual-bus end state remains. Implements R14 AC7 (legacy crossbeam path removed). Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The EventSubscription example in the crate-level doctest predated the include_control field and failed to compile under cargo test --workspace. Signed-off-by line added by -s. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Address ce-code-review findings on the integrity feature:
- Alert over-suppression (adversarial + reliability): the AlertManager dedup key
is severity:rule_id:title, identical across processes for a shared integrity
rule, so distinct affected executables collapsed to one delivered alert per
window. build_alert now discriminates the dedup key by executable identity, so
distinct executables alert separately while the same executable still dedups.
- Unbounded BinaryChangeTracker growth (adversarial + reliability +
maintainability + correctness): observe() now evicts baselines for executables
no longer running, bounding the map to the running-process set.
- Inline {score} format arg; document the intentional proto->native integrity
drop. Add tests: per-target vs shared dedup keys, tracker eviction, and a
ProcessEvent->proto round-trip for the three integrity fields.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
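The two review fixes above can be sketched as follows. How `build_alert` actually encodes executable identity into the key is not shown in the commit message, so appending the path is an assumption, as are all function names here.

```rust
use std::collections::{HashMap, HashSet};

/// The shared key shape described in the review finding: identical for every
/// process that trips the same rule.
pub fn shared_dedup_key(severity: &str, rule_id: &str, title: &str) -> String {
    format!("{severity}:{rule_id}:{title}")
}

/// Per-target key: distinct executables no longer collapse into one alert,
/// while repeats for the same executable still share a key.
/// Assumption: identity is expressed by appending the executable path.
pub fn per_target_dedup_key(
    severity: &str,
    rule_id: &str,
    title: &str,
    executable_path: &str,
) -> String {
    format!("{severity}:{rule_id}:{title}:{executable_path}")
}

/// Bounds the baseline map to the running-process set by dropping entries
/// whose executable is no longer running.
pub fn evict_stale(baselines: &mut HashMap<String, String>, running: &HashSet<String>) {
    baselines.retain(|path, _| running.contains(path));
}
```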
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
- 🟢 📃 Configuration Change Requirements: Wonderful, this rule succeeded. Mergify configuration change
- 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Require conventional commit format per https://www.conventionalcommits.org/en/v1.0.0/. Skipped for dependabot and dosubot.
- 🟢 Full CI must pass: Wonderful, this rule succeeded. All CI checks must pass. Activates for non-bot authors, or dependabot when files exist outside .github/workflows/.
- 🟢 Do not merge outdated PRs: Wonderful, this rule succeeded. Make sure PRs are within 3 commits of the base branch before merging
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough
Adds feature-gated ssdeep fuzzy hashing across collection and hash pass, extends ProcessRecord proto with ssdeep/on_disk_mismatch/ssdeep_degraded, detects deleted /proc/.../exe targets, lifts integrity signals into wire records, emits agent-side integrity alerts, and removes the crossbeam high-performance event bus.
Changes
- Integrity Signal Detection and Alerting
- Test, Benchmark, and Dependency Updates
Sequence Diagram(s)
sequenceDiagram
participant Collection as Linux Process Collection
participant HashPass as Hash Pass (populate_hashes)
participant Conversion as Proto Conversion
participant Agent as Agent Detection Loop
participant Alerts as Alert Pipeline
Collection->>Collection: read /proc/pid/exe\nclassify_exe_target()
Collection->>Collection: Create ProcessEvent\nset_on_disk_mismatch(bool)
HashPass->>HashPass: Clone authorized fd
HashPass->>HashPass: SHA-256 hash
HashPass->>HashPass: Blocking: ssdeep compute_ssdeep_best_effort()
HashPass->>Collection: Stamp event\nset_ssdeep_signal(hash, degraded)
Conversion->>Conversion: Extract ssdeep/mismatch\nfrom platform_metadata
Conversion->>Conversion: Populate ProtoRecord\nfields 15-17
Agent->>Agent: detect_integrity_alerts()\n(degraded, mismatch flags)
Agent->>Agent: BinaryChangeTracker.observe()\n(compare baseline)
Alerts->>Alerts: build_alert()\n(custom dedup key)
Alerts->>Alerts: Emit alerts: degraded, mismatch, binary_change
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
♻️ Duplicate comments (1)
daemoneye-lib/benches/integrity_operations.rs (1)
158-177: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Benchmark claim still overstates production-path parity.
bench_sha256_plus_ssdeep_medium computes SHA-256 from tmp.path() but computes ssdeep from an in-memory Cursor<&[u8]>. That still omits the file/FD streaming path procmond uses for ssdeep, so the "end-to-end per executable" overhead comparison is biased. Either stream ssdeep from the temp file reader in-loop or rename/describe this as mixed-path benchmarking.
Proposed minimal adjustment
 fn bench_sha256_plus_ssdeep_medium(c: &mut Criterion) {
     let rt = Runtime::new().expect("tokio runtime");
     let size = 256 * 1024;
     let tmp = make_file(size);
-    let bytes = make_bytes(size);
     let hasher = build_hasher(vec![HashAlgorithm::Sha256]);
     c.bench_function("integrity_sha256_plus_ssdeep_256kib", |b| {
         b.iter(|| {
             rt.block_on(async {
                 black_box(hasher.compute(tmp.path()).await.expect("sha256 hash"));
             });
+            let mut file = std::fs::File::open(tmp.path()).expect("open file for ssdeep");
             black_box(
-                fuzzy::compute_ssdeep_from_reader(&mut Cursor::new(&bytes)).expect("ssdeep digest"),
+                fuzzy::compute_ssdeep_from_reader(&mut file).expect("ssdeep digest"),
             );
         });
     });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@daemoneye-lib/benches/integrity_operations.rs` around lines 158 - 177, The benchmark bench_sha256_plus_ssdeep_medium currently computes SHA-256 from the temp file path (hasher.compute(tmp.path())) but computes ssdeep from an in-memory Cursor (&bytes), which misrepresents the production streaming path; change the ssdeep call to read from the same temp file reader instead (use tmp.path() and open a File/BufReader and pass that reader into fuzzy::compute_ssdeep_from_reader) inside the benchmark loop, reopening the file (or seeking back to start) each iteration so the streaming path is measured end-to-end and not the in-memory shortcut.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Repository UI (inherited), Organization UI (inherited)
Review profile: ASSERTIVE
Plan: Pro
Run ID: d0b169a1-0369-4d29-8600-bc43d34e9fe6
⛔ Files ignored due to path filters (1)
- daemoneye-eventbus/benches/broker_inprocess_latency.rs is excluded by none and included by none
📒 Files selected for processing (5)
- AGENTS.md
- collector-core/src/event.rs
- daemoneye-lib/benches/integrity_operations.rs
- docs/src/architecture/system-architecture.md
- procmond/src/hash_pass.rs
…pwire The native ProcessRecord has no integrity-signal fields; ssdeep_hash / on_disk_mismatch / ssdeep_degraded live only on the protobuf record and are produced on the procmond ProcessEvent -> proto path. Both From conversions therefore drop them, and the agent integrity-alert bridge depends on reading the signals off the proto record *before* conversion. That correspondence is hand-maintained and the type system can't express it, so add a tripwire test asserting native -> proto defaults the three fields and a signal-carrying proto loses them after a native round-trip. A future edit that adds the fields to the native model (or wires them through) now fails this test and forces a deliberate decision instead of silently changing the lossy boundary. Surfaced by the type-design and test-coverage review passes on #190. Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
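The lossy boundary that the tripwire pins down can be sketched with two toy record types. The type names `ProtoProcessRecord` and `NativeProcessRecord`, the `pid` field, and the `round_trip` helper are hypothetical simplifications; the real records carry many more fields.

```rust
/// Simplified stand-in for the protobuf record, which carries the signals.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ProtoProcessRecord {
    pub pid: u32,
    pub ssdeep_hash: Option<String>,
    pub on_disk_mismatch: bool,
    pub ssdeep_degraded: bool,
}

/// Simplified stand-in for the native record, which has no integrity fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NativeProcessRecord {
    pub pid: u32,
}

impl From<ProtoProcessRecord> for NativeProcessRecord {
    // The integrity signals are dropped here on purpose.
    fn from(proto: ProtoProcessRecord) -> Self {
        Self { pid: proto.pid }
    }
}

impl From<NativeProcessRecord> for ProtoProcessRecord {
    // The native side has nothing to contribute, so the three fields default.
    fn from(native: NativeProcessRecord) -> Self {
        Self {
            pid: native.pid,
            ..Self::default()
        }
    }
}

/// Helper for the tripwire: proto -> native -> proto.
pub fn round_trip(proto: ProtoProcessRecord) -> ProtoProcessRecord {
    NativeProcessRecord::from(proto).into()
}
```

A tripwire test then asserts that a signal-carrying record comes back with the three fields defaulted. If someone later adds the fields to the native type, that assertion fails and forces the decision to be made explicitly.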
…meout Address Copilot review feedback on #190:
- procmond/hash_pass.rs: the fd-clone call-site comment still claimed "not attempted (non-degraded)" semantics, but the code returns (None, true), i.e. degraded coverage, when the authorized fd can't be cloned (ssdeep absent while SHA-256 succeeded). Update the comment to match the code.
- daemoneye-eventbus broker_inprocess_latency bench: receiver.recv() was awaited unbounded, so a dropped event or a publish/subscribe regression would hang the bench indefinitely and wedge CI. Wrap it in a 5s tokio::time::timeout that fails fast with a clear message instead.
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
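The fail-fast receive can be illustrated with the standard library alone. This sketch swaps in `std::sync::mpsc::Receiver::recv_timeout` for the `tokio::time::timeout` wrapper the commit describes, so it shows the same idea on a blocking channel rather than the bench's actual async code; `recv_or_fail` and the message wording are hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// The 5 s bound mentioned in the commit message.
pub const RECV_TIMEOUT: Duration = Duration::from_secs(5);

/// Waits for one event, but gives up with a descriptive error instead of
/// blocking forever when nothing is delivered.
pub fn recv_or_fail<T>(rx: &Receiver<T>, timeout: Duration) -> Result<T, String> {
    rx.recv_timeout(timeout).map_err(|err| match err {
        RecvTimeoutError::Timeout => {
            format!("no event delivered within {timeout:?}; publish/subscribe path may have regressed")
        }
        RecvTimeoutError::Disconnected => {
            "publisher dropped before delivering an event".to_owned()
        }
    })
}
```

In a benchmark loop the error would typically be turned into a panic with that message, so a dropped event fails the run quickly instead of wedging CI.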
…go-designer-skill, and trailofbits/skills Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@Mergifyio queue
let Some(mut file) = fuzzy_file else {
    return (None, true);
};
//! It exists so the R14 AC4 no-regression gate has a concrete pre-migration
//! baseline to compare against before the legacy crossbeam
//! `HighPerformanceEventBus` is removed (R14 AC7). Record a baseline with:
//!
//! ```bash
//! cargo bench --package daemoneye-eventbus --bench broker_inprocess_latency \
//!     -- --save-baseline pre-migration
//! ```
Summary
Finishes the two in-flight M1 / ticket T1 foundation workstreams: executable integrity hashing (R2) and crossbeam removal (R14).
This is the foundation the ShadowHunt workflow builds on, so the privilege-separation, TOCTOU-safety, and typed-IPC decisions here are deliberate.
What changed
Integrity hashing (R2)
- … daemoneye-lib/src/integrity/fuzzy.rs), deliberately kept off MultiAlgorithmHasher so the HashResult.hashes cryptographic-only invariant holds (ssdeep is attacker-malleable, never an identity guarantee).
- … None.
- … /proc/<pid>/exe " (deleted)").
- … ssdeep_hash, on_disk_mismatch, ssdeep_degraded) threaded to the agent over a typed protobuf contract (ProcessRecord 15/16/17); carried on ProcessEvent.platform_metadata via typed helpers to avoid breaking 180+ literal construction sites.
- … daemoneye-agent/src/integrity_alerts.rs): degraded-coverage (Medium), disk-mismatch (High), binary-change (Medium, below a validated similarity threshold) with per-target dedup and a bounded change tracker.
Crossbeam removal (R14)
- … HighPerformanceEventBus and dropped the crossbeam dependency. The real in-process path is the daemoneye-eventbus broker, now covered by a new in-process broker latency benchmark (R14 AC4 baseline).
Signal flow (new)
flowchart LR
    P[procmond hash pass] -- "clone authorized fd" --> S[SHA-256 + ssdeep]
    S -- "ssdeep / mismatch / degraded" --> M[ProcessEvent.platform_metadata]
    M -- "typed lift" --> W["proto ProcessRecord 15/16/17"]
    W -- IPC --> A[daemoneye-agent]
    A --> AL[integrity_alerts: degraded / mismatch / binary-change]
Review checklist
Code quality & security
- … HashResult.hashes invariant
Testing
Test coverage
Extensive new tests, all green under cargo test --workspace: fuzzy.rs (8 unit tests), integrity-signal helpers, hash-pass ssdeep stamping + stale-reset, classify_exe_target, the agent alert bridge + BinaryChangeTracker (12), and a ProcessEvent → proto round-trip plus a native↔proto drop tripwire. Coverage CI job passes.
Risk assessment
Note on scope
The T1 feature is the feat(integrity): series + the crossbeam removal. This branch also carries a few incidental commits that landed here during the session and are unrelated to T1: a .mergify.yml cleanup, a docs/solutions/ learning doc (mise/uv CI fix), a tessl.json skill-version bump, and an AGENTS.md rule-02 wording clarification. Flagging them so reviewers aren't surprised; they're low-risk and can be reviewed quickly or split out if preferred.
AI usage disclosure
Per AI_POLICY.md: implemented with Claude Code (Claude Opus 4.8 (1M Context)). Reviewed across an 8-persona ce-code-review, a 5-agent review-pr pass, and CodeRabbit/Copilot; all actionable findings were applied (per-target dedup, bounded tracker, stale-ssdeep reset, fd-clone degraded, proto tripwire, bench timeout). All changes build clean, pass clippy -D warnings, and pass the full test suite.