
feat(index): report per-file indexing failures via skipped[] + logfile (Stage 2/B2-B4) #785

Merged: DeusData merged 1 commit into main from feat/index-error-surfacing on Jul 2, 2026

DeusData commented Jul 2, 2026


feat(index): report per-file indexing failures via skipped[] + logfile (Stage 2/B2-B4)

index_repository silently dropped files that failed to index. CBMFileResult.has_error
was set but never read, oversized files (>100 MB) were dropped with no signal, and
read/extract failures only bumped a counter that was logged but never reported, so a
file that couldn't be indexed simply vanished from the graph.

Collect and surface them:

  • A new retained {path, reason, phase} error list on struct cbm_pipeline (mirroring
    the excluded_dirs pattern), plus an accessor and a back-pointer on
    cbm_pipeline_ctx_t so both extraction paths can append to it. Wired into BOTH the
    sequential (pass_definitions) and parallel (per-worker, merged lock-free) paths:
    small repos take the sequential path, so wiring only one would leave the guard
    vacuous.
  • Sources fed into the list: read failures, extract failures, the newly CONSUMED
    has_error (parse timeout / parse failed, with error_msg), and oversized files. The
    cross_lsp phase is reserved for the crash supervisor (Track C) and is not fed
    here: the cross-LSP passes are best-effort with no failure return, so feeding the
    no-source case would only produce false positives.
  • The MCP/CLI response gains a top-level "skipped_count" (0 on a clean run) and,
    when it is >0, a capped "skipped":{files[<=50],count,truncated} plus "logfile".
    Status stays "indexed": a reported skip is the expected, handled outcome, not a
    degradation.
  • A per-run logfile (the full, uncapped list) is written ONLY when >=1 file is
    skipped: $CBM_INDEX_LOG if set, else <cache_dir>/logs/<project>-<epoch>.log.
  • Generous, env-configurable caps (src/foundation/limits.c): CBM_MAX_FILE_BYTES
    now defaults to 512 MiB (raised from 100 MB); over-cap files are REPORTED (phase
    oversized) and WARNed, never silently dropped.
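The error-list and size-cap bullets above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the project's code: the names skip_entry, skip_list, skip_list_add, skip_list_merge, max_file_bytes and admit_file are hypothetical, and "merged lock-free" is read here as each worker owning its own list, which is moved into the pipeline's list after the worker is joined. Only CBM_MAX_FILE_BYTES, the 512 MiB default and the oversized/cross_lsp phases come from the PR text.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical phase tags. PHASE_CROSS_LSP is reserved (crash supervisor)
 * and is never fed by this sketch. */
typedef enum { PHASE_READ, PHASE_EXTRACT, PHASE_PARSE, PHASE_OVERSIZED, PHASE_CROSS_LSP } skip_phase;

typedef struct {
    char *path;      /* owned copy */
    char *reason;    /* owned copy, e.g. a parse error_msg */
    skip_phase phase;
} skip_entry;

/* Growable list: one per worker, plus one retained on the pipeline. */
typedef struct {
    skip_entry *items;
    size_t count, cap;
} skip_list;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d) memcpy(d, s, n);
    return d;
}

/* Ensure room for at least `need` entries; 0 on success, -1 on OOM. */
static int skip_list_reserve(skip_list *l, size_t need) {
    if (need <= l->cap) return 0;
    size_t ncap = l->cap ? l->cap : 8;
    while (ncap < need) ncap *= 2;
    skip_entry *n = realloc(l->items, ncap * sizeof *n);
    if (!n) return -1;
    l->items = n;
    l->cap = ncap;
    return 0;
}

/* Append a copied {path, reason, phase} record; 0 on success, -1 on OOM. */
static int skip_list_add(skip_list *l, const char *path, const char *reason, skip_phase phase) {
    skip_entry e = { dup_str(path), dup_str(reason), phase };
    if (!e.path || !e.reason || skip_list_reserve(l, l->count + 1) != 0) {
        free(e.path);
        free(e.reason);
        return -1;
    }
    l->items[l->count++] = e;
    return 0;
}

/* Move every entry of a worker-owned list into dst. Meant to run after the
 * worker is joined, so neither list needs a lock. src is left empty. */
static int skip_list_merge(skip_list *dst, skip_list *src) {
    if (skip_list_reserve(dst, dst->count + src->count) != 0) return -1;
    if (src->count) memcpy(dst->items + dst->count, src->items, src->count * sizeof *src->items);
    dst->count += src->count;
    free(src->items);
    src->items = NULL;
    src->count = src->cap = 0;
    return 0;
}

static void skip_list_free(skip_list *l) {
    for (size_t i = 0; i < l->count; i++) {
        free(l->items[i].path);
        free(l->items[i].reason);
    }
    free(l->items);
    l->items = NULL;
    l->count = l->cap = 0;
}

/* File-size cap: CBM_MAX_FILE_BYTES if it is a positive decimal, else 512 MiB. */
static unsigned long long max_file_bytes(void) {
    const unsigned long long def = 512ULL * 1024 * 1024;
    const char *s = getenv("CBM_MAX_FILE_BYTES");
    if (!s || !isdigit((unsigned char)s[0])) return def;
    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(s, &end, 10);
    return (errno || *end != '\0' || v == 0) ? def : v;
}

/* Returns 1 if the file may be indexed; otherwise WARNs, records it as
 * oversized and returns 0, so the caller skips it without losing track of it. */
static int admit_file(skip_list *l, const char *path, unsigned long long size) {
    unsigned long long cap = max_file_bytes();
    if (size <= cap) return 1;
    char reason[96];
    snprintf(reason, sizeof reason, "size %llu exceeds cap %llu", size, cap);
    fprintf(stderr, "WARN: skipping %s: %s\n", path, reason);
    skip_list_add(l, path, reason, PHASE_OVERSIZED);
    return 0;
}
```

Copying path and reason into each record keeps entries valid after per-file buffers are released; one list per worker keeps the hot path free of locking, at the cost of a single merge at the end of the run.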
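The response shape and logfile location described in the bullets above could be rendered roughly as follows. This is a hedged sketch: render_response, logfile_path and skip_rec are hypothetical names, each files[] element is assumed to be an object with path, phase and reason, and "logfile" is assumed to sit beside "skipped" at the top level. The PR text fixes only the field names, the cap of 50 and the path convention.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKIPPED_FILES_CAP 50 /* response cap stated in the PR */

/* Hypothetical flattened record handed to the renderer. */
typedef struct { const char *path; const char *reason; const char *phase; } skip_rec;

/* Write s as a JSON string literal, escaping quotes, backslashes and control chars. */
static void json_str(FILE *out, const char *s) {
    fputc('"', out);
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if (*p == '"' || *p == '\\') fprintf(out, "\\%c", *p);
        else if (*p < 0x20) fprintf(out, "\\u%04x", *p);
        else fputc(*p, out);
    }
    fputc('"', out);
}

/* $CBM_INDEX_LOG wins; otherwise <cache_dir>/logs/<project>-<epoch>.log.
 * Returns what snprintf returns (length the full path would need). */
static int logfile_path(char *buf, size_t n, const char *cache_dir,
                        const char *project, long long epoch) {
    const char *override = getenv("CBM_INDEX_LOG");
    if (override && *override) return snprintf(buf, n, "%s", override);
    return snprintf(buf, n, "%s/logs/%s-%lld.log", cache_dir, project, epoch);
}

/* skipped_count is always emitted; "skipped" and "logfile" only when count > 0.
 * At most SKIPPED_FILES_CAP entries are listed; "truncated" flags the rest. */
static void render_response(FILE *out, const skip_rec *recs, size_t count, const char *logfile) {
    fprintf(out, "{\"status\":\"indexed\",\"skipped_count\":%zu", count);
    if (count > 0) {
        size_t shown = count < SKIPPED_FILES_CAP ? count : SKIPPED_FILES_CAP;
        fputs(",\"skipped\":{\"files\":[", out);
        for (size_t i = 0; i < shown; i++) {
            if (i) fputc(',', out);
            fputs("{\"path\":", out);
            json_str(out, recs[i].path);
            fputs(",\"phase\":", out);
            json_str(out, recs[i].phase);
            fputs(",\"reason\":", out);
            json_str(out, recs[i].reason);
            fputc('}', out);
        }
        fprintf(out, "],\"count\":%zu,\"truncated\":%s}", count, count > shown ? "true" : "false");
        fputs(",\"logfile\":", out);
        json_str(out, logfile);
    }
    fputc('}', out);
}
```

Because skipped_count is always present, a client can branch on a single integer and only inspect skipped/logfile when it is non-zero, while the uncapped list stays in the logfile.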

Reproduce-first: tests/test_index_resilience.c (gating). An oversized file (cap
lowered via env) must appear in skipped[] while the 2 good files are still indexed and
a logfile is written; a clean run has skipped_count 0 and writes no logfile. The guard
is genuine: no-op'ing the recorder makes the oversized file silently vanish (RED). Full
suite 5750/0, no ASan/UBSan findings.
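The two properties the gating test checks can be illustrated with a toy stand-in. In this sketch run_index and file_exists are hypothetical helpers, and the cap is passed as a parameter instead of being lowered through CBM_MAX_FILE_BYTES; it only shows that an oversized file is counted and written to the logfile while the good files are indexed, and that a clean run never creates the logfile.

```c
#include <stddef.h>
#include <stdio.h>

/* Outcome of one hypothetical indexing run. */
typedef struct { int indexed; int skipped; int wrote_log; } run_result;

/* Toy stand-in for index_repository: files above cap are counted as skipped
 * and appended to the logfile instead of vanishing. The logfile is opened
 * lazily, so a run with no skips never creates it. */
static run_result run_index(const char *const *paths, const unsigned long long *sizes,
                            size_t n, unsigned long long cap, const char *log_path) {
    run_result r = {0, 0, 0};
    FILE *logfp = NULL;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] <= cap) {
            r.indexed++;
            continue;
        }
        r.skipped++;
        if (!logfp) logfp = fopen(log_path, "w");
        if (logfp) fprintf(logfp, "%s\toversized\tsize %llu > cap %llu\n", paths[i], sizes[i], cap);
    }
    if (logfp) {
        fclose(logfp);
        r.wrote_log = 1;
    }
    return r;
}

/* Returns 1 if path can be opened for reading, else 0. */
static int file_exists(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    fclose(f);
    return 1;
}
```

Opening the logfile only on the first skip is what makes the "no logfile on a clean run" assertion hold without a separate cleanup step.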

Part of the resilient-indexing effort (Track B surfacing layer). Refs #668.

Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>
DeusData merged commit 6c44fc1 into main on Jul 2, 2026
15 checks passed