Summary
Since the merge of #667, fresh indexing attaches Java USAGE edges to the wrong source files, nondeterministically — every fresh index of the same unchanged repo produces a different set of incorrect edges. Queries then return "users" that contain no reference to the target symbol at all, while real users are missing. Not present in v0.8.1 or at the #528 merge commit; present from the #667 merge through current HEAD (dcf98dc). This is not in any released binary yet — flagging it now so it doesn't ship in the next release.
Repro (spring-petclinic, macOS arm64, source builds via scripts/build.sh)
git clone --depth 1 https://github.com/spring-projects/spring-petclinic.git
# ground truth: 7 files reference OwnerRepository besides its own definition
grep -rl "OwnerRepository" spring-petclinic/src --include="*.java"
Index the repo (stdio MCP index_repository, fresh CBM_CACHE_DIR each run), then:
MATCH (c {name: 'OwnerRepository'})<-[r:USAGE]-(m) RETURN m.file_path
HEAD (dcf98dc), 5 fresh indexes — 5 different wrong answers:
| run | bogus "users" returned | real users missing |
| --- | --- | --- |
| 0 | EntityUtils.java, PetValidatorTests.java | ClinicServiceTests, OwnerControllerTests, PetControllerTests, VisitControllerTests |
| 1 | PetType.java | OwnerControllerTests, PetControllerTests |
| 2 | PetValidator.java | OwnerControllerTests, PetControllerTests |
| 3 | EntityUtils.java, OwnerRepository.java, PetTypeFormatterTests.java | ClinicServiceTests, OwnerControllerTests, PetControllerTests, VisitControllerTests |
| 4 | PetTypeFormatter.java, PetValidatorTests.java | OwnerControllerTests, PetControllerTests, VisitControllerTests |
None of the bogus files contain the string OwnerRepository, apart from OwnerRepository.java itself (run 3), which is the definition rather than a user.
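The bogus/missing split in the table can be reproduced mechanically. A minimal sketch, assuming the file paths returned by the query have been collected into a list; the helper names `grep_users` and `classify` are hypothetical and not part of the tool:

```python
from pathlib import Path


def grep_users(src_root, symbol, definition_name):
    """Return basenames of .java files under src_root that mention symbol,
    excluding the file that defines it (mirrors the grep -rl ground truth)."""
    users = set()
    for path in Path(src_root).rglob("*.java"):
        if path.name == definition_name:
            continue
        if symbol in path.read_text(encoding="utf-8", errors="replace"):
            users.add(path.name)
    return users


def classify(ground_truth, returned_paths):
    """Split query results into bogus users and missing real users."""
    returned = {Path(p).name for p in returned_paths}
    bogus = sorted(returned - ground_truth)    # returned, but no textual reference
    missing = sorted(ground_truth - returned)  # textual reference, but not returned
    return bogus, missing
```

Calling `classify(grep_users("spring-petclinic/src", "OwnerRepository", "OwnerRepository.java"), query_paths)` after each fresh index would give one table row per run, with `query_paths` being whatever the USAGE query returned.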
Controls (same machine, same build script, same query): v0.8.1 tag and the #528 merge commit (be3e038) each return exactly the 7 real users with zero bogus entries, deterministically (2/2 runs each).
On a larger private Spring Boot service (~1.3k Java files, 142 @Service/@Component beans) the effect is severe: v0.8.1 resolves 100% of grep-verified bean users (757/757); HEAD drops to ~16% with the remainder misattributed.
Bisect
git bisect --first-parent between be3e038 (good) and dcf98dc (bad), each step = full build + 3 fresh-index probes:
Notes on the suspected area
#667 switches Java/Go module-QN derivation to directory-based (cbm_pipeline_fqn_module_dir; pu_module_is_dir / pp_module_is_dir / pxc_module_is_dir copies that the comments say MUST match cbm_lang_module_is_dir). With Java packages, many files share a directory-derived module QN, so if USAGE source resolution keys on module QN it can pick an arbitrary same-package file — consistent with the observed behavior (misattributed sources; varies run to run; still wrong, though less so, with CBM_WORKERS=1).
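To make the hypothesis concrete, here is a toy model of the suspected failure mode. It is not the project's code: `module_qn_from_dir`, `build_source_index`, and the file names are hypothetical, and the shuffle only stands in for whatever ordering the real pipeline has. It assumes an index keyed on a directory-derived module QN where the last file written to a key wins:

```python
import random


def module_qn_from_dir(file_path):
    """Directory-based module QN: every file in a package maps to the same key."""
    directory = file_path.rsplit("/", 1)[0]
    return directory.replace("/", ".")


def build_source_index(files, rng):
    """Hypothetical index keyed on module QN. The shuffle stands in for
    worker scheduling; the last file written to a key wins."""
    order = list(files)
    rng.shuffle(order)
    index = {}
    for path in order:
        index[module_qn_from_dir(path)] = path
    return index


# Three hypothetical files in one package; only RealUser.java is assumed
# to reference the target symbol.
files = [
    "com/example/owner/RealUser.java",
    "com/example/owner/Unrelated1.java",
    "com/example/owner/Unrelated2.java",
]
real_user = files[0]

# Resolve the edge source through the module-QN key under different orderings.
picked = {
    build_source_index(files, random.Random(seed))[module_qn_from_dir(real_user)]
    for seed in range(20)
}
print(sorted(picked))
```

In this model the key cannot distinguish files within a package, so the resolved source depends on processing order. A fixed order (as CBM_WORKERS=1 might give) would make the pick more stable but not correct, which would be in line with the observation above.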
Environment: macOS 26.5 arm64 (Darwin 25.5.0), Apple clang, plain scripts/build.sh.