Add a small streaming runbook and groundtruth to test_data #1127
magdalendobson wants to merge 12 commits into
Pull request overview
This PR adds a small “streaming” runbook + corresponding groundtruth files to the existing test_data/disk_index_search dataset, and updates the diskann-benchmark dynamic graph-index example to use the in-repo test data instead of external Big ANN Benchmarks paths. This makes it possible to run small, self-contained dynamic/streaming benchmark runs that stay aligned with future code changes.
Changes:
- Added a streaming runbook YAML under test_data/disk_index_search/.
- Added per-step groundtruth artifacts under test_data/disk_index_search/example_runbook_gt/.
- Updated diskann-benchmark/example/graph-index-dynamic.json to use the in-repo SIFT-small-256 slice + new runbook/GT directory.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test_data/disk_index_search/example_runbook.yaml | Adds the streaming runbook (Git LFS-tracked) for the small SIFT slice. |
| test_data/disk_index_search/example_runbook_gt/step2.gt10 | Adds runbook step groundtruth (Git LFS-tracked). |
| test_data/disk_index_search/example_runbook_gt/step4.gt10 | Adds runbook step groundtruth (Git LFS-tracked). |
| test_data/disk_index_search/example_runbook_gt/step6.gt10 | Adds runbook step groundtruth (Git LFS-tracked). |
| test_data/disk_index_search/example_runbook_gt/step8.gt10 | Adds runbook step groundtruth (Git LFS-tracked). |
| diskann-benchmark/example/graph-index-dynamic.json | Switches the dynamic example to test_data/disk_index_search and wires it to the new runbook + GT directory. |
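As a rough illustration of how per-step groundtruth like the files above is typically consumed, the sketch below computes recall@k from a set of returned neighbor ids and the matching groundtruth ids. It assumes the `.gt10` suffix means the 10 exact nearest neighbors per query, which the PR does not state explicitly; the function name `recall_at_k` is hypothetical and is not part of diskann-benchmark.

```python
def recall_at_k(results, groundtruth, k=10):
    """Average fraction of the true top-k ids that appear in the returned top-k.

    `results` and `groundtruth` are lists with one id list per query.
    Assumption: each groundtruth list holds at least k exact neighbors.
    """
    if len(results) != len(groundtruth):
        raise ValueError("one result list is required per query")
    if not results:
        raise ValueError("at least one query is required")
    total = 0.0
    for found, truth in zip(results, groundtruth):
        # Order within the top-k does not matter for recall, so compare as sets.
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(results)
```

A benchmark run over the runbook would evaluate this once per search step, comparing against the groundtruth file for that step.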
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff:
| | main | #1127 | +/- |
|---|---|---|---|
| Coverage | 88.87% | 88.87% | -0.01% |
| Files | 485 | 485 | |
| Lines | 92112 | 92112 | |
| Hits | 81868 | 81865 | -3 |
| Misses | 10244 | 10247 | +3 |
  "search_directories": [
-     "../big-ann-benchmarks/data/MSTuringANNS",
-     "../big-ann-benchmarks/neurips23/runbooks"
+     "test_data/disk_index_search"
Care to wire this up to the integration tests to prevent regression?
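One lightweight form such a regression check could take is verifying that every input file the example names can still be found under its search_directories. The sketch below is only an illustration in Python: `resolve` and `missing_inputs` are hypothetical helpers, and the first-match lookup order is an assumption about how the benchmark resolves relative paths, not a description of its actual implementation.

```python
from pathlib import Path


def resolve(name, search_directories):
    """Return the first existing path for `name`, or None if it is not found.

    Assumption: relative names are tried against each search directory in order.
    """
    candidate = Path(name)
    if candidate.is_absolute():
        return candidate if candidate.exists() else None
    for directory in search_directories:
        path = Path(directory) / candidate
        if path.exists():
            return path
    return None


def missing_inputs(names, search_directories):
    """List the input names that cannot be resolved in any search directory."""
    return [name for name in names if resolve(name, search_directories) is None]
```

An integration test could then assert that `missing_inputs(...)` is empty for the file names referenced by the example, before running the benchmark itself.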
Currently we don't have a way to benchmark streaming algorithms using the existing test data. This PR adds a streaming runbook and groundtruth for the 256-point slice of SIFT that already exists in test_data. It also updates the example dynamic index in diskann-benchmark to use these files, so that the example runs correctly. This will help existing and future dynamic benchmarks stay in sync with any changes, and allows us to run small tests.
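To make the idea of per-step groundtruth concrete, here is a minimal brute-force sketch of how groundtruth for a streaming runbook can be derived: replay the inserts and deletes, and at each search step record the exact nearest neighbors among the points that are currently live. The step layout (a mapping from step number to an `operation` with half-open `start`/`end` id ranges) is an assumption modeled on the Big ANN Benchmarks runbook style and may not match example_runbook.yaml exactly; `streaming_groundtruth` is a hypothetical name.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def streaming_groundtruth(points, queries, steps, k=10):
    """Replay runbook steps and return {step_number: [top-k ids per query]}.

    Assumptions: `steps` maps step numbers to dicts with an "operation" key,
    and insert/delete steps cover the half-open id range [start, end).
    """
    active = set()
    groundtruth = {}
    for number, step in sorted(steps.items()):
        operation = step["operation"]
        if operation == "insert":
            active.update(range(step["start"], step["end"]))
        elif operation == "delete":
            active.difference_update(range(step["start"], step["end"]))
        elif operation == "search":
            # Exact search over live points only; sorting ids makes ties deterministic.
            groundtruth[number] = [
                heapq.nsmallest(k, sorted(active), key=lambda i: squared_l2(points[i], q))
                for q in queries
            ]
        else:
            raise ValueError(f"unknown operation: {operation}")
    return groundtruth


# Tiny usage example with 1-D points and a single query.
points = [[0.0], [1.0], [2.0], [3.0]]
queries = [[0.1]]
steps = {
    1: {"operation": "insert", "start": 0, "end": 4},
    2: {"operation": "search"},
    3: {"operation": "delete", "start": 0, "end": 2},
    4: {"operation": "search"},
}
example_gt = streaming_groundtruth(points, queries, steps, k=2)
```

Because deleted points must drop out of later answers, each search step needs its own groundtruth, which is consistent with the PR shipping one groundtruth file per search step.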