bench: compare pyarrow and native arrow scans#7
Draft
abnobdoss wants to merge 1 commit into
Stack position: Python PR after #6 (ABA-160).
Expands the manual benchmark harness for comparing PyArrow and opt-in Rust-backed Arrow scan paths.
What changed:
Validation:
uv run prek run --files dev/bench_arrow_scan.py
--refresh --rows 10000 --files 20 --delete-rows 5000 --delete-files 10 --runs 1 --warmups 0 --table-prefix bench_native_scan_smoke --s3-endpoint http://localhost:19000
uv run python dev/bench_arrow_scan.py --skip-provision --runs 1 --warmups 0 --s3-endpoint http://localhost:19000 --json-out /tmp/native_scan_bench_stress.json --markdown-out /tmp/native_scan_bench_stress.md
Headline full-stress results:
Caveat: memory metrics are for the Python benchmark/client process only, not Spark/REST/MinIO containers.
Max RSS includes interpreter/import peaks; Peak RSS is parent-sampled during the measured scan and is the more useful comparison column.
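To illustrate why the two columns differ, here is a minimal sketch and not the harness's actual code. It assumes Max RSS comes from a lifetime high-water mark such as `resource.getrusage` and that Peak RSS is the maximum of periodic samples taken only while the scan runs. `PeakSampler`, `max_rss_bytes`, and the `read_rss` callable are hypothetical names introduced for illustration; in a real parent-sampled setup `read_rss` would report the RSS of the process performing the scan.

```python
import resource
import sys
import threading
from typing import Callable, Optional


def max_rss_bytes() -> int:
    """Lifetime high-water mark for this process (Unix only).

    ru_maxrss never resets, so it also reflects interpreter start-up and
    import-time peaks. It is reported in KiB on Linux and bytes on macOS.
    """
    raw = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return raw if sys.platform == "darwin" else raw * 1024


class PeakSampler:
    """Hypothetical helper: track the largest sampled RSS inside a `with` block."""

    def __init__(self, read_rss: Callable[[], int], interval_s: float = 0.01) -> None:
        self._read_rss = read_rss        # assumption: returns current RSS in bytes
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.peak = 0

    def _loop(self) -> None:
        # Sample until asked to stop; Event.wait doubles as an interruptible sleep.
        while not self._stop.is_set():
            self.peak = max(self.peak, self._read_rss())
            self._stop.wait(self._interval_s)

    def __enter__(self) -> "PeakSampler":
        self.peak = self._read_rss()     # baseline sample before the scan starts
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self.peak = max(self.peak, self._read_rss())  # final sample after the scan
```

Because sampling starts and stops around the measured block, `PeakSampler.peak` excludes earlier allocations that have since been released, while `max_rss_bytes()` cannot. Sampling can miss spikes shorter than `interval_s`, so the sampled peak is a lower bound on the true peak within the window.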