Spark-only BerlinMOD benchmark harness consuming the canonical suite #23
Open
estebanzimanyi wants to merge 9 commits into
b0bc798 to 4d7a5e8
This was referenced Jun 12, 2026
4d7a5e8 to 27beef9
The MobilitySpark base UDF library (temporal, geo, boxes, time and set surfaces plus the base infrastructure), bound to the canonical single-source functions.GeneratedFunctions surface (the MEOS-API / meos-idl.json codegen output). It is bundled as libs/JMEOS-1.4.jar, regenerated against the ecosystem pin, with lib/libmeos.so built from the same pin (CBUFFER/NPOINT/POSE/RGEO + H3). Every UDF binds the generated surface directly; the legacy hand-rolled functions.functions facade is retired. The pg_-prefixed PG-compat I/O functions (pg_interval_in/out, pg_timestamptz_in) are kept so they stay distinct from the PostgreSQL built-ins of the same name. CI builds libmeos from the ecosystem pin on Linux and macOS (with H3, and with an in-source build directory so that the relative include path ../../meos/include in pgtypes/postgres.h resolves); the Windows job is non-blocking. The full unit suite is green (907/907). The th3index, sibling (cbuffer/npoint/pose/rgeo) and portable-operator families are stacked on this foundation as separate changes; the BerlinMOD benchmark builds on the full surface.
The temporal H3-index (th3index) family for MobilitySpark, stacked on the foundation library: the h3index/th3index UDFs (Th3IndexUDFs), the H3 cell prefilter (Th3IndexPrefilterUDFs) and the JNR bindings, all registered in the session. th3index is backed by libmeos built with -DH3=ON (the standalone library wires in the h3 object library at the current ecosystem pin); the UDFs bind the generated functions.GeneratedFunctions surface (th3index_*, h3index_in/out, with parse/to_string canonicalized to in/out). The library surface is unchanged, so the unit suite is unchanged (907/907).
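To illustrate what a cell prefilter buys, here is a minimal Java sketch. It is not the PR's Th3IndexPrefilterUDFs: the class H3CellPrefilter and its methods are hypothetical, and it assumes each trajectory comes with the set of 64-bit H3 cell ids, all at one resolution, that contains every cell the trajectory touches. Under that assumption, disjoint cell sets rule out an intersection, so the exact predicate only runs for pairs that share a cell.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical H3 cell prefilter (illustration only, not Th3IndexPrefilterUDFs).
 * Assumes each cell set holds every H3 cell, at one fixed resolution, that its
 * trajectory touches; disjoint sets then mean the trajectories cannot intersect.
 */
final class H3CellPrefilter {
    private H3CellPrefilter() {}

    /** True if the pair is a candidate, i.e. the two cell sets share at least one cell. */
    static boolean mayIntersect(Set<Long> cellsA, Set<Long> cellsB) {
        return !Collections.disjoint(cellsA, cellsB);
    }

    /** Runs the expensive exact check only for pairs that survive the prefilter. */
    static boolean intersects(Set<Long> cellsA, Set<Long> cellsB, BooleanSupplier exactCheck) {
        // && short-circuits, so exactCheck is never evaluated for disjoint cell sets.
        return mayIntersect(cellsA, cellsB) && exactCheck.getAsBoolean();
    }
}
```

Collections.disjoint does the set test; the exact check is passed as a BooleanSupplier so it is evaluated lazily, only when the prefilter lets the pair through.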
The four sibling temporal families for MobilitySpark, stacked on the th3index change: CbufferUDFs, NpointUDFs, PoseUDFs and RgeoUDFs, each binding the generated functions.GeneratedFunctions surface for its family and registered in the session. No change to the foundation surface, so the unit suite is unchanged (907/907).
The 29 canonical bare-name operator UDFs (PortableOperatorAliasUDFs) for MobilitySpark, stacked on the sibling families and registered last so the bare names are the authoritative spelling across the portable dialect. Each bare name reuses its operator's own backing on the generated functions.GeneratedFunctions surface. No change to the underlying surface, so the unit suite is unchanged (907/907).
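Registering the aliases last makes them authoritative only if re-registering a name replaces the earlier binding. A minimal sketch of that rule, using a hypothetical LastWinsRegistry rather than Spark's own function registry:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical name -> implementation registry (illustration only).
 * Re-registering a name replaces the earlier binding, so whichever
 * family registers a given name last is the one that gets called.
 */
final class LastWinsRegistry {
    private final Map<String, UnaryOperator<String>> byName = new HashMap<>();

    void register(String name, UnaryOperator<String> impl) {
        byName.put(name, impl); // put() overwrites any earlier binding for this name
    }

    String call(String name, String arg) {
        UnaryOperator<String> impl = byName.get(name);
        if (impl == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return impl.apply(arg);
    }
}
```

Registering a family binding and then an alias under the same name leaves only the alias reachable; reversing the order would leave the family binding in place.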
09962c2 to 3c2223c
estebanzimanyi added a commit to estebanzimanyi/MobilitySpark that referenced this pull request Jun 12, 2026
Folds the MobilityDB#23 canonical-suite update into the integration/benchmark evidence so that the evidence is exactly the deliverable stack (canonical q*.sql, eContains/round, no Spark-local SQL variants or preprocessing).
GeneratedSpatioTemporalUDFs (emitted by tools/codegen_spark_udfs.py from the MEOS-API catalog) provides the runtime type-dispatching overlaps, stbox(geom, time) and timeSpan UDFs, registered LAST in create() so they take precedence over the single-type hand registrations. The dispatch classifies each String argument by its MEOS type (text wire form for spans, stboxes and geometries; hex only for temporals) and routes it to the catalog-determined backing. This closes the bench's overlaps/stbox gaps with generated, serialization-safe code (no hand-written UDFs, no MEOS-API growth).
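The dispatch rule can be sketched as follows. The string heuristics (an even-length hex run means a temporal, a leading [ or ( means a span, an STBOX keyword after an optional SRID=n; prefix means an stbox, anything else is taken as geometry text) and the backing-name scheme op_kind_kind are assumptions for illustration; the generated GeneratedSpatioTemporalUDFs route to backings determined from the MEOS-API catalog instead.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** MEOS value kinds this sketch distinguishes (hypothetical subset). */
enum MeosKind { TEMPORAL, SPAN, STBOX, GEOMETRY }

/**
 * Hypothetical wire-form dispatch: temporals arrive hex-encoded, while spans,
 * stboxes and geometries arrive in their text form.
 */
final class WireFormDispatch {
    // Even-length run of hex digits: treated as a hex-encoded temporal.
    private static final Pattern HEX = Pattern.compile("(?:[0-9A-Fa-f]{2})+");
    // Optional EWKT-style SRID prefix, e.g. "SRID=4326;".
    private static final Pattern SRID_PREFIX =
        Pattern.compile("^SRID=\\d+;", Pattern.CASE_INSENSITIVE);

    private WireFormDispatch() {}

    static MeosKind classify(String arg) {
        String s = arg.trim();
        if (HEX.matcher(s).matches()) {
            return MeosKind.TEMPORAL;
        }
        if (s.startsWith("[") || s.startsWith("(")) {
            return MeosKind.SPAN; // e.g. "[1, 5)"
        }
        String body = SRID_PREFIX.matcher(s).replaceFirst("");
        if (body.toUpperCase(Locale.ROOT).startsWith("STBOX")) {
            return MeosKind.STBOX;
        }
        return MeosKind.GEOMETRY; // fallback: WKT/EWKT such as "POINT(1 1)"
    }

    /** Builds a hypothetical backing name such as "overlaps_span_span". */
    static String backingFor(String op, String a, String b) {
        return op + "_" + classify(a).name().toLowerCase(Locale.ROOT)
                  + "_" + classify(b).name().toLowerCase(Locale.ROOT);
    }
}
```

Keeping temporals in hex and everything else in text is what lets a plain string inspection tell the kinds apart without parsing each value first.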
873be2e to ecdc2ae
Times the canonical BerlinMOD suite (vendored as the berlinmod/suite git submodule — the single source shared with MobilityDB and MobilityDuck) on Apache Spark. bench_mspark.sh reads every query from the submodule; there is no Spark-specific query variant and no SQL rewriting. MobilitySpark carries only its own benchmark: the cross-tool loaders and runners (load_mbdb/mduck.sql, run/bench_mbdb/mduck.sh, the 3-engine bench.sh orchestrator, MOBILITYDB-FINDINGS.md) are removed. Cross-engine comparison is assembled offline by merging each tool's results JSON (report.py / chart.py).
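A Java sketch of "read every query from the submodule, no rewriting" follows. The real harness is the shell script bench_mspark.sh; SuiteRunner, the qN.sql file-name pattern and the suite directory passed in are assumptions. Files are ordered by their numeric suffix so q10 runs after q9, and each file's text is handed unchanged to an executor callback.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical runner: times every qN.sql file in a suite directory, verbatim. */
final class SuiteRunner {
    private static final Pattern QUERY_FILE = Pattern.compile("q(\\d+)\\.sql");

    private SuiteRunner() {}

    /** Lists qN.sql files in numeric order (q2 before q10); other files are ignored. */
    static List<Path> queryFiles(Path suiteDir) throws IOException {
        try (Stream<Path> files = Files.list(suiteDir)) {
            return files
                .filter(p -> QUERY_FILE.matcher(p.getFileName().toString()).matches())
                .sorted(Comparator.comparingInt(SuiteRunner::queryNumber))
                .collect(Collectors.toList());
        }
    }

    private static int queryNumber(Path p) {
        Matcher m = QUERY_FILE.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a query file: " + p);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Passes each query text unchanged to execute and records wall-clock milliseconds. */
    static Map<String, Long> timeAll(Path suiteDir, Consumer<String> execute) throws IOException {
        Map<String, Long> millis = new LinkedHashMap<>();
        for (Path q : queryFiles(suiteDir)) {
            String sql = Files.readString(q); // no rewriting: the file text is used as-is
            long start = System.nanoTime();
            execute.accept(sql);
            millis.put(q.getFileName().toString(), (System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }
}
```

In a Spark driver the callback could be something like sql -> spark.sql(sql).count(), assuming each file holds a single statement; the per-query timings are the kind of per-tool output that would then be written out and merged offline.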
ecdc2ae to 9109d58
Replace the 4-UDF GeneratedSpatioTemporalUDFs placeholder with the full catalog-driven surface emitted by tools/codegen_spark_udfs.py: 2192 UDFs, mapped 1:1 onto catalog functions and organized into one class per MEOS doxygen @ingroup module (GeneratedUdfs_<group>), excluding meos_internal_*. registerAll() is already invoked from MobilitySparkSession, so the generated UDFs register alongside the hand-written layer; the build compiles green. This is the surface that will allow the hand-written *UDFs.java classes to be retired, once it is proven, query by query, to be a superset of them.
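The one-class-per-group layout can be sketched as a grouping step over the catalog. UdfGrouping, its Entry type and the input shape (function name plus doxygen group) are hypothetical, and the sketch assumes the meos_internal_* exclusion applies to group names; tools/codegen_spark_udfs.py is the actual generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical grouping of catalog functions into GeneratedUdfs_<group> classes. */
final class UdfGrouping {
    /** One catalog entry: a MEOS function name and its doxygen group (assumed shape). */
    static final class Entry {
        final String function;
        final String group;

        Entry(String function, String group) {
            this.function = function;
            this.group = group;
        }
    }

    private UdfGrouping() {}

    /** Maps each generated class name to its functions, skipping meos_internal* groups. */
    static Map<String, List<String>> byGeneratedClass(List<Entry> catalog) {
        Map<String, List<String>> classes = new TreeMap<>(); // sorted by class name
        for (Entry e : catalog) {
            if (e.group.startsWith("meos_internal")) {
                continue; // internal API is not exposed as UDFs
            }
            classes.computeIfAbsent("GeneratedUdfs_" + e.group, k -> new ArrayList<>())
                   .add(e.function); // catalog order is preserved within a class
        }
        return classes;
    }
}
```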
Pick up the generator's dispatch pass (MobilityDB#28): adds GeneratedUdfs_portable_comparison with the 18 contract comparison bare names (everEq..everGe / alwaysEq..alwaysGe / tempEq..tempGe) wrapping the MEOS superclass *_temporal_temporal entrypoints. Compiles green; the bare names register alongside the hand PortableOperatorAliasUDFs.
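The 18 bare names are the cross product of three quantifiers (ever, always, temp) and six comparisons (Eq, Ne, Lt, Le, Gt, Ge). The sketch below enumerates them; the backing names it pairs them with (ever_eq_temporal_temporal, always_eq_temporal_temporal, teq_temporal_temporal and so on) follow an assumed naming pattern for the *_temporal_temporal entrypoints and should be checked against the MEOS catalog.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical enumeration of the 18 bare comparison names and assumed backings. */
final class ComparisonAliases {
    private static final String[] OPS = {"eq", "ne", "lt", "le", "gt", "ge"};

    private ComparisonAliases() {}

    /** Bare name -> assumed MEOS *_temporal_temporal entrypoint, in a stable order. */
    static Map<String, String> aliases() {
        Map<String, String> m = new LinkedHashMap<>();
        for (String op : OPS) m.put("ever" + cap(op), "ever_" + op + "_temporal_temporal");
        for (String op : OPS) m.put("always" + cap(op), "always_" + op + "_temporal_temporal");
        for (String op : OPS) m.put("temp" + cap(op), "t" + op + "_temporal_temporal");
        return m;
    }

    /** "eq" -> "Eq" */
    private static String cap(String op) {
        return Character.toUpperCase(op.charAt(0)) + op.substring(1);
    }
}
```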
Stacked on #22 (the canonical UDF library). Review #22 first: this PR's own change is the bench commit on top, and its diff reduces to the bench changes alone once #22 merges to main.
Adds the Q1–Q17 query set with reference expected results, the loader/runner scripts, the BerlinMODBench/BerlinMODDemo drivers, and the 3-tier index framework (Spark column-store prefilter · th3index cells · PG GiST/SP-GiST) with the NxN cross-join mitigations.
Demo/harness/data only: no change to the library surface, so the unit suite is unchanged (907/907).
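The first tier, a column-store prefilter, can be pictured as cheap interval comparisons on precomputed bounding-box columns that gate the expensive exact predicate in an NxN join. The Row fields (xmin..ymax, tmin/tmax), BoxPrefilterJoin and the nested-loop join are hypothetical; in Spark the same inequalities would sit in the join condition beside the UDF call, so they can be evaluated on plain columns first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical bounding-box prefilter in front of an expensive pairwise predicate. */
final class BoxPrefilterJoin {
    /** One row with assumed precomputed bounding-box columns. */
    static final class Row {
        final int id;
        final double xmin, ymin, xmax, ymax;
        final long tmin, tmax;

        Row(int id, double xmin, double ymin, double xmax, double ymax, long tmin, long tmax) {
            this.id = id;
            this.xmin = xmin; this.ymin = ymin;
            this.xmax = xmax; this.ymax = ymax;
            this.tmin = tmin; this.tmax = tmax;
        }
    }

    private BoxPrefilterJoin() {}

    /** Closed-interval overlap on x, y and t: cheap, column-only comparisons. */
    static boolean boxesOverlap(Row a, Row b) {
        return a.xmin <= b.xmax && b.xmin <= a.xmax
            && a.ymin <= b.ymax && b.ymin <= a.ymax
            && a.tmin <= b.tmax && b.tmin <= a.tmax;
    }

    /** NxN join that calls the exact predicate only on box-overlapping pairs. */
    static List<int[]> join(List<Row> left, List<Row> right, BiPredicate<Row, Row> exact) {
        List<int[]> matches = new ArrayList<>();
        for (Row a : left) {
            for (Row b : right) {
                // && short-circuits: exact.test runs only when the boxes overlap.
                if (boxesOverlap(a, b) && exact.test(a, b)) {
                    matches.add(new int[] {a.id, b.id});
                }
            }
        }
        return matches;
    }
}
```

Pair enumeration is still NxN here; what the prefilter saves is the number of exact-predicate calls, which are typically the expensive part.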