Spark-only BerlinMOD benchmark harness consuming the canonical suite#23

Open
estebanzimanyi wants to merge 9 commits into MobilityDB:main from estebanzimanyi:feat/berlinmod-benchmark

@estebanzimanyi

Member

Stacked on #22 (the canonical UDF library). Review #22 first: this PR's own change is the bench commit on top, and its diff reduces to the benchmark changes alone once #22 merges to main.

Adds the Q1–Q17 query set with reference expected results, the loader/runner scripts, the BerlinMODBench/BerlinMODDemo drivers, and the 3-tier index framework (Spark column-store prefilter · th3index cells · PG GiST/SP-GiST) with the NxN cross-join mitigations.
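
The idea behind the tiers, running cheap filters before the exact spatiotemporal predicate, can be sketched in plain Java. This is a hypothetical illustration, not the harness's code: Trip, its bounding-box and cell fields, and exactPredicate are placeholders, and the index-backed PG GiST/SP-GiST tier is not modelled. The pair loop is still N×N here; what the cascade cuts is how many pairs reach the expensive predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch of a tiered filter cascade for a trip-pair join.
// Trip and exactPredicate are illustrative placeholders, not MobilitySpark types.
final class TieredJoinSketch {

    /** Minimal stand-in for a trip: a 2D bounding box plus the set of cell ids it touches. */
    record Trip(String id, double xmin, double ymin, double xmax, double ymax, Set<Long> cells) {}

    /** Tier 1: bounding-box overlap, the kind of test a column-store min/max prefilter can answer. */
    static boolean boxesOverlap(Trip a, Trip b) {
        return a.xmin() <= b.xmax() && b.xmin() <= a.xmax()
            && a.ymin() <= b.ymax() && b.ymin() <= a.ymax();
    }

    /** Tier 2: the two trips must share at least one cell. */
    static boolean shareCell(Trip a, Trip b) {
        for (Long c : a.cells()) {
            if (b.cells().contains(c)) {
                return true;
            }
        }
        return false;
    }

    /** Evaluates the cheap tiers first; exactPredicate only runs on pairs that survive both. */
    static List<String> join(List<Trip> left, List<Trip> right, BiPredicate<Trip, Trip> exactPredicate) {
        List<String> out = new ArrayList<>();
        for (Trip a : left) {
            for (Trip b : right) {
                if (boxesOverlap(a, b) && shareCell(a, b) && exactPredicate.test(a, b)) {
                    out.add(a.id() + "-" + b.id());
                }
            }
        }
        return out;
    }
}
```

Because && short-circuits, a pair whose boxes do not overlap never reaches the cell check, and a pair with no shared cell never reaches exactPredicate.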

Demo/harness/data only: no change to the library surface, so the unit suite is unchanged (907/907).

The MobilitySpark base UDF library (temporal, geo, boxes, time and set
surfaces plus the base infrastructure), bound to the canonical single-source
functions.GeneratedFunctions surface (the MEOS-API / meos-idl.json codegen
output), bundled as libs/JMEOS-1.4.jar regenerated against the ecosystem pin,
with lib/libmeos.so built from the pin (CBUFFER/NPOINT/POSE/RGEO + H3).

Every UDF binds the generated surface directly; the legacy hand-rolled
functions.functions facade is retired. The pg_-prefixed PG-compat I/O
(pg_interval_in/out, pg_timestamptz_in) is preserved to disambiguate these
functions from the PostgreSQL built-ins of the same name.

CI builds libmeos from the ecosystem pin on Linux/macOS (with H3, and an
in-source build dir so pgtypes/postgres.h's relative ../../meos/include
resolves); Windows is non-blocking. Full unit suite green (907/907).

The th3index, sibling (cbuffer/npoint/pose/rgeo) and portable-operator families
stack on this foundation as separate changes; the BerlinMOD benchmark builds on
the full surface.

The temporal H3-index (th3index) family for MobilitySpark, stacked on the
foundation library: the h3index/th3index UDFs (Th3IndexUDFs), the H3 cell
prefilter (Th3IndexPrefilterUDFs) and the JNR bindings, registered in the
session.
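
One way a cell prefilter avoids enumerating every pair is to treat shared cells as an equi-join key. A minimal sketch in plain Java, assuming each trip has already been reduced to a set of cell ids (plain longs here; computing real H3 cells is out of scope). All names are hypothetical, and this is not the Th3IndexPrefilterUDFs implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: candidate pairs from shared cell ids via an inverted index,
// instead of testing every (left, right) combination.
final class CellCandidateSketch {

    /** Returns "leftId|rightId" for every pair of trips that share at least one cell. */
    static Set<String> candidatePairs(Map<String, Set<Long>> left, Map<String, Set<Long>> right) {
        // Build phase: cell id -> ids of right-side trips that touch that cell.
        Map<Long, List<String>> index = new HashMap<>();
        for (Map.Entry<String, Set<Long>> e : right.entrySet()) {
            for (Long cell : e.getValue()) {
                index.computeIfAbsent(cell, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Probe phase: only right-side trips found under one of the left trip's cells are emitted.
        Set<String> pairs = new TreeSet<>(); // sorted and de-duplicated (a pair may share several cells)
        for (Map.Entry<String, Set<Long>> e : left.entrySet()) {
            for (Long cell : e.getValue()) {
                for (String r : index.getOrDefault(cell, List.of())) {
                    pairs.add(e.getKey() + "|" + r);
                }
            }
        }
        return pairs;
    }
}
```

In Spark the same shape would be an explode of the cell set on each side, an equi-join on the cell column and a distinct over the pair ids. Surviving pairs still need the exact predicate, because sharing a cell only makes a pair a candidate.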

th3index is backed by libmeos built with -DH3=ON (the standalone library wires
in the h3 object library at the current ecosystem pin); the UDFs bind the
generated functions.GeneratedFunctions surface (th3index_*, h3index_in/out,
with parse/to_string canonicalized to in/out). No change to the library
surface, so the unit suite is unchanged (907/907).

The four sibling temporal families for MobilitySpark, stacked on the th3index
change: CbufferUDFs, NpointUDFs, PoseUDFs and RgeoUDFs, each binding the
generated functions.GeneratedFunctions surface for its family and registered
in the session.

No change to the foundation surface, so the unit suite is unchanged (907/907).

The 29 canonical bare-name operator UDFs (PortableOperatorAliasUDFs) for
MobilitySpark, stacked on the sibling families and registered last so the
bare names are the authoritative spelling across the portable dialect.

Each bare name reuses its operator's own backing on the generated
functions.GeneratedFunctions surface. No change to the underlying surface, so
the unit suite is unchanged (907/907).
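
Registering last makes the bare names authoritative on the assumption that a later registration under an existing name replaces the earlier binding rather than failing, as Spark's session UDF registration does. A toy registry with that last-wins rule; the class, its methods and the TemporalUDFs layer name used in the example are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry illustrating "registered last wins"; not MobilitySpark's API.
final class LastWinsRegistrySketch {
    private final Map<String, Function<String, String>> udfs = new LinkedHashMap<>();
    private final Map<String, String> owner = new LinkedHashMap<>();

    /** Binds name to fn, silently replacing any earlier binding, and records which layer registered it. */
    void register(String layer, String name, Function<String, String> fn) {
        udfs.put(name, fn);
        owner.put(name, layer);
    }

    /** Layer whose registration is currently in effect for name. */
    String ownerOf(String name) {
        return owner.get(name);
    }

    String call(String name, String arg) {
        Function<String, String> fn = udfs.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown UDF: " + name);
        }
        return fn.apply(arg);
    }
}
```

If a hand layer registers eContains and PortableOperatorAliasUDFs registers it afterwards, every later call to eContains resolves to the alias binding, which is why registration order matters.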

estebanzimanyi force-pushed the feat/berlinmod-benchmark branch 2 times, most recently from 09962c2 to 3c2223c (June 12, 2026 20:26)
estebanzimanyi added a commit to estebanzimanyi/MobilitySpark that referenced this pull request Jun 12, 2026
Folds the MobilityDB#23 canonical-suite update into the integration/benchmark
evidence so that the evidence matches the deliverable stack (canonical q*.sql,
eContains/round, no Spark-local SQL variants or preprocessing).

GeneratedSpatioTemporalUDFs (emitted by tools/codegen_spark_udfs.py from the
MEOS-API catalog) provides the runtime type-dispatching overlaps / stbox(geom,time)
/ timeSpan UDFs, registered LAST in create() so they are authoritative over the
single-type hand registrations. The dispatch classifies each String arg by its MEOS
type (text wire form for spans/stboxes/geometries, hex only for temporals) and routes
to the catalog-determined backing -- closing the bench's overlaps/stbox gaps with
generated, serialization-safe code (no hand UDFs, no MEOS-API growth).
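
The classification step can be sketched as follows. The rules are assumptions drawn from the wire forms named above (STBOX text, bracketed span text, geometry text, hex only for temporals), and the overlaps_<kind>_<kind> backing names are hypothetical placeholders for the catalog-determined functions:

```java
import java.util.Locale;

// Hypothetical sketch of runtime dispatch on String arguments; the rules and
// backing names are illustrative assumptions, not the generated code.
final class OverlapsDispatchSketch {

    enum Kind { SPAN, STBOX, TEMPORAL_HEX, GEOMETRY }

    private static boolean isHexDigit(int c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static Kind classify(String arg) {
        String s = arg.trim();
        if (s.toUpperCase(Locale.ROOT).startsWith("STBOX")) {
            return Kind.STBOX;
        }
        if (s.startsWith("[") || s.startsWith("(")) {
            return Kind.SPAN;
        }
        // Temporals travel as hex: a non-empty, even-length run of hex digits only.
        if (!s.isEmpty() && s.length() % 2 == 0 && s.chars().allMatch(OverlapsDispatchSketch::isHexDigit)) {
            return Kind.TEMPORAL_HEX;
        }
        return Kind.GEOMETRY; // fallback, e.g. WKT such as "POINT(1 1)"
    }

    /** Picks a (hypothetical) backing function name from the kinds of both arguments. */
    static String overlapsBacking(String a, String b) {
        return "overlaps_" + classify(a).name().toLowerCase(Locale.ROOT)
             + "_" + classify(b).name().toLowerCase(Locale.ROOT);
    }
}
```

Because only temporals travel as hex, an all-hex string can be routed without parsing it; geometry is the fallback once the other forms are ruled out.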

estebanzimanyi force-pushed the feat/berlinmod-benchmark branch 3 times, most recently from 873be2e to ecdc2ae (June 13, 2026 03:38)
Times the canonical BerlinMOD suite (vendored as the berlinmod/suite git
submodule, the single source shared with MobilityDB and MobilityDuck) on
Apache Spark. bench_mspark.sh reads every query from the submodule; there
is no Spark-specific query variant and no SQL rewriting.
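
bench_mspark.sh itself is a shell script; the same read-unmodified-and-time loop is sketched below in Java for illustration only. It assumes query files named qN.sql directly under the suite directory (the q*.sql pattern comes from this PR; the exact naming is an assumption) and takes the engine call as a plain Consumer<String>, a hypothetical stand-in for submitting SQL to Spark. Sorting by the numeric suffix keeps q2 ahead of q10, which a plain lexicographic sort would not:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch of a suite runner; not bench_mspark.sh.
final class SuiteRunnerSketch {
    private static final Pattern QUERY_FILE = Pattern.compile("q(\\d+)\\.sql");

    /** Lists qN.sql files in numeric order, ignoring anything else in the directory. */
    static List<Path> listQueries(Path suiteDir) throws IOException {
        List<Path> out = new ArrayList<>();
        try (Stream<Path> files = Files.list(suiteDir)) {
            files.filter(p -> QUERY_FILE.matcher(p.getFileName().toString()).matches())
                 .forEach(out::add);
        }
        out.sort(Comparator.comparingInt(SuiteRunnerSketch::queryNumber));
        return out;
    }

    static int queryNumber(Path p) {
        Matcher m = QUERY_FILE.matcher(p.getFileName().toString());
        if (!m.matches()) {
            throw new IllegalArgumentException(p.toString());
        }
        return Integer.parseInt(m.group(1));
    }

    /** Passes each query text through unmodified and records its wall time in milliseconds. */
    static Map<String, Double> timeAll(Path suiteDir, Consumer<String> runner) throws IOException {
        Map<String, Double> timings = new LinkedHashMap<>();
        for (Path q : listQueries(suiteDir)) {
            String sql = Files.readString(q);
            long start = System.nanoTime();
            runner.accept(sql);
            timings.put(q.getFileName().toString(), (System.nanoTime() - start) / 1e6);
        }
        return timings;
    }
}
```

With Spark, the stand-in runner would need to force execution (for example by collecting or counting the result), since a query DataFrame is evaluated lazily and building it alone measures almost nothing.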

MobilitySpark carries only its own benchmark: the cross-tool loaders and
runners (load_mbdb/mduck.sql, run/bench_mbdb/mduck.sh, the 3-engine
bench.sh orchestrator, MOBILITYDB-FINDINGS.md) are removed. Cross-engine
comparison is assembled offline by merging each tool's results JSON
(report.py / chart.py).
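
The offline merge amounts to pivoting each tool's results from engine → query → time into query → engine → time. A sketch in Java, assuming the results JSON has already been parsed into nested maps (report.py works on the JSON files themselves, and their schema is not shown here); the class name is hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: pivot per-engine results into one comparison table.
final class ResultsMergeSketch {

    /** byEngine: engine -> (query -> time); returns query -> (engine -> time), sorted by key. */
    static Map<String, Map<String, Double>> merge(Map<String, Map<String, Double>> byEngine) {
        Map<String, Map<String, Double>> byQuery = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> engine : byEngine.entrySet()) {
            for (Map.Entry<String, Double> q : engine.getValue().entrySet()) {
                byQuery.computeIfAbsent(q.getKey(), k -> new TreeMap<>())
                       .put(engine.getKey(), q.getValue());
            }
        }
        return byQuery;
    }
}
```

A query one engine did not run simply has no entry for that engine, rather than a zero that a chart would misreport as a fast run.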

estebanzimanyi force-pushed the feat/berlinmod-benchmark branch from ecdc2ae to 9109d58 (June 13, 2026 03:41)
estebanzimanyi changed the title from "BerlinMOD benchmark harness and 3-tier index framework" to "Spark-only BerlinMOD benchmark harness consuming the canonical suite" (Jun 13, 2026)

Replace the 4-UDF GeneratedSpatioTemporalUDFs placeholder with the full
catalog-driven surface emitted by tools/codegen_spark_udfs.py: 2192 1:1 UDFs
organized into one class per MEOS doxygen @ingroup module
(GeneratedUdfs_<group>), excluding meos_internal_*. registerAll() is already
invoked from MobilitySparkSession, so the generated UDFs register alongside the
hand layer; compiles green. This is the surface that will let the hand-written
*UDFs.java classes be retired once it is proven, query by query, to be a
superset of them.

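
The grouping rule, one generated class per @ingroup module with meos_internal_* left out, can be sketched as below. This is an illustration, not tools/codegen_spark_udfs.py: the Entry record is hypothetical, and applying the meos_internal_ filter to the group name (rather than to function names) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the codegen grouping step.
final class CodegenGroupingSketch {

    /** One catalog row: a MEOS function and the doxygen group it belongs to. */
    record Entry(String function, String group) {}

    /** Maps each generated class name (GeneratedUdfs_<group>) to the functions it will wrap. */
    static Map<String, List<String>> classesByGroup(List<Entry> catalog) {
        Map<String, List<String>> classes = new TreeMap<>();
        for (Entry e : catalog) {
            if (e.group().startsWith("meos_internal_")) {
                continue; // internal groups are excluded from the Spark surface
            }
            classes.computeIfAbsent("GeneratedUdfs_" + e.group(), k -> new ArrayList<>())
                   .add(e.function());
        }
        return classes;
    }
}
```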
Pick up the generator's dispatch pass (MobilityDB#28): adds GeneratedUdfs_portable_comparison
with the 18 contract comparison bare names (everEq..everGe / alwaysEq..alwaysGe /
tempEq..tempGe) wrapping the MEOS superclass *_temporal_temporal entrypoints.
Compiles green; the bare names register alongside the hand PortableOperatorAliasUDFs.
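
The count of 18 follows from three families (ever, always, temp) times six comparisons, assuming the ranges everEq..everGe and so on span Eq, Ne, Lt, Le, Gt, Ge. A sketch that enumerates the bare names; the mapped MEOS entry-point names follow an assumed ever_eq_temporal_temporal / teq_temporal_temporal pattern and should be checked against the catalog:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the comparison bare names and the entry points they wrap.
final class ComparisonNamesSketch {
    private static final List<String> OPS = List.of("Eq", "Ne", "Lt", "Le", "Gt", "Ge");

    /** Bare name (e.g. everEq) -> assumed MEOS *_temporal_temporal entry point. */
    static Map<String, String> bareNameToEntryPoint() {
        Map<String, String> m = new LinkedHashMap<>();
        for (String op : OPS) {
            String lower = op.toLowerCase(Locale.ROOT);
            m.put("ever" + op, "ever_" + lower + "_temporal_temporal");
            m.put("always" + op, "always_" + lower + "_temporal_temporal");
            m.put("temp" + op, "t" + lower + "_temporal_temporal");
        }
        return m;
    }
}
```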