Integration: BerlinMOD benchmark branch — 907/907 green on the b183b12 MEOS + th3index/h3 surface#16

Open
estebanzimanyi wants to merge 182 commits into MobilityDB:main from estebanzimanyi:integration/berlinmod-bench

@estebanzimanyi

Member

What this is

The accumulated BerlinMOD benchmark integration branch — it consumes the
currently-open feature PRs so the full Spark UDF surface can be exercised and
benchmarked together against a single MEOS pin. It is an integration/visibility
branch; the individual PRs below remain the unit of review.

Consumes (open): #5, #7, #8, #9, #10, #11, #12, #13, #14, #15.

State

Builds and passes the full suite — 907 tests, 0 failures, 0 errors — against
a JMEOS-1.4.jar regenerated from the estebanzimanyi/MobilityDB
fix/meos-pg-symbol-collision-plus-h3 MEOS API (th3index + static-geometry
h3_geo surface), with libmeos built H3=ON.

The h3 prefilter UDFs (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index)
are bound directly to libmeos via H3IndexJnrBindings; the rest go through the
regenerated functions facade.

Reconciliation carried here

The regenerated jar corrects several MEOS C-type resolutions the prior jar got
wrong: H3Index/int64 -> long, bool -> boolean (including subdir-header
functions), and an 8-byte TimestampTz * out-param that was under-allocated to
4 bytes — the cause of the stbox/tbox tmin/tmax and timestampN errors and
of the inflated native-heap reading in the leak test. UDF call sites were
reconciled to the corrected signatures (interpType enum, direct h3 binding,
acontains_geo_tgeo/tspatial_transform_pipeline renames, a no-exit MEOS error
handler, and out-param/array marshalling). One test was corrected: tnumber_trend
on a step-interpolated tint correctly returns null.
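
The effect of the under-sized out-param can be illustrated outside the JVM. The sketch below is not the JMEOS binding code: it assumes only that a TimestampTz is a signed 64-bit integer, and it uses Python's `struct` module to show what survives when 4 of the 8 bytes are read back. The sample value and the helper name `read_out_param` are hypothetical.

```python
import struct

def read_out_param(buffer: bytes, width: int) -> int:
    """Decode a little-endian signed integer from the first `width` bytes."""
    fmt = {4: "<i", 8: "<q"}[width]
    return struct.unpack(fmt, buffer[:width])[0]

# Arbitrary 64-bit value standing in for a TimestampTz (hypothetical sample).
written = 833_000_000_000_000
native_bytes = struct.pack("<q", written)  # the native side writes 8 bytes

full = read_out_param(native_bytes, 8)       # correctly sized slot
truncated = read_out_param(native_bytes, 4)  # 4-byte slot: upper half is lost

print(full == written)       # the 8-byte read round-trips
print(truncated == written)  # the 4-byte read does not
```

In a real binding the native write would also run past the end of the 4-byte allocation, which this sketch does not reproduce.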

Luis Alfredo Leon Villapun and others added 30 commits August 7, 2023 12:05
- trip_h3 called tgeompoint_to_th3index(), which is the internal MEOS C
  name, not a SQL function. The SQL converter is h3_latlng_to_cell, and
  its tgeompoint overload requires lon/lat, so reproject with
  transform(..., 4326) (Trip is stored in EPSG:3857).
- Add the geomWKT text column selected by q11/q12/q15; the loader
  previously created only geom, so those queries errored on a missing
  column.

The portable query files referenced functions this MobilityDB build
exposes under different names, so the queries errored at run time:

- asHexWKB(tgeompoint)              -> asHexEWKB (no plain-WKB output
                                       for tgeompoint in this build)
- everEq{H3Index,Th3Index}Th3Index -> ever_eq
- geomToH3Cell(geom, r)            -> geoToH3Cell(ST_Transform(geom,4326), r)
                                       (geoToH3Cell needs lon/lat; query
                                       points are EPSG:3857)
- minDistance(tgeompoint, tgeompoint)
                                   -> ST_Distance(trajectory(t1), trajectory(t2))
                                       (spatial minimum ignoring time,
                                       distinct from the time-synchronous nad)

With these changes, all 18 queries run on MobilityDB. The cross-platform aim is
to expose these as portable aliases in MEOS so the shared queries can
revert to the single portable dialect.
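
The name-for-name renames in the list above can be written down as a small lookup table. This is a hypothetical illustration only, not part of the branch, and it is not the alias mechanism proposed for MEOS: the helper `to_build_dialect` and the sample SQL are made up, and the two cases that rewrite arguments (geomToH3Cell, minDistance) are left out on purpose.

```python
import re

# Name-for-name renames taken from the list above.
RENAMES = {
    "asHexWKB": "asHexEWKB",
    "everEqH3IndexTh3Index": "ever_eq",
    "everEqTh3IndexTh3Index": "ever_eq",
}

# Match a whole identifier that is immediately used as a function call.
_CALL = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\s*\(")

def to_build_dialect(sql: str) -> str:
    """Rewrite portable function names to the names this build exposes."""
    return _CALL.sub(lambda m: RENAMES[m.group(1)] + "(", sql)

# Hypothetical query text; table and column names are placeholders.
print(to_build_dialect("SELECT asHexWKB(trip) FROM trips"))
```

Anchoring the pattern on a word boundary and an opening parenthesis keeps already-converted names such as asHexEWKB untouched, so the rewrite can be applied more than once without changing the result.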

Committed snapshot of the honest MobilityDB-only run (all 18 queries
verified to execute under ON_ERROR_STOP; MobilityDB 1.4.0 on PG18,
1620 trips, median of 3 runs). results/ is normally gitignored; this
is a deliberate reference snapshot, so drop it if you prefer #913 only.
Note: report.py's static q05 label still reads "nearestApproachDistance";
the SQL uses ST_Distance(trajectory,...) (spatial-min).
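
A median-of-3 measurement that refuses to time a failed query can be sketched as follows. This is not the benchmark harness itself: `time_query` and the stand-in query are hypothetical, and the only behaviour it aims to show is that an error propagates instead of being recorded as a fast run, and that the row count is reported next to the timing.

```python
import statistics
import time
from typing import Callable, Sequence

def time_query(run: Callable[[], Sequence], repeats: int = 3) -> dict:
    """Run a query `repeats` times; return the median wall time and row count."""
    timings = []
    rows = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run()  # an exception propagates: a failed query gets no timing
        timings.append(time.perf_counter() - start)
        rows = len(result)
    return {"median_s": statistics.median(timings), "rows": rows}

# Stand-in for a real query execution (hypothetical).
report = time_query(lambda: [("trip", 1), ("trip", 2)])
print(report["rows"], report["median_s"] >= 0.0)
```

Reporting the row count alongside the median makes a query that silently returns 0 rows visible in the results rather than looking like a fast success.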

query_regions.csv holds lon/lat (the exporter emits ST_Transform(geom,
4326)), but load_mbdb.sql parsed it with ST_GeomFromText(geom, 3857) —
stamping degrees as metres, ~480 km from the trips. Every region query
(q02/q13/q14/q16) therefore silently returned 0 rows (masked, before the
harness fix, as phantom fast timings).

- load_mbdb.sql: parse QueryRegions as 4326, then reproject to 3857.
- q02.sql: geoToH3IndexSet needs lon/lat, so transform the (now 3857)
  region back to 4326 for the th3 prefilter, as q04 already does for
  points.
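
To see why lon/lat values cannot simply be labelled as EPSG:3857, here is a minimal sketch of the spherical Web Mercator forward projection. It is a stand-alone illustration, not the loader's code (which relies on ST_Transform); the sample coordinate is an arbitrary assumption and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project WGS 84 degrees to spherical Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

lon, lat = 4.35, 50.85  # arbitrary sample point, not taken from the dataset
x, y = lonlat_to_web_mercator(lon, lat)
print(f"degrees (EPSG:4326): ({lon}, {lat})")
print(f"metres  (EPSG:3857): ({x:.0f}, {y:.0f})")
```

The two coordinate pairs describe the same location but differ by orders of magnitude, so a geometry parsed with the wrong SRID lands far from data stored in the other system.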

Soundness verified against the exhaustive predicate (no stale oracle
needed): q02 with the th3 prefilter = 139 rows = q02 with eIntersects
alone; q04 likewise 43 = 43.
q02/q13/q14/q16 now execute real work. q16 still returns 0 rows by
construction (the 10x10x10 subset has co-occurring pairs but none are
always-disjoint), but it is now timed honestly rather than phantom-fast.
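
The soundness check (prefilter result == exhaustive result) can be sketched generically. In the code below a square grid stands in for H3 cells and an axis-aligned box stands in for the query region; both are simplifying assumptions, and all names are hypothetical rather than MEOS or MobilityDB functions.

```python
import math

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # xmin, ymin, xmax, ymax

def cell_of(p: Point, size: float = 1.0) -> tuple[int, int]:
    """Grid cell containing a point (stand-in for an H3 cell)."""
    return (math.floor(p[0] / size), math.floor(p[1] / size))

def cells_covering(box: Box, size: float = 1.0) -> set[tuple[int, int]]:
    """Every grid cell that the box touches."""
    xmin, ymin, xmax, ymax = box
    return {
        (i, j)
        for i in range(math.floor(xmin / size), math.floor(xmax / size) + 1)
        for j in range(math.floor(ymin / size), math.floor(ymax / size) + 1)
    }

def exact(p: Point, box: Box) -> bool:
    """The exhaustive predicate: is the point inside the box?"""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

def exhaustive(points: list[Point], box: Box) -> list[Point]:
    return [p for p in points if exact(p, box)]

def prefiltered(points: list[Point], box: Box) -> list[Point]:
    cells = cells_covering(box)  # cheap candidate test first, exact test second
    return [p for p in points if cell_of(p) in cells and exact(p, box)]

points = [(i * 0.5, j * 0.5) for i in range(-6, 7) for j in range(-6, 7)]
box = (0.25, -0.75, 1.5, 1.0)
print(prefiltered(points, box) == exhaustive(points, box))
```

The prefilter is sound as long as the covering set contains the cell of every point the exact predicate accepts; comparing the two result sets tests that property directly without needing a stored oracle.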

Consolidates the MobilityDB-only results (sf 0.005, MobilityDB 1.4.0 /
PG18) with the index-tier study. Key result: th3index is redundant on
MobilityDB — the prefilter operators (ever_eq / everIntersectsH3IndexSet)
are not in the th3 GiST operator class, so the trip_h3 index is never
used and native spatial indexes carry every query; and even the indexable
&& form is too loose (q06: th3 index 23x slower than a plain filter).
th3's value is expected on the index-less platforms (Duck/Spark), which
the cross-platform benchmark will measure next.

Includes correctness validation by soundness (prefilter == exhaustive),
methodology, and the tier raw timings (results/mbdb-tier{1,2,3}.json).
…11p)

Migrate every UDF and test off the retired legacy functions.functions facade
onto the single generated functions.GeneratedFunctions surface, regenerated
against the ecosystem pin (JMEOS PR MobilityDB#19). Refresh the bundled JMEOS-1.4.jar
and lib/libmeos.so (built from ecosystem-pin-2026-06-11p with -DH3=ON and the
CBUFFER/NPOINT/POSE/RGEO families), so the th3index family is backed with no
build-time special-casing.

Adapt the call-sites to the regenerated API:
- keep the pg_-prefixed PG-compat I/O (pg_interval_in/out, pg_timestamptz_in)
  that disambiguates the PostgreSQL built-ins of the same name;
- map the canonicalized h3index_parse/to_string onto h3index_in/out;
- pass the trailing count out-param to the *set_values accessors;
- adopt the Pointer-returning *_value_at_timestamptz form (per-type reads;
  ttext via the wrapper's already-dereferenced text*).

Reconcile five tests to the pin's canonical behavior: text_out is the raw
PostgreSQL textout (unquoted), and tnumber_trend is defined for step
interpolation.

Build CI libmeos from the ecosystem pin on all three platforms (the pin now
carries the MEOS_TZDATA_DIR option for Windows). Linux is the authoritative
blocking job; macOS/Windows are best-effort non-blocking.

Full unit suite green (907/907).
@estebanzimanyi estebanzimanyi force-pushed the integration/berlinmod-bench branch from 9805635 to 15eafe1 Compare June 12, 2026 10:03

The trajectory UDF frees its Temporal* and GSERIALIZED*; the residual growth
is the structural JNR-FFI char* micro-leak (String-returning hex-EWKB) plus
glibc arena fragmentation, which is markedly higher on the CI runners than on a
developer box (~3.6 KB/call / ~18 MB vs <10 MB locally). Raise the ceiling to
50 MB — still an order of magnitude below the real-Temporal*-leak signal
(>=100 KB/call, >=500 MB) so genuine leaks still fail the test.
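
The arithmetic behind the 50 MB ceiling can be checked in a few lines. The call count of 5000 is an inference, not a figure stated above: it is what both quoted ratios imply (18 MB / 3.6 KB and 500 MB / 100 KB) when decimal units are assumed. The function names are hypothetical.

```python
CALLS = 5_000      # inferred from the quoted ratios, not stated in the text
CEILING_MB = 50.0  # the raised ceiling

def growth_mb(bytes_per_call: float, calls: int = CALLS) -> float:
    """Total native-heap growth in MB (decimal units) after `calls` calls."""
    return bytes_per_call * calls / 1_000_000

def passes(bytes_per_call: float) -> bool:
    """Would the leak test pass at this per-call growth?"""
    return growth_mb(bytes_per_call) <= CEILING_MB

print(growth_mb(3_600), passes(3_600))      # CI-runner residual growth
print(growth_mb(100_000), passes(100_000))  # real Temporal* leak signal
```

Under these assumptions the ceiling corresponds to about 10 KB per call, which sits between the observed residual growth and the real-leak signal.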

Folds the MobilityDB#23 canonical-suite update into the integration/benchmark evidence so
evidence == the deliverable stack (canonical q*.sql, eContains/round, no
Spark-local SQL variants or preprocessing).
…rmat fixes

Folds the deliverable stack update (generated GeneratedSpatioTemporalUDFs dispatch +
create() wiring, valueAtTimestamp/expandSpace accessor fixes) into the integration/
benchmark evidence so evidence == the deliverable stack tip.
…rtability submodule

Replace the 18 vendored q*.sql copies with the berlinmod/suite git
submodule pointing at the canonical berlinmod-portability repo (one source
shared by MobilityDB, MobilityDuck, MobilitySpark). bench_mspark.sh points
berlinmod.sql.dir at the submodule; BerlinMODBench doc updated.
@estebanzimanyi estebanzimanyi force-pushed the integration/berlinmod-bench branch from 35fb0bc to 68c6522 Compare June 13, 2026 03:41

2 participants