Integration: BerlinMOD benchmark branch — 907/907 green on the b183b12 MEOS + th3index/h3 surface #16
Open
estebanzimanyi wants to merge 182 commits into
…trators for UDT and UDF.
Period implementation
Satria/poc
… UDTs using Meos Datatypes.
Meos datatype
Timestampset implementation
- trip_h3 called tgeompoint_to_th3index(), which is the internal MEOS C name, not a SQL function. The SQL converter is h3_latlng_to_cell, and its tgeompoint overload requires lon/lat, so reproject with transform(..., 4326) (Trip is stored in EPSG:3857).
- Add the geomWKT text column selected by q11/q12/q15; the loader previously created only geom, so those queries errored on a missing column.
The portable query files referenced functions this MobilityDB build
exposes under different names, so the queries errored at run time:
- asHexWKB(tgeompoint) -> asHexEWKB (no plain-WKB output for tgeompoint in this build)
- everEq{H3Index,Th3Index}Th3Index -> ever_eq
- geomToH3Cell(geom, r) -> geoToH3Cell(ST_Transform(geom,4326), r) (geoToH3Cell needs lon/lat; query points are EPSG:3857)
- minDistance(tgeompoint, tgeompoint) -> ST_Distance(trajectory(t1), trajectory(t2)) (spatial minimum ignoring time, distinct from the time-synchronous nad)
With these changes, all 18 queries run on MobilityDB. The cross-platform aim is
to expose these names as portable aliases in MEOS so the shared queries can
revert to the single portable dialect.
Committed snapshot of the honest MobilityDB-only run (all 18 queries verified to execute under ON_ERROR_STOP; MobilityDB 1.4.0 on PG18, 1620 trips, median of 3 runs). results/ is normally gitignored — this is a deliberate reference snapshot, drop it if you prefer #913 only. Note: report.py's static q05 label still reads "nearestApproachDistance"; the SQL uses ST_Distance(trajectory,...) (spatial-min).
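For reference, the timing rule described above (every query must actually execute, and the reported figure is the median of 3 runs) can be sketched as follows. This is a minimal illustration, not the benchmark harness itself: `median_runtime` and `run_query` are hypothetical names, and `run_query` stands in for whatever executes one query file.

```python
import statistics
import time


def median_runtime(run_query, runs=3):
    """Time `run_query` `runs` times and return the median in seconds.

    Any exception raised by `run_query` propagates, so a failing query
    aborts the measurement instead of being recorded as a fast timing.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical callable that executes one query
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Letting the error propagate plays the same role as ON_ERROR_STOP: a query that errors never contributes a timing.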
query_regions.csv holds lon/lat (the exporter emits ST_Transform(geom, 4326)), but load_mbdb.sql parsed it with ST_GeomFromText(geom, 3857), stamping degrees as metres, ~480 km from the trips. Every region query (q02/q13/q14/q16) therefore silently returned 0 rows (masked, before the harness fix, as phantom fast timings).
- load_mbdb.sql: parse QueryRegions as 4326, then reproject to 3857.
- q02.sql: geoToH3IndexSet needs lon/lat, so transform the (now 3857) region back to 4326 for the th3 prefilter, as q04 already does for points.
Soundness verified against the exhaustive predicate (no stale oracle needed): q02 with the th3 prefilter = 139 rows = q02 with eIntersects alone; q04 likewise 43 = 43.
q02/q13/q14/q16 now execute real work. q16 still returns 0 rows by construction (the 10x10x10 subset has co-occurring pairs but none are always-disjoint), but it is now timed honestly rather than phantom-fast.
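To make the SRID mix-up concrete, here is a small sketch of the spherical Web Mercator forward projection used by EPSG:3857, plus a loader-side sanity check. It is an illustration only: the sample coordinate is hypothetical, and `looks_like_degrees` is an assumed helper, not something in load_mbdb.sql.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_3857(lon_deg, lat_deg):
    """Project lon/lat in degrees (EPSG:4326) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def looks_like_degrees(x, y):
    """Heuristic: values inside the lon/lat range are suspicious under SRID 3857."""
    return abs(x) <= 180.0 and abs(y) <= 90.0


# Hypothetical region vertex given as lon/lat.
lon, lat = 13.4, 52.5
x, y = lonlat_to_3857(lon, lat)

# Stamping the raw degrees as 3857 leaves them a few metres from the origin,
# while the correctly projected point is millions of metres away.
mis_stamped = (lon, lat)
projected = (x, y)
```

A check like `looks_like_degrees` on loaded geometries would flag the mis-stamped regions at load time rather than leaving the region queries to return 0 rows.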
Consolidates the MobilityDB-only results (sf 0.005, MobilityDB 1.4.0 /
PG18) with the index-tier study. Key result: th3index is redundant on
MobilityDB — the prefilter operators (ever_eq / everIntersectsH3IndexSet)
are not in the th3 GiST operator class, so the trip_h3 index is never
used and native spatial indexes carry every query; and even the indexable
&& form is too loose (q06: th3 index 23x slower than a plain filter).
th3's value is expected on the index-less platforms (Duck/Spark), which
the cross-platform benchmark will measure next.
Includes correctness validation by soundness (prefilter == exhaustive),
methodology, and the tier raw timings (results/mbdb-tier{1,2,3}.json).
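The soundness criterion (prefilter == exhaustive) can be sketched with a toy example. This is a simplified stand-in under stated assumptions: a square grid replaces H3 cells, a point-in-rectangle test replaces the exact spatial predicate, and all names and data are hypothetical.

```python
from math import floor


def cell_of(x, y, size=10.0):
    """Map a point to its grid cell (square-grid stand-in for an H3 cell)."""
    return (floor(x / size), floor(y / size))


def cells_covering(xmin, ymin, xmax, ymax, size=10.0):
    """All grid cells that intersect the rectangle."""
    cx0, cy0 = cell_of(xmin, ymin, size)
    cx1, cy1 = cell_of(xmax, ymax, size)
    return {(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)}


def inside(point, region):
    """Exact predicate: is the point inside the rectangle?"""
    xmin, ymin, xmax, ymax = region
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax


def exhaustive(points, region):
    """Apply the exact predicate to every point."""
    return sorted(p for p in points if inside(p, region))


def prefiltered(points, region, size=10.0):
    """Apply the cheap cell prefilter first, then the exact predicate."""
    cover = cells_covering(*region, size)
    candidates = [p for p in points if cell_of(*p, size) in cover]
    return sorted(p for p in candidates if inside(p, region))


points = [(3.0, 4.0), (12.5, 7.0), (25.0, 25.0), (18.0, 11.0), (40.0, 2.0)]
region = (10.0, 5.0, 20.0, 15.0)

# Sound prefilter: it may admit extra candidates but never drops a true match,
# so both pipelines return the same rows.
assert prefiltered(points, region) == exhaustive(points, region)
```

The comparison needs no stored expected output: the exhaustive run is the oracle, which is the same idea as comparing the th3-prefiltered query against the predicate alone.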
…11p)
Migrate every UDF and test off the retired legacy functions.functions facade onto the single generated functions.GeneratedFunctions surface, regenerated against the ecosystem pin (JMEOS PR MobilityDB#19).
Refresh the bundled JMEOS-1.4.jar and lib/libmeos.so (built from ecosystem-pin-2026-06-11p with -DH3=ON and the CBUFFER/NPOINT/POSE/RGEO families), so the th3index family is backed with no build-time special-casing.
Adapt the call-sites to the regenerated API:
- keep the pg_-prefixed PG-compat I/O (pg_interval_in/out, pg_timestamptz_in) that disambiguates the PostgreSQL built-ins of the same name;
- map the canonicalized h3index_parse/to_string onto h3index_in/out;
- pass the trailing count out-param to the *set_values accessors;
- adopt the Pointer-returning *_value_at_timestamptz form (per-type reads; ttext via the wrapper's already-dereferenced text*).
Reconcile five tests to the pin's canonical behavior: text_out is the raw PostgreSQL textout (unquoted), and tnumber_trend is defined for step interpolation.
Build CI libmeos from the ecosystem pin on all three platforms (the pin now carries the MEOS_TZDATA_DIR option for Windows). Linux is the authoritative blocking job; macOS/Windows are best-effort non-blocking.
Full unit suite green (907/907).
9805635 to
15eafe1
The trajectory UDF frees its Temporal* and GSERIALIZED*; the residual growth is the structural JNR-FFI char* micro-leak (String-returning hex-EWKB) plus glibc arena fragmentation, which is markedly higher on the CI runners than on a developer box (~3.6 KB/call / ~18 MB vs <10 MB locally). Raise the ceiling to 50 MB — still an order of magnitude below the real-Temporal*-leak signal (>=100 KB/call, >=500 MB) so genuine leaks still fail the test.
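The figures above fit together if the test makes roughly 5000 calls. That call count is inferred from the quoted ratios (18 MB / 3.6 KB and 500 MB / 100 KB), not stated in the commit, so treat it as an assumption. A quick arithmetic check in decimal units:

```python
KB = 1000
MB = 1000 * KB

calls = 5000                   # assumption: inferred from the quoted ratios
benign_per_call = 3600         # ~3.6 KB/call residual growth seen on CI
real_leak_per_call = 100 * KB  # lower bound quoted for a real Temporal* leak
ceiling = 50 * MB              # new test ceiling

benign_total = benign_per_call * calls        # 18 MB
real_leak_total = real_leak_per_call * calls  # 500 MB

# The ceiling sits between the benign growth and the genuine-leak signal.
assert benign_total < ceiling < real_leak_total
headroom_factor = real_leak_total / ceiling   # 10.0, i.e. one order of magnitude
```

Under that assumption the 50 MB ceiling leaves 32 MB of slack over the CI reading while staying a factor of 10 below the smallest genuine leak.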
This was referenced Jun 12, 2026
Folds the MobilityDB#23 canonical-suite update into the integration/benchmark evidence so evidence == the deliverable stack (canonical q*.sql, eContains/round, no Spark-local SQL variants or preprocessing).
…rmat fixes
Folds the deliverable stack update (generated GeneratedSpatioTemporalUDFs dispatch + create() wiring, valueAtTimestamp/expandSpace accessor fixes) into the integration/benchmark evidence so evidence == the deliverable stack tip.
…rtability submodule
Replace the 18 vendored q*.sql copies with the berlinmod/suite git submodule pointing at the canonical berlinmod-portability repo (one source shared by MobilityDB, MobilityDuck, MobilitySpark). bench_mspark.sh points berlinmod.sql.dir at the submodule; BerlinMODBench doc updated.
35fb0bc to
68c6522
What this is
The accumulated BerlinMOD benchmark integration branch — it consumes the
currently-open feature PRs so the full Spark UDF surface can be exercised and
benchmarked together against a single MEOS pin. It is an integration/visibility
branch; the individual PRs below remain the unit of review.
Consumes (open): #5, #7, #8, #9, #10, #11, #12, #13, #14, #15.
State
Builds and passes the full suite (907 tests, 0 failures, 0 errors) against
a JMEOS-1.4.jar regenerated from the estebanzimanyi/MobilityDB
fix/meos-pg-symbol-collision-plus-h3 MEOS API (th3index + static-geometry
h3_geo surface), with libmeos built H3=ON.
The h3 prefilter UDFs (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index)
are bound directly to libmeos via H3IndexJnrBindings; the rest go through the
regenerated functions facade.
Reconciliation carried here
The regenerated jar corrects several MEOS C-type resolutions the prior jar got
wrong: H3Index/int64 → long, bool → boolean (including subdir-header
functions), and an 8-byte TimestampTz * out-param that was under-allocated to
4 bytes, which was the cause of the stbox/tbox tmin/tmax and timestampN errors
and of the inflated native-heap reading in the leak test. UDF call sites were
reconciled to the corrected signatures (interpType enum, direct h3 binding,
acontains_geo_tgeo/tspatial_transform_pipeline renames, a no-exit MEOS error
handler, and out-param/array marshalling). One test was corrected:
tnumber_trend on a step-interpolated tint correctly returns null.
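Why a 4-byte allocation breaks an 8-byte out-param can be shown with a small byte-level sketch. This is an illustration in plain Python, not the JMEOS binding code: the timestamp value is hypothetical, and it models only the read side (in a real FFI call the native write would also overrun the 4-byte buffer).

```python
import struct

# Hypothetical TimestampTz value: a signed 64-bit count of microseconds.
ts_micros = 812_345_678_901_234

# What the native side writes into the out-param: 8 little-endian bytes.
native_bytes = struct.pack("<q", ts_micros)

# Correct marshalling reads all 8 bytes back as a 64-bit integer.
full = struct.unpack("<q", native_bytes)[0]

# Treating the out-param as 4 bytes keeps only the low half of the value.
truncated = struct.unpack("<i", native_bytes[:4])[0]
```

Only the 8-byte read recovers the original timestamp; the 4-byte read yields an unrelated 32-bit number, which is consistent with the wrong tmin/tmax and timestampN results described above.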