regen: regenerate the MEOS facade against the consolidated MobilityDB surface #19

Closed
estebanzimanyi wants to merge 20 commits into MobilityDB:main from estebanzimanyi:feat/regen-extended-types-meos-idl

estebanzimanyi (Member) commented May 21, 2026

Bump codegen/input/meos-idl.json to the MEOS-API IDL generated from the consolidated MobilityDB headers, and regenerate functions.GeneratedFunctions. The surface covers what the streaming consumers need, without gaps:
- temporal multiplication binds as mul_* (including the tbigint family);
- the set-set spatial distance functions mindistance_tgeo_tgeo and tgeoarr_tgeoarr_mindist are exposed;
- the circular-buffer and network-point MF-JSON readers tcbuffer_from_mfjson and tnpoint_from_mfjson are exposed;
- the rigid-geometry accessors bind as trgeo_*.

The regenerated surface has 2743 functions; the facade compiles and resolves against a libmeos built from the same surface.

The source IDL was regenerated by MEOS-API's run.py from the MobilityDB
accumulate/parity-1.4 headers (@3764e6894). That is the pre-merge parity target,
which carries the trgeo_* -> trgeometry_* user-API rename that master does
not yet have. The IDL has 4068 functions.

This lands the trgeometry I/O and accessor surface that the prior IDL missed:
trgeometry_in (constructor), trgeometry_instant_n, trgeometry_instants, and
the renamed trgeometry_* accessors/relations. The old abbreviated trgeo_*
public names are gone from libmeos, so the prior facade called symbols that no
longer exist. GeneratedFunctions is regenerated from the new IDL (jmeos-core
compiles clean; the legacy functions.functions surface that the tests use is
untouched, and no test references GeneratedFunctions).

This lets IDL-driven consumers (e.g. the streaming-parity Flink/Kafka facade)
build a trgeometry sample and exercise the ~66 trgeo operators.
estebanzimanyi force-pushed the feat/regen-extended-types-meos-idl branch 2 times, most recently from f9d7311 to b0079af, May 29, 2026 13:45
estebanzimanyi changed the title from "regen: bump meos-idl.json to MEOS-API + regenerate extended types (supersedes #15)" to "regen: regenerate the MEOS facade against upstream MobilityDB master", May 29, 2026
estebanzimanyi force-pushed the feat/regen-extended-types-meos-idl branch from b0079af to 3e17ec6, May 29, 2026 15:17
estebanzimanyi changed the title from "regen: regenerate the MEOS facade against upstream MobilityDB master" to "regen: regenerate the MEOS facade against the consolidated MobilityDB surface", May 29, 2026
estebanzimanyi force-pushed the feat/regen-extended-types-meos-idl branch 8 times, most recently from 9a353f1 to 4eb28ed, May 29, 2026 17:09
… tjsonb family + recovered types

Regenerate meos-idl.json and functions.GeneratedFunctions against
ecosystem-pin-2026-06-11f (8a3a6db64): the base json/jsonb/jsonpath API is now
public in meos_json.h (IDL 137 -> 213 json fns), plus the tjsonb temporal type.
jsonb_to_text is now bound with its text* return type (it was previously an
implicit int). jsonb_in/out and tjsonb round-trip through the binding.

MEOS keeps process-global state: meos_initialize cannot be re-run after a
meos_finalize in the same JVM. Running each test class in a fresh fork keeps the
native MEOS lifecycle clean.
…t -> mul)

First scoped step of wiping the dual facade: route the five tnumber multiply
calls through the generated functions.GeneratedFunctions (mul_*, the
normalized name) instead of the hand-rolled legacy functions.functions
(mult_*). One family at a time; the legacy import stays until the file is fully
migrated.

Wipe step 2: route through the generated facade the 29 type/collection files in
which every functions.functions call has an identical-signature counterpart in
the generated functions.GeneratedFunctions, and drop their legacy import. This is
a mechanical 1:1 name+signature repoint with no behaviour change. The files with
signature-divergent calls (the value_at_timestamptz / *set_values / spanset_spans
families) and the rename families are migrated in follow-up scoped commits.
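The identical-signature test that decides whether a file can be repointed mechanically can be sketched with Java reflection. This is a minimal sketch, not the procedure the commit used: LegacyFacade and GeneratedFacade are hypothetical stand-ins (the real facades bind native symbols through jnr-ffi and take jnr.ffi.Pointer arguments, replaced here by long), and intset_in is an illustrative name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the two facades. The real classes bind native
// symbols through jnr-ffi and take jnr.ffi.Pointer; long is used here instead.
final class LegacyFacade {
    static long mult_tnumber_number(long temp, double d) { return 0; }
    static long intset_values(long set) { return 0; }
    static long intset_in(String text) { return 0; }
}

final class GeneratedFacade {
    static long mul_tnumber_number(long temp, double d) { return 0; }
    // Reshaped: a trailing count out-parameter was added.
    static long intset_values(long set, long countOut) { return 0; }
    static long intset_in(String text) { return 0; }
}

final class RepointCheck {
    /**
     * Names of legacy methods whose name, parameter types and return type all
     * match a method of the generated facade, i.e. safe for a 1:1 repoint.
     */
    static List<String> identicalSignatures(Class<?> legacy, Class<?> generated) {
        List<String> repointable = new ArrayList<>();
        for (Method m : legacy.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            try {
                Method g = generated.getDeclaredMethod(m.getName(), m.getParameterTypes());
                if (g.getReturnType().equals(m.getReturnType())) {
                    repointable.add(m.getName());
                }
            } catch (NoSuchMethodException e) {
                // Renamed (mult_ -> mul_) or reshaped (extra out-param):
                // left for a scoped follow-up commit.
            }
        }
        repointable.sort(null); // getDeclaredMethods() has no defined order
        return repointable;
    }
}
```

Here only intset_in qualifies: mult_tnumber_number was renamed and intset_values gained an out-parameter, so both fall into the follow-up commits.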
…f the legacy facade

Wipe step 3: IntSet/FloatSet/SpanSet/IntSpanSet/FloatSpanSet onto the generated
facade. intset_values / floatset_values / spanset_spans gained a trailing
Pointer-count out-param in the generated signature; the OO callers read the
length from the separate num_elements()/num_spans(), so they pass a throwaway
4-byte count buffer and ignore it. Array result unchanged; all five files now
fully off functions.functions.
…(pg_date_* -> date_*)

Wipe step 4: route datespan + datespanset through the generated facade. The
generated date I/O drops the legacy pg_ prefix with identical signatures
(pg_date_in -> date_in, pg_date_out -> date_out).
… date_*, dateset_values count)

Wipe step 5: route dateset through the generated facade — pg_date_in/out ->
date_in/out (identical sigs), and dateset_values gained a trailing Pointer-count
out-param (length comes from the separate num_elements(), so pass a throwaway
count buffer).
…ix value_at

Wipe step 6: route TInt/TFloat/TBool through the generated facade. value_at is
reshaped onto the generated *_value_at_timestamptz, which manages the out-param
internally and returns a Pointer to the value, or null. This fixes three latent
bugs that the hand-rolled facade hid (value_at was untested):
- the value sits at offset 0 (it was read at offset Integer.BYTES = 4);
- tfloat values are doubles, so read getDouble and cast (getFloat always gave 0.0);
- tbool values are 1 byte, so read getByte (getInt read out of bounds).
value_at is now null-safe: it throws on a timestamp where the temporal has no
value (previously undefined/garbage). Verified 5 / 2.5 / true via smoke test.
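The width bugs can be reproduced without libmeos. In this sketch a little-endian java.nio.ByteBuffer stands in for the native memory that a jnr-ffi Pointer addresses (an assumption that matches x86-64 and AArch64 Linux), and ValueAtWidths is a hypothetical helper, not part of JMEOS.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ValueAtWidths {
    /** Little-endian buffer standing in for the native value a Pointer addresses. */
    static ByteBuffer nativeValue(int size) {
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Correct tfloat read: the C value is an 8-byte double at offset 0. */
    static double readTFloat(ByteBuffer value) {
        return value.getDouble(0);
    }

    /** Old tfloat read: only 4 bytes, decoded as a float. */
    static float readTFloatAsFloat(ByteBuffer value) {
        return value.getFloat(0);
    }

    /** Correct tbool read: a C bool occupies 1 byte at offset 0. */
    static boolean readTBool(ByteBuffer value) {
        return value.get(0) != 0;
    }
}
```

For 2.5 the double's bit pattern is 0x4004000000000000, so its first four little-endian bytes are zero and the 4-byte float read returns 0.0f. A 4-byte int read of a 1-byte bool buffer overruns it; the ByteBuffer reports that as an IndexOutOfBoundsException, where native memory would return garbage.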
…t_to_cstring/interval_make now public

Pin 11g exports the base PG-compat conversion helpers in postgres_ext_defs so
the generator catalogs them (resolves the legacy-facade-wipe helper relay). This
adds 3 functions to the IDL.
…cade

Wipe step 7: move TextSet (text2cstring -> text_to_cstring) and ConversionUtils
(interval_make is now public in 11g; pg_timestamptz_in/out -> timestamptz_in/out,
pg_interval_out -> interval_out) onto the generated facade.
…+ value_at)

Wipe step 8: move TText onto the generated facade. cstring2text/text2cstring ->
cstring_to_text/text_to_cstring (now public in 11g). value_at is reshaped: the
generated ttext_value_at_timestamptz returns the text* directly (or null), so it
is read via text_to_cstring. It is now null-safe: it throws when there is no value,
where it previously read garbage at offset 8. Verified "hello" via smoke test.

Wipe step 9: move tstzspan (adjacent_period_timestamp -> adjacent_span_timestamptz,
pg_timestamptz_in -> timestamptz_in) and tstzset (timestampset_out ->
tstzset_out; tstzset_values gained a count out-param) onto the generated facade.
… public)

Wipe step 10: move STBox onto the generated facade. gserialized_in -> geom_in
(identical sig); geo_expand_spatial(gs, d) -> stbox_expand_space(geo_to_stbox(gs),
d) (the public composition). 77 STBox tests pass.
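The renames collected across the wipe steps can be kept in one lookup table when repointing callers. This is a minimal sketch: LegacyRenames is a hypothetical helper, its entries are the renames named in this PR's commits, mult_ -> mul_ is treated as a prefix rule, and every other name is assumed to carry over unchanged. Reshapes such as geo_expand_spatial are compositions, not renames, and are not covered.

```java
import java.util.Map;

final class LegacyRenames {
    // Legacy -> generated renames named in the wipe-step commits.
    private static final Map<String, String> EXPLICIT = Map.ofEntries(
            Map.entry("pg_date_in", "date_in"),
            Map.entry("pg_date_out", "date_out"),
            Map.entry("pg_timestamptz_in", "timestamptz_in"),
            Map.entry("pg_timestamptz_out", "timestamptz_out"),
            Map.entry("pg_interval_in", "interval_in"),
            Map.entry("pg_interval_out", "interval_out"),
            Map.entry("text2cstring", "text_to_cstring"),
            Map.entry("cstring2text", "cstring_to_text"),
            Map.entry("timestampset_out", "tstzset_out"),
            Map.entry("adjacent_period_timestamp", "adjacent_span_timestamptz"),
            Map.entry("gserialized_in", "geom_in"));

    private static final String LEGACY_MUL = "mult_";

    /** Generated-facade name for a legacy function name. */
    static String generatedName(String legacy) {
        String renamed = EXPLICIT.get(legacy);
        if (renamed != null) {
            return renamed;
        }
        if (legacy.startsWith(LEGACY_MUL)) {
            return "mul_" + legacy.substring(LEGACY_MUL.length());
        }
        return legacy; // assumed to exist unchanged in GeneratedFunctions
    }
}
```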
Functions that return a struct larger than 16 bytes by value (the seven
*Split returns plus MvtGeom) use the sret calling convention: the caller
allocates the struct and passes its address as a hidden argument (the implicit
first argument on SysV x86-64; AAPCS64 passes it in register x8 instead), and the
callee fills it in. The emitter previously collapsed such a return to a bare
Pointer, so jnr-ffi mis-read the return register and the struct fields (notably
count) came back as garbage.

Parse the IDL "structs" section, compute each struct's size, and for a
by-value return larger than 16 bytes emit a hidden leading Pointer _sret
parameter in the interface, allocate it in the wrapper, and return the
filled buffer. Register-returned structs (<=16B) are logged, not silently
mis-bound. Regenerates GeneratedFunctions with the new bindings.
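The size test behind the sret decision can be sketched as a natural-alignment layout computation. This assumes LP64 (8-byte pointers, 4-byte int) and, for illustration only, *Split structs whose count field follows pointer-sized fields, which reproduces the count offsets of 16 and 24 that the split repoint reads; the real field types come from the IDL "structs" section. The 16-byte threshold is the SysV x86-64 rule for structs of integers and pointers (homogeneous floating-point aggregates on AArch64 follow a different rule).

```java
import java.util.ArrayList;
import java.util.List;

final class StructLayout {
    /** A C field described by its size and alignment in bytes. */
    static final class Field {
        final int size;
        final int align;

        Field(int size, int align) {
            this.size = size;
            this.align = align;
        }
    }

    // LP64 assumption: 8-byte pointers, 4-byte int.
    static final Field POINTER = new Field(8, 8);
    static final Field INT = new Field(4, 4);

    /** Largest by-value struct of integer/pointer fields returned in registers. */
    static final int MAX_REGISTER_RETURN = 16;

    static int alignUp(int value, int align) {
        return (value + align - 1) / align * align;
    }

    /** Offset of each field under natural alignment. */
    static List<Integer> offsets(List<Field> fields) {
        List<Integer> result = new ArrayList<>();
        int offset = 0;
        for (Field f : fields) {
            offset = alignUp(offset, f.align);
            result.add(offset);
            offset += f.size;
        }
        return result;
    }

    /** Struct size: end of the last field, padded to the strictest alignment. */
    static int size(List<Field> fields) {
        int end = 0;
        int maxAlign = 1;
        for (Field f : fields) {
            end = alignUp(end, f.align) + f.size;
            maxAlign = Math.max(maxAlign, f.align);
        }
        return alignUp(end, maxAlign);
    }

    /** True when the return needs a caller-allocated buffer (hidden _sret pointer). */
    static boolean needsSret(List<Field> fields) {
        return size(fields) > MAX_REGISTER_RETURN;
    }
}
```

A {pointer, pointer, int} struct lays out at offsets 0, 8, 16 and pads to 24 bytes, so it takes the sret path; {pointer, int} pads to exactly 16 bytes and stays register-returned.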
Repoint value_split / value_time_split / time_split / space_split /
space_time_split (TNumber, Temporal, TPoint) onto the generated
GeneratedFunctions split wrappers, which now return the *Split struct via
the sret convention. The methods read fragments at offset 0 and count at
the struct's count offset (16 for the 3-field splits, 24 for the 4-field
time splits) instead of the stale pre-735f out-parameter signature.

Also fixes defects surfaced while migrating:
- duration/start were parsed in an inverted branch so the duration was
  dropped (or null-dereferenced) on the default-start path; parse the
  duration unconditionally and default only the start.
- timedelta_to_interval passed cumulative units (toHours/toMinutes/
  toSeconds give the whole span in each unit) to interval_make, which then
  re-added them on top of the days. It now decomposes per field and parses
  a textual interval with interval_in, sidestepping a jnr-ffi quirk that
  mis-passes interval_make's trailing double after its six int arguments.
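The per-field decomposition can be sketched with java.time.Duration's toXxxPart() accessors (Java 9+), which return each field modulo the next larger unit, unlike the cumulative toHours()/toMinutes()/toSeconds(). IntervalText is a hypothetical helper, and the output format assumes a PostgreSQL-style interval_in that accepts unit words such as "days", "hours", "minutes", "seconds" and fractional seconds.

```java
import java.math.BigDecimal;
import java.time.Duration;

final class IntervalText {
    /**
     * Hypothetical helper: render a non-negative Duration as interval text,
     * one field per unit instead of cumulative totals.
     */
    static String of(Duration d) {
        // Whole seconds plus the nanosecond remainder, without float rounding.
        BigDecimal seconds = BigDecimal.valueOf(d.toSecondsPart())
                .add(BigDecimal.valueOf(d.toNanosPart(), 9))
                .stripTrailingZeros();
        return d.toDaysPart() + " days "
                + d.toHoursPart() + " hours "
                + d.toMinutesPart() + " minutes "
                + seconds.toPlainString() + " seconds";
    }
}
```

For a 26-hour Duration, toHours() gives 26 and toMinutes() gives 1560, which is why passing those values to interval_make on top of the days double-counted; the per-field parts are 1 day and 2 hours.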

Verified end to end on both the Duration and the String duration paths:
all five split methods return the correct fragment counts through the OO API.

estebanzimanyi force-pushed the feat/regen-extended-types-meos-idl branch from 4eb28ed to 4a6e5c1, June 12, 2026 04:25

Repoint the last three main-code classes from the hand-rolled functions.functions
facade onto the generated GeneratedFunctions surface, completing the main-code
side of the dual-facade wipe. No main-code class imports functions.functions now.

Renames and reshapes applied (verified against the generated signatures):
- pg_timestamptz_in/pg_interval_in -> timestamptz_in/interval_in (identical sigs).
- the temporal spatial-relationship calls (tcontains/tdisjoint/tdwithin/
  tintersects/ttouches) drop the trailing restr,atvalue booleans the current
  MEOS signatures no longer take (dwithin keeps its distance argument).
- value_at_timestamp uses the generated bool+out-param wrapper, which returns the
  GSERIALIZED* directly, instead of reading the out buffer at the wrong offset.

Defects surfaced and fixed while migrating (all confirmed via smoke):
- count out-parameters were read at offset 4 (getInt(Integer.BYTES)) instead of 0
  in values/make_simple/stboxes, yielding out-of-bounds garbage counts.
- Memory.allocate(Runtime.getRuntime(runtime), n) threw ClassCastException
  (Runtime.getRuntime expects a library proxy, not a Runtime); pass runtime.
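The count-offset defect is easy to see with a 4-byte buffer: the C callee writes *count at offset 0, so getInt(Integer.BYTES) reads the four bytes after the allocation. In this sketch a java.nio.ByteBuffer stands in for the jnr-ffi Memory buffer, and nativeWritesCount is a hypothetical stand-in for the native call.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CountOutParam {
    /** A 4-byte out-parameter buffer in native byte order, like a C int. */
    static ByteBuffer allocateCount() {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.nativeOrder());
    }

    /** Hypothetical stand-in for the callee's "*count = n;". */
    static void nativeWritesCount(ByteBuffer countOut, int n) {
        countOut.putInt(0, n);
    }

    /** Correct read: the int lives at offset 0 of its own buffer. */
    static int readCount(ByteBuffer countOut) {
        return countOut.getInt(0);
    }
}
```

The ByteBuffer throws IndexOutOfBoundsException on the offset-4 read; a raw native buffer has no such check and silently returns whatever follows it, which matches the garbage counts reported above.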

Smoke: values=3, make_simple=1, stboxes=2, value_at=POINT(5 5). Type suites
TGeomPoint/TGeogPoint/TInt/TFloat/TBool all green (624 tests, 0 fail/0 err); the
only residual is the pre-existing varstr_cmp ttext_in crash in TTextTest.
JMEOS bootstraps MEOS with meos_initialize_timezone + meos_initialize_error_handler
but never meos_initialize_collation(). Text comparison goes through varstr_cmp,
which dereferences the (uninitialized) collation and segfaults; integer, float and
geometry temporals never compare text, so only the text paths crashed. This is the
long-standing ttext_in -> varstr_cmp SIGSEGV that crashed TTextTest/TextSetTest and
the error-branch classes in the full suite. It is a binding bootstrap gap, not a MEOS
bug (raw jnr confirms: timezone-only init crashes, while meos_initialize() or
timezone+collation works; pure C is fine).

Initialize the collation alongside the existing init. For classes that build text
objects in instance-field initializers (TextSetTest), the init goes in a static
block so it runs at class load, before the fields are constructed. The collation
call uses GeneratedFunctions because the legacy facade has no static wrapper for it.
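Why a static block is enough for TextSetTest follows from Java's initialization order: a class's static initializers run when the class is initialized, which happens before its first instance is created, so instance-field initializers always see their effects. In this sketch initializeCollation() is a hypothetical stand-in for the meos_initialize_collation() call, and TextFixtureTest stands for a test class that builds text objects in field initializers.

```java
final class CollationInit {
    // Hypothetical stand-in for meos_initialize_collation(): records that it ran.
    static boolean collationReady = false;

    static void initializeCollation() {
        collationReady = true;
    }
}

final class TextFixtureTest {
    static {
        // Runs once when the class is initialized, before any instance exists.
        CollationInit.initializeCollation();
    }

    // Instance-field initializer: evaluated during construction, so the
    // static block has already run by the time this executes.
    final boolean builtAfterInit = CollationInit.collationReady;
}
```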

Full suite now fully green for the first time: 1735 tests, 0 failures, 0 errors,
0 native crashes (was 1625 passing with two classes core-dumping).
Repoint every test from functions.functions to the generated GeneratedFunctions
surface (the calls are same-name same-signature, so this is a mechanical repoint)
and remove the hand-rolled functions.java. The dual-facade irregularity is gone:
the whole library — main code and tests — now uses the single generated facade.
The other functions-package classes (GeneratedFunctions, the Meos* error types,
error_handler/error_handler_fn) are unaffected.

Full suite after deletion: 1735 tests, 0 failures, 0 errors, 0 native crashes.

estebanzimanyi force-pushed the feat/regen-extended-types-meos-idl branch 2 times, most recently from d66835c to f82d9aa, June 12, 2026 09:22

Track the pin fast-forward train to its tip (a816eec9b). Purely additive over 11n
(+5 functions, no removals, no signature changes): the base text-case helpers
text_upper / text_lower / text_initcap, meos_strtof, and the borrowed-pointer
accessor tsequenceset_value_n_p. (11o/11p in between were surface-neutral —
vendored cppcheck + a Windows tzdata cmake option.) Rebuild libmeos with -DH3=ON
(70 th3index exports), regenerate the IDL via MEOS-API (4389 functions), and
regenerate GeneratedFunctions.

Carries the full delta over the wipe: H3/th3index, text_in/out, the case helpers,
pg_interval/pg_timestamptz, and the uint64 hash_extended fix. sret + collation
preserved.

Verified: jmeos-core compiles; full suite green (1735 tests, 0 failures, 0
crashes); text_upper("hello")="HELLO" through JMEOS.
estebanzimanyi (Member, Author) commented:

Superseded by the canonical build-time-generation stack #22 to #26 (family build flags → trgeometry C API → pin bump + org.mobilitydb.meos facade → set-set join → Spark Connect registrar), now advanced to the pin-12l catalog in #27. That line is the agreed architecture (build-time GeneratedFunctions + auto-derived facade, no hand map); this divergent regen line is retired to keep the consolidate equivalent to the deliverable stack. Closing.
