Skip to content

fix: guard recursive array materialization#1040

Open
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:fix/lazy-nested-self-ref
Open

fix: guard recursive array materialization#1040
He-Pin wants to merge 1 commit into
databricks:masterfrom
He-Pin:fix/lazy-nested-self-ref

Conversation

@He-Pin

@He-Pin He-Pin commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Motivation

The previous materialization guard only tracked Val.Obj. A nested lazy array can re-enter the same Val.Arr during manifestation, especially through the fused ByteRenderer path, and sjsonnet would emit partial JSON until stack/memory failure. Recursive array manifestation should fail before any output grows, matching C++ jsonnet, go-jsonnet, and jrsonnet behavior.

Reproducer:

local a = 1;
local b = local a = 2; local c = local a = 3; [a, c]; [a, b];
[a, b]
Implementation Behavior
C++ jsonnet 0.22.0 exits 1, RUNTIME ERROR: max stack frames exceeded.
go-jsonnet 0.22.0 exits 1, RUNTIME ERROR: max stack frames exceeded.
jrsonnet 0.5.0-pre99 exits 1, stack overflow
sjsonnet master exits 1 but writes 3.8MB partial JSON first
this PR exits 1, stdout 0 bytes, sjsonnet.Error: Stackoverflow while materializing

Modification

  • Replace the object-only visitedObjects: IdentityHashMap[Val.Obj, Boolean] with visitedContainers: IdentityHashMap[Val, Boolean].
  • Guard ordinary arrays in recursive Materializer, stackless frame push/pop paths, and direct ByteRenderer materialization.
  • Keep RangeArr and ByteArr fast paths outside the guard — they are compact numeric leaf arrays, avoiding extra map traffic on hot paths.
  • Hoist sub visitor outside the while loop in ByteArr stackless fast path to avoid repeated subVisitor access and asInstanceOf cast per element.
  • Use try/finally cleanup on all three materialization paths to avoid retaining containers after errors.
  • Add golden tests for nested lazy array cycles and shared acyclic arrays.

Result

  • Cyclic arrays fail before partial output, matching reference implementations.
  • Shared acyclic arrays materialize normally (not treated as cycles).
  • Performance stays flat: upstream mean 261.5ms vs branch mean 261.8ms on array_copy_views.jsonnet.
  • Zero additional allocation on hot RangeArr/ByteArr fast paths.

References

  • C++ jsonnet core/vm.cpp checks max stack frames in newCall
  • Existing test: test_suite/error.array_recursive_manifest.jsonnet

@He-Pin He-Pin marked this pull request as draft June 24, 2026 20:30
@He-Pin He-Pin marked this pull request as ready for review June 25, 2026 03:57
Motivation:
Nested lazy arrays can re-enter the same array value during manifestation. The generic materializer and ByteRenderer previously only guarded object cycles, so recursive arrays could grow partial output until stack or memory failure.

Modification:
Track currently materialized arrays and objects in the existing identity guard, with try/finally cleanup on recursive, stackless, and ByteRenderer direct paths. Keep compact RangeArr and ByteArr fast paths outside the guard. Add golden regressions for nested lazy array cycles and shared acyclic arrays.

Result:
Recursive array manifestation now fails before writing partial JSON while shared non-cyclic arrays still materialize normally.
@He-Pin He-Pin force-pushed the fix/lazy-nested-self-ref branch from d9831b7 to 9909a95 Compare June 25, 2026 04:48
@He-Pin He-Pin marked this pull request as draft June 25, 2026 18:01
@He-Pin He-Pin marked this pull request as ready for review June 25, 2026 18:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant