WIP: Test.run intermittent <<loop>> under -threaded -N#156
Draft
omnibs wants to merge 2 commits into
A minimal reproduction for an intermittent `<<loop>>` (NonTermination) crash from Test.run under -threaded with multiple capabilities (+RTS -N): a dozen ungrouped tests, each decoding a tiny document to a distinct type (a distinct decoder CAF), run in parallel by the test runner. Crashes in ~0.3% of runs at -N>=4 and never at -N1. Flag-gated behind `parallel-loop-bug` (off by default), so normal builds are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tried aeson eitherDecodeStrict' (trivial "{}", a rich nested record, and
20 distinct recursive types) — all 0/10000 at -N12, vs ~0.27% for the YAML
version. The trigger appears specific to the libyaml-backed YAML decode
path, not concurrent decoding in general.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ArthurJordao
approved these changes
Jun 4, 2026
Member
Author
This is just a repro script, not a fix!
WIP: this PR adds a minimal reproduction under `nri-prelude/scripts/parallel-loop-bug` so we can work on a fix. No library change yet.
Summary
A test executable built with `Test.run`, compiled `-threaded` and run with `+RTS -N` (multiple capabilities), intermittently crashes with an uncaught `<<loop>>` (`NonTermination`) before printing a report, i.e. a CI flake (~0.2–0.3% of runs on a 12-core box). It comes from the parallel execution of ungrouped tests (`Test.Internal.runStrategy` → `Task.parallel` → `Async.forConcurrently`). `Test.serialize` makes it disappear, which is the current workaround.

Reproduction
See `scripts/parallel-loop-bug/` (README + `Main.hs`). It's a dozen ungrouped tests, each decoding a tiny document to a distinct type. Build it flag-gated and loop it on a multi-core box:
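The exact build and loop commands live in the script's README. As a rough sketch only, a driver along these lines could do the looping and counting. The executable path, the argument order, and the assumption that the test binary accepts `+RTS` options (for example because it was built with `-rtsopts`) are illustrative and not taken from this PR.

```haskell
-- Hypothetical loop driver: run a test executable many times and count
-- the runs whose stderr contains the RTS "<<loop>>" marker.
module Main (main) where

import Control.Monad (replicateM)
import Data.List (isInfixOf)
import System.Environment (getArgs)
import System.Process (readProcessWithExitCode)

-- | True when a run's stderr contains the NonTermination marker.
looped :: String -> Bool
looped stderrText = "<<loop>>" `isInfixOf` stderrText

-- | Run the executable once with the given capability flag (e.g. "-N12")
-- and report whether that run printed "<<loop>>".
runOnce :: FilePath -> String -> IO Bool
runOnce exe nFlag = do
  (_exitCode, _out, err) <-
    readProcessWithExitCode exe ["+RTS", nFlag, "-RTS"] ""
  pure (looped err)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [exe, nFlag, runsStr] -> do
      let runs = read runsStr :: Int
      results <- replicateM runs (runOnce exe nFlag)
      let hits = length (filter id results)
      putStrLn (show hits ++ "/" ++ show runs
                  ++ " runs printed <<loop>> at " ++ nFlag)
    _ -> putStrLn "usage: loop-driver EXE N_FLAG RUNS"
```

Invoked as, say, `loop-driver ./path/to/test-exe -N12 10000`, it prints a hit count in the same `hits/runs` form used for the figures in this description.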
Observed 27/10000 (~0.27%) at `-N12`; 0/10000 at `-N1`.

Symptom
- `<<loop>>` on stderr and no test report.
- It is not raised from a test body (those exceptions the runner wraps and reports as failures), so it originates in the orchestration/forcing layer, not inside a test.
Observations (GHC 9.8.4, 12-core, looping the process)
Variants measured for `<<loop>>` rate: `-N1`/`-N2`; `-N4`; `-N8`/`-N12`; `-N12`; `-N12`; `Expect.pass` tests, `-N12`; `"{}"`, rich nested record, AND 20 recursive types, `-N12`; `Async.forConcurrently` (no `Test.run`), `-N12`; `Test.serialize`, `-N12`.

Takeaways:
- Never at `-N1`/`-N2`; scales with capability count.
- `Async.forConcurrently` of the same decode work never reproduced it; going through `Test.run` does.
- Identical decoders and `Expect.pass` suites never loop; ~12 distinct ones do.
- The `aeson` equivalent did not reproduce in any variant tried (trivial `"{}"`, a rich nested record, and 20 distinct recursive types: all 0/10000), while YAML does. So the trigger seems tied to what the `yaml` package does (libyaml over FFI, with `unsafePerformIO`), not to concurrent decoding in general.
- `Test.serialize` reliably avoids it.

What we ruled out
- A genuine infinite loop: `-N1` is clean across thousands of runs; a real self-referential thunk would loop regardless of `-N`.
- A known GHC issue (`<<loop>>` under concurrent STM): fixed in 8.2.1; this is 9.8.4.
- Concurrent decoding in general (see above); it's specific to the libyaml-backed YAML path. (We did not exhaustively rule out aeson.)
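For contrast with the intermittent crash, here is a minimal sketch of the deterministic case ruled out in the first item: a thunk that depends on its own value. The names are made up for illustration. The RTS's loop detection is best-effort, so this typically reports `<<loop>>` at any `-N`, but that outcome is not guaranteed.

```haskell
module Main (main) where

import Control.Exception (NonTermination (..), evaluate, try)

-- A CAF whose value depends on itself: forcing it re-enters the same
-- black-holed closure from the same thread.
selfReferential :: Int
selfReferential = selfReferential + 1
{-# NOINLINE selfReferential #-}

-- | Render the outcome of forcing the thunk.
describeOutcome :: Either NonTermination Int -> String
describeOutcome (Left NonTermination) = "caught NonTermination (<<loop>>)"
describeOutcome (Right n) = "unexpectedly got " ++ show n

main :: IO ()
main = do
  -- 'try' at type NonTermination catches only the exception the RTS
  -- raises when it detects the self-dependency.
  outcome <- try (evaluate selfReferential)
  putStrLn (describeOutcome outcome)
```

The point of the contrast is that nothing here depends on scheduling, which is what separates it from the `-N`-dependent flake described above.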
Hypothesis
A threaded-RTS black-hole / deadlock-detector firing on concurrent forcing
of multiple distinct shared CAFs (per-type decoder dictionaries/thunks). When
several capabilities each enter a different CAF and then need one another's
(already black-holed) CAFs, an unlucky interleaving forms a transient
block-cycle the RTS reports as `NonTermination`. Consistent with: identical decoders not tripping it, scaling with `-N`, and being masked by profiling (a `-fprof-late` build run with `+RTS -xc` does not reproduce, so no Haskell-level stack; a timing race, not a deterministic cycle).
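The workload shape the hypothesis describes can be sketched with `base` alone: several threads, each forcing a different top-level CAF, where some CAFs depend on others. This is an illustration under assumptions (the CAFs are arbitrary arithmetic standing in for per-type decoders) and is not expected to reproduce the crash; the dependencies are acyclic, so a correct run forces everything.

```haskell
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, try)
import Control.Monad (forM, forM_)

-- Three distinct top-level CAFs standing in for per-type decoders.
-- cafB and cafC depend on other CAFs, so a thread forcing one of them
-- may find a closure already black-holed by another capability.
cafA :: Integer
cafA = sum [1 .. 200000]
{-# NOINLINE cafA #-}

cafB :: Integer
cafB = cafA + product [1 .. 200]
{-# NOINLINE cafB #-}

cafC :: Integer
cafC = cafA + cafB
{-# NOINLINE cafC #-}

-- | Force each value on its own thread and collect the outcomes,
-- including any exception the RTS raises in that thread.
forceAll :: [Integer] -> IO [Either SomeException Integer]
forceAll cafs = do
  slots <- forM cafs $ \caf -> do
    slot <- newEmptyMVar
    _ <- forkIO (try (evaluate caf) >>= putMVar slot)
    pure slot
  mapM takeMVar slots

main :: IO ()
main = do
  outcomes <- forceAll [cafC, cafB, cafA]
  forM_ outcomes $ \outcome ->
    putStrLn (either (\e -> "exception: " ++ show e) (const "forced ok") outcome)
```

Built with `-threaded -rtsopts` and run with `+RTS -N4`, the threads can land on different capabilities; a thread that meets a black hole owned by another thread should simply block until the owner finishes.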
Open questions / directions
- `Task.parallel`? Should test parallelism be opt-in or capped, or `serialize` documented as the remedy for `<<loop>>`?
- Does `runSingle` (per-test tracing-span `MVar`/`IORef`, the `Task.timeout` racer) widen the window, vs. this being a pure GHC RTS issue to report upstream?
🤖 Generated with Claude Code