[codex] add Windows file locking fallback#9
Conversation
Da7-Tech
left a comment
There was a problem hiding this comment.
Thanks for taking this on — and for the red-test-first validation notes, that's exactly the workflow this repo asks for. The direction is right and the fake-import regression test is a clean way to exercise the branch from POSIX. Two things need fixing before this can merge:
1. LK_LOCK doesn't block like flock does — a contended save now crashes on Windows (the main issue).
msvcrt.locking(fd, LK_LOCK, 1) is not an indefinite blocking lock: the CRT retries once per second, 10 times, then raises OSError. In this diff that OSError propagates out of _save() (the except ImportError around it doesn't catch it), so two agent processes saving concurrently — precisely the scenario the lock exists for — turn into a hard failure of remember/dream after ~10s, where POSIX flock would simply wait. The fallback as written trades "no lock" for "crash under contention", which is a regression in the one case that matters.
The parity fix is small — wrap the acquire in a retry loop so it blocks indefinitely like flock:
while True:
try:
msvcrt.locking(lockf.fileno(), msvcrt.LK_LOCK, 1)
break
except OSError:
continue # LK_LOCK already slept ~10s; keep waiting like flock2. Please add a regression test for that contention path.
Your FakeMsvcrt is already the right harness: make locking raise OSError on the first LK_LOCK call and succeed on the second, and assert the save completes (and the graph file is written) instead of raising. That's the test that fails on the current diff and passes with the retry loop.
Two smaller notes, no action needed:
windows-latestis already in the CI matrix onmain, so once this lands the realmsvcrtpath (not just the fake) gets exercised on every push — no workflow changes needed in this PR.mainmoved to v5.4.2 today (a precision-review release that touches_save's merge block and its tests). Please rebase; the conflict, if any, should be trivial.
The seek-to-0 lock/unlock symmetry, the lock_backend tuple, and keeping the read-merge-write path intact all look correct to me. With the retry loop + test this is mergeable. Thanks again!
LK_LOCK retries once per second only 10 times, then raises OSError — so a save contended for >10s crashed instead of waiting. Wrap the acquire in a retry loop and add a contention regression test (fails without the loop). Also rebased onto v5.4.2.
f4f6881 to
9259145
Compare
|
Heads-up: to unblock the release train I used maintainer-edit access to push directly to your branch rather than wait on the review round-trip — hope that's okay! Two things landed: (1) a rebase onto v5.4.2 (the CHANGELOG conflict is resolved, your Unreleased entry kept on top), and (2) the retry loop from the review, plus a contention regression test built on your FakeMsvcrt harness (it fails without the loop, passes with it). Your commit and authorship are intact; my commit sits on top. If CI is green this merges as-is — thanks again for closing the last platform gap! |
Summary
msvcrt.lockingfallback whenfcntlis unavailableCloses #2.
Validation
python3 -m unittest discover -s tests -p test_mind.py -vfailed because nomsvcrt.lockingcalls occurredpython3 -m unittest discover -s tests -p test_mind.py -vpython3 -m compileall mind.py tests/test_mind.pypython3 bench/bench.pypython3 bench/soak.pygit diff --check