Release the GIL across libmaxminddb lookup calls #362
efroemling wants to merge 1 commit into maxmind:main from
Conversation
get_record() previously held the GIL across MMDB_lookup_sockaddr and MMDB_get_entry_data_list. These walk the read-only mmap and touch no Python state, but if the relevant pages aren't resident (cold cache, slow disk, memory pressure) a single page-fault inside the walk stalls every Python thread for the duration of the disk wait, not just the calling thread. Wrap both calls in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. The pthread_rwlock_t read lock is held throughout, and the module already declares Py_MOD_GIL_NOT_USED, so this just brings the GIL-build fast path in line with the existing free-threaded guarantees.
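As a sketch of the ordering described above (the rwlock helper and field names here are hypothetical, not the extension's actual identifiers):

```c
/* Hypothetical sketch of the wrapped lookup; lock/field names are illustrative. */
rwlock_rdlock(&self->lock);       /* read lock taken while still holding the GIL */
Py_BEGIN_ALLOW_THREADS            /* drop the GIL: the walk touches no Python state */
result = MMDB_lookup_sockaddr(self->mmdb, ip_address, &mmdb_error);
Py_END_ALLOW_THREADS              /* reacquire before any Python C-API use */
/* ...decode entry data and build Python objects under the GIL... */
rwlock_unlock(&self->lock);
```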
Code Review
This pull request aims to improve concurrency by releasing the Python GIL during MaxMind DB lookups and data retrieval, preventing the interpreter from stalling during I/O or page-ins. However, the review identifies a critical safety issue: in 'GIL-only' mode (the default for standard Python builds), the internal rwlock is disabled, making the GIL the only protection against concurrent resource deallocation. Releasing the GIL in this state creates a race condition with Reader.close() that could lead to a use-after-free crash. The reviewer recommends wrapping the GIL release blocks in #ifndef MAXMINDDB_USE_GIL_ONLY to ensure safety across different build configurations.
Py_BEGIN_ALLOW_THREADS
result = MMDB_lookup_sockaddr(mmdb, ip_address, &mmdb_error);
Py_END_ALLOW_THREADS
Releasing the GIL here is unsafe when the extension is compiled in 'GIL-only' mode (the default for standard Python builds, as defined on lines 24-26). In this mode, the internal rwlock is a no-op, and the GIL is the only mechanism protecting the mmdb structure from being freed by a concurrent Reader.close() call in another thread. If the GIL is released, a race condition occurs that can lead to a use-after-free crash.
To fix this safely, wrap the block in #ifndef MAXMINDDB_USE_GIL_ONLY. To enable this optimization for standard Python, you should also update the locking logic at the top of the file to use real locks (pthreads/Windows) even when the GIL is present.
#ifndef MAXMINDDB_USE_GIL_ONLY
Py_BEGIN_ALLOW_THREADS
#endif
result = MMDB_lookup_sockaddr(mmdb, ip_address, &mmdb_error);
#ifndef MAXMINDDB_USE_GIL_ONLY
Py_END_ALLOW_THREADS
#endif

Py_BEGIN_ALLOW_THREADS
status = MMDB_get_entry_data_list(&result.entry, &entry_data_list);
Py_END_ALLOW_THREADS
Similar to the previous lookup call, releasing the GIL here is unsafe in 'GIL-only' mode because the rwlock does not provide actual synchronization. A concurrent call to Reader.close() could free the mmdb structure while MMDB_get_entry_data_list is accessing the mmap.
You should wrap this block in #ifndef MAXMINDDB_USE_GIL_ONLY or ensure that real locks are used even in standard Python builds.
#ifndef MAXMINDDB_USE_GIL_ONLY
Py_BEGIN_ALLOW_THREADS
#endif
status = MMDB_get_entry_data_list(&result.entry, &entry_data_list);
#ifndef MAXMINDDB_USE_GIL_ONLY
Py_END_ALLOW_THREADS
#endif
@gemini-code-assist was right, and the CI failure on … bore that out. The simple fix (…):

Slow-path parallelism is preserved (~10x), and the single-thread warm-path cost is ~7% (split roughly 4% GIL handoff + 3% rwlock). But 8-thread warm-path throughput drops from 168k to 99k ops/s - GIL ping-pong plus rwlock cacheline contention. Asking every user to pay that for a benefit that only kicks in during disk stalls is a hard sell.

Going to close this for now. Might revisit from a different angle (opt-in build flag, or just …).
Thank you for the detailed follow-up and for sharing those performance benchmarks. It is completely understandable to prioritize the stability and performance of the common case over the edge-case stalls, especially given the non-trivial overhead introduced by the rwlock contention in multi-threaded scenarios. Closing the PR is a reasonable decision given these findings. I appreciate the thorough investigation into the trade-offs here; it provides valuable context for anyone who might revisit this problem in the future.
Summary
I've been running into some occasional long Python stalls across all threads and tracked them down to geoip lookups. I've found that a few `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS` additions can prevent such stalls and greatly increase parallelism.

The lookup path - `Reader.get()` and `Reader.get_with_prefix_len()`, which back geoip2's `city()`/`country()`/etc. - holds the GIL across `MMDB_lookup_sockaddr` and `MMDB_get_entry_data_list`. Both walk the read-only mmap and touch no Python state, but if the relevant pages aren't resident - cold cache after start-up, slow EBS, memory pressure on a long-running process - a single page fault inside the walk stalls every Python thread for the duration of the disk wait, not just the calling thread.

This patch wraps both calls in `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS`. No change to fast-path behavior; on the slow path, only the calling thread waits.

Why it's safe

- `Py_MOD_GIL_NOT_USED` is already declared (free-threaded build), so the maintainers have committed that this code is safe to run with no GIL.
- The `pthread_rwlock_t` read lock is acquired before the wrap and released after, so a concurrent close/destroy can't pull `mmdb` out from under us.
- The wrapped calls touch only the read-only mmap, the `MMDB_s` (immutable post-open), and stack locals. No Python C-API calls inside.

Measured effect
Microbenchmark with a 10ms `usleep` inserted into `get_record` (a stand-in for a page-fault stall), 10 threads × 100 lookups, comparing the baseline against the `Py_BEGIN_ALLOW_THREADS` variant.

For warm-cache lookups the per-call cost is dominated by tree-walk work, so the GIL release/acquire overhead is in the noise - bench numbers without the artificial sleep are identical between variants.
Notes
Another `MMDB_get_entry_data_list` call exists inside the iterator's `__iter__` path (around line 856 in 3.1.1). The same reasoning would apply, but it's a separate code path and I left it out of this PR to keep the change tightly scoped to the hot lookup path. Happy to follow up if you'd like it covered.