Release the GIL across libmaxminddb lookup calls#362

Closed
efroemling wants to merge 1 commit into maxmind:main from efroemling:release-gil-on-lookup

Conversation

@efroemling

Summary

I've been running into occasional long stalls across all Python threads and tracked them down to GeoIP lookups. A few Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS additions prevent these stalls and greatly improve parallelism.

The lookup path - Reader.get() and Reader.get_with_prefix_len(), which back geoip2's city() / country() / etc. - holds the GIL across MMDB_lookup_sockaddr and MMDB_get_entry_data_list. Both walk the read-only mmap and touch no Python state, but if the relevant pages aren't resident - cold cache after start-up, slow EBS, memory pressure on a long-running process - a single page-fault inside the walk stalls every Python thread for the duration of the disk wait, not just the calling thread.

This patch wraps both calls in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. No change to fast-path behavior; on the slow path, only the calling thread waits.
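To see the stall this addresses, here's a minimal repro sketch (my own, not part of the patch): a pure-Python heartbeat thread records how long it goes without getting the GIL back while another thread runs lookups. The DB_PATH, the 5 ms threshold, and the address pattern are illustrative, and it only shows gaps when the file's pages are actually cold (fresh boot, dropped caches, etc.):

import threading
import time

import maxminddb

DB_PATH = "GeoLite2-City.mmdb"  # illustrative; any .mmdb works
THRESHOLD = 0.005  # 5 ms: gaps longer than this suggest a cross-thread stall

def heartbeat(stop, gaps):
    # Pure-Python loop; under the GIL it can only make progress while no
    # other thread is holding the GIL across a blocking call.
    last = time.monotonic()
    while not stop.is_set():
        time.sleep(0.001)
        now = time.monotonic()
        if now - last > THRESHOLD:
            gaps.append(now - last)
        last = now

stop, gaps = threading.Event(), []
hb = threading.Thread(target=heartbeat, args=(stop, gaps))
hb.start()

reader = maxminddb.open_database(DB_PATH)
try:
    for i in range(10_000):
        # Spread lookups across the tree so cold pages get touched.
        reader.get(f"1.2.{i % 256}.{(i // 256) % 256}")
finally:
    reader.close()

stop.set()
hb.join()
print(f"{len(gaps)} heartbeat gaps > {THRESHOLD * 1000:.0f} ms; "
      f"worst {max(gaps, default=0.0) * 1000:.1f} ms")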

Why it's safe

  • Py_MOD_GIL_NOT_USED is already declared (free-threaded build), so the maintainers have already committed to this code being safe to run without the GIL.
  • The pthread_rwlock_t read lock is acquired before the wrap and released after, so concurrent close/destroy can't pull mmdb out from under us.
  • The wrapped sections touch only the mmap, the MMDB_s (immutable post-open), and stack-locals. No Python C-API calls inside.

Measured effect

Microbenchmark with a 10ms usleep inserted into get_record (stand-in for a page-fault stall), 10 threads × 100 lookups:

| Variant | Wall-clock | Throughput | Effective parallelism |
|---|---|---|---|
| Current (GIL held) | 12.13 s | 82 ops/s | 0.82× |
| Patched (Py_BEGIN_ALLOW_THREADS) | 1.23 s | 815 ops/s | 8.14× |

For warm-cache lookups the per-call cost is dominated by tree-walk work, so the GIL release/acquire overhead is in the noise - bench numbers without the artificial sleep are identical between variants.
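For context, the Python side of the harness looked roughly like this (a sketch with an illustrative DB_PATH and address pattern; the 10 ms stall itself was a usleep injected into get_record in the C extension, so running this unmodified only exercises the warm path):

import threading
import time

import maxminddb

DB_PATH = "GeoLite2-City.mmdb"  # illustrative
THREADS, LOOKUPS = 10, 100

reader = maxminddb.open_database(DB_PATH)

def worker():
    for i in range(LOOKUPS):
        reader.get(f"8.8.{i % 256}.8")

threads = [threading.Thread(target=worker) for _ in range(THREADS)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

total = THREADS * LOOKUPS
# "Effective parallelism" only makes sense with the injected 10 ms stall:
# total serialized stall time is total * 0.010 s, divided by wall time.
print(f"{elapsed:.2f} s wall, {total / elapsed:.0f} ops/s, "
      f"{(total * 0.010) / elapsed:.2f}x effective parallelism")
reader.close()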

Notes

  • A third MMDB_get_entry_data_list call exists inside the iterator's __iter__ path (around line 856 in 3.1.1). Same reasoning would apply, but it's a separate code path and I left it out of this PR to keep the change tightly scoped to the hot lookup path. Happy to follow up if you'd like it covered.

get_record() previously held the GIL across MMDB_lookup_sockaddr and
MMDB_get_entry_data_list. These walk the read-only mmap and touch no
Python state, but if the relevant pages aren't resident (cold cache,
slow disk, memory pressure) a single page-fault inside the walk stalls
every Python thread for the duration of the disk wait, not just the
calling thread.

Wrap both calls in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. The
pthread_rwlock_t read lock is held throughout, and the module already
declares Py_MOD_GIL_NOT_USED, so this just brings the GIL-build fast
path in line with the existing free-threaded guarantees.

@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to improve concurrency by releasing the Python GIL during MaxMind DB lookups and data retrieval, preventing the interpreter from stalling during I/O or page-ins. However, the review identifies a critical safety issue: in 'GIL-only' mode (the default for standard Python builds), the internal rwlock is disabled, making the GIL the only protection against concurrent resource deallocation. Releasing the GIL in this state creates a race condition with Reader.close() that could lead to a use-after-free crash. The reviewer recommends wrapping the GIL release blocks in #ifndef MAXMINDDB_USE_GIL_ONLY to ensure safety across different build configurations.

Comment thread: extension/maxminddb.c
Comment on lines +454 to +456
    Py_BEGIN_ALLOW_THREADS
    result = MMDB_lookup_sockaddr(mmdb, ip_address, &mmdb_error);
    Py_END_ALLOW_THREADS


Severity: high

Releasing the GIL here is unsafe when the extension is compiled in 'GIL-only' mode (the default for standard Python builds, as defined on lines 24-26). In this mode, the internal rwlock is a no-op, and the GIL is the only mechanism protecting the mmdb structure from being freed by a concurrent Reader.close() call in another thread. If the GIL is released, a race condition occurs that can lead to a use-after-free crash.

To fix this safely, wrap the block in #ifndef MAXMINDDB_USE_GIL_ONLY. To enable this optimization for standard Python, you should also update the locking logic at the top of the file to use real locks (pthreads/Windows) even when the GIL is present.

#ifndef MAXMINDDB_USE_GIL_ONLY
    Py_BEGIN_ALLOW_THREADS
#endif
    result = MMDB_lookup_sockaddr(mmdb, ip_address, &mmdb_error);
#ifndef MAXMINDDB_USE_GIL_ONLY
    Py_END_ALLOW_THREADS
#endif

Comment thread: extension/maxminddb.c
Comment on lines +494 to +496
    Py_BEGIN_ALLOW_THREADS
    status = MMDB_get_entry_data_list(&result.entry, &entry_data_list);
    Py_END_ALLOW_THREADS


Severity: high

Similar to the previous lookup call, releasing the GIL here is unsafe in 'GIL-only' mode because the rwlock does not provide actual synchronization. A concurrent call to Reader.close() could free the mmdb structure while MMDB_get_entry_data_list is accessing the mmap.

You should wrap this block in #ifndef MAXMINDDB_USE_GIL_ONLY or ensure that real locks are used even in standard Python builds.

#ifndef MAXMINDDB_USE_GIL_ONLY
    Py_BEGIN_ALLOW_THREADS
#endif
    status = MMDB_get_entry_data_list(&result.entry, &entry_data_list);
#ifndef MAXMINDDB_USE_GIL_ONLY
    Py_END_ALLOW_THREADS
#endif

@efroemling
Author

@gemini-code-assist was right, and the CI failure on tests/threading_test.py::test_read_during_close confirmed it. On regular GIL builds, MAXMINDDB_USE_GIL_ONLY is defined, the rwlock is a { char dummy } no-op, and the GIL is the only thing keeping Reader.close() from racing with an in-progress lookup. My safety argument was based on a misreading.

The simple fix (#ifndef MAXMINDDB_USE_GIL_ONLY around the GIL release) makes the patch correct but turns it into a no-op on the build profile that motivated it - regular CPython, where slow mmap reads stall the whole interpreter via the GIL. So I explored the bigger change: make the rwlock unconditional, then release the GIL safely. Quick local numbers (macOS arm64, 16 cores, GeoLite2-City, single Reader):

| Variant | 1 thread (warm) | 8 threads (warm) | 10 threads w/ 10ms stall |
|---|---|---|---|
| Today | 165k ops/s | 168k ops/s | 12.1 s |
| GIL release alone (unsafe) | 158k ops/s | 161k ops/s | 1.2 s |
| Unconditional rwlock + GIL release | 153k ops/s | 99k ops/s | 1.2 s |

Slow-path parallelism is preserved (~10x), and the single-thread warm-path cost is ~7% (roughly 4% GIL handoff + 3% rwlock). But 8-thread warm-path throughput drops from 168k to 99k ops/s - GIL ping-pong plus rwlock cache-line contention. Asking every user to pay that for a benefit that only kicks in during disk stalls is a hard sell.

Going to close this for now. Might revisit from a different angle (opt-in build flag, or just MODE_MEMORY on my end). Thanks for the careful review.
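For anyone hitting the same stalls in the meantime, the MODE_MEMORY workaround is a one-liner (a sketch with an illustrative path; it trades holding the whole database in resident memory for never touching the filesystem at lookup time):

import maxminddb

# MODE_MEMORY reads the whole database into a buffer at open() time, so
# lookups never page-fault against the filesystem while holding the GIL.
reader = maxminddb.open_database(
    "GeoLite2-City.mmdb",  # illustrative path
    maxminddb.MODE_MEMORY,
)
print(reader.get("8.8.8.8"))
reader.close()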

@efroemling efroemling closed this May 9, 2026
@gemini-code-assist

Thank you for the detailed follow-up and for sharing those performance benchmarks. It is completely understandable to prioritize the stability and performance of the common case over the edge-case stalls, especially given the non-trivial overhead introduced by the rwlock contention in multi-threaded scenarios. Closing the PR is a reasonable decision given these findings. I appreciate the thorough investigation into the trade-offs here; it provides valuable context for anyone who might revisit this problem in the future.

