Commit ccea600

Add InternalDocs/qsbr.md.
1 parent b706ff0 commit ccea600

3 files changed

Lines changed: 132 additions & 1 deletion

InternalDocs/README.md

Lines changed: 2 additions & 0 deletions
@@ -41,3 +41,5 @@ Program Execution
 - [Garbage Collector Design](garbage_collector.md)

 - [Exception Handling](exception_handling.md)
+
+- [Quiescent-State Based Reclamation (QSBR)](qsbr.md)

InternalDocs/qsbr.md

Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
# Quiescent-State Based Reclamation (QSBR)

## Introduction

When implementing lock-free data structures, a key challenge is determining
when it is safe to free memory that has been logically removed from a
structure. Freeing memory too early can lead to use-after-free bugs if another
thread is still accessing it. Freeing it too late results in excessive memory
consumption.

Safe memory reclamation (SMR) schemes address this by delaying the free
operation until all concurrent read accesses are guaranteed to have completed.
Quiescent-State Based Reclamation (QSBR) is an SMR scheme used in Python's
free-threaded build to manage the lifecycle of shared memory.

QSBR requires threads to periodically report that they are in a quiescent
state. A thread is in a quiescent state if it holds no references to shared
objects that might be reclaimed. Think of it as a checkpoint where a thread
signals, "I am not in the middle of any operation that relies on a shared
resource." In Python, the eval_breaker provides a natural and convenient place
for threads to report this state.

## Use in Free-Threaded Python

While CPython's memory management is dominated by reference counting and a
tracing garbage collector, these mechanisms are not suitable for all data
structures. For example, the backing array of a list object is not individually
reference-counted but may have a shorter lifetime than the PyListObject that
contains it. We could delay reclamation until the next GC run, but we want
reclamation to be prompt and to run the GC less frequently in the free-threaded
build, as it requires pausing all threads.

Many operations in the free-threaded build are protected by locks. However, for
performance-critical code, we want to allow reads to happen concurrently with
updates. For instance, we want to avoid locking during most list read accesses.
If a list is resized while another thread is reading it, QSBR provides the
mechanism to determine when it is safe to free the list's old backing array.

Specific use cases for QSBR include:

* Dictionary keys (PyDictKeysObject) and list arrays (ob_item): When a
  dictionary or list that may be shared between threads is resized, we use QSBR
  to delay freeing the old keys or array until it's safe. For dicts and lists
  that are not shared, their storage can be freed immediately upon resize.

* Mimalloc mi_page_t: Non-locking dictionary and list accesses require
  cooperation from the memory allocator. If an object is freed and its memory is
  reused, we must ensure the new object's reference count field is at the same
  memory location. In practice, this means when a mimalloc page (mi_page_t)
  becomes empty, we don't immediately allow it to be reused for allocations of a
  different size class. QSBR is used to determine when it's safe to repurpose the
  page or return its memory to the OS.

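The shared versus unshared distinction in the first use case can be sketched as a small decision function. This is an illustrative Python model rather than CPython code: `release_old_storage`, `freed_now`, and `deferred` are hypothetical names, and the `is_shared` flag stands in for however the runtime records that a container may be visible to other threads.

```python
freed_now = []   # storage released immediately (hypothetical bookkeeping)
deferred = []    # storage waiting until QSBR reports it is safe to free


def release_old_storage(is_shared, old_storage):
    """Dispose of the storage that a resize has just replaced."""
    if is_shared:
        # Another thread may still be reading the old array or keys object
        # without holding a lock, so the free has to be delayed.
        deferred.append(old_storage)
    else:
        # Only the owning thread can reach this storage; free it right away.
        freed_now.append(old_storage)


release_old_storage(False, "private list array")
release_old_storage(True, "shared dict keys")
```

After the two calls, the private array sits in `freed_now` and the shared keys object sits in `deferred`.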
## Implementation Details

### Core Implementation

The proposal to add QSBR to Python is contained in GitHub issue 115103 [1].
Many details of that proposal have been copied here, so they can be kept
up-to-date with the actual implementation.

Python's QSBR implementation is based on FreeBSD's "Global Unbounded
Sequences" (GUS) [2, 3, 4]. It relies on a few key counters:

* Global Write Sequence (`wr_seq`): A per-interpreter counter that starts at 1
  and is incremented by 2 each time it is advanced. Its value is therefore
  always odd, which can be used to distinguish it from other state values. When
  an object needs to be reclaimed, `wr_seq` is advanced, and the object is tagged
  with this new sequence number.

* Per-Thread Read Sequence: Each thread has a local read sequence counter. When
  a thread reaches a quiescent state (e.g., at the eval_breaker), it copies the
  current global `wr_seq` to its local counter.

* Global Read Sequence (`rd_seq`): This per-interpreter value stores the minimum
  of all per-thread read sequence counters (excluding detached threads). It is
  updated by a "polling" operation.

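The three counters can be modelled in a few lines of single-threaded Python. This is a minimal sketch, not the C implementation: the class and method names are hypothetical, using 0 to mark a detached thread is an assumption, and falling back to `wr_seq` when no thread is active is also an assumption. Real code would use atomic operations where this model uses plain assignments.

```python
from dataclasses import dataclass, field

QSBR_INITIAL = 1   # wr_seq starts at 1
QSBR_INCR = 2      # and advances by 2, so it is always odd
QSBR_OFFLINE = 0   # assumed marker for a detached thread (never a valid sequence)


@dataclass
class ThreadState:
    seq: int = QSBR_OFFLINE   # per-thread read sequence


@dataclass
class Shared:
    wr_seq: int = QSBR_INITIAL
    rd_seq: int = QSBR_INITIAL
    threads: list = field(default_factory=list)

    def attach(self):
        """Register a thread; it starts out having observed the current wr_seq."""
        thread = ThreadState(seq=self.wr_seq)
        self.threads.append(thread)
        return thread

    def advance(self):
        """Advance the global write sequence and return the new value."""
        self.wr_seq += QSBR_INCR
        return self.wr_seq

    def quiescent_state(self, thread):
        """Report a quiescent state by copying wr_seq into the thread's counter."""
        thread.seq = self.wr_seq

    def poll(self):
        """Recompute rd_seq as the minimum over all non-detached threads."""
        active = [t.seq for t in self.threads if t.seq != QSBR_OFFLINE]
        self.rd_seq = min(active, default=self.wr_seq)
        return self.rd_seq


shared = Shared()
a, b = shared.attach(), shared.attach()
shared.advance()             # wr_seq: 1 -> 3
shared.quiescent_state(a)    # a has observed 3, b is still at 1
print(shared.poll())         # 1: b holds rd_seq back
shared.quiescent_state(b)
print(shared.poll())         # 3: every active thread has caught up
```

The usage at the bottom shows why `rd_seq` is a minimum: a single thread that has not yet reported a quiescent state keeps the global read sequence from moving forward.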
To free an object, the following steps are taken:

1. Advance the global `wr_seq`.

2. Add the object's pointer to a deferred-free list, tagging it with the new
   `wr_seq` value as its qsbr_goal.

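The two steps for freeing an object can be sketched as follows. This is a simplified Python model in which `DeferredFreeList`, `free_delayed`, and `pending` are hypothetical names; strings stand in for object pointers, and nothing is actually freed.

```python
from collections import deque

QSBR_INITIAL = 1
QSBR_INCR = 2


class DeferredFreeList:
    def __init__(self):
        self.wr_seq = QSBR_INITIAL
        self.pending = deque()   # (qsbr_goal, object) pairs, oldest first

    def free_delayed(self, obj):
        # Step 1: advance the global write sequence.
        self.wr_seq += QSBR_INCR
        # Step 2: queue the object, tagged with the new value as its goal.
        goal = self.wr_seq
        self.pending.append((goal, obj))
        return goal


frees = DeferredFreeList()
frees.free_delayed("old list array")   # tagged with goal 3
frees.free_delayed("old dict keys")    # tagged with goal 5
```

Because `wr_seq` only grows, the queue stays ordered by goal, so a consumer can stop at the first entry whose goal has not been reached.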
Periodically, a polling mechanism processes this deferred-free list:

1. The minimum read sequence value across all active threads is calculated and
   stored as the global `rd_seq`.

2. For each item on the deferred-free list, if its qsbr_goal is less than or
   equal to the new `rd_seq`, its memory is freed, and it is removed from the
   list. Otherwise, it remains on the list for a future attempt.

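The polling pass can be sketched as a pure function over the pending list. This is an illustrative Python model with hypothetical names (`process_deferred`, `thread_seqs`). It assumes at least one active thread, and it treats a goal as reached once `rd_seq` has caught up to it (`goal <= rd_seq`), which is an assumption about the exact comparison used by the implementation.

```python
def process_deferred(pending, thread_seqs):
    """Run one polling pass.

    pending:     list of (qsbr_goal, object) pairs
    thread_seqs: read sequences of the active (non-detached) threads
    Returns (rd_seq, freed_objects, still_pending).
    """
    # Step 1: the global read sequence is the minimum over active threads.
    rd_seq = min(thread_seqs)
    # Step 2: free every item whose goal has been reached; keep the rest.
    freed = [obj for goal, obj in pending if goal <= rd_seq]
    remaining = [(goal, obj) for goal, obj in pending if goal > rd_seq]
    return rd_seq, freed, remaining


pending = [(3, "old list array"), (5, "old dict keys")]
rd_seq, freed, remaining = process_deferred(pending, thread_seqs=[5, 3])
print(rd_seq, freed, remaining)
# 3 ['old list array'] [(5, 'old dict keys')]
```

Here one thread has only observed sequence 3, so the item with goal 5 stays on the list until that thread reports another quiescent state.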
### Deferred Advance Optimization

To reduce memory contention from frequent updates to the global `wr_seq`, its
advancement is sometimes deferred. Instead of incrementing `wr_seq` on every
reclamation request, each thread tracks its number of deferrals locally. Once
the deferral count reaches a limit (QSBR_DEFERRED_LIMIT, currently 10), the
thread advances the global `wr_seq` and resets its local count.

When an object is added to the deferred-free list, its qsbr_goal is set to
`wr_seq + 2`. Because that goal is the next sequence value, it cannot be
reached until some thread actually advances `wr_seq` and every active thread
then reports a quiescent state, so it is safe to defer the global counter
advancement. This optimization improves runtime speed but may increase peak
memory usage by slightly delaying when memory can be reclaimed.

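The deferral logic can be sketched in Python as shown here. This is a simplified single-threaded model: `Shared`, `ThreadLocal`, and `deferred_advance` are hypothetical names, and the real counters would be updated atomically.

```python
QSBR_INCR = 2
QSBR_DEFERRED_LIMIT = 10   # limit quoted in the text above


class Shared:
    def __init__(self):
        self.wr_seq = 1        # global write sequence


class ThreadLocal:
    def __init__(self):
        self.deferrals = 0     # per-thread count of deferred advances


def deferred_advance(shared, local):
    """Return the qsbr_goal for one reclamation request."""
    local.deferrals += 1
    if local.deferrals < QSBR_DEFERRED_LIMIT:
        # Deferred: leave wr_seq untouched and target the *next* value.
        return shared.wr_seq + QSBR_INCR
    # Limit reached: really advance the global counter and reset the count.
    local.deferrals = 0
    shared.wr_seq += QSBR_INCR
    return shared.wr_seq


shared, local = Shared(), ThreadLocal()
goals = [deferred_advance(shared, local) for _ in range(QSBR_DEFERRED_LIMIT)]
```

In this run all ten requests receive the goal 3, but only the tenth writes to the shared counter. The first nine objects cannot be freed until that advance has happened and every active thread has observed it, which is where the extra memory latency comes from.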
## Limitations

Determining the `rd_seq` requires scanning over all thread states. This operation
could become a bottleneck in applications with a very large number of threads
(e.g., >1,000). Future work may address this with more advanced mechanisms,
such as a tree-based structure or incremental scanning. For now, the
implementation prioritizes simplicity, with plans for refinement if
multi-threaded benchmarks reveal performance issues.

## References

1. https://github.com/python/cpython/issues/115103
2. https://youtu.be/ZXUIFj4nRjk?t=694
3. https://people.kernel.org/joelfernandes/gus-vs-rcu
4. http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44

Python/qsbr.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
 /*
  * Implementation of safe memory reclamation scheme using
- * quiescent states.
+ * quiescent states. See InternalDocs/qsbr.md.
  *
  * This is derived from the "GUS" safe memory reclamation technique
  * in FreeBSD written by Jeffrey Roberson. It is heavily modified. Any bugs
