Add InternalDocs/qsbr.md.

nascheme · nascheme · commit ccea6000526e · 2025-06-11T15:55:32.000-07:00
diff --git a/InternalDocs/README.md b/InternalDocs/README.md
@@ -41,3 +41,5 @@ Program Execution
 - [Garbage Collector Design](garbage_collector.md)
 
 - [Exception Handling](exception_handling.md)
+
+- [Quiescent-State Based Reclamation (QSBR)](qsbr.md)
diff --git a/InternalDocs/qsbr.md b/InternalDocs/qsbr.md
@@ -0,0 +1,129 @@
+# Quiescent-State Based Reclamation (QSBR)
+
+## Introduction
+
+When implementing lock-free data structures, a key challenge is determining
+when it is safe to free memory that has been logically removed from a
+structure. Freeing memory too early can lead to use-after-free bugs if another
+thread is still accessing it. Freeing it too late results in excessive memory
+consumption.
+
+Safe memory reclamation (SMR) schemes address this by delaying the free
+operation until all concurrent read accesses are guaranteed to have completed.
+Quiescent-State Based Reclamation (QSBR) is an SMR scheme used in Python's
+free-threaded build to manage the lifecycle of shared memory.
+
+QSBR requires threads to periodically report that they are in a quiescent
+state. A thread is in a quiescent state if it holds no references to shared
+objects that might be reclaimed. Think of it as a checkpoint where a thread
+signals, "I am not in the middle of any operation that relies on a shared
+resource." In Python, the eval_breaker provides a natural and convenient place
+for threads to report this state.
+
+
+## Use in Free-Threaded Python
+
+While CPython's memory management is dominated by reference counting and a
+tracing garbage collector, these mechanisms are not suitable for all data
+structures. For example, the backing array of a list object is not individually
+reference-counted but may have a shorter lifetime than the PyListObject that
+contains it. We could delay reclamation until the next GC run, but we want
+reclamation to be prompt and to run the GC less frequently in the free-threaded
+build, as it requires pausing all threads.
+
+Many operations in the free-threaded build are protected by locks. However, for
+performance-critical code, we want to allow reads to happen concurrently with
+updates. For instance, we want to avoid locking during most list read accesses.
+If a list is resized while another thread is reading it, QSBR provides the
+mechanism to determine when it is safe to free the list's old backing array.
+
+Specific use cases for QSBR include:
+
+* Dictionary keys (PyDictKeysObject) and list arrays (ob_item): When a
+dictionary or list that may be shared between threads is resized, we use QSBR
+to delay freeing the old keys or array until it's safe. For dicts and lists
+that are not shared, their storage can be freed immediately upon resize.
+
+* Mimalloc mi_page_t: Non-locking dictionary and list accesses require
+cooperation from the memory allocator. If an object is freed and its memory is
+reused, we must ensure the new object's reference count field is at the same
+memory location. In practice, this means when a mimalloc page (mi_page_t)
+becomes empty, we don't immediately allow it to be reused for allocations of a
+different size class. QSBR is used to determine when it's safe to repurpose the
+page or return its memory to the OS.
+
+
+## Implementation Details
+
+
+### Core Implementation
+
+The proposal to add QSBR to Python is contained in Github issue 115103 [1].
+Many details of that proposal have been copied here, so they can be kept
+up-to-date with the actual implementation.
+
+Python's QSBR implementation is based on FreeBSD's "Global Unbounded
+Sequences." [2, 3, 4].  It relies on a few key counters:
+
+* Global Write Sequence (`wr_seq`): A per-interpreter counter, `wr_seq`, is started
+at 1 and incremented by 2 each time it is advanced. This ensures its value is
+always odd, which can be used to distinguish it from other state values. When
+an object needs to be reclaimed, `wr_seq` is advanced, and the object is tagged
+with this new sequence number.
+
+* Per-Thread Read Sequence: Each thread has a local read sequence counter. When
+a thread reaches a quiescent state (e.g., at the eval_breaker), it copies the
+current global `wr_seq` to its local counter.
+
+* Global Read Sequence (`rd_seq`): This per-interpreter value stores the minimum
+of all per-thread read sequence counters (excluding detached threads). It is
+updated by a "polling" operation.
+
+To free an object, the following steps are taken:
+
+1. Advance the global `wr_seq`.
+
+2. Add the object's pointer to a deferred-free list, tagging it with the new
+   `wr_seq` value as its qsbr_goal.
+
+Periodically, a polling mechanism processes this deferred-free list:
+
+1. The minimum read sequence value across all active threads is calculated and
+   stored as the global `rd_seq`.
+
+2. For each item on the deferred-free list, if its qsbr_goal is less than the
+   new `rd_seq`, its memory is freed, and it is removed from the list. Otherwise,
+   it remains on the list for a future attempt.
+
+
+### Deferred Advance Optimization
+
+To reduce memory contention from frequent updates to the global `wr_seq`, its
+advancement is sometimes deferred. Instead of incrementing `wr_seq` on every
+reclamation request, each thread tracks its number of deferrals locally. Once
+the deferral count reaches a limit (QSBR_DEFERRED_LIMIT, currently 10), the
+thread advances the global `wr_seq` and resets its local count.
+
+When an object is added to the deferred-free list, its qsbr_goal is set to
+`wr_seq` + 2. By setting the goal to the next sequence value, we ensure it's safe
+to defer the global counter advancement. This optimization improves runtime
+speed but may increase peak memory usage by slightly delaying when memory can
+be reclaimed.
+
+
+## Limitations
+
+Determining the `rd_seq` requires scanning over all thread states. This operation
+could become a bottleneck in applications with a very large number of threads
+(e.g., >1,000). Future work may address this with more advanced mechanisms,
+such as a tree-based structure or incremental scanning. For now, the
+implementation prioritizes simplicity, with plans for refinement if
+multi-threaded benchmarks reveal performance issues.
+
+
+## References
+
+1. https://github.com/python/cpython/issues/115103
+2. https://youtu.be/ZXUIFj4nRjk?t=694
+3. https://people.kernel.org/joelfernandes/gus-vs-rcu
+4. http://bxr.su/FreeBSD/sys/kern/subr_smr.c#44
diff --git a/Python/qsbr.c b/Python/qsbr.c
@@ -1,6 +1,6 @@
 /*
  * Implementation of safe memory reclamation scheme using
- * quiescent states.
+ * quiescent states.  See InternalDocs/qsbr.md.
  *
  * This is derived from the "GUS" safe memory reclamation technique
  * in FreeBSD written by Jeffrey Roberson. It is heavily modified. Any bugs

Original file line number	Diff line number	Diff line change
`@@ -1,6 +1,6 @@`
`1`	`1`	`/*`
`2`	`2`	`* Implementation of safe memory reclamation scheme using`
`3`		`- * quiescent states.`
	`3`	`+ * quiescent states. See InternalDocs/qsbr.md.`
`4`	`4`	`*`
`5`	`5`	`* This is derived from the "GUS" safe memory reclamation technique`
`6`	`6`	`* in FreeBSD written by Jeffrey Roberson. It is heavily modified. Any bugs`