GH-137759:Limit _PyObject_HashFast to dicts keys, rename it, and mark it as Py_ALWAYS_INLINE by JasonMendoza2008 · Pull Request #137828 · python/cpython

JasonMendoza2008 · 2025-08-15T17:12:56Z

Context:

setobject.c used a mix of _PyObject_HashFast and PyObject_Hash. The consensus¹ was that _PyObject_HashFast should only really be used for dict keys (string-only sets are not so omnipresent). This PR uses the standard PyObject_Hash without the fast path (making set[Any] faster and set[str] slower²), renames _PyObject_HashFast since its name was misleading and marked it as Py_ALWAYS_INLINE to force inlining.

Pretty sure this doesn't require a NEWS entry?

Issue: Shouldn't set_intersection use _PyObject_HashFast()? #137759

There were discussions in the same issue around removing _PyObject_HashFast completely and adding the fast path for strings (and even potentially integers) to PyObject_Hash but there was no consensus so I'm not including it for now. ↩
In this comment , I run pyperformance on two versions of setobject.c, one using HashFast everywhere and one using the standard Hash everywhere. None of these correspond to the current implementation of setobject.c since the current implementation uses a mix of both. Obviously in all cases, dictionaries use HashFast. ↩

…d mark it as Py_ALWAYS_INLINE

python-cla-bot · 2025-08-15T17:13:00Z

All commit authors signed the Contributor License Agreement.

ZeroIntensity · 2025-08-15T17:58:29Z

making set[Any] faster and set[str] slower

I'm not totally convinced that this is true. Detecting the fast-path for strings is just one pointer comparison, which really won't have a noticeable performance impact. I agree that we want _PyObject_HashFast/_PyObject_HashDictKey for inlining purposes, but I don't see why the fast path has to be limited to it.

JasonMendoza2008 · 2025-08-15T18:03:08Z

making set[Any] faster and set[str] slower

I'm not totally convinced that this is true. Detecting the fast-path for strings is just one pointer comparison, which really won't have a noticeable performance impact. I agree that we want _PyObject_HashFast/_PyObject_HashDictKey for inlining purposes, but I don't see why the fast path has to be limited to it.

I'm fine with adding a fast-path for strings and integers (but then do we even need _PyObject_HashFast/_PyObject_HashDictKey , we could just put everything in PyObject_Hash¹). I didn't since there was no consensus.

However, we probably can't inline that function since it's bigger? I'm not sure... I'm no expert in what's too big to be inlined. ↩

ZeroIntensity · 2025-08-15T18:07:32Z

but then do we even need HashFast, we could just put everything in PyObject_Hash

Yeah, because even if there's no speedup there, it's just simpler maintenance (in the future, we might want to make other assumptions about a dictionary key's hash that improve performance).

JasonMendoza2008 · 2025-08-15T18:09:48Z

I'm not totally convinced that this is true. Detecting the fast-path for strings is just one pointer comparison, which really won't have a noticeable performance impact

Playing devil's advocate: if we don't detect it, we're a a function pointer dereference and call away from getting the hash anyway, right? The hash is not recomputed without that fast-path since it's stored (for strings, but for integers it's just an AND operation if I remember correctly)?

ZeroIntensity · 2025-08-15T18:28:57Z

A function pointer call has a lot more to do than just some reads, but maybe the overhead isn't noticeable?

JasonMendoza2008 · 2025-08-16T13:02:48Z

A function pointer call has a lot more to do than just some reads, but maybe the overhead isn't noticeable?

The best thing to answer this without guessing is probably to compile both (or all three, if we also include a fast path for integers) and run pyperformance.

JasonMendoza2008 · 2025-08-25T11:53:19Z

making set[Any] faster and set[str] slower

I'm not totally convinced that this is true. Detecting the fast-path for strings is just one pointer comparison, which really won't have a noticeable performance impact. I agree that we want _PyObject_HashFast/_PyObject_HashDictKey for inlining purposes, but I don't see why the fast path has to be limited to it.

Here is a comparison using pyperformance for what it's worth.

Pyperformance version: 1.11.0
Report on Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Number of logical CPUs: 24

Run 1

Run 2 (using the -r option which is supposed to get more accurate results by running everything more)

Both versions were compiled with ./configure --enable-optimizations --with-lto

vstinner · 2025-12-08T12:31:14Z

The set type has a fast-path for str hash for 20 years, since commit 9f1a679:

commit 9f1a6796eb83a2884df5fd93487634e46d8830a7
Author: Raymond Hettinger <python@rcn.com>
Date:   Sun Jul 31 01:16:36 2005 +0000

    Revised the set() and frozenset() implementaion to use its own internal
    data structure instead of using dictionaries.  Reduces memory consumption
    by 1/3 and provides modest speed-ups for most set operations.

(...)
+       if (PyString_CheckExact(key)) {
+               hash = ((PyStringObject *)key)->ob_shash;
+               if (hash == -1)
+                       hash = PyObject_Hash(key);
+       } else {
+               hash = PyObject_Hash(key);
+               if (hash == -1)
+                       return -1;
+       }
(...)

The consensus was that _PyObject_HashFast should only really be used for dict keys (string-only sets are not so omnipresent).

set implementation is very close to the dict implementation. So similar dict optimizations can be used in set.

marked it as Py_ALWAYS_INLINE to force inlining.

I dislike this macro, I prefer to rely on the compiler to decide if a function should be inlined or not. The compiler is smart enough to decide if a function should be inlined depending on the pressure on register allocation and other stuffs.

vstinner · 2025-12-08T12:43:45Z

I tried to remove the optimization for str:

diff --git a/Objects/setobject.c b/Objects/setobject.c
index 85f4d7d4031..5c8139a7c5f 100644
--- a/Objects/setobject.c
+++ b/Objects/setobject.c
@@ -409,7 +409,7 @@ set_discard_entry(PySetObject *so, PyObject *key, Py_hash_t hash)
static int
set_add_key(PySetObject *so, PyObject *key)
{
-    Py_hash_t hash = _PyObject_HashFast(key);
+    Py_hash_t hash = PyObject_Hash(key);
    if (hash == -1) {
        set_unhashable_type(key);
        return -1;

Results on Linux with CPU isolation and Python built with gcc -O3:

Benchmark	_PyObject_HashFast	PyObject_Hash
set(range(256))	5.58 us	5.16 us: 1.08x faster
set(map(chr, range(256)))	6.33 us	6.74 us: 1.06x slower
Geometric mean	(ref)	1.01x faster

So yes, set(integers) is 8% faster, but set(strings) becomes 6% slower. The question is how common it is to create a set of strings? In case of doubt, I prefer to leave the code as it is.

Benchmark script:

import pyperf
runner = pyperf.Runner()
runner.timeit('set(range(256))',
    setup='data = tuple(range(256)); Set=set',
    stmt='Set(data)')
runner.timeit('set(map(chr, range(256)))',
    setup='data = tuple(map(chr, range(256))); Set=set',
    stmt='Set(data)')

JasonMendoza2008 · 2025-12-09T13:01:17Z

I tried to remove the optimization for str:
diff --git a/Objects/setobject.c b/Objects/setobject.c

index 85f4d7d4031..5c8139a7c5f 100644

--- a/Objects/setobject.c

+++ b/Objects/setobject.c

@@ -409,7 +409,7 @@ set_discard_entry(PySetObject *so, PyObject *key, Py_hash_t hash)

static int

set_add_key(PySetObject *so, PyObject *key)

{

-    Py_hash_t hash = _PyObject_HashFast(key);

+    Py_hash_t hash = PyObject_Hash(key);

    if (hash == -1) {

        set_unhashable_type(key);

        return -1;
Results on Linux with CPU isolation and Python built with gcc -O3:

| Benchmark | _PyObject_HashFast | PyObject_Hash |

|---------------------------|:------------------:|:---------------------:|

| set(range(256)) | 5.58 us | 5.16 us: 1.08x faster |

| set(map(chr, range(256))) | 6.33 us | 6.74 us: 1.06x slower |

| Geometric mean | (ref) | 1.01x faster |

So yes, set(integers) is 8% faster, but set(strings) becomes 6% slower. The question is how common it is to create a set of strings? In case of doubt, I prefer to leave the code as it is.

Benchmark script:
import pyperf

runner = pyperf.Runner()

runner.timeit('set(range(256))',

    setup='data = tuple(range(256)); Set=set',

    stmt='Set(data)')

runner.timeit('set(map(chr, range(256)))',

    setup='data = tuple(map(chr, range(256))); Set=set',

    stmt='Set(data)')

~~The problem is that set sometimes has the fast path and sometimes doesn't have it. So we can't really leave the code as it is, we have to either include it everywhere or nowhere right?~~

Nevermind I now realize I misunderstood what you meant by leaving the code as it is. Leaving the fast path everywhere and fixing where it was forgotten seem fine to me but other people (cf. issue) seemed to disagree so I'll let you guys decide.

vstinner · 2025-12-09T13:05:51Z

Using _PyObject_HashFast() in set_intersection() sounds reasonable to me.

github-actions · 2026-04-28T06:53:26Z

This PR is stale because it has been open for 30 days with no activity.

JasonMendoza2008 · 2026-04-28T10:55:15Z

Using _PyObject_HashFast() in set_intersection() sounds reasonable to me.

Should someone that has the authority to do so do that? Not sure how to move forward this PR.

pythonGH-137759:Limit _PyObject_HashFast to dicts keys, rename it, an…

207cccd

…d mark it as Py_ALWAYS_INLINE

JasonMendoza2008 requested review from markshannon, methane and rhettinger as code owners August 15, 2025 17:12

bedevere-app Bot added the awaiting review label Aug 15, 2025

This comment was marked as resolved.

Sign in to view

bedevere-app Bot mentioned this pull request Aug 15, 2025

Shouldn't set_intersection use _PyObject_HashFast()? #137759

Open

Merge branch 'main' into fix-issue-137759

5fe204c

This comment was marked as resolved.

Sign in to view

ZeroIntensity added the skip news label Aug 15, 2025

rhettinger removed their request for review August 16, 2025 04:17

serhiy-storchaka requested review from rhettinger and vstinner December 8, 2025 08:31

Merge branch 'main' into fix-issue-137759

2a8ad4e

github-actions Bot added the stale Stale PR or inactive for long period of time. label Apr 28, 2026

Uh oh!

Conversation

JasonMendoza2008 commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

python-cla-bot Bot commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as resolved.

This comment was marked as resolved.

ZeroIntensity commented Aug 15, 2025

Uh oh!

JasonMendoza2008 commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Footnotes

Uh oh!

ZeroIntensity commented Aug 15, 2025

Uh oh!

JasonMendoza2008 commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ZeroIntensity commented Aug 15, 2025

Uh oh!

JasonMendoza2008 commented Aug 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JasonMendoza2008 commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Dec 8, 2025

Uh oh!

vstinner commented Dec 8, 2025

Uh oh!

JasonMendoza2008 commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Dec 9, 2025

Uh oh!

github-actions Bot commented Apr 28, 2026

Uh oh!

JasonMendoza2008 commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

JasonMendoza2008 commented Aug 15, 2025 •

edited

Loading

python-cla-bot Bot commented Aug 15, 2025 •

edited

Loading

JasonMendoza2008 commented Aug 15, 2025 •

edited

Loading

JasonMendoza2008 commented Aug 15, 2025 •

edited

Loading

JasonMendoza2008 commented Aug 16, 2025 •

edited

Loading

JasonMendoza2008 commented Aug 25, 2025 •

edited

Loading

JasonMendoza2008 commented Dec 9, 2025 •

edited

Loading