gh-141858: Speed up Objects/tupleobject.c richcompare with early identity and length checks #142027
maurycy wants to merge 11 commits into python:main
    if (v == w) {
        Py_RETURN_RICHCOMPARE(0, 0, op);
    }
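
For readers who want to see how the two early exits fit together, here is a minimal standalone sketch. It is not the code from this PR: `toy_tuple`, `cmp_op`, `compare_scalars` and `toy_tuple_richcompare` are hypothetical names, the elements are plain `long` values rather than Python objects, and the exact form of the length check is an assumption based only on the PR title. It only illustrates why both shortcuts leave the result unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPython's Py_LT ... Py_GE comparison opcodes. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE } cmp_op;

/* Toy immutable sequence of longs; a stand-in for a tuple object. */
typedef struct {
    size_t len;
    const long *items;
} toy_tuple;

/* Evaluate `a op b` for two scalars, in the spirit of Py_RETURN_RICHCOMPARE. */
static bool
compare_scalars(long a, long b, cmp_op op)
{
    switch (op) {
    case OP_LT: return a < b;
    case OP_LE: return a <= b;
    case OP_EQ: return a == b;
    case OP_NE: return a != b;
    case OP_GT: return a > b;
    case OP_GE: return a >= b;
    }
    assert(0 && "invalid comparison op");
    return false;
}

static bool
toy_tuple_richcompare(const toy_tuple *v, const toy_tuple *w, cmp_op op)
{
    /* Early identity check: comparing an object with itself gives the
       same answer as comparing 0 with 0 under `op`. */
    if (v == w) {
        return compare_scalars(0, 0, op);
    }

    /* Early length check (assumed form): sequences of different lengths
       can never be equal, so == and != need no element scan. */
    if (v->len != w->len) {
        if (op == OP_EQ) {
            return false;
        }
        if (op == OP_NE) {
            return true;
        }
    }

    /* General path: find the first index where the items differ. */
    size_t n = v->len < w->len ? v->len : w->len;
    size_t i = 0;
    while (i < n && v->items[i] == w->items[i]) {
        i++;
    }
    if (i >= n) {
        /* One sequence is a prefix of the other: the lengths decide. */
        return compare_scalars((long)v->len, (long)w->len, op);
    }
    /* Otherwise the first differing pair decides. */
    return compare_scalars(v->items[i], w->items[i], op);
}
```

Without the shortcuts, the general path would scan the whole shared prefix before reaching the same answer from the length comparison, which is the work the early checks aim to skip.
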
I'm aware of https://bugs.python.org/issue30907 but I think it was premature:
- pyperformance was much less mature then,
- it was hard to go beyond simple microbenchmarks,
- there were no quantifiable arguments,
- there was generally less focus on performance in the language.
That said, even 25d53eb, without the identity check, is great (9f2a34a is main):
The `pyperformance compare` output:
pyperformance.9f2a34af747.json
==============================
Performance version: 1.13.0
Report on Linux-6.12.57+deb13-amd64-x86_64-with-glibc2.41
Number of logical CPUs: 24
Start date: 2025-11-26 20:05:17.126618
End date: 2025-11-26 21:43:38.389462
pyperformance.25d53ebaea3.json
==============================
Performance version: 1.13.0
Report on Linux-6.12.57+deb13-amd64-x86_64-with-glibc2.41
Number of logical CPUs: 24
Start date: 2025-11-27 14:00:59.954246
End date: 2025-11-27 15:54:44.937969
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| Benchmark | pyperformance.9f2a34af747.json | pyperformance.25d53ebaea3.json | Change | Significance |
+==================================+================================+================================+==============+========================+
| 2to3 | 207 ms | 207 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_generators | 289 ms | 286 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_cpu_io_mixed | 394 ms | 392 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_cpu_io_mixed_tg | 395 ms | 394 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager | 83.3 ms | 83.3 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed | 336 ms | 333 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed_tg | 368 ms | 366 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_io | 422 ms | 425 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_io_tg | 419 ms | 421 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_memoization | 162 ms | 161 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_memoization_tg | 205 ms | 205 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_tg | 156 ms | 155 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_io | 448 ms | 450 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_io_tg | 429 ms | 431 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_memoization | 222 ms | 221 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_memoization_tg | 231 ms | 232 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_none | 192 ms | 194 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_none_tg | 191 ms | 192 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_tcp | 252 ms | 277 ms | 1.10x slower | Significant (t=-35.30) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_tcp_ssl | 1.21 sec | 1.24 sec | 1.03x slower | Significant (t=-82.44) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_websockets | 357 ms | 357 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bench_mp_pool | 13.2 ms | 12.2 ms | 1.08x faster | Significant (t=9.85) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bench_thread_pool | 750 us | 674 us | 1.11x faster | Significant (t=113.01) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bpe_tokeniser | 3.58 sec | 3.64 sec | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| chameleon | 11.9 ms | 11.8 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| chaos | 43.8 ms | 44.1 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| comprehensions | 13.0 us | 12.3 us | 1.06x faster | Significant (t=8.87) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| coroutines | 17.1 ms | 17.0 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| coverage | 58.5 ms | 57.9 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| create_gc_cycles | 2.09 ms | 2.03 ms | 1.03x faster | Significant (t=8.72) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| crypto_pyaes | 58.3 ms | 57.7 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| dask | 682 ms | 531 ms | 1.28x faster | Significant (t=299.38) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy | 193 us | 196 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy_memo | 19.1 us | 19.3 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy_reduce | 2.13 us | 2.16 us | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deltablue | 2.42 ms | 2.42 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| django_template | 28.3 ms | 28.1 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| docutils | 2.11 sec | 2.12 sec | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| dulwich_log | 43.6 ms | 43.6 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| fannkuch | 273 ms | 268 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| float | 56.1 ms | 53.8 ms | 1.04x faster | Significant (t=8.11) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| gc_traversal | 4.15 ms | 4.04 ms | 1.03x faster | Significant (t=15.82) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| generators | 22.4 ms | 21.7 ms | 1.04x faster | Significant (t=21.28) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| genshi_text | 17.4 ms | 17.5 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| genshi_xml | 39.5 ms | 38.7 ms | 1.02x faster | Significant (t=10.11) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| go | 89.9 ms | 89.4 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| hexiom | 4.56 ms | 4.49 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| html5lib | 49.5 ms | 50.0 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| json_dumps | 7.49 ms | 7.29 ms | 1.03x faster | Significant (t=36.97) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| json_loads | 18.4 us | 18.8 us | 1.03x slower | Significant (t=-16.34) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_format | 4.88 us | 4.87 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_silent | 69.2 ns | 67.5 ns | 1.03x faster | Significant (t=26.85) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_simple | 4.40 us | 4.37 us | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| mako | 7.96 ms | 7.95 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| many_optionals | 862 us | 875 us | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| mdp | 939 ms | 954 ms | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| meteor_contest | 94.4 ms | 95.1 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| nbody | 68.1 ms | 69.9 ms | 1.03x slower | Significant (t=-11.08) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| nqueens | 69.1 ms | 72.1 ms | 1.04x slower | Significant (t=-56.52) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pathlib | 10.0 ms | 9.94 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle | 13.7 us | 13.7 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_dict | 25.0 us | 24.3 us | 1.03x faster | Significant (t=23.80) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_list | 3.78 us | 3.89 us | 1.03x slower | Significant (t=-13.22) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_pure_python | 241 us | 241 us | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pidigits | 184 ms | 184 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pprint_pformat | 1.12 sec | 1.16 sec | 1.04x slower | Significant (t=-31.84) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pprint_safe_repr | 547 ms | 570 ms | 1.04x slower | Significant (t=-31.75) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pyflate | 316 ms | 311 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| python_startup | 10.8 ms | 10.8 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| python_startup_no_site | 6.34 ms | 6.34 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| raytrace | 212 ms | 213 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_compile | 98.5 ms | 99.4 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_dna | 162 ms | 155 ms | 1.04x faster | Significant (t=58.02) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_effbot | 2.26 ms | 2.20 ms | 1.03x faster | Significant (t=33.39) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_v8 | 18.4 ms | 18.3 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| richards | 33.6 ms | 33.4 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| richards_super | 38.2 ms | 37.8 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_fft | 197 ms | 198 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_lu | 66.5 ms | 67.2 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_monte_carlo | 44.0 ms | 43.6 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_sor | 74.9 ms | 74.2 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_sparse_mat_mult | 2.99 ms | 3.10 ms | 1.04x slower | Significant (t=-41.64) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| spectral_norm | 63.7 ms | 63.6 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sphinx | 792 ms | 790 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlalchemy_declarative | 108 ms | 109 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlalchemy_imperative | 13.2 ms | 13.3 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_normalize | 83.2 ms | 83.1 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_optimize | 41.9 ms | 41.7 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_parse | 969 us | 987 us | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_transpile | 1.25 ms | 1.26 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlite_synth | 1.98 us | 1.97 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| subparsers | 31.3 ms | 31.0 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_expand | 360 ms | 361 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_integrate | 16.0 ms | 16.2 ms | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_str | 210 ms | 212 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_sum | 110 ms | 110 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| telco | 117 ms | 117 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| tomli_loads | 1.51 sec | 1.54 sec | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| tornado_http | 76.7 ms | 76.7 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| typing_runtime_protocols | 121 us | 121 us | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpack_sequence | 31.7 ns | 34.0 ns | 1.07x slower | Significant (t=-54.16) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle | 10.6 us | 10.3 us | 1.03x faster | Significant (t=10.82) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle_list | 3.92 us | 3.84 us | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle_pure_python | 156 us | 158 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xdsl_constant_fold | 35.9 ms | 35.8 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_generate | 67.2 ms | 68.0 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_iterparse | 66.3 ms | 65.8 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_parse | 106 ms | 105 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_process | 47.3 ms | 46.6 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
Particularly:
The `pyperformance compare` output:
pyperformance.25d53ebaea3.json
==============================
Performance version: 1.13.0
Report on Linux-6.12.57+deb13-amd64-x86_64-with-glibc2.41
Number of logical CPUs: 24
Start date: 2025-11-27 14:00:59.954246
End date: 2025-11-27 15:54:44.937969
pyperformance.9ba81106b1d.json
==============================
Performance version: 1.13.0
Report on Linux-6.12.57+deb13-amd64-x86_64-with-glibc2.41
Number of logical CPUs: 24
Start date: 2025-11-27 03:09:32.694677
End date: 2025-11-27 05:02:22.084653
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| Benchmark | pyperformance.25d53ebaea3.json | pyperformance.9ba81106b1d.json | Change | Significance |
+==================================+================================+================================+==============+========================+
| 2to3 | 207 ms | 206 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_generators | 286 ms | 290 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_cpu_io_mixed | 392 ms | 392 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_cpu_io_mixed_tg | 394 ms | 394 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager | 83.3 ms | 82.6 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed | 333 ms | 333 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_cpu_io_mixed_tg | 366 ms | 366 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_io | 425 ms | 423 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_io_tg | 421 ms | 422 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_memoization | 161 ms | 160 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_memoization_tg | 205 ms | 206 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_eager_tg | 155 ms | 156 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_io | 450 ms | 453 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_io_tg | 431 ms | 432 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_memoization | 221 ms | 222 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_memoization_tg | 232 ms | 233 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_none | 194 ms | 193 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| async_tree_none_tg | 192 ms | 192 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_tcp | 277 ms | 274 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_tcp_ssl | 1.24 sec | 1.25 sec | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| asyncio_websockets | 357 ms | 358 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bench_mp_pool | 12.2 ms | 12.4 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bench_thread_pool | 674 us | 671 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| bpe_tokeniser | 3.64 sec | 3.58 sec | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| chameleon | 11.8 ms | 11.7 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| chaos | 44.1 ms | 44.1 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| comprehensions | 12.3 us | 12.3 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| connected_components | 359 ms | 358 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| coroutines | 17.0 ms | 17.0 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| coverage | 57.9 ms | 55.3 ms | 1.05x faster | Significant (t=35.65) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| create_gc_cycles | 2.03 ms | 2.03 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| crypto_pyaes | 57.7 ms | 58.2 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| dask | 531 ms | 526 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy | 196 us | 193 us | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy_memo | 19.3 us | 18.6 us | 1.03x faster | Significant (t=54.44) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deepcopy_reduce | 2.16 us | 2.13 us | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| deltablue | 2.42 ms | 2.42 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| django_template | 28.1 ms | 28.0 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| docutils | 2.12 sec | 2.12 sec | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| dulwich_log | 43.6 ms | 43.7 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| fannkuch | 268 ms | 278 ms | 1.04x slower | Significant (t=-9.68) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| float | 53.8 ms | 54.1 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| gc_traversal | 4.04 ms | 4.06 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| generators | 21.7 ms | 21.4 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| genshi_text | 17.5 ms | 16.8 ms | 1.04x faster | Significant (t=16.98) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| genshi_xml | 38.7 ms | 38.0 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| go | 89.4 ms | 91.2 ms | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| hexiom | 4.49 ms | 4.47 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| html5lib | 50.0 ms | 49.0 ms | 1.02x faster | Significant (t=15.54) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| json_dumps | 7.29 ms | 7.40 ms | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| json_loads | 18.8 us | 19.0 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| k_core | 1.58 sec | 1.57 sec | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_format | 4.87 us | 4.86 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_silent | 67.5 ns | 67.4 ns | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| logging_simple | 4.37 us | 4.34 us | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| mako | 7.95 ms | 7.67 ms | 1.04x faster | Significant (t=12.51) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| many_optionals | 875 us | 871 us | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| mdp | 954 ms | 941 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| meteor_contest | 95.1 ms | 94.2 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| nbody | 69.9 ms | 69.3 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| nqueens | 72.1 ms | 70.0 ms | 1.03x faster | Significant (t=34.08) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pathlib | 9.94 ms | 9.83 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle | 13.7 us | 13.8 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_dict | 24.3 us | 23.6 us | 1.03x faster | Significant (t=34.28) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_list | 3.89 us | 3.78 us | 1.03x faster | Significant (t=13.83) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pickle_pure_python | 241 us | 240 us | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pidigits | 184 ms | 184 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pprint_pformat | 1.16 sec | 1.14 sec | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pprint_safe_repr | 570 ms | 557 ms | 1.02x faster | Significant (t=16.23) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| pyflate | 311 ms | 316 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| python_startup | 10.8 ms | 10.8 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| python_startup_no_site | 6.34 ms | 6.34 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| raytrace | 213 ms | 215 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_compile | 99.4 ms | 98.0 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_dna | 155 ms | 155 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_effbot | 2.20 ms | 2.19 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| regex_v8 | 18.3 ms | 18.0 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| richards | 33.4 ms | 33.0 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| richards_super | 37.8 ms | 37.5 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_fft | 198 ms | 196 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_lu | 67.2 ms | 70.2 ms | 1.04x slower | Significant (t=-22.40) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_monte_carlo | 43.6 ms | 44.4 ms | 1.02x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_sor | 74.2 ms | 75.2 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| scimark_sparse_mat_mult | 3.10 ms | 3.03 ms | 1.02x faster | Significant (t=20.57) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| shortest_path | 369 ms | 370 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| spectral_norm | 63.6 ms | 62.5 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sphinx | 790 ms | 786 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlalchemy_declarative | 109 ms | 108 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlalchemy_imperative | 13.3 ms | 13.2 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_normalize | 83.1 ms | 81.6 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_optimize | 41.7 ms | 41.3 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_parse | 987 us | 962 us | 1.03x faster | Significant (t=10.79) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlglot_v2_transpile | 1.26 ms | 1.24 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sqlite_synth | 1.97 us | 1.99 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| subparsers | 31.0 ms | 31.1 ms | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_expand | 361 ms | 353 ms | 1.02x faster | Significant (t=33.14) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_integrate | 16.2 ms | 16.0 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_str | 212 ms | 208 ms | 1.02x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| sympy_sum | 110 ms | 109 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| telco | 117 ms | 114 ms | 1.03x faster | Significant (t=15.66) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| tomli_loads | 1.54 sec | 1.50 sec | 1.03x faster | Significant (t=17.51) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| tornado_http | 76.7 ms | 76.5 ms | 1.00x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| typing_runtime_protocols | 121 us | 121 us | 1.00x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpack_sequence | 34.0 ns | 31.3 ns | 1.09x faster | Significant (t=44.25) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle | 10.3 us | 10.4 us | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle_list | 3.84 us | 4.01 us | 1.04x slower | Significant (t=-30.06) |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| unpickle_pure_python | 158 us | 156 us | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xdsl_constant_fold | 35.8 ms | 35.4 ms | 1.01x faster | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_generate | 68.0 ms | 68.7 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_iterparse | 65.8 ms | 66.1 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_parse | 105 ms | 106 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
| xml_etree_process | 46.6 ms | 47.0 ms | 1.01x slower | Not significant |
+----------------------------------+--------------------------------+--------------------------------+--------------+------------------------+
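For readers checking the Change column by hand, here is a minimal sketch that recomputes it from the two rounded timings in a row. It assumes the column is the ratio of the two reported values, rounded to two decimals; `change_label` is a hypothetical helper and not part of pyperformance.

```python
def change_label(base: float, new: float) -> str:
    """Format a speed ratio like the Change column (both values in the same unit)."""
    if new <= base:
        return f"{base / new:.2f}x faster"
    return f"{new / base:.2f}x slower"


# Rounded values copied from the table above.
print(change_label(57.9, 55.3))  # coverage
print(change_label(34.0, 31.3))  # unpack_sequence
print(change_label(268, 278))    # fannkuch
```

Because the table shows rounded timings, rows whose two values look identical (for example `192 ms` v. `192 ms`) cannot have their faster/slower direction recovered this way.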
vstinner
left a comment
Can you add comparison tests to Lib/test/seq_tests.py? Especially a test with float("nan").
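As a rough illustration of what such a test would pin down, the sketch below records how tuples containing `float("nan")` compare in CPython. It is not the test added to `Lib/test/seq_tests.py`; `nan_tuple_checks` is a hypothetical name used only here.

```python
def nan_tuple_checks() -> dict:
    """Collect tuple comparison results involving NaN."""
    nan = float("nan")
    t = (nan,)
    return {
        # NaN never compares equal to itself as a float.
        "nan_eq_nan": nan == nan,
        # The same tuple object compares equal to itself.
        "same_tuple_eq": t == t,
        # Element comparison checks identity before equality, so two tuples
        # holding the *same* NaN object compare equal.
        "same_nan_object_eq": (nan,) == (nan,),
        # Two distinct NaN objects make the tuples unequal.
        "distinct_nan_objects_eq": (float("nan"),) == (float("nan"),),
        # Ordering on the same tuple falls back to the length comparison.
        "same_tuple_lt": t < t,
        "same_tuple_le": t <= t,
    }


for name, value in nan_tuple_checks().items():
    print(f"{name}: {value}")
```

An early identity check on the whole tuple agrees with these results, because the per-element comparison already treats identical objects as equal.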
|
@vstinner Voilà! More than happy to add more. |
|
See #141858 for a wider discussion. |
|
@picnixz Thank you for spotting this. I've missed this issue. Do you think it makes sense to split this PR into two? Especially given that the identity check is not responsible for the major (>= 1.05x) improvement here. |
Co-authored-by: Victor Stinner <vstinner@python.org>
I would still want to have other core devs' opinions on this one. While I agree that pyperformance wasn't as mature then as it is now, note that we have some tests that are definitely slower, such as unpickle and json.loads (AFAICT from the pyperformance benchmarks). I would first like to reach a consensus on this optimization in the issue rather than pushing this forward. In addition, we need to be sure that every Tier-1 platform exhibits the same speed-up. On Windows, we had slowdowns if we naively added those fast paths to tuple, list, and set.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
This is interesting. Usually it is the other way around: not all benefits we see in micro benchmarks show up in pyperformance ...
Maybe @mdboom can fire up the faster pyperformance suite on this? Or could somebody run the one from Meta? These two are the de facto gold standards when it comes to benchmarking :) |
|
The Faster CPython benchmarking page is out of commission (probably indefinitely) :(. I have access to the Meta runners. However, it's late now and I need to sleep. Please ping me again so I remember to check the code and run this. Thanks. |
|
@Fidget-Spinner Thank you. I'd be grateful for checking: against the current |
|
@maurycy sorry I'm only going to run the current head of this branch with pyperformance against current main. Benchmarking takes a really long time. Hope that's okay. If the results are inconclusive, we can always re-run again in the future. |
Makes sense! The reason why I mentioned 25d53eb is that I started digging and I'm increasingly less convinced about the identity check, given how rare it is: Let's see how it goes. Thank you! Out of curiosity, how long do they take? |
|
Roughly 3 hours in total. 1.5 hours for this branch and 1.5 hours for the main branch |
|
Sorry but I cannot reproduce the benchmark results on the Meta runner https://github.com/facebookexperimental/free-threading-benchmarking/blob/61e4120f3bfa339ecca1191c1ac74bb32056c8a6/results/bm-20251202-3.15.0a1%2B-0eae50d/bm-20251202-vultr-x86_64-maurycy-maurycy_tuple_early-3.15.0a1%2B-0eae50d-vs-base.md However, there is no Dask benchmark on those runners. So I'm not sure if that plays a part. Are you confident in the Dask results presented here? Could you please re-run just the Dask benchmark to see if the results reproduce? |
|
FWIW, in the other issue (#141858 (comment)) dask had very high jitter for me. Especially noticeable in the violin plots ... |
Oh. I'm no longer sure that this change is worth it. It has basically no impact on performance. |
According to new pyperformance results, I'm no longer sure that this change is worth it
|
Yeah, I think so, too. This most probably will only show in hand-crafted micro benchmarks. But identical tuples can happen to be compared, see #141858 (comment). |
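A hand-crafted microbenchmark of the kind mentioned here could look like the sketch below. It is not the script used in this PR; the tuple size, the repeat counts, and the `bench` helper are assumptions chosen for illustration, and results from `timeit` are far noisier than a tuned `pyperf` run.

```python
import timeit


def bench(stmt: str, setup: str, number: int = 20_000, repeat: int = 5) -> float:
    """Return the best observed time per statement execution, in nanoseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9


# Three cases: the early identity check targets the first one,
# the early length check targets the last one.
cases = {
    "identical object": "a = tuple(range(1000)); b = a",
    "equal, distinct objects": "a = tuple(range(1000)); b = tuple(range(1000))",
    "different lengths": "a = tuple(range(1000)); b = tuple(range(999))",
}

results = {name: bench("a == b", setup) for name, setup in cases.items()}
for name, ns in results.items():
    print(f"{name:<26} {ns:10.1f} ns per comparison")
```

Running it on builds with and without the fast paths shows the difference only for the first and last cases, which is why the effect may not surface in an application-level suite.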
|
@chris-eibl @vstinner @Fidget-Spinner Any outcome is good! The identity check keeps coming back. Coupled with the bftrace, it will at least make a good argument in the future. I really appreciate all the effort. The discrepancy bugs me a bit. For full transparency, I'm attaching my raw pyperformance runs: pyperformance.c9015394.json There was @Fidget-Spinner For the
I see this pretty reliably. Attaching the raw files just for 0eae50d7b48ddc0a19834642ff29e221f8737f4a.dask.pyperformance.json That said:
|
|
I think it isn't jitter but rather something else is up with the Dask benchmark. |
|
I'm closing it.
Neither early identity nor length checks are frequent enough in the code base, as proved by the Thank you @Fidget-Spinner @chris-eibl |
`==` and `!=` when tuple lengths differ, to avoid a walk through the whole tuple

See below for more detailed benchmarks (9f2a34a v. c901539)
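The two early exits this PR proposes for `==` can be modelled in pure Python as follows. This is only a sketch of the control flow, not the C code in `Objects/tupleobject.c`; `tuple_eq_model` is a hypothetical name.

```python
def tuple_eq_model(v: tuple, w: tuple) -> bool:
    """Pure-Python model of tuple == with early identity and length checks."""
    if v is w:
        # Early identity check: the same object is equal to itself.
        return True
    if len(v) != len(w):
        # Early length check: tuples of different lengths can never be
        # equal, so the element walk is skipped entirely.
        return False
    # Element walk; each pair is treated as equal when it is the same
    # object or when it compares equal.
    return all(a is b or a == b for a, b in zip(v, w))


nan = float("nan")
samples = [
    ((1, 2, 3), (1, 2, 3)),
    ((1, 2, 3), (1, 2)),
    ((1, 2, 3), (1, 2, 4)),
    ((nan,), (nan,)),
    ((float("nan"),), (float("nan"),)),
    ((), ()),
]
for v, w in samples:
    print(len(v), len(w), tuple_eq_model(v, w), v == w)
```

For `!=` the result is the negation; the ordering operators still need the element walk to find the first differing pair.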
Significant (+ >= 1.05) improvements on pyperformance:
Significant regressions (+ <= 1.05) on pyperformance
I'm not exactly sure, but my hunch is that's the place hit in `asyncio_tcp`: cpython/Lib/asyncio/events.py, lines 182 to 188 in 5ec03cf.
I couldn't reproduce this with a microbenchmark, though.
The `float("nan")` behaviour is not changed:

Benchmark
The script:
The results (with `--rigorous`, on 0813448 v. 9f2a34a):

The environment:
`sudo ./python -m pyperf system tune` ensured.

pyperformance
The results:
pyperformance (without identity check)
Significant improvements
Significant regressions
`UNPACK_SEQUENCE_TUPLE` and `_LIST` shouldn't call `tuple_richcompare`; the difference is just 2.3ns, which might be caused by jitter or by branch predictor history)

The results: