gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance#132109
vstinner merged 6 commits into python:main
Conversation
Speed up conversion from `bytes-like` objects like `bytearray` while
keeping conversion from `bytes` stable.
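For context, `int.from_bytes` already accepts exact `bytes`, buffer-exporting objects such as `bytearray` and `memoryview`, and iterables of integers; this change only affects how the buffer-exporting inputs are converted internally. A quick illustrative sketch (values here are arbitrary, not from the PR):

```python
# int.from_bytes accepts exact bytes, buffer exporters, and iterables
# of ints; the fast path discussed in this PR concerns the
# buffer-exporting inputs such as bytearray and memoryview.
data = b'\x01\x00'

print(int.from_bytes(data, byteorder='big'))              # exact bytes -> 256
print(int.from_bytes(bytearray(data), byteorder='big'))   # buffer exporter -> 256
print(int.from_bytes(memoryview(data), byteorder='big'))  # buffer exporter -> 256
```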
On a `--with-lto --enable-optimizations` build on my 64-bit Linux box:
new:
from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns
old:
from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns
Benchmark code:
```python
import pyperf
import time


def from_bytes_flags(loops):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0


sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]
sample_bytearray = [bytearray(v) for v in sample_bytes]


def bench_convert(loops, values):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0


runner = pyperf.Runner()
runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
```
picnixz left a comment:
Can we have benchmarks for very large bytes? Maybe you can also say how much we're gaining in the NEWS entry that way.
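One quick way to spot-check very large inputs without a full pyperf run (the size and iteration count here are arbitrary choices, not from the PR):

```python
import timeit

# A deliberately large input to exercise the conversion loop itself
# rather than per-call overhead.
big = b'\xff' * 4096
big_ba = bytearray(big)

t_bytes = timeit.timeit(lambda: int.from_bytes(big, 'big'), number=10_000)
t_ba = timeit.timeit(lambda: int.from_bytes(big_ba, 'big'), number=10_000)
print(f"bytes: {t_bytes:.4f}s  bytearray: {t_ba:.4f}s")
```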
Small question: how do we cope with classes that explicitly define `__bytes__`? Instead, we should restrict ourselves to exact buffer objects, namely exact `bytes` and `bytearray` objects.
I want to check that the edge cases are not an issue.
Cases include classes which implement `__bytes__`. As you point out, if code returns a different set of machine bytes when exporting via the buffer protocol vs `__bytes__`, behavior changes. We could match existing behavior by always checking for a `__bytes__` method first, or we could restrict the fast path to known CPython types (`bytes` and `bytearray`). Walking through common types passed to `int.from_bytes` would help confirm the choice.
sobolevn left a comment:
This is a breaking change. Example.

Before:

```python
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
88
```

After:

```python
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
97
```

On the NEWS entry:

```
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as a
:class:`bytearray`.
```

Suggested change:

```
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as
:class:`bytes` and :class:`bytearray`.
```
I do not think that it affects bytes.

Docs say: "If …"

```python
>>> class int2(int):
...     def __float__(self):
...         return 3.14
...
>>> float(int2(123))
3.14
```
This is an example that the method resolution order changes. It now ignores a custom `__bytes__`. The reverse logic is true: the PR's author must prove that it does not break things.
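A minimal illustration of the divergence being discussed (the class name is hypothetical): `bytes()` consults `__bytes__`, while a buffer export sees the raw underlying data.

```python
class Weird(bytes):
    # Hypothetical subclass whose __bytes__ disagrees with the
    # buffer it exports (the raw bytes stored in the object).
    def __bytes__(self):
        return b'via-__bytes__'

w = Weird(b'raw')
print(bytes(w))              # __bytes__ path -> b'via-__bytes__'
print(bytes(memoryview(w)))  # buffer path   -> b'raw'
```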
I would say that if you expose something different via the buffer protocol and `__bytes__`, that's a bug in the class.
I'll see if I can make a largely performance-neutral version that checks … :

```python
>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
...     def __buffer__(self, flags):
...         return memoryview(b'c')
...
>>> class same_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
...     def __buffer__(self, flags):
...         return memoryview(b'b')
...
>>> int.from_bytes(distinct_bytes_buffer(b'a'))
99
>>> int.from_bytes(same_bytes_buffer(b'a'))
98
>>> int.from_bytes(b'a')
97
>>> int.from_bytes(b'b')
98
>>> int.from_bytes(b'c')
99
```
Another edge case around these: `__bytes__` may itself return a `bytes` subclass with its own `__bytes__` and `__buffer__`:

```python
>>> class my_bytes(bytes):
...     def __bytes__(self):
...         return b"bytes"
...     def __buffer__(self, flags):
...         return memoryview(b"buffer")
...
>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return my_bytes(b"ob_sval")
...     def __buffer__(self, flags):
...         return memoryview(b"distinct_buffer")
...
>>> a = distinct_bytes_buffer(b"distinct_ob_sval")
>>> bytes(a)
b'ob_sval'
```
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Created a branch which matches the resolution order of `PyObject_Bytes`. So @sobolevn's example now returns the same value both before and after:

```python
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
88
```

Should I incorporate it here? (cc: @serhiy-storchaka, @sobolevn, @skirpichev)

full diff from main: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/bytes_first?collapse=1
diff from PR: cmaloney@189f219
Also, the docs say: "The argument bytes must either be a bytes-like object or an iterable producing bytes." Something is wrong: either the implementation (in main) or the docs.
It may be an iterable producing bytes (not bytes objects, but integers in the range 0 to 255).
Yes, this part of the sentence is at least unclear. But I meant the first part, which has a reference to the glossary term.
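The "iterable producing bytes" path mentioned above takes integers in 0–255, treated as the big- or little-endian digits of the result:

```python
# Each element must be an int in range(256).
print(int.from_bytes([255, 0, 0], byteorder='big'))     # 0xFF0000 -> 16711680
print(int.from_bytes([255, 0, 0], byteorder='little'))  # 0x0000FF -> 255
```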
from_bytes_flags: Mean +- std dev: [main] 28.3 ns +- 1.3 ns -> [exactbytes] 27.3 ns +- 0.3 ns: 1.04x faster
bench_convert[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.3 ns -> [exactbytes] 53.1 ns +- 5.1 ns: 1.24x faster
bench_convert_big[bytes]: Mean +- std dev: [main] 51.8 ns +- 0.6 ns -> [exactbytes] 50.3 ns +- 0.5 ns: 1.03x faster
bench_convert_big[bytearray]: Mean +- std dev: [main] 65.8 ns +- 3.0 ns -> [exactbytes] 53.5 ns +- 5.3 ns: 1.23x faster
Benchmark hidden because not significant (1): bench_convert[bytes]
Updated to use exact-type checks (the `exactbytes` results above). Updated benchmark code:
```python
import pyperf
import time


def from_bytes_flags(loops):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0


sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]
sample_bytearray = [bytearray(v) for v in sample_bytes]
sample_big = [
    b'\xff' * 128,
    b'\xff' * 256,
    b'\xff' * 512,
]
sample_big_ba = [bytearray(v) for v in sample_big]


def bench_convert(loops, values):
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0


runner = pyperf.Runner()
# Validate base bytes w/ flags doesn't change perf.
runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
runner.bench_time_func('bench_convert_big[bytes]', bench_convert, sample_big, inner_loops=10)
runner.bench_time_func('bench_convert_big[bytearray]', bench_convert, sample_big_ba, inner_loops=10)
```
Anything I can do to help close out this PR?
ping @serhiy-storchaka: Trying to find ways to close out this PR. I'm happy to update this to match the past resolution order, or leave it as-is if the prior review stands.
I think that we should fix inconsistencies in one of two ways: …

I do not know which is better. We should open a discussion for this.
A more general solution would definitely be nice. If I scope this change down to just the `PyBytes_CheckExact` optimization, would that work in the meantime?
gpshead left a comment:
I think we could go ahead with this PR even while waiting on the larger discussion, as it does make sense. Regardless, if you want to do a simpler version that just adds the `PyBytes_CheckExact` optimization for starters, also feel free. That is a pure optimization of existing behavior.
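A rough pure-Python sketch of what the `PyBytes_CheckExact` fast path means (the function name and structure here are illustrative, not the CPython implementation):

```python
def from_bytes_like(data, byteorder='big', *, signed=False):
    # Exact bytes: no buffer export needed (PyBytes_CheckExact analogue).
    if type(data) is bytes:
        buf = data
    else:
        # Generic path: export the data via the buffer protocol.
        buf = bytes(memoryview(data))
    return int.from_bytes(buf, byteorder, signed=signed)

print(from_bytes_like(bytearray(b'\x01\x00')))          # 256
print(from_bytes_like(b'\xfc\x00', signed=True))        # -1024
```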
vstinner left a comment:
LGTM. The latest PR is now backward compatible.
```python
>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...
>>> int.from_bytes(X(b'a'))
97
```
There's still a change in ordering between `__bytes__` and the buffer protocol. On 3.14.2: … On 3.15 with this change: …
The "All required checks pass" CI check failed. Merging main into this branch should fix the issue.
It's a bug in the class if these two methods don't return the same contents. I don't think that it's an issue.
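For reference, a subclass that keeps the two in sync (the class name is hypothetical; slicing a `bytes` subclass yields plain `bytes`, which avoids recursing into `__bytes__`):

```python
class Frame(bytes):
    # Hypothetical subclass: __bytes__ returns the same data the
    # buffer protocol exports, so all consumers agree.
    def __bytes__(self):
        return self[:]  # slicing yields plain bytes, not Frame

f = Frame(b'\x00\x10')
assert bytes(f) == bytes(memoryview(f))    # same contents either way
print(int.from_bytes(f, byteorder='big'))  # 16
```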
👍 to "they should be the same" / I'm happy shipping this as it is.
Merged, thank you.
…rove performance (python#132109)

Speed up conversion from `bytes-like` objects like `bytearray` while keeping conversion from `bytes` stable.

Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>