Commit e488932

gh-148729: use memchr in SRE prefix scan
For single-byte characters, use `memchr` instead of the equivalent hand-written while loop. As a result, the prefix scan in `re.search` is typically vectorized through libc for regexes starting with a `LITERAL`. In the no-match case the scan covers 16 or 32 bytes per iteration instead of a single byte (the old loop was unrolled, but not auto-vectorized).

Signed-off-by: Harmen Stoppels <harmenstoppels@gmail.com>
1 parent 7ce737e commit e488932
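
The equivalence the commit message relies on can be sketched in isolation. The following is a minimal illustration, not code from the patch: `scan_loop` and `scan_memchr` are hypothetical helper names, and plain `unsigned char` stands in for `SRE_CHAR` in the one-byte case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte-at-a-time scan, similar in shape to the
   hand-written loop. Returns a pointer to the first occurrence of c in
   [ptr, end), or NULL if there is none. */
static const unsigned char *
scan_loop(const unsigned char *ptr, const unsigned char *end, unsigned char c)
{
    while (ptr < end) {
        if (*ptr == c)
            return ptr;
        ptr++;
    }
    return NULL;
}

/* Hypothetical helper: same contract, delegated to memchr(), which libc
   implementations typically vectorize. */
static const unsigned char *
scan_memchr(const unsigned char *ptr, const unsigned char *end, unsigned char c)
{
    if (ptr >= end)
        return NULL;
    return memchr(ptr, c, (size_t)(end - ptr));
}
```

Both helpers return the same pointer for any input range, so swapping one for the other changes only how fast the scan runs, not what it finds.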

2 files changed: +18 -0 lines changed
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Optimize prefix search for regular expressions starting with literals using
+``memchr()``. For single-byte character strings, the internal scanning loop
+now delegates to the C library, which is typically vectorized.
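
To give a rough feel for what "more than one byte per iteration" means, here is a portable word-at-a-time sketch. It is only an illustration under stated assumptions: real libc `memchr` implementations typically use SIMD instructions and differ per platform, and `scan_swar` is a hypothetical name that appears in neither CPython nor any libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of c in [ptr, end),
   testing 8 bytes per iteration with a classic "has zero byte" bit trick.
   Returns NULL if c is absent. */
static const unsigned char *
scan_swar(const unsigned char *ptr, const unsigned char *end, unsigned char c)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * c;   /* c replicated into every byte */

    while ((size_t)(end - ptr) >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, ptr, sizeof word); /* alignment-safe load */
        uint64_t x = word ^ pattern;     /* bytes equal to c become zero */
        if ((x - ones) & ~x & highs)
            break;                       /* some byte in this word matched */
        ptr += sizeof word;
    }
    /* Pin down the exact position (or finish a short tail) bytewise. */
    while (ptr < end) {
        if (*ptr == c)
            return ptr;
        ptr++;
    }
    return NULL;
}
```

In the no-match case the first loop does almost all the work, which is the case the commit message singles out as benefiting most.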

Modules/_sre/sre_lib.h

Lines changed: 15 additions & 0 deletions
@@ -1753,10 +1753,17 @@ SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
         end = (SRE_CHAR *)state->end;
         state->must_advance = 0;
         while (ptr < end) {
+#if SIZEOF_SRE_CHAR == 1
+            ptr = (SRE_CHAR *)memchr(ptr, c, end - ptr);
+            if (!ptr) {
+                return 0;
+            }
+#else
             while (*ptr != c) {
                 if (++ptr >= end)
                     return 0;
             }
+#endif
             TRACE(("|%p|%p|SEARCH LITERAL\n", pattern, ptr));
             state->start = ptr;
             state->ptr = ptr + prefix_skip;
@@ -1786,10 +1793,18 @@ SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
 #endif
         while (ptr < end) {
             SRE_CHAR c = (SRE_CHAR) prefix[0];
+#if SIZEOF_SRE_CHAR == 1
+            ptr = (SRE_CHAR *)memchr(ptr, c, end - ptr);
+            if (!ptr) {
+                return 0;
+            }
+            ptr++;
+#else
             while (*ptr++ != c) {
                 if (ptr >= end)
                     return 0;
             }
+#endif
             if (ptr >= end)
                 return 0;
 
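
The second hunk adds an extra `ptr++` after `memchr` because the loop it replaces, `while (*ptr++ != c)`, leaves `ptr` one past the matching byte. A small sketch of that equivalence follows. The helper names `skip_past_loop` and `skip_past_memchr` are hypothetical, and `unsigned char` stands in for a one-byte `SRE_CHAR`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring `while (*ptr++ != c)`: returns the position
   one past the first occurrence of c in [ptr, end), or NULL if c is absent.
   Assumes ptr < end on entry, as the enclosing `while (ptr < end)` ensures. */
static const unsigned char *
skip_past_loop(const unsigned char *ptr, const unsigned char *end,
               unsigned char c)
{
    while (*ptr++ != c) {
        if (ptr >= end)
            return NULL;
    }
    return ptr;
}

/* Hypothetical helper: same contract via memchr() plus one increment.
   Also assumes ptr < end on entry. */
static const unsigned char *
skip_past_memchr(const unsigned char *ptr, const unsigned char *end,
                 unsigned char c)
{
    const unsigned char *hit = memchr(ptr, c, (size_t)(end - ptr));
    if (!hit)
        return NULL;
    return hit + 1;
}
```

When the match is the last byte of the range, both helpers return `end`, which is why the hunk keeps the existing `if (ptr >= end) return 0;` check after the `#endif`.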
