gh-138407: avoid useless join operation when calculate the hash vaule for pathlib.Path#138645
gh-138407: avoid useless join operation when calculate the hash vaule for pathlib.Path#138645Zheaoli wants to merge 4 commits intopython:mainfrom
Conversation
… vaule for pathlib.Path Signed-off-by: Manjusaka <me@manjusaka.me>
Signed-off-by: Manjusaka <me@manjusaka.me>
|
Please check.
Benchmark this with paths of various lengths (very short and very long) and with many arguments to |
Make sense, I will update benchmark later(I will fix failed CI first) |
|
FWIW the tests are failing because you need to case-normalize the path on Windows. This is difficult to optimize because it depends on the whether the user calls
|
|
To add another wrinkle: sometimes paths are instantiated with their normalized string representation already set, e.g. this is true for the results of |
|
I have made a full benchmark test The following is the result 🧪 Available Tests
=� Benchmark DetailsThe benchmark tests hash performance on single-level paths (
=� Performance ResultsLocal Benchmark ResultsSummary: The optimized implementation shows a 1.44x performance improvement (44% faster) for short single-level path hashing operations. Long Path Benchmark ResultsSummary: The optimized implementation shows a 1.34x performance improvement (34% faster) for long single-level path hashing operations. Deep Path Benchmark ResultsSummary: The optimized implementation shows a 1.12x performance improvement (12% faster) for 512-level deep path hashing operations. Deep Long Path Benchmark ResultsSummary: The optimized implementation shows a 1.06x performance regression (6% slower) for 512-level deep paths with 10-character component names. Operation Pattern Benchmark ResultsTesting different operation patterns on short single-level paths ( 1. hash(p) only - 1.44x faster ✅2. str(p) only - Equal performance ✅3. str(p); hash(p) - 1.12x faster ❌ (Expected: Slower)4. hash(p); str(p) - 1.06x faster ❌ (Expected: Slower) |
|
Here's full benchmark repo https://github.com/Zheaoli/pathlib-benchmark |
|
I'm struggling to understand why the new code is faster for |
|
I'm not sure if it's faster, but this is the simplest way I can think of to implement def __hash__(self):
try:
return self._hash
except AttributeError:
if self.parser is posixpath:
self._hash = hash((self.root, tuple(self._tail)))
else:
self._hash = hash((self.drive.lower(),
self.root.lower(),
tuple([part.lower() for part in self._tail])))
return self._hashWould you mind running that through your benchmarks please? |
Sorry for reply late. I'm working on PyCon China 2025 last week. I will update this PR ASAP I can |
Co-authored-by: Barney Gale <barney.gale@gmail.com> Signed-off-by: Manjusaka <me@manjusaka.me>
|
@barneygale Here's new benchmark 📊 Benchmark DetailsThe benchmark tests hash performance on single-level paths (
📈 Performance ResultsShort Single-Level Path ResultsSummary: The optimized implementation shows a 1.55x performance improvement (55% faster) for short single-level path hashing operations. Long Single-Level Path ResultsSummary: The optimized implementation shows a 1.46x performance improvement (46% faster) for long single-level path hashing operations. Deep Path ResultsSummary: The optimized implementation shows a 1.13x performance improvement (13% faster) for 512-level deep path hashing operations. Deep Long Path ResultsSummary: The optimized implementation shows a 1.02x performance regression (2% slower) for 512-level deep paths with 10-character component names. Operation Pattern ResultsTesting different operation patterns on short single-level paths ( 1. hash(p) only - 1.55x faster ✅2. str(p) only - Equal performance ✅3. str(p); hash(p) - 1.18x faster ✅4. hash(p); str(p) - 1.15x faster ✅ |
|
@barneygale PTAL when you have time |
|
This PR is stale because it has been open for 30 days with no activity. |
Signed-off-by: Manjusaka me@manjusaka.me<!--
Try to fix #138407
The benchmark:
Before this patch
After