gh-142939: difflib.get_close_matches performance #142940
hauntsaninja merged 2 commits into python:main
johnslavik
left a comment
The code is now simpler, nice.
Hi, can you provide some microbenchmarks to ensure that there is really a performance boost with this change? It doesn’t need to be overly complex or highly precise.
Doesn't my small benchmark cover it? Both changes are straightforward amendments that take place in a linear loop:
I don't think there is anything else that would provide additional information.
Ah, sorry, I missed the message in the main thread.
tim-one
left a comment
Have to say I don't care about the speed of get_close_matches(). As the docs say,
See also function get_close_matches() in this module, which shows how
simple code building on SequenceMatcher can be used to do useful work.
If it were written today, it would be a mere "recipe" instead (but the docs didn't have such things way back when).
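To illustrate what "simple code building on SequenceMatcher" can look like, here is a minimal recipe-style sketch. The name close_matches_sketch is hypothetical, and this is not presented as the difflib source or as the code in this PR; it only assumes the documented SequenceMatcher methods and heapq.nlargest.

```python
from difflib import SequenceMatcher, get_close_matches
from heapq import nlargest


def close_matches_sketch(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical recipe-style helper, for illustration only."""
    matcher = SequenceMatcher()
    # SequenceMatcher caches information about seq2, so the fixed word
    # is set once and only seq1 changes inside the loop.
    matcher.set_seq2(word)
    scored = []
    for candidate in possibilities:
        matcher.set_seq1(candidate)
        # The two quick ratios are cheap upper bounds on ratio(), so the
        # expensive ratio() call only runs for plausible candidates.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff):
            score = matcher.ratio()
            if score >= cutoff:
                scored.append((score, candidate))
    # Keep the n best-scoring candidates, best first.
    return [candidate for score, candidate in nlargest(n, scored)]


words = ["ape", "apple", "peach", "puppy"]
sketch_result = close_matches_sketch("appel", words)
stdlib_result = get_close_matches("appel", words)
print(sketch_result, stdlib_result)
```

On this small input the sketch should agree with get_close_matches(), which is one way to sanity-check a recipe like this.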
That said, the code changes are good. Does no harm 😉. But I like switching to "in" more for clarity than for speed. There's no reason I can imagine for why "in" will always be faster. Capturing a bound method object (dict.__contains__) avoids the runtime expense of finding it anew each time membership is tested. The greater speed now for "in" comes from avoiding the Python-level function call to invoke the membership testing via the bound method object.
Tomorrow that advantage may vanish, or even reverse. That is one reason why, in general, micro-optimizations cost more in human time than they're worth.
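For anyone who wants to check the "in" versus bound-method point on their own interpreter, a rough timing sketch follows. The dict contents, probe list, and function names are made up for illustration, and the numbers will vary by Python version and machine, so no expected timings are shown.

```python
import timeit

# Made-up data: a small dict and a list of keys to probe against it.
lookup = {ch: None for ch in "abcdefghij"}
probes = list("abcxyz") * 1000


def count_with_bound_method():
    # Look the method up once, then call the bound method per item.
    contains = lookup.__contains__
    return sum(1 for ch in probes if contains(ch))


def count_with_in():
    # Use the "in" operator directly for each membership test.
    return sum(1 for ch in probes if ch in lookup)


# Both variants must count the same hits before timing means anything.
assert count_with_bound_method() == count_with_in()

for fn in (count_with_bound_method, count_with_in):
    seconds = min(timeit.repeat(fn, number=50, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f}s")
```

Taking the minimum over a few repeats reduces noise from other processes, but this is still only a micro-benchmark of one loop shape.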
difflib.get_close_matches performance #142939