
PERF: speed up tab completion for DataFrame/Series with a large index #65277

Open
jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:perf-18587

jbrockmendel (Member) commented Apr 18, 2026

I'm on the fence as to whether this is a big enough deal to merit special-casing.

Summary

  • Closes #18587 (PERF: tab completion with a large index).
  • Avoid hashing the entire Index inside _dir_additions_for_owner when only the first display.max_dir_items unique values are needed. Deduplicate a bounded prefix first (the first max_items * 10 entries) and fall back to a full unique() only when that prefix is duplicate-heavy, so behavior is preserved for pathological sorted-duplicate cases (e.g. MultiIndex level 0).
  • The first dir(df) call on a DataFrame with 1M unique string columns goes from ~115 ms to ~17 ms on my machine. Non-unique and sorted-duplicate cases are unchanged.
  • Adds an ASV Dir benchmark (cold path via number=1 + _cache={}) and a whatsnew entry.
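
For readers who want the shape of the idea without opening the diff, here is a minimal pure-Python sketch of the prefix-then-fallback strategy described in the summary. It is not the code in this PR: the names `first_unique_labels`, `dir_additions`, and `prefix_factor` are hypothetical, and the fallback condition (fewer than `max_items` distinct values found in the prefix) is an assumption about what "duplicate-heavy" means here.

```python
from collections.abc import Hashable, Sequence
from typing import Optional


def first_unique_labels(
    labels: Sequence[Hashable],
    max_items: Optional[int],
    prefix_factor: int = 10,  # hypothetical knob mirroring "max_items * 10"
) -> list:
    """Return the first ``max_items`` distinct labels in order of first appearance."""
    if max_items is None:
        # Unlimited cap: nothing to bound, so deduplicate everything.
        return list(dict.fromkeys(labels))

    # Cheap path: deduplicate only a bounded prefix of the labels.
    prefix = labels[: max_items * prefix_factor]
    uniques = list(dict.fromkeys(prefix))

    # Assumed fallback rule: the prefix did not yield enough distinct values
    # and there are more labels left to look at, so scan the whole sequence.
    if len(uniques) < max_items and len(prefix) < len(labels):
        uniques = list(dict.fromkeys(labels))

    return uniques[:max_items]


def dir_additions(labels: Sequence[Hashable], max_items: Optional[int]) -> set:
    """Keep only labels that are strings and valid Python identifiers."""
    return {
        label
        for label in first_unique_labels(labels, max_items)
        if isinstance(label, str) and label.isidentifier()
    }
```

When the prefix already contains at least `max_items` distinct values, its first `max_items` uniques are the same ones a full scan would produce, which is why the cheap path does not change the result. Only the duplicate-heavy case pays for the full pass.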

Test plan

  • pytest pandas/tests/frame/test_api.py pandas/tests/series/test_api.py pandas/tests/indexes/test_base.py
  • test_display_max_dir_items covers the default cap, a custom cap, and None (unlimited)
  • pre-commit run on the changed files
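
The three cap settings listed in the test plan can also be checked by hand from the user-facing side. The snippet below is a rough sketch and not the test added in this PR; `completable_columns` is a hypothetical helper, and it assumes a pandas version that provides the `display.max_dir_items` option.

```python
import pandas as pd


def completable_columns(n_columns, max_dir_items):
    """Count how many generated column names appear in dir(df) under a given cap."""
    names = [f"col_{i}" for i in range(n_columns)]
    with pd.option_context("display.max_dir_items", max_dir_items):
        # Build a fresh frame inside the context: the dir additions may be
        # cached on the Index, so a frame inspected earlier might not
        # reflect a newly set cap.
        df = pd.DataFrame(columns=names)
        listed = set(dir(df))
    return sum(name in listed for name in names)


print(completable_columns(200, 100))   # cap of 100 (the documented default)
print(completable_columns(200, 5))     # custom cap
print(completable_columns(200, None))  # None means unlimited
```

Each call builds its own DataFrame, so the counts reflect the cap in force at the time rather than a value cached from an earlier `dir()` call.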

🤖 Generated with Claude Code

… (GH#18587)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jbrockmendel added the Performance label (memory or execution speed performance) on Apr 18, 2026
