BUG: Index.take respects fill_value and allows fills on integer dtypes#65268
BUG: Index.take respects fill_value and allows fills on integer dtypes#65268jbrockmendel wants to merge 5 commits intopandas-dev:mainfrom
Conversation
Index.take previously silently discarded the caller's fill_value in favor of self._na_value, and raised ValueError on integer/range dtypes via the _can_hold_na gate. Respect the caller's fill_value and let the underlying array handle type promotion. Simplify internal workaround sites in frame.py idxmin/idxmax, reshape/merge.py, indexes/base.py reset_index, io/formats/excel.py, and indexes/multi.py that previously bypassed Index.take. Ref GH#65210. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Generally looks good! Some comments/questions
| -1 are regarded as NA. If Index doesn't hold NA, raise ValueError. | ||
| -1 are filled with ``fill_value``. If the Index's dtype cannot hold | ||
| ``fill_value``, the result is promoted (e.g. an integer Index is | ||
| cast to float or object). |
There was a problem hiding this comment.
Which of the two is it?
There was a problem hiding this comment.
depends on fill_value, will update to clarify
| if isinstance(values, np.ndarray): | ||
| taken = algos.take( | ||
| values, indices, allow_fill=allow_fill, fill_value=self._na_value | ||
| values, indices, allow_fill=allow_fill, fill_value=fill_value |
There was a problem hiding this comment.
Shouldn't fill_value get replaces with self._na_value if it is None (which should probably better be no_default)
Or is that done under the hood in the algos function?
There was a problem hiding this comment.
agreed, will update
| "all indices must be >= -1" | ||
| ) | ||
| else: | ||
| allow_fill = False |
There was a problem hiding this comment.
Can you add a comment why this is being done?
There was a problem hiding this comment.
(essentially what was in the docstring of _maybe_disallow_fill)
| result = result.putmask(mask, np.nan) | ||
|
|
||
| result = lev.take( | ||
| new_codes[0], allow_fill=True, fill_value=lev._na_value |
There was a problem hiding this comment.
The fill_value=lev._na_value could (in the future) be avoided if we would change the default of allow_fill?
| fill_value=None, | ||
| **kwargs, | ||
| ) -> Self: | ||
| """ |
There was a problem hiding this comment.
Can you keep this docstring?
(is the removal related to the fact that we stopped using the Appender / Substitution solution?)
- Use lib.no_default as the Index.take fill_value sentinel so fill_value=None can take its natural ExtensionArray meaning (substitute self._na_value) without colliding with "no fill_value supplied" - Simplify internal call sites (_drop_level_numbers, MultiIndex, idxmin/idxmax, excel formatter) to use fill_value=None instead of passing self._na_value explicitly - Restore the DatetimeTimedeltaMixin.take docstring that was accidentally dropped - Clarify the Index.take docstring around dtype promotion and add a comment explaining the no-explicit-fill branch
Make allow_fill default to lib.no_default so we can distinguish "user explicitly asked for fill" from "user didn't specify". With the new semantics, explicit allow_fill=True (without fill_value) substitutes self._na_value, matching the docstring's stated intent; bare idx.take([-1]) still wraps (numpy-style) for backward compatibility. fill_value=<scalar> still opts into fill even when allow_fill is not supplied. Apply the same signature update to DatetimeTimedeltaMixin.take and MultiIndex.take, and simplify the internal call sites (_drop_level_numbers, MultiIndex.get_level_values, DataFrame.idxmin/ idxmax, ExcelFormatter) to use just allow_fill=True.
Summary
Fixes two bugs in
Index.takewhenallow_fill=Trueandfill_valueis passed:fill_valuewas silently discarded in favor ofself._na_value.ValueError: Unable to fill values because Index cannot contain NA, even when the caller supplied a valid in-dtype fill.This PR:
fill_value._can_hold_nagate; the underlying array handles type promotion.Index.takebecause of those bugs:frame.py(idxmin/idxmax),reshape/merge.py,indexes/base.py(reset_index),io/formats/excel.py, andindexes/multi.py.Backwards-compatible for bare
idx.take(indices)and forallow_fill=Truewithfill_value=None.Opening as a draft — still to do before ready for review:
Index.takedocstring (still references the removed "doesn't hold NA" raise)indices < -1ValueError pathcloses #65210
Test plan
pandas/tests/indexes/{categorical,datetimes,multi,numeric,period,ranges,timedeltas}/test_indexing.pyandtest_base.pypandas/tests/frame/methods/test_reset_index.py,tests/reshape/merge,tests/io/excel🤖 Generated with Claude Code