Skip to content

BUG: Index.take respects fill_value and allows fills on integer dtypes#65268

Open
jbrockmendel wants to merge 5 commits intopandas-dev:mainfrom
jbrockmendel:enh-index-take
Open

BUG: Index.take respects fill_value and allows fills on integer dtypes#65268
jbrockmendel wants to merge 5 commits intopandas-dev:mainfrom
jbrockmendel:enh-index-take

Conversation

@jbrockmendel
Copy link
Copy Markdown
Member

Summary

Fixes two bugs in Index.take when allow_fill=True and fill_value is passed:

  1. The caller's fill_value was silently discarded in favor of self._na_value.
  2. Integer/range dtypes raised ValueError: Unable to fill values because Index cannot contain NA, even when the caller supplied a valid in-dtype fill.

This PR:

  • Respects the caller's fill_value.
  • Removes the _can_hold_na gate; the underlying array handles type promotion.
  • Simplifies internal workaround sites that previously bypassed Index.take because of those bugs: frame.py (idxmin/idxmax), reshape/merge.py, indexes/base.py (reset_index), io/formats/excel.py, and indexes/multi.py.

Backwards-compatible for bare idx.take(indices) and for allow_fill=True with fill_value=None.

Opening as a draft — still to do before ready for review:

  • Whatsnew entry
  • Update the Index.take docstring (still references the removed "doesn't hold NA" raise)
  • Restore a MultiIndex test for the indices < -1 ValueError path

closes #65210

Test plan

  • pandas/tests/indexes/{categorical,datetimes,multi,numeric,period,ranges,timedeltas}/test_indexing.py and test_base.py
  • pandas/tests/frame/methods/test_reset_index.py, tests/reshape/merge, tests/io/excel

🤖 Generated with Claude Code

jbrockmendel and others added 3 commits April 16, 2026 18:18
Index.take previously silently discarded the caller's fill_value in
favor of self._na_value, and raised ValueError on integer/range dtypes
via the _can_hold_na gate. Respect the caller's fill_value and let the
underlying array handle type promotion.

Simplify internal workaround sites in frame.py idxmin/idxmax,
reshape/merge.py, indexes/base.py reset_index, io/formats/excel.py, and
indexes/multi.py that previously bypassed Index.take.

Ref GH#65210.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel marked this pull request as ready for review April 17, 2026 16:51
Copy link
Copy Markdown
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good! Some comments/questions

Comment thread pandas/core/indexes/base.py Outdated
-1 are regarded as NA. If Index doesn't hold NA, raise ValueError.
-1 are filled with ``fill_value``. If the Index's dtype cannot hold
``fill_value``, the result is promoted (e.g. an integer Index is
cast to float or object).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which of the two is it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on fill_value, will update to clarify

if isinstance(values, np.ndarray):
taken = algos.take(
values, indices, allow_fill=allow_fill, fill_value=self._na_value
values, indices, allow_fill=allow_fill, fill_value=fill_value
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't fill_value get replaces with self._na_value if it is None (which should probably better be no_default)
Or is that done under the hood in the algos function?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, will update

Comment thread pandas/core/indexes/base.py Outdated
"all indices must be >= -1"
)
else:
allow_fill = False
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment why this is being done?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(essentially what was in the docstring of _maybe_disallow_fill)

Comment thread pandas/core/indexes/base.py Outdated
result = result.putmask(mask, np.nan)

result = lev.take(
new_codes[0], allow_fill=True, fill_value=lev._na_value
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fill_value=lev._na_value could (in the future) be avoided if we would change the default of allow_fill?

fill_value=None,
**kwargs,
) -> Self:
"""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you keep this docstring?

(is the removal related to the fact that we stopped using the Appender / Substitution solution?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

- Use lib.no_default as the Index.take fill_value sentinel so
  fill_value=None can take its natural ExtensionArray meaning
  (substitute self._na_value) without colliding with "no fill_value
  supplied"
- Simplify internal call sites (_drop_level_numbers, MultiIndex,
  idxmin/idxmax, excel formatter) to use fill_value=None instead of
  passing self._na_value explicitly
- Restore the DatetimeTimedeltaMixin.take docstring that was
  accidentally dropped
- Clarify the Index.take docstring around dtype promotion and add a
  comment explaining the no-explicit-fill branch
Make allow_fill default to lib.no_default so we can distinguish "user
explicitly asked for fill" from "user didn't specify". With the new
semantics, explicit allow_fill=True (without fill_value) substitutes
self._na_value, matching the docstring's stated intent; bare
idx.take([-1]) still wraps (numpy-style) for backward compatibility.
fill_value=<scalar> still opts into fill even when allow_fill is not
supplied.

Apply the same signature update to DatetimeTimedeltaMixin.take and
MultiIndex.take, and simplify the internal call sites
(_drop_level_numbers, MultiIndex.get_level_values, DataFrame.idxmin/
idxmax, ExcelFormatter) to use just allow_fill=True.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API: Index.take discards fill_value and raises unnecessarily

2 participants