BUG: Regression - df[key] = df[key] / x silently drops assignment on MultiIndex columns with mixed-dtype level and single sub-column #65118

@aukeschaap

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

cols = pd.MultiIndex.from_tuples(
    [('info', 'M'), ('info', 0),
     ('earnings', 1), ('earnings', 2),
     ('prices', 0)]
)
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5),
    columns=cols,
)

df['earnings'] = df['earnings'] / 100   # works
df['prices']   = df['prices']   / 100   # silent no-op — no error, no warning

print(df)
#    info       earnings       prices
#       M     0        1     2      0
# 0   0.0   1.0     0.02  0.03    4.0
# 1   5.0   6.0     0.07  0.08    9.0
# 2  10.0  11.0     0.12  0.13   14.0
# 3  15.0  16.0     0.17  0.18   19.0
#
# 'earnings' was divided by 100; 'prices' was not.

Issue Description

Disclaimer: I used AI in writing this bug report. I found the bug because the same code behaves differently in version 2.2.3 than in 3.0.1. I have verified the reproducible example on the latest release of each line, 2.3.3 and 3.0.2. I'm familiar with the Copy-on-Write changes, but I'm far from a pandas expert.

Top-level __setitem__ on a column MultiIndex silently discards the assignment, with no error and no warning, when all three of the following conditions hold:

  1. The DataFrame has a column MultiIndex.
  2. The second level has object dtype due to mixed element types. In the example above, 'M' (string) is on level 1 alongside integers 0, 1, 2, which forces df.columns.levels[1] to object.
  3. The top-level label being assigned to contains exactly one sub-column (here ('prices', 0)). Top-level labels with multiple sub-columns (here earnings with 1 and 2) are assigned correctly.
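A small sketch for checking the three conditions on the frame from the reproducible example. It only inspects the column index and performs no assignment; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0),
     ("earnings", 1), ("earnings", 2),
     ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

# Condition 1: the columns are a MultiIndex.
has_multiindex = isinstance(df.columns, pd.MultiIndex)

# Condition 2: dtype of the second level (mixed str/int labels).
level1_dtype = df.columns.levels[1].dtype

# Condition 3: number of sub-columns under each top-level label.
subcol_counts = df.columns.get_level_values(0).value_counts()

print(has_multiindex)
print(level1_dtype)
print(subcol_counts.to_dict())
```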

I isolated the trigger by varying level 1:

| level 1 | multi sub-col (earnings) | single sub-col (prices) |
|---|---|---|
| all int [0, 1, 2] | ✅ divided | ✅ divided |
| all string ['M', '0', '1', '2'] | ✅ divided | ✅ divided |
| mixed [0, 1, 2, 'M'] | ✅ divided | ❌ silent no-op |

So the bug is specifically the combination of an object-dtype level and a single-element top-level group going through the frame-assignment path.
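The comparison across level-1 variants could be scripted roughly as follows. This is a sketch, not the exact script used for the report: the column tuples for the "all int" and "all string" variants are assumed, and `assignment_took_effect` is a hypothetical helper. The printed result depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

def assignment_took_effect(tuples, key):
    """Return True if df[key] = df[key] / 100 changed the stored values."""
    cols = pd.MultiIndex.from_tuples(tuples)
    n = len(tuples)
    df = pd.DataFrame(np.arange(4 * n, dtype=float).reshape(4, n), columns=cols)
    before = df[key].to_numpy().copy()
    df[key] = df[key] / 100
    after = df[key].to_numpy()
    # NaN-filled or untouched columns both compare unequal to before / 100.
    return bool(np.allclose(after, before / 100))

# Assumed column layouts for each variant; only "mixed" is taken from the report.
variants = {
    "all int": [("info", 1), ("info", 0),
                ("earnings", 1), ("earnings", 2), ("prices", 0)],
    "all string": [("info", "M"), ("info", "0"),
                   ("earnings", "1"), ("earnings", "2"), ("prices", "0")],
    "mixed": [("info", "M"), ("info", 0),
              ("earnings", 1), ("earnings", 2), ("prices", 0)],
}

results = {
    name: {key: assignment_took_effect(tuples, key) for key in ("earnings", "prices")}
    for name, tuples in variants.items()
}

for name, outcome in results.items():
    print(pd.__version__, name, outcome)
```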

This is a regression from 2.3.3 → 3.0.x. The same code divides prices correctly on 2.3.3 and silently drops the write on 3.0.2.

Workarounds that do work on 3.0.2:

df[('prices', 0)] = df[('prices', 0)] / 100          # tuple form — OK
df.loc[:, ('prices', 0)] = df.loc[:, ('prices', 0)] / 100   # OK
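Building on the tuple form above, a generic workaround might loop over the full column tuples under a top-level label so that it does not depend on how many sub-columns there are. `scale_top_level` is a hypothetical helper name, and the sketch assumes full-tuple assignment keeps working as described.

```python
import numpy as np
import pandas as pd

def scale_top_level(df, key, divisor):
    """Divide every sub-column under top-level label `key`, one full tuple at a time."""
    mask = df.columns.get_level_values(0) == key
    for col in df.columns[mask]:
        # `col` is a full tuple such as ('prices', 0), i.e. the tuple form.
        df[col] = df[col] / divisor

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0),
     ("earnings", 1), ("earnings", 2),
     ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

scale_top_level(df, "prices", 100)
scale_top_level(df, "earnings", 100)
print(df)
```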

Alternative spellings that also silently fail or corrupt data on 3.0.2:

df['prices'] /= 100                                   # silent no-op
df.loc[:, 'prices'] = df.loc[:, 'prices'] / 100       # fills 'prices' with NaN

Given this is silent data loss on a plain (non-chained) __setitem__, it seems clearly unintended rather than a CoW semantic change.

Expected Behavior

df['prices'] = df['prices'] / 100 should either divide the values in place, raise, or warn. It should not silently discard the assignment. The presence of a string label elsewhere on level 1, or the number of sub-columns under the selected top-level label, should not affect whether the assignment takes effect.
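Until the assignment raises or warns on its own, calling code can verify the write itself. The following is a minimal sketch for numeric data; `setitem_checked` is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd

def setitem_checked(df, key, value):
    """Assign df[key] = value, then raise if the stored data does not match."""
    df[key] = value
    stored = df[key].to_numpy(dtype=float)
    expected = np.asarray(value, dtype=float).reshape(stored.shape)
    if not np.allclose(stored, expected, equal_nan=True):
        raise RuntimeError(f"assignment to {key!r} was not applied")

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0),
     ("earnings", 1), ("earnings", 2),
     ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

# The tuple form is reported to work, so this call is expected to pass the check.
setitem_checked(df, ("prices", 0), df[("prices", 0)] / 100)
```

Calling `setitem_checked(df, 'prices', df['prices'] / 100)` instead would be expected to raise `RuntimeError` on an affected version.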

Installed Versions

My own verification:

INSTALLED VERSIONS
------------------
commit                : ab90747e3dae0e69b1bdbf083820b8075689b34b
python                : 3.12.13
python-bits           : 64
OS                    : Windows
OS-release            : 11
Version               : 10.0.26200
machine               : AMD64
processor             : AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : English_Netherlands.1252

pandas                : 3.0.2
numpy                 : 2.4.4
dateutil              : 2.9.0.post0
pip                   : None
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyiceberg             : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
qtpy                  : None
pyqt5                 : None

Verification in AI sandbox:

INSTALLED VERSIONS
------------------
commit                : ab90747e3dae0e69b1bdbf083820b8075689b34b
python                : 3.12.3
python-bits           : 64
OS                    : Linux
OS-release            : 4.4.0
Version               : #1 SMP Sun Jan 10 15:06:54 PST 2016
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : None
LOCALE                : C.UTF-8

pandas                : 3.0.2
numpy                 : 2.4.4
dateutil              : 2.9.0.post0
pip                   : 24.0
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyiceberg             : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
qtpy                  : None
pyqt5                 : None


Labels: Indexing, MultiIndex, Regression
