Pandas version checks
Reproducible Example
import pandas as pd
import numpy as np
cols = pd.MultiIndex.from_tuples(
[('info', 'M'), ('info', 0),
('earnings', 1), ('earnings', 2),
('prices', 0)]
)
df = pd.DataFrame(
np.arange(20, dtype=float).reshape(4, 5),
columns=cols,
)
df['earnings'] = df['earnings'] / 100 # works
df['prices'] = df['prices'] / 100 # silent no-op — no error, no warning
print(df)
#    info       earnings       prices
#       M     0        1     2      0
# 0   0.0   1.0     0.02  0.03    4.0
# 1   5.0   6.0     0.07  0.08    9.0
# 2  10.0  11.0     0.12  0.13   14.0
# 3  15.0  16.0     0.17  0.18   19.0
#
# 'earnings' was divided by 100; 'prices' was not.
Issue Description
Disclaimer: I used AI in writing this bug report. I found the bug because the same code behaves differently on 2.2.3 and 3.0.1. I have verified the reproducible example on the latest release of each series, 2.3.3 and 3.0.2. I'm familiar with the copy-on-write changes, but I'm far from a pandas expert.
Top-level __setitem__ on a column MultiIndex silently discards the assignment — no error, no warning — when all three of the following conditions hold:
- The DataFrame has a column MultiIndex.
- The second level has object dtype due to mixed element types. In the example above, 'M' (string) is on level 1 alongside integers 0, 1, 2, which forces df.columns.levels[1] to object.
- The top-level label being assigned to contains exactly one sub-column (here ('prices', 0)). Top-level labels with multiple sub-columns (here earnings with 1 and 2) are assigned correctly.
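The last two conditions can be checked directly on the column index. This is a minimal sketch using only the columns from the example above; the variable names are illustrative.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0),
     ("earnings", 1), ("earnings", 2),
     ("prices", 0)]
)

# Condition 2: mixed str/int labels leave level 1 with object dtype.
level1_dtype = cols.levels[1].dtype
print("level 1 dtype:", level1_dtype)

# Condition 3: find top-level labels that own exactly one sub-column.
sub_counts = cols.get_level_values(0).value_counts()
single = sorted(sub_counts[sub_counts == 1].index)
print("top-level labels with one sub-column:", single)
```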
I isolated the trigger by varying level 1:
| level 1 | multi sub-col (earnings) | single sub-col (prices) |
|---|---|---|
| all int [0, 1, 2] | ✅ divided | ✅ divided |
| all string ['M', '0', '1', '2'] | ✅ divided | ✅ divided |
| mixed [0, 1, 2, 'M'] | ✅ divided | ❌ silent no-op |
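A comparison like the one in the table can be scripted roughly as follows. This is a sketch, not the exact script used for the report: the helper `make_frame` and the concrete label lists per variant are assumptions chosen to match the table's descriptions. The outcome depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd


def make_frame(level1):
    """Build the 4x5 frame from the example with the given level-1 labels."""
    top = ["info", "info", "earnings", "earnings", "prices"]
    cols = pd.MultiIndex.from_tuples(list(zip(top, level1)))
    return pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)


# Assumed label lists for the three rows of the table.
variants = {
    "all int": [1, 0, 1, 2, 0],
    "all string": ["M", "0", "1", "2", "0"],
    "mixed": ["M", 0, 1, 2, 0],
}

results = {}
for name, level1 in variants.items():
    df = make_frame(level1)
    before = df["prices"].to_numpy().copy()
    df["prices"] = df["prices"] / 100  # the assignment under test
    after = df["prices"].to_numpy()
    # True if the write took effect, False if it was silently dropped.
    results[name] = bool(np.allclose(after, before / 100))
    print(f"{name:10s} prices divided: {results[name]}")
```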
So the bug is specifically the combination of an object-dtype level and a single-element top-level group going through the frame-assignment path.
This is a regression from 2.3.3 → 3.0.x. The same code divides prices correctly on 2.3.3 and silently drops the write on 3.0.2.
Workarounds that do work on 3.0.2:
df[('prices', 0)] = df[('prices', 0)] / 100 # tuple form — OK
df.loc[:, ('prices', 0)] = df.loc[:, ('prices', 0)] / 100 # OK
Alternatives that silently fail or corrupt data on 3.0.2:
df['prices'] /= 100 # silent no-op
df.loc[:, 'prices'] = df.loc[:, 'prices'] / 100 # fills 'prices' with NaN
Given this is silent data loss on a plain (non-chained) __setitem__, it seems clearly unintended rather than a CoW semantic change.
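Until this is fixed, the tuple-form workaround can be wrapped so that callers do not need to know how many sub-columns a top-level label has. The helper below, `scale_top_level`, is hypothetical and not part of pandas; it only relies on assignment with full tuple keys, which was reported above to work on 3.0.2.

```python
import numpy as np
import pandas as pd


def scale_top_level(df, label, factor):
    """Divide every sub-column under `label` by `factor`, one tuple key at a time."""
    mask = df.columns.get_level_values(0) == label
    for key in list(df.columns[mask]):
        # Full tuple key, e.g. ('prices', 0), avoids the top-level frame-assignment path.
        df[key] = df[key] / factor
    return df


cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0),
     ("earnings", 1), ("earnings", 2),
     ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

expected_prices = df[("prices", 0)].to_numpy() / 100
scale_top_level(df, "prices", 100)
prices_ok = bool(np.allclose(df[("prices", 0)].to_numpy(), expected_prices))
print("prices divided:", prices_ok)
```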
Expected Behavior
df['prices'] = df['prices'] / 100 should either divide the values in place, raise, or warn. It should not silently discard the assignment. The presence of a string label elsewhere on level 1, or the number of sub-columns under the selected top-level label, should not affect whether the assignment takes effect.
Installed Versions
My own verification:
INSTALLED VERSIONS
------------------
commit : ab90747e3dae0e69b1bdbf083820b8075689b34b
python : 3.12.13
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26200
machine : AMD64
processor : AMD64 Family 25 Model 117 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_Netherlands.1252
pandas : 3.0.2
numpy : 2.4.4
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyiceberg : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None
Verification in AI sandbox:
INSTALLED VERSIONS
------------------
commit : ab90747e3dae0e69b1bdbf083820b8075689b34b
python : 3.12.3
python-bits : 64
OS : Linux
OS-release : 4.4.0
Version : #1 SMP Sun Jan 10 15:06:54 PST 2016
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : C.UTF-8
pandas : 3.0.2
numpy : 2.4.4
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyiceberg : None
pyreadstat : None
pytest : None
python-calamine : None
pytz : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.