Feature Type
Problem Description
When creating a contingency table using pd.crosstab, if a user specifies normalize='index' while also requesting margins=True, the resulting DataFrame includes the marginal row (at the bottom) but excludes the marginal column (on the right). While the marginal column would mathematically consist of constants (1.0), its omission prevents the generation of a complete "framed" table for reporting and visualization purposes.
It is common in data science to display both row and column totals in a crosstab. When the table is normalized by row or by column, readers expect the corresponding total to show 100%. This is consistent with the grand-total normalization, where only the total in the bottom-right corner is 100%. Showing that margin helps readers quickly understand what the table conveys and reinforces the chosen normalization.
The default should still be to hide the margin whose values are all 1.0.
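The behavior described above can be reproduced with a small example. The data and the column names group and outcome are made up for illustration; only pd.crosstab and its normalize, margins parameters come from pandas.

```python
import pandas as pd

# Toy data, invented for this illustration.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b"],
        "outcome": ["x", "y", "y", "x"],
    }
)

# Row-normalized crosstab with margins requested.
table = pd.crosstab(df["group"], df["outcome"], normalize="index", margins=True)
print(table)

# The marginal row is present, the marginal column is not.
has_margin_row = "All" in table.index
has_margin_col = "All" in table.columns
print(has_margin_row, has_margin_col)
```

Each data row of the result sums to 1.0, which is why the missing column would contain only the constant 1.0.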
Feature Description
    def _add_margins(self, table, margins_name):
        # CURRENT BEHAVIOR:
        # pandas calculates the sums of the columns and rows.
        # If normalize='index', the row sum is always 1.0.
        # PROPOSED FIX:
        # Explicitly check whether normalize is 'index' (and, symmetrically, 'columns')
        # to prevent the 'Total' column/row from being optimized away.
        if self.normalize == 'index':
            # Force-add the marginal column of 1.0s
            table[margins_name] = 1.0
            # Calculate the marginal row (the distribution of the columns).
            # The unweighted mean of the normalized rows is only a placeholder;
            # when row counts differ, a separate calculation from the raw totals is needed.
            column_totals = table.iloc[:, :-1].mean(axis=0)
            table.loc[margins_name] = column_totals
            # Corner cell: the marginal row also sums to 1.0
            table.at[margins_name, margins_name] = 1.0
        return table
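Until something like this exists in pandas, the same "framed" table can be approximated from user code. The sketch below is a workaround, not a patch to pandas internals: the helper name crosstab_framed and the toy data are hypothetical. It relies on pd.crosstab to produce the marginal row and only appends the constant column.

```python
import pandas as pd


def crosstab_framed(index, columns, margins_name="All"):
    """Row-normalized crosstab with both margins (hypothetical helper)."""
    # pandas already appends the marginal row for normalize="index".
    table = pd.crosstab(
        index,
        columns,
        normalize="index",
        margins=True,
        margins_name=margins_name,
    )
    # Every row, including the marginal row, sums to 1.0,
    # so the missing marginal column is a constant.
    table[margins_name] = 1.0
    return table


# Toy data, invented for this illustration.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b"],
        "outcome": ["x", "y", "y", "x"],
    }
)

framed = crosstab_framed(df["group"], df["outcome"])
print(framed)
```

With this data the marginal row is computed from the pooled counts (0.5 for x), whereas the unweighted mean of the normalized rows would give 2/3 for x, so the two calculations differ whenever the groups have unequal sizes.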
Alternative Solutions
none
Additional Context
No response