ENH: pd.crosstab with normalize='index' and margins=True drops the marginal 'Total' column. #65249

@creilly94010

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When creating a contingency table using pd.crosstab, if a user specifies normalize='index' while also requesting margins=True, the resulting DataFrame includes the marginal row (at the bottom) but excludes the marginal column (on the right). While the marginal column would mathematically consist of constants (1.0), its omission prevents the generation of a complete "framed" table for reporting and visualization purposes.
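A minimal reproduction of the behavior described above might look like the following. The data is made up for illustration, and the default margin name `All` is used rather than `Total`. On pandas versions that show the reported behavior, the margin row is present and the margin column is not.

```python
import pandas as pd

# Toy data, invented for this illustration only.
a = pd.Series(["x", "x", "x", "y"], name="a")
b = pd.Series(["p", "q", "q", "p"], name="b")

# Row-normalized crosstab with margins requested.
row_norm = pd.crosstab(a, b, normalize="index", margins=True)

print(row_norm)
print("margin row present:   ", "All" in row_norm.index)
print("margin column present:", "All" in row_norm.columns)
```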

It is standard practice in data science to display both row and column totals in a crosstab. If you normalize by row or by column, the corresponding total margin is expected to show 100%. This is consistent with grand-total normalization (normalize='all'), where only the total in the bottom-right corner is 100%. Showing both margins helps readers quickly understand what the table conveys and reinforces the chosen normalization.
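For comparison, here is a short sketch of the grand-total case mentioned above, using the same kind of made-up data. With normalize='all' and margins=True, both margins are kept and only the bottom-right cell equals 1.0.

```python
import pandas as pd

# Toy data, invented for this illustration only.
a = pd.Series(["x", "x", "x", "y"], name="a")
b = pd.Series(["p", "q", "q", "p"], name="b")

# Grand-total normalization: every cell is divided by the overall count.
overall = pd.crosstab(a, b, normalize="all", margins=True)

print(overall)
print("bottom-right cell:", overall.loc["All", "All"])
```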

The default should remain to hide the margin whose values are all 1.0.

Feature Description

def _add_margins(self, table, margins_name):
    # CURRENT BEHAVIOR:
    # pandas calculates the sums of the columns and rows.
    # If normalize='index', every row sum is 1.0 and that column is omitted.

    # PROPOSED FIX:
    # Explicitly check whether normalize is 'index' or 'columns'
    # so that the 'Total' column/row is kept in the result.

    if self.normalize == 'index':
        # Force-add the marginal column of 1.0s
        table[margins_name] = 1.0

        # Calculate the marginal row (the distribution of the columns).
        # NOTE: the unweighted mean of the normalized rows matches the overall
        # distribution only when every row has the same count; otherwise it
        # must come from the raw counts (column totals / grand total).
        column_totals = table.iloc[:, :-1].mean(axis=0)
        table.loc[margins_name] = column_totals
        table.at[margins_name, margins_name] = 1.0

    return table
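As a point of comparison, the desired "framed" output can be approximated today with public API only. The sketch below is not a patch to pandas internals: the helper name `crosstab_with_full_margins` is hypothetical, it assumes plain (non-categorical) string labels, and it reuses the margin row that pd.crosstab already computes instead of taking a mean of the normalized rows.

```python
import pandas as pd


def crosstab_with_full_margins(index, columns, normalize="index", margins_name="All"):
    """Hypothetical helper: keep both margins after row or column normalization."""
    table = pd.crosstab(
        index, columns, normalize=normalize, margins=True, margins_name=margins_name
    )
    if normalize == "index" and margins_name not in table.columns:
        # Each normalized row sums to 1.0, so the missing margin column is constant.
        table[margins_name] = 1.0
    elif normalize == "columns" and margins_name not in table.index:
        # Each normalized column sums to 1.0, so the missing margin row is constant.
        table.loc[margins_name] = 1.0
    return table


# Toy data, invented for this illustration only.
a = pd.Series(["x", "x", "x", "y"], name="a")
b = pd.Series(["p", "q", "q", "p"], name="b")

framed = crosstab_with_full_margins(a, b, normalize="index")
print(framed)
```

With this data the margin row for column `p` is 0.5 (2 of 4 observations), whereas the unweighted mean of the normalized rows would give 2/3, which is why the sketch keeps the row that pandas computes from the counts.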

Alternative Solutions

none

Additional Context

No response

    Labels

    Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
