Fix rank mismatch in MaxText synthetic data sharding by lukebaumann · Pull Request #4122 · AI-Hypercomputer/maxtext

lukebaumann · 2026-06-09T22:16:59Z

Description

This PR fixes a rank mismatch issue in MaxText synthetic data sharding during data loading.

Root Cause

SyntheticDataIterator was using the legacy config.data_sharding which resolved to a 1D sharding spec P(('data', 'fsdp')) (after filtering). When applied to 2D output tensors of shape (batch, seq), JAX sharding validation failed with AssertionError: (1, 2) (rank mismatch) on JAX builds that strictly enforce this check.
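The mismatch described above can be illustrated with a minimal sketch. The helper `spec_rank_matches` is hypothetical and is not the check JAX performs internally; it only compares the number of entries in a `PartitionSpec` with the array rank, which is the pair reported in `AssertionError: (1, 2)`. Stock JAX releases typically accept a shorter spec and replicate the trailing dimensions, so the failure is expected only on builds that enforce the stricter check.

```python
import numpy as np
from jax.sharding import PartitionSpec as P


def spec_rank_matches(spec, array) -> bool:
    """Hypothetical helper: True when the spec has one entry per array dimension."""
    return len(spec) == array.ndim


# Synthetic batches are 2D: (batch, seq).
batch = np.zeros((8, 16), dtype=np.int32)

legacy_spec = P(("data", "fsdp"))       # 1 entry: covers only the batch dimension
fixed_spec = P(("data", "fsdp"), None)  # 2 entries: batch sharded, seq replicated

legacy_ok = spec_rank_matches(legacy_spec, batch)
fixed_ok = spec_rank_matches(fixed_spec, batch)
```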

Solution

Modified SyntheticDataIterator to use sharding.get_input_data_sharding(config, mesh). This helper uses config.input_data_sharding_logical_axes which correctly resolves to a 2D sharding spec P(('data', 'fsdp'), None), matching the rank of the output tensors.
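As a rough sketch of what a rank-matching input sharding looks like, the snippet below builds a `NamedSharding` from a 2D spec and places a `(batch, seq)` array with it. `make_input_sharding` is a hypothetical stand-in for `sharding.get_input_data_sharding(config, mesh)`, whose actual implementation is not shown in this PR, and the single-device 1x1 mesh is an assumption chosen so the example runs anywhere.

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def make_input_sharding(mesh, batch_axes=("data", "fsdp")):
    """Hypothetical stand-in for MaxText's get_input_data_sharding helper.

    Shards the batch dimension over `batch_axes` and replicates the
    sequence dimension, so the spec has one entry per tensor dimension.
    """
    return NamedSharding(mesh, P(batch_axes, None))


# Assumption: a 1x1 mesh over the first available device.
devices = np.array(jax.devices()[:1]).reshape(1, 1)
mesh = Mesh(devices, ("data", "fsdp"))

input_sharding = make_input_sharding(mesh)
tokens = jax.device_put(np.zeros((8, 16), dtype=np.int32), input_sharding)
```

Because the spec carries an explicit `None` for the sequence dimension, its length equals the rank of the array regardless of how strictly the JAX build validates shardings.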

Also removed the unused PartitionSpec as P import in synthetic_data_processing.py.

Tests

Added a new unit test tests/unit/synthetic_data_test.py which:

Forces 4 CPU devices.
Creates a 2x2 mesh.
Initializes SyntheticDataIterator with llama3.1-8b config.
Verifies the output shape is (8, 16) and sharding is exactly P(('data', 'fsdp'), None).
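The steps above can be sketched roughly as follows. This is not the contents of tests/unit/synthetic_data_test.py: it replaces `SyntheticDataIterator` with a plain `jax.device_put` of a zero-filled batch, and `build_mesh` and `check_synthetic_batch_sharding` are hypothetical names. The `XLA_FLAGS` setting must take effect before JAX initializes its backends, so the sketch falls back to a 1x1 mesh if fewer than four devices are available.

```python
import os

# Request 4 host (CPU) devices; this only works before JAX initializes.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "") + " --xla_force_host_platform_device_count=4"
)

import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def build_mesh():
    """Build a 2x2 ('data', 'fsdp') mesh, or 1x1 if 4 devices are unavailable."""
    devices = jax.devices()
    shape = (2, 2) if len(devices) >= 4 else (1, 1)
    count = shape[0] * shape[1]
    return Mesh(np.array(devices[:count]).reshape(shape), ("data", "fsdp"))


def check_synthetic_batch_sharding():
    """Place a stand-in (8, 16) batch and verify its shape and sharding spec."""
    mesh = build_mesh()
    expected_spec = P(("data", "fsdp"), None)
    batch = jax.device_put(
        np.zeros((8, 16), dtype=np.int32), NamedSharding(mesh, expected_spec)
    )
    assert batch.shape == (8, 16)
    assert batch.sharding.spec == expected_spec
    return batch


batch = check_synthetic_batch_sharding()
```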

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-06-09T22:21:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


* Change SyntheticDataIterator to use get_input_data_sharding instead of manual 1D sharding.
* This ensures the sharding spec is 2D, matching the rank of the output tensors.
* Fixes AssertionError: (1, 2) in JAX sharding validation on some JAX builds.
* Remove unused PartitionSpec import in synthetic_data_processing.py.
* Add unit test `tests/unit/synthetic_data_test.py` to verify synthetic data sharding.

NuojCheng · 2026-06-09T22:40:28Z

    self.config = config
-    data_pspec = sharding.remove_size_one_mesh_axis(P(*config.data_sharding), mesh)
-    data_pspec_shardings = jax.tree_util.tree_map(lambda p: jax.sharding.NamedSharding(mesh, p), data_pspec)
+    data_pspec_shardings = sharding.get_input_data_sharding(config, mesh)


love you are using functions in sharding.py!

NuojCheng · 2026-06-09T22:43:57Z

+        "enable_checkpointing": False,
+        "dataset_type": "synthetic",
+        "model_name": "llama3.1-8b",
+        "max_target_length": 16,


Could you add another test case that uses explicit sharding? Basically, everything stays the same except for adding shard_mode=explicit. The explicit sharding data pipeline has been broken before, but it has never been protected by a test.

lukebaumann requested review from NicoGrande, SurbhiJainUSC, aireenmei, darisoy, richjames0 and shralex as code owners June 9, 2026 22:17

lukebaumann force-pushed the fix-synthetic-sharding branch from c7f9815 to 91b79f9 Compare June 9, 2026 22:30

lukebaumann requested review from A9isha, NuojCheng, RissyRan, abhinavclemson, bvandermoon, dipannita08, gagika, gobbleturk, hengtaoguo, igorts-git, jesselu-google, jiangjy1982, khatwanimohit, suexu1025 and vipannalla as code owners June 9, 2026 22:30

NuojCheng reviewed Jun 9, 2026

lukebaumann wants to merge 1 commit into AI-Hypercomputer:main from lukebaumann:fix-synthetic-sharding
