Skip to content

feat: support size() for MapType input#4580

Open
marvelshan wants to merge 4 commits into
apache:mainfrom
marvelshan:feat/size-map-support
Open

feat: support size() for MapType input#4580
marvelshan wants to merge 4 commits into
apache:mainfrom
marvelshan:feat/size-map-support

Conversation

@marvelshan
Copy link
Copy Markdown

Which issue does this PR close?

Closes #4472

Rationale for this change

Spark's Size expression accepts both ArrayType and MapType inputs, but Comet previously marked MapType as Unsupported, causing size() on maps to always fall back to Spark. The native Rust implementation already had DataType::Map handling in spark_size_array; the only missing pieces were the ScalarValue::Map arm in spark_size_scalar and the serde gate that blocked MapType from reaching native execution.

What changes are included in this PR?

  • Added ScalarValue::Map arm to spark_size_scalar in native/spark-expr/src/array_funcs/size.rs so the scalar code path also handles map inputs
  • Changed CometSize.getSupportLevel from Unsupported to Compatible() for MapType in spark/src/main/scala/org/apache/comet/serde/arrays.scala
  • Removed getUnsupportedReasons override (MapType is no longer unsupported) and the outdated reason string
  • Enabled test_spark_size_map_array Rust test (removed #[ignore], fixed MapArray::try_new API usage)
  • Changed size.sql SQL test from query spark_answer_only to query so size(m) runs through Comet
  • Updated posexplode.sql fallback reason to match the actual serde reason from Explode/GenerateExec
  • Promoted ignore("size with map input") to test("size with map input") in CometMapExpressionSuite, removed the now-invalid fallback test

How are these changes tested?

  • Rust unit test: test_spark_size_map_array verifies map size calculation (2 entries, 1 entry, empty map, null map returning -1)
cd native && cargo test --package datafusion-comet-spark-expr test_spark_size_map_array
  • SQL file test: size.sql now exercises size(m) through Comet native path
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite size" -Dtest=none
  • ScalaTest: CometMapExpressionSuite "size with map input" uses column references to avoid constant folding
./mvnw test -Dtest=none -Dsuites="org.apache.comet.CometMapExpressionSuite"
  • posexplode fallback reason update:
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite posexplode" -Dtest=none

@marvelshan
Copy link
Copy Markdown
Author

@andygrove

@marvelshan marvelshan force-pushed the feat/size-map-support branch from afd4792 to 58d254f Compare June 3, 2026 09:56
Copy link
Copy Markdown
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @marvelshan
Before running the CI, please include .sql tests to for the maptype size

Comment thread native/spark-expr/src/array_funcs/size.rs
Comment thread native/spark-expr/src/array_funcs/size.rs Outdated

query spark_answer_only
query
SELECT size(arr), size(m) FROM test_size
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add literal map cases below the existing literal-args query? Something like:

query
SELECT size(map('a', 1, 'b', 2)), size(map()), size(cast(NULL as map<string,int>))

That way the literal path is covered for both shapes. While you're here, a cardinality(m) query (Spark registers it as an alias for size) and a query with spark.sql.legacy.sizeOfNull=false would lock down the alias and the non-legacy null branch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think literal map is not supported yet

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — all three items added:

  1. Literal map cases: size(map(...)) queries added (0b3c451). Those going through CreateMap use spark_answer_only since Comet has no CreateMap serde yet; cast(NULL as map) uses default mode as CometLiteral supports null MapType.
  2. Cardinality alias: SELECT cardinality(arr), cardinality(m) FROM test_size added (0b3c451) — Spark registers it as Size in FunctionRegistry, so it goes through CometSize automatically.
  3. sizeOfNull=false: New size_legacy_off.sql with -- Config: spark.sql.legacy.sizeOfNull=false covering column-based, literal null, and cardinality queries under non-legacy semantics (0b3c451).

ScalarValue::Map unit tests also added per review (ff01bbc), and stale docs notes cleared (b1d177a).

Should I file a follow-up issue for CreateMap/MapType literal support so the literal map queries can be promoted to native execution later?

@andygrove
Copy link
Copy Markdown
Member

Thanks for tackling this. The fix is small and focused, and the cleanup on posexplode.sql (catching that the old fallback reason was unrelated to posexplode) was a nice find.

One docs gap to flag, since the file isn't in the diff: docs/source/user-guide/latest/expressions.md still says "MapType input falls back" for both size (line 189) and cardinality (line 186). After this PR that caveat is no longer accurate. cardinality is registered as an alias for Size in Spark (FunctionRegistry.scala), so this PR transparently enables it too. Could you clear those notes in the same PR?

Inline comments cover smaller test-coverage suggestions.

@marvelshan marvelshan force-pushed the feat/size-map-support branch 2 times, most recently from d36788f to d4d53d8 Compare June 4, 2026 03:04
@marvelshan marvelshan force-pushed the feat/size-map-support branch 2 times, most recently from ecb7b5e to 8591ee8 Compare June 4, 2026 03:55
@marvelshan marvelshan force-pushed the feat/size-map-support branch from 8591ee8 to b1d177a Compare June 4, 2026 06:31
@marvelshan
Copy link
Copy Markdown
Author

marvelshan commented Jun 4, 2026

Done, cleared the "MapType input falls back" notes for both size and cardinality in expressions.md

@marvelshan marvelshan changed the title feat: support size() for MapType input (#4472) feat: support size() for MapType input Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] support size() for MapType inputs

3 participants