[python] Fix null blob write #7901

Open
XiaoHongbo-Hope wants to merge 5 commits into apache:master from XiaoHongbo-Hope:fix-null-blob-write-via-public-api

@XiaoHongbo-Hope XiaoHongbo-Hope commented May 19, 2026

Purpose

PR #7847 added null support to BlobFormatWriter.add_element / write_value and FormatBlobReader, but the public write_arrow path (DataBlobWriter -> BlobWriter -> BlobFileWriter._to_blob) still rejects None and raises ValueError, so writing a batch with a NULL inline blob fails end-to-end:

ValueError: Blob field value must be bytes/blob or serialized BlobDescriptor bytes, got <class 'NoneType'>.

This PR makes BlobFileWriter._to_blob return None for None input.
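
The change can be sketched roughly as follows. This is a simplified stand-in, not the actual pypaimon code: the `Blob` class and the module-level `to_blob` function are hypothetical, and the real `_to_blob` also handles serialized BlobDescriptor bytes, which this sketch omits.

```python
from typing import Optional, Union


class Blob:
    """Hypothetical stand-in for the blob value type (not pypaimon's class)."""

    def __init__(self, data: bytes) -> None:
        self.data = data


def to_blob(value: Union[None, bytes, bytearray, Blob]) -> Optional[Blob]:
    """Simplified sketch of the None passthrough described in this PR."""
    # New behavior: a NULL cell is forwarded as None instead of raising.
    if value is None:
        return None
    if isinstance(value, Blob):
        return value
    if isinstance(value, (bytes, bytearray)):
        return Blob(bytes(value))
    # Unsupported types still fail with the original error message.
    raise ValueError(
        "Blob field value must be bytes/blob or serialized BlobDescriptor "
        f"bytes, got {type(value)}."
    )
```

With this shape, None reaches the downstream writer unchanged, while genuinely unsupported types keep raising ValueError.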

Tests

  • New e2e test DataBlobWriterTest.test_null_blob

Adds test_blob_write_read_end_to_end_with_null_values which writes a
batch containing None values in an inline blob column through the
standard write_arrow -> commit -> read path and asserts the NULLs
round-trip.

The test currently fails: BlobFileWriter._to_blob raises ValueError on
None. PR apache#7847 added null support to BlobFormatWriter.add_element /
write_value and FormatBlobReader, plus the FileIO.write_blob direct
path, but the DataBlobWriter -> BlobWriter -> BlobFileWriter chain
used by TableWrite.write_arrow still rejects None.

This test is added as a no-fix reproduction; the writer fix lands in
a follow-up commit.

BlobFileWriter._to_blob now returns None for None input instead of
raising. The downstream BlobFormatWriter.add_element already encodes
None as a -1 length marker, and FormatBlobReader returns None on
read, so values now round-trip end-to-end through write_arrow.

Also renames the e2e reproduction test added in the previous commit
from test_blob_write_read_end_to_end_with_null_values to
test_null_blob.
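
To illustrate the -1 length marker mentioned in the commit message above, here is a minimal sketch of a length-prefixed encoding in which a NULL value is written as a length of -1 with no payload. The 4-byte little-endian prefix and the names `encode_element` / `decode_elements` are assumptions for illustration only; they do not describe Paimon's actual blob file layout.

```python
import struct
from typing import Iterator, List, Optional

# Assumed for illustration: signed 32-bit little-endian length prefix.
_LEN = struct.Struct("<i")
NULL_MARKER = -1


def encode_element(value: Optional[bytes]) -> bytes:
    """Encode one value; None becomes a bare -1 length with no payload."""
    if value is None:
        return _LEN.pack(NULL_MARKER)
    return _LEN.pack(len(value)) + value


def decode_elements(buf: bytes) -> Iterator[Optional[bytes]]:
    """Decode a concatenation of encoded values, yielding None for -1."""
    offset = 0
    while offset < len(buf):
        (length,) = _LEN.unpack_from(buf, offset)
        offset += _LEN.size
        if length == NULL_MARKER:
            yield None
            continue
        yield buf[offset:offset + length]
        offset += length


values: List[Optional[bytes]] = [b"hello", None, b""]
encoded = b"".join(encode_element(v) for v in values)
assert list(decode_elements(encoded)) == values
```

Using a negative length as the marker keeps NULL distinguishable from an empty blob, which is encoded with a length of 0.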

@XiaoHongbo-Hope XiaoHongbo-Hope force-pushed the fix-null-blob-write-via-public-api branch from 727405c to e50b290 May 19, 2026 12:22

@JingsongLi

+1
