Skip to content

GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145

Open
John-W-Lewis wants to merge 1 commit into
apache:mainfrom
John-W-Lewis:Correcting-VectorSchemaRoot
Open

GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145
John-W-Lewis wants to merge 1 commit into
apache:mainfrom
John-W-Lewis:Correcting-VectorSchemaRoot

Conversation

@John-W-Lewis
Copy link
Copy Markdown

What's Changed

VectorSchemaRoot#addVector and #removeVector previously created a new VectorSchemaRoot by sharing raw FieldVector references between the original and new root. This violates Arrow's ownership model: closing either root would release buffers still in use by the other, leading to use-after-free errors.

This fix uses TransferPair to properly transfer buffer ownership to the returned root, consistent with how VectorSchemaRoot#slice already handles this. After the operation, the original root's vectors are left in a transferred (empty) state and can be reused via allocateNew().

Tests added to verify that the returned root's data remains valid after the original root and input vector are closed.

Closes #1142.

Made with Cursor

…oveVector

Use TransferPair to properly transfer buffer ownership when creating a
new VectorSchemaRoot, preventing use-after-free when the original root
is closed.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Copy link
Copy Markdown

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

@John-W-Lewis
Copy link
Copy Markdown
Author

John-W-Lewis commented May 11, 2026

Being neither a chicken nor an egg, I understand that I am unable to label this as "bug-fix".
Thank you to whoever can do that.

@John-W-Lewis
Copy link
Copy Markdown
Author

There are also a couple of enhancements I would suggest for VectorSchemaRoot but, based on the advice I've seen, it seems better to describe those in a separate PR.

@axreldable
Copy link
Copy Markdown
Contributor

axreldable commented May 17, 2026

Hi @John-W-Lewis , thank you for the bug-fix. Yes, we need to wait for the maintainers to add the label and trigger the other checks. In the meantime, you can build locally, for example using Maven. Now, there is a format violation in TestVectorSchemaRoot.java.

mvn clean install
...
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:3.4.0:check (spotless-check) on project arrow-vector: The following files had format violations:
[ERROR]     src/test/java/org/apache/arrow/vector/TestVectorSchemaRoot.java
[ERROR]         @@ -363,8 +363,7 @@
[ERROR]          ······intVector2.setValueCount(5);
[ERROR]          ······intVector3.setValueCount(5);
[ERROR]          
[ERROR]         -······VectorSchemaRoot·original·=
[ERROR]         -··········new·VectorSchemaRoot(Arrays.asList(intVector1,·intVector2));
[ERROR]         +······VectorSchemaRoot·original·=·new·VectorSchemaRoot(Arrays.asList(intVector1,·intVector2));
[ERROR]          ······original.setRowCount(5);
[ERROR]          
[ERROR]          ······VectorSchemaRoot·result·=·original.addVector(1,·intVector3);
[ERROR] Run 'mvn spotless:apply' to fix these violations.

You can fix it with the suggested command. mvn spotless:apply

image

With the formatting fix, the build was successful for me, which means that the unit tests passed.

Copy link
Copy Markdown
Contributor

@axreldable axreldable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the nit (nitpick) and the doc clarification, LGTM.

FieldVector v = fieldVectors.get(i);
TransferPair transferPair = v.getTransferPair(v.getAllocator());
transferPair.transfer();
newVectors.add((FieldVector) transferPair.getTo());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider extracting the transferring logic into a separate method to avoid code duplication.
e.g.

private static FieldVector transferVector(FieldVector vector) {
    TransferPair transferPair = vector.getTransferPair(vector.getAllocator());
    transferPair.transfer();
    return (FieldVector) transferPair.getTo();
  }

*
* <p>Buffer ownership is transferred to the returned root via {@link TransferPair}. After this
* operation, the vectors in this root are left in a transferred (empty) state. The removed
* vector's data is not transferred and is released. This root can be reused by calling {@link
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed vector's data is not transferred and is released.

is released is not 100% accurate and can be confusing. The vector remains in the original schema root until it's closed/cleared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Managing ownership in VectorSchemaRoot#addVector, recent changes miss the main fault.

2 participants