GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145
GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145John-W-Lewis wants to merge 1 commit into
Conversation
…oveVector Use TransferPair to properly transfer buffer ownership when creating a new VectorSchemaRoot, preventing use-after-free when the original root is closed. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Thank you for opening a pull request! Please label the PR with one or more of:
Also, add the 'breaking-change' label if appropriate. See CONTRIBUTING.md for details. |
|
Being neither a chicken nor an egg, I understand that I am unable to label this as "bug-fix". |
|
There are also a couple of enhancements I would suggest for VectorSchemaRoot but, based on the advice I've seen, it seems better to describe those in a separate PR. |
|
Hi @John-W-Lewis , thank you for the bug-fix. Yes, we need to wait for the maintainers to add the label and trigger the other checks. In the meantime, you can build locally, for example using Maven. Now, there is a format violation in You can fix it with the suggested command.
With the formatting fix, the build was successful for me, which means that the unit tests passed. |
axreldable
left a comment
There was a problem hiding this comment.
Other than the nit (nitpick) and the doc clarification, LGTM.
| FieldVector v = fieldVectors.get(i); | ||
| TransferPair transferPair = v.getTransferPair(v.getAllocator()); | ||
| transferPair.transfer(); | ||
| newVectors.add((FieldVector) transferPair.getTo()); |
There was a problem hiding this comment.
nit: Consider extracting the transferring logic into a separate method to avoid code duplication.
e.g.
private static FieldVector transferVector(FieldVector vector) {
TransferPair transferPair = vector.getTransferPair(vector.getAllocator());
transferPair.transfer();
return (FieldVector) transferPair.getTo();
}
| * | ||
| * <p>Buffer ownership is transferred to the returned root via {@link TransferPair}. After this | ||
| * operation, the vectors in this root are left in a transferred (empty) state. The removed | ||
| * vector's data is not transferred and is released. This root can be reused by calling {@link |
There was a problem hiding this comment.
The removed vector's data is not transferred and is released.
is released is not 100% accurate and can be confusing. The vector remains in the original schema root until it's closed/cleared.

What's Changed
VectorSchemaRoot#addVectorand#removeVectorpreviously created a newVectorSchemaRootby sharing rawFieldVectorreferences between the original and new root. This violates Arrow's ownership model: closing either root would release buffers still in use by the other, leading to use-after-free errors.This fix uses
TransferPairto properly transfer buffer ownership to the returned root, consistent with howVectorSchemaRoot#slicealready handles this. After the operation, the original root's vectors are left in a transferred (empty) state and can be reused viaallocateNew().Tests added to verify that the returned root's data remains valid after the original root and input vector are closed.
Closes #1142.
Made with Cursor