Skip to content

fix: merge duplicate-index entries in streaming tool_call deltas#3232

Open
sharziki wants to merge 1 commit into
openai:mainfrom
sharziki:fix/accumulate-delta-duplicate-index
Open

fix: merge duplicate-index entries in streaming tool_call deltas#3232
sharziki wants to merge 1 commit into
openai:mainfrom
sharziki:fix/accumulate-delta-duplicate-index

Conversation

@sharziki
Copy link
Copy Markdown

Summary

Fixes #3201

When the first streamed chunk contains multiple tool_calls entries with the same logical index (e.g. a tool-call name and its first argument fragment in one SSE event), the two early-return branches in accumulate_delta store the list as-is without merging by index. Later chunks merge into physical position 0 only, leaving the duplicate entry orphaned and producing invalid final JSON.

Root cause

Both early-return branches at the top of the per-key loop in _deltas.py (and the identical copy in _assistants.py) assign the delta list wholesale and skip the index-based merge logic that handles duplicate indexes in steady state:

if key not in acc:          # L8-10: first occurrence
    acc[key] = delta_value
    continue

acc_value = acc[key]
if acc_value is None:       # L13-15: None placeholder
    acc[key] = delta_value
    continue

Fix

Add _merge_indexed_list() that collapses list entries sharing the same index value before the list is stored for the first time. Applied in both _deltas.py and _assistants.py.

  • Non-indexed lists (strings, ints) pass through unchanged
  • Lists without an index field in the first entry pass through unchanged
  • Lists with duplicate index values are merged using the existing accumulate_delta recursion

Test plan

  • Verified with exact reproduction from issue (duplicate index: 0 in first chunk, acc_value is None)
  • Verified with key not in acc variant
  • Verified multiple distinct tool calls (different indexes) still work correctly
  • Verified non-indexed string lists still accumulate normally
  • Existing streaming test suite passes (35/35)

When the first streamed chunk contains multiple tool_calls entries with
the same logical index (e.g. a tool name and its first argument fragment
in one SSE event), the two early-return branches in accumulate_delta
store the list as-is without merging by index.  Later chunks merge into
physical position 0 only, leaving the duplicate entry orphaned and
producing invalid final JSON.

Add _merge_indexed_list() that collapses entries sharing the same index
value before the list is stored for the first time.  Applied in both
_deltas.py and _assistants.py which have identical accumulate_delta
implementations.

Fixes openai#3201
@sharziki sharziki requested a review from a team as a code owner May 12, 2026 15:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2bcb65bb3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +20 to +24
if not is_dict(item):
continue
idx = item.get("index") # type: ignore[union-attr]
if not isinstance(idx, int):
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not silently drop malformed indexed list entries

When _merge_indexed_list() is triggered (first item is a dict with index), entries that are not dicts or whose index is missing/non-int are skipped via continue, which silently removes data from the stream snapshot. This differs from the existing list-merge path in accumulate_delta() that explicitly raises on invalid indexed entries, so malformed-but-present payloads now become silent corruption instead of a detectable error. In streams where providers emit mixed or partially malformed tool_calls arrays, this can produce incomplete final messages without surfacing any failure.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streaming tool_call deltas with duplicate indexes in first chunk are accumulated incorrectly

1 participant