fix: avoid inferred caps for explicit-only providers #402
🦋 Changeset detected. Latest commit: 1731864. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7423570e4a
    if (
      args.budget.hardCap === undefined &&
      args.provider.completionBudgetStrategy === 'explicit-only'
    ) {
      return args.provider;
Honor max_output_size for explicit-only providers
When an openai/openai_responses model alias sets max_output_size, resolveCompletionBudget still produces only a fallback unless an env var is set, and toKosongProviderConfig only forwards maxOutputSize to Anthropic (packages/agent-core/src/session/provider-manager.ts:220-260). Because this early return fires before withMaxCompletionTokens, those explicit-only aliases drop the configured per-alias cap and send no max_tokens/max_output_tokens, leaving users unable to cap OpenAI-compatible providers that reject oversized output budgets.
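One possible way to address the concern above is to treat a per-alias max_output_size as an explicit hard cap when the budget is resolved, so the explicit-only early return no longer drops it. The following is only a minimal sketch under stated assumptions: `ModelAliasSketch`, `CompletionBudgetSketch`, `resolveBudgetSketch`, the `fallback` field, and the precedence of an environment override over the alias value are all hypothetical and are not taken from the actual `resolveCompletionBudget` implementation.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ModelAliasSketch {
  maxOutputSize?: number; // per-alias max_output_size from the config
  contextWindow?: number; // catalog context window
}

interface CompletionBudgetSketch {
  hardCap?: number; // explicitly configured cap
  fallback?: number; // cap inferred from the context window
}

// Assumption: an env override wins over the alias value; the real
// precedence in the repository may differ.
function resolveBudgetSketch(
  alias: ModelAliasSketch,
  envCap?: number,
): CompletionBudgetSketch {
  // Explicit sources become the hard cap, so explicit-only providers keep it.
  const hardCap = envCap ?? alias.maxOutputSize;
  // The context window is only ever used as an inferred fallback.
  return { hardCap, fallback: alias.contextWindow };
}

// Illustrative values, not taken from any real catalog entry.
const withAliasCap = resolveBudgetSketch({ maxOutputSize: 4096, contextWindow: 200000 });
const withoutAliasCap = resolveBudgetSketch({ contextWindow: 200000 });
const withEnvOverride = resolveBudgetSketch({ maxOutputSize: 4096 }, 2048);
```

With a shape like this, an alias that sets max_output_size yields a defined hardCap, so a guard that checks `hardCap === undefined` would not skip it.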
7423570 to 1731864
Related Issue
Resolve #349
Related to #318
Problem
OpenAI-compatible providers can reject requests when Kimi Code derives a completion-token cap from a catalog model's context window and sends that value as max_tokens/max_output_tokens. This surfaced in #349 with MiMo-V2.5-Pro: the catalog context window is much larger than the provider's accepted output-token cap, so the request returned 400 Param Incorrect.
#318 addresses the common configured-output-limit path by using max_output_size as the completion cap. This PR adds the companion guard for providers that should only receive configured hard caps and should not receive fallback caps inferred from context windows.
What changed
- Add a completionBudgetStrategy field to ChatProvider.
- [...] explicit-only for completion budgeting.
- explicit-only providers still receive configured hard caps, but skip fallback/context-window inferred caps.
Checklist
- [...] gen-changesets skill, or this PR needs no changeset.
- [...] gen-docs skill, or this PR needs no doc update.
Verification
- node node_modules/vitest/vitest.mjs run packages/agent-core/test/utils/completion-budget.test.ts packages/agent-core/test/agent/kosong-llm.test.ts
- node node_modules/vitest/vitest.mjs run packages/kosong/test/openai-legacy.test.ts packages/kosong/test/openai-responses.test.ts
- node node_modules/typescript/bin/tsc -p packages/agent-core/tsconfig.json --noEmit
- node node_modules/typescript/bin/tsc -p packages/kosong/tsconfig.json --noEmit
- git diff --check
- xiaomi-token-plan-sgp/mimo-v2.5-pro with thinking enabled returned successfully after avoiding the inferred context-sized cap.
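The guard described under "What changed" can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the `ChatProvider` and `CompletionBudget` shapes, the `fallback` and `maxCompletionTokens` fields, the `applyCompletionBudget` wrapper, and the body of `withMaxCompletionTokens` are assumptions made for the example. Only the `hardCap` check and the `'explicit-only'` strategy value come from the reviewed diff.

```typescript
// Hypothetical, simplified shapes; the real types in the repository may differ.
interface ChatProvider {
  readonly name: string;
  readonly completionBudgetStrategy?: 'explicit-only';
  readonly maxCompletionTokens?: number; // assumed field carrying the cap
}

interface CompletionBudget {
  hardCap?: number; // cap the user configured explicitly
  fallback?: number; // assumed name for the context-window inferred cap
}

// Stand-in for the real helper of the same name; returns a capped copy.
function withMaxCompletionTokens(
  provider: ChatProvider,
  maxCompletionTokens: number,
): ChatProvider {
  return { ...provider, maxCompletionTokens };
}

// Hypothetical wrapper showing where the guard sits.
function applyCompletionBudget(args: {
  provider: ChatProvider;
  budget: CompletionBudget;
}): ChatProvider {
  // Guard from the diff: explicit-only providers never get an inferred cap.
  if (
    args.budget.hardCap === undefined &&
    args.provider.completionBudgetStrategy === 'explicit-only'
  ) {
    return args.provider;
  }
  // Otherwise prefer the configured hard cap, then the inferred fallback.
  const cap = args.budget.hardCap ?? args.budget.fallback;
  return cap === undefined
    ? args.provider
    : withMaxCompletionTokens(args.provider, cap);
}

// Illustrative values only.
const explicitOnly: ChatProvider = {
  name: 'openai-compatible',
  completionBudgetStrategy: 'explicit-only',
};
const inferredOnly = applyCompletionBudget({
  provider: explicitOnly,
  budget: { fallback: 1000000 },
});
const configured = applyCompletionBudget({
  provider: explicitOnly,
  budget: { hardCap: 8192, fallback: 1000000 },
});
const defaultProvider = applyCompletionBudget({
  provider: { name: 'other' },
  budget: { fallback: 1000000 },
});
```

In this sketch the explicit-only provider is returned unchanged when only an inferred fallback exists, still receives a configured hard cap, and a provider without the strategy keeps the fallback behavior.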