fix(upstream): Request body limit is not a network error#6149
Conversation
| return false; | ||
| } | ||
|
|
||
| // TODO: there's probably more exceptions to this rule. |
There was a problem hiding this comment.
Imo todo's should link to issues and actually be something we fix, maybe better as a Note:?
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7088763. Configure here.
| /// The request failed without indicating the state of the upstream connection. | ||
| Failed, |
There was a problem hiding this comment.
We need three states now, because in case of a client-side request error like the length limit error, we want to skip retries but we also do not want to trigger a network outage, nor mark a network outage as resolved.
| self.conn.reset_error(); | ||
| self.queue.trigger_retries(); | ||
| } | ||
| RequestOutcome::Failed => {} |
There was a problem hiding this comment.
Why does this skip notify_error?
There was a problem hiding this comment.
The doc comment on notify error says:
relay/relay-server/src/services/upstream.rs
Line 1506 in f4aa3a8
A local request failure does not contain any information about the connection (e.g. a length limit error should not initiate a reconnection attempt).
There was a problem hiding this comment.
I guess I don't know enough about the upstream whether we need to actually establish a new connection using that mechanism or if everything is fine. My intuition is, that we will have to actually make a new http connection and can't re-use the original one, but maybe reqwest already deals with that for us.
There was a problem hiding this comment.
Fair point, but notify_error wouldn't close the connection either. It just starts a grace period after which a reconnect is triggered.

When the request body is a stream, errors can occur during sending that relate to the client / request, not the server. There's probably more than this one but it's unclear to me how to filter them.
This introduces a third request outcome variant that does not get retried, but also does not influence connection outage handling.
With help from Codex.