fix(flags): retry flag requests on transient network errors #587
marandaneto wants to merge 2 commits into
Reviews (1): Last reviewed commit: "fix: retry feature flag requests on netw..."
```kotlin
@Test
fun `flags retries IOException and returns successful response`() {
    val file = File("src/test/resources/json/flags-v1/basic-flags-no-errors.json")
    val responseFlagsApi = file.readText()
    val attempts = AtomicInteger(0)
    val client =
        OkHttpClient.Builder()
            .addInterceptor { chain ->
                // Fail only the first attempt, before it reaches the mock server.
                if (attempts.incrementAndGet() == 1) {
                    throw IOException("network failure")
                }
                chain.proceed(chain.request())
            }
            .build()
    val http = mockHttp(response = MockResponse().setBody(responseFlagsApi))
    val url = http.url("/")

    try {
        val sut = getSut(host = url.toString(), httpClient = client, maxRetries = 1)

        val response = sut.flags("distinctId", anonymousId = "anonId", groups = emptyMap())

        assertNotNull(response)
        // Two client attempts, but only the retry actually reaches the server.
        assertEquals(2, attempts.get())
        assertEquals(1, http.requestCount)
    } finally {
        http.shutdown()
    }
}

@Test
fun `flags does not retry HTTP error responses`() {
    listOf(408, 429, 500, 502, 503).forEach { statusCode ->
        val http = mockHttp(response = MockResponse().setResponseCode(statusCode).setBody("error"))
        val url = http.url("/")

        try {
            val sut = getSut(host = url.toString())

            val exc =
                assertThrows(PostHogApiError::class.java) {
                    sut.flags("distinctId", anonymousId = "anonId", groups = emptyMap())
                }

            assertEquals(statusCode, exc.statusCode)
            // HTTP status errors are terminal: exactly one request per status code.
            assertEquals(1, http.requestCount)
        } finally {
            http.shutdown()
        }
    }
}
```
Missing exhausted-retries coverage
The tests verify success-after-1-retry and no-retry-on-HTTP-error, but there is no test confirming that once all retries are consumed, the IOException is actually rethrown to the caller. Without that assertion, a regression where executeFlagsWithRetry swallows the error (e.g., returns null silently) would go undetected. Per the custom rule, retry logic should include a test for the complete flow through all retry attempts.
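The behaviour the review asks to pin down can be shown outside the SDK with a minimal, self-contained sketch. `retryOnIOException` is a hypothetical helper, not the PR's `executeFlagsWithRetry`: it leaves the final attempt unguarded so that, once retries are exhausted, the last `IOException` propagates to the caller instead of being swallowed. A test for that path would assert both the exception and the total attempt count (`maxRetries + 1`).

```kotlin
import java.io.IOException

/**
 * Hypothetical bounded retry loop: runs [block] up to maxRetries + 1 times,
 * retrying only on IOException and rethrowing once the retries are spent.
 */
fun <T> retryOnIOException(maxRetries: Int, block: () -> T): T {
    repeat(maxRetries.coerceAtLeast(0)) {
        try {
            return block()
        } catch (e: IOException) {
            // Transient failure: fall through and try again.
        }
    }
    // Final attempt is not guarded, so an IOException here reaches the caller.
    return block()
}
```

With `maxRetries = 1` and a block that always throws, the block runs twice and the caller receives the `IOException`; a `null` result or a third attempt would indicate exactly the regression the review describes.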
Prefer parameterised tests over forEach loops
The listOf(408, 429, 500, 502, 503).forEach { … } pattern inside a single @Test hides which status code failed when the test breaks and does not satisfy the project convention of always preferring parameterised tests. Each status code should be its own parameterised case so failures are individually identified.
```kotlin
@Throws(PostHogApiError::class, IOException::class)
private fun executeFlagsWithRetry(request: Request): PostHogFlagsResponse? {
    val maxRetries = config.maxRetries.coerceAtLeast(0)
```
maxRetries now controls two unrelated concerns
PostHogConfig.maxRetries is documented as "Maximum number of retries for failed flush attempts before events are dropped" (default: 3). Reusing it here means a user who lowers maxRetries to reduce queue/flush behaviour (e.g. sets it to 0) will silently disable flags network-retry as well, and vice-versa — increasing it to improve flags resilience also inflates event-flush retry counts. A separate flagsMaxRetries (or similar) would keep the two concerns independent, and PostHogConfig's KDoc for maxRetries should at minimum be updated to document the flags behaviour.
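A minimal sketch of the decoupling the review suggests, with two independent retry budgets. `SdkConfig`, `flagsMaxRetries` and the two budget helpers are hypothetical names, not PostHog's API, and the default of 2 for `flagsMaxRetries` is an arbitrary assumption; only `maxRetries`, its KDoc wording and its default of 3 come from the comment above.

```kotlin
/**
 * Hypothetical config with separate retry budgets. Only maxRetries (default 3)
 * mirrors PostHogConfig; flagsMaxRetries is an illustrative assumption.
 */
data class SdkConfig(
    /** Maximum number of retries for failed flush attempts before events are dropped. */
    val maxRetries: Int = 3,
    /** Hypothetical: maximum network retries for a single /flags request. */
    val flagsMaxRetries: Int = 2,
)

/** Total attempts (first try plus retries) a flush may make in this sketch. */
fun flushAttemptBudget(config: SdkConfig): Int = config.maxRetries.coerceAtLeast(0) + 1

/** Total attempts a flags request may make in this sketch. */
fun flagsAttemptBudget(config: SdkConfig): Int = config.flagsMaxRetries.coerceAtLeast(0) + 1
```

With `SdkConfig(maxRetries = 0)`, flushes get a single attempt while flags requests keep their own budget of three, which is the independence the review is after.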
posthog-android Compliance Report
Date: 2026-06-25 07:06:47 UTC

| Test | Status | Duration |
|---|---|---|
| Format Validation.Event Has Required Fields | ✅ | 376ms |
| Format Validation.Event Has Uuid | ✅ | 38ms |
| Format Validation.Event Has Lib Properties | ✅ | 33ms |
| Format Validation.Distinct Id Is String | ✅ | 28ms |
| Format Validation.Token Is Present | ✅ | 25ms |
| Format Validation.Custom Properties Preserved | ✅ | 30ms |
| Format Validation.Event Has Timestamp | ✅ | 24ms |
| Retry Behavior.Retries On 503 | ✅ | 7027ms |
| Retry Behavior.Does Not Retry On 400 | ✅ | 4022ms |
| Retry Behavior.Does Not Retry On 401 | ✅ | 4024ms |
| Retry Behavior.Respects Retry After Header | ✅ | 7028ms |
| Retry Behavior.Implements Backoff | ✅ | 17035ms |
| Retry Behavior.Retries On 500 | ✅ | 7019ms |
| Retry Behavior.Retries On 502 | ✅ | 7019ms |
| Retry Behavior.Retries On 504 | ✅ | 7017ms |
| Retry Behavior.Max Retries Respected | ✅ | 17038ms |
| Deduplication.Generates Unique Uuids | ✅ | 39ms |
| Deduplication.Preserves Uuid On Retry | ✅ | 7017ms |
| Deduplication.Preserves Uuid And Timestamp On Retry | ✅ | 12033ms |
| Deduplication.Preserves Uuid And Timestamp On Batch Retry | ✅ | 7018ms |
| Deduplication.No Duplicate Events In Batch | ✅ | 34ms |
| Deduplication.Different Events Have Different Uuids | ✅ | 23ms |
| Compression.Sends Gzip When Enabled | ✅ | 21ms |
| Batch Format.Uses Proper Batch Structure | ✅ | 21ms |
| Batch Format.Flush With No Events Sends Nothing | ✅ | 13ms |
| Batch Format.Multiple Events Batched Together | ✅ | 37ms |
| Error Handling.Does Not Retry On 403 | ✅ | 4023ms |
| Error Handling.Does Not Retry On 413 | ❌ | 4016ms |
| Error Handling.Retries On 408 | ✅ | 5028ms |
Failures
error_handling.does_not_retry_on_413
Expected 1 requests, got 2
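The only failing row here is the 413 case: the SDK sent a second request where the suite expects exactly one. Read together, the retry rows describe a status classification for capture requests; the hypothetical predicate below encodes just those rows. Treating the whole 5xx range as retryable is an assumption that goes beyond the four codes (500, 502, 503, 504) the report exercises, and backoff and `Retry-After` handling are out of scope.

```kotlin
/**
 * Hypothetical capture-side status policy inferred from the report rows:
 * 408 and 500/502/503/504 are retried; 400, 401, 403 and 413 are sent once.
 */
fun shouldRetryCaptureStatus(statusCode: Int): Boolean =
    when (statusCode) {
        408 -> true // request timeout
        in 500..599 -> true // assumption: every 5xx, not only the codes the report covers
        else -> false // 400, 401, 403, 413 and anything the report does not cover
    }
```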
Feature_Flags Tests
| Test | Status | Duration |
|---|---|---|
| Request Payload.Request With Person Properties Device Id | ❌ | 34ms |
| Request Payload.Flags Request Uses V2 Query Param | ✅ | 25ms |
| Request Payload.Flags Request Hits Flags Path Not Decide | ✅ | 22ms |
| Request Payload.Flags Request Omits Authorization Header | ✅ | 25ms |
| Request Payload.Token In Flags Body Matches Init | ✅ | 22ms |
| Request Payload.Groups Round Trip | ✅ | 22ms |
| Request Payload.Groups Default To Empty Object | ❌ | 25ms |
| Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | ❌ | 22ms |
| Request Payload.Disable Geoip False Propagates As Geoip Disable False | ❌ | 22ms |
| Request Payload.Disable Geoip Omitted Defaults To False | ❌ | 22ms |
| Request Payload.Flag Keys To Evaluate Contains Only Requested Key | ❌ | 35ms |
| Request Lifecycle.No Flags Request On Init Alone | ✅ | 9ms |
| Request Lifecycle.No Flags Request On Normal Capture | ✅ | 29ms |
| Request Lifecycle.Two Flag Calls Produce Two Remote Requests | ✅ | 2029ms |
| Request Lifecycle.Mock Response Value Is Returned To Caller | ✅ | 21ms |
| Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | ✅ | 29ms |
Failures
request_payload.request_with_person_properties_device_id
Expected distinct_id='test_user_123', got '019efd9a-4030-7702-9a69-b0688f744bc4'
request_payload.groups_default_to_empty_object
Field 'groups' not found in /flags request body at path 'groups'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']
request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it
Field 'distinct_id' not found in /flags request body at path 'person_properties.distinct_id'. Available keys: ['$lib', '$lib_version', 'email']
request_payload.disable_geoip_false_propagates_as_geoip_disable_false
Field 'geoip_disable' not found in /flags request body at path 'geoip_disable'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']
request_payload.disable_geoip_omitted_defaults_to_false
Field 'geoip_disable' not found in /flags request body at path 'geoip_disable'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']
request_payload.flag_keys_to_evaluate_contains_only_requested_key
Field 'flag_keys_to_evaluate' not found in /flags request body at path 'flag_keys_to_evaluate'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']
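Most of the failing `request_payload` cases name fields the suite expects in the `/flags` body. A hypothetical builder that produces those field names with the defaults implied by the test names might look like this; `buildFlagsBody` and its parameters are assumptions rather than the SDK's API, and the `$anon_distinct_id`, `$device_id` and `timezone` keys the SDK already sends are omitted for brevity.

```kotlin
/**
 * Hypothetical /flags body builder shaped after the compliance test names.
 * Not the SDK's implementation.
 */
fun buildFlagsBody(
    apiKey: String,
    distinctId: String,
    groups: Map<String, String>? = null,
    personProperties: Map<String, Any?> = emptyMap(),
    disableGeoip: Boolean? = null,
    flagKey: String? = null,
): Map<String, Any?> {
    val body = linkedMapOf<String, Any?>(
        "api_key" to apiKey,
        "distinct_id" to distinctId,
        // Groups default to an empty object instead of being omitted.
        "groups" to (groups ?: emptyMap<String, String>()),
        // person_properties.distinct_id is filled in unless the caller supplies one.
        "person_properties" to (mapOf("distinct_id" to distinctId) + personProperties),
        // geoip_disable is always sent; omitted means false.
        "geoip_disable" to (disableGeoip ?: false),
    )
    // When evaluating a single flag, restrict evaluation to that key only.
    if (flagKey != null) body["flag_keys_to_evaluate"] = listOf(flagKey)
    return body
}
```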
💡 Motivation and Context
`/flags/?v=2` evaluation should tolerate transient network failures, but should not retry HTTP/API responses such as 408, 429, or 5xx.

This PR adds bounded retry/backoff only for transient network failures (timeouts, connection resets/lost connections, and EOF-style failures) and keeps HTTP/API status errors terminal.
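The policy can be sketched as an error classifier plus a bounded backoff schedule. This is a minimal illustration, not the PR's code: `HttpStatusError` is a hypothetical stand-in for `PostHogApiError`, and the 200 ms base delay and 2 s cap are assumed values. `ConnectException` (connection refused) is matched before the general `IOException` branch because it is a subclass of it; timeouts (`SocketTimeoutException`), resets (`SocketException`) and EOF-style drops (`EOFException`) all land in the retryable branch, which also covers the plain `IOException` the PR's own test throws.

```kotlin
import java.io.IOException
import java.net.ConnectException

// Hypothetical stand-in for an HTTP/API status error such as PostHogApiError.
// It is not an IOException, so it is never treated as transient.
class HttpStatusError(val statusCode: Int) : Exception("HTTP $statusCode")

/** Returns true only for failures this sketch treats as transient network errors. */
fun isTransientNetworkError(error: Throwable): Boolean =
    when (error) {
        // Connection refused: nothing is listening, so fail fast.
        // Must precede the IOException branch because ConnectException extends it.
        is ConnectException -> false
        // Timeouts, connection resets/lost connections and EOF-style failures.
        is IOException -> true
        // HTTP/API status errors (408, 429, 5xx, ...) stay terminal.
        else -> false
    }

/** Exponential backoff before retry number [retry] (0-based), capped at [maxDelayMs]. */
fun backoffDelayMs(retry: Int, baseDelayMs: Long = 200, maxDelayMs: Long = 2_000): Long =
    minOf(baseDelayMs shl retry.coerceIn(0, 20), maxDelayMs)
```

A retry loop would consult `isTransientNetworkError` on each caught exception, sleep for `backoffDelayMs(retry)` before the next attempt, and rethrow once the retry budget is spent.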
💚 How did you test it?
See the repo-specific command below.
📝 Checklist
If releasing new changes
`pnpm changeset` to generate a changeset file
🤖 Agent context
Autonomy: Human-driven (agent-assisted)
Implemented by pi after a maintainer-directed cross-SDK pass. The chosen policy is transient network-only retry/backoff for `/flags/?v=2`; HTTP/API statuses remain non-retryable by design. Connection-refused failures also fail fast where the platform exposes that distinction.
Tested with: