
fix(flags): retry flag requests on transient network errors #587

Draft
marandaneto wants to merge 2 commits into main from pi-flags-network-retry-20260624

marandaneto (Member) commented Jun 25, 2026

💡 Motivation and Context

/flags/?v=2 evaluation should tolerate transient network failures, but should not retry HTTP/API responses such as 408, 429, or 5xx.

This PR adds bounded retry/backoff only for transient network failures (timeouts, connection resets/lost connections, and EOF-style failures) and keeps HTTP/API status errors terminal.
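A minimal sketch of the policy described above, not the code in this PR. `retryOnTransient`, `HttpStatusException`, `isTransient`, and the backoff values are hypothetical names and assumptions chosen for illustration. It retries only `IOException`-style failures with doubling delays, lets status errors propagate after a single attempt, and treats `ConnectException` (connection refused) as fail-fast.

```kotlin
import java.io.IOException
import java.net.ConnectException

// Hypothetical stand-in for an HTTP/API status failure (not the SDK's PostHogApiError).
class HttpStatusException(val statusCode: Int) : Exception("HTTP $statusCode")

// Assumed classification: any IOException (timeout, reset, EOF) is transient,
// except connection refused, which fails fast.
fun isTransient(e: IOException): Boolean = e !is ConnectException

fun <T> retryOnTransient(
    maxRetries: Int,
    initialDelayMs: Long = 100L,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T,
): T {
    val retries = maxRetries.coerceAtLeast(0)
    var delayMs = initialDelayMs
    var lastError: IOException? = null
    for (attempt in 0..retries) {
        try {
            return block()
        } catch (e: IOException) {
            // Non-transient I/O failures are rethrown without retrying.
            if (!isTransient(e)) throw e
            lastError = e
            if (attempt < retries) {
                sleep(delayMs)
                delayMs *= 2 // exponential backoff between attempts
            }
        }
    }
    // All attempts failed with a transient error: surface the last one.
    throw checkNotNull(lastError)
}
```

Because `HttpStatusException` is not an `IOException`, it is never caught by the loop, which mirrors the "HTTP/API status errors stay terminal" rule. The `sleep` parameter is injectable only so that tests can skip real delays.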

💚 How did you test it?

See the repo-specific command below.

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

If releasing new changes

  • Ran pnpm changeset to generate a changeset file

🤖 Agent context

Autonomy: Human-driven (agent-assisted)

Implemented by pi after a maintainer-directed cross-SDK pass. The chosen policy is transient network-only retry/backoff for /flags/?v=2; HTTP/API statuses remain non-retryable by design. Connection-refused failures also fail fast where the platform exposes that distinction.

Tested with:

./gradlew :posthog:test --tests com.posthog.internal.PostHogApiTest --no-daemon

marandaneto self-assigned this Jun 25, 2026

greptile-apps (Bot) commented Jun 25, 2026

Reviews (1): Last reviewed commit: "fix: retry feature flag requests on netw..." | Re-trigger Greptile

Comment on lines +147 to +197
@Test
fun `flags retries IOException and returns successful response`() {
    val file = File("src/test/resources/json/flags-v1/basic-flags-no-errors.json")
    val responseFlagsApi = file.readText()
    val attempts = AtomicInteger(0)
    val client =
        OkHttpClient.Builder()
            .addInterceptor { chain ->
                if (attempts.incrementAndGet() == 1) {
                    throw IOException("network failure")
                }
                chain.proceed(chain.request())
            }
            .build()
    val http = mockHttp(response = MockResponse().setBody(responseFlagsApi))
    val url = http.url("/")

    try {
        val sut = getSut(host = url.toString(), httpClient = client, maxRetries = 1)

        val response = sut.flags("distinctId", anonymousId = "anonId", groups = emptyMap())

        assertNotNull(response)
        assertEquals(2, attempts.get())
        assertEquals(1, http.requestCount)
    } finally {
        http.shutdown()
    }
}

@Test
fun `flags does not retry HTTP error responses`() {
    listOf(408, 429, 500, 502, 503).forEach { statusCode ->
        val http = mockHttp(response = MockResponse().setResponseCode(statusCode).setBody("error"))
        val url = http.url("/")

        try {
            val sut = getSut(host = url.toString())

            val exc =
                assertThrows(PostHogApiError::class.java) {
                    sut.flags("distinctId", anonymousId = "anonId", groups = emptyMap())
                }

            assertEquals(statusCode, exc.statusCode)
            assertEquals(1, http.requestCount)
        } finally {
            http.shutdown()
        }
    }
}

P1 Missing exhausted-retries coverage

The tests verify success-after-1-retry and no-retry-on-HTTP-error, but there is no test confirming that once all retries are consumed, the IOException is actually rethrown to the caller. Without that assertion, a regression where executeFlagsWithRetry swallows the error (e.g., returns null silently) would go undetected. Per the custom rule, retry logic should include a test for the complete flow through all retry attempts.
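A self-contained sketch of the assertion this comment asks for. It does not use the PR's `getSut` or `executeFlagsWithRetry`; `callWithRetries` and `runAlwaysFailing` are hypothetical stand-ins, so only the shape of the check carries over: with `maxRetries = 2`, a call that always fails should be attempted three times and then surface the original `IOException` to the caller.

```kotlin
import java.io.IOException

// Hypothetical stand-in for the retry loop under test.
fun <T> callWithRetries(maxRetries: Int, block: () -> T): T {
    var lastError: IOException? = null
    // One initial attempt plus up to maxRetries retries.
    for (attempt in 0..maxRetries.coerceAtLeast(0)) {
        try {
            return block()
        } catch (e: IOException) {
            lastError = e
        }
    }
    // Retries exhausted: rethrow instead of returning null silently.
    throw checkNotNull(lastError)
}

// Runs an always-failing call and reports (attempts made, error seen by the caller).
fun runAlwaysFailing(maxRetries: Int): Pair<Int, Throwable?> {
    var attempts = 0
    val error = runCatching {
        callWithRetries<Unit>(maxRetries) {
            attempts++
            throw IOException("network failure")
        }
    }.exceptionOrNull()
    return attempts to error
}
```

In the style of the tests quoted above, the equivalent would presumably be an interceptor that throws on every attempt, an `assertThrows(IOException::class.java)` around `sut.flags(...)`, and an assertion that the attempt counter equals `maxRetries + 1`.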

Comment on lines +177 to +197
@Test
fun `flags does not retry HTTP error responses`() {
    listOf(408, 429, 500, 502, 503).forEach { statusCode ->
        val http = mockHttp(response = MockResponse().setResponseCode(statusCode).setBody("error"))
        val url = http.url("/")

        try {
            val sut = getSut(host = url.toString())

            val exc =
                assertThrows(PostHogApiError::class.java) {
                    sut.flags("distinctId", anonymousId = "anonId", groups = emptyMap())
                }

            assertEquals(statusCode, exc.statusCode)
            assertEquals(1, http.requestCount)
        } finally {
            http.shutdown()
        }
    }
}

P2 Prefer parameterised tests over forEach loops

The listOf(408, 429, 500, 502, 503).forEach { … } pattern inside a single @Test hides which status code failed when the test breaks and does not satisfy the project convention of always preferring parameterised tests. Each status code should be its own parameterised case so failures are individually identified.

Context Used: Do not attempt to comment on incorrect alphabetica... (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


@Throws(PostHogApiError::class, IOException::class)
private fun executeFlagsWithRetry(request: Request): PostHogFlagsResponse? {
    val maxRetries = config.maxRetries.coerceAtLeast(0)

P2 maxRetries now controls two unrelated concerns

PostHogConfig.maxRetries is documented as "Maximum number of retries for failed flush attempts before events are dropped" (default: 3). Reusing it here means a user who lowers maxRetries to reduce queue/flush behaviour (e.g. sets it to 0) will silently disable flags network-retry as well, and vice-versa — increasing it to improve flags resilience also inflates event-flush retry counts. A separate flagsMaxRetries (or similar) would keep the two concerns independent, and PostHogConfig's KDoc for maxRetries should at minimum be updated to document the flags behaviour.
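One way the separation suggested here could look, as a sketch only. `RetrySettings`, `flagsMaxRetries`, and `effectiveFlagsMaxRetries` are hypothetical names, not part of the SDK's `PostHogConfig`; the default of 3 for `maxRetries` is taken from the KDoc quoted in this comment.

```kotlin
// Hypothetical config shape for illustration; not the SDK's actual PostHogConfig.
data class RetrySettings(
    // Retries for failed flush attempts (default 3, per the KDoc quoted above).
    val maxRetries: Int = 3,
    // Hypothetical flags-only override; null means "fall back to maxRetries".
    val flagsMaxRetries: Int? = null,
) {
    // Effective retry count for /flags requests, never negative.
    val effectiveFlagsMaxRetries: Int
        get() = (flagsMaxRetries ?: maxRetries).coerceAtLeast(0)
}
```

With this shape, `RetrySettings(maxRetries = 0, flagsMaxRetries = 2)` disables flush retries while still allowing two flags retries. The `null` fallback keeps existing behaviour for callers who never set the new field, at the cost of leaving the two settings coupled until they do.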

github-actions (Bot, Contributor) commented Jun 25, 2026

posthog-android Compliance Report

Date: 2026-06-25 07:06:47 UTC
Duration: 119810ms

⚠️ Some Tests Failed

38/45 tests passed, 7 failed


Capture Tests

⚠️ 28/29 tests passed, 1 failed

Test | Status | Duration
Format Validation.Event Has Required Fields | passed | 376ms
Format Validation.Event Has Uuid | passed | 38ms
Format Validation.Event Has Lib Properties | passed | 33ms
Format Validation.Distinct Id Is String | passed | 28ms
Format Validation.Token Is Present | passed | 25ms
Format Validation.Custom Properties Preserved | passed | 30ms
Format Validation.Event Has Timestamp | passed | 24ms
Retry Behavior.Retries On 503 | passed | 7027ms
Retry Behavior.Does Not Retry On 400 | passed | 4022ms
Retry Behavior.Does Not Retry On 401 | passed | 4024ms
Retry Behavior.Respects Retry After Header | passed | 7028ms
Retry Behavior.Implements Backoff | passed | 17035ms
Retry Behavior.Retries On 500 | passed | 7019ms
Retry Behavior.Retries On 502 | passed | 7019ms
Retry Behavior.Retries On 504 | passed | 7017ms
Retry Behavior.Max Retries Respected | passed | 17038ms
Deduplication.Generates Unique Uuids | passed | 39ms
Deduplication.Preserves Uuid On Retry | passed | 7017ms
Deduplication.Preserves Uuid And Timestamp On Retry | passed | 12033ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry | passed | 7018ms
Deduplication.No Duplicate Events In Batch | passed | 34ms
Deduplication.Different Events Have Different Uuids | passed | 23ms
Compression.Sends Gzip When Enabled | passed | 21ms
Batch Format.Uses Proper Batch Structure | passed | 21ms
Batch Format.Flush With No Events Sends Nothing | passed | 13ms
Batch Format.Multiple Events Batched Together | passed | 37ms
Error Handling.Does Not Retry On 403 | passed | 4023ms
Error Handling.Does Not Retry On 413 | failed | 4016ms
Error Handling.Retries On 408 | passed | 5028ms

Failures

error_handling.does_not_retry_on_413

Expected 1 requests, got 2

Feature_Flags Tests

⚠️ 10/16 tests passed, 6 failed

Test | Status | Duration
Request Payload.Request With Person Properties Device Id | failed | 34ms
Request Payload.Flags Request Uses V2 Query Param | passed | 25ms
Request Payload.Flags Request Hits Flags Path Not Decide | passed | 22ms
Request Payload.Flags Request Omits Authorization Header | passed | 25ms
Request Payload.Token In Flags Body Matches Init | passed | 22ms
Request Payload.Groups Round Trip | passed | 22ms
Request Payload.Groups Default To Empty Object | failed | 25ms
Request Payload.Person Properties Distinct Id Auto Populated When Caller Omits It | failed | 22ms
Request Payload.Disable Geoip False Propagates As Geoip Disable False | failed | 22ms
Request Payload.Disable Geoip Omitted Defaults To False | failed | 22ms
Request Payload.Flag Keys To Evaluate Contains Only Requested Key | failed | 35ms
Request Lifecycle.No Flags Request On Init Alone | passed | 9ms
Request Lifecycle.No Flags Request On Normal Capture | passed | 29ms
Request Lifecycle.Two Flag Calls Produce Two Remote Requests | passed | 2029ms
Request Lifecycle.Mock Response Value Is Returned To Caller | passed | 21ms
Side Effect Events.Get Feature Flag Captures Feature Flag Called Event | passed | 29ms

Failures

request_payload.request_with_person_properties_device_id

Expected distinct_id='test_user_123', got '019efd9a-4030-7702-9a69-b0688f744bc4'

request_payload.groups_default_to_empty_object

Field 'groups' not found in /flags request body at path 'groups'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']

request_payload.person_properties_distinct_id_auto_populated_when_caller_omits_it

Field 'distinct_id' not found in /flags request body at path 'person_properties.distinct_id'. Available keys: ['$lib', '$lib_version', 'email']

request_payload.disable_geoip_false_propagates_as_geoip_disable_false

Field 'geoip_disable' not found in /flags request body at path 'geoip_disable'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']

request_payload.disable_geoip_omitted_defaults_to_false

Field 'geoip_disable' not found in /flags request body at path 'geoip_disable'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']

request_payload.flag_keys_to_evaluate_contains_only_requested_key

Field 'flag_keys_to_evaluate' not found in /flags request body at path 'flag_keys_to_evaluate'. Available keys: ['$anon_distinct_id', '$device_id', 'api_key', 'distinct_id', 'timezone', 'person_properties']

marandaneto changed the title from "fix: retry feature flag requests on network errors" to "fix(flags): retry flag requests on transient network errors" on Jun 25, 2026