branch-4.1: [fix](streaming-job) recompute derived fields after replay and ALTER #62936 #63261
Open
github-actions[bot] wants to merge 1 commit into
…62936)

### What problem does this PR solve?

Problem Summary:

`StreamingInsertJob` initializes two derived fields from `jobProperties.max_interval` in `init()`:

- `sampleWindowMs` = `max_interval * 10 * 1000`, used by `checkDataQuality()` as the time window for `load.max_filter_ratio`
- `jobConfig.timerDefinition.interval` = `max_interval`, used by `JobScheduler` to compute the next trigger time

Neither is persisted in the gson image, and neither is refreshed on two paths:

1. **gson replay (`gsonPostProcess`)**: after an FE checkpoint restart, `sampleWindowMs` stays at its default of `0`. The time-window check `(now - sampleStartTime) > sampleWindowMs` is then always true, so the sample window expires on every commit. The window-accumulation contract behind `load.max_filter_ratio` degrades to a per-batch check, and a job recovered from an image can be wrongly paused by a small bad batch that the surrounding window should have diluted.
2. **ALTER PROPERTIES (`modifyPropertiesInternal`)**: changing `max_interval` only updates `properties` and `jobProperties`; neither `sampleWindowMs` nor `timerDefinition.interval` is refreshed. The scheduler keeps reading the old interval (the new value never reaches `JobExecutionConfiguration.getTriggerDelayTimes`), so altering `max_interval` never takes effect, not even after an FE restart, because the image carries the stale `interval` as well.
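To make the replay failure concrete, here is a minimal, hypothetical Java model of the time-window check. Only the expiry condition `(now - sampleStartTime) > sampleWindowMs` and the `max_interval * 10 * 1000` formula come from the PR text. The class name `SampleWindowSketch`, the reset-then-accumulate order, and the filtered/scanned ratio comparison are assumptions for illustration, not the actual `checkDataQuality()` code.

```java
/**
 * Hypothetical, simplified model of the sample-window check described above.
 * Field names mirror the PR text; the accumulation and ratio logic are assumptions.
 */
final class SampleWindowSketch {
    long sampleWindowMs;           // derived: max_interval * 10 * 1000
    long sampleStartTime;          // start of the current window (ms)
    long sampleWindowScannedRows;  // rows scanned in the current window
    long sampleWindowFilteredRows; // rows filtered in the current window

    SampleWindowSketch(long maxIntervalSec) {
        this.sampleWindowMs = maxIntervalSec * 10 * 1000;
    }

    /** Returns true if the job may keep running after this batch (assumed semantics). */
    boolean checkDataQuality(long nowMs, long scanned, long filtered, double maxFilterRatio) {
        // Expiry test from the PR: with sampleWindowMs == 0 it fires on every
        // commit that happens after the window start, discarding prior counts.
        if (nowMs - sampleStartTime > sampleWindowMs) {
            sampleStartTime = nowMs;
            sampleWindowScannedRows = 0;
            sampleWindowFilteredRows = 0;
        }
        sampleWindowScannedRows += scanned;
        sampleWindowFilteredRows += filtered;
        if (sampleWindowScannedRows == 0) {
            return true; // nothing scanned yet, nothing to judge
        }
        double ratio = (double) sampleWindowFilteredRows / sampleWindowScannedRows;
        return ratio <= maxFilterRatio;
    }

    public static void main(String[] args) {
        // Correctly initialized job: max_interval = 10s, so the window is 100,000 ms.
        SampleWindowSketch healthy = new SampleWindowSketch(10);
        // Job restored from an image before the fix: sampleWindowMs left at 0.
        SampleWindowSketch replayed = new SampleWindowSketch(10);
        replayed.sampleWindowMs = 0;

        long t = 1_000_000L;
        // Two clean batches 10 seconds apart...
        for (SampleWindowSketch job : new SampleWindowSketch[] {healthy, replayed}) {
            job.checkDataQuality(t, 1000, 0, 0.05);
            job.checkDataQuality(t + 10_000, 1000, 0, 0.05);
        }
        // ...then a small batch with 5 bad rows out of 50.
        boolean healthyOk = healthy.checkDataQuality(t + 20_000, 50, 5, 0.05);
        boolean replayedOk = replayed.checkDataQuality(t + 20_000, 50, 5, 0.05);
        System.out.println("healthy=" + healthyOk + " replayed=" + replayedOk);
    }
}
```

Running `main` prints `healthy=true replayed=false`. In the healthy job all three batches share one 100,000 ms window, so the 5 bad rows are diluted to 5 / 2050 (about 0.24%). In the replayed job each commit opens a fresh window, and the last batch is judged alone at 10%, which exceeds the 5% threshold used here.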
### Fix

Extract a single `recomputeDerivedFields()` method that re-derives all transient state from `jobProperties`:

- `sampleWindowMs = maxIntervalSec * 10 * 1000`
- `timerDefinition.interval = maxIntervalSec`
- reset `sampleStartTime`, `sampleWindowScannedRows`, and `sampleWindowFilteredRows`

Call it at every entry point where `jobProperties` is rebuilt:

- `init()` (job creation)
- `gsonPostProcess()` (image replay)
- `modifyPropertiesInternal()` (ALTER PROPERTIES)

Resetting the sample counters on ALTER is intentional: changing `max_interval` redefines the window itself, so counts accumulated under the old window have no meaningful interpretation in the new one.

### Release note

Fix the streaming insert job's sample window and scheduler interval not being restored after FE checkpoint replay or refreshed after ALTER PROPERTIES.
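The single-recompute-hook approach described under Fix can be sketched as a hypothetical, simplified Java class. The derived-field formulas and the three call sites (`init()`, `gsonPostProcess()`, `modifyPropertiesInternal()`) come from the PR text. Everything else is an assumption: the class name `StreamingJobSketch`, the flattened `TimerDefinition`, merging ALTER values with `putAll`, parsing `max_interval` as whole seconds, and resetting `sampleStartTime` to `0`.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "one recompute hook" pattern; not Doris's actual code.
 * jobProperties is the source of truth; everything else is derived from it.
 */
final class StreamingJobSketch {
    static final class TimerDefinition {
        long interval; // seconds between triggers, read by the scheduler (simplified)
    }

    final Map<String, String> jobProperties = new HashMap<>();
    final TimerDefinition timerDefinition = new TimerDefinition();

    // Derived/transient state: never trusted as-is after replay or ALTER.
    long sampleWindowMs;
    long sampleStartTime;
    long sampleWindowScannedRows;
    long sampleWindowFilteredRows;

    /** Re-derives all transient state; assumes max_interval is always present. */
    void recomputeDerivedFields() {
        long maxIntervalSec = Long.parseLong(jobProperties.get("max_interval"));
        sampleWindowMs = maxIntervalSec * 10 * 1000;
        timerDefinition.interval = maxIntervalSec;
        sampleStartTime = 0; // assumed reset value: the next commit opens a new window
        sampleWindowScannedRows = 0;
        sampleWindowFilteredRows = 0;
    }

    /** Job creation. */
    void init(Map<String, String> props) {
        jobProperties.putAll(props);
        recomputeDerivedFields();
    }

    /** Image replay: persisted properties are back, derived fields may be stale or zero. */
    void gsonPostProcess() {
        recomputeDerivedFields();
    }

    /** ALTER PROPERTIES: update the source of truth, then re-derive. */
    void modifyPropertiesInternal(Map<String, String> newProps) {
        jobProperties.putAll(newProps);
        recomputeDerivedFields();
    }

    public static void main(String[] args) {
        StreamingJobSketch job = new StreamingJobSketch();
        job.init(Map.of("max_interval", "10"));
        job.sampleWindowScannedRows = 500; // pretend some rows were already sampled

        // ALTER now reaches both the scheduler interval and the sample window.
        job.modifyPropertiesInternal(Map.of("max_interval", "30"));
        System.out.println("interval=" + job.timerDefinition.interval
                + "s window=" + job.sampleWindowMs + "ms scanned=" + job.sampleWindowScannedRows);

        // Simulate replay: window lost (0) and a stale interval carried by the image.
        job.sampleWindowMs = 0;
        job.timerDefinition.interval = 10;
        job.gsonPostProcess();
        System.out.println("after replay: interval=" + job.timerDefinition.interval
                + "s window=" + job.sampleWindowMs + "ms");
    }
}
```

Routing all three entry points through one method means a future derived field only needs to be added in one place, instead of being kept in sync across creation, replay, and ALTER separately.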
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
FE Regression Coverage Report: Increment line coverage
Cherry-picked from #62936