branch-4.1: [fix](streaming-job) recompute derived fields after replay and ALTER #62936 #63261

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-62936-branch-4.1


@github-actions
Contributor

Cherry-picked from #62936

…62936)

### What problem does this PR solve?

Problem Summary:

`StreamingInsertJob` initializes two derived fields from
`jobProperties.max_interval` in `init()`:
- `sampleWindowMs` = `max_interval * 10 * 1000` — used by
`checkDataQuality()` for the `load.max_filter_ratio` time window
- `jobConfig.timerDefinition.interval` = `max_interval` — used by
`JobScheduler` to compute next trigger time

Neither is persisted in the gson image, and two code paths fail to
refresh them:

1. **gson replay (`gsonPostProcess`)**: after FE checkpoint restart,
`sampleWindowMs` stays at default `0`. The time-window check `(now -
sampleStartTime) > sampleWindowMs` is then always true, so the sample
window expires on every commit. The window-accumulation contract used by
`load.max_filter_ratio` degrades to single-batch judgment, and a job
recovered from image can be wrongly paused on a small bad batch that
should be diluted by the surrounding window.

2. **ALTER PROPERTIES (`modifyPropertiesInternal`)**: changing
`max_interval` only updates `properties` and `jobProperties`. Neither
`sampleWindowMs` nor `timerDefinition.interval` is refreshed. The
scheduler keeps reading the old interval (the new value never reaches
`JobExecutionConfiguration.getTriggerDelayTimes`), so ALTER
`max_interval` never takes effect — not even after FE restart, since
image carries the stale `interval` too.
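
To make the first failure mode concrete, here is a minimal sketch of the window check quoted above. It is not the Doris source: the class `SampleWindowCheck`, the helper `countExpirations`, and the choice to start the window at time `0` are all hypothetical and exist only for illustration. Only the comparison `(now - sampleStartTime) > sampleWindowMs` comes from the description.

```java
// Hypothetical, simplified model of the sample-window check; not the Doris source.
class SampleWindowCheck {

    // The comparison quoted in the PR description.
    static boolean windowExpired(long nowMs, long sampleStartTimeMs, long sampleWindowMs) {
        return (nowMs - sampleStartTimeMs) > sampleWindowMs;
    }

    // Counts how many commits would expire (and restart) the sample window.
    // Assumption: the window starts at time 0 and restarts at the commit that expires it.
    static int countExpirations(long[] commitTimesMs, long sampleWindowMs) {
        long sampleStartTimeMs = 0L;
        int expirations = 0;
        for (long nowMs : commitTimesMs) {
            if (windowExpired(nowMs, sampleStartTimeMs, sampleWindowMs)) {
                expirations++;
                sampleStartTimeMs = nowMs; // a new window begins at this commit
            }
        }
        return expirations;
    }
}
```

With five commits spaced 10 seconds apart, `countExpirations(commits, 100000L)` (the window derived from `max_interval = 10`) reports no expirations, while `countExpirations(commits, 0L)` (the default left behind after replay) reports an expiration on every commit, which is the single-batch behavior described in item 1.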

### Fix

Extract a single `recomputeDerivedFields()` that re-derives all
transient state from `jobProperties`:
- `sampleWindowMs = maxIntervalSec * 10 * 1000`
- `timerDefinition.interval = maxIntervalSec`
- reset `sampleStartTime` / `sampleWindowScannedRows` /
`sampleWindowFilteredRows`

Call it at every entry point where `jobProperties` is rebuilt:
- `init()` (job creation)
- `gsonPostProcess()` (image replay)
- `modifyPropertiesInternal()` (ALTER PROPERTIES)

Resetting the sample counters on ALTER is intentional: changing
`max_interval` redefines the window itself, so accumulated counts from
the old window have no meaningful interpretation in the new one.
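
The shape of the fix can be sketched as follows. This is an illustrative stand-in, not the actual `StreamingInsertJob`: the class `StreamingJobSketch`, the flat `long` fields standing in for `jobProperties` and `jobConfig.timerDefinition`, the `replayedFromImage` helper, and resetting the counters to `0` are assumptions made for the example. The two formulas and the three call sites follow the description above.

```java
// Hypothetical stand-in for StreamingInsertJob; only the formulas and the three
// call sites come from the PR description, the rest is assumed for illustration.
class StreamingJobSketch {
    long maxIntervalSec;            // stands in for jobProperties.max_interval
    long timerIntervalSec;          // stands in for jobConfig.timerDefinition.interval
    long sampleWindowMs;            // derived, not persisted
    long sampleStartTime;
    long sampleWindowScannedRows;
    long sampleWindowFilteredRows;

    private StreamingJobSketch() {
        // Models deserialization: derived fields stay at their defaults.
    }

    // Models init() at job creation.
    StreamingJobSketch(long maxIntervalSec) {
        this.maxIntervalSec = maxIntervalSec;
        recomputeDerivedFields();
    }

    // Models loading a job from the image, followed by the post-process hook.
    static StreamingJobSketch replayedFromImage(long persistedMaxIntervalSec) {
        StreamingJobSketch job = new StreamingJobSketch();
        job.maxIntervalSec = persistedMaxIntervalSec;
        job.gsonPostProcess();
        return job;
    }

    // Single place that re-derives all transient state from the properties.
    void recomputeDerivedFields() {
        sampleWindowMs = maxIntervalSec * 10 * 1000;
        timerIntervalSec = maxIntervalSec;
        // Assumption: "reset" means zeroing the window start and both counters.
        sampleStartTime = 0L;
        sampleWindowScannedRows = 0L;
        sampleWindowFilteredRows = 0L;
    }

    // Models image replay.
    void gsonPostProcess() {
        recomputeDerivedFields();
    }

    // Models ALTER PROPERTIES changing max_interval.
    void modifyPropertiesInternal(long newMaxIntervalSec) {
        this.maxIntervalSec = newMaxIntervalSec;
        recomputeDerivedFields();
    }
}
```

In this sketch, creating a job with `max_interval = 10` yields `sampleWindowMs = 100000` and a timer interval of `10`. Altering it to `30` yields `300000` and `30`, and discards any rows counted under the old window. Routing all three entry points through one method keeps the derived fields from drifting apart when a new entry point is added later.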

### Release note

Fix streaming insert job sample window and scheduler interval not being
restored after FE checkpoint replay or ALTER PROPERTIES.

github-actions bot requested a review from yiguolei as a code owner on May 14, 2026 15:19

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

run buildall

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 78.57% (11/14) 🎉
Increment coverage report
Complete coverage report
