[fix](fe) add --drop_backends param to start_fe.sh#63306
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
This PR adds a --drop_backends startup flag for Doris FE, wiring it through bin/start_fe.sh into FE argument parsing, and executing backend removal when the FE becomes master.
Changes:
- Add
--drop_backendsoption tobin/start_fe.shand pass it through toorg.apache.doris.DorisFE. - Register/parse a new FE CLI option
drop_backendsand set a corresponding system property. - When transferring to master, drop all registered backends if the system property is enabled.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
fe/fe-core/src/main/java/org/apache/doris/DorisFE.java |
Adds the --drop_backends CLI option and sets a system property when present. |
fe/fe-core/src/main/java/org/apache/doris/common/FeConstants.java |
Introduces the DROP_BACKENDS_KEY constant. |
fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java |
Drops all backends during transferToMaster() when the property is enabled. |
bin/start_fe.sh |
Adds --drop_backends getopt parsing and forwards the flag to FE. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue. The new startup flag is wired through the shell script and FE argument parsing, and bash -n bin/start_fe.sh passes, but the backend removal path currently treats individual drop failures as non-fatal. That can leave the cluster in a partially dropped state while the FE continues to become ready, which violates the semantics of a destructive recovery/startup option.
Critical checkpoint conclusions:
- Goal/test: The PR aims to add
--drop_backendsand remove all BEs when the FE becomes MASTER. The wiring is present, but the implementation does not guarantee all requested BEs are removed; I did not find tests proving success/failure behavior. - Scope: The change is small and focused.
- Concurrency/locking: No new explicit locks; the drop path uses existing
SystemInfoServiceimmutable-map replacement. The main concern is failure atomicity rather than lock ordering. - Lifecycle: The flag is a process system property observed during
transferToMaster; no static initialization ordering issue found. Repeated leadership transitions are intentionally tied to the process flag per the current help text. - Config/compatibility: No persistent config or protocol/storage-format compatibility changes.
- Parallel paths: The PR routes through existing
dropBackend; cloud mode overrides can fail via meta-service RPC, making the non-fatal catch especially problematic. - Tests: No regression/unit test coverage was added for successful forwarding or partial-failure behavior.
- Observability: Existing logs are useful, but observability cannot compensate for continuing after a failed destructive operation.
- Transactions/persistence/data writes: Backend metadata changes are journaled through existing drop paths; however, partial failure can persist only a subset of requested drops.
- Performance: No hot-path performance concern found for this startup/master-transfer option.
Focus points: The provided focus file says there is no additional review focus; no focus-specific issue was found.
|
run buildall |
TPC-H: Total hot run time: 30924 ms |
TPC-DS: Total hot run time: 169317 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
This PR adds a
--drop_backendsstartup flag for Doris FE, wiring it through bin/start_fe.sh into FE argument parsing, and executing backend removal when the FE becomes master