Add ConfigNode ReadOnly heartbeat self-check (DiskFull/DiskCrash) by CRZbulabula · Pull Request #17724 · apache/iotdb

CRZbulabula · 2026-05-20T03:28:50Z

Description

Why

ConfigNode currently has no notion of being ReadOnly — only DataNode samples its disk and self-marks. The heartbeat from the leader to other ConfigNodes is effectively just a liveness ping, so a ConfigNode with a full or crashed disk would silently keep accepting writes from peers / clients. This PR extends ReadOnly tracking to ConfigNode and adds a new disk-failure reason (DiskCrash) on top of the existing DiskFull.

What changes

New status reason. NodeStatus.DISK_CRASH constant added next to DISK_FULL (both are still string identifiers stored in statusReason).
Shared utility. New org.apache.iotdb.commons.cluster.DiskChecker in node-commons: probes a list of directories via test-write + free-space ratio, then runs a single state-machine apply on CommonConfig with priority DiskCrash > DiskFull > Normal. Recovery to Running only fires when the prior reason was disk-related; other ReadOnly reasons (e.g. manual maintenance) are left untouched.
ConfigNode leader. Self-checks [systemDir, consensusDir] at the top of HeartbeatService#heartbeatLoopBody (before fanning out heartbeats).
ConfigNode follower. Self-checks on every received heartbeat in ConfigNodeRPCServiceProcessor#getConfigNodeHeartBeat, and reports status + statusReason back via newly-added optional fields 4 and 5 on TConfigNodeHeartbeatResp (forward-compatible — old peers simply leave them unset).
Leader's self entry in the cache. ConfigNodeHeartbeatCache#updateCurrentStatistics no longer short-circuits for CURRENT_NODE_ID; instead, the self entry mirrors CommonConfig, so show confignodes reflects the leader's own disk state.
DataNode. FolderManager now registers each instance into a weak-ref static list and exposes static boolean hasAnyAbnormalFolder(). DataNodeInternalRPCServiceImpl#sampleDiskLoad consults that aggregator and, when any folder is ABNORMAL, maps to DiskCrash (which outranks the existing free-ratio DiskFull check). State-machine application is delegated to DiskChecker.apply so DataNode and ConfigNode follow identical transition rules.

Design notes

DataNode keeps its current system-wide free-ratio check (SystemMetric.SYS_DISK_AVAILABLE_SPACE) for DiskFull. The new DiskCrash signal is path-scoped — it just observes already-recorded write failures rather than probing IO itself. ConfigNode runs both checks per-directory through DiskChecker (File.getUsableSpace/getTotalSpace + a tiny Files.createTempFile/write/delete probe), giving symmetric behavior on the two nodes from the cluster's perspective.
Thrift fields are optional to keep rolling upgrade safe: an older ConfigNode that doesn't populate them parses as Running with no reason.
ABNORMAL on DataNode is sticky: once a folder fails a business write it stays ABNORMAL until restart. This PR intentionally does not introduce an auto-recovery path for ReadOnly(DiskCrash) on DataNode (would require new ABNORMAL -> HEALTHY transitions inside FolderManager and is left as follow-up). ConfigNode does auto-recover, because testWrite reruns every heartbeat.

i18n

New disk health messages live in CommonMessages under both src/main/i18n/en and src/main/i18n/zh:

DISK_FULL_SET_READ_ONLY
DISK_CRASH_SET_READ_ONLY
DISK_CRASH_PROBE_FAILED
DISK_RECOVERED_SET_RUNNING

The existing inline English log in sampleDiskLoad is retained (just descriptive context); the state-change log itself is routed through CommonMessages so the Chinese build works out of the box.

This PR has:

been self-reviewed.
added Javadocs for the new public surface (`DiskChecker`, `FolderManager.hasAnyAbnormalFolder`, `ConfigNodeConfig.getCriticalDirs`).
added comments explaining the why where non-obvious (cache self-entry rewrite, optional Thrift fields, priority ordering).
added unit tests — `DiskCheckerTest` covers 17 cases including: NORMAL/DISK_FULL/DISK_CRASH detection, crash-wins-over-full, recovery clearing reason, non-disk `ReadOnly` reason left untouched, idempotency, probe-file cleanup.
added integration tests. (Heartbeat-driven state transitions are exercised at unit-test level; cluster-level disk failure injection felt heavier than this PR warrants — happy to add an IT in a follow-up if reviewers prefer.)
been tested in a test IoTDB cluster.

Known follow-ups

DataNode cannot auto-recover from `ReadOnly(DiskCrash)` once any folder is marked ABNORMAL by `FolderManager` — would need an ABNORMAL→HEALTHY re-probe path.
Consider unifying DataNode's free-ratio source with ConfigNode's per-directory `File.getUsableSpace` (currently DataNode goes through `MetricService`).

Key changed/added classes in this PR

New

`org.apache.iotdb.commons.cluster.DiskChecker`
`org.apache.iotdb.commons.cluster.DiskCheckerTest`

Modified

`org.apache.iotdb.commons.cluster.NodeStatus`
`org.apache.iotdb.commons.i18n.CommonMessages` (en + zh)
`org.apache.iotdb.confignode.rpc.thrift.TConfigNodeHeartbeatResp` (Thrift IDL)
`org.apache.iotdb.confignode.conf.ConfigNodeConfig`
`org.apache.iotdb.confignode.manager.load.service.HeartbeatService`
`org.apache.iotdb.confignode.manager.load.cache.node.ConfigNodeHeartbeatCache`
`org.apache.iotdb.confignode.manager.load.cache.node.NodeHeartbeatSample`
`org.apache.iotdb.confignode.service.thrift.ConfigNodeRPCServiceProcessor`
`org.apache.iotdb.db.storageengine.rescon.disk.FolderManager`
`org.apache.iotdb.db.protocol.thrift.impl.DataNodeInternalRPCServiceImpl`

…check ConfigNode now reports its own NodeStatus.ReadOnly when its critical directories (systemDir, consensusDir) are unwritable or near-full, mirroring the existing DataNode behavior. NodeStatus reasons are extended with a new DISK_CRASH constant alongside DISK_FULL, and the ConfigNode heartbeat carries status/statusReason back to the leader. - node-commons: new DiskChecker utility (probe + state-machine apply), with priority DiskCrash > DiskFull and recovery to Running when the reason was disk-related. i18n messages added in en + zh. - thrift-confignode: TConfigNodeHeartbeatResp gains optional status and statusReason fields (forward-compatible). - confignode: leader self-checks before fanning out heartbeats; follower self-checks on receive and reports back; cache reads from CommonConfig for the leader's self entry, otherwise from the sample. - datanode: FolderManager exposes a static hasAnyAbnormalFolder() aggregator; sampleDiskLoad treats any ABNORMAL folder as DiskCrash (which wins over DiskFull) and reuses DiskChecker.apply.

codecov · 2026-05-20T04:14:53Z

Codecov Report

❌ Patch coverage is 35.02538% with 128 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.55%. Comparing base (34cc346) to head (6d662fd).
⚠️ Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
...g/apache/iotdb/consensus/ratis/RatisConsensus.java	0.00%	28 Missing ⚠️
.../confignode/manager/load/service/EventService.java	20.58%	27 Missing ⚠️
...db/db/storageengine/rescon/disk/FolderManager.java	11.11%	16 Missing ⚠️
...ol/thrift/impl/DataNodeInternalRPCServiceImpl.java	0.00%	9 Missing ⚠️
...fignode/manager/load/service/HeartbeatService.java	0.00%	8 Missing ⚠️
...a/org/apache/iotdb/commons/cluster/NodeStatus.java	0.00%	8 Missing ⚠️
.../org/apache/iotdb/consensus/ratis/utils/Utils.java	36.36%	7 Missing ⚠️
...ager/load/cache/node/ConfigNodeHeartbeatCache.java	62.50%	6 Missing ⚠️
.../service/thrift/ConfigNodeRPCServiceProcessor.java	0.00%	6 Missing ⚠️
...e/manager/load/cache/node/NodeHeartbeatSample.java	0.00%	5 Missing ⚠️
... and 5 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #17724      +/-   ##
============================================
+ Coverage     40.41%   40.55%   +0.13%     
- Complexity     2574     2577       +3     
============================================
  Files          5179     5181       +2     
  Lines        349767   350066     +299     
  Branches      44714    44768      +54     
============================================
+ Hits         141373   141978     +605     
+ Misses       208394   208088     -306

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…n after fanout - HeartbeatService: the leader's DiskChecker.checkAndApply now runs after pingRegisteredConfigNodes/DataNodes/AINodes so its (blocking) probe IO does not delay the async heartbeat dispatch. Gated on a snapshot of heartbeatCounter taken at loop entry so the leader self-samples on the same iterations DataNode/AINode load sampling fires. The 10-iteration cadence is lifted into a shared LOAD_SAMPLING_INTERVAL constant and reused by the existing setNeedSamplingLoad callsites. - ConfigNodeRPCServiceProcessor: follower side gates its disk check on a local heartbeatReceivedCounter (every Nth receive, matching the leader cadence) instead of running on every received heartbeat. - Tidy comments per code-review feedback.

…sive crash detect Three changes that let ReadOnly state actually shape Raft behavior on ConfigNode: - Utils.rejectWrite / stallApply now match ConfigRegion in addition to DataRegion, so a ReadOnly ConfigNode leader hits the same forceStepDownLeader path that DataRegion leaders already use. Comment at RatisConsensus.write updated. - New NodeStatus.priorityForStatus maps Running=0, ReadOnly(DiskFull)=-1, ReadOnly(DiskCrash)=-2. HeartbeatService runs a reconciliation step on the leader (same cadence as the load-sampling pass, after async fanout) that pushes each ConfigNode peer's desired priority into Ratis. Unknown/Removing/manual ReadOnly are left empty so transient blips do not churn the group config. IConsensus gains a default-no-op reconfigurePeerPriorities; RatisConsensus overrides it to rebuild the peer list and call sendReconfiguration. - Replace DiskChecker.check (active testWrite probe) with a passive observer threaded through Ratis. ApplicationStateMachineProxy gains a diskFailureListener parameter and fires it from the applyTransaction catch when Utils.isDiskFailure matches the cause (IOError / FileSystemException). RatisConsensus also tags IOException out of writeLocallyWithRetry / writeRemotelyWithRetry so log-write failures register as DiskCrash. DiskChecker keeps only checkFreeRatio (for the DiskFull path) and apply (for the state machine); DiskCrash is now sticky on both DataNode and ConfigNode until restart. DiskCheckerTest trimmed to drop testWrite-specific cases and to assert that NORMAL no longer recovers DiskCrash; 14 cases pass.

…rtbeatService - TConfigNodeHeartbeatReq carries needSamplingLoad; ConfigNodeRPCServiceProcessor gates its disk-health check on that flag instead of a local counter, matching the DataNode/AINode heartbeat shape. - HeartbeatService now bumps heartbeatCounter exactly once at the top of heartbeatLoopBody and threads the iterationIndex into genHeartbeatReq / genConfigNodeHeartbeatReq / genAIHeartbeatReq / addConfigNodeLocationsToReq; the gen methods no longer touch the counter. - Cross-peer priority reconciliation moved out of HeartbeatService into a new ConfigRegionPriorityBalancer that implements IClusterStatusSubscriber. It is registered on the EventService alongside RouteBalancer/TopologyService and only reacts when NodeStatisticsChangeEvent reports a transition whose priority bucket actually moved (filtered to ConfigNode peers, gated on isLeader). - RatisConsensus.reconfigurePeerPriorities is now a mechanical merge — it rebuilds the peer list from the requested priorities and unconditionally calls sendReconfiguration. Decisions about "did the priority change" live in the balancer, not at the consensus layer.

…ndex threading - HeartbeatService bumps heartbeatCounter once at the tail of heartbeatLoopBody; the gen* methods read heartbeatCounter.get() directly again instead of taking an iterationIndex parameter. - Remove the standalone ConfigRegionPriorityBalancer. Priority reconciliation now lives in EventService, fired from checkAndBroadcastNodeStatisticsChangeEventIfNecessary exactly when ConfigNode statistics change. It is leader-gated, filtered to ConfigNode peers, and only pushes peers whose priority bucket actually moved. EventService now takes the IManager to reach the consensus impl.

sonarqubecloud · 2026-05-20T10:47:16Z

Quality Gate passed

Issues
8 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

CRZbulabula added 4 commits May 20, 2026 14:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ConfigNode ReadOnly heartbeat self-check (DiskFull/DiskCrash)#17724

Add ConfigNode ReadOnly heartbeat self-check (DiskFull/DiskCrash)#17724
CRZbulabula wants to merge 5 commits into
masterfrom
confignode-readonly-disk-check

CRZbulabula commented May 20, 2026

Uh oh!

codecov Bot commented May 20, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

CRZbulabula commented May 20, 2026

Description

Why

What changes

Design notes

i18n

Known follow-ups

Key changed/added classes in this PR

Uh oh!

codecov Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sonarqubecloud Bot commented May 20, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov Bot commented May 20, 2026 •

edited

Loading