Skip to content

<fix>[storage]: fix unstable DeleteSnapShotGroup2Case#4037

Open
MatheMatrix wants to merge 1 commit into
zsv_5.1.0from
sync/wenhao.zhang/c-0
Open

<fix>[storage]: fix unstable DeleteSnapShotGroup2Case#4037
MatheMatrix wants to merge 1 commit into
zsv_5.1.0from
sync/wenhao.zhang/c-0

Conversation

@MatheMatrix
Copy link
Copy Markdown
Owner

Resolves: ZSV-12287
Related: ZSV-5936

Change-Id: I627364726d796a7a67666162637065687263676c

sync from gitlab !9933

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Review Change Stack

Walkthrough

在 VolumeSnapshotBase 中:1) changeStatus 先 reload 最新实体,若存在则通过按 uuid 的条件 SQL 仅更新 status 并同步内存;2) 删除主存快照完成时同样 reload 后通过条件 SQL 仅清空主存相关字段并回复消息。

变更清单

快照状态与主存字段更新

Layer / File(s) Summary
条件化状态更新实现
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java
新增 SQL 导入;changeStatus 改为先 dbf.reload(self),若为 null 则返回;否则用 SQL.New(VolumeSnapshotVO.class).eq(uuid).set(status).update() 仅更新数据库中的 status 字段,并同步内存对象状态。
主存字段条件清空
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java
删除主存快照完成阶段先 dbf.reload(self),若为 null 则直接回复空消息;否则用 SQL.New(VolumeSnapshotVO.class).eq(uuid).set(primaryStorageInstallPath=null).set(primaryStorageUuid=null).update() 仅清空数据库中的主存字段并回复。

序列图

(已在隐藏审查栈中为状态更新层提供序列图;总体变更为局部条件更新,故无需额外图示。)

代码审查工作量评估

🎯 4 (Complex) | ⏱️ ~45 分钟

兔子的诗

我在数据库边轻声敲,🐰
先探路再填笔稿,
精准一条 SQL 抹去旧痕,
状态回归内存安好,
并发春风不再闹。

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确反映了主要变更:修复DeleteSnapShotGroup2Case中的不稳定性,这是PR的核心目标。
Description check ✅ Passed 描述通过关联的问题追踪器和提交ID与变更内容相关,清晰表明这是为了修复存储组件中的不稳定问题。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/wenhao.zhang/c-0

Comment @coderabbitai help to get the list of available commands and usage tips.

@zstack-robot-1
Copy link
Copy Markdown
Collaborator

Comment from yaohua.wu:

Review: MR !9933 — ZSV-12287

变更概述:将 VolumeSnapshotBase.changeStatus()dbf.update(self)(Hibernate 实体图 merge/flush)改为先 dbf.reload(self) 再用 SQL.New(VolumeSnapshotVO.class).set(status).update() 定点单字段更新,规避并发删除 snapshot group 时由实体图级联触发的 VolumeSnapshotGroupRefVO 外键写入冲突。

结论:方案正确,精准命中 Jira 中分析的根因——changeStatusdbf.update 触发 entity-graph flush,导致子表 VolumeSnapshotGroupRefVO 级联写入,在父表 VolumeSnapshotGroupVO 已被并发删除的窗口期命中 FK 冲突。HQL 批量 update 是直接 DML,不加载实体、不走级联、不 flush 持久化上下文,从结构上消除该路径风险,确实优于单纯重试。新增的 reloadnull 判空也正确处理了"snapshot 已被并发删除"的竞态。未发现 P0/Critical 阻塞性问题。

🟡 Warning

# File:Line 问题 建议
1 VolumeSnapshotBase.java:264-268 (updateDb()) 同一文件、同一删除流程(handle(VolumeSnapshotPrimaryStorageDeletionMsg))中的 updateDb() 仍使用 dbf.update(self),存在与本次修复点完全相同的 entity-graph 级联写风险。当前仅靠 252-260 行的 catch-retry 兜底——而 Jira 分析本身明确指出"单纯重试仍可能走到同一高风险写路径",这与本 MR 选择的"缩小写范围、避开关联写"思路相矛盾。若 DeleteSnapShotGroup2Case 在本次修复后仍偶发失败,updateDb() 是首要嫌疑点。 建议将 updateDb() 一并改为定点 SQL 更新:SQL.New(VolumeSnapshotVO.class).eq(VolumeSnapshotVO_.uuid, self.getUuid()).set(VolumeSnapshotVO_.primaryStorageInstallPath, null).set(VolumeSnapshotVO_.primaryStorageUuid, null).update(),与 changeStatus 保持一致;若确有理由保留 retry,请补注释说明。注:updateDb() 及其 retry 为 pre-existing 代码,非本 MR 引入,但与本 MR 目标(消除 case 不稳定)直接相关。

🟢 Suggestion

# File:Line 问题 建议
1 VolumeSnapshotBase.java:215-217 changeStatuslatest == null(snapshot 已被并发删除)时静默 returnhandle(ChangeVolumeSnapshotStatusMsg) 随后仍回复成功。竞态下该行为本身可接受,但完全无日志,后续排查"状态为何未变更"时缺少线索。 补一行 debug 日志,例如 logger.debug(String.format("volume snapshot[uuid:%s] has been deleted, skip changeStatus", self.getUuid()))

Coverage / 说明

  • 目标分支兼容性:目标分支 zsv_5.1.0 领先 2 个 commit,MR merge_status = can_be_mergedchangeStatus 区域无文本冲突。建议 rebase 后再跑一次稳定性 case 复核。
  • 并发正确性changeStatusreload → nextState → update 存在 read-modify-write 窗口,但同一 snapshot 的消息经 @MessageSafe + cloudbus 按资源 uuid 串行化处理,不存在对同一 snapshot 的并发 changeStatus,无 lost-update 风险——非缺陷。
  • 行为变更:旧实现对已删除的 self 执行 dbf.update 可能触发 Hibernate merge 重新 INSERT(资源复活);新实现的 null 判空消除了该隐患,属正向改进。
  • 单文件 13 行改动,已逐行审查;未发现安全 / 性能问题。

Verdict: APPROVED

方案正确,根治性优于重试。updateDb() 的残留风险为非阻塞 Warning,但强烈建议同步处理——否则本 case 的不稳定来源未被完整覆盖。


🤖 Robot Reviewer

Resolves: ZSV-12287
Related: ZSV-5936

Change-Id: I627364726d796a7a67666162637065687263676c
@MatheMatrix MatheMatrix force-pushed the sync/wenhao.zhang/c-0 branch from fd55fb2 to 95aa779 Compare May 21, 2026 06:51
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java (1)

262-267: ⚡ Quick win

清库后顺手同步 self,避免内存态滞后。

这里已经把数据库里的 primaryStorageInstallPathprimaryStorageUuid 清空了,但 self 还保留旧值。建议和 changeStatus() 一样把内存对象同步一下,避免后续复用这个对象时继续带着过期的主存信息。

♻️ 建议修改
                         SQL.New(VolumeSnapshotVO.class)
                                 .eq(VolumeSnapshotVO_.uuid, latest.getUuid())
                                 .set(VolumeSnapshotVO_.primaryStorageInstallPath, null)
                                 .set(VolumeSnapshotVO_.primaryStorageUuid, null)
                                 .update();
+                        latest.setPrimaryStorageInstallPath(null);
+                        latest.setPrimaryStorageUuid(null);
+                        self = latest;
                         bus.reply(msg, dreply);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java`
around lines 262 - 267, The database fields primaryStorageInstallPath and
primaryStorageUuid are being cleared via SQL.New(...) but the in-memory
VolumeSnapshotBase object (self) still holds stale values; after the SQL update
in VolumeSnapshotBase (the block using
SQL.New(VolumeSnapshotVO.class).eq(...).set(...).update()), set
self.primaryStorageInstallPath = null and self.primaryStorageUuid = null
(mirroring how changeStatus() syncs the in-memory VO) so the memory state
matches the DB and prevents reuse of outdated primary storage info.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java`:
- Line 217: The debug log message incorrectly refers to a "volume snapshot
group" while the code is handling a VolumeSnapshotVO; update the logger.debug
calls in VolumeSnapshotBase (the statements that currently log "volume snapshot
group[uuid=%s] is not found") to use the correct object name such as "volume
snapshot" (or include VolumeSnapshotVO) so that missing snapshot and missing
snapshot-group messages are not confused; locate the logger.debug occurrences
around the null-checks for VolumeSnapshotVO and replace the string accordingly.

---

Nitpick comments:
In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java`:
- Around line 262-267: The database fields primaryStorageInstallPath and
primaryStorageUuid are being cleared via SQL.New(...) but the in-memory
VolumeSnapshotBase object (self) still holds stale values; after the SQL update
in VolumeSnapshotBase (the block using
SQL.New(VolumeSnapshotVO.class).eq(...).set(...).update()), set
self.primaryStorageInstallPath = null and self.primaryStorageUuid = null
(mirroring how changeStatus() syncs the in-memory VO) so the memory state
matches the DB and prevents reuse of outdated primary storage info.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6a5e7664-d748-4e5f-889e-eaddca2f84f5

📥 Commits

Reviewing files that changed from the base of the PR and between fd55fb2 and 95aa779.

📒 Files selected for processing (1)
  • storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java

String uuid = self.getUuid();
VolumeSnapshotVO latest = dbf.reload(self);
if (latest == null) {
logger.debug(String.format("volume snapshot group[uuid=%s] is not found", uuid));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

修正调试日志里的对象名称。

这里处理的是 VolumeSnapshotVO,当前日志写成了 “volume snapshot group”,并发删除排障时会把“快照不存在”和“快照组不存在”混淆。

📝 建议修改
- logger.debug(String.format("volume snapshot group[uuid=%s] is not found", uuid));
+ logger.debug(String.format("volume snapshot[uuid=%s] is not found", uuid));

Also applies to: 257-257

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotBase.java` at
line 217, The debug log message incorrectly refers to a "volume snapshot group"
while the code is handling a VolumeSnapshotVO; update the logger.debug calls in
VolumeSnapshotBase (the statements that currently log "volume snapshot
group[uuid=%s] is not found") to use the correct object name such as "volume
snapshot" (or include VolumeSnapshotVO) so that missing snapshot and missing
snapshot-group messages are not confused; locate the logger.debug occurrences
around the null-checks for VolumeSnapshotVO and replace the string accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants