Kernel GPF in ip6_dst_lookup_tail leads to RCU stall and full hang (6.12.77-haos)

## The problem

On HAOS 17.2 (kernel `6.12.77-haos`, `generic-x86-64`), a repeatable kernel GPF fires on the IPv6 UDP `connect()` path:

```
Oops: general protection fault, probably for non-canonical address 0xfffdd2fbc4a10020: 0000 [#56] PREEMPT SMP NOPTI
CPU: 1 UID: 0 PID: 897463 Comm: sshd-session Tainted: G      D W          6.12.77-haos #1
RIP: 0010:ip6_dst_lookup_tail.constprop.0+0xa8/0x350
Call Trace:
 <TASK>
 ip6_dst_lookup_flow+0x42/0xc0
 ip6_datagram_dst_update+0x179/0x2c0
 __ip6_datagram_connect+0x195/0x3e0
 ? ip6_datagram_release_cb+0x20/0x80
 ip6_datagram_connect+0x26/0x40
 __sys_connect+0x9c/0xc0
 __x64_sys_connect+0x13/0x20
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
```

The same non-canonical address, `0xfffdd2fbc4a10020`, shows up repeatedly across CPUs and processes. That pattern is consistent with a use-after-free: a stale IPv6 `dst_entry` pointer being read from the dst cache and dereferenced.
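To make "non-canonical" concrete: on x86-64 with 4-level paging (48-bit virtual addresses, which I am assuming for this Apollo Lake box), bits 63..47 of a valid pointer must all be equal. A minimal sketch of that check; `is_canonical` is a name I made up for illustration, not a kernel API:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes va_bits-wide virtual addresses (48 for 4-level paging):
    bits 63..(va_bits - 1) must be all zeros or all ones.
    """
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - (va_bits - 1))) - 1
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    # Address taken from the Oops line above.
    print(is_canonical(0xFFFDD2FBC4A10020))  # False
```

The upper 16 bits of the faulting address are `0xfffd` rather than `0xffff`, which is why the CPU raises a general protection fault instead of a page fault.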

## Why it matters

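The GPF kills the task while it still owns the `udpv6` socket lock (the lock taken via `lock_sock`, not a plain spin lock). When the dying task then closes the socket during exit, it blocks in D state in `udpv6_destroy_sock` → `lock_sock_nested`, waiting on a lock that will never be released. Each occurrence leaks one stuck socket. After enough accumulation, `rcu_preempt` can no longer make forward progress: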

```
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:  Tasks blocked on level-0 rcu_node (CPUs 0-1): P897143/2:b..l
rcu:  (detected by 0, t=3297262 jiffies, g=11630773, q=29202 ncpus=2)
task:sshd-session    state:D stack:0     pid:897143 ...
  __schedule → schedule → __lock_sock → lock_sock_nested → udpv6_destroy_sock →
  sk_common_release → inet_release → __sock_release → sock_close → __fput →
  task_work_run → do_exit → make_task_dead → rewind_stack_and_make_dead
```

Once enough tasks pile up on that lock, Docker health checks time out, user-facing HA becomes unresponsive, and the box needs a hardware power cycle. (Soft reboot can't run because so many tasks are wedged in D state.)
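For anyone who wants to watch the pile-up before the box becomes unreachable, here is a rough sketch that lists processes currently in D state by reading `/proc/<pid>/stat`. The helper names are mine, and it only looks at thread-group leaders (not per-thread entries under `task/`), so treat it as an approximation:

```python
import os
from typing import List, Tuple


def parse_stat(stat_text: str) -> Tuple[int, str, str]:
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def d_state_tasks(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process currently in state D."""
    found: List[Tuple[int, str]] = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

A steadily growing list of `sshd-session` or `cloudflared` entries would match the behaviour described above.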

In my case the last recorded Oops before the hang was `[#94]`, with roughly one Oops every ~30 s over a ~34-minute window.
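The `[#N]` counter and the faulting symbol can be pulled out of a saved kernel log with a small script. This is only a sketch: the regexes are written against the Oops format shown above, and `oops_summary` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

OOPS_RE = re.compile(r"Oops: .*\[#(\d+)\]")
RIP_RE = re.compile(r"RIP: [0-9a-f]+:(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def oops_summary(log_text: str) -> List[Tuple[int, Optional[str]]]:
    """Pair each Oops counter with the first RIP symbol logged after it."""
    results: List[Tuple[int, Optional[str]]] = []
    for line in log_text.splitlines():
        oops = OOPS_RE.search(line)
        if oops:
            results.append((int(oops.group(1)), None))
            continue
        rip = RIP_RE.search(line)
        if rip and results and results[-1][1] is None:
            # Attach the symbol to the most recent Oops that lacks one.
            results[-1] = (results[-1][0], rip.group(1))
    return results
```

Fed the full log, the last element gives the final Oops counter, and the set of symbols shows whether every Oops hits the same function.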

## Repro (observed, not deliberate)

Two processes reliably trigger the GPF on each invocation. Both do an IPv6 UDP `connect()` as part of normal operation:

1. **`cloudflared` addon** (QUIC keep-alive to Cloudflare edge). Triggers every few seconds.
2. **`sshd-session`** spawned by the official SSH & Web Terminal addon, on every new SSH login (likely via `pam_systemd` / NSS hostname lookups).

Removing both triggers (uninstall cloudflared + set `ipv6.method=disabled` on the primary NetworkManager interface) stopped further Oopses.
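For reference, the syscall path both triggers share can be exercised with a few lines of Python. I have not verified that this alone reproduces the fault, and the destination (`::1`, port 9) is an arbitrary choice of mine. On an affected host it may well trigger another Oops, so run it only where a hang is acceptable.

```python
import socket
from typing import Optional, Tuple


def udp6_connect(host: str = "::1", port: int = 9) -> Tuple[str, Optional[int]]:
    """Do one IPv6 UDP connect() and report what happened.

    connect() on a datagram socket sends no packets; it only makes the
    kernel look up and cache a route to the destination.
    Returns ("ok", None) on success or ("error", errno) on failure.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((host, port))
        return ("ok", None)
    except OSError as exc:
        return ("error", exc.errno)


if __name__ == "__main__":
    print(udp6_connect())
```

With IPv6 disabled on the interface, I would expect the call to fail with an error such as "network unreachable" rather than reach the route lookup for non-loopback destinations.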

## System information

```
Host: Home Assistant OS 17.2 (cpe:2.3:o:home-assistant:haos:17.2:*:production:*:*:*:generic-x86-64:*)
Supervisor: 2026.04.0
Core: 2026.4.x
Kernel: 6.12.77-haos
Board: generic-x86-64
Hardware: Fanless Mini PC Quieter2 / GMLR1 (Intel Apollo Lake, 2 CPUs)
Boot slot: B (RAUC A/B, slot A on 17.1 also affected)
```

Loaded modules (abridged):
```
rfcomm xfrm_user xt_set ip_set nft_chain_nat nft_compat nf_tables algif_hash
algif_skcipher af_alg bnep snd_soc_dmic iwlmvm sch_fq_codel mac80211 libarc4
btusb btmtk btrtl btbcm btintel bluetooth iwlwifi cfg80211 x86_pkg_temp_thermal
coretemp ax88796b ttm snd_soc_es8316 drm_buddy regmap_i2c drm_display_helper
asix usbnet phylink ...
```

## Suspected root cause

The faulting offset `ip6_dst_lookup_tail+0xa8` plus the non-canonical pointer (`0xfffd...`) point to a freed `struct dst_entry` being read from a per-socket or per-cache slot. There have been several IPv6 dst-cache lifetime fixes in the net-next tree post-6.12. Candidate areas worth checking against HAOS's 6.12.77 base:

- `ip6_dst_lookup_tail` / `__ip6_dst_lookup` refcount handling
- `udpv6_destroy_sock` → `__udpv6_disconnect` ordering

Happy to produce a kdump/vmcore if helpful. Let me know what artifacts would be useful.

## Workaround I applied

1. Uninstall the `cloudflared` addon (eliminates the QUIC trigger that fires every few seconds).
2. `ha network update <primary-iface> --ipv6-method disabled` (no outbound IPv6 routes, forces IPv4 paths everywhere).

After both, the Oopses stopped. I can't persistently set `sysctl net.ipv6.conf.all.disable_ipv6=1` on HAOS: `/etc/sysctl.d` inside the SSH addon belongs to the container, not the host, and the `ha` CLI has no host-level sysctl interface. So a complete fix for end users probably needs the upstream fix backported into the HAOS kernel.
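To check what the kernel currently reports, the `disable_ipv6` value can be read from procfs. A small sketch follows; `ipv6_disabled` is a made-up helper. Note that `net.ipv6.conf.*` is per network namespace, so from inside an addon this reflects the host only if the addon runs with host networking, which I am assuming rather than asserting.

```python
import os
from typing import Optional


def ipv6_disabled(iface: str = "all",
                  root: str = "/proc/sys/net/ipv6/conf") -> Optional[bool]:
    """Read disable_ipv6 for iface; None if the file is missing or unreadable."""
    try:
        with open(os.path.join(root, iface, "disable_ipv6")) as f:
            return f.read().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    print(ipv6_disabled())
```

Passing the primary interface name instead of `all` shows the per-interface setting.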

## What would help in HAOS

- Backport the relevant IPv6 dst-cache fix into the HAOS 6.12 kernel, **or**
- Expose a supported way for users to disable IPv6 at the host level (`ha os options --ipv6-disable`, kernel cmdline injection, or host sysctl interface).

