HA Yellow (CM4): BCM GENET ethernet DMA stalls during heavy network upload due to CMA pool exhausted at boot

### Describe the issue you are experiencing

## Symptom

The device becomes **completely unreachable over the network** during automatic backup when cloud upload (`cloud.cloud` agent) is enabled. The system itself keeps running — CPU active, hardware watchdog serviced — but the ethernet interface is frozen. Recovery requires a full hardware power cycle. No soft reboot is possible because the device cannot be reached.

Confirmed via UniFi switch port activity log: the device disconnected during the backup window and did not reconnect for up to 17 hours until a manual PoE power cycle.

## Root cause

  `dmesg` shows the **entire 64 MiB CMA pool (16384 pages) is exhausted at boot** — before userspace starts — and stays
  at 0 free pages for the full uptime:

  [    0.000000] Reserved memory: created CMA memory pool at 0x000000002ac00000, size 64 MiB
  [    1.054415] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
  [    1.054432] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054460] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
  [    1.054470] cma: number of available pages: => 0 free of 16384 total pages
  [    3.129110] cma: __cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -12
  [    3.136900] cma: number of available pages: => 0 free of 16384 total pages
  [   19022.595450] cma: __cma_alloc: linux,cma: alloc failed, req-size: 512 pages, ret: -12
  [   19022.609697] cma: number of available pages: => 0 free of 16384 total pages

The VideoCore shared memory driver loads at boot and consumes the entire pool:

  [    6.389499] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
  [    6.426354] bcm2835_vc_sm_cma_probe: Videocore shared memory driver

The BCM GENET driver initialises normally and the physical link comes up at 1 Gbps. However, under **sustained
  high-throughput upload** (cloud backup streaming for 20+ minutes), the GENET driver attempts to allocate additional
  DMA ring buffers. With CMA at 0 free pages this fails silently and the ethernet controller stalls.

## Backup config at time of failure

```json
  "agent_ids": ["hassio.local", "cloud.cloud"],
  "include_all_addons": true,
  "include_database": true,
  "schedule": { "recurrence": "daily", "time": null }
```
  Backup duration: ~20 minutes. The stall occurs mid-upload.

###  Hardware watchdog note

The system runs a 15-second hardware watchdog (Broadcom BCM2835 Watchdog timer). Because the CPU stays alive and continues servicing the watchdog, no automatic reboot occurs — the device remains in a zombie network-dead state indefinitely until power-cycled.

###  Suggested fix

The Yellow is a headless device; the VideoCore GPU does not require 64 MiB of CMA. Reducing gpu_mem in the CM4 device tree or boot config (e.g. gpu_mem=16) would free the majority of the CMA pool for the GENET DMA driver. Alternatively, increasing the CMA reservation (e.g. cma=128M in the kernel command line) would give both GPU and ethernet DMA sufficient headroom.

###  Workaround

Removing cloud.cloud from the automatic backup agent list eliminates the sustained upload and prevents the stall.

### What operating system image do you use?

yellow (Home Assistant Yellow)

### What version of Home Assistant Operating System is installed?

17.2

### Did the problem occur after upgrading the Operating System?

Yes

### Hardware details

## Environment

  | | |
  |---|---|
  | **Hardware** | Home Assistant Yellow |
  | **SoC** | Broadcom BCM2711 — Raspberry Pi Compute Module 4 Rev 1.0 (revision `d03140`) |
  | **RAM** | 8 GB LPDDR4 |
  | **Storage** | Samsung SSD 980 500 GB (PCIe, firmware `3B4QFXO7`) |
  | **Network** | BCM GENET Gigabit Ethernet (`bcmgenet fd580000.ethernet end0`) |
  | **Power** | PoE+ (802.3at) |
  | **HAOS version** | 17.2 |
  | **Kernel** | `6.12.75-haos-raspi` |
  | **HA Core** | 2026.4.2 |
  | **Supervisor** | 2026.04.0 |


### Steps to reproduce the issue

  1. Enable automatic daily backups with both `hassio.local` and `cloud.cloud` agents
  2. Enable `include_all_addons: true` and `include_database: true`
  3. Leave `schedule.time` as `null` (HAOS picks a random time)
  4. Wait for the scheduled backup to run (backup duration ~20 minutes)
  5. Monitor network connectivity during the backup window — the device becomes completely unreachable mid-upload
  6. Attempt to reach the device via web UI or ping — no response
  7. Attempt `ha core restart` or any supervisor command — unreachable
  8. Only recovery: full hardware power cycle (PoE port cycle or physical power)

  **Confirmed pattern** via UniFi switch activity log: device disconnected at the start of the backup window and did not reconnect for 17 hours until manual PoE power cycle.

### Anything in the Supervisor logs that might be useful for us?

```txt
2026-04-16 09:51:08.760 INFO (MainThread) [supervisor.docker.manager] Restarting homeassistant
  2026-04-16 09:51:28.008 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
  2026-04-16 09:51:43.140 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
  APIState(core_state='NOT_RUNNING', offline_db_migration=False)
  2026-04-16 09:52:59.223 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
  APIState(core_state='STARTING', offline_db_migration=False)
  2026-04-16 09:53:17.191 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
  APIState(core_state='RUNNING', offline_db_migration=False)
  2026-04-16 10:27:03.222 INFO (MainThread) [supervisor.api.middleware.security] /supervisor/info access from
  cebe7a76_hassio_google_drive_backup
  2026-04-16 10:27:03.228 INFO (MainThread) [supervisor.api.middleware.security] /backups access from
  cebe7a76_hassio_google_drive_backup
```

### Anything in the Host logs that might be useful for us?

```txt
2026-04-16 16:51:27.612 homeassistant systemd[1]: docker-02fd9c2e....scope: Consumed 4h 50min 46.530s CPU time, 1.5G memory peak, 394.1M read from disk, 1.5G written to disk.

This is the Home Assistant Core container metrics at time of restart — 1.5 GB memory peak and 1.5 GB written to disk in a single session, consistent with heavy backup I/O.
```

### System information

```json
  {
    "version": "17.2",
    "version_latest": "17.2",
    "update_available": false,
    "board": "yellow",
    "boot": "B",
    "data_disk": "Samsung-SSD-980-500GB-S64ENU0W902223L",
    "boot_slots": {
      "A": { "state": "inactive", "status": "good", "version": "17.1" },
      "B": { "state": "booted",   "status": "good", "version": "17.2" }
    }
  }
```

Resolution center: no unsupported, unhealthy, or flagged issues.

### Additional information

dmesg (kernel: 6.12.75-haos-raspi, boot: 2026-04-16T03:44:21 UTC)

  ```
  [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
  [    0.000000] Linux version 6.12.75-haos-raspi (builder@131c7abb4df6) (aarch64-buildroot-linux-gnu-gcc.br_real
  (Buildroot -gc0e24ce5) 13.4.0, GNU ld (GNU Binutils) 2.43.1) #1 SMP PREEMPT Tue Apr  7 10:10:15 UTC 2026
  [    0.000000] Machine model: Raspberry Pi Compute Module 4 Rev 1.0
  [    0.000000] Reserved memory: created CMA memory pool at 0x000000002ac00000, size 64 MiB
  [    0.000000] OF: reserved mem: 0x000000002ac00000..0x000000002ebfffff (65536 KiB) map reusable linux,cma
  [    0.000000] Memory: 7965148K/8290304K available (16128K kernel code, 2512K rwdata, 5400K rodata, 2176K init, 601K
  bss, 253928K reserved, 65536K cma-reserved)
  [    0.000000] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_headphones=0
  bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=E4:5F:01:27:D5:CC
  vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  dwc_otg.lpm_enable=0 console=tty1 console=ttyAMA2,115200n8
  zram.enabled=1 zram.num_devices=3 rootwait systemd.machine_id=0a05ce2a8e304120b1aa2f3131361945 cgroup_enable=memory
  fsck.repair=yes  root=PARTUUID=a3ec664e-32ce-4665-95ea-7ae90ce9aa20 ro rauc.slot=B
  [    0.652708] bcm2708_fb soc:fb: Unable to determine number of FBs. Disabling driver.
  [    0.652720] bcm2708_fb soc:fb: probe with driver bcm2708_fb failed with error -2
  [    0.668472] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
  [    1.054415] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
  [    1.054432] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054460] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
  [    1.054470] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054510] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
  [    1.054519] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054542] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
  [    1.054550] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054587] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
  [    1.054595] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054613] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
  [    1.054621] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054658] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
  [    1.054667] cma: number of available pages: => 0 free of 16384 total pages
  [    1.054685] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
  [    1.054693] cma: number of available pages: => 0 free of 16384 total pages
  [    1.068421] cma: __cma_alloc: linux,cma: alloc failed, req-size: 67 pages, ret: -12
  [    1.068432] cma: number of available pages: => 0 free of 16384 total pages
  [    1.099388] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
  [    3.129110] cma: __cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -12
  [    3.136900] cma: number of available pages: => 0 free of 16384 total pages
  [    3.413944] systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
  [    3.424417] systemd[1]: Watchdog running with a hardware timeout of 15s.
  [    6.115381] bcmgenet fd580000.ethernet end0: renamed from eth0
  [    6.389499] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
  [    6.426354] bcm2835_vc_sm_cma_probe: Videocore shared memory driver
  [    7.926330] rtc-pcf85063 6-0051: setting system clock to 2026-04-16T03:44:21 UTC (1776311061)
  [    9.425055] systemd-journald[151]: File /var/log/journal/0a05ce2a8e304120b1aa2f3131361945/system.journal corrupted
  or uncleanly shut down, renaming and replacing.
  [   10.863069] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
  [   10.864088] bcmgenet fd580000.ethernet end0: Link is Down
  [   14.944335] bcmgenet fd580000.ethernet end0: Link is Up - 1Gbps/Full - flow control off
  [   19022.595450] cma: __cma_alloc: linux,cma: alloc failed, req-size: 512 pages, ret: -12
  [   19022.609697] cma: number of available pages: => 0 free of 16384 total pages
  ```



Hardware	Home Assistant Yellow
SoC	Broadcom BCM2711 — Raspberry Pi Compute Module 4 Rev 1.0 (revision `d03140`)
RAM	8 GB LPDDR4
Storage	Samsung SSD 980 500 GB (PCIe, firmware `3B4QFXO7`)
Network	BCM GENET Gigabit Ethernet (`bcmgenet fd580000.ethernet end0`)
Power	PoE+ (802.3at)
HAOS version	17.2
Kernel	`6.12.75-haos-raspi`
HA Core	2026.4.2
Supervisor	2026.04.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

HA Yellow (CM4): BCM GENET ethernet DMA stalls during heavy network upload due to CMA pool exhausted at boot #4651

Describe the issue you are experiencing

Symptom

Root cause

Backup config at time of failure

Hardware watchdog note

Suggested fix

Workaround

What operating system image do you use?

What version of Home Assistant Operating System is installed?

Did the problem occur after upgrading the Operating System?

Hardware details

Environment

Steps to reproduce the issue

Anything in the Supervisor logs that might be useful for us?

Anything in the Host logs that might be useful for us?

System information

Additional information

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

HA Yellow (CM4): BCM GENET ethernet DMA stalls during heavy network upload due to CMA pool exhausted at boot #4651

Description

Describe the issue you are experiencing

Symptom

Root cause

Backup config at time of failure

Hardware watchdog note

Suggested fix

Workaround

What operating system image do you use?

What version of Home Assistant Operating System is installed?

Did the problem occur after upgrading the Operating System?

Hardware details

Environment

Steps to reproduce the issue

Anything in the Supervisor logs that might be useful for us?

Anything in the Host logs that might be useful for us?

System information

Additional information

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions