Describe the issue you are experiencing
Symptom
The device becomes completely unreachable over the network during automatic backup when cloud upload (cloud.cloud agent) is enabled. The system itself keeps running — CPU active, hardware watchdog serviced — but the ethernet interface is frozen. Recovery requires a full hardware power cycle. No soft reboot is possible because the device cannot be reached.
Confirmed via UniFi switch port activity log: the device disconnected during the backup window and did not reconnect for up to 17 hours until a manual PoE power cycle.
Root cause
dmesg shows the entire 64 MiB CMA pool (16384 pages) is exhausted at boot — before userspace starts — and stays
at 0 free pages for the full uptime:
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000002ac00000, size 64 MiB
[ 1.054415] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054432] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054460] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054470] cma: number of available pages: => 0 free of 16384 total pages
[ 3.129110] cma: __cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -12
[ 3.136900] cma: number of available pages: => 0 free of 16384 total pages
[ 19022.595450] cma: __cma_alloc: linux,cma: alloc failed, req-size: 512 pages, ret: -12
[ 19022.609697] cma: number of available pages: => 0 free of 16384 total pages
The VideoCore shared memory driver loads at boot and consumes the entire pool:
[ 6.389499] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
[ 6.426354] bcm2835_vc_sm_cma_probe: Videocore shared memory driver
The BCM GENET driver initialises normally and the physical link comes up at 1 Gbps. However, under sustained
high-throughput upload (cloud backup streaming for 20+ minutes), the GENET driver attempts to allocate additional
DMA ring buffers. With CMA at 0 free pages this fails silently and the ethernet controller stalls.
Backup config at time of failure
"agent_ids": ["hassio.local", "cloud.cloud"],
"include_all_addons": true,
"include_database": true,
"schedule": { "recurrence": "daily", "time": null }
Backup duration: ~20 minutes. The stall occurs mid-upload.
Hardware watchdog note
The system runs a 15-second hardware watchdog (Broadcom BCM2835 Watchdog timer). Because the CPU stays alive and continues servicing the watchdog, no automatic reboot occurs — the device remains in a zombie network-dead state indefinitely until power-cycled.
Suggested fix
The Yellow is a headless device; the VideoCore GPU does not require 64 MiB of CMA. Reducing gpu_mem in the CM4 device tree or boot config (e.g. gpu_mem=16) would free the majority of the CMA pool for the GENET DMA driver. Alternatively, increasing the CMA reservation (e.g. cma=128M in the kernel command line) would give both GPU and ethernet DMA sufficient headroom.
Workaround
Removing cloud.cloud from the automatic backup agent list eliminates the sustained upload and prevents the stall.
What operating system image do you use?
yellow (Home Assistant Yellow)
What version of Home Assistant Operating System is installed?
17.2
Did the problem occur after upgrading the Operating System?
Yes
Hardware details
Environment
|
|
| Hardware |
Home Assistant Yellow |
| SoC |
Broadcom BCM2711 — Raspberry Pi Compute Module 4 Rev 1.0 (revision d03140) |
| RAM |
8 GB LPDDR4 |
| Storage |
Samsung SSD 980 500 GB (PCIe, firmware 3B4QFXO7) |
| Network |
BCM GENET Gigabit Ethernet (bcmgenet fd580000.ethernet end0) |
| Power |
PoE+ (802.3at) |
| HAOS version |
17.2 |
| Kernel |
6.12.75-haos-raspi |
| HA Core |
2026.4.2 |
| Supervisor |
2026.04.0 |
Steps to reproduce the issue
- Enable automatic daily backups with both
hassio.local and cloud.cloud agents
- Enable
include_all_addons: true and include_database: true
- Leave
schedule.time as null (HAOS picks a random time)
- Wait for the scheduled backup to run (backup duration ~20 minutes)
- Monitor network connectivity during the backup window — the device becomes completely unreachable mid-upload
- Attempt to reach the device via web UI or ping — no response
- Attempt
ha core restart or any supervisor command — unreachable
- Only recovery: full hardware power cycle (PoE port cycle or physical power)
Confirmed pattern via UniFi switch activity log: device disconnected at the start of the backup window and did not reconnect for 17 hours until manual PoE power cycle.
Anything in the Supervisor logs that might be useful for us?
2026-04-16 09:51:08.760 INFO (MainThread) [supervisor.docker.manager] Restarting homeassistant
2026-04-16 09:51:28.008 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
2026-04-16 09:51:43.140 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
APIState(core_state='NOT_RUNNING', offline_db_migration=False)
2026-04-16 09:52:59.223 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
APIState(core_state='STARTING', offline_db_migration=False)
2026-04-16 09:53:17.191 INFO (MainThread) [supervisor.homeassistant.core] Home Assistant Core state changed to
APIState(core_state='RUNNING', offline_db_migration=False)
2026-04-16 10:27:03.222 INFO (MainThread) [supervisor.api.middleware.security] /supervisor/info access from
cebe7a76_hassio_google_drive_backup
2026-04-16 10:27:03.228 INFO (MainThread) [supervisor.api.middleware.security] /backups access from
cebe7a76_hassio_google_drive_backup
Anything in the Host logs that might be useful for us?
2026-04-16 16:51:27.612 homeassistant systemd[1]: docker-02fd9c2e....scope: Consumed 4h 50min 46.530s CPU time, 1.5G memory peak, 394.1M read from disk, 1.5G written to disk.
This is the Home Assistant Core container metrics at time of restart — 1.5 GB memory peak and 1.5 GB written to disk in a single session, consistent with heavy backup I/O.
System information
{
"version": "17.2",
"version_latest": "17.2",
"update_available": false,
"board": "yellow",
"boot": "B",
"data_disk": "Samsung-SSD-980-500GB-S64ENU0W902223L",
"boot_slots": {
"A": { "state": "inactive", "status": "good", "version": "17.1" },
"B": { "state": "booted", "status": "good", "version": "17.2" }
}
}
Resolution center: no unsupported, unhealthy, or flagged issues.
Additional information
dmesg (kernel: 6.12.75-haos-raspi, boot: 2026-04-16T03:44:21 UTC)
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[ 0.000000] Linux version 6.12.75-haos-raspi (builder@131c7abb4df6) (aarch64-buildroot-linux-gnu-gcc.br_real
(Buildroot -gc0e24ce5) 13.4.0, GNU ld (GNU Binutils) 2.43.1) #1 SMP PREEMPT Tue Apr 7 10:10:15 UTC 2026
[ 0.000000] Machine model: Raspberry Pi Compute Module 4 Rev 1.0
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000002ac00000, size 64 MiB
[ 0.000000] OF: reserved mem: 0x000000002ac00000..0x000000002ebfffff (65536 KiB) map reusable linux,cma
[ 0.000000] Memory: 7965148K/8290304K available (16128K kernel code, 2512K rwdata, 5400K rodata, 2176K init, 601K
bss, 253928K reserved, 65536K cma-reserved)
[ 0.000000] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_headphones=0
bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbswap=1 smsc95xx.macaddr=E4:5F:01:27:D5:CC
vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 dwc_otg.lpm_enable=0 console=tty1 console=ttyAMA2,115200n8
zram.enabled=1 zram.num_devices=3 rootwait systemd.machine_id=0a05ce2a8e304120b1aa2f3131361945 cgroup_enable=memory
fsck.repair=yes root=PARTUUID=a3ec664e-32ce-4665-95ea-7ae90ce9aa20 ro rauc.slot=B
[ 0.652708] bcm2708_fb soc:fb: Unable to determine number of FBs. Disabling driver.
[ 0.652720] bcm2708_fb soc:fb: probe with driver bcm2708_fb failed with error -2
[ 0.668472] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
[ 1.054415] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054432] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054460] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054470] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054510] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054519] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054542] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054550] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054587] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054595] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054613] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054621] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054658] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054667] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054685] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054693] cma: number of available pages: => 0 free of 16384 total pages
[ 1.068421] cma: __cma_alloc: linux,cma: alloc failed, req-size: 67 pages, ret: -12
[ 1.068432] cma: number of available pages: => 0 free of 16384 total pages
[ 1.099388] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
[ 3.129110] cma: __cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -12
[ 3.136900] cma: number of available pages: => 0 free of 16384 total pages
[ 3.413944] systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog0
[ 3.424417] systemd[1]: Watchdog running with a hardware timeout of 15s.
[ 6.115381] bcmgenet fd580000.ethernet end0: renamed from eth0
[ 6.389499] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
[ 6.426354] bcm2835_vc_sm_cma_probe: Videocore shared memory driver
[ 7.926330] rtc-pcf85063 6-0051: setting system clock to 2026-04-16T03:44:21 UTC (1776311061)
[ 9.425055] systemd-journald[151]: File /var/log/journal/0a05ce2a8e304120b1aa2f3131361945/system.journal corrupted
or uncleanly shut down, renaming and replacing.
[ 10.863069] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
[ 10.864088] bcmgenet fd580000.ethernet end0: Link is Down
[ 14.944335] bcmgenet fd580000.ethernet end0: Link is Up - 1Gbps/Full - flow control off
[ 19022.595450] cma: __cma_alloc: linux,cma: alloc failed, req-size: 512 pages, ret: -12
[ 19022.609697] cma: number of available pages: => 0 free of 16384 total pages
Describe the issue you are experiencing
Symptom
The device becomes completely unreachable over the network during automatic backup when cloud upload (
cloud.cloudagent) is enabled. The system itself keeps running — CPU active, hardware watchdog serviced — but the ethernet interface is frozen. Recovery requires a full hardware power cycle. No soft reboot is possible because the device cannot be reached.Confirmed via UniFi switch port activity log: the device disconnected during the backup window and did not reconnect for up to 17 hours until a manual PoE power cycle.
Root cause
dmesgshows the entire 64 MiB CMA pool (16384 pages) is exhausted at boot — before userspace starts — and staysat 0 free pages for the full uptime:
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000002ac00000, size 64 MiB
[ 1.054415] cma: __cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[ 1.054432] cma: number of available pages: => 0 free of 16384 total pages
[ 1.054460] cma: __cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[ 1.054470] cma: number of available pages: => 0 free of 16384 total pages
[ 3.129110] cma: __cma_alloc: linux,cma: alloc failed, req-size: 256 pages, ret: -12
[ 3.136900] cma: number of available pages: => 0 free of 16384 total pages
[ 19022.595450] cma: __cma_alloc: linux,cma: alloc failed, req-size: 512 pages, ret: -12
[ 19022.609697] cma: number of available pages: => 0 free of 16384 total pages
The VideoCore shared memory driver loads at boot and consumes the entire pool:
[ 6.389499] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned.
[ 6.426354] bcm2835_vc_sm_cma_probe: Videocore shared memory driver
The BCM GENET driver initialises normally and the physical link comes up at 1 Gbps. However, under sustained
high-throughput upload (cloud backup streaming for 20+ minutes), the GENET driver attempts to allocate additional
DMA ring buffers. With CMA at 0 free pages this fails silently and the ethernet controller stalls.
Backup config at time of failure
Backup duration: ~20 minutes. The stall occurs mid-upload.
Hardware watchdog note
The system runs a 15-second hardware watchdog (Broadcom BCM2835 Watchdog timer). Because the CPU stays alive and continues servicing the watchdog, no automatic reboot occurs — the device remains in a zombie network-dead state indefinitely until power-cycled.
Suggested fix
The Yellow is a headless device; the VideoCore GPU does not require 64 MiB of CMA. Reducing gpu_mem in the CM4 device tree or boot config (e.g. gpu_mem=16) would free the majority of the CMA pool for the GENET DMA driver. Alternatively, increasing the CMA reservation (e.g. cma=128M in the kernel command line) would give both GPU and ethernet DMA sufficient headroom.
Workaround
Removing cloud.cloud from the automatic backup agent list eliminates the sustained upload and prevents the stall.
What operating system image do you use?
yellow (Home Assistant Yellow)
What version of Home Assistant Operating System is installed?
17.2
Did the problem occur after upgrading the Operating System?
Yes
Hardware details
Environment
d03140)3B4QFXO7)bcmgenet fd580000.ethernet end0)6.12.75-haos-raspiSteps to reproduce the issue
hassio.localandcloud.cloudagentsinclude_all_addons: trueandinclude_database: trueschedule.timeasnull(HAOS picks a random time)ha core restartor any supervisor command — unreachableConfirmed pattern via UniFi switch activity log: device disconnected at the start of the backup window and did not reconnect for 17 hours until manual PoE power cycle.
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
{ "version": "17.2", "version_latest": "17.2", "update_available": false, "board": "yellow", "boot": "B", "data_disk": "Samsung-SSD-980-500GB-S64ENU0W902223L", "boot_slots": { "A": { "state": "inactive", "status": "good", "version": "17.1" }, "B": { "state": "booted", "status": "good", "version": "17.2" } } }Resolution center: no unsupported, unhealthy, or flagged issues.
Additional information
dmesg (kernel: 6.12.75-haos-raspi, boot: 2026-04-16T03:44:21 UTC)