Skip to content

fix(RK3566): workaround reboot hang caused by DRM VBlank wait with no timeout#2832

Open
mattafaak wants to merge 2 commits into
ROCKNIX:nextfrom
mattafaak:fix/rk3566-reboot-hang
Open

fix(RK3566): workaround reboot hang caused by DRM VBlank wait with no timeout#2832
mattafaak wants to merge 2 commits into
ROCKNIX:nextfrom
mattafaak:fix/rk3566-reboot-hang

Conversation

@mattafaak

@mattafaak mattafaak commented Jun 3, 2026

Copy link
Copy Markdown

Problem

All RK3566 devices on ROCKNIX kernel 7.0.2 hang for ~5 minutes when rebooting or shutting down. The device appears frozen — screen dark, power LED active — and requires waiting or a forced power cycle.

Root cause: drm_atomic_helper_wait_for_vblanks() in the Rockchip DRM driver has no timeout. This function is called from the DRM reboot notifier chain, which is traversed on any reboot path — including sysrq-b (machine_emergency_restart). If the display has been powered off before the reboot, hardware vblank interrupts stop and wait_for_vblanks() blocks indefinitely until an internal ~5-minute timeout expires.

Diagnosis

Diagnosed using persistent systemd-journald across reboots (journalctl -b -1). The shutdown sequence captured:

18:11:52  swaymsg output '*' power off → {"success": true}   ← display powered off, vblanks stop
18:11:53  kernel: sysrq: Emergency Sync
18:11:54  echo b fires → machine_emergency_restart()
             ↓ DRM reboot notifier → drm_atomic_helper_wait_for_vblanks()
             ↓ no vblanks (display is off) → hangs ~5 minutes
18:17:55  new boot (hardware timeout finally fires)

Two assumptions in the original approach were incorrect:

  1. swaymsg output '*' power off wakes pending VBlank waiters cleanly — it stops hardware vblanks, causing the hang
  2. sysrq-b bypasses the DRM notifier chain — it does not on this kernel/platform

Fix

Do not power off the display before rebooting. With the display active and hardware vblanks still occurring, drm_atomic_helper_wait_for_vblanks() returns in milliseconds. The service and service unit are unchanged; only the script is corrected to remove the swaymsg display poweroff.

Testing

  • Device: Anbernic RG ARC-D (RK3566)
  • ROCKNIX: 20260601, kernel 7.0.2
  • Verified with journalctl -b -1 across two reboot cycles:
Scenario Shutdown → new boot
Before fix (swaymsg power off + sysrq-b) ~6 minutes
After fix (sysrq-b only, display active) ~17 seconds
  • Reboot via EmulationStation menu: ✅
  • Reboot via systemctl reboot: ✅
  • Poweroff: ✅ (sysrq o path)

Notes

The proper long-term fix is a kernel patch to add a timeout to drm_atomic_helper_wait_for_vblanks(). This userspace workaround is safe for all RK3566 devices in the meantime.

🤖 Generated with Claude Code

mattafaak and others added 2 commits June 2, 2026 21:53
The Rockchip DRM driver's drm_atomic_helper_wait_for_vblanks() has no
timeout, causing the kernel reboot notifier chain to hang indefinitely
when there are pending display commits at shutdown time. This affects
all RK3566 devices on ROCKNIX kernel 7.0.2 and manifests as the device
appearing completely frozen after a reboot/shutdown is triggered (screen
dark, power LED off, but device still powered with USB visible).

Fix: add a systemd service that fires before essway/sway are stopped
during shutdown. The service:
1. Disables sway display outputs via IPC, which calls drm_crtc_vblank_off()
   internally and wakes any pending VBlank waiters
2. Uses sysrq emergency restart (echo b > /proc/sysrq-trigger) which
   calls machine_emergency_restart() directly, bypassing the broken
   reboot notifier chain

The Conflicts= directive ensures ExecStop fires during any shutdown/
reboot/halt sequence. The After=essway.service sway.service ordering
ensures this runs before those services are stopped.

Tested on: Anbernic RG ARC-D (RK3566)
Previous approach was based on two incorrect assumptions:
1. That swaymsg output power-off would wake pending VBlank waiters cleanly
2. That sysrq-b bypasses the DRM reboot notifier chain

Both are wrong. Testing with persistent systemd journal shows:
- Powering off the display stops hardware vblank interrupts, causing
  drm_atomic_helper_wait_for_vblanks() to hang for ~5 minutes
- sysrq-b (machine_emergency_restart) traverses the same notifier chain
  on this kernel/platform — it does not bypass it

Fix: remove the swaymsg display poweroff entirely. With the display
active and vblanks still occurring when sysrq-b fires, the DRM notifier
completes in <100ms.

Measured result:
  Before: shutdown → new boot ≈ 6 minutes
  After:  shutdown → new boot ≈ 17 seconds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant