fix(RK3566): workaround reboot hang caused by DRM VBlank wait with no timeout #2832
Open
mattafaak wants to merge 2 commits into
The Rockchip DRM driver's drm_atomic_helper_wait_for_vblanks() has no timeout, causing the kernel reboot notifier chain to hang indefinitely when there are pending display commits at shutdown time. This affects all RK3566 devices on ROCKNIX kernel 7.0.2 and manifests as the device appearing completely frozen after a reboot/shutdown is triggered (screen dark, power LED off, but device still powered with USB visible).

Fix: add a systemd service that fires before essway/sway are stopped during shutdown. The service:

1. Disables sway display outputs via IPC, which calls drm_crtc_vblank_off() internally and wakes any pending VBlank waiters
2. Uses sysrq emergency restart (echo b > /proc/sysrq-trigger) which calls machine_emergency_restart() directly, bypassing the broken reboot notifier chain

The Conflicts= directive ensures ExecStop fires during any shutdown/reboot/halt sequence. The After=essway.service sway.service ordering ensures this runs before those services are stopped.

Tested on: Anbernic RG ARC-D (RK3566)
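For orientation, a unit with the properties this commit message describes might look roughly like the sketch below. It is not the file from this PR: the unit name, the script path, the `ExecStart=/bin/true` placeholder, and the exact list of `Conflicts=` targets are all assumptions. The part systemd documents is the ordering rule: a unit ordered `After=` another is stopped before it on shutdown.

```shell
#!/bin/sh
# Sketch only: unit name, script path, and Conflicts= targets are assumptions,
# not taken from this PR.

write_unit() {
    # $1: destination file for the generated unit
    cat > "$1" <<'EOF'
[Unit]
Description=RK3566 reboot hang workaround (hypothetical name)
# Started after the compositor, so systemd stops this unit BEFORE it on shutdown.
After=essway.service sway.service
# Starting any of these targets stops this unit, which runs ExecStop=.
Conflicts=shutdown.target reboot.target halt.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/true
# Hypothetical path to the workaround script.
ExecStop=/usr/bin/rk3566-reboot-workaround

[Install]
WantedBy=multi-user.target
EOF
}

# Example (writes into the current directory for inspection):
#   write_unit ./rk3566-reboot-workaround.service
```

`Type=oneshot` with `RemainAfterExit=yes` keeps the unit "active" after boot, so that there is something for systemd to stop (and an `ExecStop=` to run) when shutdown begins.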
Previous approach was based on two incorrect assumptions:

1. That swaymsg output power-off would wake pending VBlank waiters cleanly
2. That sysrq-b bypasses the DRM reboot notifier chain

Both are wrong. Testing with persistent systemd journal shows:

- Powering off the display stops hardware vblank interrupts, causing drm_atomic_helper_wait_for_vblanks() to hang for ~5 minutes
- sysrq-b (machine_emergency_restart) traverses the same notifier chain on this kernel/platform; it does not bypass it

Fix: remove the swaymsg display poweroff entirely. With the display active and vblanks still occurring when sysrq-b fires, the DRM notifier completes in <100ms.

Measured result:
  Before: shutdown → new boot ≈ 6 minutes
  After:  shutdown → new boot ≈ 17 seconds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
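A minimal sketch of what the corrected script could look like after this commit, assuming it reduces to the sysrq restart with no display power-off. This is an illustration, not the PR's actual file: the function name, the `--fire` guard, the overridable `SYSRQ_TRIGGER` variable, and the `sync` call are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the corrected shutdown helper; names are assumptions.
# SYSRQ_TRIGGER is overridable so the logic can be exercised without rebooting.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

emergency_restart() {
    # $1: file to write the sysrq command to.
    # Flush filesystem buffers first (an assumption, not from the PR):
    # sysrq 'b' reboots without syncing or unmounting disks.
    sync
    # Deliberately no `swaymsg output '*' power off` here: the display stays
    # active so vblank interrupts keep arriving while the DRM code waits.
    echo b > "$1"
}

# Only fire when explicitly asked to, e.g. from the unit's ExecStop=.
if [ "${1:-}" = "--fire" ]; then
    emergency_restart "$SYSRQ_TRIGGER"
fi
```

Passing the trigger path as an argument is only there to make a dry run possible: pointing it at a temporary file shows what would be written without restarting the machine.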
Problem
All RK3566 devices on ROCKNIX kernel 7.0.2 hang for ~5 minutes when rebooting or shutting down. The device appears frozen — screen dark, power LED active — and requires waiting or a forced power cycle.
Root cause: `drm_atomic_helper_wait_for_vblanks()` in the Rockchip DRM driver has no timeout. This function is called from the DRM reboot notifier chain, which is traversed on any reboot path, including `sysrq-b` (`machine_emergency_restart`). If the display has been powered off before the reboot, hardware vblank interrupts stop and `wait_for_vblanks()` blocks until an internal ~5-minute timeout expires.

Diagnosis
Diagnosed using persistent `systemd-journald` across reboots (`journalctl -b -1`). The shutdown sequence captured:

Two assumptions in the original approach were incorrect:

- Assumed: `swaymsg output '*' power off` wakes pending VBlank waiters cleanly. Actual: it stops hardware vblanks, causing the hang.
- Assumed: `sysrq-b` bypasses the DRM notifier chain. Actual: it does not on this kernel/platform.

Fix
Do not power off the display before rebooting. With the display active and hardware vblanks still occurring, `drm_atomic_helper_wait_for_vblanks()` returns in milliseconds. The service and service unit are unchanged; only the script is corrected to remove the `swaymsg` display poweroff.

Testing
`journalctl -b -1` across two reboot cycles:

- `systemctl reboot`: ✅ opath)

Notes
The proper long-term fix is a kernel patch to add a timeout to `drm_atomic_helper_wait_for_vblanks()`. This userspace workaround is safe for all RK3566 devices in the meantime.

🤖 Generated with Claude Code
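The before/after timings quoted for this change (≈ 6 minutes vs ≈ 17 seconds from shutdown to new boot) can be approximated from the journal. The sketch below is one way to do that, not the method used in this PR. It assumes a persistent journal (for example `Storage=persistent` in `journald.conf`) and a wall clock that is already correct early in boot, which may not hold on devices without an RTC. The helper names and the `--measure` guard are hypothetical.

```shell
#!/bin/sh
# Sketch: estimate the shutdown -> new boot gap from two journal lines.
# Helper names are hypothetical; requires a persistent journal to be useful.

# Print the integer epoch seconds from a `journalctl -o short-unix` line,
# whose first field looks like 1700000000.123456.
epoch_of() {
    printf '%s\n' "$1" | awk '{ split($1, t, "."); print t[1] }'
}

# $1: last journal line of the previous boot
# $2: first journal line of the current boot
gap_seconds() {
    echo $(( $(epoch_of "$2") - $(epoch_of "$1") ))
}

# Only query the journal when explicitly asked to.
if [ "${1:-}" = "--measure" ]; then
    last=$(journalctl -q --no-pager -b -1 -n 1 -o short-unix)
    first=$(journalctl -q --no-pager -b 0 -o short-unix | head -n 1)
    echo "shutdown -> new boot: $(gap_seconds "$last" "$first") s"
fi
```

The result is only an estimate: the last entry of the previous boot is written when journald stops, which can be some time before the hardware actually resets.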