Performance Regression on MT7621 + MT7915: CPU Context Switching overhead in Thermal Mutex path #1059
Description
Hi Felix,
I have been testing the patch "mt76: mt7915: hold dev->mutex while interacting with the thermal state" on MT7621 + MT7915 (kernel 5.4). While the patch ensures safety during hardware restarts, it causes a significant TX throughput drop on the MT7621, from 440 Mbps down to 200 Mbps.
Technical Analysis:
Software Mutex vs. Hardware Mailbox:
The MT7915 hardware mailbox itself is efficient at handling concurrent requests. However, wrapping the thermal calls in dev->mt76.mutex (a sleeping mutex) puts any thread that contends for the lock to sleep, which forces the MT7621 CPU into frequent context switches.
MIPS Context Switch Penalty:
On the MT7621 (MIPS 1004Kc), the overhead of putting a high-speed TX thread to sleep and waking it up is far greater than on newer ARM SoCs. In a 440 Mbps+ scenario (iperf3 -P 8), these millisecond-level "sleep gaps" cause the TCP congestion window to collapse and trigger the firmware's rate control to down-shift the MCS due to perceived software-induced latency.
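To put a rough number on what a "sleep gap" means at this rate, the sketch below estimates how much TX data arrives while the sending thread is stalled. The 1 ms stall and the 1500-byte frame size are illustrative assumptions, not measurements from this report, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes that accumulate while the TX path is stalled.
 * Mbit/s multiplied by microseconds gives bits, so dividing by 8 gives bytes.
 */
static uint64_t backlog_bytes(uint64_t rate_mbps, uint64_t stall_us)
{
    return rate_mbps * stall_us / 8;
}

/* The same backlog expressed in whole frames of an assumed size. */
static uint64_t backlog_frames(uint64_t rate_mbps, uint64_t stall_us,
                               uint64_t frame_bytes)
{
    assert(frame_bytes > 0);
    return backlog_bytes(rate_mbps, stall_us) / frame_bytes;
}
```

With these assumptions, backlog_bytes(440, 1000) gives 55000 bytes, i.e. about 36 full-size frames queued for every millisecond the thread sleeps.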
Long-term Fragmentation:
In 10-hour stress tests, I observed that as memory fragmentation increases over time, the CPU takes even longer to manage these mutex sleep/wake cycles. This leads to sustained hw-queued backlogs and a permanent drop in throughput, even though the hardware radio conditions remain excellent.
Recommendation:
I found that removing the mutex_lock from the thermal sysfs show/store functions (mt7915_thermal_temp_show/store) restores stable throughput at 440 Mbps+. The system remains stable over 10 hours with >1M BA MISS events, provided that the thermal polling interval is reasonable (e.g., 5s).
Perhaps we can consider removing the global mutex from the thermal path for MIPS-based targets, or implementing a lockless check for the "restarting" state to prevent the data plane from being blocked by a sleeping lock.
Best regards,
Simon