Performance improvements to support the Flex waveform #58
Sorry, I just saw this one now, as it was in my junk folder. Will take a look when I get some time. I would suggest that only the minimum optimisation needed to get the Flex port running is done. As per PLT policy with RADE V1, we don't want to make RADE V1 optimisation/maintenance a regular activity, as it will all be deleted soon.
Agreed. Walter and I have been testing with the PR version of RADE and the waveform seems to be holding up well. I did test the Flex waveform with main and the RX audio was nowhere near as smooth, even though in theory we still have remaining idle CPU.
@tmiw: do the ctests pass on the Pi 4 platform (assuming they can be run)?
This PR has some pretty extensive mods to several really important DSP functions. It's really hard to tell from simply looking at the source mods whether the changes are OK. It's really, really easy for subtle issues to cause problems here, and they may not be picked up by the ctests. I'm nervous about this code being pushed out to non-Flex users at this time. I think we need some evidence that each function has identical performance to the … Alternatively, use this branch just for the Flex port, and consider it experimental.
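As a hypothetical illustration of the kind of per-function evidence being asked for here (not the test plan discussed later in this thread), one common approach is to run the original and optimised implementations on the same random inputs and check that their outputs agree within a tight tolerance. The helper `check_equivalent` and the two moving-sum functions below are made-up stand-ins, not RADE V1 code.

```python
import numpy as np


def check_equivalent(reference, candidate, make_inputs, trials=100,
                     rtol=1e-9, atol=1e-9, seed=1):
    """Run both implementations on identical random inputs.

    Returns the worst absolute error seen; raises AssertionError on the
    first trial whose outputs differ in shape or exceed the tolerance.
    """
    rng = np.random.default_rng(seed)  # fixed seed so failures are reproducible
    worst = 0.0
    for trial in range(trials):
        args = make_inputs(rng)
        expected = np.asarray(reference(*args))
        actual = np.asarray(candidate(*args))
        if expected.shape != actual.shape:
            raise AssertionError(f"trial {trial}: shape {actual.shape} != {expected.shape}")
        if not np.allclose(actual, expected, rtol=rtol, atol=atol):
            raise AssertionError(f"trial {trial}: outputs differ")
        if expected.size:
            worst = max(worst, float(np.max(np.abs(actual - expected))))
    return worst


# Hypothetical "original" version: one small NumPy call per output sample.
def moving_sum_loop(x, width):
    return np.array([x[i:i + width].sum() for i in range(len(x) - width + 1)])


# Hypothetical "optimised" version: a single cumulative sum.
def moving_sum_fast(x, width):
    c = np.concatenate(([0.0], np.cumsum(x)))
    return c[width:] - c[:-width]


if __name__ == "__main__":
    worst = check_equivalent(
        moving_sum_loop, moving_sum_fast,
        lambda rng: (rng.standard_normal(int(rng.integers(8, 200))), 8))
    print(f"max abs error: {worst:.3e}")
```

Reporting the worst-case error alongside the pass/fail result gives reviewers a number to judge, rather than just "the ctests still pass".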
No further changes planned unless bugs are uncovered.
Yep, it's being used for freedv-gui too since the waveform does share a fair amount of code with it.
They do run but take quite a while (for example, ~2600 seconds on the CM4 board I'm using for waveform testing). No issues that I can tell. Also, GitHub has Linux ARM runners now so I added ARM to the automated ctests in this repo. Note: main doesn't currently pass on either x86_64 or ARM due to some changes made in Opus. This PR updates the default
I'll have to think about this some more. |
If you decide you are still keen to merge this code or use it for non-Flex targets pls let me know and I'll design adequate tests for you to implement. Please do not start coding any more tests until I have approved a test plan. Alternatively, just using the code in this PR for experimental use on Flex only is acceptable to me. Perhaps the appropriate use of this PR is something we need to discuss at PLT level:
I went ahead and updated CMake in freedv-gui to only use this branch for the Flex and KA9Q/web SDR integrations for now. Something we can maybe consider too is limiting the changes to only the C code and
@tmiw: further to our PLT discussion, when convenient could you pls break out just the BPF optimisation into a separate PR:
Done: #60. I also added some additional comments to hopefully explain what the new code is doing.
Closing so these proposed changes can be broken out into smaller PRs such as #60.
This PR contains various performance improvements to ensure that the hybrid C/Python version of RADE V1 can run acceptably on a Raspberry Pi 4 (the hardware inside the Flex 8000 and Aurora series of radios):
- … `rade_open()` (ensures that each run through `rade_tx()` and `rade_rx()` takes a deterministic amount of time).
- … `rade_rx()` to avoid taking the Python lock multiple times per block of audio (also improves determinism).
- … `malloc()`, etc.)
- … `np.lib.stride_tricks.as_strided()` to generate arrays of sliding windows where possible (i.e. `[[1,2,3],[2,3,4],...]`) and then perform a single NumPy call (e.g. `np.matmul`) on the result. This reduces the overhead of going back and forth between Python and C while using NumPy.
- … `check_pilots`, we generate a single array of items from the randomly-selected `rx` samples and perform a single NumPy call to calculate the result (which we put back into `Dt1` and `Dt2`).

Performance comparison in the GitHub environment using `ctest -V -R radae_rx_profile`:

`main` (f4254de):

This PR:
(~20% improvement based on the first line of both results)
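The `as_strided()` change listed above can be sketched roughly as follows. This is a minimal illustration assuming a 1-D float input; the FIR filter is a hypothetical stand-in, since the description doesn't say which RADE V1 functions were rewritten this way. The loop version makes one small NumPy call per output sample, while the strided version builds every window as a zero-copy view and does one `np.matmul`.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def sliding_windows(x, width):
    """Return a read-only (len(x) - width + 1, width) view of overlapping windows.

    For x = [1, 2, 3, 4] and width = 3 this is [[1, 2, 3], [2, 3, 4]].
    No data is copied: each row starts one element after the previous one.
    """
    x = np.ascontiguousarray(x)
    if x.ndim != 1:
        raise ValueError("expected a 1-D array")
    n = x.shape[0] - width + 1
    if n <= 0:
        raise ValueError("width must not exceed len(x)")
    step = x.strides[0]  # bytes between consecutive elements
    return as_strided(x, shape=(n, width), strides=(step, step), writeable=False)


def fir_filter_loop(x, taps):
    # Reference: one dot product (one Python -> C round trip) per output sample.
    taps = np.asarray(taps)
    width = len(taps)
    return np.array([np.dot(x[i:i + width], taps[::-1])
                     for i in range(len(x) - width + 1)])


def fir_filter_strided(x, taps):
    # Vectorised: all windows at once, then a single matrix-vector product.
    taps = np.asarray(taps)
    return np.matmul(sliding_windows(x, len(taps)), taps[::-1])
```

Both functions compute the same values as `np.convolve(x, taps, mode="valid")`. Note that `as_strided` does no bounds checking, so the shape and strides must be computed carefully; NumPy 1.20+ also provides `np.lib.stride_tricks.sliding_window_view` as a safer wrapper for the same idea.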
Real-world CPU usage testing with the Flex waveform in freedv-gui:
- … `top`): 95% CPU using `main`, 30% using this PR (!)
- … `main`, 60% CPU using this PR
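The `check_pilots` change described in the list above follows a similar batching pattern. The sketch below is hypothetical: the actual contents of `check_pilots`, `Dt1` and `Dt2` aren't shown in this PR, so it assumes a simple correlation of a known pilot sequence against `rx` at several randomly chosen offsets. Fancy indexing gathers every candidate segment into one 2-D array, and a single matmul then replaces the per-offset loop.

```python
import numpy as np


def pilot_metrics_loop(rx, offsets, pilot):
    # Reference: one NumPy call per candidate offset.
    out = np.empty(len(offsets), dtype=np.float64)
    for i, off in enumerate(offsets):
        seg = rx[off:off + len(pilot)]
        out[i] = np.abs(np.vdot(pilot, seg))  # |sum(conj(pilot) * seg)|
    return out


def pilot_metrics_batched(rx, offsets, pilot):
    # Build a (len(offsets), len(pilot)) index grid: row i is offsets[i] + [0, 1, ...].
    idx = np.asarray(offsets)[:, None] + np.arange(len(pilot))[None, :]
    segs = rx[idx]  # gather all candidate segments in one call
    # One matmul computes every correlation at once.
    return np.abs(segs @ np.conj(pilot))
```

If the results need to land in existing arrays (as the description mentions for `Dt1` and `Dt2`), `np.abs(..., out=buf)` can write into a preallocated buffer instead of allocating a new one on every call.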