@Shubh3005

Description

This PR changes the default thread count for the AprilTag detector from 4 to 2 in AprilTagPipelineSettings.java.

Why:
As discussed in issue #2125, running 4 threads by default can cause thread contention on resource-constrained coprocessors (like Raspberry Pi 4/5), potentially starving other critical processes like NetworkTables or the web server.

Benchmarks:
I tested this on a local build using standard AprilTag test images. While the absolute FPS was high due to host hardware (MacBook Pro), the latency scaling confirms the diminishing returns of higher thread counts:

| Thread Count | Avg Latency | Avg FPS | Notes |
|---|---|---|---|
| 1 | ~11 ms | 30 | Baseline |
| 2 | ~7.5 ms | 30 | ~30% improvement (sweet spot) |
| 4 | ~6.5 ms | 30-32 | Negligible gain over 2 threads |

The data suggests that 2 threads hits the optimal balance between performance and resource efficiency.
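For reference, a minimal sketch of the arithmetic behind the relative gains, using the approximate averages reported above. The class and method names are made up for illustration and are not part of PhotonVision. With these inputs the step from 1 to 2 threads works out to roughly 32%, and the step from 2 to 4 threads to roughly 13%.

```java
final class LatencyGain {
    /** Relative latency reduction in percent when going from beforeMs to afterMs. */
    static double percentImprovement(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    public static void main(String[] args) {
        // Approximate averages from the benchmark above (MacBook Pro host).
        double oneThread = 11.0;
        double twoThreads = 7.5;
        double fourThreads = 6.5;

        System.out.printf("1 -> 2 threads: %.1f%%%n", percentImprovement(oneThread, twoThreads));
        System.out.printf("2 -> 4 threads: %.1f%%%n", percentImprovement(twoThreads, fourThreads));
    }
}
```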

Closes #2125

@Shubh3005 Shubh3005 requested a review from a team as a code owner January 1, 2026 22:33
@github-actions github-actions bot added the backend Things relating to photon-core and photon-server label Jan 1, 2026
@Gold856
Collaborator

Gold856 commented Jan 2, 2026

Latency is also impacted by processing time, camera FPS, and the exact settings of the AprilTag detector. On an Orange Pi 5, I'm seeing that 3 threads is actually the sweet spot for my settings. 2 threads has a mean latency of 33 ms, while 3 has a mean latency of 29 ms, which I would say is significant (and it looks like you do too). I'd like to see benchmarks on platforms that are commonly used (notably, this doesn't include Macs of any sort).

@Shubh3005
Author

Thanks for running those benchmarks on the Orange Pi 5! That 4ms latency improvement (33ms -> 29ms) with 3 threads is definitely significant on that hardware.

One concern regarding the default: The Orange Pi 5 has 8 cores, so running 3 worker threads leaves plenty of headroom. However, the Raspberry Pi 4 (standard FRC coprocessor) only has 4 cores.

If we default to 3 threads, we might saturate a Pi 4 (3 workers + 1 OS/NetworkTables/Driver thread), re-introducing the starvation/jitter issue.

Should the default be optimized for:

- Performance (3 Threads) - Best for OPi5 / Mini PCs.
- Safety (2 Threads) - Best for RPi 3/4 to ensure system stability.

I am happy to update the PR to 3 threads if you think the Pi 4 can handle the load, but I wanted to flag the core-count difference first.
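To make the core-count concern concrete, here is a rough sketch of one way a default could be capped by the number of cores. This is purely illustrative: `cappedThreads` and `reservedCores` are hypothetical names, nothing like this is claimed to exist in PhotonVision, and the sketch assumes each worker thread can occupy a full core, which may not hold in practice.

```java
final class ThreadDefault {
    /**
     * Hypothetical helper: cap the requested worker-thread count so that
     * reservedCores cores stay free for everything else (OS, NetworkTables, ...).
     * Always returns at least 1.
     */
    static int cappedThreads(int requested, int availableCores, int reservedCores) {
        int ceiling = Math.max(1, availableCores - reservedCores);
        return Math.max(1, Math.min(requested, ceiling));
    }

    public static void main(String[] args) {
        // availableProcessors() reports the cores visible to this JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores + " threads=" + cappedThreads(3, cores, 2));
    }
}
```

Under that assumption, a request for 3 threads with 2 cores reserved would become 2 on a 4-core board and stay at 3 on an 8-core board.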

@Gold856
Collaborator

Gold856 commented Jan 2, 2026

How do you know that 3 threads saturates a Raspberry Pi 4? It might; it has a very weak processor compared to the Orange Pi 5, but these threads aren’t saturating the entire core, like you’re assuming they will. More benchmarks on a variety of hardware with a variety of settings need to be done before a conclusion can be drawn.
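As a rough idea of the shape such a benchmark could take, here is a minimal thread-count sweep. It is only a sketch: `busyWork` is a synthetic CPU-bound stand-in, not the AprilTag detector, and the frame count and work size in `main` are arbitrary. Real numbers would have to come from the actual pipeline running on the target coprocessor with real cameras and settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ThreadSweep {
    /** Synthetic CPU-bound work; a placeholder for real per-frame processing. */
    static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    /** Mean wall-clock time in ms per "frame", with each frame split into `threads` chunks. */
    static double meanFrameMillis(int threads, int frames, int workPerFrame) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long totalNanos = 0;
            for (int f = 0; f < frames; f++) {
                long start = System.nanoTime();
                List<Future<Long>> parts = new ArrayList<>();
                for (int t = 0; t < threads; t++) {
                    Callable<Long> chunk = () -> busyWork(workPerFrame / threads);
                    parts.add(pool.submit(chunk));
                }
                for (Future<Long> part : parts) {
                    part.get(); // a frame is done only when every chunk has finished
                }
                totalNanos += System.nanoTime() - start;
            }
            return totalNanos / 1e6 / frames;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("benchmark interrupted or failed", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        for (int threads = 1; threads <= 4; threads++) {
            System.out.printf("%d thread(s): %.2f ms/frame%n",
                    threads, meanFrameMillis(threads, 50, 4_000_000));
        }
    }
}
```

A sweep like this only measures the time to finish each frame. CPU load and frame-rate consistency would need to be recorded separately.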

@crschardt
Contributor

crschardt commented Jan 2, 2026

> One concern regarding the default: The Orange Pi 5 has 8 cores, so running 3 worker threads leaves plenty of headroom. However, the Raspberry Pi 4 (standard FRC coprocessor) only has 4 cores.

We only run PV on the 4 high performance cores on the OPi5 and RubikPi3, so it's really the same as a Pi 4 (or 5) in that sense.

Is there any data showing resource starvation on any of our supported platforms? I've tested on RPi4 with two cameras and haven't been able to saturate the processor despite trying a wide range of settings.

Here's some data from one of my attempts to load down the CPU on my RPi4b. The first figure shows data from running two OV9281 cameras at 800x600 resolution with decimate set to 1 and varying the number of threads:
[Image: results for two OV9281 cameras at 800x600, decimate = 1, varying thread count]

The second figure shows the same experiment, but now with decimate set to 2:
[Image: results for two OV9281 cameras at 800x600, decimate = 2, varying thread count]

The first case had the highest CPU load, but it never exceeded 90%. Having 2 threads gave the highest FPS and adding threads actually lowered performance despite increasing CPU load!

In the second case, the FPS was much higher and didn't increase with the number of threads. Adding threads made the frame rate less consistent.

I wasn't able to saturate the cores with either of these tests, and these provided some of the highest CPU loads that I could achieve. It seems like the choice of number of threads should be driven by a team's goals regarding performance. Perhaps 2 is a better default than 4, but without more testing on other platforms, we really don't know.

@crschardt
Contributor

crschardt commented Jan 2, 2026

I decided to test single-camera performance on the Raspberry Pi 4b too. I ran this test at a resolution of 1280x800 to try to get the highest CPU load. Here are the results for decimate of 1 and decimate of 2:

One camera, decimate = 1
[Image: results for one camera at 1280x800, decimate = 1]

One camera, decimate = 2
[Image: results for one camera at 1280x800, decimate = 2]

For a single camera, FPS is maximized with 4 threads for either decimate setting. The highest CPU load was less than 75% for the decimate 1 case and less than 56% for the decimate 2 case. So for 1 camera on an RPi4, 4 threads seems like a reasonable default to get the highest FPS.

Development

Successfully merging this pull request may close these issues.

Reduce default number of threads for AprilTag detector

3 participants