Prevents Topology Infinite Loop #142

Open
matt001k wants to merge 2 commits into main from fix/topology_crashing

Contributor

@matt001k matt001k commented May 5, 2026

What changed?

On a timeout event in the topology task, if on the root node, prevent changing the cursor.

How does it make Bristlemouth better?

If running a topology request on the root node, this prevents an infinite loop from occurring by ensuring that the topology cursor cannot be changed if the neighbor request retry limit has been hit.
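The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `topo_state_t`, `topo_on_timeout`, `TOPO_MAX_RETRIES`, and `TOPO_ROOT_CURSOR` are hypothetical names, and the sketch assumes that "on the root node" means the cursor is at the root entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the Bristlemouth API. */
#define TOPO_MAX_RETRIES 3u /* assumed neighbor request retry limit */
#define TOPO_ROOT_CURSOR 0  /* assumed cursor value for the root node */

typedef struct {
    int cursor;      /* position of the node currently being queried */
    uint8_t retries; /* neighbor requests sent without a reply */
    bool done;       /* set when the topology request should finish */
} topo_state_t;

/* Handle one timeout event. Returns true only if the cursor was moved. */
static bool topo_on_timeout(topo_state_t *s) {
    assert(s != NULL);
    if (s->retries < TOPO_MAX_RETRIES) {
        s->retries++; /* retry the neighbor request, cursor unchanged */
        return false;
    }
    if (s->cursor == TOPO_ROOT_CURSOR) {
        /* Retry limit hit on the root: finish instead of moving the
         * cursor, so the task cannot cycle forever. */
        s->done = true;
        return false;
    }
    s->cursor--; /* non-root: step back one position and start over */
    s->retries = 0;
    return true;
}
```

With this shape, repeated timeouts on the root leave the cursor where it is and mark the request as done, while a non-root position still steps back once its retries are used up.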

Where should reviewers focus?

This prevents a node from hanging in the topology task and tripping the watchdog, but it does not cover the handling of link up/down events from a non-ADIN device, such as the UART for bm_sbc. As a result, when a UART device is no longer available, the topology task takes about 3 seconds to finish, because it waits for all of the retries to occur (the Bristlemouth stack still thinks the UART port is up).
We need to think of a way to trigger link_change in l2.c for the UART device, but that will have to occur when implementing the composite device.

Checklist

  • Add or update unit tests for changed code
  • Ensure all submodules up to date. If this PR relies on changes in submodules, merge those PRs first, then point this PR at/after the merge commit
  • Ensure code is formatted correctly with clang-format. If there are large formatting changes, they should happen in a separate whitespace-only commit on this PR after all approvals.

@matt001k matt001k added the bug label May 5, 2026
@matt001k matt001k self-assigned this May 5, 2026
@matt001k matt001k requested review from towynlin and victorsowa12 May 5, 2026 20:52
Contributor

@victorsowa12 victorsowa12 left a comment

Talked IRL with Matt.

He has found another area that fixes the condition that causes the infinite loop by switching a == check to a <= check.
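The thread does not say which variable or comparison is involved, so the following is only a generic illustration of why an equality check can loop forever where an inequality check stops. All names are hypothetical. If a counter can step past the exact value being tested, `==` never fires, while `<=` still does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stop conditions on a "remaining retries" counter. */
static bool exhausted_eq(int remaining) { return remaining == 0; }
static bool exhausted_le(int remaining) { return remaining <= 0; }

/* Decrement `start` by `step` until `exhausted` reports true.
 * Returns the number of decrements performed, or -1 if the check
 * never fires within `max_iters` iterations. */
static int iterations_until_stop(bool (*exhausted)(int), int start,
                                 int step, int max_iters) {
    assert(step > 0);
    int remaining = start;
    for (int i = 0; i < max_iters; i++) {
        if (exhausted(remaining)) {
            return i;
        }
        remaining -= step;
    }
    return -1; /* the check was skipped over: effectively an infinite loop */
}
```

For example, starting at 3 and decrementing by 2 gives 3, 1, -1: the `==` variant never sees 0 and runs until the iteration cap, while the `<=` variant stops after two decrements.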

We also discussed potentially removing the network_topology_move calls from here and having this timeout just send a check node event. This will require some more testing.
