sidecar-seq: re-sequence on bad PCIe link #2402
Draft
Aaron-Hartwig wants to merge 1 commit into master
Contributor
Author
I'm opening this as a draft to get feedback on the strategy before attempting to test this more broadly (i.e., outside my own equipment). Here we can watch the Sidecar SP detect that the problem has occurred and resequence itself. And then we see the link come up.
labbott
reviewed
Feb 27, 2026
Comment on lines +174 to +176
// used to track how many notification loops elapsed while in A0 without a
// PCIe link
no_pcie_count: u8,
Collaborator
brief comment explaining that this is limited so we won't overflow?
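As a rough illustration of the bound being asked about, the counter could be advanced through a helper that cannot exceed the threshold, which makes the no-overflow property visible at the increment itself. This is only a sketch: `NO_PCIE_LIMIT` and `bump_no_pcie_count` are hypothetical names that do not appear in the diff, and the value 30 is taken from the comparison in the change.

```rust
/// Hypothetical named threshold; the diff compares against a literal 30.
const NO_PCIE_LIMIT: u8 = 30;

/// Advance the loop counter by one, clamped to NO_PCIE_LIMIT.
///
/// `saturating_add` rules out u8 wraparound, and `min` keeps the value
/// from ever exceeding the threshold, so the counter stays in 0..=30.
fn bump_no_pcie_count(count: u8) -> u8 {
    count.saturating_add(1).min(NO_PCIE_LIMIT)
}
```

With a helper like this, the field comment could simply state that the count is clamped at the threshold and therefore cannot overflow.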
{
    self.no_pcie_count += 1;

    if self.no_pcie_count >= 30 {
        ringbuf_entry!(Trace::TofinoResequence);
        self.tofino.power_down()?;
        self.tofino.power_up()?;
        self.resequenced = true;
Collaborator
It doesn't look like we ever reset self.resequenced after we've successfully reset. This function is still called in the timer notification past bootup, though. I get that this is supposed to be a one-time bootup thing, but something here seems strange.
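One possible shape for this, sketched under assumptions rather than taken from the PR: clear both the counter and the flag whenever the timer fires outside A0, so the one-shot resequence is re-armed on the next power-up. `LinkWatch`, `PowerState`, `on_timer`, and `RESEQUENCE_AFTER_LOOPS` are hypothetical names; only `no_pcie_count`, `resequenced`, and the threshold of 30 come from the diff. The function returns `true` when the caller should power the Tofino down and back up.

```rust
/// Hypothetical stand-in for the sequencer's power state.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum PowerState {
    A0,
    NotA0,
}

/// Threshold taken from the `>= 30` comparison in the diff.
const RESEQUENCE_AFTER_LOOPS: u8 = 30;

/// Hypothetical container for the two fields added by the change.
#[derive(Default, Debug)]
struct LinkWatch {
    no_pcie_count: u8,
    resequenced: bool,
}

impl LinkWatch {
    /// Called once per timer notification. Returns true exactly when the
    /// caller should resequence the Tofino.
    fn on_timer(&mut self, state: PowerState, link_up: bool) -> bool {
        if state != PowerState::A0 {
            // Outside A0: clear everything so the next boot is re-armed.
            *self = Self::default();
            return false;
        }
        if link_up || self.resequenced {
            // Link is healthy, or we already used our single attempt.
            self.no_pcie_count = 0;
            return false;
        }
        self.no_pcie_count += 1;
        if self.no_pcie_count >= RESEQUENCE_AFTER_LOOPS {
            self.no_pcie_count = 0;
            self.resequenced = true;
            return true;
        }
        false
    }
}
```

Whether the flag should instead persist across power cycles is a design question for the author; the sketch only shows that scoping it to a single stay in A0 is straightforward.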
For reasons we've been unable to nail down, sometimes on a cold boot of a rack the Tofino is unhappy with its PCIe link to the SP5 on Cosmo. To date, the workaround of a Tofino resequence has reliably addressed the issue. The goal of this commit is to have Sidecar's SP do this automatically. The SP will monitor for this case by starting a 30-second timer once it observes that the SP5 has released PERST to the Tofino. When the link comes up, it does so within a few seconds. When it doesn't, the SP will resequence the Tofino after waiting 30 seconds. It will only do this once.
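The policy described above can be summarized as a small decision function. This is a minimal sketch, not the code in the commit: `Action`, `decide`, `LINK_TIMEOUT_SECS`, and the parameter names are all hypothetical, and it assumes the caller tracks the seconds elapsed since PERST was released.

```rust
/// Hypothetical result of evaluating the policy on one timer tick.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Nothing to monitor: PERST still asserted, link up, or attempt used.
    Idle,
    /// PERST released and link still down, but the timeout has not expired.
    Wait,
    /// The timeout expired without a link: resequence the Tofino.
    Resequence,
}

/// Timeout from the commit message.
const LINK_TIMEOUT_SECS: u32 = 30;

/// Decide what to do given the current observations.
fn decide(
    perst_released: bool,
    link_up: bool,
    secs_since_release: u32,
    already_resequenced: bool,
) -> Action {
    if !perst_released || link_up || already_resequenced {
        return Action::Idle;
    }
    if secs_since_release >= LINK_TIMEOUT_SECS {
        Action::Resequence
    } else {
        Action::Wait
    }
}
```

Because a healthy link is reported to come up within a few seconds, the 30-second window leaves a wide margin before the single resequence attempt is spent.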