Add shared ereport types crate, more sequencer ereports#2399
Add shared ereport types crate, more sequencer ereports#2399
Conversation
drv/cosmo-seq-server/src/main.rs
Outdated
| @@ -219,17 +221,15 @@ fn main() -> ! { | |||
|
|
|||
| if let Some(last_cause) = last_cause { | |||
| // Report the failure even if we eventually succeeded. | |||
| try_send_ereport( | |||
| &packrat, | |||
| &mut ereport_buf[..], | |||
| EreportClass::Bmr491MitigationFailure, | |||
| EreportKind::Bmr491MitigationFailure { | |||
| refdes: FixedStr::from_str(dev.component_id()), | |||
| failures, | |||
| last_cause, | |||
| succeeded, | |||
| }, | |||
| ); | |||
| let ereport = ereports::pwr::Bmr491MitigationFailure { | |||
| refdes: FixedStr::<{ REFDES_LEN }>::from_str( | |||
| dev.component_id(), | |||
There was a problem hiding this comment.
It's a bit annoying that these are generic over REFDES_LEN, since it means that the downstream task must specify it both in the max_cbor_len_for! macro and when actually constructing the FixedStrs (as the type is not inferred there, since it could be any FixedStr).
The alternative to this would be making the ereports crate depend on build_i2c, so that it can get the MAX_COMPONENT_ID_LEN value from there. But, that felt a bit gross for the lib that just defines a bunch of types to have to go build all the I2C stuff. Also, there might be some cases where a task needs to override the length in order to use a refdes that isn't coming from the I2C config. For instance, when we wire up the "look up refdes for a sensor ID" stuff in #2364 (which i started on in #2383), a sensor which is not an I2C device may have a longer refdes than any of the I2C devices in the config (unlikely but who knows!). Hmm.
|
Testing out the eliza@jeeves ~ $ pfexec humility hiffy -c FmcDemo.poke32 -a addr=0xc0000208,value=0x2
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
FmcDemo.poke32() => ()
eliza@jeeves ~ $ pfexec humility hiffy -c Sequencer.get_state
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
Sequencer.get_state() => A0Thermtrip
eliza@jeeves ~ $ pfexec humility hiffy -c Sequencer.set_state -a state=A2
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
Sequencer.set_state() => Changed
eliza@jeeves ~ $ pfexec humility hiffy -c Sequencer.set_state -a state=A0
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
Sequencer.set_state() => Changed
eliza@jeeves ~ $ pilot sp -i e1000g0 exec -e 'ereports' fe80::aa40:25ff:fe04:11c5
Feb 26 18:46:53.957 INFO creating SP handle on interface e1000g0, component: faux-mgs
Feb 26 18:46:53.988 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:11c5%2]:11111, interface: e1000g0, socket: control-plane-agent, component: faux-mgs
restart ID: 7b2025d6-239d-9da0-5ddd-d6b13b6fb64b
restart IDs did not match (requested 00000000-0000-0000-0000-000000000000)
count: 2
ereports:
0x1: {
"baseboard_part_number": String("913-0000023"),
"baseboard_rev": Number(2),
"baseboard_serial_number": String("2J80WF9K\0\0\0"),
"ereport_message_version": Number(0),
"hubris_caboose": Object {
"board": String("cosmo-b"),
"commit": String("579b66a6f5c2a940073f4574228df91ca1f69306-dirty"),
"version": Null,
},
"hubris_task_gen": Number(0),
"hubris_task_name": String("packrat"),
"hubris_uptime_ms": Number(0),
"lost": Null,
}
0x2: {
"baseboard_part_number": String("913-0000023"),
"baseboard_rev": Number(2),
"baseboard_serial_number": String("2J80WF9K\0\0\0"),
"dev_id": String("sp5-host-cpu"),
"ereport_message_version": Number(0),
"hubris_caboose": Object {
"board": String("cosmo-b"),
"commit": String("579b66a6f5c2a940073f4574228df91ca1f69306-dirty"),
"version": Null,
},
"hubris_task_gen": Number(0),
"hubris_task_name": String("cosmo_seq"),
"hubris_uptime_ms": Number(568146),
"k": String("hw.cpu.amd.thermtrip"),
"refdes": String("P0"),
"state": Object {
"cur": String("A0PlusHP"),
"since": Number(314536),
},
"v": Number(0),
}
eliza@jeeves ~ $ pfexec humility hiffy -c Sequencer.get_state
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
Sequencer.get_state() => A0
eliza@jeeves ~ $ pfexec humility hiffy -c FmcDemo.poke32 -a addr=0xc0000208,value=0x4
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
FmcDemo.poke32() => ()
eliza@jeeves ~ $ pilot sp -i e1000g0 exec -e 'ereports' fe80::aa40:25ff:fe04:11c5
Feb 26 18:47:39.106 INFO creating SP handle on interface e1000g0, component: faux-mgs
Feb 26 18:47:39.137 INFO initial discovery complete, addr: [fe80::aa40:25ff:fe04:11c5%2]:11111, interface: e1000g0, socket: control-plane-agent, component: faux-mgs
restart ID: 7b2025d6-239d-9da0-5ddd-d6b13b6fb64b
restart IDs did not match (requested 00000000-0000-0000-0000-000000000000)
count: 3
ereports:
0x1: {
"baseboard_part_number": String("913-0000023"),
"baseboard_rev": Number(2),
"baseboard_serial_number": String("2J80WF9K\0\0\0"),
"ereport_message_version": Number(0),
"hubris_caboose": Object {
"board": String("cosmo-b"),
"commit": String("579b66a6f5c2a940073f4574228df91ca1f69306-dirty"),
"version": Null,
},
"hubris_task_gen": Number(0),
"hubris_task_name": String("packrat"),
"hubris_uptime_ms": Number(0),
"lost": Null,
}
0x2: {
"baseboard_part_number": String("913-0000023"),
"baseboard_rev": Number(2),
"baseboard_serial_number": String("2J80WF9K\0\0\0"),
"dev_id": String("sp5-host-cpu"),
"ereport_message_version": Number(0),
"hubris_caboose": Object {
"board": String("cosmo-b"),
"commit": String("579b66a6f5c2a940073f4574228df91ca1f69306-dirty"),
"version": Null,
},
"hubris_task_gen": Number(0),
"hubris_task_name": String("cosmo_seq"),
"hubris_uptime_ms": Number(568146),
"k": String("hw.cpu.amd.thermtrip"),
"refdes": String("P0"),
"state": Object {
"cur": String("A0PlusHP"),
"since": Number(314536),
},
"v": Number(0),
}
0x3: {
"baseboard_part_number": String("913-0000023"),
"baseboard_rev": Number(2),
"baseboard_serial_number": String("2J80WF9K\0\0\0"),
"dev_id": String("sp5-host-cpu"),
"ereport_message_version": Number(0),
"hubris_caboose": Object {
"board": String("cosmo-b"),
"commit": String("579b66a6f5c2a940073f4574228df91ca1f69306-dirty"),
"version": Null,
},
"hubris_task_gen": Number(0),
"hubris_task_name": String("cosmo_seq"),
"hubris_uptime_ms": Number(714333),
"k": String("hw.cpu.amd.smerr"),
"refdes": String("P0"),
"state": Object {
"cur": String("A0"),
"since": Number(617259),
},
"v": Number(0),
}
eliza@jeeves ~ $ pfexec humility hiffy -c Sequencer.get_state
humility: WARNING: archive in environment variable overriding archive in environment file
humility: attached to 1fc9:0143:ISS1EG0OBG2HT via CMSIS-DAP
Sequencer.get_state() => A2
eliza@jeeves ~ $ |
ba992d6 to
6717b05
Compare
This reverts commit 6717b05. I forgot that we actaully do need this.
|
@rmustacc I've requested your review on this mostly just for any feedback on the new ereport messages and their contents, since the new ereports are based on your suggestions from #2242. Hopefully, now that we have a shared library with most of the ereport type definitions, it should be a bit easier to review just the ereport messages, without having to wade through a whole bunch of implementation details moving Rust around. In particular, you can see the complete list of all the ereport classes that each sequencer task will emit here (for Gimlet): hubris/drv/gimlet-seq-server/src/main.rs Lines 1601 to 1606 in 092f494 and here (for Cosmo): hubris/drv/cosmo-seq-server/src/main.rs Lines 1215 to 1221 in 092f494 and you should be able to find most of these Rust types defined in this library, except for a few whose content is specific to the Cosmo or Gimlet sequencer: Please don't feel like you need to review anything beyond how we feel about the ereports, their contents, and the class hierarchy and naming --- thanks in advance for taking a look! |
Presently, the compute sled sequencer tasks (
cosmo_seqandgimlet_seq) duplicate a large number of type definitions for ereportmessages. This is due to our present practice of defining a task's
ereports as one big enum of all the ereports it may send, which is
almost always unique to that task. This is unfortunate, as it means that
we must duplicate these messages when we would like them to be
consistent between multiple tasks.
Furthermore, the sequencer tasks currently only emit ereports for a
small subset of events that we care about --- mostly Vcore
regulator PMBus alerts. There are several other events that should
definitely be reported. I had previously attempted to add some new
ereports to the sequencer tasks in #2242, but at the time, neither
@rmustacc nor I were particularly satisfied with the content and class
hierarchy of those ereports, and I was unhappy with the fact that these
ereports added additional duplicated code between the two sequencer
tasks.
This PR is a second attempt to add some of the sequencer ereports I
would like, using the new API added in #2397 to define those ereport
messages in a shared crate, rather than duplicating them between the two
sequencer tasks. In particular, I've done the following:
lib/ereportscrate that contains shared ereportmessage type definitions, using the
#[ereport(...)]attribute frommicrocbor: add
#[ereport(...)]attribute #2397 to define the ereports as separate types.gimlet_seqandcosmo_seqereports to movethe message types for PMBus alerts and BMR491 mitigation failures to
the
ereportscrate, and use them in both sequencer tasks.psc_seqtask to use the shared definition ofPmbusStatusregisters. I didn't change the actual ereport messagesthere, as they're pretty different from the CPU sequencer ones.
power state, so that this can be included in some of the ereports,
as @rmustacc suggested in this comment.
THERMTRIP_LpinSMERR_Lpin (on Cosmo, as I'm not sure ifthe Gimlet sequencer task can notice this)
coretypeandsp3r{x}/sp5r{x}pins)I did not add ereports for MAPO or NIC MAPO events, as per a
conversation on Matrix with @rmustacc, I think we will want to report
MAPOs by adding better monitoring of
POWER_GOODstatus from variouspower rails. Some of that is described in issues #2372, #2394, and
perhaps also #2398.
This does not actually close #2142, as that describes sequencing
failures due to timeouts waiting for
POWER_GOODon various rails,which I will want to implement subsequently.
Closes #2242.