
microcbor: add #[ereport(...)] attribute #2397

Merged
hawkw merged 4 commits into master from eliza/derive-ereport-for-real on Feb 26, 2026

hawkw commented Feb 23, 2026

This commit adds a new attribute to microcbor, intended to make
defining ereport types more convenient. Presently, we tend to define
ereports by using an enum to represent all possible ereport classes that
a task may report, with microcbor's #[cbor(variant_id = "...")]
attribute on the enum definition and #[cbor(rename = "...")] on the
variants. Then, we define a struct which contains the class enum along
with a version, and either an enum or generic field to represent the
ereport body. For an example of this usage, consider the cosmo_seq
task:

#[derive(microcbor::Encode)]
pub enum EreportClass {
    #[cbor(rename = "hw.pwr.pmbus.alert")]
    PmbusAlert,
    #[cbor(rename = "hw.pwr.bmr491.mitfail")]
    Bmr491MitigationFailure,
}

#[derive(microcbor::EncodeFields)]
pub(crate) enum EreportKind {
    Bmr491MitigationFailure {
        refdes: FixedStr<'static, { crate::i2c_config::MAX_COMPONENT_ID_LEN }>,
        failures: u32,
        last_cause: drv_i2c_devices::bmr491::MitigationFailureKind,
        succeeded: bool,
    },
    PmbusAlert {
        refdes: FixedStr<'static, { crate::i2c_config::MAX_COMPONENT_ID_LEN }>,
        rail: vcore::Rail,
        time: u64,
        pwr_good: Option<bool>,
        pmbus_status: PmbusStatus,
    },
}

This pattern has some disadvantages. Most significantly, it makes it
very difficult for multiple tasks to share the definitions of some
ereport types in a shared crate, which is useful in some situations. For
instance, as we add ereports for sequencer events (see #2242), we
would like to be able to share some ereport messages between the Cosmo
and Gimlet sequencer tasks (and perhaps also the Tofino and PSC
sequencers, in some cases).
While we could use #[derive(EncodeFields)] to define common types and
then embed them in an enum of "all ereport types in this task" in a task
crate, the definition of the ereport message's class and version would
be in the task rather than where the message is defined, meaning they
are duplicated. This would be sad: since the class is important to how
upstack software interprets the ereport, ensuring that both tasks emit
the same class and version fields is a big chunk of why we would even
want shared definitions.

Also, the enum-based approach has some other disadvantages. When we
define separate enums for the class and for message bodies, it is
possible to accidentally use the wrong class for a given message body
--- nothing ensures that these match. And, using enums for everything
means that the size of the message that has to be constructed on the
stack is the size of the largest variant, which makes stack usage
worse when a particular code path always reports a smaller variant.
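
Both of these problems can be reproduced in plain Rust, without microcbor at all. The following is a minimal sketch: `Class`, `Body`, `Ereport`, and `MitigationFailure` are hypothetical stand-ins invented for illustration, not the real `cosmo_seq` types, and the field layouts are made up.

```rust
use core::mem::size_of;

// Hypothetical stand-ins for a task's ereport definitions; these are not
// the real `cosmo_seq` types.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
pub enum Class {
    PmbusAlert,
    MitigationFailure,
}

#[allow(dead_code)]
pub enum Body {
    // Small payload.
    MitigationFailure { failures: u32 },
    // Large payload: this variant dominates the size of the whole enum.
    PmbusAlert { raw_status: [u8; 64], time: u64 },
}

pub struct Ereport {
    pub class: Class,
    pub version: u32,
    pub body: Body,
}

/// Compiles without complaint even though the class and the body disagree:
/// nothing in the type system ties a `Class` to its matching `Body`.
pub fn mismatched() -> Ereport {
    Ereport {
        class: Class::PmbusAlert,
        version: 0,
        body: Body::MitigationFailure { failures: 3 },
    }
}

/// The same small payload as its own top-level type.
pub struct MitigationFailure {
    pub failures: u32,
}

// An enum value is at least as large as its largest variant's payload...
const _: () =
    assert!(size_of::<Body>() >= size_of::<[u8; 64]>() + size_of::<u64>());
// ...whereas the standalone struct is only as large as its own fields.
const _: () = assert!(size_of::<MitigationFailure>() == size_of::<u32>());
```

A code path that only ever reports the small message still has to build a full `Body` on the stack in the enum-based design, while the standalone struct costs only its own fields.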

This branch introduces a new API for defining ereport types as a
struct for each individual class of ereport message. This is done
using a new attribute which can be added to types that
#[derive(microcbor_derive::Encode)]. The new attribute,
#[ereport(...)], takes class = "a string literal" and
version = <an int literal> arguments, and, if present, changes the
generated Encode implementation to output the "k" = <class> and
"v" = <version> pairs when encoding the type. The maximum CBOR length
value is also adjusted to include the length of the additional K/V
pairs. The usage of the new attribute is discussed in greater detail in
the RustDoc.
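
To give a rough sense of how many bytes the additional K/V pairs might add, here is a back-of-the-envelope sketch. It assumes the keys are the one-character text strings "k" and "v", that the class is encoded as a CBOR text string, and that the version is encoded as an unsigned integer in shortest form (per RFC 8949). The function names are hypothetical, and microcbor's actual calculation may differ, for example by reserving the worst case for the version or by accounting for a larger map header.

```rust
/// Bytes in a CBOR item head (the initial byte plus any following argument
/// bytes) for the argument value `n`, using RFC 8949's shortest form.
pub const fn head_len(n: u64) -> usize {
    if n < 24 {
        1
    } else if n <= 0xff {
        2
    } else if n <= 0xffff {
        3
    } else if n <= 0xffff_ffff {
        5
    } else {
        9
    }
}

/// Encoded length of a CBOR text string: the head, then the UTF-8 bytes.
pub const fn text_len(s: &str) -> usize {
    head_len(s.len() as u64) + s.len()
}

/// Extra bytes for the `"k" => class` and `"v" => version` pairs.
/// (Hypothetical helper; ignores any growth of the enclosing map's header.)
pub const fn ereport_kv_len(class: &str, version: u64) -> usize {
    text_len("k") + text_len(class) + text_len("v") + head_len(version)
}

// "hw.discovery.ae35.fault" is 23 bytes, so it needs a 1-byte head:
// 2 ("k") + 24 (class) + 2 ("v") + 1 (version 0) = 29 bytes.
const _: () = assert!(ereport_kv_len("hw.discovery.ae35.fault", 0) == 29);
```

Because everything here is a `const fn`, the overhead is known at compile time, which is what allows it to be folded into a maximum-length constant.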

Now, we can define individual ereport messages as their own top-level
Rust types, and those types will always be serialized with the correct
class and version values. Multiple tasks can share these types, and can
still use the automatic buffer size calculation by passing multiple
types to the microcbor::max_cbor_len_for! macro, which is how that
API was really intended to be used in the first place.
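
The idea of sizing one buffer from several independently defined types can be sketched as follows. This is an illustration only, not microcbor's implementation: the `MaxLen` trait, the `max_len_for!` macro, the two ereport types, and their lengths are all hypothetical.

```rust
/// Hypothetical stand-in for whatever trait exposes a type's worst-case
/// encoded size; this is not microcbor's actual trait.
pub trait MaxLen {
    const MAX_LEN: usize;
}

// Two made-up ereport types, which could live in different crates.
pub struct SmallEreport;
pub struct LargeEreport;

impl MaxLen for SmallEreport {
    const MAX_LEN: usize = 12;
}

impl MaxLen for LargeEreport {
    const MAX_LEN: usize = 80;
}

/// Evaluates, at compile time, to the largest `MAX_LEN` among the listed
/// types.
macro_rules! max_len_for {
    ($($t:ty),+ $(,)?) => {{
        let mut max = 0usize;
        $(
            if <$t as MaxLen>::MAX_LEN > max {
                max = <$t as MaxLen>::MAX_LEN;
            }
        )+
        max
    }};
}

/// One buffer size that fits any of the listed ereports.
pub const BUF_SIZE: usize = max_len_for![SmallEreport, LargeEreport];
const _: () = assert!(BUF_SIZE == 80);

/// A zeroed buffer large enough for any of the listed ereports.
pub fn new_buffer() -> [u8; BUF_SIZE] {
    [0u8; BUF_SIZE]
}
```

Since the macro only needs each type's associated constant, the types do not have to be variants of a single enum, or even be defined in the same crate.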

For example, we might imagine something like:

#[derive(Encode)]
#[ereport(class = "hw.discovery.ae35.fault", version = 0)]
struct Ae35UnitEreport {
    critical_in_hrs: u32,
    detected_by: fixedstr::FixedStr<'static, 8>,
}

#[derive(Encode)]
#[ereport(class = "hw.apollo.undervolt", version = 13)]
#[cbor(variant_id = "bus")]
enum UndervoltEreport {
    MainBusA { volts: f32 },
    MainBusB { volts: f32 }, // "Houston, we've got a main bus B undervolt!"
}

use some_other_crate_that_defines_ereports;

const EREPORT_BUF_SIZE: usize = microcbor::max_cbor_len_for![
    Ae35UnitEreport,
    UndervoltEreport,
    some_other_crate_that_defines_ereports::SomeOtherEreport,
];

and that will all just work.

As an aside, I did consider the fact that this could be an API to
add any arbitrary compile-time fields when encoding. I decided not to
do that, as the goal here was specifically to help with ereports, and I
felt like there was some value in having the attribute also enforce the
names and types of the conventional ereport fields. That way, you are
expressing the intent to say that "this is an ereport message", and the
proc-macro ensures you have included the requisite fields and that they
have the requisite types. We may consider adding a general-purpose
"additional fields with compile time values" attribute in the future if
such a thing seems useful, and if we do, the #[ereport(...)]
attribute could be reimplemented using that internally.

hawkw added the fault-management label Feb 23, 2026
hawkw requested a review from cbiffle February 23, 2026 21:11

hawkw commented Feb 23, 2026

This build failure occurred in the "fetch humility" step and is therefore presumably not my fault. As an aside, apparently there isn't a down arrow that lets me see the output from that step, which isn't something I've ever seen GitHub actions do before?

hawkw commented Feb 24, 2026

For a worked example of defining shared ereport types and using them in multiple crates, see this commit: 18cd847

///
/// ## Struct Type Definition Attributes
///
/// The following attributes are may be placed on the *definition* of a struct

Collaborator

Out of curiosity... You're emphasizing definition here, I assume this is to contrast with something... what?

(As a recovered C programmer I interpreted this as distinguishing from the declaration but we don't have those.)

Member Author

Ah, I meant to distinguish placing the attribute on the top-level struct definition from placing it on fields within that definition. So:

#[cbor(...)] // <-------- here
struct MyStruct {
    #[cbor(...)] // <---- not here
    foo: Bar,
}

Upon re-reading the comment I can see how that doesn't really come across as obviously as I had hoped.

Member Author

how do you feel about 92f9315 ?

Collaborator

Seems fine -- I'm not even sure this needed clarification, I was just curious what I was missing. Thanks!

hawkw requested a review from cbiffle February 26, 2026 22:41
hawkw merged commit 90a2bf0 into master Feb 26, 2026
174 checks passed
hawkw deleted the eliza/derive-ereport-for-real branch February 26, 2026 22:45