How does Rust's provenance model interact with ARM MTE's hardware provenance? #603

@ais523

There have been some discussions in this repository about how Rust's provenance model interacts with CHERI, a hardware platform that tracks provenance.

There's another hardware platform that has a level of provenance tracking: ARM MTE (there's more technical information here). This one's interesting in that it has already been deployed to consumers (on certain mobile phones running Android) and is thus in active use at the moment. Because many parts of Android are written in Rust, there's also a backwards-compatibility situation to think about.

Unlike CHERI, which aims to guarantee that it catches all the memory-safety issues that are in scope, ARM MTE aims to probabilistically catch memory safety errors (in order to, e.g., make them more obvious in testing). Its provenance model works like this:

  • Each memory address has a "tag" (in ARM MTE terminology) that's equivalent to a provenance in Rust terminology. Only pointers with the appropriate provenance are allowed to access the address.
  • A pointer contains 56 address bits, 4 provenance bits, and 4 bits that are ignored by the hardware. Unlike on CHERI, the provenance part of a pointer isn't guarded against being modified maliciously – you can read and write it with the normal instructions for reading and writing memory.
  • When performing memory access, the hardware acts as-if it checks the pointer's provenance bits against a 4-bit hash of the memory's provenance. (Internally, the hardware only stores the hashes – the machine-code instruction to give memory a new random provenance works by generating a random 4-bit number and acting as though it's the provenance's hash.) Because 0x0 and 0xF aren't used as hash values, there are 14 possible hashes, so if you access memory using the wrong provenance, there is a 13 in 14 chance that it will trap and a 1 in 14 chance that the mistake will be uncaught.
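The pointer layout and the as-if check described in the list above can be modelled with plain integers. This is only a sketch: the function names are made up for illustration, the tag is assumed to sit directly above the 56 address bits, and nothing here touches real MTE hardware.

```rust
/// Layout from the description above: 56 address bits, then 4 tag bits,
/// then 4 bits the hardware ignores. (Exact bit placement is an assumption.)
const ADDR_BITS: u32 = 56;
const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;
const TAG_MASK: u64 = 0xF;

/// Extract the 4-bit tag (the "provenance hash") from a raw pointer value.
fn tag_of(raw: u64) -> u8 {
    ((raw >> ADDR_BITS) & TAG_MASK) as u8
}

/// Extract the 56 address bits.
fn address_of(raw: u64) -> u64 {
    raw & ADDR_MASK
}

/// Build a raw pointer value from an address and a tag. No privileged
/// operation is needed: the tag is ordinary data in the pointer.
fn with_tag(address: u64, tag: u8) -> u64 {
    (address & ADDR_MASK) | ((u64::from(tag) & TAG_MASK) << ADDR_BITS)
}

/// Model of the check: the access is allowed only if the pointer's tag
/// equals the tag stored for the memory being accessed.
fn access_allowed(raw: u64, memory_tag: u8) -> bool {
    tag_of(raw) == memory_tag
}

fn main() {
    let p = with_tag(0x1000, 0x4);
    assert_eq!(tag_of(p), 0x4);
    assert_eq!(address_of(p), 0x1000);
    assert!(access_allowed(p, 0x4));
    assert!(!access_allowed(p, 0x9));

    // Of the 14 tag values assumed to be in use (0x1..=0xE), exactly one
    // passes the check, which is where the 1-in-14 figure comes from.
    let passing = (0x1u8..=0xE)
        .filter(|&t| access_allowed(with_tag(0x1000, t), 0x4))
        .count();
    assert_eq!(passing, 1);
}
```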

For backwards compatibility with existing programs, ARM MTE is by default configured to pretend that the provenance bits are actually address bits (and although I don't know for certain, I think this is how Rust on ARM MTE targets currently views them). This works for heap memory because new heap provenances are created by the memory allocator: if the allocator allocates some memory and discovers that (e.g.) its provenance hashes to 4, it can pretend that the memory is actually stored at an address which naturally has a 4 in the relevant part of the pointer, so programs don't notice that anything has changed. This view does, however, mean that "provenance narrowing" on a reborrow is not visible to the hardware. A program written under this view will therefore never trap if a pointer/reference that's only supposed to be able to access one part of an allocation is used to access a different part of the same allocation (meaning that the hardware is not, even probabilistically, enforcing the entire memory model). Under this view, everything except the heap is given a single provenance, so only mistakes involving heap memory are caught.
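To make the "tag bits as address bits" view concrete, here is a small model of an allocator that tags memory and hands out pointers carrying the tag. `TaggedMemory`, `allocate` and `check` are hypothetical names, the tags are chosen by hand rather than randomly, and the 16-byte granule size is how MTE groups memory for tagging. The point is that an access outside a reborrowed sub-range but inside the same allocation passes the check, whereas an access that strays into a differently tagged allocation does not.

```rust
const GRANULE: u64 = 16; // MTE stores one tag per 16-byte granule
const ADDR_BITS: u32 = 56;

/// Hypothetical model of tagged heap memory: one tag per granule.
struct TaggedMemory {
    tags: Vec<u8>,
}

impl TaggedMemory {
    fn new(granules: usize) -> Self {
        Self { tags: vec![0; granules] }
    }

    /// Tag the range and return a pointer whose upper bits carry the tag,
    /// as if the memory "naturally" lived at that address.
    fn allocate(&mut self, start: u64, len: u64, tag: u8) -> u64 {
        let first = start / GRANULE;
        let end = (start + len + GRANULE - 1) / GRANULE;
        for g in first..end {
            self.tags[g as usize] = tag;
        }
        start | (u64::from(tag) << ADDR_BITS)
    }

    /// Model of the hardware check for a single access.
    fn check(&self, raw: u64) -> bool {
        let address = raw & ((1 << ADDR_BITS) - 1);
        let tag = ((raw >> ADDR_BITS) & 0xF) as u8;
        self.tags[(address / GRANULE) as usize] == tag
    }
}

fn main() {
    let mut mem = TaggedMemory::new(8);
    let a = mem.allocate(0, 64, 0x4);
    let b = mem.allocate(64, 64, 0x9);

    // Pretend `sub` is a reborrow of only the first 16 bytes of `a`.
    // It has the same tag as `a`, so the narrowing is invisible here.
    let sub = a;
    assert!(mem.check(sub + 32)); // outside the reborrow, inside `a`: no trap
    assert!(!mem.check(a + 64)); // runs into `b`'s memory: would trap
    assert!(mem.check(b));
}
```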

The ARM MTE documentation also suggests that compilers might want to use ARM MTE tags for stack memory, by giving a different tag for each variable on the stack (i.e. effectively meaning that each stack allocation has its own provenance). Unlike the previous case, this would be visible to existing programs, in that if the provenance bits are viewed as address bits, stack frames would no longer be contiguous in memory (e.g. if stack frame 1 contains variables a and b, and stack frame 2 contains a variable c, it's possible for c's address to be between that of a and b). In Rust's abstract machine, this difference is observable, but I don't think it's currently relevant – I don't think there's currently a guarantee that stack frames are contiguous, and the vast majority of programs don't care. (Programs/libraries that do stack unwinding might care, and would probably need special code for the platform.)
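The a/b/c example can be checked with a little arithmetic. The addresses and tags below are invented for illustration, and the tag is assumed to occupy the bits directly above the 56 address bits. When the tag is treated as part of the address, c (in a different frame) sorts between a and b:

```rust
const ADDR_BITS: u32 = 56;

/// The pointer value a program sees if tag bits are treated as address bits.
fn as_address_view(address: u64, tag: u8) -> u64 {
    address | (u64::from(tag) << ADDR_BITS)
}

fn main() {
    // Frame 1 holds `a` and `b`; frame 2 (a deeper call) holds `c`.
    // Underlying stack addresses: c < a < b, as on a downward-growing stack.
    let a = as_address_view(0x7000_1000, 0x3);
    let b = as_address_view(0x7000_1008, 0x9);
    let c = as_address_view(0x7000_0ff0, 0x5);

    // With the (made-up) tags 3, 9 and 5, the apparent order is a < c < b,
    // so frame 1 no longer looks contiguous.
    assert!(a < c && c < b);
}
```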

I guess there are two main questions that this platform raises:

  1. Are we currently making (or might we want to make in the future) any guarantees that contradict the way that ARM MTE works?
  2. How should strict provenance APIs and exposed provenance APIs interpret the provenance bits? Both "interpret them as address bits" and "interpret them as provenance bits" seem like possible models (even exposed provenance can be implemented under the latter because you can ask the processor what the correct provenance for a piece of memory is). The former is likely to be more efficient, but has a number of conceptual downsides (although I'm not sure that any of them would cause problems in practice).
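As a rough illustration of the two models in question 2, here is how address extraction, a `with_addr`-style operation, and a `from_exposed_provenance`-style operation might differ. These are integer-only stand-ins with made-up names, not the real `std::ptr` APIs, and the `lookup_tag` closure is a placeholder for asking the processor which tag a piece of memory currently has.

```rust
const ADDR_BITS: u32 = 56;
const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;
const TAG_IN_PLACE_MASK: u64 = 0xF << ADDR_BITS;

/// Model 1: tag bits are address bits, so the "address" includes them.
fn addr_tag_as_address(raw: u64) -> u64 {
    raw & (ADDR_MASK | TAG_IN_PLACE_MASK)
}

/// Model 2: tag bits are provenance, so the address is the low 56 bits only.
fn addr_tag_as_provenance(raw: u64) -> u64 {
    raw & ADDR_MASK
}

/// Model 2 analogue of `with_addr`: take a new address but keep the
/// original pointer's tag, since the tag is part of its provenance.
fn with_addr_tag_as_provenance(raw: u64, new_addr: u64) -> u64 {
    (raw & !ADDR_MASK) | (new_addr & ADDR_MASK)
}

/// Model 2 analogue of `from_exposed_provenance`: the tag is not in the
/// integer, so it has to be recovered by asking the (modelled) hardware.
fn from_exposed_tag_as_provenance(addr: u64, lookup_tag: impl Fn(u64) -> u8) -> u64 {
    (addr & ADDR_MASK) | (u64::from(lookup_tag(addr) & 0xF) << ADDR_BITS)
}

fn main() {
    let raw: u64 = 0x0400_0000_0000_1000; // address 0x1000, tag 4

    assert_eq!(addr_tag_as_address(raw), 0x0400_0000_0000_1000);
    assert_eq!(addr_tag_as_provenance(raw), 0x1000);

    let moved = with_addr_tag_as_provenance(raw, 0x1010);
    assert_eq!(moved, 0x0400_0000_0000_1010);

    // Round trip through a bare address under model 2.
    let rebuilt = from_exposed_tag_as_provenance(0x1000, |_| 0x4);
    assert_eq!(rebuilt, raw);
}
```

Model 1 needs no extra work at all, which is presumably why it would be the more efficient choice; model 2 pays for a tag lookup whenever a pointer is rebuilt from a bare address.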

One thing that can probably be ruled out: retagging a reference in Rust effectively creates a new provenance, and in theory it would be nice to track that in the hardware in a BorrowSanitizer sort of way, but too many things go wrong when you try. The problem is that Rust has a stack of provenances with which you can access any memory address (Stacked Borrows is named after its own version of this, but a similar sort of stack turns up in every other provenance model I've experimented with and I think there are theoretical reasons for that), whereas ARM MTE only allows one hardware provenance for any piece of memory at a time. Conceptually, a &mut reference gives the memory a new provenance requirement as it's created and then returns to the original provenance requirement when it's dropped, but safe code allows leaking/forgetting a &mut reference and thus doing that would break safe code. Additionally, if the provenance of memory can change while it's allocated, from_exposed_provenance wouldn't be able to work out the correct provenance to use (but I think CHERI also has that problem).
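The leaking problem can be seen in a toy model. `Granule` and `Reborrow` are hypothetical types: `Reborrow` stands in for a `&mut` that retags memory on creation and restores the previous tag in `Drop`. Since `std::mem::forget` is safe, the restore step can be skipped, leaving the parent pointer with a tag that no longer matches.

```rust
use std::cell::Cell;

/// One piece of memory with the single tag the hardware allows it to have.
struct Granule {
    tag: Cell<u8>,
}

/// Hypothetical guard modelling a `&mut` reborrow: it gives the memory a
/// fresh tag and puts the previous tag back when dropped.
struct Reborrow<'a> {
    granule: &'a Granule,
    previous: u8,
}

impl<'a> Reborrow<'a> {
    fn new(granule: &'a Granule, fresh: u8) -> Self {
        let previous = granule.tag.replace(fresh);
        Reborrow { granule, previous }
    }
}

impl Drop for Reborrow<'_> {
    fn drop(&mut self) {
        self.granule.tag.set(self.previous);
    }
}

/// Would an access through the parent pointer (carrying `parent_tag`) pass?
fn parent_can_access(granule: &Granule, parent_tag: u8) -> bool {
    granule.tag.get() == parent_tag
}

fn main() {
    let g = Granule { tag: Cell::new(0x4) };

    {
        let _r = Reborrow::new(&g, 0x9);
        assert!(!parent_can_access(&g, 0x4)); // parent locked out during the reborrow
    }
    assert!(parent_can_access(&g, 0x4)); // tag restored on drop

    let leaked = Reborrow::new(&g, 0x9);
    std::mem::forget(leaked); // safe code: Drop never runs
    assert!(!parent_can_access(&g, 0x4)); // parent accesses would now trap
}
```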
