feat: switch attestation tpm measurements #28
//SwitchAttestationResult per switch Tray.
message SwitchAttestationResult {
  Result result = 1;
  string artifact_path = 2; // Shared path to access attestation measurements.
This means that RMS holds the logic for placing the result at the destination path. Where does the path point to?
I don't understand this either. It would mean we need something like object storage that both services can access as a prerequisite?
I think the results should just be made available via gRPC. Either directly in this API, or via separate APIs if we think the results are bigger.
I should have discussed this in the thread. The switch currently has only self-signed certs, and since TPM data is sensitive, is it okay to send it over the wire? (In terms of payload the measurements are small, about 4K.) Putting them in a hostPath mount shared between RMS and Carbide could be an option, and artifact_path tries to achieve that, but it creates a pod affinity dependency on the production sitecontroller (3-node k8s).
If sending over the wire is the right way, I will change this.
hostPath is an absolute no. We should avoid it as much as possible.
The gRPC max message length is 4MB by default. Can you paste sample data?
TPM data is highly sensitive. Shared over DM.
The components have TLS and auth, so RPC is fine. All the actual device credentials are also in the RPC messages, and they are more critical.
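For reference, a minimal sketch of what returning the measurements directly over gRPC could look like, instead of a shared path. This is only an illustration of the suggestion in this thread: the `measurements` field name, its `bytes` type, and the `rms.sketch` package are assumptions, and `Result` is stubbed because its definition is not part of this diff.

```proto
syntax = "proto3";

package rms.sketch; // hypothetical package name, for illustration only

// Stub for the existing Result type; the real definition is not shown in this diff.
message Result {}

// SwitchAttestationResult per switch tray.
message SwitchAttestationResult {
  Result result = 1;
  // Hypothetical replacement for artifact_path: the raw contents of the
  // switch's quotes.json, returned inline in the response.
  bytes measurements = 2;
}
```

With the payload inline, neither service needs a shared volume, so the hostPath mount and the pod affinity constraint it implies would no longer be required.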
 * GetSwitchAttestation generates and copies TPM on a caller-supplied list of
 * switch devices. Credentials to access the switches are passed from the client.
 */
rpc GetSwitchAttestation(GetSwitchAttestationRequest) returns (GetSwitchAttestationResponse) {}
The name sounds a bit off. The operation we are doing is "Attestation", but what we fetch as part of this operation are "Measurements". So it's probably more like GetSwitchAttestationMeasurements or PerformSwitchAttestation. I think the first is better, since RMS doesn't seem to do any attestation work besides forwarding measurements.
+1 on GetSwitchAttestationMeasurements, since it's really not performing attestation or measurements. It's simply retrieving the measurements from the switch.
I wanted to align it with CRUD naming, but Perform is fine as well. Will make the change.
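As a rough sketch of the rename the reviewers prefer, the RPC declaration might look like the following. Only the RPC name `GetSwitchAttestationMeasurements` comes from this thread; renaming the request and response messages to match, the service name, and the package are assumptions, and the messages are left empty here because their fields are discussed separately.

```proto
syntax = "proto3";

package rms.sketch; // hypothetical package name, for illustration only

// Empty stubs; assumed to be renamed alongside the RPC.
message GetSwitchAttestationMeasurementsRequest {}
message GetSwitchAttestationMeasurementsResponse {}

// Hypothetical service name; the real service is not shown in this diff.
service SwitchAttestationSketch {
  // Retrieves TPM measurements from a caller-supplied list of switch devices.
  // RMS only forwards the measurements; it does not perform attestation itself.
  rpc GetSwitchAttestationMeasurements(GetSwitchAttestationMeasurementsRequest)
      returns (GetSwitchAttestationMeasurementsResponse) {}
}
```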
//GetSwitchAttestationRequest generates and gets the Switch TPM quotes.json on a set of switch devices.
message GetSwitchAttestationRequest {
  Metadata metadata = 1;
  NodeSet nodes = 2; // Switch devices to get attestation measurements
Depending on the size of the measurements (I don't know how many bytes these are) we might or might not support the operation for multiple nodes.
I think the API should support multiple nodes. Leave it up to NICo to decide whether it wants to request a single node or multiple nodes at a time.
The API supports multiple nodes. IIRC Dmitri mentioned they need TPM attestation results per switch tray. We generated the TPM quotes on a sample switch and the file is about 4K:
4.0K /host/tpm/quotes.json
Ok. With 4k we should be fine getting measurements for a full rack in a single API call (9 switches x 4k = 36k).
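A minimal sketch of a response shape that carries one result per switch tray in a single call. The `results` field and the `rms.sketch` package are assumptions, since the response message is not shown in this diff; `SwitchAttestationResult` is stubbed for the same reason.

```proto
syntax = "proto3";

package rms.sketch; // hypothetical package name, for illustration only

// Stub for the per-tray result message defined elsewhere in this PR.
message SwitchAttestationResult {}

message GetSwitchAttestationResponse {
  // Hypothetical field: one entry per requested switch tray.
  // At roughly 4K per switch, 9 switches come to about 36K,
  // well below gRPC's default 4MB message limit.
  repeated SwitchAttestationResult results = 1;
}
```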