Skip to content

feat: switch attestation tpm measurements#28

Open
narasimhan321 wants to merge 1 commit into
mainfrom
nv/switch-attestation-measurements
Open

feat: switch attestation tpm measurements#28
narasimhan321 wants to merge 1 commit into
mainfrom
nv/switch-attestation-measurements

Conversation

@narasimhan321
Copy link
Copy Markdown
Collaborator

What this PR does:

  • proto changes to support tpm generation on per switch

Why this change is necessary:

  • Security ask for attestation of the switch TPM measurements
  • pre-requisite to schedule workload by csp.

What is the change:

  • part of grpc api switch_security_handlers.rs

How to test this change:

  • Unit tests
  • Refer Attachment

Relevant issues:

Reviewer Checklist:

  • Code is well-documented
  • Tests are adequate
  • Follows coding standards
  • Functionality works as expected

Comment thread proto/rack_manager.proto
//SwitchAttestationResult per switch Tray.
message SwitchAttestationResult {
Result result = 1;
string artifact_path = 2; // Shared path to accesss attestation measurements.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that RMS holds logic of placing the result to destination path. where the path is pointing to ?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this either. It would mean we would need something like object storage that both services can access as prerequisite?

I think the results should just be made available via gRPC. Either directly in this API, or via separate APIs if we think the results are bigger.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have discussed in the thread. switch has only self-signed certs currently and since TPM data is sensitive , is it okay to send over the wire ( in terms of payload they are much smaller 4K). So putting in the hostpath mount common between RMS and Carbide could be an option and artifact_path tries to achieve that but creates pod affinitiy dependecy on production sitecontroller (3 node k8s)

If sending over the wire is the right way then will change this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hostpath is absolutely no. We should avoid this as much as possible.

gRPC max msg legth is 4MB. can you paste sample data?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TPM is highly sensitive. shared over DM

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the components have TLS and auth. RPC is fine. All the actual device credentials are also in the RPC Messages, and they are more critical

Comment thread proto/rack_manager.proto
//SwitchAttestationResult per switch Tray.
message SwitchAttestationResult {
Result result = 1;
string artifact_path = 2; // Shared path to accesss attestation measurements.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this either. It would mean we would need something like object storage that both services can access as prerequisite?

I think the results should just be made available via gRPC. Either directly in this API, or via separate APIs if we think the results are bigger.

Comment thread proto/rack_manager.proto
* GetSwitchAttestation generates and copies TPM on a caller-supplied list of
* switch devices. Credentials to access the switches are passed from the client.
*/
rpc GetSwitchAttestation(GetSwitchAttestationRequest) returns (GetSwitchAttestationResponse) {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name sounds a bit off. The operation we are doing is "Attestation". But what we fetch as part of this operation are "Measurements". So its's probably more like GetSwitchAttestationMeasurements or PerformSwitchAttestation - but I think the first is better since RMS doesn't seem to do any attestation work besides forwarding measurements.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on GetSwitchAttestationMeasurements since its really not performing attestation or measurements - its simply retrieving the measurements from the switch

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to align it within CRUD naming but Perform is fine as well. will make the change.

Comment thread proto/rack_manager.proto
//GetSwitchAttestationRequest generates and gets the Switch TPM quotes.json on a set of switch devices.
message GetSwitchAttestationRequest {
Metadata metadata = 1;
NodeSet nodes = 2; // Switch devices to get attestation measurements
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the size of the measurements (I don't know how many bytes these are) we might or might not support the operation for multiple nodes.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API should support multiple nodes. Let it be upto NICo to figure out if it wants to do single node or multiple nodes at a time.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The api supports multiple nodes. IIRC dmitri mentioned they need TPM Attestation results per switch tray. we did generated tpm on a sample switch and its of 4K
4.0K /host/tpm/quotes.json

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. With 4k we should be fine getting measurements for a full rack in a single API call (9 switches x 4k = 36k).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants