Skip to content

Reset more vcpu state on snapshot::restore#1120

Open
ludfjig wants to merge 1 commit intohyperlight-dev:mainfrom
ludfjig:reset_vcpu
Open

Reset more vcpu state on snapshot::restore#1120
ludfjig wants to merge 1 commit intohyperlight-dev:mainfrom
ludfjig:reset_vcpu

Conversation

@ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Dec 16, 2025

MSRs will be added in another PR.

  • Removed padding fields from fpu register to simplify comparisons. Not sure why I added them to begin with, we don't need C repr on these. It could possibly allow more efficient into() implementation for kvm/mshv due to single memcyp, but that seems like premature optimization to me
  • Reset vcpu state on snapshot::restore
  • Added bunch of tests
  • MSRs are not included in this PR. Will add MSRs in future PR

Addresses #791 partially

------ After rebase ------

  • Added field onto Snapshot which is SpecialRegisters. These are saved on snapshot() and restored when calling restore(). This is since we can no longer just reset them to default since the guest initializes them

@ludfjig ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Dec 16, 2025
@ludfjig ludfjig force-pushed the reset_vcpu branch 3 times, most recently from 9cfcc9c to 8507c7a Compare December 16, 2025 19:52
@ludfjig ludfjig force-pushed the reset_vcpu branch 11 times, most recently from d09b1fc to 18d4ff9 Compare December 17, 2025 06:46
@ludfjig ludfjig requested a review from Copilot December 17, 2025 06:53

This comment was marked as outdated.

@ludfjig ludfjig marked this pull request as ready for review December 17, 2025 07:26
@ludfjig ludfjig force-pushed the reset_vcpu branch 2 times, most recently from a5d8efa to 54bc12a Compare February 2, 2026 17:55
jsturtevant
jsturtevant previously approved these changes Feb 2, 2026
jsturtevant
jsturtevant previously approved these changes Feb 3, 2026
Copy link
Member

@syntactically syntactically left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me! It's probably a good bit of extra overhead, but perhaps that's unavoidable.

One alternate option would be to require every guest to have some code somewhere that does some/all of this reset work inside the VM, validating that the correct code is in the correct place whenever we make a snapshot. We have talked about doing something similar for TLB flushes as well. That would let you batch most of this into a single hypercall, although the hypercall itself would be a bit more expensive. Do you have any sense of whether that would make sense to investigate?

syntactically
syntactically previously approved these changes Feb 4, 2026
Copy link
Member

@syntactically syntactically left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed one other comment.

@ludfjig ludfjig dismissed stale reviews from syntactically and jsturtevant via 2b0b5f9 February 4, 2026 19:56
@ludfjig ludfjig force-pushed the reset_vcpu branch 7 times, most recently from e546c3d to 1092dd1 Compare February 6, 2026 18:42
@ludfjig ludfjig requested a review from Copilot February 6, 2026 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

Comment on lines +948 to +940
let _ = (cr3, sregs); // suppress unused warnings
// TODO: This is probably not correct.
// Let's deal with it when we clean up the init-paging feature
self.vm
.set_sregs(&CommonSpecialRegisters::standard_real_mode_defaults())?;
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset_vcpu() ignores the provided cr3/sregs when init-paging is disabled and unconditionally sets real-mode defaults (with an explicit TODO saying it's probably incorrect). This is a functional regression for non-init-paging builds: restoring a snapshot would reset the vCPU into a mode/state that may not match how the VM is actually configured. Consider either implementing the correct restore logic for the non-init-paging configuration (using the passed-in sregs/cr3), or gating the new reset behavior behind init-paging so behavior doesn’t silently change in that build mode.

Suggested change
let _ = (cr3, sregs); // suppress unused warnings
// TODO: This is probably not correct.
// Let's deal with it when we clean up the init-paging feature
self.vm
.set_sregs(&CommonSpecialRegisters::standard_real_mode_defaults())?;
// In non-init-paging builds, rely on the caller-provided snapshot
// state (including CR3) rather than forcing real-mode defaults.
self.vm.set_sregs(sregs)?;

Copilot uses AI. Check for mistakes.
Comment on lines +309 to +311
let sregs = snapshot.sregs().ok_or_else(|| {
HyperlightError::Error("snapshot from running sandbox should have sregs".to_string())
})?;
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The restore path returns HyperlightError::Error("snapshot from running sandbox should have sregs"...) when snapshot.sregs() is None. Using the generic stringly-typed error makes it hard for callers/tests to match on the failure mode and isn’t very actionable. Consider introducing a dedicated error variant (e.g. SnapshotMissingSregs) or reusing an existing structured error so this case can be handled explicitly.

Suggested change
let sregs = snapshot.sregs().ok_or_else(|| {
HyperlightError::Error("snapshot from running sandbox should have sregs".to_string())
})?;
let sregs = snapshot
.sregs()
.ok_or(HyperlightError::SnapshotMissingSregs)?;

Copilot uses AI. Check for mistakes.
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants