Reset more vcpu state on snapshot::restore #1120
ludfjig wants to merge 1 commit into hyperlight-dev:main
This looks pretty good to me! It's probably a good bit of extra overhead, but perhaps that's unavoidable.
One alternate option would be to require every guest to have some code somewhere that does some/all of this reset work inside the VM, validating that the correct code is in the correct place whenever we make a snapshot. We have talked about doing something similar for TLB flushes as well. That would let you batch most of this into a single hypercall, although the hypercall itself would be a bit more expensive. Do you have any sense of whether that would make sense to investigate?
    let _ = (cr3, sregs); // suppress unused warnings
    // TODO: This is probably not correct.
    // Let's deal with it when we clean up the init-paging feature
    self.vm
        .set_sregs(&CommonSpecialRegisters::standard_real_mode_defaults())?;
reset_vcpu() ignores the provided cr3/sregs when init-paging is disabled and unconditionally sets real-mode defaults (with an explicit TODO saying it's probably incorrect). This is a functional regression for non-init-paging builds: restoring a snapshot would reset the vCPU into a mode/state that may not match how the VM is actually configured. Consider either implementing the correct restore logic for the non-init-paging configuration (using the passed-in sregs/cr3), or gating the new reset behavior behind init-paging so behavior doesn’t silently change in that build mode.
Suggested change:
-   let _ = (cr3, sregs); // suppress unused warnings
-   // TODO: This is probably not correct.
-   // Let's deal with it when we clean up the init-paging feature
-   self.vm
-       .set_sregs(&CommonSpecialRegisters::standard_real_mode_defaults())?;
+   // In non-init-paging builds, rely on the caller-provided snapshot
+   // state (including CR3) rather than forcing real-mode defaults.
+   self.vm.set_sregs(sregs)?;
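To make the two behaviors discussed here concrete, the following is a minimal, self-contained sketch and not the PR's actual code. `MockVm`, the simplified `CommonSpecialRegisters` fields, and the runtime `init_paging` flag (standing in for the `init-paging` Cargo feature) are all hypothetical. The assumption that the init-paging path overrides CR3 with the passed-in value is also a guess made for illustration.

```rust
/// Hypothetical, heavily simplified special-register set (assumption: the
/// real type in the codebase has many more fields).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CommonSpecialRegisters {
    pub cr0: u64,
    pub cr3: u64,
    pub cr4: u64,
    pub efer: u64,
}

/// Hypothetical mock of a VM handle that only records the last sregs written.
#[derive(Debug, Default)]
pub struct MockVm {
    pub sregs: Option<CommonSpecialRegisters>,
}

impl MockVm {
    /// Stores a copy of the registers, mimicking a `set_sregs` call.
    pub fn set_sregs(&mut self, sregs: &CommonSpecialRegisters) -> Result<(), String> {
        self.sregs = Some(sregs.clone());
        Ok(())
    }
}

/// Sketch of a reset that never discards the caller-provided state.
/// `init_paging` is a runtime flag here purely so both paths can be
/// exercised in one build; the real code gates on a Cargo feature.
pub fn reset_vcpu(
    vm: &mut MockVm,
    cr3: u64,
    sregs: &CommonSpecialRegisters,
    init_paging: bool,
) -> Result<(), String> {
    if init_paging {
        // Assumption: with init-paging, the snapshot's sregs are restored
        // with CR3 replaced by the value the caller passes in.
        let mut restored = sregs.clone();
        restored.cr3 = cr3;
        vm.set_sregs(&restored)
    } else {
        // Without init-paging, restore the snapshot state unchanged
        // instead of forcing real-mode defaults.
        vm.set_sregs(sregs)
    }
}
```

In this sketch neither branch ignores its inputs, so the `let _ = (cr3, sregs)` workaround has no equivalent in the init-paging branch. In a feature-gated build, `cr3` would still be unused in the non-init-paging branch.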
    let sregs = snapshot.sregs().ok_or_else(|| {
        HyperlightError::Error("snapshot from running sandbox should have sregs".to_string())
    })?;
The restore path returns HyperlightError::Error("snapshot from running sandbox should have sregs"...) when snapshot.sregs() is None. Using the generic stringly-typed error makes it hard for callers/tests to match on the failure mode and isn’t very actionable. Consider introducing a dedicated error variant (e.g. SnapshotMissingSregs) or reusing an existing structured error so this case can be handled explicitly.
Suggested change:
-   let sregs = snapshot.sregs().ok_or_else(|| {
-       HyperlightError::Error("snapshot from running sandbox should have sregs".to_string())
-   })?;
+   let sregs = snapshot
+       .sregs()
+       .ok_or(HyperlightError::SnapshotMissingSregs)?;
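As a rough illustration of why a dedicated variant helps callers and tests, here is a minimal sketch under stated assumptions. The `HyperlightError` enum below is a hypothetical stand-in with only two variants, `SnapshotMissingSregs` is the variant name proposed in this comment (it is not confirmed to exist in the crate), and `Snapshot`, `CommonSpecialRegisters`, and `sregs_for_restore` are simplified placeholders.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's error type; only what the sketch needs.
#[derive(Debug, PartialEq)]
pub enum HyperlightError {
    /// Generic stringly-typed error, as used in the code under review.
    Error(String),
    /// Proposed variant: the snapshot carries no special registers.
    SnapshotMissingSregs,
}

impl fmt::Display for HyperlightError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HyperlightError::Error(msg) => write!(f, "{msg}"),
            HyperlightError::SnapshotMissingSregs => {
                write!(f, "snapshot from running sandbox should have sregs")
            }
        }
    }
}

impl std::error::Error for HyperlightError {}

/// Hypothetical, simplified special-register set.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CommonSpecialRegisters {
    pub cr0: u64,
    pub cr3: u64,
}

/// Hypothetical snapshot that may or may not have captured sregs.
pub struct Snapshot {
    sregs: Option<CommonSpecialRegisters>,
}

impl Snapshot {
    pub fn new(sregs: Option<CommonSpecialRegisters>) -> Self {
        Self { sregs }
    }

    pub fn sregs(&self) -> Option<&CommonSpecialRegisters> {
        self.sregs.as_ref()
    }
}

/// Returns the saved sregs, or the structured error when they are absent.
pub fn sregs_for_restore(
    snapshot: &Snapshot,
) -> Result<&CommonSpecialRegisters, HyperlightError> {
    snapshot.sregs().ok_or(HyperlightError::SnapshotMissingSregs)
}
```

With this shape, a test can assert on `Err(HyperlightError::SnapshotMissingSregs)` directly instead of comparing message strings, and the human-readable text lives in one place in the `Display` implementation.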
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
MSRs will be added in another PR.
into() implementation for kvm/mshv due to single memcpy, but that seems like premature optimization to me
Addresses #791 partially
------ After rebase ------