Skip to content

runtime-rs: Port MSFT changes from msft-main in runtime-go#429

Open
Redent0r wants to merge 9 commits intomsft-previewfrom
saul/runtime-rs
Open

runtime-rs: Port MSFT changes from msft-main in runtime-go#429
Redent0r wants to merge 9 commits intomsft-previewfrom
saul/runtime-rs

Conversation

@Redent0r
Copy link
Copy Markdown

@Redent0r Redent0r commented Mar 19, 2026

  • Build and install runtime-rs
  • Port over fork specific changes from runtime-go to runtime-rs
  • Add unit test coverage for static resource management changes
  • Temporarily make runtime-rs the default runtime for testing (to remove before merging)

[outdated]
CI: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1077778&view=results
bb: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1077783&view=results
image build: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1077801&view=results
test: https://dev.azure.com/mariner-org/mariner/_build/results?buildId=1077871&view=results

@Redent0r
Copy link
Copy Markdown
Author

Redent0r commented Mar 19, 2026

todo: changes we'll need to bring into runtime-rs (or verify they don't apply) from runtime-go to be on par:

@Redent0r Redent0r force-pushed the saul/runtime-rs branch 8 times, most recently from 5792f42 to 7e70f3f Compare March 25, 2026 15:53
@Redent0r Redent0r changed the title node-builder: switch to runtime-rs node-builder: also build runtime-rs Mar 25, 2026
@Redent0r Redent0r changed the title node-builder: also build runtime-rs node-builder: also build and install runtime-rs Mar 25, 2026
@Redent0r
Copy link
Copy Markdown
Author

Redent0r commented Mar 25, 2026

we'll also need changes to the kata-ci pipeline to also vendor runtime-rs dependencies https://dev.azure.com/mariner-org/mariner/_git/CBL-Mariner-Pipelines/pullrequest/26534

@Redent0r Redent0r force-pushed the saul/runtime-rs branch 4 times, most recently from 23dd506 to 84dc50c Compare March 25, 2026 22:56
@sprt sprt changed the title node-builder: also build and install runtime-rs runtime-rs: Port MSFT changes from msft-main in runtime-go Mar 27, 2026
build runtime-rs binary and debug config

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Install both runtime-rs and runtime-go configs and binaries side by side:
  - runtime-go:
      /usr/local/bin/containerd-shim-kata-v2-go
      /usr/local/share/defaults/kata-containers/configuration-clh.toml
      /usr/local/share/defaults/kata-containers/configuration-clh-debug.toml

  - runtime-rs:
      /usr/local/bin/containerd-shim-kata-v2-rs
      /usr/local/share/defaults/kata-containers/configuration-cloud-hypervisor.toml
      /usr/local/share/defaults/kata-containers/configuration-cloud-hypervisor-debug.toml

Also add USE_RUNTIME_RS variable and default to "no". This controls which runtime binary and configuration will be installed
to /usr/local/bin/containerd-shim-kata-v2 and /usr/local/share/defaults/kata-containers/configuration.toml respectively.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Copilot AI review requested due to automatic review settings March 30, 2026 15:42
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports Microsoft-specific behavior from runtime-go into runtime-rs, and updates the Azure Linux node-builder packaging flow to build/install runtime-go and runtime-rs artifacts side-by-side with a selectable default.

Changes:

  • Build and install runtime-go + runtime-rs shim binaries/configs side-by-side, selecting the default via USE_RUNTIME_RS.
  • Add static sandbox resource management defaults (workload mem/vCPU) and a minimum pod memory enforcement knob to runtime-rs.
  • Relax kata-types Cloud Hypervisor memory defaults/validation to allow “0 base” sizing flows.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tools/osbuilder/node-builder/azure-linux/package_install.sh Installs runtime-go/runtime-rs shims and configs side-by-side; selects a default shim/config.
tools/osbuilder/node-builder/azure-linux/package_build.sh Builds both runtime implementations and generates debug configs for both.
tools/osbuilder/node-builder/azure-linux/common.sh Adds per-runtime config filenames/dirs and selects default config name based on USE_RUNTIME_RS.
tools/osbuilder/node-builder/azure-linux/Makefile Sets USE_RUNTIME_RS=yes by default for Azure Linux builds.
src/runtime-rs/crates/runtimes/src/manager.rs Adds minimum workload memory enforcement during sandbox init.
src/runtime-rs/crates/resource/src/network/network_info/network_info_from_link.rs Filters neighbor entries, allowing static + default-gateway neighbors.
src/runtime-rs/crates/resource/src/cpu_mem/initial_size.rs Applies static-resource defaults and exposes workload memory getter.
src/runtime-rs/config/configuration-cloud-hypervisor.toml.in Adds new static-resource default knobs + min workload memory setting.
src/runtime-rs/Makefile Adds/exports new config substitution variables; makes DEFVCPUS/DEFMEMSZ overridable.
src/libs/kata-types/src/config/runtime.rs Adds runtime config fields for static defaults + min workload memory.
src/libs/kata-types/src/config/hypervisor/mod.rs Removes validation requiring non-zero default_memory.
src/libs/kata-types/src/config/hypervisor/ch.rs Stops force-setting CH default_memory when unset.
src/libs/kata-types/src/config/default.rs Updates default CPU/memory constraints (e.g., allow 0 min for CH memory).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings March 30, 2026 16:01
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

hv.memory_info.default_memory += self.resource.mem_mb;
}

if self.resource.vcpu > 0.0 {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block will probably generate discussion. This commit is trying to port over kata-containers@9af9844 . After doing that, I realized that, unlike the memory, the static CPU set in the config wasn't getting past to the guest VM as expected.

It looks like runtime-rs is missing https://github.com/fidencio/kata-containers/blob/e2476f587c472d5d217df9c75cdb80193dd85994/src/runtime/pkg/oci/utils.go#L1232 and we may be able to upstream that

This is a port from b03db3e into runtime-rs

Bug: https://microsoft.visualstudio.com/OS/_workitems/edit/43668151

Rationale: This is a temporary solution for optimizing memory usage for
the current mechanism of requesting resources through pod Limit
annotations:
- if no Limits are specified and hence WorkloadMemMB is 0, set a default
  value 'StaticWorkloadDefaultMem' to allocate a default amount of
  memory for use for containers in the sandbox in addition to the base
  memory
- if Limits are specified, the base memory and the sum of Limits are
  allocated. The end user needs to be aware of the minimum memory
  requirements for their pods, otherwise the pod will be stuck in the
  ContainerCreating state

Testing: Manual testing, creating pods with Limits and without limits,
and with two containers where each container has a limit, tested with
integration in a SPEC file where the config variables were set via
environment variables via the make command

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This is a port from 7ddec33 into runtime-rs

After these changes:

1. The value of the K8s runtime class memory overhead:
   - Covers the memory usage from all the Host-side components (mainly
     the Kata Shim and the VMM).
   - Doesn't include the memory usage from any Guest-side components.

2. The value of a pod memory limit specified by the user:
   - Is equal to the memory size of the Pod VM.
   - Includes the memory usage from all the Guest-side components
     (mainly user's workload, the Guest kernel, and the Kata Agent)
   - Doesn't include the memory usage from any Host-side components.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This is a port from kata-containers@9af9844

Plus ports an existing behaviour from runtime-go to also add the vcpus https://github.com/fidencio/kata-containers/blob/e2476f587c472d5d217df9c75cdb80193dd85994/src/runtime/pkg/oci/utils.go#L1232

- similar to the static_sandbox_default_workload_mem option,
  assign a default number of vcpus to the VM when no limits
  are given, 1 vcpu in this case
- similar to commit c7b8ee9, do not allocate additional vcpus when limits are provided

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
If using static management and:
- initial size manager uses 0 for CPU or memory, we add default static values to the hv config
- if initial size manager uses non-zero values for CPU or memory, we add those values to the hv config

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Copilot AI review requested due to automatic review settings March 30, 2026 22:04
@Redent0r Redent0r requested a review from sprt March 30, 2026 22:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

This is a port from c06b470 into runtime-rs

For our Kata UVM, we know we need at least 128MB of memory to prevent instability in the guest.

Enforce this constraint with a descriptive error to prevent users from destabilizing the UVM with faulty k8s configurations.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This is a port from a136359 into runtime-rs

This change mirrors host networking into the guest as before, but now also
includes the default gateway neighbor entry for each interface.

Pods using overlay/synthetic gateways (e.g., 169.254.1.1) can hit a
first-connect race while the guest performs the initial ARP. Preseeding the
gateway neighbor removes that latency and makes early connections (e.g.,
to the API Service) deterministic.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
so we can test end to end. To remove this commit before merging.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
@Redent0r Redent0r requested a review from danmihai1 March 30, 2026 22:17
@Redent0r Redent0r marked this pull request as ready for review March 30, 2026 22:18
Copilot AI review requested due to automatic review settings March 30, 2026 22:18
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/libs/kata-types/src/config/hypervisor/mod.rs:1045

  • MemoryInfo::validate() no longer rejects default_memory == 0. Since runtime-rs’ Cloud Hypervisor conversion errors on default_memory == 0 (see src/runtime-rs/crates/hypervisor/ch-config/src/convert.rs:236-238), this means invalid configs can now pass validation and only fail later at VM creation time. If 0 is meant to be a supported “auto-size” value, consider updating the CH conversion to handle it (or gate this relaxation to non-CH hypervisors); otherwise, restoring the default_memory != 0 validation would keep failures earlier and clearer.
    /// This ensures that critical memory parameters like `memory_slots` are
    /// non-zero, and checks the validity of
    /// the memory backend file path.
    pub fn validate(&self) -> Result<()> {
        validate_path!(
            self.file_mem_backend,
            "Memory backend file {} is invalid: {}"
        )?;
        if self.memory_slots == 0 {
            return Err(std::io::Error::other(
                "Configured memory slots for guest VM are zero",
            ));

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pub const MAX_CH_PCI_BRIDGES: u32 = 5;
pub const MAX_CH_VCPUS: u32 = 256;
pub const MIN_CH_MEMORY_SIZE_MB: u32 = 64;
pub const MIN_CH_MEMORY_SIZE_MB: u32 = 0;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting MIN_CH_MEMORY_SIZE_MB to 0 effectively disables the Cloud Hypervisor “minimum guest memory” validation (since default_memory < MIN_CH_MEMORY_SIZE_MB can never be true), allowing configs with default_memory == 0 to pass plugin validation. runtime-rs’ Cloud Hypervisor conversion rejects default_memory == 0 (MemoryConfigError::NoDefaultMemory in src/runtime-rs/crates/hypervisor/ch-config/src/convert.rs), so this can push invalid configs deeper into runtime. Consider keeping a non-zero minimum here, or ensure an earlier stage always sets default_memory to a valid value before CH conversion.

Suggested change
pub const MIN_CH_MEMORY_SIZE_MB: u32 = 0;
pub const MIN_CH_MEMORY_SIZE_MB: u32 = 128;

Copilot uses AI. Check for mistakes.
@@ -79,9 +79,6 @@ impl ConfigPlugin for CloudHypervisorConfig {
ch.machine_info.entropy_source = default::DEFAULT_CH_ENTROPY_SOURCE.to_string();
}

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloudHypervisorConfig::adjust_config() no longer fills in a default when memory_info.default_memory == 0. Combined with the relaxed MemoryInfo::validate() and MIN_CH_MEMORY_SIZE_MB == 0, it becomes easy for CH configs to flow through with default_memory == 0 and then fail later during runtime-rs CH config conversion (MemoryConfigError::NoDefaultMemory). If 0 is intended to mean “defer memory sizing to InitialSizeManager”, consider adding an explicit comment + ensuring the sizing path always runs before any CH conversion; otherwise, restoring the defaulting here would prevent runtime failures for configs that omit default_memory.

Suggested change
// Fill in a default if not specified, to avoid propagating an unusable 0 value.
if ch.memory_info.default_memory == 0 {
ch.memory_info.default_memory = MIN_CH_MEMORY_SIZE_MB;
}

Copilot uses AI. Check for mistakes.
BUILD_TYPE := release

export SHIM_REDEPLOY_CONFIG := yes
export USE_RUNTIME_RS := yes
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Makefile forces USE_RUNTIME_RS := yes, making runtime-rs the default for any builds that include this file. The PR description calls this out as temporary “for testing (to remove before merging)”, but it’s easy to forget and accidentally ship this as the new default. Consider removing this export before merging, or gating it behind an explicit make target / environment override so default builds remain unchanged.

Suggested change
export USE_RUNTIME_RS := yes

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants