
Conversation

@PhilippMatthes (Member) commented Dec 29, 2025

In a companion pull request we implemented a Cortex filtering pipeline for KVM. That pipeline uses the Hypervisor CRD as the single source of truth to determine on which hypervisors a VM can be scheduled. To complete that implementation, we extended the Hypervisor CRD in a separate pull request, adding new fields and removing outdated ones. The new fields need to be autodiscovered by the KVM node agent, which this pull request implements. The following fields are now populated:

Support filtering based on hypervisor type and other capabilities:

  • Export the hypervisor type, architecture, supported devices, supported CPU modes, and supported features

Capacity filtering:

  • Aggregate the allocated and total available capacity and populate the corresponding fields

(Bonus)

  • Add NUMA cell capacity and allocation information so we can implement NUMA-sensitive initial placement

When done:

  • Test with an SSH-forwarded libvirt socket

Note

The scope of this PR is to establish a minimum viable scheduling pipeline in Cortex with the least amount of changes possible. Refactorings of the Hypervisor CRD spec can follow if needed.
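The capacity aggregation step can be sketched as follows. This is a simplified illustration, not the agent's actual code: the type and field names (`DomainInfo`, `CPUs`, `MemoryKiB`, `aggregate`) are hypothetical, and the real implementation works with `resource.Quantity` values from k8s.io/apimachinery rather than raw KiB integers.

```go
package main

import "fmt"

// DomainInfo is an illustrative stand-in for the per-domain allocation
// data the node agent collects from libvirt (names are hypothetical).
type DomainInfo struct {
	CPUs      uint64 // allocated vCPUs
	MemoryKiB uint64 // allocated memory in KiB
}

// aggregate sums the per-domain allocations into the totals that would
// be written to the hypervisor CRD's allocated-capacity fields.
func aggregate(domains []DomainInfo) (cpus, memKiB uint64) {
	for _, d := range domains {
		cpus += d.CPUs
		memKiB += d.MemoryKiB
	}
	return cpus, memKiB
}

func main() {
	domains := []DomainInfo{
		{CPUs: 2, MemoryKiB: 2080768}, // 2032Mi
		{CPUs: 1, MemoryKiB: 2080768},
	}
	cpus, mem := aggregate(domains)
	fmt.Printf("allocated: cpu=%d memory=%dKi\n", cpus, mem)
}
```

Together with the node's total capacity from the libvirt capabilities, these sums are what a capacity filter needs to decide whether a VM still fits.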

@github-actions commented

Merging this branch changes the coverage (1 decrease, 5 increase)

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller | 23.42% (+2.04%) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/emulator | 0.00% (ø) | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt | 11.08% (+2.58%) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities | 63.64% (-3.86%) | 👎 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities | 71.79% (+71.79%) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo | 62.50% (+62.50%) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/util | 100.00% (+100.00%) | 🌟 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller.go | 34.91% (+1.57%) | 106 (+13) | 37 (+6) | 69 (+7) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/emulator/libvirt.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/client.go | 63.64% (+2.35%) | 33 (+2) | 21 (+2) | 12 | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/schema.go | 0.00% (-88.89%) | 0 (-9) | 0 (-8) | 0 (-1) | 💀 💀 💀 💀 💀 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/client.go | 71.79% (+71.79%) | 39 (+39) | 28 (+28) | 11 (+11) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/example.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/schema.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client.go | 62.50% (+62.50%) | 48 (+48) | 30 (+30) | 18 (+18) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/example.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/schema.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface_mock.go | 35.00% (+3.75%) | 120 (+24) | 42 (+12) | 78 (+12) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt.go | 0.00% (ø) | 35 (+2) | 0 | 35 (+2) | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/util/util.go | 100.00% (+100.00%) | 9 (+9) | 9 (+9) | 0 | 🌟 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/schema_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/schema_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/util/util_test.go

@PhilippMatthes (Member Author)

Technically this is ready for review, but I want to make sure everything is implemented correctly and will check with an SSH-forwarded libvirt socket once I get an available hypervisor.

@PhilippMatthes PhilippMatthes marked this pull request as ready for review January 2, 2026 12:57
PhilippMatthes added a commit to cobaltcore-dev/cortex that referenced this pull request Jan 5, 2026
## Background

For virtual machines spawned on the KVM hypervisor, we no longer want to use Nova and Placement as the source of truth. Instead, filters should use the Hypervisor CRD exposed by the [hypervisor operator](https://github.com/cobaltcore-dev/openstack-hypervisor-operator) and populated by the [node agent](https://github.com/cobaltcore-dev/kvm-node-agent). This contribution replaces the implementation of all filters that were originally ported from Nova accordingly. Afterward, we can disable the filters in Nova one by one, moving the compute placement logic over to Cortex.

> [!TIP]
> You can use the newly added [mirror tool](93fdcc0) to mirror hypervisor resources from our compute cluster over to the local cluster.

## Completion

- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_compute_capabilities.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_capabilities.go (NEW)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_correct_az.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_external_customer.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_accelerators.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_enough_capacity.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_requested_traits.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_host_instructions.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_maintenance.go (NEW)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_packed_virtqueue.go
- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_project_aggregates.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_allowed_projects.go (NEW)
- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_disabled.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_status_conditions.go (NEW)

## Dependencies

> [!NOTE]
> The scope of this PR is to establish a minimum viable scheduling pipeline with the current state. Extensive refactorings, for example of the filter for requested traits, are out of scope.

Hypervisor operator PR: cobaltcore-dev/openstack-hypervisor-operator#217
KVM node agent PR: cobaltcore-dev/kvm-node-agent#40
@PhilippMatthes (Member Author)

Tested result with a real hypervisor:

```yaml
apiVersion: kvm.cloud.sap/v1
kind: Hypervisor
# ...
status:
  capabilities:
    cpuArch: x86_64
    cpus: "128"
    hostTopology:
    - capacity:
        cpu: "64"
        memory: 528110060Ki
      id: 0
    - capacity:
        cpu: "64"
        memory: 528456396Ki
      id: 1
    memory: 1056566456Ki
  conditions:
  - lastTransitionTime: "2026-01-05T13:01:25Z"
    message: ""
    reason: DomainInfoClientGetSucceeded
    status: "True"
    type: DomainInfoClientConnection
  - lastTransitionTime: "2026-01-05T13:01:25Z"
    message: ""
    reason: DomainCapabilitiesClientGetSucceeded
    status: "True"
    type: DomainCapabilitiesClientConnection
  domainCapabilities:
    arch: x86_64
    hypervisorType: ch
    supportedCpuModes:
    - mode/host-passthrough
    supportedDevices:
    - video
    - video/none
    supportedFeatures: []
  domainInfos:
  - allocation:
      cpu: "2"
      memory: 2032Mi
    cpuCells:
    - 0
    memoryCells:
    - 0
    name: # omitted
    uuid: # omitted
  - allocation:
      cpu: "1"
      memory: 2032Mi
    cpuCells:
    - 0
    memoryCells:
    - 0
    name: # omitted
    uuid: # omitted
  # ...
```
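Given such a status, a NUMA-sensitive filter could compare per-cell free capacity against a new VM's request. The following is a minimal sketch under simplified stand-in types for `hostTopology` and `domainInfos`: all names (`Cell`, `Domain`, `freeCPUPerCell`) are hypothetical, raw integers stand in for `resource.Quantity`, and the even split of vCPUs across cells is an assumption, not the actual pinning semantics.

```go
package main

import "fmt"

// Cell and Domain are simplified stand-ins for the CRD's hostTopology
// and domainInfos entries (field names are illustrative).
type Cell struct {
	ID  int
	CPU uint64 // total CPUs in this NUMA cell
}

type Domain struct {
	CPU      uint64 // allocated vCPUs
	CPUCells []int  // NUMA cells the domain's vCPUs occupy
}

// freeCPUPerCell subtracts each domain's vCPUs (split evenly across its
// cells, a simplifying assumption) from the per-cell totals, yielding
// the free CPUs a NUMA-sensitive filter could compare against a new
// VM's request.
func freeCPUPerCell(cells []Cell, domains []Domain) map[int]uint64 {
	free := make(map[int]uint64, len(cells))
	for _, c := range cells {
		free[c.ID] = c.CPU
	}
	for _, d := range domains {
		if len(d.CPUCells) == 0 {
			continue
		}
		share := d.CPU / uint64(len(d.CPUCells))
		for _, id := range d.CPUCells {
			free[id] -= share
		}
	}
	return free
}

func main() {
	// Mirrors the example status above: two 64-CPU cells, two domains
	// (2 + 1 vCPUs) both on cell 0.
	cells := []Cell{{ID: 0, CPU: 64}, {ID: 1, CPU: 64}}
	domains := []Domain{
		{CPU: 2, CPUCells: []int{0}},
		{CPU: 1, CPUCells: []int{0}},
	}
	fmt.Println(freeCPUPerCell(cells, domains)) // cell 0: 61 free, cell 1: 64 free
}
```

A filter would then admit the host only if some cell (or combination of cells, for multi-cell guests) can satisfy the request.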

@PhilippMatthes (Member Author)

Will polish this a bit more tomorrow. I'm not happy yet with storing the domain infos in a list, which could explode on a bigger hypervisor.

Comment on lines +19 to +28
```
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229101148-5c49ce751841 h1:CQTvuKSm1YnALv5gJP2NkX5/3gz6qludor89PJ1eibw=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229101148-5c49ce751841/go.mod h1:i/YQm59sAvilkgTFpKc+elMIf/KzkdimnXMd13P3V9s=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229103057-906a154e6429 h1:1E4S42PyC1fsCJ2kjJ2qu+Ryk2vc7C0D1IInDaZWJGU=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229103057-906a154e6429/go.mod h1:i/YQm59sAvilkgTFpKc+elMIf/KzkdimnXMd13P3V9s=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229104931-d99e352a3886 h1:Tqvuis23JJnTJMhtL1zo5dqlV6THlNzsS+IfDzWTsRg=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229104931-d99e352a3886/go.mod h1:i/YQm59sAvilkgTFpKc+elMIf/KzkdimnXMd13P3V9s=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229115749-52d9308090a6 h1:yjxe8xMx3T2ZR8Vq9NqH332xoUXFAGhzZu/MLD34j0Q=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251229115749-52d9308090a6/go.mod h1:i/YQm59sAvilkgTFpKc+elMIf/KzkdimnXMd13P3V9s=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251230105055-37950dd7ff29 h1:2tPhnOy0tPv49xLuk1i/0mvPwOneWE+oK/yP8s4GKZY=
github.com/cobaltcore-dev/openstack-hypervisor-operator v0.0.0-20251230105055-37950dd7ff29/go.mod h1:i/YQm59sAvilkgTFpKc+elMIf/KzkdimnXMd13P3V9s=
```
Contributor
Can you run `go mod tidy`?

```diff
 totalCpus := resource.NewQuantity(0, resource.DecimalSI)
 for _, cell := range in.Host.Topology.CellSpec.Cells {
-	mem, err := cell.Memory.AsQuantity()
+	mem, err := util.MemoryToResource(cell.Memory.Value, cell.Memory.Unit)
```
Contributor
Just noticed, there is `quantity.ParseQuantity` in the k8s apimachinery package; doesn't it do the same already?
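For context: `ParseQuantity` (in `k8s.io/apimachinery/pkg/api/resource`) parses strings like `2032Mi` or `528110060Ki` directly. Below is a stdlib-only toy sketch of just the binary-suffix subset of that parsing, to illustrate the overlap; `parseBinaryQuantity` is a hypothetical helper, and the real `ParseQuantity` additionally handles decimal SI suffixes, signs, and exponent notation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBinaryQuantity converts strings such as "2032Mi" or "528110060Ki"
// into a byte count. It is a toy illustration of a small subset of what
// resource.ParseQuantity (k8s.io/apimachinery/pkg/api/resource) covers.
func parseBinaryQuantity(s string) (uint64, error) {
	suffixes := map[string]uint64{
		"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
	}
	for suf, mult := range suffixes {
		if strings.HasSuffix(s, suf) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, suf), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * mult, nil
		}
	}
	// No recognized suffix: treat the whole string as plain bytes.
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	b, err := parseBinaryQuantity("2032Mi")
	if err != nil {
		panic(err)
	}
	fmt.Println(b) // 2130706432 (2032 * 1024 * 1024)
}
```

Reusing the apimachinery parser would avoid maintaining this conversion logic by hand, which seems to be the reviewer's point.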
