Pass structured cluster resource data and taints upstream via ping path (aligned with interLink#516)#17

Draft
Copilot wants to merge 4 commits into
mainfrom
copilot/implement-slurm-plugin-equivalent

Copilot AI commented Apr 3, 2026

Contributor

The interLink core needs real-time HTCondor cluster resource occupancy to update the virtual node's advertised capacity and taints. Previously, the /status ping path returned a plain-text health-check response with no machine-readable resource data.

Changes

  • handles.py — new parse_cluster_resources_from_json(), parse_cluster_resources_from_text(), get_cluster_resources(), _run_cluster_resources_script(), and get_taints_from_config() functions; when /status is called with an empty pod list (the interlink-api ping call), returns Content-Type: application/json with a PingResponse body. Resource data is fetched via:

    • Custom script (optional): ClusterResourcesScript in SidecarConfig.yaml — when set, the specified shell command is executed and its JSON stdout is used as the ping response, allowing operators to supply their own resource-reporting logic without modifying the plugin code
    • Primary (built-in): condor_status --json — accurate per-slot Cpus and Memory; reports available resources (Unclaimed slots, excluding dynamic child-slots to avoid double-counting with partitionable-slot parents)
    • Fallback (built-in): condor_status -autoformat Cpus Memory — reports total installed CPUs and memory as Kubernetes quantity strings (allocation state not derivable from plain-text output)
    • Taints: read from SidecarConfig.yaml; when present (even as an empty list), the VK replaces the node's non-system taints with this list; when absent, existing taints are left unchanged
  • SidecarConfig.yaml — documents the optional ClusterResourcesScript and Taints config keys with usage examples

  • tests/test_resources.py — parsing logic exercised via parse_cluster_resources_from_json / parse_cluster_resources_from_text functions without a live HTCondor binary; integration tests verify the ping path returns the correct PingResponse JSON format including taints and custom script output
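
The two built-in parsing paths described above can be sketched roughly as follows. This is an illustrative sketch, not the code in handles.py: the function names `sum_unclaimed_from_json` and `sum_total_from_text` are hypothetical, and it assumes that `condor_status --json` emits a JSON array of slot ads carrying the standard `State`, `SlotType`, `DynamicSlot`, `Cpus` and `Memory` attributes, with `Memory` expressed in MiB.

```python
import json


def sum_unclaimed_from_json(condor_json: str) -> dict:
    """Sum Cpus and Memory over Unclaimed slots, skipping dynamic child slots.

    Hypothetical helper: assumes the input is the JSON array printed by
    `condor_status --json` (an empty string is treated as "no slots").
    """
    ads = json.loads(condor_json) if condor_json.strip() else []
    cpus = 0
    memory_mib = 0
    for ad in ads:
        if ad.get("State") != "Unclaimed":
            continue
        # Dynamic slots are carved out of a partitionable parent, so they are
        # skipped to avoid counting the same hardware twice.
        if ad.get("SlotType") == "Dynamic" or ad.get("DynamicSlot") is True:
            continue
        cpus += int(ad.get("Cpus", 0))
        memory_mib += int(ad.get("Memory", 0))
    return {"cpu": str(cpus), "memory": f"{memory_mib}Mi"}


def sum_total_from_text(autoformat_output: str) -> dict:
    """Sum every "<Cpus> <Memory>" line printed by
    `condor_status -autoformat Cpus Memory`.

    The plain-text output carries no slot state, so this yields totals only.
    Lines that do not hold exactly two integers are ignored.
    """
    cpus = 0
    memory_mib = 0
    for line in autoformat_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            line_cpus, line_mem = int(fields[0]), int(fields[1])
        except ValueError:
            continue
        cpus += line_cpus
        memory_mib += line_mem
    return {"cpu": str(cpus), "memory": f"{memory_mib}Mi"}
```

Because both helpers take strings rather than invoking `condor_status` themselves, they can be exercised in unit tests without a live HTCondor binary, which matches the testing approach listed for tests/test_resources.py.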

Example response on ping

{
  "status": "ok",
  "resources": {
    "cpu": "24",
    "memory": "96000Mi"
  },
  "taints": [
    {"key": "example.com/no-schedule", "effect": "NoSchedule"}
  ]
}
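
A body of this shape could be assembled along the following lines. This is a minimal sketch under stated assumptions: `build_ping_response` is a hypothetical helper, not a function from the PR, and it assumes CPU counts are sent as plain integer strings and memory as a MiB quantity with the Kubernetes `Mi` suffix, as in the example above.

```python
import json
from typing import Optional


def build_ping_response(cpus: int, memory_mib: int,
                        taints: Optional[list] = None) -> tuple:
    """Return (body, content_type) for the ping path.

    Hypothetical helper. The "taints" key is omitted entirely when `taints`
    is None, and kept (even if empty) when a list is passed, so the receiver
    can tell "leave taints alone" apart from "replace with this list".
    """
    body = {
        "status": "ok",
        "resources": {"cpu": str(cpus), "memory": f"{memory_mib}Mi"},
    }
    if taints is not None:
        body["taints"] = taints
    return json.dumps(body), "application/json"


payload, content_type = build_ping_response(
    24, 96000, [{"key": "example.com/no-schedule", "effect": "NoSchedule"}]
)
```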

Example SidecarConfig.yaml

# Plug in a custom resource-reporting script instead of the built-in condor_status logic:
ClusterResourcesScript: "/usr/local/bin/my-resource-reporter.sh"

# Taints to advertise on the virtual node:
Taints:
  - key: "example.com/no-schedule"
    effect: "NoSchedule"

This mirrors the implementation in interlink-hq/interlink-slurm-plugin#135 adapted for HTCondor.

The VK's nodeUpdate() (interLink#516) parses this body as PingResponse and:

  • when resources is non-nil, calls updateNodeResources() to update both node.Status.Capacity and node.Status.Allocatable on every 30-second heartbeat
  • when taints is non-nil, calls updateNodeTaints() to replace the node's non-system taints

TODO: Replace the locally-defined PingResponse dict structure with upstream commonIL.* imports once interlink-hq/interLink#516 is merged and the interLink dependency is updated.

@dciangot

dciangot commented Apr 5, 2026

Member

@copilot include also taints to be passed from the config file via the api interlink-hq/interLink#516 (comment)

Copilot AI commented Apr 5, 2026

Contributor Author

@copilot include also taints to be passed from the config file via the api interlink-hq/interLink#516 (comment)

Done in e2f2c82. Added get_taints_from_config() which reads the optional Taints list from SidecarConfig.yaml and merges it into the ping response. When present (even as an empty list), the VK replaces non-system taints; when absent, existing taints are left unchanged — matching the interLink#516 behaviour.

Example config:

Taints:
  - key: "example.com/no-schedule"
    effect: "NoSchedule"
  - key: "example.com/gpu"
    value: "true"
    effect: "NoSchedule"

Example ping response:

{
  "status": "ok",
  "resources": {"cpu": "24", "memory": "96000Mi"},
  "taints": [{"key": "example.com/no-schedule", "effect": "NoSchedule"}]
}
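
The absent-versus-empty distinction could be handled as in the sketch below. It is not the PR's get_taints_from_config(): the name `taints_from_config` is hypothetical, the config is assumed to be already loaded into a dict, and two behaviours are assumptions of this sketch rather than anything the PR states: a `Taints:` key with no value is treated as an empty list, and malformed entries are silently skipped. The effect names are the three defined by Kubernetes.

```python
from typing import Optional

# The three taint effects defined by Kubernetes.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def taints_from_config(config: dict) -> Optional[list]:
    """Return None when the Taints key is absent, else a (possibly empty) list.

    Hypothetical helper; `config` is the already-parsed SidecarConfig.yaml.
    """
    if "Taints" not in config:
        return None  # absent: the caller should omit "taints" from the response
    # Assumption: a bare "Taints:" (parsed as None) means "no taints".
    raw = config["Taints"] or []
    taints = []
    for entry in raw:
        # Assumption: entries without a key or with an unknown effect are skipped.
        if not isinstance(entry, dict) or not entry.get("key"):
            continue
        if entry.get("effect") not in VALID_EFFECTS:
            continue
        taint = {"key": entry["key"], "effect": entry["effect"]}
        if "value" in entry:
            taint["value"] = str(entry["value"])
        taints.append(taint)
    return taints
```

Returning `None` rather than `[]` for a missing key is what lets the ping response leave the field out, so existing node taints stay untouched.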

Copilot AI changed the title from "Pass structured cluster resource data upstream via ping path (aligned with interLink#516)" to "Pass structured cluster resource data and taints upstream via ping path (aligned with interLink#516)" on Apr 5, 2026
Comment thread handles.py
commonIL.PingResponse type once interlink-hq/interLink#516 is merged and
the interLink dependency is updated.
"""
try:

Member


@copilot make get cluster resource script configurable. Imagine I want to pass my own logic from the config file of the plugin.

Contributor Author


Done in 582c899. Added a ClusterResourcesScript config key to SidecarConfig.yaml. When set, the specified shell command is executed instead of the built-in condor_status logic and its JSON stdout is used as the ping response. This lets operators supply any custom resource-reporting script without modifying the plugin code.

Config example:

ClusterResourcesScript: "/usr/local/bin/my-resource-reporter.sh"

The script must print a JSON object to stdout in the PingResponse format:

{"status": "ok", "resources": {"cpu": "48", "memory": "192000Mi"}}

When absent or empty, the built-in condor_status --json / condor_status -autoformat logic is used as before.
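
The selection between the custom script and the built-in logic might look roughly like this. It is a sketch, not the PR's _run_cluster_resources_script(): the names `run_resources_script` and `get_ping_body` are hypothetical, the command is split with `shlex` and run without a shell (the PR may invoke a shell instead), the 10-second timeout is arbitrary, and falling back to the built-in logic when the script fails is an assumption of this sketch.

```python
import json
import shlex
import subprocess
from typing import Callable, Optional


def run_resources_script(command: str, timeout: float = 10.0) -> Optional[dict]:
    """Run the configured command and parse its stdout as a JSON object.

    Returns None if the command cannot be started, exits non-zero, times out,
    or does not print a JSON object.
    """
    try:
        proc = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        data = json.loads(proc.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def get_ping_body(config: dict, builtin: Callable[[], dict]) -> dict:
    """Prefer ClusterResourcesScript when set; otherwise use `builtin`.

    `builtin` stands in for the condor_status-based logic.
    """
    command = (config.get("ClusterResourcesScript") or "").strip()
    if command:
        result = run_resources_script(command)
        if result is not None:
            return result
    # Assumption: a failing script falls through to the built-in path.
    return builtin()
```

Passing the built-in logic as a callable keeps the selection step testable without either a real script or a live HTCondor pool.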

Copilot AI requested a review from dciangot April 5, 2026 07:07