Pass structured cluster resource data and taints upstream via ping path (aligned with interLink#516) #17
…k#516) Agent-Logs-Url: https://github.com/interlink-hq/interlink-htcondor-plugin/sessions/4c7de43c-eca9-43ce-b9b3-63f5ca8241a1 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
@copilot also include taints, passed from the config file through the API: interlink-hq/interLink#516 (comment)
Agent-Logs-Url: https://github.com/interlink-hq/interlink-htcondor-plugin/sessions/c2074631-cdc2-485c-84cd-578ee1f42118 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
Done in e2f2c82. Taints are now read from the optional `Taints` key in `SidecarConfig.yaml` and returned in the ping response.

Example config:

```yaml
Taints:
  - key: "example.com/no-schedule"
    effect: "NoSchedule"
  - key: "example.com/gpu"
    value: "true"
    effect: "NoSchedule"
```

Example ping response:

```json
{
  "status": "ok",
  "resources": {"cpu": "24", "memory": "96000Mi"},
  "taints": [{"key": "example.com/no-schedule", "effect": "NoSchedule"}]
}
```
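The reply distinguishes an absent `Taints` key (leave node taints alone) from an empty list (clear non-system taints). A minimal sketch of that distinction follows, assuming the config has already been loaded into a plain dict; it reuses the PR's function name `get_taints_from_config` for readability, but the validation rules (reject unknown effects, treat a bare `Taints:` as an empty list) are assumptions and not necessarily what the plugin does.

```python
from typing import Any, Dict, List, Optional

# Effects Kubernetes accepts for node taints.
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def get_taints_from_config(config: Dict[str, Any]) -> Optional[List[Dict[str, str]]]:
    """Return the taints list from an already-parsed SidecarConfig dict.

    None   -> "Taints" key absent: the node's taints should be left unchanged.
    []     -> key present but empty: non-system taints should be cleared.
    """
    if "Taints" not in config:
        return None
    # Assumption: a bare "Taints:" line (parsed by YAML as None) means "empty list".
    raw = config["Taints"] or []
    taints: List[Dict[str, str]] = []
    for entry in raw:
        key = entry.get("key")
        effect = entry.get("effect")
        if not key or effect not in VALID_EFFECTS:
            # Assumption: this sketch rejects malformed entries instead of skipping them.
            raise ValueError(f"invalid taint entry: {entry!r}")
        taint = {"key": str(key), "effect": effect}
        if entry.get("value") is not None:
            taint["value"] = str(entry["value"])  # "value" is optional in a taint
        taints.append(taint)
    return taints
```

With the example config above loaded as a dict, the function returns both taints, with `value` included only for `example.com/gpu`.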
```python
    commonIL.PingResponse type once interlink-hq/interLink#516 is merged and
    the interLink dependency is updated.
    """
    try:
```
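The docstring above notes that the plugin will switch to a `commonIL.PingResponse` type once interLink#516 is merged. Until then, a local stand-in could look like the sketch below. `Taint`, `PingResponse`, and `build_ping_response` are hypothetical names, and the field set is taken only from the example responses in this PR. The one design point it shows is that `resources` and `taints` are omitted (not sent as empty values) when there is no data, so the consumer can tell "unknown" apart from "empty".

```python
import json
from typing import Dict, List, Optional, TypedDict


class Taint(TypedDict, total=False):
    key: str
    value: str  # optional in Kubernetes taints
    effect: str


class PingResponse(TypedDict, total=False):
    status: str
    resources: Dict[str, str]  # Kubernetes quantity strings, e.g. {"cpu": "24"}
    taints: List[Taint]


def build_ping_response(
    resources: Optional[Dict[str, str]],
    taints: Optional[List[Taint]],
) -> str:
    """Serialize a ping body, dropping fields whose value is None."""
    body: PingResponse = {"status": "ok"}
    if resources is not None:
        body["resources"] = resources
    if taints is not None:
        body["taints"] = taints  # an empty list is kept: it means "clear taints"
    return json.dumps(body)
```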
@copilot make the cluster-resources script configurable. Imagine I want to pass my own logic from the plugin's config file.
Done in 582c899. Added a ClusterResourcesScript config key to SidecarConfig.yaml. When set, the specified shell command is executed instead of the built-in condor_status logic and its JSON stdout is used as the ping response. This lets operators supply any custom resource-reporting script without modifying the plugin code.
Config example:

```yaml
ClusterResourcesScript: "/usr/local/bin/my-resource-reporter.sh"
```

The script must print a JSON object to stdout in the `PingResponse` format:

```json
{"status": "ok", "resources": {"cpu": "48", "memory": "192000Mi"}}
```

When absent or empty, the built-in `condor_status --json` / `condor_status -autoformat` logic is used as before.
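Running the configured command and validating its output can be sketched as follows. `run_cluster_resources_script` is a hypothetical stand-in for the plugin's `_run_cluster_resources_script()`, and the 30-second timeout is an assumption. The command goes through the shell, matching the description of `ClusterResourcesScript` as a shell command, so arguments and pipes work.

```python
import json
import subprocess
from typing import Any, Dict


def run_cluster_resources_script(command: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Run an operator-supplied resource script and return its parsed JSON stdout.

    Raises subprocess.CalledProcessError on a non-zero exit,
    subprocess.TimeoutExpired if the script hangs, and ValueError if stdout
    is not valid JSON or is not a JSON object.
    """
    proc = subprocess.run(
        command,
        shell=True,           # treat the config value as a shell command line
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str
        timeout=timeout,
        check=True,           # non-zero exit status -> CalledProcessError
    )
    payload = json.loads(proc.stdout)  # JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("ClusterResourcesScript must print a JSON object")
    return payload
```

A caller would typically catch these exceptions and fall back to the built-in `condor_status` logic or report an error status; which of those the plugin does is not stated here.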
Agent-Logs-Url: https://github.com/interlink-hq/interlink-htcondor-plugin/sessions/e418e49a-0767-4349-a79b-c3e4802a65e4 Co-authored-by: dciangot <4144326+dciangot@users.noreply.github.com>
The interLink core needs real-time HTCondor cluster resource occupancy to update the virtual node's advertised capacity and taints. Previously, the `/status` ping path returned a plain-text health-check response with no machine-readable resource data.

Changes
- `handles.py`: new `parse_cluster_resources_from_json()`, `parse_cluster_resources_from_text()`, `get_cluster_resources()`, `_run_cluster_resources_script()`, and `get_taints_from_config()` functions. When `/status` is called with an empty pod list (the interLink API ping call), it returns `Content-Type: application/json` with a `PingResponse` body. Resource data is fetched via:
  - `ClusterResourcesScript` in `SidecarConfig.yaml`: when set, the specified shell command is executed and its JSON stdout is used as the ping response, allowing operators to supply their own resource-reporting logic without modifying the plugin code.
  - `condor_status --json`: accurate per-slot `Cpus` and `Memory`; reports available resources (Unclaimed slots, excluding dynamic child slots to avoid double-counting with their partitionable-slot parents).
  - `condor_status -autoformat Cpus Memory`: reports total installed CPUs and memory as Kubernetes quantity strings (allocation state cannot be derived from the plain-text output).
- Taints: read from the optional `Taints` list in `SidecarConfig.yaml`. When present (even as an empty list), the VK replaces the node's non-system taints with this list; when absent, existing taints are left unchanged.
- `SidecarConfig.yaml`: documents the optional `ClusterResourcesScript` and `Taints` config keys with usage examples.
- `tests/test_resources.py`: parsing logic is exercised via `parse_cluster_resources_from_json` / `parse_cluster_resources_from_text` without a live HTCondor binary; integration tests verify that the ping path returns the correct `PingResponse` JSON format, including taints and custom script output.

Example response on ping:
```json
{
  "status": "ok",
  "resources": {
    "cpu": "24",
    "memory": "96000Mi"
  },
  "taints": [
    {"key": "example.com/no-schedule", "effect": "NoSchedule"}
  ]
}
```

Example `SidecarConfig.yaml`

This mirrors the implementation in interlink-hq/interlink-slurm-plugin#135, adapted for HTCondor.
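The two `condor_status` parsing strategies listed under Changes can be sketched as below. These reuse the PR's function names for readability but are illustrative reimplementations, not the plugin's code. They assume `condor_status` JSON output is an array of slot ClassAds using the standard attributes `State`, `DynamicSlot`, `Cpus`, and `Memory` (in MiB), and that `-autoformat Cpus Memory` prints one whitespace-separated pair per line. Skipping dynamic slots works because a partitionable parent's own `Cpus`/`Memory` already reflect what is still unallocated.

```python
import json
from typing import Dict


def parse_cluster_resources_from_json(raw: str) -> Dict[str, str]:
    """Sum free CPUs/memory over Unclaimed, non-dynamic slots in condor_status JSON."""
    slots = json.loads(raw) if raw.strip() else []
    cpus = 0
    memory_mib = 0
    for slot in slots:
        if slot.get("DynamicSlot", False):
            continue  # child of a partitionable slot: counted via the parent
        if slot.get("State") != "Unclaimed":
            continue  # claimed static slots are not available
        cpus += int(slot.get("Cpus", 0))
        memory_mib += int(slot.get("Memory", 0))
    return {"cpu": str(cpus), "memory": f"{memory_mib}Mi"}


def parse_cluster_resources_from_text(raw: str) -> Dict[str, str]:
    """Sum total CPUs/memory from `condor_status -autoformat Cpus Memory` lines.

    The plain-text output carries no claim state, so this is installed capacity.
    """
    cpus = 0
    memory_mib = 0
    for line in raw.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or unexpected line
        try:
            line_cpus, line_mem = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # e.g. "undefined" for a missing attribute
        cpus += line_cpus
        memory_mib += line_mem
    return {"cpu": str(cpus), "memory": f"{memory_mib}Mi"}
```

For a pool with one partitionable slot holding 16 free CPUs / 64000 MiB plus one idle static slot with 8 CPUs / 32000 MiB, the JSON parser yields the `{"cpu": "24", "memory": "96000Mi"}` shape shown in the example response.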
The VK's `nodeUpdate()` (interLink#516) parses this body as a `PingResponse` and:
- if `resources` is non-nil, calls `updateNodeResources()` to update both `node.Status.Capacity` and `node.Status.Allocatable` on every 30-second heartbeat;
- if `taints` is non-nil, calls `updateNodeTaints()` to replace the node's non-system taints.
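The consumer-side rules above can be illustrated with a small Python model. This is not interLink's code (the real logic lives in the virtual kubelet, interLink#516). `apply_ping_response`, the dict-shaped node, the choice to merge rather than overwrite capacity keys, and the `node.kubernetes.io/` prefix used to identify "system" taints are all assumptions made for the sketch.

```python
from typing import Any, Dict

# Hypothetical rule: taints under this prefix count as "system" taints and are
# never removed by a ping. interLink's actual definition may differ.
SYSTEM_TAINT_PREFIX = "node.kubernetes.io/"


def apply_ping_response(node: Dict[str, Any], ping: Dict[str, Any]) -> None:
    """Update a dict-modelled node in place from a PingResponse-shaped dict."""
    resources = ping.get("resources")
    if resources is not None:
        # Merge so unrelated keys (e.g. "pods") survive; update both views.
        node.setdefault("capacity", {}).update(resources)
        node.setdefault("allocatable", {}).update(resources)
    taints = ping.get("taints")
    if taints is not None:
        # Keep system taints, replace everything else (an empty list clears them).
        kept = [t for t in node.get("taints", []) if t["key"].startswith(SYSTEM_TAINT_PREFIX)]
        node["taints"] = kept + list(taints)
    # A missing "taints" field leaves node["taints"] untouched.
```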