Open
The subsystems are (mostly) self-encapsulated and communicate via queues. To expose internal system metrics and state, a queue will be used to export telemetry data. This decouples metrics capture and exposure from the internal subsystems.
Adding telemetry to support health checks and aid in debugging. It is disabled by default, and the structures returned are alpha quality so that we can better understand how they are used in practice.
For the initial telemetry capture, only the reconciliation scheduler and event watcher subsystems emit data. This is rough and expected to evolve after further testing and design.
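The queue-based decoupling described above could look roughly like the following. This is a minimal sketch, not the PR's implementation: `TelemetryEvent`, `emit`, `collect`, and the metric names are all hypothetical, and the bounded queue with drop-on-full behavior is an assumption chosen so that a slow consumer cannot stall a subsystem.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class TelemetryEvent:
    # Hypothetical event shape; the PR's actual structures are alpha quality.
    subsystem: str
    name: str
    value: float
    timestamp: float = field(default_factory=time.time)


def emit(queue: asyncio.Queue, event: TelemetryEvent) -> bool:
    """Publish without blocking; report False if the queue is full."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        return False


async def collect(queue: asyncio.Queue, state: dict) -> None:
    """Drain the queue into a snapshot keyed by (subsystem, name)."""
    while True:
        event = await queue.get()
        state[(event.subsystem, event.name)] = event.value
        queue.task_done()


async def demo() -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)
    state: dict = {}
    collector = asyncio.create_task(collect(queue, state))
    # Subsystems only know about the queue, not about who reads it.
    emit(queue, TelemetryEvent("scheduler", "pending", 3.0))
    emit(queue, TelemetryEvent("event_watcher", "events_seen", 12.0))
    await queue.join()  # wait until the collector has processed everything
    collector.cancel()
    try:
        await collector
    except asyncio.CancelledError:
        pass
    return state
```

A diagnostics endpoint would then read from `state` rather than reaching into the scheduler or event watcher directly.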
ericlarssen
reviewed
Aug 14, 2025
src/controller/__init__.py
Outdated
    api=api,
    namespace=namespace,
    api_version=API_VERSION,
    plural_kind=f"{kind_title.lower()}s",
Contributor
Do we need to support the odd plural cases?
Contributor
Author
Yes, normally. In this case these are only Koreo resources themselves (ResourceFunction, ResourceTemplate, ValueFunction, and Workflow)—all of which use simple plural rules. It is not currently an issue, but certainly could be if we add something with complex plural rules.
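If a kind with irregular plural rules were added later, one possible approach is an explicit override table with the simple rule as the fallback. This is only a sketch: `IRREGULAR_PLURALS`, `plural_kind`, and the example entries are hypothetical and do not correspond to any existing Koreo resource.

```python
# Hypothetical overrides; none of these kinds exist in the PR.
IRREGULAR_PLURALS = {
    "policy": "policies",
    "ingress": "ingresses",
}


def plural_kind(kind: str) -> str:
    """Return the lowercase plural, preferring an explicit override."""
    lowered = kind.lower()
    # Fall back to the simple "append s" rule used in the diff above.
    return IRREGULAR_PLURALS.get(lowered, f"{lowered}s")
```

The four current kinds all take the fallback path, so the behavior for them would be unchanged.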
Without jitter, reconciliation of all resources a controller is monitoring can become aligned. Adding jitter scatters the reconciliation times so that load is spread more evenly.
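As an illustration of the idea, a jittered delay can be computed by perturbing a base interval by a random fraction. The function name, the 10% default, and the symmetric uniform distribution are assumptions for this sketch; the commit does not state which scheme it uses.

```python
import random


def jittered_delay(base_seconds, jitter_fraction=0.1, rng=None):
    """Return base_seconds shifted by up to +/- jitter_fraction of itself."""
    if rng is None:
        rng = random  # module-level generator; pass a Random for repeatability
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

With a 60 second base and the default fraction, each resource would be rescheduled somewhere between 54 and 66 seconds out, so resources that started in lockstep drift apart over successive cycles.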
Starting uvicorn prior to the controller helps uvicorn start successfully. The cause is not yet clear.
Removing try/excepts in order to debug an issue with unclean uvicorn shutdowns.
Python's TaskGroup has several special-case errors. If these aren't handled explicitly, very verbose errors are dumped as the system exits.
This adds the ability to enable telemetry / diagnostics endpoints within the controller. Eventually these may be used for health checks, but the immediate objective is to improve observability and help with debugging.