Skip to content

Partial TRS haircut#70

Open
rainest wants to merge 1 commit into
mainfrom
rainest/less-trs
Open

Partial TRS haircut#70
rainest wants to merge 1 commit into
mainfrom
rainest/less-trs

Conversation

@rainest
Copy link
Copy Markdown
Contributor

@rainest rainest commented Jun 2, 2026

tl;dr you probably don't need Kafka just to send a bunch of HTTP requests. If we hit scaling problems in the future, that should better inform further refactoring.

This moves TRS into an internal package and removes some stuff that looks like cruft, but doesn't try for exhaustive simplification just yet.

This will allow for further simplifying #69 into "just set CGO_ENABLED=0, but I think having #69 around as an example if we encounter something else that happens to need CGO.

This "adds" code, but it's actually removing a bunch of code that used to be external in an HPE repo: https://gist.github.com/rainest/a04de0575926844227fd3096b5800ad5


The task runner service tries to answer the question of whether you need Kafka to handle the issuance and tracking of HTTP requests, and its README posits a possible answer and some ballpark scale figures:

[Kafka] mode was created to increase scalability but has not yet been adopted due to its added complexity. Unless local mode shows itself to be inadequate, remote mode will likely remain a dormant feature. Testing has shown that local mode should scale beyond 24,000 concurrent tasks.

Bringing Kafka in requires CGO, because the original Confluent client needs it. Should you need Kafka, there are alternatives that don't need CGO (https://github.com/twmb/franz-go and the new official IBM/sarama#1179).

#69 does show how to handle CGO on ARM, but AFAIK nothing else in our service set needs CGO, and ideally nothing will.

That said, you probably don't need Kafka! As stated, the local variant's usually fine, since you probably simply don't have that many nodes, or could use alternative strategies for dealing with HTTP requests that don't involve an entire distributed event store! Refactoring to some combination of standard Go concurrency primitives, the existing usage of a simple retry library, multiple replicas, or any of the other myriad ways to handle distribution of HTTP requests in a way particular to the needs of an individual service is likely fine. OpenCHAMI doesn't use the one other service that (maybe) needed TRS, so the need to keep the library around is tenuous.

I don't have a good way to disentangle the architecture history that drove the design choices in TRS. Some (serializing HTTP requests into JSON that could fit into Kafka events) is pretty clearly unnecessary if you don't need to farm your HTTP requests out to Kafka, some appears to be trying to deal with connection management behavior in ancient versions of Golang (problem solved: we do not use ancient versions of Golang), and some (in test design) seems to be trying to chaos engineering writ small when BMCs or the cluster network misbehaves. There's maybe some value in the latter, but I'm iffy because I lack an actual fleet of IRL BMCs that I can casually try and make misbehave by chaotically sending mass power commands or status queries to a misbehaving cluster or a general purpose mock environment with problem scenarios informed by real-world experience.

Ultimately, there's further cruft here that could be simplified further, but my appetite for yak shaving is not infinite.

Replace the github.com/Cray-HPE/hms-trs-app-api/ package with a local
copy that lacks the Kafka implementation.

Removes serialization code that's not necessary for locally-issued HTTP
requests.

Removes workarounds for bugs in older Go versions.

Removes unused logger cruft.

Signed-off-by: Travis Raines <571832+rainest@users.noreply.github.com>
@rainest
Copy link
Copy Markdown
Contributor Author

rainest commented Jun 3, 2026

IDK what the heck golangci-lint's doing--it's complaining about

  Error: cmd/power-control/service.go:174:2: ineffectual assignment to envstr (ineffassign)
  	envstr = os.Getenv("TRS_IMPLEMENTATION")
  	^
  1 issues:
  * ineffassign: 1

but that's specifically a line I removed because of, yknow, the thing it's complaining about: https://github.com/OpenCHAMI/power-control/pull/70/changes#diff-dc71194ec358f605d87e6c182853969eacb9af7dc0a044247868f27c011d1bf9L174

service.go has weird envvar parsing where it reuses the same envstr variable over and over (I guess we're running on an embedded device and we really need to save on soon-to-be-GCed memory by reusing the same string variable or something), and maybe that confuses it, but IDK--I don't expect that'll be an issue on merge because the line is, well, gone 🤷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant