Skip to content

[FEAT] Add SparkSession tag management for Interrupt()#183

Open
40u5 wants to merge 1 commit into
apache:masterfrom
40u5:feat/issue-49-session-tag-management
Open

[FEAT] Add SparkSession tag management for Interrupt()#183
40u5 wants to merge 1 commit into
apache:masterfrom
40u5:feat/issue-49-session-tag-management

Conversation

@40u5

@40u5 40u5 commented Jun 8, 2026

Copy link
Copy Markdown

What changes were proposed in this pull request?

Closes #49.

Adds the per-session tag management API that complements the Interrupt / InterruptTag / InterruptOperation surface introduced in #182, so that callers can actually tag operations and then cancel by tag:

  • SparkSession.AddTag(tag string) error — attaches a tag to every subsequent operation issued by the session.
  • SparkSession.RemoveTag(tag string) error — removes a previously-added tag. Removing a tag that was never added is a no-op.
  • SparkSession.GetTags() []string — returns the tags currently attached to the session, sorted lexicographically for deterministic output.
  • SparkSession.ClearTags() — drops every tag attached to the session.

Plumbing:

  • The set is stored on the underlying sparkConnectClientImpl behind a sync.RWMutex so it is safe to mutate from multiple goroutines.
  • newExecutePlanRequest now sets request.Tags = s.GetTags(), so the tags ride along on every ExecutePlan / ExecuteCommand issued after AddTag.
  • The new methods are exposed on the SparkConnectClient interface (spark/client/base/base.go) and spark/mocks/mock_executor.go is updated to satisfy the extended interface.
  • Validation lives in a new exported base.ValidateTag so both the real client and the in-tree mock enforce the Spark Connect contract: tags must be non-empty and must not contain ,. Invalid input returns sparkerrors.InvalidArgumentError.

Why are the changes needed?

Issue #49 asks for Interrupt() support on the Go client. #182 surfaces InterruptAll / InterruptTag / InterruptOperation, but InterruptTag is only useful if callers can actually attach tags to operations — which today they cannot, because there is no equivalent of PySpark's SparkSession.addTag / removeTag / getTags / clearTags. This PR fills that gap.

Does this PR introduce any user-facing change?

Yes — additive only:

  • SparkSession gains AddTag, RemoveTag, GetTags, ClearTags.
  • The SparkConnectClient interface gains the same four methods; no known external implementers, and the in-tree mocks.TestExecutor is updated in the same commit.
  • base.ValidateTag is exported so callers can reuse the same validation if they want.

No backward-incompatible breakages.

How was this patch tested?

Added unit tests in spark/client/client_test.go:

  • TestAddTagRejectsInvalidInput — empty tag and tag containing , both return sparkerrors.InvalidArgumentError and are not stored.
  • TestTagsRoundTripAddRemoveClearAddTag dedupes, GetTags returns sorted output, RemoveTag works including removing tags that were never added (no-op), ClearTags resets the set.
  • TestExecutePlanRequestCarriesSessionTags — wraps the existing mock in a recorder that captures the ExecutePlanRequest and verifies that tags configured via AddTag end up on request.Tags, and that ClearTags scrubs them from subsequent requests.

And in spark/sql/sparksession_test.go:

  • TestSparkSessionTagsRoundTripThroughClient — exercises the same flow through the public SparkSession API, including invalid-tag rejection.

Locally:

  • go build ./... — clean.
  • go vet ./... — clean.
  • gofmt -l ./spark — no diff.
  • go test $(go list ./... | grep -v internal/tests/integration) — all PASS. The integration suite is excluded because it requires SPARK_HOME and is unrelated to this change.

Surfaces SparkSession.AddTag / RemoveTag / GetTags / ClearTags so user
code can tag operations and later cancel them by tag via InterruptTag.
Tags are stored on the underlying SparkConnectClient (mutex-protected)
and threaded into ExecutePlanRequest.Tags on every execute, matching
PySpark's tag semantics. Validation matches the Spark Connect contract:
tags must be non-empty and must not contain ','.

Together with the Interrupt() / InterruptTag / InterruptOperation
surface added in apache#182, this closes apache#49.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for Interrupt()

1 participant