
[TE-5339] Fix test command error handling#455

Open
nprizal wants to merge 9 commits into main from te-5339-3-persona-false-pass-when-vitest-exits-non-zero

Conversation


@nprizal nprizal commented Mar 9, 2026

Problem

There were recurring bugs where bktec reported a build as passed even though the test runner exited with a non-zero code. This is dangerous: bktec should exit with a non-zero code if there are any failures, so customers can take action on them.

We've attempted to fix the issues before, for example:

Those fixes addressed specific scenarios for individual runners but didn’t address the underlying problem.

The root cause is that bktec swallows the runner's exit error after parsing the report. This was originally done to avoid failing the build when the only failures come from muted tests, but it has the side effect of hiding legitimate non-test failures.

Changes

This PR reworks the error handling across all runners to consistently propagate the command error instead of swallowing it. Each runner now follows the same approach:

  • execute the test command
  • parse the report regardless of the exit code/error from the test command
  • if parsing the report fails, return the error from the command (if any)
  • otherwise, process the report and update the RunResult
  • return error from the test command (if any)

On the caller side (the main package), if the run result is Passed but there are failed muted tests, we can assume the error originates from those muted test failures, so we ignore it to prevent the build from failing. Otherwise the error propagates and fails the build.

Test

I've tested locally against RSpec. We should test it in the examples pipeline before releasing this.

// Print muted and failed tests
mutedTests := runResult.MutedTests()
if len(mutedTests) > 0 {
if statistics.Total > 0 {
Contributor Author

The printReport function is now called regardless of the error and run result, but the statistics will only be printed if actual test results were recorded in the result object.

@nprizal nprizal force-pushed the te-5339-3-persona-false-pass-when-vitest-exits-non-zero branch from f4f39ca to 94b9e8d Compare March 9, 2026 02:47
@nprizal nprizal changed the title Te 5339 3 persona false pass when vitest exits non zero [TE-5339] Fix the handling of error from test command Mar 9, 2026
@nprizal nprizal changed the title [TE-5339] Fix the handling of error from test command [TE-5339] Fix test command error handling Mar 9, 2026
@nprizal nprizal self-assigned this Mar 9, 2026
@nprizal nprizal marked this pull request as ready for review March 9, 2026 03:58
@nprizal nprizal requested a review from a team as a code owner March 9, 2026 03:58
Member

pda commented Mar 9, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 814a12994c


@nprizal nprizal requested review from malclocke and pda March 9, 2026 23:06
@buildkite buildkite deleted a comment from chatgpt-codex-connector bot Mar 11, 2026
// If the run result status is Passed but there are failed muted tests,
// the failures were suppressed due to muting.
// Therefore, we should ignore the error to prevent the build from failing.
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
Contributor

I wonder if it's worth mentioning that the pytest runner surfaces non-standard codes and early exits as RunStatusUnknown, so non-standard error codes won't be swallowed by this line.

I asked an LLM whether it's possible for a runner to produce a report while still reporting a non-standard exit code:

Yes, this is possible, and the current code would silently swallow that error. Here's how:
Consider a runner like pytest. Pytest has well-defined exit codes (https://docs.pytest.org/en/7.1.x/reference/exit-codes.html):
- 1 = tests failed
- 2 = user interrupted
- 3 = internal error
- 4 = usage error
- 5 = no tests collected
If pytest exits with code 2 (interrupted) but happened to flush a partial report before being interrupted, and that report only contains muted test failures, then:
1. runResult.Status() would be RunStatusPassed (failures are muted)
2. len(runResult.FailedMutedTests()) > 0 would be true
3. The function returns nil -- build passes
The interruption error (exit code 2) is swallowed.

But the LLM also said the individual runner (pytest) handles non-standard error codes itself and doesn't return a report:

However, looking at the actual runner implementations, pytest and pytest_pants have an early return for non-1 exit codes:
if exitError := new(exec.ExitError); errors.As(cmdErr, &exitError) && exitError.ExitCode() != 1 {
    return cmdErr
}
This means pytest/pytest_pants would bail before even parsing the report for codes != 1, so runResult would have no tests recorded, and Status() would be RunStatusUnknown, not RunStatusPassed. The muted-test suppression wouldn't trigger. So pytest is safe.

Other runners don't have this guard apparently.

But the other runners (rspec, jest, cucumber, playwright) don't have this guard. They always attempt to parse the report regardless of exit code. So if, say, rspec exits with code 7 (which is a non-standard rspec exit (https://github.com/rspec/rspec-core/issues/2956)) but produces a valid JSON report where the only failures are muted tests, this code would suppress that exit code 7 error.
In practice, exit code 7 in rspec is unlikely to produce a clean report. And most "non-standard" exit codes (signal deaths, OOM kills) would either not produce a report at all, or produce an unparseable one (triggering the parseErr path which returns cmdErr). But it's not impossible.
The gotest runner has the same early-return guard as pytest for non-1 exit codes, so it's also safe.
So the realistic risk is narrow but exists for rspec/jest/cucumber/playwright. A possible improvement would be to add a check at the caller level:
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
    // Only suppress known test-failure exit codes (typically 1)
    if exitError := new(exec.ExitError); errors.As(runErr, &exitError) && exitError.ExitCode() == 1 {
        return nil
    }
}
This would only suppress the error when the exit code is exactly 1 (the conventional "tests failed" code), letting non-standard exit codes propagate even when muted tests are present.

Contributor Author

Yes, I agree with everything the LLM said. There is still a risk of swallowing the error for other test runners, because there is no foolproof way to distinguish test failures from other failures (some runners exit with 1 for non-test failures).

So the realistic risk is narrow but exists for rspec/jest/cucumber/playwright. A possible improvement would be to add a check at the caller level

Agree that we can narrow down the risk by checking the exit code, and only swallow exit 1.

// Therefore, we should ignore the error to prevent the build from failing.
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
return nil
}
Contributor

I wonder if it's safer to not suppress signal errors? Should we move the ProcessSignaledError handling before this block so we don't miss non-1 errors?

LLM:

A normal test-failure exit code 1 would not match ProcessSignaledError

Contributor Author

Yeah good point. We should handle signal error first before suppressing the muted test failure.

Contributor

@gchan gchan left a comment

The latest commit addresses my questions and now more tightly swallows only exit code 1. I can approve if it's blocking, but it would be great if another pair of eyes worked through the signal handling logic!

Contributor

@malclocke malclocke left a comment

I have a few points to discuss, but a great change overall 👍

if runResult.Status() == runner.RunStatusFailed || runResult.Status() == runner.RunStatusError {
os.Exit(1)
if exitError := new(exec.ExitError); errors.As(runErr, &exitError) {
if exitError.ExitCode() == 1 && runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
Contributor

nit We could implement something like this on RunResult

func (r *RunResult) onlyMutedFailures() bool {
  ...
}

The check could be a little more defensive too, like len(runResult.FailedMutedTests()) == len(runResult.FailedTestsIncludingMuted())

case runner.RunStatusPassed:
fmt.Println("✅ All tests passed.")
if len(runResult.FailedMutedTests()) > 0 {
fmt.Println("✅ Build passed. Some muted tests failed.")
Contributor

Nice addition ✨

// execute tests
var timeline []api.Timeline
runResult, err := runTestsWithRetry(testRunner, &thisNodeTask.Tests, cfg.MaxRetries, testPlan.MutedTests, &timeline, cfg.RetryForMutedTest, cfg.FailOnNoTests)
runResult, runErr := runTestsWithRetry(testRunner, &thisNodeTask.Tests, cfg.MaxRetries, testPlan.MutedTests, &timeline, cfg.RetryForMutedTest, cfg.FailOnNoTests)
Contributor

question there are quite a few of these s/err/xyzErr/ in this PR, is this a gofumpt rule or something else?

fmt.Printf("Buildkite Test Engine Client: Failed to read gotestsum output, tests will not be retried: %v\n", parseErr)
// We don't want to fail the build if we fail to parse the report,
// therefore we return the command error (which can be nil), instead of the parse error.
return cmdErr
Contributor

note this is a change of behaviour, parse error was returned previously.

}

if len(r.tests) == 0 {
if len(r.tests) == 0 || r.unknownResultTestsCount() > 0 {
Contributor

thought I noticed while reviewing that this method "fails open", e.g. the fall through case is return RunResultPassed on line 152.

Maybe we should tackle that at the same time and make an explicit case for RunResultPassed?

