
[TE-5339] Fix test command error handling#455

Open
nprizal wants to merge 9 commits into main from te-5339-3-persona-false-pass-when-vitest-exits-non-zero

Conversation


@nprizal nprizal commented Mar 9, 2026

Problem

There were recurring bugs where bktec reported a build as passed even though the test runner exited with a non-zero code. This is dangerous: bktec should exit with a non-zero code if there are any failures, so customers can take action on them.

We've attempted to fix the issues before, for example:

Those fixes addressed specific scenarios for individual runners but didn’t address the underlying problem.

The root cause is that bktec swallows the runner's exit error after parsing the report. This was originally done to avoid failing the build when the only failures come from muted tests, but it has the side effect of hiding legitimate non-test failures.

Changes

This PR reworks the error handling across all runners to consistently propagate the command error instead of swallowing it. Each runner now follows the same approach:

  • execute the test command
  • parse the report regardless of the exit code/error from the test command
  • if parsing the report fails, return the error from the command (if any)
  • otherwise, process the report and update the RunResult
  • return error from the test command (if any)

On the caller side (the main package), if the run result is Passed but there are failed muted tests, we can assume the error originates from those muted test failures, so we ignore it to prevent the build from failing. Otherwise the error propagates and fails the build.

Test

I've tested locally against RSpec. We should test it in the examples pipeline before releasing this.

// Print muted and failed tests
mutedTests := runResult.MutedTests()
if len(mutedTests) > 0 {
if statistics.Total > 0 {
Contributor Author

The printReport function is now called regardless of the error and run result, but the statistics will only be printed if actual test results were recorded in the result object.

@nprizal nprizal force-pushed the te-5339-3-persona-false-pass-when-vitest-exits-non-zero branch from f4f39ca to 94b9e8d Compare March 9, 2026 02:47
@nprizal nprizal changed the title Te 5339 3 persona false pass when vitest exits non zero [TE-5339] Fix the handling of error from test command Mar 9, 2026
@nprizal nprizal changed the title [TE-5339] Fix the handling of error from test command [TE-5339] Fix test command error handling Mar 9, 2026
@nprizal nprizal self-assigned this Mar 9, 2026
@nprizal nprizal marked this pull request as ready for review March 9, 2026 03:58
@nprizal nprizal requested a review from a team as a code owner March 9, 2026 03:58
Member

pda commented Mar 9, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 814a12994c


@nprizal nprizal requested review from malclocke and pda March 9, 2026 23:06
@buildkite buildkite deleted a comment from chatgpt-codex-connector bot Mar 11, 2026
// If the run result status is Passed but there are failed muted tests,
// the failures were suppressed due to muting.
// Therefore, we should ignore the error to prevent the build from failing.
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
Contributor

I wonder if it's worth mentioning that the pytest runner surfaces non-standard codes and early exits as RunStatusUnknown, so non-standard error codes won't be swallowed by this line.

I asked an LLM whether it's possible for a runner to produce a report while still reporting a non-standard exit code:

Yes, this is possible, and the current code would silently swallow that error. Here's how:
Consider a runner like pytest. Pytest has well-defined exit codes (https://docs.pytest.org/en/7.1.x/reference/exit-codes.html):
- 1 = tests failed
- 2 = user interrupted
- 3 = internal error
- 4 = usage error
- 5 = no tests collected
If pytest exits with code 2 (interrupted) but happened to flush a partial report before being interrupted, and that report only contains muted test failures, then:
1. runResult.Status() would be RunStatusPassed (failures are muted)
2. len(runResult.FailedMutedTests()) > 0 would be true
3. The function returns nil -- build passes
The interruption error (exit code 2) is swallowed.

But the LLM also said the individual runner (pytest) handles non-standard error codes itself and doesn't return a report:

However, looking at the actual runner implementations, pytest and pytest_pants have an early return for non-1 exit codes:
if exitError := new(exec.ExitError); errors.As(cmdErr, &exitError) && exitError.ExitCode() != 1 {
    return cmdErr
}
This means pytest/pytest_pants would bail before even parsing the report for codes != 1, so runResult would have no tests recorded, and Status() would be RunStatusUnknown, not RunStatusPassed. The muted-test suppression wouldn't trigger. So pytest is safe.

Other runners don't have this guard apparently.

But the other runners (rspec, jest, cucumber, playwright) don't have this guard. They always attempt to parse the report regardless of exit code. So if, say, rspec exits with code 7 (which is a non-standard rspec exit (https://github.com/rspec/rspec-core/issues/2956)) but produces a valid JSON report where the only failures are muted tests, this code would suppress that exit code 7 error.
In practice, exit code 7 in rspec is unlikely to produce a clean report. And most "non-standard" exit codes (signal deaths, OOM kills) would either not produce a report at all, or produce an unparseable one (triggering the parseErr path which returns cmdErr). But it's not impossible.
The gotest runner has the same early-return guard as pytest for non-1 exit codes, so it's also safe.
So the realistic risk is narrow but exists for rspec/jest/cucumber/playwright. A possible improvement would be to add a check at the caller level:
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
    // Only suppress known test-failure exit codes (typically 1)
    if exitError := new(exec.ExitError); errors.As(runErr, &exitError) && exitError.ExitCode() == 1 {
        return nil
    }
}
This would only suppress the error when the exit code is exactly 1 (the conventional "tests failed" code), letting non-standard exit codes propagate even when muted tests are present.

Contributor Author

Yes, I agree with everything the LLM said. There is still a risk of swallowing the error for other test runners, because there is no foolproof way to distinguish test failures from other failures (some runners exit with 1 for non-test failures).

So the realistic risk is narrow but exists for rspec/jest/cucumber/playwright. A possible improvement would be to add a check at the caller level

Agree that we can narrow down the risk by checking the exit code, and only swallow exit 1.

// Therefore, we should ignore the error to prevent the build from failing.
if runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
return nil
}
Contributor

I wonder if it's safer to not suppress signal errors? Should we move the ProcessSignaledError handling before this block so we don't miss non-1 errors?

LLM:

A normal test-failure exit code 1 would not match ProcessSignaledError

Contributor Author

Yeah good point. We should handle signal error first before suppressing the muted test failure.

Contributor

@gchan gchan left a comment

The latest commit addresses my questions and now more tightly swallows only exit code 1. I can approve if it's blocking, but it would be great if another pair of eyes worked through the signal handling logic!

Contributor

@malclocke malclocke left a comment

I have a few points to discuss, but a great change overall 👍

if runResult.Status() == runner.RunStatusFailed || runResult.Status() == runner.RunStatusError {
os.Exit(1)
if exitError := new(exec.ExitError); errors.As(runErr, &exitError) {
if exitError.ExitCode() == 1 && runResult.Status() == runner.RunStatusPassed && len(runResult.FailedMutedTests()) > 0 {
Contributor

nit We could implement something like this on RunResult

func (r *RunResult) onlyMutedFailures() bool {
  ...
}

The check could be a little more defensive too, like len(runResult.FailedMutedTests()) == len(runResult.FailedTestsIncludingMuted())

case runner.RunStatusPassed:
fmt.Println("✅ All tests passed.")
if len(runResult.FailedMutedTests()) > 0 {
fmt.Println("✅ Build passed. Some muted tests failed.")
Contributor

Nice addition ✨

// execute tests
var timeline []api.Timeline
runResult, err := runTestsWithRetry(testRunner, &thisNodeTask.Tests, cfg.MaxRetries, testPlan.MutedTests, &timeline, cfg.RetryForMutedTest, cfg.FailOnNoTests)
runResult, runErr := runTestsWithRetry(testRunner, &thisNodeTask.Tests, cfg.MaxRetries, testPlan.MutedTests, &timeline, cfg.RetryForMutedTest, cfg.FailOnNoTests)
Contributor

question there are quite a few of these s/err/xyzErr/ in this PR, is this a gofumpt rule or something else?

fmt.Printf("Buildkite Test Engine Client: Failed to read gotestsum output, tests will not be retried: %v\n", parseErr)
// We don't want to fail the build if we fail to parse the report,
// therefore we return the command error (which can be nil), instead of the parse error.
return cmdErr
Contributor

note this is a change of behaviour, parse error was returned previously.

}

if len(r.tests) == 0 {
if len(r.tests) == 0 || r.unknownResultTestsCount() > 0 {
Contributor

thought I noticed while reviewing that this method "fails open", e.g. the fall through case is return RunResultPassed on line 152.

Maybe we should tackle that at the same time and make an explicit case for RunResultPassed?

