impl(o11y): introduce error.type #4148

Open
diegomarquezp wants to merge 9 commits into main from observability/tracing-attr/error.type

@diegomarquezp diegomarquezp commented Mar 16, 2026

This PR introduces the error.type observability attribute, which is derived from error information (i.e. Throwables).

Key Changes:

  • Introduction of ErrorTypeUtils, with a public surface, which provides a string representation of the possible error types for a given Throwable. Tracing classes such as SpanTracer can rely on it to obtain the error.type attribute from that Throwable.
  • Temporary addition of addAttribute to Span. This is because attemptFailed() and other sibling methods must modify existing Spans. This will be reverted in the future once we "flatten" the SpanTracer vs SpanManager layout.
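
A minimal sketch of the second change, assuming a span abstraction with an addAttribute(String, String) method. The names SketchSpan, RecordingSpan, SketchAttemptTracer and the classify helper are hypothetical stand-ins for illustration and are not the classes in this PR; only the "error.type" key and the addAttribute signature come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical span abstraction; only addAttribute mirrors the signature named in this PR. */
interface SketchSpan {
  void addAttribute(String key, String value);

  void end();
}

/** In-memory span used here so the sketch runs without a tracing backend. */
final class RecordingSpan implements SketchSpan {
  final Map<String, String> attributes = new LinkedHashMap<>();
  boolean ended;

  @Override
  public void addAttribute(String key, String value) {
    if (ended) {
      return; // assumption: attributes written after end() are dropped
    }
    attributes.put(key, value);
  }

  @Override
  public void end() {
    ended = true;
  }
}

/** Hypothetical tracer: a failure callback tags the existing attempt span, then ends it. */
final class SketchAttemptTracer {
  static final String ERROR_TYPE_ATTRIBUTE = "error.type";

  private final SketchSpan attemptSpan;

  SketchAttemptTracer(SketchSpan attemptSpan) {
    this.attemptSpan = attemptSpan;
  }

  void attemptFailed(Throwable error) {
    // The attribute must be set before end(); this is why the span needs a mutator.
    attemptSpan.addAttribute(ERROR_TYPE_ATTRIBUTE, classify(error));
    attemptSpan.end();
  }

  /** Placeholder classification; the real priority logic lives elsewhere. */
  private static String classify(Throwable error) {
    return error == null ? "INTERNAL" : error.getClass().getSimpleName();
  }
}
```

The point of the sketch is ordering: the failure callback receives the Throwable only after the span already exists, so the span has to expose a way to add attributes before it is ended.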

Logic

The logic is extensively explained in the javadoc of ErrorTypeUtils.extractErrorType(), but in a nutshell it consists of:

  1. Using the ErrorInfo reason as the primary source
  2. Using the server-side error code
  3. Using the client-side error as a fallback
  4. Using the exception class name
  5. Simply returning INTERNAL
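
The priority order above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ErrorTypeUtils: SketchApiException is a hypothetical stand-in for an API exception that carries an ErrorInfo reason and a status code, and the CLIENT_CONNECTION_ERROR label is assumed for the example.

```java
import java.net.ConnectException;

/** Hypothetical stand-in for an API exception carrying server-side details. */
class SketchApiException extends RuntimeException {
  final String errorInfoReason; // e.g. "RATE_LIMIT_EXCEEDED"; null when absent
  final String statusCode; // e.g. "UNAVAILABLE"; null when absent

  SketchApiException(String message, String errorInfoReason, String statusCode) {
    super(message);
    this.errorInfoReason = errorInfoReason;
    this.statusCode = statusCode;
  }
}

final class ErrorTypeSketch {
  private ErrorTypeSketch() {}

  static String extractErrorType(Throwable error) {
    if (error == null) {
      return "INTERNAL";
    }
    if (error instanceof SketchApiException) {
      SketchApiException api = (SketchApiException) error;
      // 1. The ErrorInfo reason wins when the server supplied one.
      if (api.errorInfoReason != null && !api.errorInfoReason.isEmpty()) {
        return api.errorInfoReason;
      }
      // 2. Otherwise fall back to the server-side status code.
      if (api.statusCode != null) {
        return api.statusCode;
      }
    }
    // 3. Client-side categories (only one assumed category shown here).
    if (error instanceof ConnectException) {
      return "CLIENT_CONNECTION_ERROR";
    }
    // 4. The exception class name; anonymous classes have an empty simple name.
    String simpleName = error.getClass().getSimpleName();
    if (!simpleName.isEmpty()) {
      return simpleName;
    }
    // 5. Last resort.
    return "INTERNAL";
  }
}
```

Each step returns early, so a later step runs only when every earlier source of information is missing.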

Testing

  • Added error-related test in ITOtelTracing

@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Mar 16, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability capabilities by introducing a new error.type attribute for tracing spans. This attribute provides a standardized, low-cardinality classification of errors, enabling more effective analysis and aggregation of error data. The changes include a robust error type extraction logic that prioritizes specific error reasons and codes, ensuring consistent and meaningful error reporting across different failure modes.

Highlights

  • New error.type Attribute: Introduced a new error.type attribute to provide standardized, low-cardinality error classification in observability traces.
  • Error Type Extraction Logic: Implemented a utility function ObservabilityUtils.extractErrorType to determine the appropriate error type based on a priority system (ErrorInfo.reason, server error codes, client-side errors, language-specific exceptions, or internal fallback).
  • Span Attribute Support: Extended the TraceManager.Span interface and its OpenTelemetry implementation to support adding custom attributes to spans.
  • Automated Error Type Recording: Modified SpanTracer to automatically record the error.type attribute on failed operation attempts.
  • Comprehensive Testing: Added comprehensive unit tests for SpanTracer and integration tests in ITOtelTracing to validate the correct extraction and recording of error.type for various error scenarios.

Changelog

  • gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityAttributes.java
    • Added ERROR_TYPE_ATTRIBUTE constant to define the new error type attribute key.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityUtils.java
    • Added ErrorType enum for classifying client-side network and operational errors.
    • Implemented extractErrorType static method to determine a low-cardinality error type string from a Throwable based on a defined priority order.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/OpenTelemetryTraceManager.java
    • Implemented the addAttribute method in the OtelSpan inner class to allow setting custom attributes on the underlying OpenTelemetry span.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/SpanTracer.java
    • Overrode attemptCancelled, attemptFailedDuration, attemptFailedRetriesExhausted, and attemptPermanentFailure methods to ensure error handling.
    • Introduced a private recordErrorAndEndAttempt method to add the error.type attribute to the span using ObservabilityUtils.extractErrorType before ending the attempt.
  • gax-java/gax/src/main/java/com/google/api/gax/tracing/TraceManager.java
    • Added addAttribute(String key, String value) method to the Span interface, allowing custom attributes to be attached to spans.
  • gax-java/gax/src/test/java/com/google/api/gax/tracing/SpanTracerTest.java
    • Added necessary imports for ApiException, ErrorDetails, StatusCode, ImmutableList, Any, ErrorInfo, ConnectException, and TimeoutException for testing.
    • Added numerous new test methods to verify the error.type extraction and recording logic for various error scenarios, including ErrorInfo.reason, gRPC and HTTP server error codes, and different client-side exceptions.
    • Defined several private static exception classes (CredentialsException, DecodeException, RedirectException, RequestBodyException, RequestException, UnknownClientException) to facilitate testing of client-side error types.
  • java-showcase/gapic-showcase/src/test/java/com/google/showcase/v1beta1/it/ITOtelTracing.java
    • Added imports for assertThrows, StatusCode.Code, UnavailableException, and Status for integration testing.
    • Added testTracing_failedEcho_grpc_recordsErrorType to verify that the error.type attribute is correctly recorded for gRPC failures.
    • Added testTracing_failedEcho_httpjson_recordsErrorType to verify that the error.type attribute is correctly recorded for HTTP/JSON failures.

Activity

  • Pull request created by diegomarquezp.
  • Initial implementation of error.type attribute for observability.
  • Added Javadoc for the new error type logic.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new observability attribute error.type to provide a more specific, low-cardinality classification of errors. The core logic is implemented in ObservabilityUtils.extractErrorType, which determines the error type based on a clear priority: ErrorInfo.reason, server error codes, specific client-side exceptions, and then fallbacks. The changes are well-supported by comprehensive unit tests in SpanTracerTest and integration tests in ITOtelTracing for both gRPC and HTTP/JSON transports, which is great to see.

My review includes one suggestion for a minor code simplification. Overall, this is a solid implementation.

@diegomarquezp

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new observability attribute, error.type, to provide a low-cardinality classification of errors. The core logic is implemented in the new ErrorTypeUtil class, which extracts the error type based on a priority list, including ErrorInfo.reason, server status codes, and various client-side error categories. The changes are well-tested with both unit and integration tests.

My main feedback is regarding the implementation of client-side error detection in ErrorTypeUtil. Several checks rely on string matching on exception class names or messages (e.g., using String.contains()), which is fragile and can lead to incorrect error classification. I've left specific comments with suggestions to improve the robustness of these checks.

Comment on lines +246 to +248
private static boolean isClientRedirectError(Throwable e) {
  return e.getMessage() != null && e.getMessage().contains("redirect");
}

Contributor

high

Relying on the content of an exception message with e.getMessage().contains("redirect") is highly fragile. Exception messages are not a stable API; they can be changed, might not be present, or could be localized, which would break this logic.

It would be much more robust to check for a specific exception type that indicates a redirect error. If no such standard exception exists, this approach has a high risk of being unreliable and should be documented as such.
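
To illustrate the concern, here is a minimal sketch comparing the message-based check with a type-based one. SketchRedirectException is hypothetical and exists only for this example (no such exception is claimed to exist in gax), and the depth limit of 16 is an arbitrary assumption to guard against cyclic cause chains.

```java
/** Hypothetical exception type used only to illustrate a type-based check. */
class SketchRedirectException extends RuntimeException {
  SketchRedirectException(String message) {
    super(message);
  }
}

final class RedirectCheckSketch {
  private RedirectCheckSketch() {}

  /** Mirrors the message-based heuristic quoted above: case-sensitive substring match. */
  static boolean byMessage(Throwable e) {
    return e.getMessage() != null && e.getMessage().contains("redirect");
  }

  /** Type-based alternative: walks the cause chain looking for a dedicated type. */
  static boolean byType(Throwable e) {
    Throwable current = e;
    for (int depth = 0; current != null && depth < 16; depth++) {
      if (current instanceof SketchRedirectException) {
        return true;
      }
      current = current.getCause();
    }
    return false;
  }
}
```

With these definitions, a message such as "Too many Redirects" is missed by byMessage because of the capital letter, while any unrelated exception whose message happens to contain "redirect" is matched. byType is unaffected by wording but depends on a dedicated exception type being thrown.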

Contributor Author

There are no redirect-specific exceptions in gax. This is only a heuristic to comply with the higher level effort.

Comment on lines +281 to +283
private static boolean isClientUnknownError(Throwable e) {
  return e.getClass().getName().toLowerCase().contains("unknown");
}

Contributor

high

Matching unknown in a class name is very broad and risky. It could lead to misclassifying exceptions. For example, an exception from another library with unknown in its name would be incorrectly categorized as CLIENT_UNKNOWN_ERROR.

Given that UnknownHostException is already handled by isClientConnectionError, this check seems too generic. Please consider making this check more specific to the intended exception types or removing it if it's a speculative catch-all.
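
A small sketch of the misclassification risk, assuming two made-up exception types: UnknownFieldException stands in for an exception from an unrelated library, and SketchUnknownClientException stands in for the type the check is actually meant to catch. Neither is a real class from gax or from this PR.

```java
import java.util.Locale;

/** Hypothetical exception from an unrelated library. */
class UnknownFieldException extends RuntimeException {}

/** Hypothetical type the check is actually meant to catch. */
class SketchUnknownClientException extends RuntimeException {}

final class UnknownCheckSketch {
  private UnknownCheckSketch() {}

  /** Mirrors the substring heuristic quoted above (Locale.ROOT added for determinism). */
  static boolean bySubstring(Throwable e) {
    return e.getClass().getName().toLowerCase(Locale.ROOT).contains("unknown");
  }

  /** Narrower alternative: match only the intended type and its subclasses. */
  static boolean byType(Throwable e) {
    return e instanceof SketchUnknownClientException;
  }
}
```

Here bySubstring matches UnknownFieldException and java.net.UnknownHostException as well as the intended type, so the result depends on which checks ran before it. byType matches only the intended type.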

Contributor Author

Agreed, we don't have a reliable heuristic that also prevents external code from being caught here. In favor of the current approach: this is the last case to be handled.
One alternative: do not handle CLIENT_UNKNOWN_ERROR.

@diegomarquezp diegomarquezp changed the title from "impl(o11y): [wip] introduce error.type'" to "impl(o11y): [wip] introduce error.type" Mar 17, 2026
@diegomarquezp diegomarquezp changed the title from "impl(o11y): [wip] introduce error.type" to "impl(o11y): introduce error.type" Mar 17, 2026
@diegomarquezp diegomarquezp marked this pull request as ready for review March 17, 2026 20:02
@sonarqubecloud

Quality Gate failed for 'java_showcase_integration_tests'

Failed conditions
31.6% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud

if (isRequestBodyError(error)) {
  return ErrorType.CLIENT_REQUEST_BODY_ERROR.toString();
}
if (isClientUnknownError(error)) {

Contributor

I thought this was the default option, as in there shouldn't be a check here?

Contributor Author

@diegomarquezp diegomarquezp Mar 17, 2026

So the fallback sequence is "reason", then "server error", then "client error", then "exception name", then "INTERNAL".
This class is not only for "client error" but for the whole fallback sequence, so the flow needs to keep checking the remaining cases, such as exception name and INTERNAL.

@ldetmer ldetmer left a comment

looks good, just one question
