fix: implement Vertex AI evaluation API calls#623

Open
MarvelNwachukwu wants to merge 2 commits into main from fix-vertex-eval-random

Conversation

MarvelNwachukwu (Contributor) commented Mar 19, 2026

Description

The Vertex AI evaluation system was returning fake random scores instead of actually calling Google's evaluation API. This meant safety checks and response quality scores were completely meaningless — every evaluation just got a random number between 0.5 and 1.0, regardless of the actual content.

This fix connects the evaluation system to the real Vertex AI API so scores now reflect actual response quality and safety.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

How Has This Been Tested?

All 478 existing tests pass, and the build succeeds with no type errors.

Checklist

  • My code follows the code style of this project
  • All new and existing tests passed
  • My changes generate no new warnings

Replace Math.random() stubs in VertexAiEvalFacade with real calls to the
Vertex AI evaluateInstances REST API. Adds google-auth-library for ADC token
acquisition. Maps RESPONSE_EVALUATION_SCORE and SAFETY_V1 metrics to correct
API request/response formats.
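The metric-to-request mapping the commit message describes can be sketched as follows. This is an illustrative assumption, not the PR's exact code: the `buildEvalRequest` helper and the `instance`/`metric_spec` payload shapes are hypothetical, while the `coherence_input`/`safety_input` key names follow the mapping shown in the diff.

```typescript
// Hypothetical sketch of mapping a prebuilt metric to an
// evaluateInstances request body (not the PR's actual code).
type PrebuiltMetric = "RESPONSE_EVALUATION_SCORE" | "SAFETY_V1";

interface EvalCase {
  prompt: string;
  reference: string;
  response: string;
}

function buildEvalRequest(
  metric: PrebuiltMetric,
  evalCase: EvalCase,
): Record<string, unknown> {
  switch (metric) {
    case "RESPONSE_EVALUATION_SCORE":
      // evaluateInstances keys each request by metric, e.g. coherence_input.
      return {
        coherence_input: {
          metric_spec: {},
          instance: { prediction: evalCase.response },
        },
      };
    case "SAFETY_V1":
      return {
        safety_input: {
          metric_spec: {},
          instance: { prediction: evalCase.response },
        },
      };
    default:
      throw new Error(`Metric "${metric}" is not supported.`);
  }
}
```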

@changeset-bot

changeset-bot bot commented Mar 19, 2026

⚠️ No Changeset found

Latest commit: 4d72a3e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request transitions the Vertex AI evaluation facade from using mocked responses to making real API calls. This change enables accurate and live evaluation of metrics by integrating with the Vertex AI evaluateInstances endpoint, ensuring that the system provides genuine evaluation scores rather than simulated ones.

Highlights

  • Vertex AI API Integration: Replaced Math.random() stubs with actual calls to the Vertex AI evaluateInstances REST API for evaluation metrics.
  • Authentication: Integrated google-auth-library to acquire Application Default Credentials (ADC) tokens for secure API access.
  • Metric Mapping: Implemented proper request/response mapping for RESPONSE_EVALUATION_SCORE and SAFETY_V1 metrics when interacting with the Vertex AI API.
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request successfully replaces the mock implementation in VertexAiEvalFacade with actual calls to the Vertex AI evaluateInstances API. The changes include adding google-auth-library for authentication, constructing and sending requests to the API, and parsing the responses. My review includes a few suggestions to improve maintainability, security, and type safety. Specifically, I've pointed out an opportunity to refactor the metric key mapping, a potential security risk related to error handling of API calls, and a chance to improve type safety by avoiding any.

Comment on lines +204 to +209
const response = await axios.post(url, requestBody, {
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  },
});

Severity: high

The axios.post call is not wrapped in a specific error handler within _performEval. If the API call fails, axios throws an AxiosError which contains the full request configuration, including the Authorization header with the bearer token. This error is caught by the generic catch block in the evaluate method, and if logged, could expose the sensitive access token. This is a security risk.

It is highly recommended to wrap the API call in a try...catch block to handle AxiosError specifically, log sanitized error information, and re-throw a generic error to prevent token leakage.

Example:

let response;
try {
    response = await axios.post(url, requestBody, {
        headers: {
            Authorization: `Bearer ${accessToken}`,
            "Content-Type": "application/json",
        },
    });
} catch (error) {
    if (axios.isAxiosError(error)) {
        console.error(
            `Vertex AI API Error: ${error.message}`,
            error.response?.data ? JSON.stringify(error.response.data) : ""
        );
        throw new Error("Vertex AI API request failed.");
    }
    throw error;
}

Comment on lines +35 to +62
function getMetricInputKey(metric: PrebuiltMetrics): string {
  switch (metric) {
    case PrebuiltMetrics.RESPONSE_EVALUATION_SCORE:
      return "coherence_input";
    case PrebuiltMetrics.SAFETY_V1:
      return "safety_input";
    default:
      throw new Error(
        `Metric "${metric}" is not supported by Vertex AI evaluation.`,
      );
  }
}

/**
 * Maps a PrebuiltMetrics value to the Vertex AI evaluateInstances API result key.
 */
function getMetricResultKey(metric: PrebuiltMetrics): string {
  switch (metric) {
    case PrebuiltMetrics.RESPONSE_EVALUATION_SCORE:
      return "coherenceResult";
    case PrebuiltMetrics.SAFETY_V1:
      return "safetyResult";
    default:
      throw new Error(
        `Metric "${metric}" is not supported by Vertex AI evaluation.`,
      );
  }
}

Severity: medium

The functions getMetricInputKey and getMetricResultKey have very similar logic and can be consolidated to reduce code duplication and improve maintainability. By using a single mapping object and a lookup function, the code becomes more concise and easier to extend with new metrics.

After applying this suggestion, you will need to update the _performEval method to use the new getMetricKeys function, like this:

const { inputKey, resultKey } = getMetricKeys(metric);

const VERTEX_AI_METRIC_KEYS = {
  [PrebuiltMetrics.RESPONSE_EVALUATION_SCORE]: {
    inputKey: "coherence_input",
    resultKey: "coherenceResult",
  },
  [PrebuiltMetrics.SAFETY_V1]: {
    inputKey: "safety_input",
    resultKey: "safetyResult",
  },
} as const;

/**
 * Maps a PrebuiltMetrics value to the Vertex AI evaluateInstances API keys.
 */
function getMetricKeys(metric: PrebuiltMetrics): {
  inputKey: string;
  resultKey: string;
} {
  const keys = VERTEX_AI_METRIC_KEYS[metric as keyof typeof VERTEX_AI_METRIC_KEYS];
  if (!keys) {
    throw new Error(
      `Metric "${metric}" is not supported by Vertex AI evaluation.`,
    );
  }
  return keys;
}

  metrics: PrebuiltMetrics[],
  evalCase: { prompt: string; reference: string; response: string },
  metric: PrebuiltMetrics,
): Promise<any> {

Severity: medium

The _performEval method currently returns Promise<any>, which weakens type safety. Using a specific type for the returned promise improves code clarity, enables better autocompletion, and helps prevent potential runtime errors. You can use an inline type as suggested, or define a separate interface for the return type for better reusability.

Suggested change:
- ): Promise<any> {
+ ): Promise<{ summaryMetrics: Array<{ meanScore: number | undefined }> }> {
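For reference, a caller could consume the suggested typed response shape like this. The `parseMeanScore` helper is hypothetical and not part of the PR; it only illustrates how the explicit type avoids `any` when extracting the score.

```typescript
// Hypothetical consumer of the reviewer's suggested response type.
interface EvaluateInstancesResponse {
  summaryMetrics: Array<{ meanScore: number | undefined }>;
}

function parseMeanScore(response: EvaluateInstancesResponse): number {
  const score = response.summaryMetrics[0]?.meanScore;
  if (typeof score !== "number") {
    throw new Error("Vertex AI evaluation returned no meanScore.");
  }
  return score;
}
```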

getAccessToken() can return null when credentials aren't configured,
which would send "Bearer null" as the auth header causing a confusing
401. Now throws a clear error directing users to configure ADC.
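A minimal sketch of the guard that commit describes, under the assumption that token acquisition is wrapped in a helper; `requireAccessToken` is a hypothetical name, and the real change lives inside VertexAiEvalFacade:

```typescript
// Hypothetical guard: fail fast with a clear message instead of sending
// "Bearer null" when Application Default Credentials are not configured.
async function requireAccessToken(
  getToken: () => Promise<string | null | undefined>,
): Promise<string> {
  const token = await getToken();
  if (!token) {
    throw new Error(
      "Could not obtain a Vertex AI access token. Configure Application " +
        "Default Credentials (e.g. run `gcloud auth application-default login`).",
    );
  }
  return token;
}
```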