fix: implement Vertex AI evaluation API calls #623
MarvelNwachukwu wants to merge 2 commits into main from
Conversation
Replace Math.random() stubs in VertexAiEvalFacade with real calls to the Vertex AI evaluateInstances REST API. Adds google-auth-library for ADC token acquisition. Maps RESPONSE_EVALUATION_SCORE and SAFETY_V1 metrics to correct API request/response formats.
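As a rough illustration of the metric mapping described above, here is a minimal sketch of how a request body for the RESPONSE_EVALUATION_SCORE (coherence) metric might be assembled. The function name `buildCoherenceRequest` and the exact nesting (`coherence_input`, `metric_spec`, `instance`) are assumptions based on the key names quoted later in this review, not the authoritative Vertex AI schema.

```typescript
// Hypothetical sketch (not code from this PR): builds the evaluateInstances
// request body for the coherence metric using the "coherence_input" key
// this PR maps RESPONSE_EVALUATION_SCORE onto.
interface EvalCase {
  prompt: string;
  reference: string;
  response: string;
}

function buildCoherenceRequest(evalCase: EvalCase) {
  return {
    coherence_input: {
      metric_spec: {},
      instance: {
        // The model output under evaluation.
        prediction: evalCase.response,
      },
    },
  };
}

// The body would then be POSTed to the regional evaluateInstances endpoint.
const body = buildCoherenceRequest({
  prompt: "Summarize the report.",
  reference: "A one-paragraph summary.",
  response: "The report covers Q3 revenue growth.",
});
console.log(Object.keys(body)[0]); // "coherence_input"
```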
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request transitions the Vertex AI evaluation facade from using mocked responses to making real API calls, enabling accurate and live evaluation of metrics by integrating with the Vertex AI evaluateInstances API.
Code Review
This pull request successfully replaces the mock implementation in VertexAiEvalFacade with actual calls to the Vertex AI evaluateInstances API. The changes include adding google-auth-library for authentication, constructing and sending requests to the API, and parsing the responses. My review includes a few suggestions to improve maintainability, security, and type safety. Specifically, I've pointed out an opportunity to refactor the metric key mapping, a potential security risk related to error handling of API calls, and a chance to improve type safety by avoiding any.
const response = await axios.post(url, requestBody, {
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  },
});
The axios.post call is not wrapped in a specific error handler within _performEval. If the API call fails, axios throws an AxiosError which contains the full request configuration, including the Authorization header with the bearer token. This error is caught by the generic catch block in the evaluate method, and if logged, could expose the sensitive access token. This is a security risk.
It is highly recommended to wrap the API call in a try...catch block to handle AxiosError specifically, log sanitized error information, and re-throw a generic error to prevent token leakage.
Example:
let response;
try {
response = await axios.post(url, requestBody, {
headers: {
Authorization: `Bearer ${accessToken}`,
"Content-Type": "application/json",
},
});
} catch (error) {
if (axios.isAxiosError(error)) {
console.error(
`Vertex AI API Error: ${error.message}`,
error.response?.data ? JSON.stringify(error.response.data) : ""
);
throw new Error("Vertex AI API request failed.");
}
throw error;
}| function getMetricInputKey(metric: PrebuiltMetrics): string { | ||
| switch (metric) { | ||
| case PrebuiltMetrics.RESPONSE_EVALUATION_SCORE: | ||
| return "coherence_input"; | ||
| case PrebuiltMetrics.SAFETY_V1: | ||
| return "safety_input"; | ||
| default: | ||
| throw new Error( | ||
| `Metric "${metric}" is not supported by Vertex AI evaluation.`, | ||
| ); | ||
| } | ||
| } | ||
/**
 * Maps a PrebuiltMetrics value to the Vertex AI evaluateInstances API result key.
 */
function getMetricResultKey(metric: PrebuiltMetrics): string {
  switch (metric) {
    case PrebuiltMetrics.RESPONSE_EVALUATION_SCORE:
      return "coherenceResult";
    case PrebuiltMetrics.SAFETY_V1:
      return "safetyResult";
    default:
      throw new Error(
        `Metric "${metric}" is not supported by Vertex AI evaluation.`,
      );
  }
}
The functions getMetricInputKey and getMetricResultKey have very similar logic and can be consolidated to reduce code duplication and improve maintainability. By using a single mapping object and a lookup function, the code becomes more concise and easier to extend with new metrics.
After applying this suggestion, you will need to update the _performEval method to use the new getMetricKeys function, like this:
const { inputKey, resultKey } = getMetricKeys(metric);

const VERTEX_AI_METRIC_KEYS = {
[PrebuiltMetrics.RESPONSE_EVALUATION_SCORE]: {
inputKey: "coherence_input",
resultKey: "coherenceResult",
},
[PrebuiltMetrics.SAFETY_V1]: {
inputKey: "safety_input",
resultKey: "safetyResult",
},
} as const;
/**
* Maps a PrebuiltMetrics value to the Vertex AI evaluateInstances API keys.
*/
function getMetricKeys(metric: PrebuiltMetrics): {
inputKey: string;
resultKey: string;
} {
const keys =
VERTEX_AI_METRIC_KEYS[
metric as keyof typeof VERTEX_AI_METRIC_KEYS
];
if (!keys) {
throw new Error(
'Metric "' + metric + '" is not supported by Vertex AI evaluation.',
);
}
return keys;
}| metrics: PrebuiltMetrics[], | ||
| evalCase: { prompt: string; reference: string; response: string }, | ||
| metric: PrebuiltMetrics, | ||
| ): Promise<any> { |
The _performEval method currently returns Promise<any>, which weakens type safety. Using a specific type for the returned promise improves code clarity, enables better autocompletion, and helps prevent potential runtime errors. You can use an inline type as suggested, or define a separate interface for the return type for better reusability.
- ): Promise<any> {
+ ): Promise<{ summaryMetrics: Array<{ meanScore: number | undefined }> }> {
getAccessToken() can return null when credentials aren't configured, which would send "Bearer null" as the auth header causing a confusing 401. Now throws a clear error directing users to configure ADC.
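The guard described above could be sketched as follows. `requireAccessToken` is a hypothetical name introduced here for illustration; in the actual change the check would live wherever the facade calls `getAccessToken()`.

```typescript
// Hypothetical sketch of the null-token guard described above. The real
// token comes from google-auth-library's getAccessToken(); this models
// only the check, not the credential acquisition.
function requireAccessToken(token: string | null | undefined): string {
  if (!token) {
    throw new Error(
      "Could not obtain a Google Cloud access token. " +
        "Configure Application Default Credentials (ADC), e.g. by running " +
        "`gcloud auth application-default login` or setting GOOGLE_APPLICATION_CREDENTIALS.",
    );
  }
  return token; // now safe to interpolate into the Authorization header
}
```

This turns the confusing downstream 401 from a literal "Bearer null" header into an immediate, actionable error.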
Description
The Vertex AI evaluation system was returning fake random scores instead of actually calling Google's evaluation API. This meant safety checks and response quality scores were completely meaningless — every evaluation just got a random number between 0.5 and 1.0, regardless of the actual content.
This fix connects the evaluation system to the real Vertex AI API so scores now reflect actual response quality and safety.
Type of Change
How Has This Been Tested?
All existing 478 tests pass. Build succeeds with no type errors.
Checklist