
[Feature Request] Support Uint8Array in EmbedResponse Schema for Native Binary Vector Quantization #5048

@CatoPaus

Description


Is your feature request related to a problem? Please describe. I am frustrated by the memory and bandwidth overhead of storing high-dimensional vector embeddings, specifically when using binary (sign-bit) quantization. Currently, Genkit's @genkit-ai/ai core types restrict the EmbedResponse schema to a number[] array.

This means that even if a developer builds a middleware adapter to collapse high-precision Float32 embeddings down to single bits (0 and 1), the Genkit schema forces us to transport and serialize those bits as standard JSON numbers (64-bit JavaScript floats). This negates the memory benefit of quantization: there is no compression in transit or in storage, even though the math itself is perfectly viable.

Describe the solution you'd like I would like the Genkit core architecture to officially support binary vector transport.

1. Schema Expansion: the EmbedResponse and Document embedding types should accept packed binary encodings (such as Uint8Array, or an Int8 representation) in addition to number[].

2. Retriever Adapters: update or introduce Retriever plugins for databases that natively support Hamming-distance indexing (e.g., Qdrant, Milvus, Pinecone Serverless) so they map and ingest these packed byte arrays out of the box.
This would let Genkit developers achieve a 32x reduction in vector database storage costs and network bandwidth (each 32-bit float dimension becomes a single bit) when running semantic retrieval.
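The distance metric such a retriever adapter would index is straightforward on packed bytes. A sketch (the function name is illustrative, not part of any Genkit plugin):

```typescript
// Hamming distance between two bit-packed vectors: XOR each byte pair
// and count the set bits that differ.
function hammingDistance(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {      // Kernighan's trick: clear the lowest set bit
      x &= x - 1;
      dist++;
    }
  }
  return dist;
}

const a = Uint8Array.from([0b10110101, 0b00001111]);
const b = Uint8Array.from([0b10110100, 0b11110000]);
console.log(hammingDistance(a, b)); // differs in 1 + 8 bits → 9
```

Databases with native binary indexes compute this with hardware popcount, which is why keeping the vectors packed end to end matters.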

Describe alternatives you've considered I considered manually Base64-encoding the packed bytes before passing them into Genkit, but the TypeScript Zod validators reject anything that isn't a strict numeric array. As a workaround, I currently submit raw JavaScript arrays of zeros and ones ([0, 1, 0, 0, 1]...), which inflates the payload right back up to float-scale sizes inside the database.
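A rough back-of-envelope illustration of that inflation, comparing the JSON wire size of a 0/1 number array against the packed byte count for the same vector:

```typescript
// 1024 one-bit dimensions: JSON number array vs. packed bytes.
const bits = Array.from({ length: 1024 }, (_, i) => (i % 3 === 0 ? 1 : 0));
const jsonBytes = JSON.stringify(bits).length;  // brackets + digits + commas
const packedBytes = Math.ceil(bits.length / 8); // 128 bytes if packed
console.log(jsonBytes, packedBytes, (jsonBytes / packedBytes).toFixed(1));
```

Even in this best case (every element a single digit), the JSON array is ~16x larger on the wire, before accounting for how the database stores the elements internally.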

Additional context I have already put together an experimental RFC/proof-of-concept repository demonstrating how the evaluation logic for this behaves.

I built a custom node-addon-api C++ quantizer that intercepts Genkit's Float32 arrays and evaluates their sign bits in native code, avoiding Node.js V8 garbage-collection pressure. The math and the Genkit semantic retrieval logic work correctly end to end. You can view the implementation, including the Next.js integration workarounds, here:

Proof of Concept Repository: https://github.com/CatoPaus/genkit-turboquant-embedder

If the Genkit team exposes the necessary schema structure, this repository is ready to pipe the resulting bit-packed binaries straight to the cloud.
