It would be great to support embeddings quantized to Int8, as described in the HuggingFace post "Embedding Quantization".
A potential implementation would be to:
- Define an embedder (<:AbstractEmbedder for get_embeddings) and the corresponding finder (<:AbstractSimilarityFinder for find_similar). Both would carry min_values and max_values vector fields holding the effective range of each embedding dimension (e.g., length(min_values) == length(max_values) == D)
- Define the get_embeddings and find_similar methods for these types
- The conversion to Int8 could be done post hoc (after build_index) via a utility function that returns the quantized index together with a finder holding the ranges needed to convert query embeddings to Int8 (the finder would then be passed to airag)
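- It should implement the two-stage pass with rescore_multiplier=4: first retrieve rescore_multiplier x top_k candidates using Int8 query and document embeddings, then rescore those candidates with the Float query against the Int8 document embeddings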
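
To make the proposal concrete, here is a minimal sketch of the per-dimension quantization and the two-stage search. All names in it (Int8Range, calibrate, quantize, find_similar_int8) are hypothetical and not part of the PromptingTools.jl API. It assumes embeddings are stored as a D x N matrix with one embedding per column, and it rounds and clamps when converting to Int8, which is one possible choice. A real implementation would hang this logic off the AbstractEmbedder and AbstractSimilarityFinder subtypes instead.

```julia
using LinearAlgebra

# Hypothetical container for the per-dimension ranges (length D each).
struct Int8Range
    min_values::Vector{Float32}
    max_values::Vector{Float32}
end

# Compute the per-dimension ranges from a D x N matrix of float embeddings.
function calibrate(emb::AbstractMatrix{<:Real})
    return Int8Range(vec(minimum(emb; dims = 2)), vec(maximum(emb; dims = 2)))
end

# Map each dimension linearly from [min, max] onto the Int8 range [-128, 127].
function quantize(emb::AbstractVecOrMat{<:Real}, r::Int8Range)
    steps = (r.max_values .- r.min_values) ./ 255
    steps = map(s -> s > 0 ? s : one(s), steps)  # guard against constant dimensions
    scaled = (emb .- r.min_values) ./ steps .- 128
    return Int8.(clamp.(round.(scaled), -128, 127))
end

# Two-stage search: coarse Int8 x Int8 pass, then Float x Int8 rescoring.
function find_similar_int8(docs_q::AbstractMatrix{Int8}, query::AbstractVector{<:Real},
        r::Int8Range; top_k::Int = 5, rescore_multiplier::Int = 4)
    query_q = quantize(query, r)
    # Stage 1: accumulate in Int32 so the Int8 products do not overflow.
    coarse = Int32.(docs_q)' * Int32.(query_q)
    n_candidates = min(top_k * rescore_multiplier, size(docs_q, 2))
    candidates = partialsortperm(coarse, 1:n_candidates; rev = true)
    # Stage 2: rescore only the candidates with the original float query.
    fine = Float32.(docs_q[:, candidates])' * Float32.(query)
    order = sortperm(fine; rev = true)[1:min(top_k, n_candidates)]
    return candidates[order], fine[order]
end

# Usage on a toy 2 x 3 index (columns are documents).
emb = Float32[0.0 1.0 0.5; 1.0 0.0 0.5]
r = calibrate(emb)
docs_q = quantize(emb, r)
ids, scores = find_similar_int8(docs_q, Float32[1.0, 0.0], r; top_k = 1)
```

Keeping the ranges in a separate struct mirrors the idea above: the same min_values and max_values computed from the indexed documents are reused to quantize every incoming query, so the index and the finder stay consistent.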
Original issue from PromptingTools.jl: svilupp/PromptingTools.jl#118