Query regarding Embedding Normalization and Similarity Score Range for PE-AV model #108

@Preet-Sojitra

Hi there,

I am using facebook/pe-av-large following the example code provided in the model card (using the dot product: audio_embeds @ visual_embeds.T).

I noticed that the resulting similarity scores often exceed 1.0 (e.g., I am seeing scores around 1.1). This suggests the embeddings are not L2-normalized by default.

  1. Are the embeddings intended to be used as unnormalized dot products?
  2. Is there a known range for these scores? I am trying to set a threshold to separate "good" pairs from "bad" ones. Should I L2-normalize the embeddings manually so that the scores can be interpreted as cosine similarity (range -1 to 1)?
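
For reference, here is a minimal sketch of the manual normalization I have in mind. It uses NumPy with random stand-in arrays rather than real model outputs; the helper names `l2_normalize` and `cosine_similarity_matrix` and the array shapes are my own assumptions for illustration, not part of the PE-AV API.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against zero rows)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    return l2_normalize(a) @ l2_normalize(b).T


# Stand-in embeddings (hypothetical shapes), NOT real PE-AV outputs.
rng = np.random.default_rng(0)
audio_embeds = rng.normal(size=(4, 16))
visual_embeds = rng.normal(size=(3, 16))

# Raw dot product, as in the model card example: unbounded in general.
raw_scores = audio_embeds @ visual_embeds.T

# Row norms far from 1.0 indicate the embeddings are not unit-length.
audio_norms = np.linalg.norm(audio_embeds, axis=-1)

# After normalization every score lies in [-1, 1].
cos_scores = cosine_similarity_matrix(audio_embeds, visual_embeds)
```

With PyTorch tensors, `torch.nn.functional.normalize(x, p=2, dim=-1)` performs the same row-wise scaling. Whether a threshold on the normalized scores is meaningful for this model is exactly what I am asking about.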

Thanks!
