Query regarding Embedding Normalization and Similarity Score Range for PE-AV model #108

Hi there,

I am using [facebook/pe-av-large](https://huggingface.co/facebook/pe-av-large) following the example code provided in the model card (using the dot product: `audio_embeds @ visual_embeds.T`).

I noticed that the resulting similarity scores often exceed 1.0 (e.g., I am seeing scores around 1.1). This suggests the embeddings are not L2-normalized by default.

1. Are the embeddings intended to be used as unnormalized dot products?
2. Is there a known range for these scores? I am trying to set a threshold to separate "good" from "bad" pairs. Should I manually L2-normalize the embeddings so that the scores can be interpreted as cosine similarity (-1 to 1)?
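
For reference, here is a minimal sketch of what I mean by manual L2-normalization. It uses NumPy with random stand-in arrays rather than real model outputs; the helper `l2_normalize`, the variable names, and the `(num_items, embed_dim)` shapes are illustrative assumptions, not part of the PE-AV API.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against all-zero rows)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

# Random stand-ins for the model outputs; shapes (num_items, embed_dim) are assumed.
rng = np.random.default_rng(0)
audio_embeds = rng.normal(size=(4, 8))
visual_embeds = rng.normal(size=(5, 8))

# Unnormalized dot products: unbounded, since they scale with the embedding norms.
raw_scores = audio_embeds @ visual_embeds.T

# Cosine similarity: dot products of unit-norm rows, bounded to [-1, 1].
cos_scores = l2_normalize(audio_embeds) @ l2_normalize(visual_embeds).T
```

After normalization every score is bounded by 1 in absolute value, so a fixed threshold would at least be comparable across pairs.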

Thanks!
