Skip to content

Fix/result embedding data encoding#243

Open
swarnaprakash wants to merge 1 commit into
valkey-io:mainfrom
swarnaprakash:fix/result-embedding-data-encoding
Open

Fix/result embedding data encoding#243
swarnaprakash wants to merge 1 commit into
valkey-io:mainfrom
swarnaprakash:fix/result-embedding-data-encoding

Conversation

@swarnaprakash
Copy link
Copy Markdown

@swarnaprakash swarnaprakash commented Nov 5, 2025

Pull Request check-list

  • [Y] Do tests and lints pass with this change? Ran linter and corrected changed lines
  • [Y ] Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)? See https://github.com/swarnaprakash/valkey-py/actions/runs/19217879094 . The failed tests are unrelated to this change
  • [Y] Is the new or changed code fully tested? added tests for the new methods and enabled it (even though existing tests don't pass)
  • [Y] Is a documentation update included (if this change modifies existing APIs, or introduces new ones)? T_he docstring is updated withe new parameters_
  • [Y] Is there an example added to the examples folder (if applicable)? NA

Description of change

See #242

The Result class in valkey/commands/search/result.py inappropriately applies UTF-8 decoding to all field values, including binary vector data. This corrupts VECTOR field embeddings and makes valkey-py unsuitable for vector search applications.

The change Adds preserve_bytes and binary_fields parameters to search methods to prevent UTF-8 decoding from corrupting VECTOR field embeddings and other binary data.
The Result class was inappropriately applying UTF-8 decoding to all field values, including binary vector embeddings. This corrupted FLOAT32 vector data and made valkey-py unsuitable for vector search applications.

Adds preserve_bytes and binary_fields parameters to search methods to prevent
UTF-8 decoding from corrupting VECTOR field embeddings and other binary data.

The Result class was inappropriately applying UTF-8 decoding to all field values,
including binary vector embeddings. This corrupted FLOAT32 vector data and made
valkey-py unsuitable for vector search applications.

Changes:
- Add preserve_bytes parameter to search() methods (default: False for backward compatibility)
- Add binary_fields parameter for selective field preservation
- Implement to_string_or_bytes() utility for conditional binary preservation
- Update Result class to handle binary preservation during field processing
- Add comprehensive tests for binary preservation functionality

The fix maintains full backward compatibility while enabling proper vector search
support when preserve_bytes=True is specified.

Fixes vector search corruption where binary embeddings were being decoded as
UTF-8 strings with 'ignore' error handling, silently dropping bytes and
corrupting the vector data.

Signed-off-by: Swarnaprakash Udayakumar <swarnap@amazon.com>
@mkmkme
Copy link
Copy Markdown
Collaborator

mkmkme commented Nov 11, 2025

@amirreza8002 is this by any chance something you got to work with?
@bogdanp05 I understand it's very unlikely you have worked with this before, but could you also have a look when you have time please?

@mkmkme
Copy link
Copy Markdown
Collaborator

mkmkme commented Nov 11, 2025

Also keep in mind approving this PR is slightly more problematic due to the fact valkey-search is not yet supported in the CI and the pytest skips that. I'll try to see if it can be easily fixed.

@swarnaprakash
Copy link
Copy Markdown
Author

Also keep in mind approving this PR is slightly more problematic due to the fact valkey-search is not yet supported in the CI and the pytest skips that. I'll try to see if it can be easily fixed.

@mkmkme I can make an attempt to fix the tests. First I want to clarify the expectation. Is valkey-py expected to be backward compatible with redis? specifically w.r.t search module is the expectation still backward compatibility with Redis search. I understand redis supports different data types (TEXT instead of TAG for example). Wondering if I can just delete/modify these or I need to keep them and make changes such that both redis search and valkey-search are supported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

2 participants