bug: DocumentFileProcessor skips duplicate deletion due to missing owner_user_id #1773

@coderabbitai

Summary

In DocumentFileProcessor.process_item (src/models/processors.py), the call to delete_document_by_filename does not pass owner_user_id:

await self.delete_document_by_filename(original_filename, opensearch_client)

However, delete_document_by_filename returns early when owner_user_id is None or missing (see lines 176–181). As a result, when replace_duplicates=True, the deletion step is silently skipped and stale chunks from the old document remain in the index alongside the newly ingested document.
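The failure mode can be sketched in isolation. The snippet below is a simplified stand-in, not the code in src/models/processors.py: FakeIndex, its delete_by_filename method, and the exact signature of delete_document_by_filename are assumptions made for illustration. Only the guard on owner_user_id mirrors what the issue describes.

```python
import asyncio
from typing import Optional


class FakeIndex:
    """In-memory stand-in for the OpenSearch index (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: list[dict] = []

    async def delete_by_filename(self, filename: str, owner_user_id: str) -> int:
        # Remove every chunk matching both filename and owner; return the count.
        before = len(self.chunks)
        self.chunks = [
            c for c in self.chunks
            if not (c["filename"] == filename and c["owner"] == owner_user_id)
        ]
        return before - len(self.chunks)


async def delete_document_by_filename(
    filename: str, index: FakeIndex, owner_user_id: Optional[str] = None
) -> int:
    # Assumed shape of the guard described in the issue: no owner, no deletion.
    if owner_user_id is None:
        return 0
    return await index.delete_by_filename(filename, owner_user_id)


async def demo() -> tuple[int, int]:
    index = FakeIndex()
    index.chunks = [{"filename": "a.pdf", "owner": "u1", "text": "old"}]
    # Call as written today: owner_user_id omitted, so nothing is deleted.
    skipped = await delete_document_by_filename("a.pdf", index)
    # Call with the owner forwarded: the stale chunk is removed.
    deleted = await delete_document_by_filename("a.pdf", index, owner_user_id="u1")
    return skipped, deleted


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (0, 1)
```

Under these assumptions, the likely fix is a one-argument change at the call site in process_item: forward the owner_user_id that the processor already holds.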

Steps to Reproduce

  1. Upload a file that is already indexed under the same filename via the DocumentFileProcessor path with replace_duplicates=True.
  2. Observe that the old document chunks are not removed from OpenSearch.

Expected Behavior

The existing document chunks should be deleted before the new document is indexed.
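A regression check for this behavior might look like the sketch below. It is deliberately decoupled from the real classes: InMemoryIndex and this simplified process_item are hypothetical and only model the delete-then-index ordering that the issue expects.

```python
import asyncio


class InMemoryIndex:
    """Hypothetical stand-in for the OpenSearch index."""

    def __init__(self):
        self.chunks = []

    async def delete(self, filename, owner_user_id):
        # Drop all chunks stored for this (filename, owner) pair.
        self.chunks = [
            c for c in self.chunks
            if (c["filename"], c["owner"]) != (filename, owner_user_id)
        ]

    async def add(self, filename, owner_user_id, text):
        self.chunks.append(
            {"filename": filename, "owner": owner_user_id, "text": text}
        )


async def process_item(index, filename, owner_user_id, text, replace_duplicates=False):
    # Hypothetical simplification of the processor: delete first, then index.
    if replace_duplicates:
        await index.delete(filename, owner_user_id)
    await index.add(filename, owner_user_id, text)


async def run_check():
    index = InMemoryIndex()
    await process_item(index, "report.pdf", "user-1", "old")
    await process_item(index, "report.pdf", "user-1", "new", replace_duplicates=True)
    return [c["text"] for c in index.chunks]


if __name__ == "__main__":
    # Only the new chunk should survive the second upload.
    assert asyncio.run(run_check()) == ["new"]
```

A test against the real processor would presumably replace InMemoryIndex with a mocked opensearch_client and assert that a delete request carrying the owner is issued before indexing.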
