Summary
In DocumentFileProcessor.process_item (src/models/processors.py), the call to delete_document_by_filename does not pass owner_user_id:
await self.delete_document_by_filename(original_filename, opensearch_client)
However, delete_document_by_filename returns early when owner_user_id is None or missing (see lines 176–181). As a result, when replace_duplicates=True, the deletion step is silently skipped, and the stale chunks are still in the index when the new document is ingested.
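The failure mode can be sketched in isolation. The snippet below is a minimal, hypothetical stand-in rather than the project's code: FakeIndex, the chunk dictionaries, and the exact signature of delete_document_by_filename are assumptions made for illustration. Only the early return on a missing owner_user_id is taken from the description above.

```python
import asyncio


class FakeIndex:
    """Hypothetical in-memory stand-in for the OpenSearch index."""

    def __init__(self, chunks):
        self.chunks = list(chunks)


async def delete_document_by_filename(filename, index, owner_user_id=None):
    # Assumed guard, mirroring the issue text: without an owner, nothing is deleted.
    if owner_user_id is None:
        return 0
    before = len(index.chunks)
    index.chunks = [
        c for c in index.chunks
        if not (c["filename"] == filename and c["owner"] == owner_user_id)
    ]
    return before - len(index.chunks)


async def demo():
    index = FakeIndex([
        {"filename": "report.pdf", "owner": "u1", "text": "old chunk 1"},
        {"filename": "report.pdf", "owner": "u1", "text": "old chunk 2"},
    ])
    # Call shape from the issue: owner_user_id is omitted, so the guard fires.
    skipped = await delete_document_by_filename("report.pdf", index)
    remaining_after_skip = len(index.chunks)
    # Passing the owner lets the deletion proceed.
    deleted = await delete_document_by_filename(
        "report.pdf", index, owner_user_id="u1"
    )
    return skipped, remaining_after_skip, deleted, len(index.chunks)


result = asyncio.run(demo())
```

In this sketch the first call removes nothing and raises no error, which is why the problem is easy to miss at the call site.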
Steps to Reproduce
- Upload a file whose filename already exists in the index via the DocumentFileProcessor path with replace_duplicates=True.
- Observe that the old document chunks are not removed from OpenSearch.
Expected Behavior
The existing document chunks should be deleted before the new document is indexed.
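One possible shape of the expected flow is sketched below, assuming the owner is available to the processor. InMemoryStore, its method names, and this simplified process_item signature are hypothetical and do not reflect the real DocumentFileProcessor API. The point is only the ordering: delete with the owner forwarded, then index.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for the OpenSearch-backed chunk store."""

    def __init__(self):
        self.chunks = []

    async def delete_by_filename(self, filename, owner_user_id):
        # Same guard as described in the Summary: no owner, no deletion.
        if owner_user_id is None:
            return
        self.chunks = [
            c for c in self.chunks
            if not (c["filename"] == filename and c["owner"] == owner_user_id)
        ]

    async def index_chunks(self, filename, owner_user_id, texts):
        for text in texts:
            self.chunks.append(
                {"filename": filename, "owner": owner_user_id, "text": text}
            )


async def process_item(store, filename, texts, owner_user_id,
                       replace_duplicates=False):
    if replace_duplicates:
        # Forward the owner so the guard does not short-circuit the delete.
        await store.delete_by_filename(filename, owner_user_id=owner_user_id)
    await store.index_chunks(filename, owner_user_id, texts)


async def demo():
    store = InMemoryStore()
    await process_item(store, "report.pdf", ["v1 a", "v1 b"], "u1")
    await process_item(store, "report.pdf", ["v2 a"], "u1",
                       replace_duplicates=True)
    return [c["text"] for c in store.chunks]


texts_after_reupload = asyncio.run(demo())
```

After the second upload only the new chunk is left in the store, which matches the behavior described above.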
Related