Optimize mixed-batch uploads: filter known duplicates from bulk-upsert and bucket recalculation #43

@coderabbitai

Description

Overview

For mixed-batch uploads (payloads containing both new and duplicate messages), known duplicates are still included in the bulk-upsert path and may still trigger bucket recalculation in /api/upload-stats. This is a follow-up optimization to PR #42, which fixed the all-duplicate timeout regression.

Problem

When a batch contains some new and some duplicate messages:

  • messagesForDb (which still includes the known duplicates) is passed to the bulk-upsert loop unchanged.
  • affectedBuckets is populated from all messages, duplicates included, so bucket aggregation runs even for buckets that received no new data (see the sketch after this list).
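
In rough terms, the current flow looks like the sketch below; bulkUpsertMessages, recalculateBucket, and the bucketKey field are illustrative placeholders, not the handler's actual identifiers:

// Sketch of the current mixed-batch flow (illustrative names, not the real handler code).
const affectedBuckets = new Set(messagesForDb.map((m) => m.bucketKey));

// Every message, including known duplicates, is sent to the bulk upsert.
await bulkUpsertMessages(messagesForDb);

// Recalculation runs even for buckets whose only messages in this batch were duplicates.
for (const bucket of affectedBuckets) {
  await recalculateBucket(bucket);
}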

Proposed Fix

After the duplicate-detection query, filter messagesForDb down to only the genuinely new messages:

// Hashes of messages already present in the DB, from the duplicate-detection query.
const existingHashSet = new Set(existingMessages.map((m) => m.globalHash));

// Keep only messages whose hash was not found, i.e. genuinely new ones.
const newMessagesForDb = messagesForDb.filter(
  (m) => !existingHashSet.has(m.globalHash)
);
const duplicateCount = messagesForDb.length - newMessagesForDb.length;

Then use newMessagesForDb for all subsequent upsert and recalculation steps (and update timing/metrics accordingly), so duplicates are never sent to the DB or cause unnecessary bucket recalculations.
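
A minimal sketch of how the filtered list could be threaded through the remaining steps (again, bulkUpsertMessages, recalculateBucket, and the bucketKey field are illustrative assumptions, not the actual implementation):

// Only genuinely new messages reach the DB and drive bucket recalculation.
if (newMessagesForDb.length > 0) {
  await bulkUpsertMessages(newMessagesForDb);

  const affectedBuckets = new Set(newMessagesForDb.map((m) => m.bucketKey));
  for (const bucket of affectedBuckets) {
    await recalculateBucket(bucket);
  }
}

// Duplicates are reported separately so timing/metrics reflect the reduced work.
console.log(`upload-stats: inserted=${newMessagesForDb.length} duplicates=${duplicateCount}`);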
