Add Retry Logic
Implement a per-job retry count (e.g., max_retries = 3–5).
Track retry attempts in Redis alongside the job.
Apply exponential backoff between retries.
Move the job to a dead-letter queue (DLQ) once max_retries is exhausted.
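A minimal sketch of the retry bullets. It assumes a redis-py client, a per-job hash `job:<id>` that holds the retry counter, and a sorted set `jobs:delayed` that a separate scheduler polls. These key names and the delay constants are hypothetical. With `max_retries = 3`, the job runs at most four times (the first attempt plus three retries) before `handle_failure` signals that it belongs in the DLQ.

```python
import random
import time

MAX_RETRIES = 3       # hypothetical default; the issue suggests 3-5
BASE_DELAY_S = 2.0    # hypothetical base delay in seconds
MAX_DELAY_S = 300.0   # cap so late retries do not wait for hours


def backoff_delay(attempt: int, base: float = BASE_DELAY_S,
                  cap: float = MAX_DELAY_S, jitter: bool = True) -> float:
    """Delay before retry `attempt` (1-based): base * 2**(attempt - 1), capped at `cap`."""
    delay = min(cap, base * 2 ** (attempt - 1))
    if jitter:
        # "Full jitter": pick uniformly in [0, delay] so replicas don't retry in lockstep
        delay = random.uniform(0, delay)
    return delay


def handle_failure(client, job_id: str, max_retries: int = MAX_RETRIES) -> str:
    """Record a failed attempt, then either schedule a retry or signal the DLQ.

    `client` is assumed to be a redis-py client (or anything exposing the same
    hincrby/zadd methods). Key names are hypothetical.
    """
    # The failure counter lives in the job's hash, next to the job data
    attempts = client.hincrby(f"job:{job_id}", "retries", 1)
    if attempts > max_retries:
        return "dead_letter"  # the caller moves the job to "jobs:dead"
    # Score = earliest time the job may run again; a scheduler loop would
    # move due entries back onto the main queue
    run_at = time.time() + backoff_delay(attempts)
    client.zadd("jobs:delayed", {job_id: run_at})
    return "retry"
```

Full jitter is one common backoff variant. A fixed exponential schedule (`jitter=False`) is easier to reason about in tests, but it makes replicas that failed together also retry together.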
Dead-Letter Queue (DLQ)
Create a Redis-backed DLQ list (e.g., "jobs:dead").
Store the failed job's metadata and the error reason.
Add an API endpoint or admin page to inspect DLQ items.
Allow manual requeue of DLQ jobs.
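Here is one possible shape for the DLQ helpers. It assumes a redis-py client and the `jobs:dead` list named above. The entry fields and the `jobs:queue` name are hypothetical. Entries are stored as JSON strings so that the admin view can show the error, and so that a requeue can remove the exact stored value with LREM.

```python
import json
import time

DLQ_KEY = "jobs:dead"     # list name suggested in the issue
QUEUE_KEY = "jobs:queue"  # hypothetical main queue name


def dead_letter(client, job: dict, error: Exception) -> None:
    """Push a failed job plus its error reason onto the DLQ list."""
    entry = {
        "job": job,
        "error_type": type(error).__name__,
        "error": str(error),
        "failed_at": time.time(),
    }
    client.rpush(DLQ_KEY, json.dumps(entry))


def list_dead(client, start: int = 0, stop: int = 49) -> list:
    """Return (raw, parsed) pairs for an admin view; LRANGE's stop index is inclusive."""
    return [(raw, json.loads(raw)) for raw in client.lrange(DLQ_KEY, start, stop)]


def requeue(client, raw_entry) -> bool:
    """Move one DLQ entry (its exact stored value) back onto the main queue.

    LREM reports how many items it removed, so if two admins requeue the
    same entry at once, only one of them actually requeues it.
    """
    if client.lrem(DLQ_KEY, 1, raw_entry) == 0:
        return False  # already requeued or deleted by someone else
    job = json.loads(raw_entry)["job"]
    client.rpush(QUEUE_KEY, json.dumps(job))
    return True
```

LREM and RPUSH are two separate commands here, so a crash between them would drop the job. If that matters, a short Lua script could make the move atomic.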
Improved Error Handling
Define standardized exception types for unzip, file I/O, KM upload, etc.
Add structured error logging with correlation IDs.
Use more granular job status values (e.g., "unzip_failed", "upload_failed").
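This sketch shows a small exception hierarchy in which each type carries its job status. That way, the worker can record e.g. `unzip_failed` without matching on error strings. `JobError`, `FileIOError`, the `retryable` flag, and the `file_io_failed` status are hypothetical names. Only `unzip_failed` and `upload_failed` come from the list above.

```python
import zipfile


class JobError(Exception):
    """Base class for expected worker failures; `status` goes into the job record."""
    status = "failed"
    retryable = True  # hypothetical flag the retry logic could consult


class UnzipError(JobError):
    status = "unzip_failed"
    retryable = False  # a corrupt archive will not fix itself on retry


class FileIOError(JobError):
    status = "file_io_failed"  # hypothetical status name


class UploadError(JobError):
    status = "upload_failed"  # e.g. KM unreachable; usually worth retrying


def status_for(exc: BaseException) -> str:
    """Map any exception to a job status, falling back to the generic 'failed'."""
    return exc.status if isinstance(exc, JobError) else "failed"


def open_archive(path: str) -> zipfile.ZipFile:
    """Example of wrapping a low-level error in a domain exception."""
    try:
        return zipfile.ZipFile(path)
    except (zipfile.BadZipFile, OSError) as exc:
        # `from exc` keeps the original error as __cause__ for structured logs
        raise UnzipError(f"cannot open archive {path}: {exc}") from exc
```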
Worker Scaling
Enable multiple worker replicas consuming from the same Redis queue.
Ensure job processing is fully idempotent.
Add a Redis-based distributed lock if needed for shared-state operations.
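If a shared-state operation does need mutual exclusion, a common pattern for a single Redis instance is to acquire with `SET key token NX PX ttl` and release with a token-checked delete (a small Lua script). The sketch below assumes a redis-py client. The `lock:` key prefix and the 30-second TTL are hypothetical. redis-py also ships its own `Redis.lock()` helper, which may be preferable to hand-rolling this.

```python
import contextlib
import uuid

# Delete the key only if it still holds our token, atomically on the server
_RELEASE_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""


class LockNotAcquired(Exception):
    pass


@contextlib.contextmanager
def redis_lock(client, name: str, ttl_ms: int = 30_000):
    """Non-blocking lock via SET NX PX plus a token-checked release.

    The TTL frees the lock if a worker crashes. It must be longer than the
    critical section, or two workers may end up inside it at once.
    """
    key = f"lock:{name}"
    token = uuid.uuid4().hex
    if not client.set(key, token, nx=True, px=ttl_ms):
        raise LockNotAcquired(name)
    try:
        yield token
    finally:
        # A plain DEL could remove a lock another worker took after ours expired
        client.eval(_RELEASE_LUA, 1, key, token)
```

A lock does not make processing idempotent by itself. If the TTL expires mid-operation, another worker can enter, so the operation should still tolerate being run twice.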
Progress Reporting
Report intermediate steps to Redis:
pending → unzipping → uploading → indexing → completed
Include a percentage progress estimate for multi-file uploads.
Provide progress messages the frontend can display.
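This sketch of progress reporting assumes that the job's state lives in a Redis hash `job:<id>` (hypothetical key) that the frontend polls, written through a redis-py client (`hset(..., mapping=...)`). The percentage is simply files uploaded over total files. That is only a rough estimate when file sizes vary widely.

```python
STAGES = ("pending", "unzipping", "uploading", "indexing", "completed")


def upload_percent(files_done: int, files_total: int) -> int:
    """Whole-number percentage of files uploaded; 0 when the total is unknown."""
    if files_total <= 0:
        return 0
    return min(100, round(100 * files_done / files_total))


def report_progress(client, job_id: str, stage: str, files_done: int = 0,
                    files_total: int = 0, message: str = "") -> dict:
    """Write stage, % estimate and a display message to the job's hash."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    default_msg = f"{stage}: {files_done}/{files_total} files" if files_total else stage
    fields = {
        "stage": stage,
        "percent": upload_percent(files_done, files_total),
        "message": message or default_msg,  # text the frontend can show as-is
    }
    client.hset(f"job:{job_id}", mapping=fields)
    return fields
```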
File Validation & Security
Validate supported file types before unzipping.
Reject archives containing more than X files or unsupported formats.
Sanitize or reject dangerous paths (zip-slip prevention).
Enforce file size limits on extracted content.
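This sketch runs the pre-extraction checks with the standard `zipfile` module. The extension allow-list, `MAX_FILES` (standing in for the unspecified X), and the size cap are hypothetical values. The zip-slip check resolves each member's target path and rejects any member that would land outside the extraction directory. This covers both `../` segments and absolute paths.

```python
import os
import zipfile

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}  # hypothetical allow-list
MAX_FILES = 500                        # hypothetical value for "X"
MAX_TOTAL_UNCOMPRESSED = 2 * 1024**3   # hypothetical 2 GiB cap


class ArchiveRejected(ValueError):
    pass


def validate_zip(zf: zipfile.ZipFile, dest_dir: str) -> list:
    """Check an archive before extracting anything; return the members to extract."""
    members = [m for m in zf.infolist() if not m.is_dir()]
    if len(members) > MAX_FILES:
        raise ArchiveRejected(f"{len(members)} files exceeds limit of {MAX_FILES}")
    root = os.path.realpath(dest_dir)
    total = 0
    for m in members:
        # Zip-slip: the resolved target must stay inside dest_dir
        target = os.path.realpath(os.path.join(root, m.filename))
        if os.path.commonpath([root, target]) != root:
            raise ArchiveRejected(f"unsafe path: {m.filename}")
        ext = os.path.splitext(m.filename)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ArchiveRejected(f"unsupported file type: {m.filename}")
        # file_size is the uncompressed size *declared* in the archive
        total += m.file_size
        if total > MAX_TOTAL_UNCOMPRESSED:
            raise ArchiveRejected("uncompressed size limit exceeded")
    return members
```

`ZipInfo.file_size` is only what the archive claims, so a crafted zip can understate it. Counting bytes during extraction and aborting once the limit is passed closes that gap.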
Performance Optimizations
Stream KM uploads instead of loading entire files into memory.
Add optional background cleanup of extracted temp files.
Pre-check available disk space before extracting.
Consider batching KM ingestion if many small files are uploaded.
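Here are two small helpers for the performance bullets, using only the standard library. The first is a disk-space pre-check that compares the archive's declared uncompressed size against `shutil.disk_usage`. The second is a chunked reader that could feed a streaming upload. The 20% margin is hypothetical. Whether the KM upload endpoint accepts a streamed (chunked) body is an assumption to verify, although HTTP clients such as `requests` do accept a generator as the request body.

```python
import shutil
import zipfile

SAFETY_MARGIN = 1.2  # hypothetical: require 20% headroom beyond the declared size


def has_room_for(zf: zipfile.ZipFile, dest_dir: str, margin: float = SAFETY_MARGIN) -> bool:
    """True if dest_dir's filesystem has space for the archive's declared contents."""
    needed = sum(info.file_size for info in zf.infolist())
    free = shutil.disk_usage(dest_dir).free
    return needed * margin <= free


def iter_chunks(path: str, chunk_size: int = 1024 * 1024):
    """Yield a file in fixed-size chunks so an upload never holds it all in memory."""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk
```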
Observability & Monitoring
Add structured logs that carry jobId and userId for correlation.
Expose Prometheus metrics (jobs processed, failures, retry counts, queue lag).
Build dashboards for worker health.
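This sketch implements JSON-structured logging with the standard `logging` module. A `LoggerAdapter` stamps `jobId` and `userId` onto every record logged for a job. The field names follow the bullet above. Everything else is illustrative.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line, including correlation fields if present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "jobId": getattr(record, "jobId", None),
            "userId": getattr(record, "userId", None),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def job_logger(base: logging.Logger, job_id: str, user_id: str) -> logging.LoggerAdapter:
    """Attach jobId/userId to every record logged through the returned adapter."""
    return logging.LoggerAdapter(base, {"jobId": job_id, "userId": user_id})
```

For the metrics bullet, the official `prometheus_client` package provides `Counter`, `Gauge` and `Histogram` types. These could back the jobs-processed, failure, retry-count and queue-lag metrics.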
Worker Shutdown Safety
Handle graceful-shutdown signals (SIGTERM/SIGINT).
Finish current job before terminating.
Requeue unprocessed/partial jobs safely.
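Here is a minimal sketch of the shutdown bullets. The signal handlers only set a flag, and the loop checks that flag between jobs. A job that is already running is finished, and a job fetched after the signal arrives is handed back untouched. `fetch_job`, `process` and `requeue` are hypothetical callables that the worker would supply.

```python
import signal
import threading


class GracefulShutdown:
    """Turn SIGTERM/SIGINT into a flag the worker loop checks between jobs."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def install(self) -> None:
        # signal.signal may only be called from the main thread
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame) -> None:
        self._stop.set()

    @property
    def stopping(self) -> bool:
        return self._stop.is_set()


def run_worker(shutdown: GracefulShutdown, fetch_job, process, requeue) -> int:
    """Process jobs until asked to stop; return how many were completed.

    `fetch_job` should block with a short timeout and return None when idle,
    so the stop flag is re-checked regularly.
    """
    done = 0
    while not shutdown.stopping:
        job = fetch_job()
        if job is None:
            continue
        if shutdown.stopping:
            requeue(job)  # fetched but not started: hand it back untouched
            break
        process(job)      # a started job always runs to completion
        done += 1
    return done
```

SIGKILL cannot be caught. A job that is in flight when the process is killed outright is therefore still lost, unless the queue itself tracks in-progress work (for example, by moving each job to a per-worker processing list with Redis `LMOVE`). Under Kubernetes, the pod's termination grace period must also be longer than the slowest job.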
Config Improvements
Externalize worker config (timeouts, KM URL, max zip size, retry logic).
Use strong typing + validation for configuration.
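This sketch provides typed, validated configuration through a frozen dataclass that reads environment variables. The `WORKER_*` variable names, the defaults, and the 0 to 10 retry range are hypothetical. Invalid values fail at startup rather than mid-job. Libraries such as pydantic offer the same idea with less boilerplate.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerConfig:
    km_url: str
    job_timeout_s: float = 600.0
    max_zip_bytes: int = 1024 ** 3
    max_retries: int = 3

    def __post_init__(self) -> None:
        # Validate once at construction so bad config never reaches a job
        if not self.km_url.startswith(("http://", "https://")):
            raise ValueError(f"KM URL must be http(s): {self.km_url!r}")
        if self.job_timeout_s <= 0 or self.max_zip_bytes <= 0:
            raise ValueError("timeouts and size limits must be positive")
        if not 0 <= self.max_retries <= 10:
            raise ValueError("max_retries must be between 0 and 10")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "WorkerConfig":
        """Build from WORKER_* variables (hypothetical names); a missing KM URL raises KeyError."""
        return cls(
            km_url=env["WORKER_KM_URL"],
            job_timeout_s=float(env.get("WORKER_JOB_TIMEOUT_S", 600)),
            max_zip_bytes=int(env.get("WORKER_MAX_ZIP_BYTES", 1024 ** 3)),
            max_retries=int(env.get("WORKER_MAX_RETRIES", 3)),
        )
```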
Multi-File Upload Enhancements
Allow very large multi-file zips.
Allow user-configurable tags per file.
Deduplicate files before uploading to KM.
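This sketch performs content-based deduplication: it hashes each extracted file with SHA-256 and keeps the first file for each distinct hash. It only deduplicates within a single upload. Skipping files that KM already holds would require storing the hashes somewhere (hypothetically, a Redis set), which is not shown here.

```python
import hashlib
from typing import Iterable, List


def file_digest(path: str, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe(paths: Iterable[str]) -> List[str]:
    """Keep the first path for each distinct content hash, preserving order."""
    seen = set()
    unique = []
    for p in paths:
        digest = file_digest(p)
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique
```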
Admin / Diagnostics Tools
Add “requeue all failed jobs” endpoint.
Add KM upload diagnostics (timings, latency).
Add a worker self-test endpoint.
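This sketch adds in-process timing for the KM upload diagnostics bullet. A context manager records each upload's wall-clock duration in a bounded buffer, and `summary()` returns figures that a diagnostics endpoint could serve. `UploadTimings` is a hypothetical name, and the p95 uses the nearest-rank method.

```python
import collections
import statistics
import time
from contextlib import contextmanager


class UploadTimings:
    """Keep recent KM upload durations (ms) and summarize them."""

    def __init__(self, max_samples: int = 1000) -> None:
        # Bounded buffer: only the most recent uploads are kept
        self.samples_ms = collections.deque(maxlen=max_samples)

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Failed uploads are timed too; their latency is often the interesting part
            self.samples_ms.append((time.perf_counter() - start) * 1000)

    def summary(self) -> dict:
        s = sorted(self.samples_ms)
        if not s:
            return {"count": 0}
        p95_index = (95 * len(s) + 99) // 100 - 1  # ceil(0.95 * n) - 1, in integers
        return {"count": len(s), "mean_ms": statistics.fmean(s),
                "max_ms": s[-1], "p95_ms": s[p95_index]}
```

Typical use would be `with timings.measure(): upload(path)`, where `upload` stands for whatever function calls KM.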