Future Worker Improvements (Before Prod)

1. Add Retry Logic
   - Implement retry count per job (e.g., max_retries = 3–5).
   - Track retry attempts in Redis alongside the job.
   - Apply exponential backoff between retries.
   - Move job to a dead-letter queue (DLQ) after max retries.
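
A minimal sketch of the backoff and retry-limit decision described above. The constant values and the function names (`backoff_delay`, `next_action`) are assumptions for illustration, not existing worker code; the random "full jitter" is an optional addition to keep replicas from retrying in lockstep.

```python
import random

MAX_RETRIES = 3       # assumed value; the list above suggests 3-5
BASE_DELAY_S = 1.0    # assumed delay before the first retry
MAX_DELAY_S = 60.0    # assumed upper bound on any single delay


def backoff_delay(attempt: int, base: float = BASE_DELAY_S,
                  cap: float = MAX_DELAY_S, jitter: bool = True) -> float:
    """Delay before retry number `attempt` (1-based): base * 2**(attempt - 1), capped."""
    delay = min(cap, base * (2 ** (attempt - 1)))
    if jitter:
        # Full jitter: pick a random delay between 0 and the computed value.
        delay = random.uniform(0, delay)
    return delay


def next_action(retries_so_far: int, max_retries: int = MAX_RETRIES) -> str:
    """Return "retry" while attempts remain, otherwise "dead_letter"."""
    return "retry" if retries_so_far < max_retries else "dead_letter"
```

The retry counter itself would live in Redis next to the job, so that any replica picking the job up sees the same count.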

2. Dead-Letter Queue (DLQ)
   - Create a Redis-backed DLQ list (e.g., "jobs:dead").
   - Store failed job metadata + error reason.
   - Add an API endpoint or admin page to inspect DLQ items.
   - Allow manual requeue of DLQ jobs.
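
One possible shape for the DLQ helpers, sketched under assumptions: `client` is expected to behave like a redis-py client (`lpush`, `rpop`), `"jobs:pending"` is a hypothetical name for the main queue, and `InMemoryLists` is only a stand-in so the sketch runs without a Redis server.

```python
import json
import time

DLQ_KEY = "jobs:dead"        # list name from the item above
QUEUE_KEY = "jobs:pending"   # hypothetical main-queue name


class InMemoryLists:
    """Stand-in with the few list calls used here; not a Redis replacement."""

    def __init__(self):
        self._lists = {}

    def lpush(self, name, *values):
        items = self._lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def rpop(self, name):
        items = self._lists.get(name)
        return items.pop() if items else None

    def llen(self, name):
        return len(self._lists.get(name, []))


def dead_letter(client, job: dict, reason: str) -> None:
    """Store the failed job together with the error reason and a timestamp."""
    entry = {"job": job, "error": reason, "failed_at": time.time()}
    client.lpush(DLQ_KEY, json.dumps(entry))


def requeue_one(client):
    """Move the oldest DLQ entry back to the main queue; return the job or None."""
    raw = client.rpop(DLQ_KEY)
    if raw is None:
        return None
    job = json.loads(raw)["job"]
    client.lpush(QUEUE_KEY, json.dumps(job))
    return job
```

Note that the pop and the push are two separate calls, so a crash between them would lose the job; a production version would probably want an atomic move.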

3. Improved Error Handling
   - Define standardized exception types for unzip, file I/O, KM upload, etc.
   - Log errors in a structured format with correlation IDs.
   - Use more granular job status values (e.g., "unzip_failed", "upload_failed").
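
As a rough illustration, the exception types could carry the job status they map to. The class names and the generic `"failed"` fallback are assumptions; only `"unzip_failed"` and `"upload_failed"` come from the list above.

```python
class JobError(Exception):
    """Base class for worker failures; `status` is the job status to record."""
    status = "failed"  # assumed generic fallback status


class UnzipError(JobError):
    status = "unzip_failed"


class UploadError(JobError):
    status = "upload_failed"


def status_for(exc: BaseException) -> str:
    """Map any exception to a job status, falling back to the generic one."""
    return exc.status if isinstance(exc, JobError) else JobError.status
```

With this shape, the worker's top-level `except` block can record `status_for(exc)` without a chain of `isinstance` checks.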

4. Worker Scaling
   - Enable multiple worker replicas consuming from the same Redis queue.
   - Ensure job processing is fully idempotent.
   - Add a Redis-based distributed lock if needed for shared-state operations.
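
One way to keep replicas from processing the same job twice is an atomic claim using Redis `SET` with the `NX` and `EX` options. The sketch below assumes a redis-py style client; the key prefix, `worker_id`, and the TTL are hypothetical, and `InMemoryKV` is a stand-in that ignores expiry so the example runs without Redis.

```python
class InMemoryKV:
    """Stand-in for the single `set` call used here; expiry is ignored."""

    def __init__(self):
        self._data = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._data:
            return None  # redis-py also returns None when NX blocks the write
        self._data[name] = value
        return True


def claim_job(client, job_id: str, worker_id: str, ttl_s: int = 300) -> bool:
    """Return True only for the first worker to claim `job_id`."""
    key = f"jobs:claim:{job_id}"  # hypothetical key layout
    return bool(client.set(key, worker_id, nx=True, ex=ttl_s))
```

The TTL matters: if a worker dies mid-job, the claim expires and another replica can pick the job up, which is also why processing still needs to be idempotent.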

5. Progress Reporting
   - Report intermediate steps to Redis:
        pending → unzipping → uploading → indexing → completed
   - Include % progress estimate for multi-file uploads.
   - Provide progress messages the frontend can display.
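
A sketch of the payload a worker might write to Redis at each step. The stage names come from the list above; the field names (`upload_percent`, `message`) and the function name are assumptions.

```python
STAGES = ["pending", "unzipping", "uploading", "indexing", "completed"]


def progress_payload(stage: str, files_done: int = 0, files_total: int = 0) -> dict:
    """Build a progress record the frontend could display as-is."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    # The percentage only makes sense when the number of files is known.
    percent = (100 * files_done) // files_total if files_total > 0 else None
    if files_total > 0:
        message = f"{stage}: {files_done} of {files_total} files"
    else:
        message = stage
    return {"stage": stage, "upload_percent": percent, "message": message}
```

For example, `progress_payload("uploading", 3, 12)` reports 25 percent with the message `"uploading: 3 of 12 files"`.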

6. File Validation & Security
   - Validate supported file types before unzipping.
   - Reject archives containing > X files or unsupported formats.
   - Sanitize or reject dangerous paths (zip-slip prevention).
   - Enforce file size limits on extracted content.
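
The checks above can be run against the archive listing before anything is extracted. This is a sketch using the standard `zipfile` module; the function name and all limits are assumptions passed in by the caller.

```python
import os
import zipfile


def validate_archive(zf: zipfile.ZipFile, dest_dir: str, max_files: int,
                     max_total_bytes: int, allowed_exts: set) -> list:
    """Return the member names if the archive passes all checks, else raise ValueError."""
    infos = [info for info in zf.infolist() if not info.is_dir()]
    if len(infos) > max_files:
        raise ValueError(f"archive has {len(infos)} files, limit is {max_files}")

    dest_root = os.path.realpath(dest_dir)
    total = 0
    names = []
    for info in infos:
        # Zip-slip check: the resolved target must stay inside dest_root.
        target = os.path.realpath(os.path.join(dest_root, info.filename))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"unsafe path in archive: {info.filename}")

        ext = os.path.splitext(info.filename)[1].lower()
        if ext not in allowed_exts:
            raise ValueError(f"unsupported file type: {info.filename}")

        # file_size is the size declared in the archive header, so the limit
        # should also be enforced while actually extracting.
        total += info.file_size
        if total > max_total_bytes:
            raise ValueError("extracted content would exceed the size limit")
        names.append(info.filename)
    return names
```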

7. Performance Optimizations
   - Stream KM uploads instead of loading entire files into memory.
   - Optionally clean up extracted temp files in the background.
   - Pre-check available disk space.
   - Consider batching KM ingestion if many small files are uploaded.
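
Two of the items above are small enough to sketch with the standard library: a disk-space pre-check and a chunked reader that a streaming upload could consume. The function names, the safety factor, and the chunk size are assumptions; how the chunks are handed to the KM client depends on that client and is not shown.

```python
import shutil


def has_disk_space(path: str, required_bytes: int, safety_factor: float = 1.2) -> bool:
    """True if the filesystem holding `path` has room for the extraction plus headroom."""
    return shutil.disk_usage(path).free >= required_bytes * safety_factor


def iter_chunks(fileobj, chunk_size: int = 1024 * 1024):
    """Yield the file in fixed-size chunks so it never has to fit in memory."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
```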

8. Observability & Monitoring
   - Add structured logs with jobId + userId correlation.
   - Expose Prometheus metrics (jobs processed, failures, retry counts, queue lag).
   - Build dashboards for worker health.
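
For the structured-log item, a sketch using only the standard `logging` module, with `jobId` and `userId` passed through `extra`. The logger name `"worker"` and the JSON field layout are assumptions; Prometheus metrics would need a client library and are not shown.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Present only when the caller passed them via `extra`.
            "jobId": getattr(record, "jobId", None),
            "userId": getattr(record, "userId", None),
        }
        return json.dumps(payload)


def make_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("worker")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Usage would look like `log.info("upload started", extra={"jobId": job_id, "userId": user_id})`.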

9. Worker Shutdown Safety
   - Handle shutdown signals (SIGTERM/SIGINT) gracefully.
   - Finish current job before terminating.
   - Requeue unprocessed/partial jobs safely.

10. Config Improvements
   - Externalize worker config (timeouts, KM URL, max zip size, retry settings).
   - Use strong typing + validation for configuration.
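
Typed, validated config could be as simple as a frozen dataclass loaded from environment variables. The field names, defaults, and variable names (`KM_URL`, `JOB_TIMEOUT_S`, `MAX_ZIP_BYTES`, `MAX_RETRIES`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerConfig:
    km_url: str
    job_timeout_s: float = 300.0
    max_zip_bytes: int = 500 * 1024 * 1024
    max_retries: int = 3

    def __post_init__(self) -> None:
        # Fail fast at startup instead of midway through a job.
        if not self.km_url.startswith(("http://", "https://")):
            raise ValueError("km_url must be an http(s) URL")
        if self.job_timeout_s <= 0:
            raise ValueError("job_timeout_s must be positive")
        if self.max_zip_bytes <= 0:
            raise ValueError("max_zip_bytes must be positive")
        if self.max_retries < 0:
            raise ValueError("max_retries must not be negative")

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "WorkerConfig":
        """Build the config from a mapping such as os.environ."""
        return cls(
            km_url=env["KM_URL"],
            job_timeout_s=float(env.get("JOB_TIMEOUT_S", "300")),
            max_zip_bytes=int(env.get("MAX_ZIP_BYTES", str(500 * 1024 * 1024))),
            max_retries=int(env.get("MAX_RETRIES", "3")),
        )
```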

11. Multi-File Upload Enhancements
   - Allow very large multi-file zips.
   - Allow user-configurable tags per file.
   - Deduplicate files before uploading to KM.
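
Deduplication could be done by content hash rather than by file name. The sketch below works on in-memory bytes for brevity; a real worker would presumably hash files from disk in chunks. The function name is an assumption.

```python
import hashlib


def dedupe_by_content(files: dict) -> dict:
    """Keep the first file seen for each distinct content hash.

    `files` maps a file name to its bytes; insertion order is preserved.
    """
    seen = set()
    unique = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # identical content already queued for upload
        seen.add(digest)
        unique[name] = data
    return unique
```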

12. Admin / Diagnostics Tools
   - Add a “requeue all failed jobs” endpoint.
   - Add KM upload diagnostics (timings, latency).
   - Add a worker self-test endpoint.
