Skip to content

configurable artifact retention with background cleanup and dry-run CLI#475

Open
ishitaajain22-tech wants to merge 6 commits into
utksh1:mainfrom
ishitaajain22-tech:main
Open

configurable artifact retention with background cleanup and dry-run CLI#475
ishitaajain22-tech wants to merge 6 commits into
utksh1:mainfrom
ishitaajain22-tech:main

Conversation

@ishitaajain22-tech
Copy link
Copy Markdown

@ishitaajain22-tech ishitaajain22-tech commented Jun 2, 2026

Description

Implements production-grade artifact retention for scan tasks and their raw output files.

  • Configurable cleanup by age (--max-age-days), count (--keep-statuses), and status guard
  • Background cleanup loop wired into the FastAPI lifespan (mirrors existing WorkflowScheduler pattern)
  • One-shot dry-run CLI command (python -m backend.secuscan.cli cleanup --dry-run) that reports what would be removed without touching the DB or disk
  • Audit log entry (retention_purge) written for every real deletion
  • Child rows (findings, reports, audit_log) are explicitly removed before the task row
  • .env.example restored and extended with commented-out retention knobs — all policies default to 0 (disabled), so existing deployments are unaffected

Related Issues

Closes #217

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

27 new unit tests added in testing/backend/unit/test_retention.py.

Run the new tests:

$env:SECUSCAN_VAULT_KEY="test"
python -m pytest testing/backend/unit/test_retention.py -v
# Expected: 27 passed

Run full unit suite to confirm no regressions:

python -m pytest testing/backend/unit/ -v
# Expected: all previously passing tests still pass

Manually test dry-run CLI:

python -m backend.secuscan.cli cleanup --max-age-days 30 --dry-run
# Expected:
# [DRY-RUN] Tasks would be removed: 0
# [DRY-RUN] Files  would be removed: 0

Test coverage includes:

  • Dry-run: no DB writes, correct counts reported, no audit entries written
  • Age threshold: removes old tasks, respects cutoff boundary, disabled when 0
  • Count threshold: keeps newest N tasks, no-op when within limit, disabled when 0
  • keep_statuses guard: running and queued never purged regardless of policy
  • File deletion: existing raw output file removed; missing file does not raise
  • Failed deletion: error captured in result.errors, cleanup continues for remaining tasks
  • Audit entries: retention_purge written per deleted task; not written on dry-run
  • DB references: findings and reports deleted with the task; surviving task's children untouched
  • Scheduler lifecycle: start/stop, idempotent double-start, stop-before-start safe, tick error tolerance

Checklist

  • My code follows the code style of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.

@ishitaajain22-tech ishitaajain22-tech changed the title yay feat(retention): configurable artifact retention with background cleanup and dry-run CLI Jun 2, 2026
@ishitaajain22-tech ishitaajain22-tech changed the title feat(retention): configurable artifact retention with background cleanup and dry-run CLI configurable artifact retention with background cleanup and dry-run CLI Jun 2, 2026
@utksh1 utksh1 added level:advanced 55 pts difficulty label for advanced contributor PRs type:devops DevOps or infrastructure work category bonus label type:feature Feature work category bonus label area:backend Backend API, database, or service work labels Jun 2, 2026
Copy link
Copy Markdown
Owner

@utksh1 utksh1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes. The retention idea is useful, but this PR is not safe to merge in its current form. CI formatting-hygiene is failing, and the diff removes existing application wiring: API key startup initialization is removed, the saved views router is no longer included, denied capability settings are removed, and parser sandbox settings are replaced by retention settings. Those are unrelated regressions. Please rebase on current main, keep the retention change scoped, preserve existing auth/saved-view/parser-sandbox configuration, and get formatting plus backend tests green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:backend Backend API, database, or service work level:advanced 55 pts difficulty label for advanced contributor PRs type:devops DevOps or infrastructure work category bonus label type:feature Feature work category bonus label

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BACKEND] Add scan artifact retention and cleanup policies

2 participants