
Scanner improvements #208

Description

@r2dedios

Describe the solution you'd like
The scanner will no longer run as a CronJob-generated pod. Instead, it will run as a short-lived Kubernetes Job, created on demand or by a scheduled trigger. The Agent will launch the Jobs through its existing Action model, using a new action type for triggering scanner Jobs. The scan itself will be performed by the same scanner container, configured via environment variables or arguments.
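
As a rough illustration of the Job the Agent could create, here is a minimal sketch that builds a `batch/v1` Job manifest as a plain dict. The function name, the env var names (`SCAN_RUN_ID`, `SCAN_ACCOUNT_IDS`, `SCAN_BILLING_ENABLED`), the `backoffLimit`/`ttlSecondsAfterFinished` values and the image reference are all assumptions for illustration, not part of the current codebase. It also assumes `scan_run_id` is lowercase alphanumeric so it is valid in both the Job name and the label value.

```python
import uuid


def build_scan_job_manifest(account_ids, billing_enabled, namespace, image, scan_run_id=None):
    """Return a batch/v1 Job manifest (plain dict) for a single scan run.

    All names below (env vars, labels, job name prefix) are hypothetical.
    """
    scan_run_id = scan_run_id or uuid.uuid4().hex
    labels = {"app": "scanner", "scan_run_id": scan_run_id}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"scanner-{scan_run_id[:12]}",
            "namespace": namespace,
            "labels": labels,
        },
        "spec": {
            # Assumed policy: do not retry a failed scan, clean up after one hour.
            "backoffLimit": 0,
            "ttlSecondsAfterFinished": 3600,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "scanner",
                            "image": image,
                            # Configuration is passed only through env vars.
                            "env": [
                                {"name": "SCAN_RUN_ID", "value": scan_run_id},
                                {"name": "SCAN_ACCOUNT_IDS", "value": ",".join(account_ids)},
                                {
                                    "name": "SCAN_BILLING_ENABLED",
                                    "value": "true" if billing_enabled else "false",
                                },
                            ],
                        }
                    ],
                },
            },
        },
    }
```

The Agent would submit an equivalent object through whatever Kubernetes client it already uses; the dict form just makes the intended shape easy to review.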

FR:

  • FR-1. The Agent will support a new action type: ScanAction, implementing ActionOperation.
  • FR-2. When a scan is scheduled (via API or frontend), a new action will be created in the DB and dispatched to the Agent.
  • FR-3. Upon receiving a ScanAction, the Agent will launch a Kubernetes Job in the cluster using the Scanner container image.
  • FR-4. The Job will receive the list of account_id values to scan, and optionally a flag to enable billing, via env vars or args.
  • FR-5. The Job will execute the scan and terminate after processing the data and reporting the result.
  • FR-6. The scan schedule will be persisted in the database.
  • FR-7. Scheduled scans will be managed (create, update, delete, list) via new API endpoints:
    • [POST] /scans/now — Triggers an on-demand scan. Body: list of accounts + billing flag.
    • [GET] /scans/schedule — Lists all scheduled scans.
    • [POST] /scans/schedule — Creates a new scheduled scan.
    • [PATCH] /scans/schedule/:id — Modifies an existing schedule.
    • [DELETE] /scans/schedule/:id — Deletes an existing schedule.
  • FR-8. The backend will include a lightweight scheduler (using a cron library) that periodically checks which scheduled scans must be triggered. This logic will reside in the API service.
  • FR-9. The scanner Job will report the result of the scan back to the API or directly to the DB, including:
    • status (success/error)
    • accounts scanned
    • duration
    • scan_run_id
  • FR-10. Rate limits and maintenance windows will be validated before launching the scan job, based on predefined configuration.
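
To make FR-4, FR-5 and FR-9 more concrete, here is a minimal sketch of the scanner side: read the configuration from env vars, scan each account, and build the result record to report. The env var names, the `scan_account` callback and the result field names are assumptions for illustration only; how the result is delivered (API call or direct DB write) is left open, as in FR-9.

```python
import time


def load_scan_config(env):
    """Parse scan settings from a mapping such as os.environ (names are hypothetical)."""
    raw = env.get("SCAN_ACCOUNT_IDS", "")
    account_ids = [a.strip() for a in raw.split(",") if a.strip()]
    if not account_ids:
        raise ValueError("SCAN_ACCOUNT_IDS must list at least one account")
    return {
        "scan_run_id": env.get("SCAN_RUN_ID", ""),
        "account_ids": account_ids,
        "billing_enabled": env.get("SCAN_BILLING_ENABLED", "false").lower() == "true",
    }


def run_scan(config, scan_account):
    """Scan every account and return the result record described in FR-9.

    scan_account(account_id, billing_enabled) is a placeholder for the real scan logic.
    The run stops at the first failing account and reports status "error".
    """
    started = time.monotonic()
    scanned, status, error = [], "success", None
    try:
        for account_id in config["account_ids"]:
            scan_account(account_id, config["billing_enabled"])
            scanned.append(account_id)
    except Exception as exc:
        status, error = "error", str(exc)
    return {
        "scan_run_id": config["scan_run_id"],
        "status": status,
        "accounts_scanned": scanned,
        "duration_seconds": round(time.monotonic() - started, 3),
        "error": error,
    }
```

In the container entrypoint this would be called as `run_scan(load_scan_config(os.environ), scan_account)`, after which the process reports the record and exits.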

NFR:

  • NFR-1. The scanner container must be stateless, self-contained and receive configuration only through env vars or command-line args.
  • NFR-2. The scanner Jobs will be labeled with scan_run_id and other metadata to facilitate tracing and log collection.
  • NFR-3. The Agent will require RBAC permissions to create Jobs within a specified namespace.
  • NFR-4. The concurrency of scan jobs will be limited via configuration (e.g., MAX_CONCURRENT_JOBS) and enforced by the API when scheduling.
  • NFR-5. API rate-limiting policies per cloud provider must be respected. Throttling logic may be implemented at the backend scheduler level.
  • NFR-6. Maintenance windows will be stored in the DB and consulted before launching any scan job.
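
The pre-launch checks from FR-10, NFR-4 and NFR-6 could look roughly like the sketch below. It assumes that scans are blocked while a maintenance window is active, that windows are loaded from the DB as `(start, end)` pairs of timezone-aware datetimes, and that the caller already knows how many scan Jobs are running. The function names are hypothetical, and provider rate limits (NFR-5) are not covered here.

```python
from datetime import datetime, timezone


def in_maintenance_window(now, windows):
    """Return True if `now` falls inside any (start, end) window; end is exclusive."""
    return any(start <= now < end for start, end in windows)


def can_launch_scan(now, running_jobs, max_concurrent_jobs, windows):
    """Decide whether a new scan Job may be launched.

    Returns (allowed, reason) so the API can log or surface why a scan was held back.
    """
    if in_maintenance_window(now, windows):
        return False, "maintenance window active"
    if running_jobs >= max_concurrent_jobs:
        return False, "concurrency limit reached"
    return True, "ok"


if __name__ == "__main__":
    windows = [
        (
            datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc),
            datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc),
        )
    ]
    now = datetime(2024, 1, 1, 5, 0, tzinfo=timezone.utc)
    print(can_launch_scan(now, running_jobs=1, max_concurrent_jobs=2, windows=windows))
```

Returning a reason alongside the boolean keeps the decision traceable when a scheduled scan is skipped.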

Additional considerations

  • 1. Review and update the compose files used for development and for CI/CD
  • 2. To keep the schedule persistent, it will be stored in the DB. Create the necessary DB tables to store the scanner schedule. Proposal:
    -- Scheduled scans
    CREATE TABLE IF NOT EXISTS schedule (
      id BIGINT GENERATED ALWAYS AS IDENTITY NOT NULL,
      time TIMESTAMP WITH TIME ZONE,
      cron_exp TEXT,
      target_accounts TEXT REFERENCES accounts(id) ON DELETE CASCADE,
      status TEXT REFERENCES action_status(name)
    );
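
For discussion, here is a minimal sketch of how the scheduler tick (FR-8) might decide which rows of the proposed `schedule` table are due. It assumes the tick runs once per minute, that `cron_exp` holds a standard five-field expression, and that `time` is a one-off run time used when `cron_exp` is empty. The matcher is a simplified stand-in for a real cron library: it supports `*`, single values, `a-b` ranges, comma lists and `/step`, treats all five fields as AND conditions, and expects Sunday as `0`.

```python
from datetime import datetime, timezone


def _field_matches(field, value, minimum):
    """Match one cron field against a value; `minimum` is the field's lowest legal value."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            lo, hi = minimum, value  # wildcard: any value from the minimum upward
        elif "-" in rng:
            lo, hi = (int(x) for x in rng.split("-"))
        else:
            lo = hi = int(rng)
        if lo <= value <= hi and (value - lo) % step == 0:
            return True
    return False


def cron_matches(cron_exp, moment):
    """Return True if `moment` satisfies the five-field expression (simplified rules)."""
    minute, hour, dom, month, dow = cron_exp.split()
    return (
        _field_matches(minute, moment.minute, 0)
        and _field_matches(hour, moment.hour, 0)
        and _field_matches(dom, moment.day, 1)
        and _field_matches(month, moment.month, 1)
        and _field_matches(dow, moment.isoweekday() % 7, 0)  # Sunday == 0
    )


def due_schedules(rows, now):
    """Return the ids of schedule rows that should fire in the minute containing `now`."""
    now = now.replace(second=0, microsecond=0)
    due = []
    for row in rows:
        if row.get("cron_exp"):
            if cron_matches(row["cron_exp"], now):
                due.append(row["id"])
        elif row.get("time") is not None:
            if row["time"].replace(second=0, microsecond=0) == now:
                due.append(row["id"])
    return due
```

Each due id would then go through the rate-limit and maintenance-window checks before a ScanAction is created. A production scheduler would more likely delegate expression parsing to the cron library mentioned in FR-8.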
    

Metadata

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)
