Show structured stage-by-stage progress and taint analysis load during scan #137

@misonijnik

Summary

Users running opentaint scan see only two opaque spinners, "Compiling project model" and "Analyzing project," and have no other visibility into what the tool is doing. The scan should show structured stage-by-stage progress (compiling, IR loading, rules loading, analysis, trace reproducing, report generation) so users can tell where the tool is in the pipeline. It should also show a live htop-style taint-analysis load indicator, with fact and module counts rendered as per-module progress bars, so users can gauge analysis workload and progress in real time.

Problem

Today the CLI wraps compilation and analysis each in a single spinner that shows only elapsed time and an animation. Everything that happens inside — bytecode loading, rules preloading, IFDS taint analysis, symbolic-execution verification, SARIF generation — is invisible to the user unless they dig through log files.

This creates friction in several situations:

  • Long-running scans feel stuck. On large projects the "Analyzing project" spinner can run for many minutes with no indication of progress. Users cannot tell whether the tool is loading bytecode, running the taint analysis, or reproducing traces. They have no way to know how far along the process is or whether it is making progress at all.

  • No way to identify slow phases. When a scan takes longer than expected, users have no information about which stage is the bottleneck. Was it IR loading? The IFDS analysis itself? Trace reproduction? Without stage-level timing, troubleshooting performance is guesswork.

  • No visibility into analysis workload. The taint analysis processes facts across project modules, but the user sees nothing about the scale of work being done. There is no indication of how many modules are loaded, how many facts are being propagated, or how the analysis load is distributed. A live loading indicator — similar to htop's per-core CPU bars — would give users a real-time sense of the analysis scope, distribution, and progress.

  • AI-agent and automation workflows suffer. Agents monitoring a scan have no structured signal about the current phase. They cannot make informed decisions about timeouts, retries, or resource allocation because the only observable output is a spinner animation and elapsed time.

  • Post-scan timing is absent. After the scan completes, users see a summary of findings but no breakdown of time spent per stage. This makes it difficult to compare scan performance across runs or report bottlenecks.
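To make the htop-style indicator from the list above more concrete, here is a minimal Python sketch of how per-module bars might be rendered as text. It is an illustration only, not opentaint code: the function names, the module names, and the assumption that the analyzer can report processed and discovered fact counts per module are all hypothetical.

```python
from __future__ import annotations


def render_bar(done: int, total: int, width: int = 20) -> str:
    """Render one fixed-width text bar, e.g. '[#####.....]  50%'."""
    if total <= 0:
        filled, percent = 0, 0.0
    else:
        # Integer math keeps the bar width exact; clamp to the 0..width range.
        filled = min(width, max(0, done * width // total))
        percent = min(100.0, max(0.0, 100.0 * done / total))
    return "[" + "#" * filled + "." * (width - filled) + f"] {percent:3.0f}%"


def render_modules(modules: dict[str, tuple[int, int]], width: int = 20) -> list[str]:
    """Render one line per module: padded name, bar, processed/discovered facts.

    `modules` maps a module name to (facts processed, facts discovered so far).
    This input shape is an assumption made for the sketch.
    """
    name_width = max((len(name) for name in modules), default=0)
    lines = []
    for name, (processed, discovered) in modules.items():
        bar = render_bar(processed, discovered, width)
        lines.append(f"{name:<{name_width}} {bar} {processed}/{discovered} facts")
    return lines


if __name__ == "__main__":
    # Hypothetical snapshot of analysis load across two modules.
    for line in render_modules({"core": (5, 10), "web-api": (10, 10)}, width=10):
        print(line)
```

A real implementation would redraw these lines in place as counts change. Because the number of discovered facts can grow during analysis, a bar in this sketch may move backwards as well as forwards.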

The internal analyzer already logs stage transitions ("Start IFDS analysis...", "Starting entry point selection", etc.) and tracks module and fact counts, but none of this information reaches the interactive CLI output.
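One way the existing stage transitions could be surfaced is sketched below in Python: a small tracker that emits a machine-readable event at each stage boundary and records per-stage durations for a post-scan summary. The stage names come from the summary above, but the identifiers, the JSON event shape, and the `StageTracker` class are assumptions made for illustration, not part of the current CLI.

```python
import json
import time
from contextlib import contextmanager

# Stage list from the issue summary; the identifier spellings are hypothetical.
STAGES = [
    "compiling",
    "ir_loading",
    "rules_loading",
    "analysis",
    "trace_reproducing",
    "report_generation",
]


class StageTracker:
    """Emit one JSON line per stage boundary and record stage durations."""

    def __init__(self, emit=print, clock=time.monotonic):
        self._emit = emit      # where event lines go (stdout by default)
        self._clock = clock    # injectable so tests can use a fake clock
        self.durations = {}    # stage name -> elapsed seconds

    @contextmanager
    def stage(self, name):
        index = STAGES.index(name) + 1  # raises ValueError for unknown stages
        start = self._clock()
        self._emit(json.dumps({"event": "stage_start", "stage": name,
                               "index": index, "total": len(STAGES)}))
        try:
            yield
        finally:
            # Record the duration even if the stage body raised.
            elapsed = self._clock() - start
            self.durations[name] = elapsed
            self._emit(json.dumps({"event": "stage_end", "stage": name,
                                   "seconds": round(elapsed, 3)}))

    def summary(self):
        """Return one aligned line per completed stage, in pipeline order."""
        return [f"{name:<18} {self.durations[name]:8.2f}s"
                for name in STAGES if name in self.durations]


if __name__ == "__main__":
    tracker = StageTracker()
    with tracker.stage("compiling"):
        time.sleep(0.01)  # stand-in for real work
    print("\n".join(tracker.summary()))
```

An agent could read the JSON lines to decide on timeouts per stage, while interactive users would see only the rendered progress and the final timing summary.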

Labels: enhancement (New feature or request)
