perf: connector_health_report parses 14 days of JSONL on every report #80

Description

@jordanrburger

Problem

load_records() glob-scans every connector-calls-*.jsonl file, opens each one, JSON-decodes every line, builds a datetime from each record's _ts, and then filters by the 14-day cutoff. All of this happens every time the report runs. Because the logs are append-only, reparsing days that are already finalized is pure waste.

Evidence

  • engine/scout/scripts/connector_health_report.py:79-113

```python
for path in sorted(glob.glob(f"{log_dir}/connector-calls-*.jsonl")):
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    r = json.loads(line)
                ...
```

Impact

Scout sessions can produce hundreds of tool-call rows per day, so over a 14-day window a single report makes thousands of json.loads calls. This path is not as hot as the per-tick paths, but the work spent on finalized days is entirely redundant.

Suggested fix

Pre-aggregate each day into a per-day summary file (connector-summary-YYYY-MM-DD.json) once the day rolls over. The report then reads the daily summaries for past days and parses only the current day's raw JSONL. This reduces the work per report from O(window_days × rows_per_day) to O(window_days + today_rows).

Metadata

Assignees

No one assigned

    Labels

    audit (From engine audit)
    performance (CPU/memory/wall-clock optimization)
    severity:medium (Correctness gap or edge case)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests