Add --until and --sort sweep flags to analyze/label#40

Merged
ashleyzhang01 merged 13 commits into main from add-analyze-until-flag on May 23, 2026

Conversation

@ashleyzhang01

Contributor

Adds an `--until` flag to analyze/label for backfilling a specific date range (e.g. a missed day). Adds a `--sort sweep` mode that orders by assembled_at (analyze) or analyzed_at (label) DESC instead of bot_reviewed_at. Because we only analyze merged PRs, PRs that merge long after discovery fall outside the `--since` window of the primary run. The sweep pass catches these stragglers and complements the default freshness-first processing.

Allows targeting a specific date range when re-running analyze/label,
which is needed for incident-recovery backfills (e.g. analyzing only
2026-04-18 without first burning budget on newer days that the DESC-
ordered query would walk first).

- New BOUNDED query variants in queries.py with nullable since/until
  bounds; existing SINCE / no-bound queries kept as fast paths.
- repository.get_assembled_not_analyzed and get_analyzed_not_labeled
  gain an `until` param and route to the bounded queries when set.
- pipeline.analyze.analyze_prs and pipeline.label.label_prs thread
  `until` through to the repository.
- main.py exposes --until on analyze + label, parsed identically to
  --since (relative "Nd" or absolute date). Refactored the parsing
  into a shared _parse_time_bound helper.
- connection._translate_params strips PG-only ::type casts when
  running against SQLite, so the bounded queries work in both
  backends without duplication.
- Tests cover since-only, until-only, since+until, and the
  all-chatbots variant.
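
The cast stripping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in connection._translate_params: the function name `strip_pg_casts`, the regex, and the example query fragment are all assumptions made for this sketch.

```python
import re

# Hypothetical pattern: "::" followed by a simple type name, optionally an
# array suffix, e.g. "::timestamptz" or "::text[]". This is an assumption,
# not the regex used in the PR.
_PG_CAST = re.compile(r"::[A-Za-z_][A-Za-z0-9_]*(?:\[\])?")


def strip_pg_casts(sql: str) -> str:
    """Remove Postgres-style ::type casts so SQLite can parse the query.

    Limitation of this sketch: it does not skip string literals, so a
    literal containing "::word" would also be rewritten.
    """
    return _PG_CAST.sub("", sql)


# Illustrative fragment of a nullable-bound predicate (assumed shape).
pg_fragment = "WHERE ($2::timestamptz IS NULL OR p.bot_reviewed_at >= $2::timestamptz)"
sqlite_fragment = strip_pg_casts(pg_fragment)
```

With this approach a single query string can serve both backends: Postgres keeps the cast so a nullable parameter has a known type, and SQLite sees the same predicate without it.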

Semantics: --since is inclusive, --until is exclusive, so
  --since 2026-04-18 --until 2026-04-19  ->  just 2026-04-18.
Made-with: Cursor
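
To make the inclusive/exclusive semantics concrete, here is a small sketch of a half-open window with nullable bounds, run against an in-memory SQLite database. The table `prs`, the query text, and the helper names are hypothetical and only illustrate the idea; the real BOUNDED queries in queries.py may look different.

```python
import sqlite3

# Hypothetical bounded query: each bound is skipped when its parameter is NULL.
# since is inclusive (>=), until is exclusive (<).
BOUNDED_SQL = """
SELECT id FROM prs
WHERE (? IS NULL OR bot_reviewed_at >= ?)
  AND (? IS NULL OR bot_reviewed_at < ?)
ORDER BY bot_reviewed_at DESC
"""


def select_in_window(conn, since=None, until=None):
    """Return PR ids inside [since, until), newest first."""
    rows = conn.execute(BOUNDED_SQL, (since, since, until, until)).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build a tiny in-memory table with timestamps around 2026-04-18."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prs (id INTEGER PRIMARY KEY, bot_reviewed_at TEXT)")
    conn.executemany(
        "INSERT INTO prs VALUES (?, ?)",
        [
            (1, "2026-04-17T23:59:59+00:00"),
            (2, "2026-04-18T00:00:00+00:00"),
            (3, "2026-04-18T15:30:00+00:00"),
            (4, "2026-04-19T00:00:00+00:00"),
        ],
    )
    return conn


conn = demo_connection()
only_apr_18 = select_in_window(
    conn, "2026-04-18T00:00:00+00:00", "2026-04-19T00:00:00+00:00"
)
```

Because the upper bound is exclusive, consecutive daily windows do not overlap: the row stamped exactly at midnight on 2026-04-19 belongs to the next day's window.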
asyncpg requires datetime objects for timestamptz parameters, and
_coerce_args only converts strings that match the full ISO regex
(YYYY-MM-DDThh:mm:ss). Bare dates like "2026-04-18" passed straight
through and triggered:

  asyncpg.exceptions.DataError: invalid input for query argument $2:
    '2026-04-18' (expected a datetime.date or datetime.datetime
    instance, got 'str')

Fix in _parse_time_bound: detect a bare YYYY-MM-DD and expand to
midnight UTC. Affects both --since and --until on analyze and label.

This also resolves a latent bug: the original --since handler had
the same problem, but it was only ever invoked with the relative
"Nd" form (which produces a full ISO timestamp), so nobody hit it.

Made-with: Cursor
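
A rough sketch of the parsing behaviour described in this commit, assuming the helper returns an ISO-8601 string that a later coercion step turns into a datetime. The name `parse_time_bound_sketch`, the `now` parameter, and the pass-through branch are assumptions for illustration; this is not the actual _parse_time_bound.

```python
import re
from datetime import datetime, timedelta, timezone

_RELATIVE_DAYS = re.compile(r"^(\d+)d$")       # e.g. "7d"
_BARE_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. "2026-04-18"


def parse_time_bound_sketch(value, now=None):
    """Normalize a CLI time bound to a full ISO timestamp string.

    - None stays None (no bound).
    - "Nd" becomes now minus N days.
    - A bare YYYY-MM-DD is expanded to midnight UTC of that day.
    - Anything else is passed through unchanged (assumption).
    """
    if value is None:
        return None
    now = now or datetime.now(timezone.utc)
    relative = _RELATIVE_DAYS.match(value)
    if relative:
        return (now - timedelta(days=int(relative.group(1)))).isoformat()
    if _BARE_DATE.match(value):
        day = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return day.isoformat()
    return value


fixed_now = datetime(2026, 4, 20, 12, 0, tzinfo=timezone.utc)
bare = parse_time_bound_sketch("2026-04-18")
relative = parse_time_bound_sketch("2d", now=fixed_now)
```

Expanding the bare date before it reaches the database layer means both forms produce a value with a time component, which is what the ISO regex in _coerce_args is described as requiring.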
The default is `--sort reviewed`, which orders by bot_reviewed_at DESC. The new `--sort sweep` orders by assembled_at DESC for analyze and by analyzed_at DESC for label, which catches straggler PRs that were discovered or processed late.
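
The difference between the two sort modes can be illustrated with an in-memory sketch. The mapping and function below are hypothetical (the PR does the ordering in SQL); only the column names come from the description above.

```python
# Hypothetical mapping from (stage, sort mode) to the ordering column.
SORT_COLUMNS = {
    ("analyze", "reviewed"): "bot_reviewed_at",
    ("analyze", "sweep"): "assembled_at",
    ("label", "reviewed"): "bot_reviewed_at",
    ("label", "sweep"): "analyzed_at",
}


def order_candidates(rows, stage, sort_by="reviewed"):
    """Sort candidate PRs newest-first by the column for this stage and mode."""
    column = SORT_COLUMNS[(stage, sort_by)]
    return sorted(rows, key=lambda row: row[column], reverse=True)


# "straggler" was reviewed long ago but assembled recently (merged late).
candidates = [
    {"id": "straggler", "bot_reviewed_at": "2026-04-01", "assembled_at": "2026-05-20"},
    {"id": "fresh", "bot_reviewed_at": "2026-05-19", "assembled_at": "2026-05-19"},
]

reviewed_order = [r["id"] for r in order_candidates(candidates, "analyze")]
sweep_order = [r["id"] for r in order_candidates(candidates, "analyze", "sweep")]
```

Under the default order the straggler sits behind every more recently reviewed PR and may never be reached within a limited budget; under sweep it comes first because it was assembled most recently.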

Copilot AI left a comment

Pull request overview

This PR extends the DB-backed analyze and label CLI subcommands to support bounded backfills and an alternate “sweep” prioritization order, aimed at catching PRs that become eligible long after their original discovery/review timestamp.

Changes:

  • Added --until (exclusive upper bound) to analyze/label, including a shared _parse_time_bound helper to normalize CLI time inputs (relative days, bare dates).
  • Added --sort {reviewed,sweep} to analyze/label, and introduced new SQL query variants to support sweep ordering.
  • Added/extended SQLite integration tests covering until bounds and sweep ordering behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| online/etl/tests/test_repository.py | Adds integration tests for until bounds and analyze sweep ordering. |
| online/etl/tests/test_main.py | Adds unit tests for the new _parse_time_bound CLI helper. |
| online/etl/pipeline/label.py | Plumbs until/sort_by into the labeling pipeline call to the repository. |
| online/etl/pipeline/analyze.py | Plumbs until/sort_by into the analysis pipeline call to the repository. |
| online/etl/main.py | Adds --until/--sort flags to CLI and centralizes time-bound parsing. |
| online/etl/db/repository.py | Extends repository query APIs with until + sort_by branching. |
| online/etl/db/queries.py | Adds bounded and sweep-ordered SQL query variants for analyze/label selection. |
| online/etl/db/connection.py | Updates SQLite param translation to strip Postgres ::type casts used in new bounded queries. |

Comments suppressed due to low confidence (1)

online/etl/db/repository.py:334

  • sort_by == "sweep" bypasses the since/until bounds here too, meaning label --since/--until --sort sweep will label PRs outside the requested reviewed-at window. Align the sweep branch with the bounded logic (e.g., add bounded+sorted queries ordered by p.analyzed_at DESC while still filtering on p.bot_reviewed_at).
        if sort_by == "sweep":
            if chatbot_id is not None:
                return await self.db.fetchall(
                    q.GET_ANALYZED_NOT_LABELED_BY_ASSEMBLED, (chatbot_id, limit)
                )
            return await self.db.fetchall(
                q.GET_ALL_ANALYZED_NOT_LABELED_BY_ASSEMBLED, (limit,)
            )
        if until is not None:

Comment thread online/etl/db/repository.py
Comment thread online/etl/db/queries.py Outdated
Comment thread online/etl/db/queries.py
@ashleyzhang01 merged commit 279f279 into main on May 23, 2026
2 checks passed

@christama christama left a comment

Config

4 participants