Add --until and --sort sweep flags to analyze/label #40
Merged
Allows targeting a specific date range when re-running analyze/label, which is needed for incident-recovery backfills (e.g. analyzing only 2026-04-18 without first burning budget on newer days that the DESC-ordered query would walk first).

- New BOUNDED query variants in queries.py with nullable since/until bounds; existing SINCE / no-bound queries kept as fast paths.
- repository.get_assembled_not_analyzed and get_analyzed_not_labeled gain an `until` param and route to the bounded queries when set.
- pipeline.analyze.analyze_prs and pipeline.label.label_prs thread `until` through to the repository.
- main.py exposes --until on analyze + label, parsed identically to --since (relative "Nd" or absolute date). Refactored the parsing into a shared _parse_time_bound helper.
- connection._translate_params strips PG-only ::type casts when running against SQLite, so the bounded queries work in both backends without duplication.
- Tests cover since-only, until-only, since+until, and the all-chatbots variant.

Semantics: --since is inclusive, --until is exclusive, so --since 2026-04-18 --until 2026-04-19 -> just 2026-04-18.

Made-with: Cursor
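The nullable bounds and the cast stripping described in this commit can be illustrated with a small, self-contained sketch. The query text, the `prs` table, and the helper names `translate_for_sqlite`, `fetch_bounded`, and `demo_connection` are assumptions made for illustration; the actual BOUNDED queries and `connection._translate_params` are not shown on this page and may differ.

```python
import re
import sqlite3

# Hypothetical Postgres-style bounded query (not the real one from queries.py).
# A NULL bound disables that side of the half-open window [since, until).
BOUNDED_QUERY = """
SELECT number FROM prs
WHERE ($1::timestamptz IS NULL OR bot_reviewed_at >= $1::timestamptz)
  AND ($2::timestamptz IS NULL OR bot_reviewed_at <  $2::timestamptz)
ORDER BY bot_reviewed_at DESC
LIMIT $3
"""

_CAST_RE = re.compile(r"::[A-Za-z_][A-Za-z0-9_]*")  # naive: single-word casts only
_PARAM_RE = re.compile(r"\$(\d+)")


def translate_for_sqlite(sql, args):
    """Strip ::type casts and rewrite $N placeholders as positional '?'."""
    sql = _CAST_RE.sub("", sql)
    ordered = []

    def _bind(match):
        # $N may appear several times, so repeat the argument per occurrence.
        ordered.append(args[int(match.group(1)) - 1])
        return "?"

    return _PARAM_RE.sub(_bind, sql), tuple(ordered)


def fetch_bounded(conn, since, until, limit):
    """Run the bounded query against SQLite and return PR numbers."""
    sql, params = translate_for_sqlite(BOUNDED_QUERY, (since, until, limit))
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """In-memory database with timestamps stored as ISO-8601 UTC strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prs (number INTEGER, bot_reviewed_at TEXT)")
    conn.executemany(
        "INSERT INTO prs VALUES (?, ?)",
        [
            (1, "2026-04-17T23:59:59+00:00"),
            (2, "2026-04-18T00:00:00+00:00"),
            (3, "2026-04-18T15:30:00+00:00"),
            (4, "2026-04-19T00:00:00+00:00"),
        ],
    )
    return conn
```

With `since="2026-04-18T00:00:00+00:00"` and `until="2026-04-19T00:00:00+00:00"`, only the two rows from 2026-04-18 are selected: the midnight row on the 18th is included and the midnight row on the 19th is excluded. The cast regex here is deliberately simple and would not handle multi-word types or `::` inside string literals.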
asyncpg requires datetime objects for timestamptz parameters, and
_coerce_args only converts strings that match the full ISO regex
(YYYY-MM-DDThh:mm:ss). Bare dates like "2026-04-18" passed straight
through and triggered:
asyncpg.exceptions.DataError: invalid input for query argument $2:
'2026-04-18' (expected a datetime.date or datetime.datetime
instance, got 'str')
Fix in _parse_time_bound: detect a bare YYYY-MM-DD and expand to
midnight UTC. Affects both --since and --until on analyze and label.
This also resolves a latent bug: the original --since handler had
the same problem, but it was only ever invoked with the relative
"Nd" form (which produces a full ISO timestamp), so nobody hit it.
Made-with: Cursor
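A minimal sketch of the normalization this commit describes, assuming the helper returns timezone-aware `datetime` objects. The name `parse_time_bound`, the `now` parameter, and the regexes are illustrative; the real `_parse_time_bound` is not shown here and, per the commit message, its relative form produces an ISO timestamp that `_coerce_args` later converts.

```python
import re
from datetime import datetime, timedelta, timezone

_RELATIVE_RE = re.compile(r"^(\d+)d$")          # e.g. "7d"
_BARE_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. "2026-04-18"


def parse_time_bound(value, now=None):
    """Normalize a CLI time bound to an aware UTC datetime (or None).

    `now` is injectable only to make the relative form testable.
    """
    if value is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)

    relative = _RELATIVE_RE.match(value)
    if relative:
        return now - timedelta(days=int(relative.group(1)))

    if _BARE_DATE_RE.match(value):
        # Bare date: expand to midnight UTC so asyncpg receives a datetime,
        # not a str, for the timestamptz parameter.
        return datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)

    # Full ISO timestamp; treat naive values as UTC (an assumption).
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Under these assumptions, `--since 2026-04-18 --until 2026-04-19` becomes the half-open window from 2026-04-18 00:00 UTC up to, but not including, 2026-04-19 00:00 UTC.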
The default is `--sort reviewed`, which orders by bot_reviewed_at DESC. The new `--sort sweep` orders by assembled_at DESC for analyze and by analyzed_at DESC for label. This catches straggler PRs that were discovered or processed late.
Pull request overview
This PR extends the DB-backed analyze and label CLI subcommands to support bounded backfills and an alternate “sweep” prioritization order, aimed at catching PRs that become eligible long after their original discovery/review timestamp.
Changes:
- Added `--until` (exclusive upper bound) to `analyze`/`label`, including a shared `_parse_time_bound` helper to normalize CLI time inputs (relative days, bare dates).
- Added `--sort {reviewed,sweep}` to `analyze`/`label`, and introduced new SQL query variants to support sweep ordering.
- Added/extended SQLite integration tests covering `until` bounds and sweep ordering behavior.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| online/etl/tests/test_repository.py | Adds integration tests for until bounds and analyze sweep ordering. |
| online/etl/tests/test_main.py | Adds unit tests for the new _parse_time_bound CLI helper. |
| online/etl/pipeline/label.py | Plumbs until/sort_by into the labeling pipeline call to the repository. |
| online/etl/pipeline/analyze.py | Plumbs until/sort_by into the analysis pipeline call to the repository. |
| online/etl/main.py | Adds --until/--sort flags to CLI and centralizes time-bound parsing. |
| online/etl/db/repository.py | Extends repository query APIs with until + sort_by branching. |
| online/etl/db/queries.py | Adds bounded and sweep-ordered SQL query variants for analyze/label selection. |
| online/etl/db/connection.py | Updates SQLite param translation to strip Postgres ::type casts used in new bounded queries. |
Comments suppressed due to low confidence (1)
online/etl/db/repository.py:334
`sort_by == "sweep"` bypasses the `since`/`until` bounds here too, meaning `label --since/--until --sort sweep` will label PRs outside the requested reviewed-at window. Align the sweep branch with the bounded logic (e.g., add bounded+sorted queries ordered by `p.analyzed_at DESC` while still filtering on `p.bot_reviewed_at`).

    if sort_by == "sweep":
        if chatbot_id is not None:
            return await self.db.fetchall(
                q.GET_ANALYZED_NOT_LABELED_BY_ASSEMBLED, (chatbot_id, limit)
            )
        return await self.db.fetchall(
            q.GET_ALL_ANALYZED_NOT_LABELED_BY_ASSEMBLED, (limit,)
        )
    if until is not None:
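The behavior the review comment asks for (filter on the reviewed-at window, then order by the sweep key) can be sketched in plain Python without a database. The `PR` dataclass and `select_for_label` function are hypothetical stand-ins, not the repository's API; in the real code this logic would live in SQL query variants.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class PR:
    number: int
    bot_reviewed_at: datetime
    analyzed_at: datetime


def select_for_label(
    prs: List[PR],
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
    sort_by: str = "reviewed",
    limit: int = 10,
) -> List[PR]:
    """Apply the [since, until) window on bot_reviewed_at, then sort."""
    if sort_by not in ("reviewed", "sweep"):
        raise ValueError(f"unknown sort_by: {sort_by!r}")

    # The window always filters on bot_reviewed_at, regardless of sort mode.
    in_window = [
        p
        for p in prs
        if (since is None or p.bot_reviewed_at >= since)
        and (until is None or p.bot_reviewed_at < until)
    ]

    # Only the ordering key changes between the two modes.
    if sort_by == "sweep":
        key = lambda p: p.analyzed_at
    else:
        key = lambda p: p.bot_reviewed_at
    return sorted(in_window, key=key, reverse=True)[:limit]
```

In this sketch a PR that was reviewed on 2026-04-18 but analyzed weeks later stays inside a `--since 2026-04-18 --until 2026-04-19` window in both modes; sweep mode only moves it to the front of the queue.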
zverianskii
approved these changes
May 22, 2026
Add `--until` flag to analyze/label for backfilling a specific date range (e.g. a missed day). Add `--sort` sweep mode that orders by `assembled_at` (analyze) or `analyzed_at` (label) DESC instead of `bot_reviewed_at`. Since we only analyze merged PRs, ones that merge long after discovery fall outside the `--since` window of the primary run; the sweep pass catches these stragglers in addition to the freshness-first processing.