Merged
67 changes: 67 additions & 0 deletions .github/workflows/news-items-daily-review.yml
@@ -0,0 +1,67 @@
name: news-items-daily-review

on:
  # Review the latest daily ingest artifacts after the daily ingest workflow completes.
  workflow_run:
    workflows: ["daily-state-run"]
    types: [completed]
  workflow_dispatch:

concurrency:
  group: news-items-daily-review
  cancel-in-progress: false

permissions:
  contents: read
  issues: write

env:
  DEFAULT_PYTHON_VERSION: "3.11"
  STATE_REPO: DataHackIL/tfht_enforce_idx_state
  STATE_JOB_DIR: state_repo/news_items/ingest
  DENBUST_REVIEW_WORKFLOW_NAME: daily-state-run

jobs:
  review-latest-daily-ingest:
    if: >-
      github.event_name == 'workflow_dispatch' ||
      github.event.workflow_run.conclusion == 'success'
    runs-on: ubuntu-latest
    timeout-minutes: 20
    environment: news-items-ingest

    steps:
      - name: Checkout code repo
        uses: actions/checkout@v6
        with:
          ref: ${{ github.event.workflow_run.head_sha || github.sha }}
          fetch-depth: 2

      - name: Checkout state repo
        uses: actions/checkout@v6
        with:
          repository: ${{ env.STATE_REPO }}
          token: ${{ secrets.STATE_REPO_PAT }}
          path: state_repo

      - name: Set up denbust review job
        uses: ./.github/actions/setup-denbust-state-job
        with:
          python-version: ${{ env.DEFAULT_PYTHON_VERSION }}
          install-playwright: "false"
          state-job-dir: ${{ env.STATE_JOB_DIR }}

      - name: Review latest daily ingest artifacts and open issues
        # Required secrets:
        # - STATE_REPO_PAT
        # - ANTHROPIC_API_KEY
        # Optional:
        # - DENBUST_REVIEW_MODEL
        # - DENBUST_REVIEW_ISSUE_LABELS
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ github.token }}
          DENBUST_STATE_ROOT: state_repo
          DENBUST_REVIEW_MODEL: ${{ secrets.DENBUST_REVIEW_MODEL }}
          DENBUST_REVIEW_ISSUE_LABELS: ${{ secrets.DENBUST_REVIEW_ISSUE_LABELS }}
        run: python -m denbust.news_items.daily_review
20 changes: 20 additions & 0 deletions README.md
@@ -37,6 +37,7 @@ Planned future datasets:
- Publishes release bundles to Kaggle and Hugging Face when configured
- Uploads the latest release bundle to Google Drive and S3-compatible object storage when configured
- Persists dataset/job-scoped seen state and per-run JSON snapshots
- Reviews the latest daily ingest artifacts and can open GitHub issues for suspicious runs

## Quick Start

@@ -153,6 +154,7 @@ Bootstrap notes:

- `seen.json` may be absent initially; it is created once a run marks at least one URL as seen
- `runs/` and `publication/` directories are created automatically by the workflows when needed
- `logs/` is created automatically once ingest debug artifacts are written
- a small `README.md` in the state repo is fine but optional

## Architecture Direction
@@ -373,11 +375,28 @@ The Phase B integrations are intentionally split into:
- backup is considered successful if the command completes; zero configured targets is treated as a warning, not a failure
- if a configured publication or backup target is missing required credentials, that target currently fails the job rather than silently skipping

## Daily AI Review Workflow

The repository also includes a workflow that reviews the latest `daily-state-run` artifacts and can
open GitHub issues when the latest ingest looks suspicious:

- `news-items-daily-review.yml`

It runs automatically after `daily-state-run` completes successfully, and it also supports
`workflow_dispatch` for manual review. It reads the latest matching files from:

- `news_items/ingest/runs/`
- `news_items/ingest/logs/`
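
As a rough illustration of "latest matching file", the sketch below picks the newest artifact from a directory. It assumes that artifact filenames start with a sortable timestamp, so the lexicographically last name is the most recent; the helper name `latest_artifact` and the `*.json` pattern are hypothetical and not taken from the `denbust` code.

```python
from pathlib import Path
from typing import Optional


def latest_artifact(directory: Path, pattern: str = "*.json") -> Optional[Path]:
    """Return the lexicographically last file matching `pattern`, or None.

    Assumption: filenames begin with a sortable timestamp, so the last
    name in sorted order is also the most recent artifact.
    """
    candidates = sorted(p for p in directory.glob(pattern) if p.is_file())
    return candidates[-1] if candidates else None
```

For example, `latest_artifact(Path("state_repo/news_items/ingest/runs"))` would return the newest run snapshot under that assumption, or `None` when the directory holds no matching files.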

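The workflow sends those artifacts to an Anthropic model, which proposes candidate engineering
issues. Each created issue embeds a hidden fingerprint marker in its body, so a candidate whose
fingerprint already appears in an existing issue is skipped and only new issues are created.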
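
One way such fingerprint-based de-duplication could work is sketched below. This is a minimal illustration, not the `denbust` implementation: the marker format (an HTML comment, which GitHub does not render in issue bodies), the choice of hashed fields, and all function names are assumptions.

```python
import hashlib
import re
from typing import Iterable, List, Optional

# Hypothetical marker format; HTML comments stay invisible in rendered Markdown.
MARKER_RE = re.compile(r"<!-- review-fingerprint: ([0-9a-f]{16}) -->")


def fingerprint(title: str, category: str) -> str:
    """Hash a normalized (category, title) pair into a short stable id."""
    normalized = f"{category.strip().lower()}|{' '.join(title.lower().split())}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def with_marker(body: str, fp: str) -> str:
    """Append the hidden marker to an issue body."""
    return f"{body.rstrip()}\n\n<!-- review-fingerprint: {fp} -->\n"


def extract_fingerprint(body: str) -> Optional[str]:
    """Read the marker back out of an existing issue body, if present."""
    match = MARKER_RE.search(body)
    return match.group(1) if match else None


def select_new(candidates: Iterable[dict], existing_bodies: Iterable[str]) -> List[dict]:
    """Keep only candidates whose fingerprint is not already on an issue."""
    seen = set()
    for body in existing_bodies:
        fp = extract_fingerprint(body)
        if fp:
            seen.add(fp)
    fresh = []
    for cand in candidates:
        fp = fingerprint(cand["title"], cand["category"])
        if fp in seen:
            continue  # already reported, or duplicated within this batch
        seen.add(fp)
        fresh.append({**cand, "body": with_marker(cand["body"], fp)})
    return fresh
```

Hashing a normalized title rather than the full model output keeps the fingerprint stable when the model rephrases the description of the same problem between runs; the trade-off is that two distinct problems with near-identical titles would collapse into one issue.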

## GitHub Actions secret/setup matrix

| Workflow | Required secrets | Optional secrets |
|---|---|---|
| `daily-state-run.yml` / `weekly-state-run.yml` | `STATE_REPO_PAT`, `ANTHROPIC_API_KEY`, `DENBUST_SUPABASE_URL`, `DENBUST_SUPABASE_SERVICE_ROLE_KEY` | SMTP/email secrets if email output is enabled |
| `news-items-daily-review.yml` | `STATE_REPO_PAT`, `ANTHROPIC_API_KEY` | `DENBUST_REVIEW_MODEL`, `DENBUST_REVIEW_ISSUE_LABELS` |
| `news-items-release.yml` | `STATE_REPO_PAT`, `DENBUST_SUPABASE_URL`, `DENBUST_SUPABASE_SERVICE_ROLE_KEY` | `DENBUST_KAGGLE_DATASET`, `KAGGLE_USERNAME`, `KAGGLE_KEY`, `DENBUST_HUGGINGFACE_REPO_ID`, `HF_TOKEN` |
| `news-items-backup.yml` | `STATE_REPO_PAT` | `DENBUST_DRIVE_FOLDER_ID`, `DENBUST_DRIVE_SERVICE_ACCOUNT_JSON`, `DENBUST_OBJECT_STORE_BUCKET`, `DENBUST_OBJECT_STORE_PREFIX`, `DENBUST_OBJECT_STORE_ENDPOINT_URL`, `DENBUST_OBJECT_STORE_ACCESS_KEY_ID`, `DENBUST_OBJECT_STORE_SECRET_ACCESS_KEY` |
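
In GitHub Actions, an expression that references an unset secret evaluates to an empty string, so the optional review secrets reach the Python process as empty environment variables rather than being absent. The sketch below shows one way a script could handle that. The function name, the fallback model id, and the comma-separated label format are assumptions made for illustration, not documented `denbust` behavior. `STATE_REPO_PAT` is not checked here because the workflow passes it only to the checkout step.

```python
import os
from typing import Mapping

FALLBACK_MODEL = "example-model-id"  # hypothetical placeholder, not a real model id


def review_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve review settings, treating empty values as unset."""
    required = ("ANTHROPIC_API_KEY", "GITHUB_TOKEN")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")

    # An unset optional secret arrives as "", so fall back explicitly.
    model = env.get("DENBUST_REVIEW_MODEL", "").strip() or FALLBACK_MODEL

    # Assumption: labels are supplied as a comma-separated list.
    raw_labels = env.get("DENBUST_REVIEW_ISSUE_LABELS", "")
    labels = [part.strip() for part in raw_labels.split(",") if part.strip()]
    return {"model": model, "labels": labels}
```

Failing fast on a missing required value gives a clear error at the start of the job instead of an authentication failure partway through the review.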

@@ -386,6 +405,7 @@ The release and backup workflows both support `workflow_dispatch` for manual run
Recommended GitHub Environment mapping:

- `news-items-ingest` for `daily-state-run.yml` and `weekly-state-run.yml`
- `news-items-ingest` for `news-items-daily-review.yml`
- `news-items-release` for `news-items-release.yml`
- `news-items-backup` for `news-items-backup.yml`
