aic-factcheck/facticli

facticli

facticli is a pip-installable Python CLI for agentic claim verification with OpenAI-compatible inference APIs.

It restructures key ideas from the aic_averitec project (claim decomposition, evidence gathering, verdict synthesis) into a modular command-line multi-agent workflow with:

  • open web search,
  • orchestrated parallel subroutines,
  • final veracity verdict + justification,
  • explicit source output.

The architecture is intentionally inspired by Codex-style modular prompting: local skill prompts (plan, research, judge) with explicit pipeline stages and one OpenAI-compatible inference adapter path.

📦 Install

From this repository:

pip install -e .

⚙️ Configure

Set the OpenAI-compatible endpoint, key, and model:

export OPENAI_API_BASE_URL=https://api.openai.com/v1
export OPENAI_API_KEY=...
export OPENAI_API_MODEL=gpt-5.4

Common base URLs:

# OpenAI
export OPENAI_API_BASE_URL=https://api.openai.com/v1
# Anthropic OpenAI SDK compatibility
# export OPENAI_API_BASE_URL=https://api.anthropic.com/v1/
# Gemini OpenAI compatibility
# export OPENAI_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
# Ollama at e-INFRA CZ
# export OPENAI_API_BASE_URL=https://llm.ai.e-infra.cz/v1
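
The three variables above could be resolved in Python roughly as follows. This is a minimal sketch, not facticli's actual loader: `InferenceConfig` and `resolve_inference_config` are hypothetical names, and falling back to the OpenAI endpoint when no base URL is set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceConfig:
    """Hypothetical container for the three OPENAI_API_* settings."""

    base_url: str
    api_key: str
    model: str


def resolve_inference_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read endpoint, key, and model from an environment-like mapping."""
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "OPENAI_API_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    # Assumption: default to the OpenAI endpoint when no base URL is given.
    base_url = env.get("OPENAI_API_BASE_URL") or "https://api.openai.com/v1"
    return InferenceConfig(
        base_url=base_url,
        api_key=env["OPENAI_API_KEY"],
        model=env["OPENAI_API_MODEL"],
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.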

Optional retrieval defaults:

export FACTICLI_SEARCH_PROVIDER=openai
# only needed when FACTICLI_SEARCH_PROVIDER=brave
export BRAVE_SEARCH_API_KEY=...

🚀 Usage

Run a claim check:

facticli check "The Eiffel Tower was built in 1889 for the World's Fair."

Run with Brave Search API retrieval:

facticli check --search-provider brave "The Eiffel Tower was built in 1889 for the World's Fair."

Run with another OpenAI-compatible inference endpoint:

export OPENAI_API_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
export OPENAI_API_KEY=...
export OPENAI_API_MODEL=gemini-3.1-pro-preview

facticli check \
  --search-provider brave \
  "The Eiffel Tower was built in 1889 for the World's Fair."

Run with an Ollama-style OpenAI-compatible endpoint:

export OPENAI_API_BASE_URL=https://llm.ai.e-infra.cz/v1
export OPENAI_API_KEY=...
export OPENAI_API_MODEL=kimi-k2.5

facticli extract-claims \
  "In last year’s debate, the minister said inflation fell below 3% while wages rose 10%."

For full fact-check runs with third-party inference endpoints, prefer Brave search:

facticli check \
  --search-provider brave \
  "The Eiffel Tower was built in 1889 for the World's Fair."

Show the generated plan:

facticli check --show-plan "The Eiffel Tower was built in 1889 for the World's Fair."

Stream plan and per-check progress while the run executes:

facticli check --stream-progress "The Eiffel Tower was built in 1889 for the World's Fair."

Enable one bounded follow-up review round before the final verdict:

facticli check --feedback-rounds 1 --follow-up-checks 2 \
  "The Eiffel Tower was built in 1889 for the World's Fair."

Machine-readable output:

facticli check --json --include-artifacts "The Eiffel Tower was built in 1889 for the World's Fair."

List built-in agent skills:

facticli skills

Generate an Averitec submission file from Averitec-formatted input claims:

python3 scripts/run_averitec_submission.py \
  --input data/averitec/dev.json \
  --output data/averitec/submission_generated.json \
  --search-provider openai

Notes:

  • If an input row has no claim ID field, claim_id falls back to the zero-based row index.
  • Output rows follow Averitec format: claim_id, claim, pred_label, evidence.
  • evidence entries include question, answer, url, scraped_text.
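
The row shape described in these notes can be sketched as below. The output keys come from the notes above; `build_submission_row` is a hypothetical helper, and the input keys (`claim`, and `claim_id` as the id field) are assumptions, since the notes do not name them.

```python
from typing import Any, Dict, List

EVIDENCE_KEYS = ("question", "answer", "url", "scraped_text")


def build_submission_row(
    row_index: int,
    input_row: Dict[str, Any],
    pred_label: str,
    evidence: List[Dict[str, str]],
    id_field: str = "claim_id",  # assumption: the input's id key is not documented here
) -> Dict[str, Any]:
    """Assemble one output row with the four documented top-level fields."""
    claim_id = input_row.get(id_field)
    if claim_id is None:
        # Documented fallback: use the zero-based row index.
        claim_id = row_index
    return {
        "claim_id": claim_id,
        "claim": input_row["claim"],  # assumption: input rows carry a "claim" key
        "pred_label": pred_label,
        "evidence": [
            {key: item.get(key, "") for key in EVIDENCE_KEYS} for item in evidence
        ],
    }
```

Calling it inside `enumerate(rows)` yields the zero-based index needed for the fallback.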

Extract decontextualized atomic check-worthy claims from arbitrary text:

facticli extract-claims "In last year’s debate, the minister said inflation fell below 3% while wages rose 10%."

Extract claims from a transcript file:

facticli extract-claims --from-file ./data/debate_excerpt.txt --json

Multilingual extraction

Claim extraction is language-consistent: it detects the input language, returns the extracted claims (and all coverage/exclusion notes) in that same language, and preserves the original orthography (diacritics intact). It has been validated on Czech, Slovak, and Polish in addition to English. The detected language is reported as an ISO 639-1 code in detected_language.

facticli extract-claims "Premiér včera prohlásil, že ekonomika loni vzrostla o 2,3 procenta. Myslím, že je to skvělé."
Detected Language
  cs

Claims
  - [claim_1] Ekonomika loni vzrostla o 2,3 procenta.
    source: ekonomika loni vzrostla o 2,3 procenta
    reason: Konkrétní ověřitelný číselný údaj.

🖥️ Web GUI (claim extraction)

A small branded web app exposes the claim-extraction workflow with a CEDMO look-and-feel. It serves a single page plus a JSON POST /api/extract endpoint, backed by the same ClaimExtractionService as the CLI.

Install the optional web extra and launch the server:

pip install -e ".[web]"

# Reads OPENAI_API_* from the environment or a local .env file.
python -m facticli.web
# -> http://127.0.0.1:8000

Configure host/port with FACTICLI_WEB_HOST / FACTICLI_WEB_PORT. The JSON API can also be called directly:

curl -s http://127.0.0.1:8000/api/extract \
  -H "Content-Type: application/json" \
  -d '{"text": "Premiér včera prohlásil, že ekonomika loni vzrostla o 2,3 procenta.", "max_claims": 6}'

Interactive API docs are available at /docs.
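
The same call can be made from Python with only the standard library. This is a sketch under assumptions: `build_extract_request` and `extract_claims` are hypothetical helper names, the request body mirrors the curl example above, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request
from typing import Any


def build_extract_request(
    text: str,
    max_claims: int = 6,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build the JSON POST request for /api/extract without sending it."""
    payload = json.dumps({"text": text, "max_claims": max_claims}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/extract",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_claims(
    text: str,
    max_claims: int = 6,
    base_url: str = "http://127.0.0.1:8000",
    timeout: float = 60.0,
) -> Any:
    """Send the request to a running server and return the decoded JSON body."""
    request = build_extract_request(text, max_claims, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the payload easy to inspect before a server is running.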

🧰 CLI options

facticli check [--model MODEL] [--max-checks N] [--parallel N]
               [--feedback-rounds N] [--follow-up-checks N]
               [--base-url BASE_URL]
               [--search-provider {openai,brave}]
               [--search-results N]
               [--search-context-size {low,medium,high}]
               [--show-plan] [--stream-progress]
               [--json] [--include-artifacts]
               "<claim>"

facticli extract-claims [--from-file PATH]
                        [--model MODEL] [--base-url BASE_URL]
                        [--max-claims N] [--json]
                        [text]

Validation notes:

  • --max-checks, --parallel, and --max-claims must be integers >= 1.
  • --feedback-rounds must be an integer >= 0.
  • --follow-up-checks must be an integer >= 1.
  • --search-results must be an integer in 1..20.
  • For extract-claims, provide either positional text or --from-file, but not both.
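
The validation rules above map naturally onto `argparse` type callbacks. The following is an illustrative sketch, not the project's `cli.py`: `int_at_least`, `int_in_range`, `build_parser`, and `parse_args` are hypothetical names, only the validated options are included, and no default values are assumed.

```python
import argparse
from typing import Callable, List


def int_at_least(minimum: int) -> Callable[[str], int]:
    """Return an argparse type that accepts integers >= minimum."""
    def parse(value: str) -> int:
        try:
            number = int(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"expected an integer, got {value!r}") from None
        if number < minimum:
            raise argparse.ArgumentTypeError(f"must be >= {minimum}")
        return number
    return parse


def int_in_range(low: int, high: int) -> Callable[[str], int]:
    """Return an argparse type that accepts integers in low..high inclusive."""
    at_least_low = int_at_least(low)

    def parse(value: str) -> int:
        number = at_least_low(value)
        if number > high:
            raise argparse.ArgumentTypeError(f"must be in {low}..{high}")
        return number
    return parse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="facticli-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    check = sub.add_parser("check")
    check.add_argument("--max-checks", type=int_at_least(1))
    check.add_argument("--parallel", type=int_at_least(1))
    check.add_argument("--feedback-rounds", type=int_at_least(0))
    check.add_argument("--follow-up-checks", type=int_at_least(1))
    check.add_argument("--search-results", type=int_in_range(1, 20))
    check.add_argument("claim")
    extract = sub.add_parser("extract-claims")
    extract.add_argument("--from-file")
    extract.add_argument("--max-claims", type=int_at_least(1))
    extract.add_argument("text", nargs="?")
    return parser


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Exactly one of positional text or --from-file must be given.
    if args.command == "extract-claims" and (args.text is None) == (args.from_file is None):
        parser.error("provide either positional text or --from-file, but not both")
    return args
```

Invalid values make `argparse` print a usage message to stderr and exit with status 2, which matches the convention that validation failures go to stderr.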

🧠 Current architecture

Layered runtime:

  • core: typed contracts, normalization helpers, and run artifacts.
  • application: provider-agnostic interfaces, explicit stages (PlanStage, ResearchStage, ReviewStage, JudgeStage, ClaimExtractionStage), and services.
  • adapters: a shared OpenAI-compatible strategy implementation plus client bootstrap.

Pipeline behavior:

  • plan skill decomposes claims into independent checks.
  • research runs per-check concurrently with bounded parallelism and retry.
  • review optionally requests one or more targeted follow-up checks before final judgment.
  • judge synthesizes findings into one verdict with merged deduplicated sources.
  • claim extraction runs through a dedicated extraction stage/backend.
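
The research step (per-check concurrency, bounded parallelism, retry) can be illustrated with plain `asyncio`. This is a simplified sketch rather than the project's `ResearchStage`: `research_all` is a hypothetical function, findings are plain strings instead of typed contracts, and the retry count and timeout values are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List


async def research_all(
    checks: List[str],
    research: Callable[[str], Awaitable[str]],
    parallel: int = 2,
    retries: int = 1,
    timeout: float = 30.0,
) -> List[str]:
    """Run one task per check, at most `parallel` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(parallel)

    async def run_one(check: str) -> str:
        async with semaphore:
            for _attempt in range(retries + 1):
                try:
                    # Each attempt is bounded by its own timeout.
                    return await asyncio.wait_for(research(check), timeout)
                except Exception:
                    # Timeouts and adapter errors both trigger the next attempt.
                    continue
            # All attempts failed: record a placeholder instead of aborting the run.
            return f"insufficient: {check}"

    tasks = [asyncio.create_task(run_one(check)) for check in checks]
    # gather returns results in task order, not completion order.
    return list(await asyncio.gather(*tasks))
```

Returning a placeholder for a failed check lets the later stages still produce a verdict from the checks that did succeed.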

Inference backend:

  • one OpenAI Agents SDK path (Runner, tools, structured output) for all OpenAI-compatible APIs.
  • endpoint configuration comes from OPENAI_API_BASE_URL, OPENAI_API_KEY, and OPENAI_API_MODEL.

Fact-check pipeline flow

flowchart TD
  A["CLI: facticli check <claim>"] --> B["run_check_command<br/>validate inference/search env<br/>build OrchestratorConfig"]
  B --> C["FactCheckOrchestrator(config)"]

  subgraph S["Service construction"]
    C --> D["build_fact_check_service"]
    D --> E["load_inference_config<br/>configure_inference_client"]
    E --> F["Create planner / researcher / review / judge adapters"]
    F --> G["Create PlanStage / ResearchStage / ReviewStage / JudgeStage"]
    G --> H["FactCheckService"]
  end

  H --> I["check_claim<br/>normalize claim<br/>create RunArtifacts<br/>emit run_started"]

  subgraph P["Plan stage"]
    I --> J["PlanStage.execute"]
    J --> K["CompatiblePlannerAdapter.plan"]
    K --> L["Runner.run(claim_planner)"]
    L --> M["InvestigationPlan (raw)"]
    M --> N["Normalize checks<br/>limit queries<br/>fallback direct check if empty"]
    N --> O["Store plan artifacts<br/>emit planning_completed"]
  end

  subgraph R["Research stage"]
    O --> P1["ResearchStage.execute<br/>emit research_started"]
    P1 --> P2["Create one asyncio task per check"]
    P2 --> P3["Bound concurrency with semaphore"]
    P3 --> P4["For each check: retry with timeout"]
    P4 --> P5["CompatibleResearchAdapter.research"]
    P5 --> P6["Runner.run(check_researcher)"]
    P6 --> P7{"Search provider"}
    P7 -->|openai| P8["WebSearchTool"]
    P7 -->|brave| P9["brave_web_search function tool"]
    P8 --> P10["AspectFinding"]
    P9 --> P10
    P10 --> P11{"Succeeded?"}
    P11 -->|yes| P12["Store finding<br/>emit research_check_completed"]
    P11 -->|no after retries| P13["Create insufficient finding<br/>record error<br/>emit research_check_failed"]
    P12 --> P14["Ordered findings list"]
    P13 --> P14
    P14 --> P15["emit research_completed"]
  end

  subgraph JG["Judge stage"]
    P15 --> Q{"feedback rounds enabled?"}
    Q -->|yes| Q1["ReviewStage.execute<br/>emit review_started"]
    Q1 --> Q2["CompatibleReviewAdapter.review"]
    Q2 --> Q3["Runner.run(evidence_review)"]
    Q3 --> Q4{"follow-up requested?"}
    Q4 -->|yes| Q5["Build follow-up plan<br/>retry selected checks<br/>add new targeted checks"]
    Q5 --> Q6["ResearchStage.execute for follow-up round"]
    Q6 --> Q1
    Q4 -->|no| R1["JudgeStage.execute<br/>emit judging_started"]
    Q -->|no| R1
    R1 --> R2["CompatibleJudgeAdapter.judge"]
    R2 --> R3["Runner.run(veracity_judge)"]
    R3 --> R4["FactCheckReport (raw)<br/>merge + deduplicate sources<br/>store report artifacts<br/>emit judging_completed"]
  end

  R4 --> T["Save artifacts repository (if configured)<br/>emit run_completed"]
  T --> U{"Output mode"}
  U -->|text| V["format_run_text -> stdout"]
  U -->|json| W["report JSON -> stdout<br/>optionally add plan / findings / artifacts"]

When --stream-progress is enabled, progress events are formatted in the CLI and written to stderr during the run. Validation failures and uncaught command errors also go to stderr.
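
The stdout/stderr split described above can be sketched as follows. `emit_progress` is a hypothetical helper and the line format is an assumption; only the event names and the choice of stderr come from this document.

```python
import sys
from typing import Any, Dict, Optional, TextIO


def emit_progress(event: str, payload: Dict[str, Any], stream: Optional[TextIO] = None) -> None:
    """Write one progress line to stderr so stdout stays reserved for the report."""
    stream = sys.stderr if stream is None else stream
    details = " ".join(f"{key}={value}" for key, value in sorted(payload.items()))
    stream.write(f"[{event}] {details}".rstrip() + "\n")
    # Flush immediately so progress is visible while the run is still executing.
    stream.flush()
```

Because progress goes to stderr, redirecting stdout (for example `facticli check --json --stream-progress "<claim>" > report.json`) should capture only the report.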

facticli extract-claims uses a separate path: CLI -> ClaimExtractor -> ClaimExtractionService -> ClaimExtractionStage -> CompatibleClaimExtractionAdapter -> Runner.run(...) -> ClaimExtractionResult.

🗂️ Repository layout

src/facticli/
  core/
    contracts.py     # typed plan/finding/report/extraction contracts
    normalize.py     # deterministic normalization helpers
    artifacts.py     # run artifact schemas
  application/
    interfaces.py    # planner/research/review/judge strategy contracts
    stages.py        # explicit pipeline stages
    services.py      # fact-check and extraction application services
    factory.py       # provider wiring composition root
  adapters/
    openai_provider.py  # shared OpenAI-compatible stage adapters
    provider_profile.py # OpenAI-compatible env resolution + client bootstrap
  cli.py             # command-line interface
  skills.py          # skill registry + prompt loading
  web/               # optional FastAPI GUI for claim extraction
    app.py           # JSON API + single-page server
    __main__.py      # `python -m facticli.web` launcher
    static/          # branded CEDMO frontend (HTML/CSS/JS + logo)
  prompts/
    extract_claims.md
    plan.md
    research.md
    judge.md
    review.md

📓 Demo notebooks

Interactive demos live in notebooks/:

  • 01_planner_subroutine_demo.ipynb
  • 02_research_subroutine_demo.ipynb
  • 03_judge_subroutine_demo.ipynb
  • 04_full_checker_demo.ipynb
  • 05_claim_extraction_demo.ipynb
  • 06_averitec_submission_workflow.ipynb

Each notebook includes:

  • auto-reload setup (%load_ext autoreload, %autoreload 2),
  • emoji-based headings for quick navigation,
  • multiple example claims as commented-out variable redefinitions.

✅ Testing

Run the integrated unit tests:

python3 -m unittest discover -s tests -p "test_*.py" -v

Run the standard test routine (loads .env if present):

./scripts/test_routine.sh

Run with live smoke enabled:

./scripts/test_routine.sh --live-smoke

Notes:

  • Live smoke tests are guarded by FACTICLI_RUN_LIVE_SMOKE=1.
  • The live smoke test currently validates the OpenAI profile path.
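
An environment guard of this kind is commonly written with `unittest.skipUnless`. The sketch below shows the pattern only: `live_smoke_enabled` and `LiveSmokeTest` are hypothetical names, and the test body is a placeholder rather than the project's actual smoke test.

```python
import os
import unittest
from typing import Mapping, Optional


def live_smoke_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when FACTICLI_RUN_LIVE_SMOKE is set to exactly "1"."""
    env = os.environ if env is None else env
    return env.get("FACTICLI_RUN_LIVE_SMOKE") == "1"


class LiveSmokeTest(unittest.TestCase):
    @unittest.skipUnless(
        live_smoke_enabled(),
        "set FACTICLI_RUN_LIVE_SMOKE=1 to run live smoke tests",
    )
    def test_live_endpoint(self) -> None:
        # Placeholder: a real smoke test would call the configured endpoint here.
        self.assertTrue(live_smoke_enabled())
```

With the variable unset, the test is reported as skipped instead of failing, so ordinary unit test runs never touch the network.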

🤖 GitHub automation

This repo includes two GitHub Actions workflows:

  • .github/workflows/ci.yml: runs on every push and pull request (compile + CLI checks + unit tests).
  • .github/workflows/live-smoke.yml: runs live smoke tests manually (workflow_dispatch) and on a daily schedule.

To enable live smoke in GitHub:

  1. Go to repository Settings -> Secrets and variables -> Actions.
  2. Add secret OPENAI_API_KEY.
  3. Set OPENAI_API_MODEL if you want a model other than the workflow default.
  4. Optionally edit .github/workflows/live-smoke.yml to remove or change the schedule.

🤝 Contributor guide

  • Project contributor/agent guidance lives in AGENTS.md.
  • CLAUDE.md is a symlink to the same file.

📝 Notes

  • This is an initial bootstrap and intentionally leaves room for deeper evaluator tooling, benchmark harnesses, and richer source quality scoring.
  • If you installed in editable mode, updates in src/ are reflected immediately.

📄 License

CC-BY-SA-4.0

About

An agentic CLI fact-checking framework built at the AI Center, Czech Technical University in Prague.
