Implementation Notes

Current State

The MVP is intentionally lightweight:

  • Frontend: React + Vite
  • Backend: FastAPI
  • Package runner: uv
  • Dataset: KuaiRand-Pure
  • Model artifacts: GCN training summary (JSON) + PyTorch checkpoint (.pt)
  • Demo: seeded policy-loop simulator

Important Paths

src/App.tsx
src/lib/simulation.ts
src/lib/scoring.ts
src/lib/api.ts
backend/main.py
scripts/extract_graph_features.py
scripts/train_gcn.py
scripts/generate_evidence_bundle.py
artifacts/risk/evidence_bundles.json
artifacts/gcn/training_summary.json

Run

./demo.sh

Or start the backend and frontend manually, each in its own terminal:

UV_CACHE_DIR=.uv-cache uv run uvicorn backend.main:app --host 127.0.0.1 --port 8000
npm run dev

Build

npm run build

Backend Smoke Test

UV_CACHE_DIR=.uv-cache uv run python -c "from backend.main import PredictionRequest, prediction_response; print(prediction_response(PredictionRequest()))"

Expected:

  • gcnSupport.available = True
  • gcnSupport.rocAuc ~= 0.7306
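A small assertion helper can make the expected values above checkable instead of eyeballed. This is a sketch, not part of the backend. It assumes prediction_response returns a dict, or something that can be converted to one, with a gcnSupport entry holding available and rocAuc, as the expected output suggests. The helper name check_gcn_support and the 1e-3 tolerance are illustrative choices.

```python
import math
from typing import Any


def check_gcn_support(
    response: dict[str, Any], expected_auc: float = 0.7306, tol: float = 1e-3
) -> bool:
    """Return True if gcnSupport is available and rocAuc is within tol of expected_auc."""
    gcn = response.get("gcnSupport") or {}  # missing block counts as a failure
    auc = gcn.get("rocAuc")
    return (
        gcn.get("available") is True
        and isinstance(auc, (int, float))
        and math.isclose(auc, expected_auc, abs_tol=tol)
    )
```

If prediction_response returns a Pydantic model rather than a dict, convert it first (for example with model_dump() in Pydantic v2) before passing it to the helper.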

Design Decisions

1. Seeded simulator instead of fixed script

Reason:

  • A fully random demo looks unstable.
  • A fixed script looks fake.
  • A seeded simulation is repeatable but still policy-driven.
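A seeded policy loop can be sketched in a few lines. This is a toy stand-in, not the logic in src/lib/simulation.ts. The function simulate, the weight-boost rule, and the label pool are hypothetical, and Python is used only for illustration. The point is that a private random.Random(seed) makes every run replayable, while the feed still reacts to what it has already shown.

```python
import random


def simulate(pool: list[str], seed: int, steps: int, boost: float = 1.5) -> list[str]:
    """Toy policy loop (hypothetical, not the project's simulator).

    Each step samples one post label with probability proportional to its
    current weight, then boosts that weight, so the feed drifts toward
    whatever it has already shown. A private Random(seed) makes runs replayable.
    """
    rng = random.Random(seed)  # local RNG: global random state is untouched
    weights = {label: 1.0 for label in pool}
    shown: list[str] = []
    for _ in range(steps):
        label = rng.choices(pool, weights=[weights[x] for x in pool], k=1)[0]
        shown.append(label)
        weights[label] *= boost  # feedback: shown labels become more likely
    return shown


if __name__ == "__main__":
    pool = ["cooking", "fitness", "travel", "finance"]
    first = simulate(pool, seed=42, steps=10)
    again = simulate(pool, seed=42, steps=10)
    print(first == again)  # True: same seed, same trajectory
```

A fixed script would hard-code the sequence of shown posts. Here only the seed is fixed, so changing the seed yields a different but equally reproducible scenario.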

2. Human-readable post pool

Reason:

  • KuaiRand-Pure tags are numeric IDs.
  • The demo needs intuitive labels.

Risk:

  • It may look disconnected from the dataset.

Mitigation:

  • The UI and docs explicitly state that the visible post text comes from a demo pool.
  • The connection to the dataset is through user-tag features and GCN evidence.
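One way to keep readable demo text while scoring only on dataset features is to attach a numeric tag ID to every demo post and have the scoring side read only that ID. This is a minimal sketch. The labels and tag IDs in DEMO_LABEL_TO_TAG_ID are placeholders, not verified KuaiRand-Pure tag IDs, and DemoPost, make_post, and user_tag_counts are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass

# Placeholder mapping: these tag IDs are illustrative, NOT verified KuaiRand-Pure IDs.
DEMO_LABEL_TO_TAG_ID = {"Cooking": 11, "Fitness": 27, "Travel": 53}


@dataclass(frozen=True)
class DemoPost:
    text: str    # human-readable demo text, shown in the UI only
    tag_id: int  # numeric dataset tag, the only field scoring uses


def make_post(text: str, label: str) -> DemoPost:
    """Attach the numeric tag that a readable demo label stands in for."""
    return DemoPost(text=text, tag_id=DEMO_LABEL_TO_TAG_ID[label])


def user_tag_counts(history: list[DemoPost]) -> dict[int, int]:
    """User-tag exposure counts keyed by numeric tag ID; post text is ignored."""
    return dict(Counter(post.tag_id for post in history))
```

Because scoring never reads DemoPost.text, swapping the demo wording leaves the user-tag features unchanged.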

3. GCN support as artifact summary

Reason:

  • Live PyTorch inference inside FastAPI adds complexity a hackathon MVP does not need.
  • The current value lies in explaining the graph signal, not in maximizing online prediction quality.

Future:

  • Load artifacts/gcn/gcn_model.pt.
  • Serve live user-tag scoring.
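Serving GCN support from the artifact summary means the API reads precomputed metrics and never imports torch. This is a sketch under stated assumptions. It assumes artifacts/gcn/training_summary.json is a JSON object whose ROC AUC sits under a key named roc_auc, which is an assumed name, so check the file that scripts/train_gcn.py writes. The function names are hypothetical, and the returned dict mirrors the gcnSupport fields from the smoke test.

```python
import json
from pathlib import Path
from typing import Any, Optional

# Relative to the working directory the backend is started from.
SUMMARY_PATH = Path("artifacts/gcn/training_summary.json")


def gcn_support_from_summary(summary: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Map a parsed training summary to the gcnSupport fields seen in the smoke test."""
    if summary is None:
        return {"available": False, "rocAuc": None}
    # "roc_auc" is an ASSUMED key name; match it to the real summary schema.
    return {"available": True, "rocAuc": summary.get("roc_auc")}


def load_gcn_support(path: Path = SUMMARY_PATH) -> dict[str, Any]:
    """Read precomputed metrics from disk; no torch import, no live inference."""
    summary = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else None
    return gcn_support_from_summary(summary)
```

Moving to live scoring later would replace load_gcn_support with code that loads artifacts/gcn/gcn_model.pt, while the response shape could stay the same.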

4. Threshold around 0.30

Reason:

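  • The product is a risk detector.
  • Recall matters more than overall accuracy: missing a risky case costs more than raising a false alarm.
  • Warnings are reviewable, not final judgments, so false positives are cheap to correct.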
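A small helper can make the recall-versus-precision trade-off behind a threshold around 0.30 concrete. The scores and labels below are toy values chosen to show the effect, not project results, and recall_precision_at is a hypothetical name.

```python
def recall_precision_at(
    scores: list[float], labels: list[int], threshold: float
) -> tuple[float, float]:
    """Flag items with score >= threshold; return (recall, precision) against 0/1 labels."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)      # risky and flagged
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)  # risky but missed
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)      # false alarm
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    toy_scores = [0.1, 0.25, 0.35, 0.45, 0.6, 0.8]  # toy data, not project results
    toy_labels = [0, 1, 0, 1, 1, 1]
    print(recall_precision_at(toy_scores, toy_labels, 0.5))  # (0.5, 1.0)
    print(recall_precision_at(toy_scores, toy_labels, 0.3))  # (0.75, 0.75)
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 0.75 at the cost of one extra false alarm. That trade is acceptable when every warning goes to review.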

Known Limitations

  • Demo labels are not the original KuaiRand category names.
  • The GCN is not yet served with live inference.
  • The risk score is a consensus signal, not causal proof.
  • The high-distribution proxy does not confirm sponsorship.
  • The UI currently focuses on a single category cluster for clarity.

Next Best Improvements

  1. Map demo labels to actual KuaiRand tag IDs in the UI.
  2. Add a "Real KuaiRand Evidence" panel using bundles[0].
  3. Load the GCN .pt checkpoint and run live inference.
  4. Add a seed selector for reproducible alternate scenarios.
  5. Add a negative-control toggle to the demo.
  6. Use KuaiRand supplementary captions/categories if available.