Conversation
|
pr-agent-context report: This is a refreshed snapshot of the current PR state.
This run includes a failing check on PR #33.
Diagnose and fix the failing checks below, then push all of these changes in a single commit.
# Failing Checks
## FAIL-1
Type: GitHub Actions workflow run
Workflow: ci-test
Run: [WIP] Fix classifier rejecting relevant prostitution trafficking articles
Run number: 140
Run attempt: 1
Conclusion: action_required
URL: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/23397793307
Summary:
Workflow run failed, but job details were unavailable. Run metadata: |
…icles
- Add CLASSIFICATION_SYSTEM_PROMPT with an explicit inclusive framing that tells the model to mark articles relevant even without enforcement actions
- Rewrite CLASSIFICATION_PROMPT: remove 'anti-prostitution enforcement' framing, add Hebrew keywords (בתי בושת/זנות/סרסורות/סחר בבני אדם), and give cleaner JSON response examples
- Pass system=CLASSIFICATION_SYSTEM_PROMPT to Anthropic messages.create()
- Add tests for prompt content (Hebrew terms, inclusive language) and API call

Co-authored-by: shaypal5 <917954+shaypal5@users.noreply.github.com>
Agent-Logs-Url: https://github.com/DataHackIL/tfht_enforce_idx/sessions/30cb20f0-b398-4ce1-b671-6b973820e5c9
|
pr-agent-context report: This is a refreshed snapshot of the current PR state.
This run includes a failing check on PR #33.
Diagnose and fix the failing checks below, then push all of these changes in a single commit.
# Failing Checks
## FAIL-1
Type: GitHub Actions workflow run
Workflow: ci-test
Run: Fix classifier falsely rejecting Hebrew prostitution/trafficking articles
Run number: 141
Run attempt: 1
Conclusion: action_required
URL: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/23397892026
Summary:
Workflow run failed, but job details were unavailable. Run metadata: |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #33 +/- ##
=======================================
Coverage 99.84% 99.84%
=======================================
Files 38 38
Lines 3270 3277 +7
=======================================
+ Hits 3265 3272 +7
Misses 5 5
|
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#33. Treat this PR as all clear unless new signals appear. Run metadata: |
Pull request overview
This PR fixes false negatives in the Hebrew LLM relevance classifier by removing enforcement-only framing and explicitly prompting for inclusive topic-based relevance (brothels, prostitution, pimping, trafficking), even when no arrest/enforcement action occurred.
Changes:
- Introduces a dedicated
CLASSIFICATION_SYSTEM_PROMPTand passes it via Anthropic’ssystemparameter to broaden the model’s relevance criteria. - Rewrites the user prompt to use a neutral topic list and adds key Hebrew terms (
בתי בושת,זנות,סרסורות,סחר בבני אדם) plus clearer JSON response examples. - Adds unit tests to assert prompt content and verify
systemis forwarded to the Anthropic API call, covering trafficking/brothel false-negative scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/denbust/classifier/relevance.py | Adds an inclusive system prompt, updates the user prompt framing/keywords, and passes system=... into messages.create(). |
| tests/unit/test_classifier.py | Adds tests asserting prompt/system prompt content and that classify() forwards the system prompt and returns relevant for key examples. |
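The wiring the review table describes can be sketched as follows. This is a hypothetical reconstruction, not the repository's code: the prompt strings, the model name, and the exact `classify()` signature are invented for illustration; only the `system=` forwarding and the response shape follow the PR description and the Anthropic Python SDK.

```python
import json

# Illustrative stand-ins for the prompts described above; the repository's
# actual CLASSIFICATION_SYSTEM_PROMPT / CLASSIFICATION_PROMPT wording differs.
CLASSIFICATION_SYSTEM_PROMPT = (
    "You classify Hebrew news articles. An article is relevant if it touches "
    "on brothels, prostitution, pimping, or human trafficking, even if no "
    "arrest or enforcement action has occurred."
)
CLASSIFICATION_PROMPT = (
    "Does the article mention any of these topics: "
    "בתי בושת / זנות / סרסורות / סחר בבני אדם?\n"
    # braces are doubled so str.format leaves the JSON examples intact
    'Respond only with JSON: {{"relevant": true}} or {{"relevant": false}}.\n\n'
    "Article:\n{article}"
)

def classify(client, article, model="claude-3-5-sonnet-latest"):
    """Send the article to the model, forwarding the inclusive system prompt."""
    response = client.messages.create(
        model=model,
        max_tokens=64,
        system=CLASSIFICATION_SYSTEM_PROMPT,  # the key fix: system prompt is forwarded
        messages=[
            {"role": "user", "content": CLASSIFICATION_PROMPT.format(article=article)}
        ],
    )
    # The SDK returns content as a list of blocks; the text block holds the JSON.
    return json.loads(response.content[0].text)["relevant"]
```

Because the client is injected, the same function works against the real `anthropic.Anthropic()` client or a test double.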
|
pr-agent-context report: No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR
#33. Treat this PR as all clear unless new signals appear. Run metadata: |
|
pr-agent-context report: 🚨 `pr-agent-context` failed while preparing PR context.
PR: #33
Error: CalledProcessError: Command '['git', '-C', '/home/runner/work/tfht_enforce_idx/tfht_enforce_idx/caller-repo', 'diff', '--unified=0', 'd35059ed71baff42d14eb15ea1a7c58c7fc1e38c...56d290c7fac354d187c8c1241abfe14055193700']' returned non-zero exit status 128.
Run: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/23407643392
The workflow continued gracefully so this failure does not block CI.
Check the job logs for the full traceback. Run metadata: |
|
pr-agent-context report: 🚨 `pr-agent-context` failed while preparing PR context.
PR: #33
Error: CalledProcessError: Command '['git', '-C', '/home/runner/work/tfht_enforce_idx/tfht_enforce_idx/caller-repo', 'diff', '--unified=0', 'd35059ed71baff42d14eb15ea1a7c58c7fc1e38c...56d290c7fac354d187c8c1241abfe14055193700']' returned non-zero exit status 128.
Run: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/23407647940
The workflow continued gracefully so this failure does not block CI.
Check the job logs for the full traceback. Run metadata: |
|
pr-agent-context report: 🚨 `pr-agent-context` failed while preparing PR context.
PR: #33
Error: CalledProcessError: Command '['git', '-C', '/home/runner/work/tfht_enforce_idx/tfht_enforce_idx/caller-repo', 'diff', '--unified=0', 'd35059ed71baff42d14eb15ea1a7c58c7fc1e38c...56d290c7fac354d187c8c1241abfe14055193700']' returned non-zero exit status 128.
Run: https://github.com/DataHackIL/tfht_enforce_idx/actions/runs/23407681566
The workflow continued gracefully so this failure does not block CI.
Check the job logs for the full traceback. Run metadata: |
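Exit status 128 from `git diff A...B` signals a fatal git error rather than an actual diff, commonly a SHA that is absent from a shallow Actions checkout. A hedged sketch of how a context-gathering step could degrade gracefully in that case; the function name and structure are illustrative, not pr-agent-context's actual code:

```python
import subprocess

def get_diff(repo, base, head):
    """Return the unified-0 diff between two SHAs, or None when unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", repo, "diff", "--unified=0", f"{base}...{head}"],
            capture_output=True,
            text=True,
            check=True,  # raises CalledProcessError on any non-zero exit
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # git exits 128 for fatal errors, e.g. when a shallow checkout does
        # not contain one of the commits; degrade gracefully instead of crashing
        return None
    return result.stdout
```

A deeper fix would be to fetch the missing history (e.g. `fetch-depth: 0` in `actions/checkout`) before diffing.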
The LLM classifier was rejecting clearly relevant articles (brothel discoveries, trafficking cases, prostitution arrests) with high confidence. Root cause: the prompt opened with "anti-prostitution enforcement", priming the model to require an explicit enforcement action rather than treating any mention of the covered topics as sufficient for relevance.
Changes
src/denbust/classifier/relevance.py
- Adds `CLASSIFICATION_SYSTEM_PROMPT` — passed as the Anthropic `system` parameter — that explicitly instructs the model to be inclusive and mark articles relevant "even if no arrest or enforcement action has occurred"
- Rewrites `CLASSIFICATION_PROMPT` to add key Hebrew keywords (בתי בושת/זנות/סרסורות/סחר בבני אדם) and use a `true`/`false` JSON template with separate positive/negative response examples
- Passes `system=CLASSIFICATION_SYSTEM_PROMPT` in the `messages.create()` call

tests/unit/test_classifier.py
- `TestClassificationPromptContent` — asserts Hebrew keywords and inclusive language are present in the prompts
- `TestClassifyPassesSystemPrompt` — verifies the system prompt reaches the Anthropic API call and that trafficking/brothel articles (the specific false-negative types from the issue) return `relevant: true`

Warning
Firewall rules blocked me from connecting to one or more addresses:
- api.anthropic.com: `/usr/bin/python python -m pytest tests/unit/ -q` (dns block)
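The tests summarized above could be sketched along these lines with `unittest.mock`, which also works offline when `api.anthropic.com` is unreachable. Everything here is illustrative and self-contained: the prompt constant, the `classify()` stand-in, and the model name are invented for the sketch rather than taken from the repository.

```python
import json
from unittest.mock import MagicMock

# Minimal stand-in for the real relevance module so this sketch is
# self-contained; the actual prompt wording and classify() in
# src/denbust/classifier/relevance.py differ.
SYSTEM_PROMPT = "Mark articles relevant even if no enforcement action occurred."

def classify(client, article):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model name
        max_tokens=64,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": article}],
    )
    return json.loads(response.content[0].text)["relevant"]

def test_classify_forwards_system_prompt():
    client = MagicMock()
    # Shape the mock like the Anthropic SDK response: content is a list of
    # blocks whose .text attribute holds the model's JSON answer.
    client.messages.create.return_value.content = [MagicMock(text='{"relevant": true}')]
    assert classify(client, "נחשף בית בושת בתל אביב") is True  # brothel article: relevant
    _, kwargs = client.messages.create.call_args
    assert kwargs["system"] == SYSTEM_PROMPT  # system prompt reached the API call
```

Inspecting `call_args` is what lets the test prove the `system` keyword actually reached the API boundary, which is exactly the regression the PR guards against.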
Original prompt