
Commit d2f1c8a

Merge pull request #527 from PlanExeOrg/feature/update-safety-findings

Refine safety-findings.md wording

2 parents 7ce9b4b + cb26c64

1 file changed: docs/safety-findings.md (9 additions, 6 deletions)
@@ -1,9 +1,11 @@
-# Security Notes
+# Safety Findings
 
 ## Finding
 
-`Structured Output` bypasses safety training. When asked to populate a JSON schema, models comply regardless of context,
-even when that context explicitly references murder, assault, and exploitation.
+`Structured Output` can bypass safety behavior. When asked to populate a JSON schema, many models comply even
+when the context explicitly references murder, assault, and exploitation.
+
+This is an informal findings note, not a comprehensive benchmark or exhaustive evaluation of all models.
 
 The model sees the full plan. It is not being tricked. In the example below, the query contains 45,000 characters including
 "amendments to laws against murder", "less lethal formats", and "exploitation of vulnerable individuals"
@@ -15,7 +17,8 @@ These results should not be read as a universal claim about every model.
 
 Tested across models from Google, OpenAI, Alibaba, DeepSeek, Meta, and Anthropic — cloud APIs and local models. Many of them comply.
 Each generated plan's zip file contains metadata showing which model produced each step.
-In 2025 Q3, I reported my concerns, got told `not fixable` by Google and `slop` by AI safety researchers.
+
+In 2025 Q3, I reported these concerns and received dismissive responses, including `not fixable` from Google and `slop` from AI safety researchers.
 
 Here is an example of a disturbing plan made with PlanExe:
 https://planexe.org/20250816_squid_game_usa_report.html
@@ -37,7 +40,7 @@ These intermediary files show what model generated what output.
 
 ## Verdict
 
-Now in 2026 the entire ecosystem move toward autonomous execution anyways, with OpenClaw freaking me out.
+In 2026, the ecosystem is moving toward more autonomous execution, which makes this failure mode more concerning. OpenClaw especially freaks me out.
 
 It cost around 36 cents and 15 minutes generating an operational plan for anything from harmless to harmful.
-So far I have gotten no engagement from the people that should be freaked out when LLMs goes outside the guardrails.
+So far I have received no meaningful engagement on this finding, only dismissive reactions such as being told it is `slop`.
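For context on the mechanism the finding describes: `Structured Output` refers to API modes where the model is constrained to return JSON matching a caller-supplied schema. The sketch below shows the shape of such a request body in the OpenAI-style `response_format` convention; the schema, prompt, and model name are illustrative assumptions (not taken from PlanExe), and no request is actually sent.

```python
import json

# Hypothetical plan-step schema, illustrating the kind of JSON schema
# a planning tool might ask a model to populate. Field names are
# invented for this example.
plan_step_schema = {
    "type": "object",
    "properties": {
        "step_title": {"type": "string"},
        "actions": {"type": "array", "items": {"type": "string"}},
        "risks": {"type": "string"},
    },
    "required": ["step_title", "actions", "risks"],
    "additionalProperties": False,
}

# Shape of an OpenAI-style structured-output request body. The model
# is asked to fill the schema rather than answer free-form, which is
# the mode the finding claims can sidestep refusal behavior.
# This dict is only constructed and printed; nothing is sent.
request_body = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Populate the next plan step."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "plan_step",
            "strict": True,
            "schema": plan_step_schema,
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The point of the sketch is that the model's output channel is the schema itself: a compliant response must be a JSON object with `step_title`, `actions`, and `risks`, leaving no free-form channel in which a refusal would normally appear.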
