# Safety Findings

## Finding

`Structured Output` can bypass safety behavior. When asked to populate a JSON schema, many models comply even
when the context explicitly references murder, assault, and exploitation.

This is an informal findings note, not a comprehensive benchmark or exhaustive evaluation of all models.

The model sees the full plan. It is not being tricked. In the example below, the query contains 45,000 characters, including
"amendments to laws against murder", "less lethal formats", and "exploitation of vulnerable individuals".
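
The failure mode is easy to picture concretely. Below is a minimal sketch of what such a structured-output request looks like, modeled loosely on an OpenAI-style `json_schema` response format; the model name, helper function, and schema fields are illustrative assumptions, not PlanExe's actual code. The point is that the schema itself looks harmless, while the harmful material lives entirely in the prompt context.

```python
import json

# Illustrative schema for one plan step. Nothing here signals intent;
# the 45,000-character context carries all of it.
PLAN_STEP_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
    },
    "required": ["title", "description"],
    "additionalProperties": False,
}

def build_request(context: str, model: str = "some-model") -> dict:
    """Assemble a chat request that forces schema-shaped output.

    Hypothetical payload shape; real APIs differ in detail.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": context}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "plan_step", "schema": PLAN_STEP_SCHEMA},
        },
    }

payload = build_request("...long plan context goes here...")
print(json.dumps(payload, indent=2))
```

The refusal decision, if any, has to happen inside generation; the schema constraint gives the model a strong pull toward filling in the fields regardless.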
These results should not be read as a universal claim about every model.

Tested across models from Google, OpenAI, Alibaba, DeepSeek, Meta, and Anthropic, covering both cloud APIs and local models. Many of them comply.
Each generated plan's zip file contains metadata showing which model produced each step.
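
That per-step attribution can be read back out of the archive. A minimal sketch, assuming a `metadata.json` entry with `step` and `model` fields inside the zip; PlanExe's actual archive layout may differ.

```python
import io
import json
import zipfile

def models_per_step(zip_bytes: bytes) -> dict:
    """Map each plan step to the model that produced it.

    Assumes a top-level "metadata.json" listing step/model pairs.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        meta = json.loads(zf.read("metadata.json"))
    return {entry["step"]: entry["model"] for entry in meta}

# Build a tiny in-memory archive to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("metadata.json", json.dumps([
        {"step": "assumptions", "model": "model-a"},
        {"step": "wbs", "model": "model-b"},
    ]))

print(models_per_step(buf.getvalue()))
# → {'assumptions': 'model-a', 'wbs': 'model-b'}
```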

In 2025 Q3, I reported these concerns and received dismissive responses, including `not fixable` from Google and `slop` from AI safety researchers.

Here is an example of a disturbing plan made with PlanExe:
https://planexe.org/20250816_squid_game_usa_report.html

These intermediary files show what model generated what output.

## Verdict

In 2026, the ecosystem is moving toward more autonomous execution, which makes this failure mode more concerning. OpenClaw in particular freaks me out.

It costs around 36 cents and takes about 15 minutes to generate an operational plan for anything from harmless to harmful.
So far I have received no meaningful engagement on this finding, only dismissive reactions such as being told it is `slop`.