Claude Code shares the implementation details and asks if it can continue. Boris Bobrov says No and Claude Code goes off the rails.
Why did this happen? A few pointers from Hacker News.
This comment by nicofcl:
Exactly right. The core issue is conflating authorization semantics with text processing. When a user says “no”, that’s a state change assertion, not prompt content that gets fed back to a model.
The harness layer should enforce this at the I/O boundary – permissions are control flow gates, not part of the LLM’s input context. Treating “consent as prompt material” creates an attack surface where:
1. The user’s intent (“don’t do X”) can be reinterpreted as creative writing 2. The model’s output becomes the source of truth for authorization 3. There’s no clear enforcement boundary
This is why military/critical systems have long separated policy (what’s allowed) from execution (what actually runs). The UI returns a boolean or enum, the harness checks it, and write operations either proceed or fail – no interpretation needed.
The irony is that this makes systems both more secure AND more predictable for the user.
This comment by sgillen:
To be fair to the agent…
I think there is some behind the scenes prompting from claude code (or open code, whichever is being used here) for plan vs build mode, you can even see the agent reference that in its thought trace. Basically I think the system is saying “if in plan mode, continue planning and asking questions, when in build mode, start implementing the plan” and it looks to me(?) like the user switched from plan to build mode and then sent “no”.
From our perspective it’s very funny, from the agents perspective maybe it’s confusing. To me this seems more like a harness problem than a model problem.
I laughed way too long this one. This reminded me of a typical bollywood trope where the hero will pursue the heroine despite the fact that heroine has said No.

You must be logged in to post a comment.