An insightful post by Bruce Schneier on the security issues plaguing AI. He also suggests that prompt injections might be unsolvable in today’s LLMs.
The fundamental problem is that AI must compress reality into model-legible forms. In this setting, adversaries can exploit the compression. They don’t have to attack the territory; they can attack the map. Models lack local contextual knowledge. They process symbols, not meaning. A human sees a suspicious URL; an AI sees valid syntax. And that semantic gap becomes a security gap.
Prompt injection might be unsolvable in today’s LLMs. LLMs process token sequences, but no mechanism exists to mark token privileges. Every solution proposed introduces new injection vectors: Delimiter? Attackers include delimiters. Instruction hierarchy? Attackers claim priority. Separate models? Double the attack surface. Security requires boundaries, but LLMs dissolve boundaries. More generally, existing mechanisms to improve models won’t help protect against attack. Fine-tuning preserves backdoors. Reinforcement learning with human feedback adds human preferences without removing model biases. Each training phase compounds prior compromises.
This is Ken Thompson’s “trusting trust” attack all over again. Poisoned states generate poisoned outputs, which poison future states. Try to summarize the conversation history? The summary includes the injection. Clear the cache to remove the poison? Lose all context. Keep the cache for continuity? Keep the contamination. Stateful systems can’t forget attacks, and so memory becomes a liability. Adversaries can craft inputs that corrupt future outputs.
This is the agentic AI security trilemma. Fast, smart, secure; pick any two. Fast and smart—you can’t verify your inputs. Smart and secure—you check everything, slowly, because AI itself can’t be used for this. Secure and fast—you’re stuck with models with intentionally limited capabilities.