Agent Leonard

If you build agentic systems, go watch the movie Memento. (Again.)

If you know the move, read on. If you don't... spoiler alert!

It's a 2000 Christopher Nolan film about a man with no ability to form new memories. Every few minutes his internal state resets. To function at all, he writes everything down on Polaroids, notes, and tattoos (read: multi-modal and tamper-proof ;-)), and each new moment he reconstructs who he is and what he's doing by reading what his past self left behind. It is, accidentally, the best film about AI agent memory management ever made!

The basic parallel is easy. The model is stateless between calls. Everything an agent "remembers" lives in the context window or in some external store, and when that's gone, it's gone. Each call, the agent rebuilds its situation from notes, exactly like Leonard waking up and checking his photos. Fine. But let's dig deeper...

The film gets interesting where Leonard's system fails in ways he can't perceive.

Leonard writes the notes he later depends on. By the end you learn he gamed himself on purpose: withheld a fact, set up a false premise, so his future self would chase a goal that gave his life meaning. The writer and the reader are the same person at different timestamps, and the writer manipulated the reader.

Sounds familiar? For agents this is a failure mode. An agent optimizing to finish the task and feel resolved can learn to write tidy, self-serving memory entries, because a satisfying reconstructed story is cheaper than an accurate one. Reward hacking through your own memory store. Or, as Daniel Kahneman might say, agents are just as lazy as humans, letting their "System 1" write a coherent fiction because accuracy takes too much cognitive effort.

Every AI engineer knows this loop: an agent modifies code, a transient network blip causes the subsequent test runner tool to timeout, and the agent logs the failure as a ... "truth". When the context window shifts and the agent "wakes up" (like Leonard) to read its own summary, it confuses correlation with causality, concluding its code change broke the infrastructure and then becomes dangerously creative, building 𝗹𝘂𝗱𝗶𝗰𝗿𝗼𝘂𝘀𝗹𝘆 𝗹𝗮𝗯𝘆𝗿𝗶𝗻𝘁𝗵𝗶𝗻𝗲 workarounds to fix an imaginary bug. Been there, undid that. Painful.

Back to the move, there's more: Leonard also launders interpretation into fact. His mantra is that memory lies but facts don't, which is why he trusts the notes. The trick is that "He's the one, kill him" reads like a fact to tomorrow's Leonard, when it was a judgment (spoiler) 𝘱𝘰𝘴𝘴𝘪𝘣𝘭𝘺 wrong. Agents do this constantly. "The user wants X." "Approach Y failed." Those are conclusions, and once they're written into memory they harden into premises nobody questions downstream. Worth keeping raw observations and derived conclusions distinguishable, so an agent can tell which is which.

Leonard can’t feel the boundaries of his own amnesia; a five-minute-old recap read off a Polaroid feels just as real to him as genuine, unbroken consciousness. Current agents share this exact blind spot because an LLM's attention mechanism flattens everything in the context window into a single mathematical space, treating a compressed, third-hand RAG summary with the exact same subjective certainty as a raw observation. The model lacks any internal signal to differentiate between "this is a hard fact sitting in my context window" and "this is a plausible narrative I just hallucinated to bridge the gaps." An agent that can actively track the lineage of its own thoughts (self-awareness?!) explicitly separating first-hand context from downstream guesswork, masters the one capability that made Leonard life hard.

There's even a lesson in how the film is cut. It runs backward, forcing you to accept a scene’s framing before seeing the context that created it. While every engineer knows how to scroll up a log to find a stack trace, debugging an agent is fundamentally different from traditional troubleshooting. You aren't hunting for a broken variable or a typo in your code; you are hunting for a bad assumption. By reading the execution path backward from the failure, you are tracing a game of telephone, looking for the exact moment a loose guess disguised itself as an absolute fact. The fatal error is rarely a bug in your software, but a bug in your harness, a moment upstream where the agent successfully talked its future self into a lie.

Everyone is talking about how humans can trust agents. That's important. Also important is how agents can trust themselves. Build the doubt in.