The good news is: It does not work...

... Yet.

But it could be something. Too early to tell.

The "it" is AI. A specific flavor of it: the kind that goes past writing code, emails, or blog posts like this one ;-). The kind that keeps the AIpocalypse headlines warm and keeps programmers, designers (soon product managers?) mildly anxious about their jobs. The thing I have been experimenting with for the last few weeks lives in that bucket.

But let's start from the beginning.

A while ago I had an aha moment using AI. Not for code, but to stress-test the calls I was wrestling with. I would describe a product decision and prompt the model to push back from a commercial angle, then a technical one, then a tactical one, then from a skeptical customer's seat. The answers were often surprisingly good. They unearthed flaws and gaps in my thinking that I had not noticed myself. Something kept nagging at me (no, it was not AGI).
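To make the lens-switching concrete, here is a minimal sketch of how one prompt per lens could be assembled for a single decision. It is an illustration only: the four lenses come from the paragraph above, while the prompt wording, the `LENSES` table, and the `build_prompts` helper are assumptions, and the call to an actual model is deliberately left out.

```python
# Hypothetical sketch: one pushback prompt per lens for a product decision.
# The lens names come from the text; the instruction wording is an assumption.
LENSES = {
    "commercial": "Push back on this decision from a commercial angle.",
    "technical": "Push back on this decision from a technical angle.",
    "tactical": "Push back on this decision from a tactical angle.",
    "skeptical customer": "Push back on this decision as a skeptical customer.",
}


def build_prompts(decision):
    """Return a dict mapping each lens to a self-contained prompt string."""
    return {
        lens: f"{instruction}\n\nDecision: {decision}\n\nList the weakest assumptions first."
        for lens, instruction in LENSES.items()
    }


if __name__ == "__main__":
    # Example decision text is made up for illustration.
    prompts = build_prompts("Drop the free tier and offer a 14-day trial instead.")
    for lens, prompt in prompts.items():
        print(f"--- {lens} ---\n{prompt}\n")
```

Each prompt stands alone, so every lens gets a fresh seat at the table instead of inheriting the previous answer.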

We all know that AI models carry biases from their foundation training. In the end, humans teach them ;). The interesting part for me was the cognitive load of switching perspectives. For me, jumping from commercial to technical to operational takes effort, and the longer I sit with a position the harder it gets to argue against it. For the model, swapping lenses costs almost nothing. One prompt, new seat at the table. It is not only ignorance: sometimes oblivion is bliss, too.

That kept me thinking.

Slow thinking, fast machines

Daniel Kahneman's Thinking, Fast and Slow shaped a lot of how I look at my own work, and at the world. The central idea, that we run on a fast intuitive system and only fall back to slow deliberate reasoning when forced, has held up for me across years of building software and leading teams. We make snap calls. We rationalize them later. We rarely revisit.

In product work this shows up everywhere. We ship a feature. The metric wobbles. We tell ourselves a story about why and move on. The next quarter is already on the roadmap, the next OKR is already written, and the team needs momentum. Confirmation bias does the rest.

The plan, do, learn, act loop is one of those things every product person can recite and few teams actually run. We do the planning. We do the doing. The learning step gets compressed into a retro slide. The act step, the discipline of throwing something out when the numbers say so, is the hardest of all. We love what we built. The old line about falling in love with the problem rather than the solution is true and hard to live by. (A few years ago Pendo published research suggesting that ~80% of features are rarely or never used.)

I started wondering what part of that "machinery" (actually, these are very human traits) a more disciplined collaborator could help with.

A different kind of partner

The first sketch was small. A reasoning assistant for product managers. Something that could hold a hypothesis next to the evidence and ask harder questions than a busy team has time to ask itself. Speaking from experience, the useful angles and lenses tend to repeat: Marty Cagan's four product risks (value, usability, feasibility, viability), which served me well for years; de Bono's Six Thinking Hats for when a room needs to switch modes on cue; Porter for when you need to look outward. Useful, repeatable, and rarely run end to end on a real decision because nobody has the time - unless you are a McKinsey consultant ;-) (SCNR).

Then I read The Book of Why by Judea Pearl not too long ago. Causal modeling, do-calculus, counterfactual reasoning. The idea that you can write down what you believe causes what, attach a prediction to it, and let reality update the picture. I had seen versions of this in academic statistics but never as a working tool for product teams. The math has been around since the 1990s, a bit like double-entry bookkeeping before it got cheap to run at scale. The blocker was always the human cost of maintaining the model. My hypothesis: LLMs change that math.

More dots connected.

If the model could carry the causal graph, run the adversarial debate, hold the institutional memory of past bets, and feed the next round of decisions, you would have something closer to a disciplined collaborator than a smart autocomplete or stochastic parrot. Wouldn't you?!

And the obvious tailwind. Delivery is cheap now. Code is a commodity. Experimentation costs almost nothing. (Well. Token prices do go up. We'll see how that trends...). But the bottleneck has moved upstream, to deciding what is worth experimenting on in the first place.

The sketch grew into a picture. The picture turned into a side project.

A "few" tokens later...

One thing led to another. The reasoning assistant grew a memory. The memory grew a discourse engine. The discourse engine needed delivery to close the loop. Delivery needed telemetry to know if the bet had paid off. All needed a conductor (read: orchestration, moderation, rules, policies, gates...). By the time I stepped back, I had been sketching a firm. So I called it firmd, the firm daemon. Unix folks will get the nod.

One design principle drove a lot of the rest. Every company already runs on three operating systems: a communication system (Slack, Mattermost, Outlook, Teams, ...), a content system (Notion, SharePoint, Atlassian, ...), and a work management system (Jira, Linear, ...). I wanted my agents to live inside those systems and use them the way a human teammate would. Off the shelf where possible. The human organization as the blueprint, agents as new participants on the same org chart. And as little bespoke UI as I could get away with, ideally none, to stay true to the d.

Roughly, what came out is a virtual agentic company that runs the full plan, do, learn, act loop end to end. Strategic discourse with specialist agents arguing from different professional lenses. A causal model that captures the firm's bets as a graph. Delivery that turns plans into shipped, instrumented increments. Telemetry that measures prediction error against what was promised. A knowledge graph the agents build as they work, capturing the entities, relationships, and decisions about the product domain, grounding the next round of reasoning.
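As a rough illustration of the "bets as a graph" and "prediction error" ideas, here is a minimal sketch. It is not firmd's implementation: the `Bet` and `CausalModel` classes, their fields, and the example metric names are all hypothetical, and prediction error is simply taken as observed change minus predicted change.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Bet:
    """One causal claim: changing `cause` should move `effect` by `predicted_change`."""
    cause: str
    effect: str
    predicted_change: float                   # e.g. 0.05 for a predicted 5-point lift
    observed_change: Optional[float] = None   # filled in once telemetry arrives

    def prediction_error(self) -> Optional[float]:
        """Observed minus predicted; None while the bet is still unmeasured."""
        if self.observed_change is None:
            return None
        return self.observed_change - self.predicted_change


@dataclass
class CausalModel:
    """A directed graph whose edges are bets (hypothetical structure)."""
    bets: List[Bet] = field(default_factory=list)

    def downstream(self, node: str) -> Set[str]:
        """Every effect reachable from `node` by following bets; safe on cycles."""
        seen: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for bet in self.bets:
                if bet.cause == current and bet.effect not in seen:
                    seen.add(bet.effect)
                    stack.append(bet.effect)
        return seen


if __name__ == "__main__":
    # Made-up example bets and numbers, for illustration only.
    model = CausalModel([
        Bet("onboarding checklist", "activation rate", predicted_change=0.05),
        Bet("activation rate", "retention", predicted_change=0.02),
    ])
    model.bets[0].observed_change = 0.03  # telemetry reports a smaller lift
    print(model.downstream("onboarding checklist"))
    print(model.bets[0].prediction_error())
```

Keeping the prediction on the edge itself means each bet can be scored on its own once telemetry comes in, and the graph shows which downstream beliefs a missed prediction puts in question.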

The honest version of building this is that ninety percent of the work is wiring and plumbing. Containers, queues, schemas, isolation, observability, auth: the unglamorous machinery that turns a clever LLM prompt into a system that converges and behaves roughly (let's embrace some "temperature"!) the same way on the second run. The remaining ten percent is the AI. That ratio surprised me at first and stopped surprising me quickly. Anyone shipping AI products this year will recognise it, as the Anthropic leak also disclosed.

I spent more weeks and tokens on this than I planned. Some days (and nights) the system shipped its own code through its own discourse. Other days it argued in circles, missed a gate, and taught me something about where the abstractions were thin. I had to learn how to learn (that is, how to evaluate an agentic system), which turns out to be a discipline of its own - a tough one, but probably the most important one.

Today

Today firmd shipped its first hello world. Using only qwen3.5:397b. Yay! A small step for mankind. An idea entered the front of the loop, survived a structured debate, became a Strategic Intent with a measurable prediction, went through delivery, and came out the other side as a shipped increment with telemetry attached, leading to a strategic reflection and a second cycle. End to end, on its own.
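The stages named in that paragraph can be read as a small state machine. The sketch below is one hypothetical way to write it down, not how firmd does it: the `Stage` enum, the `gate_passed` flag, and the wrap-around from reflection back to a new idea are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Stages of one loop, in the order the text lists them."""
    IDEA = "idea"
    DEBATE = "structured debate"
    STRATEGIC_INTENT = "strategic intent with a measurable prediction"
    DELIVERY = "delivery"
    SHIPPED = "shipped increment with telemetry"
    REFLECTION = "strategic reflection"


ORDER = list(Stage)  # Enum iteration preserves definition order


def next_stage(stage, gate_passed=True):
    """Advance one step if the (hypothetical) gate passed, otherwise stay put.

    Reflection wraps around to IDEA, which is assumed to start the next cycle.
    """
    if not gate_passed:
        return stage
    index = ORDER.index(stage)
    return ORDER[(index + 1) % len(ORDER)]


def run_cycle(start=Stage.IDEA):
    """Return the stages visited in one full pass, assuming every gate passes."""
    visited = [start]
    current = next_stage(start)
    while current is not start:
        visited.append(current)
        current = next_stage(current)
    return visited


if __name__ == "__main__":
    for stage in run_cycle():
        print(stage.value)
```

Making the gate an explicit input keeps the deterministic part of the loop separate from whatever judgment, human or model, decides that a stage is done.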

It is alpha. It is not live yet. There are still a few token budgets to spend on code reviews, hardening and rough edges. There are also a few tough questions to look at (token economics! what's the price of a decision?!) before I would invite real users into a private alpha. Until then, some prose and screenshots will have to do to spark some conversations.

Back to the opening line. It does not work. Yet. The first hello world shipped today, the loop closes, and that is the thing I needed to see before saying any of this out loud. It's not building CrowdStrike, Salesforce, probably not even a weather app. Yet. Whether "yet" becomes "soon" or "later" or "never"... I don't know yet.

What I do know: the nature of that work is shifting on me. The plumbing and wiring are mostly behind me. What is in front is organizational design. Learning to learn. Defining roles for the agents, policies for how they argue, processes and gates for how decisions move through the firm, balancing deterministic logic and the LLM, even defining different culture flavors for different kinds of firmd. This is the part I find genuinely fun.

Here's my message amid all the AI job-cut FUD. AI can write the code. AI can increasingly reason about what to build. Humans are the system builders. We design the structure that AI then runs on. That distinction has held up so far. And that is in high demand.

If any of this resonates, the website is up at firmd.ai. The architecture, the components, the thinking behind each piece, all there. All 100% AI-generated from docs, of course, and it keeps growing. I hope to add the "sign up" button soon. I would genuinely love input. Pushback, questions, war stories from your own product work, ideas I have not considered.