• Spec-driven development

    I first read about spec-driven development in this post by Birgitta Böckeler.

    As with many emerging terms in this fast-paced space, the definition of “spec-driven development” (SDD) is still in flux. Here’s what I can gather from how I have seen it used so far: spec-driven development means writing a “spec” before writing code with AI (“documentation first”). The spec becomes the source of truth for both the human and the AI.

    The premise was intriguing. It reminded me of my early days of learning Java and its slogan, “write once, run anywhere”.

    Then Drew Breunig released a spec-only library called whenwords.

    Today I’m releasing whenwords, a relative time formatting library that contains no code.

    whenwords provides five functions that convert between timestamps and human-readable strings, like turning a UNIX timestamp into “3 hours ago”.

    There are many libraries that perform similar functions. But none of them are language agnostic.

    […]

     whenwords contains specs and tests, specifically:

    • SPEC.md: A detailed description of how the library should behave and how it should be implemented.
    • tests.yaml: A list of language-agnostic test cases, defined as input/output pairs, that any implementation must pass.
    • INSTALL.md: Instructions for building whenwords, for you, the human.
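
    To make this concrete, here is a minimal sketch in Python of what one implementation derived from such a spec might look like. The function name, signature, and thresholds below are my own illustrative assumptions, not taken from the actual SPEC.md or tests.yaml.

        import time

        # Hypothetical sketch of one whenwords-style function; the name,
        # signature, and thresholds are illustrative guesses, not the spec.
        def timeago(timestamp, now=None):
            """Convert a UNIX timestamp into a human-readable relative string."""
            now = time.time() if now is None else now
            seconds = int(now - timestamp)
            if seconds < 60:
                return "just now"
            minutes = seconds // 60
            if minutes < 60:
                return f"{minutes} minute{'s' if minutes != 1 else ''} ago"
            hours = minutes // 60
            if hours < 24:
                return f"{hours} hour{'s' if hours != 1 else ''} ago"
            days = hours // 24
            return f"{days} day{'s' if days != 1 else ''} ago"

        # A timestamp three hours in the past
        print(timeago(time.time() - 3 * 3600))  # prints "3 hours ago"

    The point of the spec-only approach is that any language could host an equivalent of this function, as long as it passes the input/output pairs in tests.yaml.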

    I remember how, back in 2012, Mark Zuckerberg abandoned his dream of an HTML5 mobile application in favour of native ones. With that came two separate teams, one each for Facebook’s iOS and Android applications.

    “I think the biggest mistake we made as a company is betting too much on HTML5 as opposed to native,” he admitted, acknowledging his company’s difficulties in piecing together a coherent mobile strategy.

    A spec-driven development approach might have saved Facebook from this costly mistake. It might also have given mobile developers an incentive to create Windows Phone applications.

    So will we start seeing native applications based on SDD?

    Maybe not. At least not so soon.

    Today I read another post by Drew Breunig, who raises and answers this very question: why is Claude’s own desktop app built with Electron, a cross-platform framework for building desktop applications using web technologies, and not in native OS-specific code?

    For one thing, coding agents are really good at the first 90% of dev. But that last bit – nailing down all the edge cases and continuing support once it meets the real world – remains hard, tedious, and requires plenty of agent hand-holding.

    […]

    A good test suite and spec could enable the Claude team to ship a Claude desktop app native to each platform. But the resulting overhead of that last 10% of dev and the increased support and maintenance burden will remain.

    For now, Electron still makes sense. Coding agents are amazing. But the last mile of dev and the support surface area remains a real concern.

    This is an interesting space for me to watch. I guess the easy part, code generation, is done. The hard part of managing messy scenarios, edge cases, product decisions, support, maintenance, and more still remains.

  • AI, AI, AI… Aaaarrgghh!

    Narain Jashanmal has created a framework for increased precision in AI discourse.

    “AI” has become semantically meaningless. The term now encompasses everything from a regression model to an autonomous robot, creating confusion in strategic discussions, partner conversations, and product positioning. This taxonomy provides a functional framework based on what the AI actually does, not what technique it uses.

    […]

    We use Analytical AI to decide, Semantic AI to understand and remember, Generative AI to create, Agentic AI to act, Perceptive AI to sense, and Physical AI to move.

  • We will still need software engineers

    Boris Cherny, creator of Claude Code, replying to the question: if Claude Code is writing 100% of Claude Code now, why does Anthropic have 100+ open developer positions?

    Someone has to prompt the Claudes, talk to customers, coordinate with other teams, decide what to build next. Engineering is changing and great engineers are more important than ever.

    If you read Boris’s post on how he uses Claude Code, which he says is a vanilla setup, you will realise that it is the depth of his engineering knowledge that lets him use Claude Code so effectively.

  • First answer

    Dimitris Papailiopoulos highlighting how, with AI, the first answer has become inexpensive.

    But the intellectually interesting part for me is something else. I now have something close to a magic box where I throw in a question and a first answer comes back basically for free, in terms of human effort. Before this, the way I’d explore a new idea is to either clumsily put something together myself or ask a student to run something short for signal, and if it’s there, we’d go deeper. That quick signal step, i.e., finding out if a question has any meat to it, is what I can now do without taking up anyone else’s time. It’s now between just me, Claude Code, and a few days of GPU time.

    I don’t know what this means for how we do research long term. I don’t think anyone does yet. But the distance between a question and a first answer just got very small.

  • Project

    Sidu Ponnappa says that there is no product.

    If the project isn’t expensive and isn’t risky, its output can’t be amortised – because anyone can produce the same output for less than your subscription costs. Your customers can build it. Their consultants can build it. Given another year of model improvements, their interns can build it. Nothing to sell because there’s nothing scarce.

    The vibe-coding crowd is proving this without realising it. Every weekend project posted on Twitter with “look what I built!” is a demonstration that the output has no product economics. The built-in-a-weekend flex is also the confession – if it took you a weekend, it’ll take your competitor a weekend too. The fact that anyone can build it is precisely why it can’t be productised. They’re not building products. They’re manufacturing disposable inventory and calling it a startup.

  • Inflection point

    I have been reading a lot about AI, and one topic has constantly come up over the last 2-3 months: how good Claude Code has become, and that the day we will not need to code is here.

    This post is not about supporting or refuting these claims. I haven’t used Claude Code.

    This is just a checkpoint for me to look back and say, this was the inflection point where the world of programming changed forever. Or at least we all collectively felt that for a while.

  • Solow’s productivity paradox

    Sasha Rogelberg reporting on how AI is resurrecting Solow’s productivity paradox.

    In 1987, economist and Nobel laureate Robert Solow made a stark observation about the stalling evolution of the Information Age: Following the advent of transistors, microprocessors, integrated circuits, and memory chips of the 1960s, economists and companies expected these new technologies to disrupt workplaces and result in a surge of productivity. Instead, productivity growth slowed, dropping from 2.9% from 1948 to 1973, to 1.1% after 1973.

    Newfangled computers were actually at times producing too much information, generating agonizingly detailed reports and printing them on reams of paper. What had promised to be a boom to workplace productivity was for several years a bust. This unexpected outcome became known as Solow’s productivity paradox, thanks to the economist’s observation of the phenomenon.

    “You can see the computer age everywhere but in the productivity statistics,” Solow wrote in a New York Times Book Review article in 1987.

    New data on how C-suite executives are—or aren’t—using AI shows history is repeating itself, complicating the similar promises economists and Big Tech founders made about the technology’s impact on the workplace and economy. Despite 374 companies in the S&P 500 mentioning AI in earnings calls—most of which said the technology’s implementation in the firm was entirely positive—according to a Financial Times analysis from September 2024 to 2025, those positive adoptions aren’t being reflected in broader productivity gains.

  • The knowledge is baked in

    Matthias Kainer explains how the T in GPT, the Transformer, works. This is possibly the simplest explanation of transformers that I have come across.

    All the “knowledge” a Transformer has is encoded in its weight matrices – those millions (or billions) of numbers that were set during training. When you ask it a question, it does not go look something up. It reconstructs an answer from compressed statistical patterns. Think of it like this: if you memorized every cookbook in the world but then someone asked you “what temperature for roasting a chicken?” – you would reconstruct an answer from all those overlapping memories. Most of the time you would be right. But sometimes your memories would blend together and you would confidently say “180C for 3 hours” when the actual answer depends on the size of the chicken. You would have no way to check, because the cookbooks are gone – only your compressed memory remains.

    That is why the search + embeddings approach from earlier matters so much. RAG (Retrieval Augmented Generation) is essentially saying: “Do not trust your memory alone. Before answering, go find the actual document, read it, and base your answer on that”. It does not completely solve hallucinations, but it dramatically reduces them for factual questions.

    The bottom line: Transformers are prediction machines, not truth machines. They predict what text should come next based on patterns. When the pattern aligns with truth, they are brilliant. When it does not, they are confidently wrong. This is not something that can be “fixed” without fundamentally changing the architecture – it is a feature of how next-word prediction works. Always verify important claims. The AI does not know what it does not know.
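
    To make the retrieval idea concrete, here is a toy sketch in Python. Everything in it is illustrative: the two-document corpus is invented, and naive word overlap stands in for real embedding similarity.

        import re

        def tokenize(text):
            # Lowercase and strip punctuation so "chicken?" matches "chicken".
            return set(re.findall(r"[a-z0-9]+", text.lower()))

        def score(query, doc):
            # Word-overlap score standing in for embedding similarity.
            return len(tokenize(query) & tokenize(doc))

        corpus = [
            "Roast a whole chicken at 200C for about 20 minutes per 500g.",
            "Bake bread at 230C for 30 minutes with steam in the oven.",
        ]

        def answer(query):
            # Retrieval step: find the best-matching document.
            best = max(corpus, key=lambda doc: score(query, doc))
            # Generation step: a real system would hand `best` to the model
            # as context; here we simply return the retrieved text.
            return f"Based on the source: {best}"

        print(answer("what temperature for roasting a chicken?"))

    The model still generates the final answer, but it is now anchored to a retrieved document rather than to its compressed memory alone.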

  • Culture

    David Attenborough explaining what culture means, in his book A Life on Our Planet, from the perspective of an evolutionary biologist.

    To an evolutionary biologist, the term ‘culture’ describes the information that can be passed from one individual to another by teaching or imitation. Copying the ideas or actions of others seems to us to be easy – but that is because we excel at it. Only a handful of other species show any signs of having a culture. Chimpanzees and bottle-nosed dolphins are two of them. But no other species has anything approaching the capacity for culture that we have.

    Culture transformed the way we evolved. It was a new way by which our species became adapted for life on Earth. Whereas other species depended on physical changes over generations, we could produce an idea that brought significant change within a generation. Tricks such as finding the plants that yield water even during a drought, crafting a stone tool for skinning a kill, lighting a fire or cooking a meal, could be passed from one human to another during a single lifetime. It was a new form of inheritance that didn’t rely on the genes which an individual received from its parents.

    So now the pace of our change increased. Our ancestors’ brains expanded at extraordinary speed, enabling us to learn, store and spread ideas. But, ultimately, the physical changes in their bodies slowed almost to a halt. By some 200,000 years ago, anatomically modern humans, Homo sapiens – people like you and me – had appeared. We have changed physically very little since then. What has changed spectacularly is our culture.

  • Sprint

    I recently read two articles which highlighted a very similar problem when using AI for coding: burnout.

    Matthew Hansen talking about how a one-off productivity boost by AI can lead to the team burning out.

    My friend’s panel raised a point I keep coming back to: if we sprint to deliver something, the expectation becomes to keep sprinting. Always. Tired engineers miss edge cases, skip tests, ship bugs. More incidents, more pressure, more sprinting. It feeds itself.

    This is a management problem, not an engineering one. When leadership sees a team deliver fast once (maybe with AI help, maybe not), that becomes the new baseline. The conversation shifts from “how did they do that?” to “why can’t they do that every time?”

    My friend was saying:

    When people claim AI makes them 10x more productive, maybe it’s turning them from a 0.1x engineer to a 1x engineer. So technically yes, they’ve been 10x’d. The question is whether that’s a productivity gain or an exposure of how little investigating they were doing before.

    Burnout and shipping slop will eat whatever productivity gains AI gives you. You can’t optimise your way out of people being too tired to think clearly.

    And here’s Siddhant Khare talking about how an increase in throughput also increases context switching.

    Here’s the thing that broke my brain for a while: AI genuinely makes individual tasks faster. That’s not a lie. What used to take me 3 hours now takes 45 minutes. Drafting a design doc, scaffolding a new service, writing test cases, researching an unfamiliar API. All faster.

    But my days got harder. Not easier. Harder.

    The reason is simple once you see it, but it took me months to figure out. When each task takes less time, you don’t do fewer tasks. You do more tasks. Your capacity appears to expand, so the work expands to fill it. And then some. Your manager sees you shipping faster, so the expectations adjust. You see yourself shipping faster, so your own expectations adjust. The baseline moves.

    Before AI, I might spend a full day on one design problem. I’d sketch on paper, think in the shower, go for a walk, come back with clarity. The pace was slow but the cognitive load was manageable. One problem. One day. Deep focus.

    Now? I might touch six different problems in a day. Each one “only takes an hour with AI.” But context-switching between six problems is brutally expensive for the human brain. The AI doesn’t get tired between problems. I do.

    This is the paradox: AI reduces the cost of production but increases the cost of coordination, review, and decision-making. And those costs fall entirely on the human.

    If the team sprints once, the expectation is that the team will keep sprinting forever.

    And on the other end of the spectrum we have David Crawshaw’s experience:

    I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing. The fear itself I understand, I have fear more broadly about what the end-game is for intelligence on tap in our society. But in the limited domain of writing computer programs these tools have brought so much exploration and joy to my work.

    This is the most confusing aspect for me: the polar opposite experiences people are sharing while using AI for coding.
