Category: Artificial Intelligence

  • AGI. Are we there yet?

    A very pessimistic take by Marcus Hutchins on the current state of AI. The author touches upon a variety of topics that I had previously read about independently of one another.

    A logical problem I previously used to test early LLMs was one called “The Wolf, The Goat, And The Cabbage”. The problem is simple. You’re walking with a wolf, a goat, and a cabbage. You come to a river which you need to cross. There is a small boat which only has enough space for you and one other item. If left unattended, the wolf will eat the goat, and the goat will eat the cabbage. How do you get all 3 safely across?

    The correct answer is you take the goat across, leaving behind the wolf and the cabbage. You then return and fetch the cabbage, leaving the goat alone on the other side. Because the goat and cabbage cannot be left alone together, you take the goat back, leaving just the cabbage. Now, you can take the wolf across, leaving the wolf and the cabbage alone on the other side, finally returning to fetch the goat.
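
    Out of curiosity, here is a minimal Python sketch of my own (not from Hutchins’ post) that solves the puzzle by brute-force search over states. The item names are irrelevant to the search; only the “cannot be left alone together” pairs matter, which is why swapping the wolf for a lion should be a non-event for anything that actually reasons about the constraints.

        from collections import deque

        # Banks: 0 = starting side, 1 = far side. A state is (farmer_bank, item_banks).
        ITEMS = ("wolf", "goat", "cabbage")
        UNSAFE = [(0, 1), (1, 2)]  # (wolf, goat) and (goat, cabbage) can't be left alone

        def safe(farmer, banks):
            # A pair is only dangerous when both members share a bank the farmer is not on.
            return all(not (banks[a] == banks[b] != farmer) for a, b in UNSAFE)

        def solve():
            start, goal = (0, (0, 0, 0)), (1, (1, 1, 1))
            queue, seen = deque([(start, [])]), {start}
            while queue:
                (farmer, banks), path = queue.popleft()
                if (farmer, banks) == goal:
                    return path
                # The farmer crosses alone (None) or with one item from his own bank.
                for cargo in (None, 0, 1, 2):
                    if cargo is not None and banks[cargo] != farmer:
                        continue
                    new_banks = list(banks)
                    if cargo is not None:
                        new_banks[cargo] = 1 - farmer
                    state = (1 - farmer, tuple(new_banks))
                    if state not in seen and safe(*state):
                        seen.add(state)
                        queue.append((state, path + ["alone" if cargo is None else ITEMS[cargo]]))

        print(solve())  # -> ['goat', 'alone', 'wolf', 'goat', 'cabbage', 'alone', 'goat']

    The breadth-first search returns one of the two symmetric seven-crossing solutions (wolf before cabbage); the walkthrough above is the other.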

    Any LLM could effortlessly answer this problem, because it has thousands of instances of the problem and the correct solution in its training data. But it was found that by simply swapping out one item but keeping the same constraints, the LLM would no longer be able to answer. Replacing the wolf with a lion would result in the LLM going off the rails and just spewing a bunch of nonsense.

    This made it clear the LLM was not actually thinking or reasoning through the problem, but simply regurgitating answers and explanations from its training data. Any human, knowing the answer to the original problem, could easily handle the wolf being swapped for a lion, or the cabbage for a lettuce. But LLMs, lacking reasoning, treated this as an entirely new problem.

    Over time this issue was fixed. It could be that the LLM developers wrote algorithms to identify variants of the problem. It’s also possible that people posting different variants of the problem allowed the LLM to detect the core pattern, which all variants follow, allowing it to substitute words where needed.

    This is when someone found you could just break the problem, and the LLM’s pattern matching along with it. Either by making it so none of the objects could be left unattended, or all of them could. In some variants there was no reason to cross the river, the boat didn’t fit anyone, was actually a car, or had enough space to carry all the items at once. Humans, having actual logic and reasoning abilities, could easily identify the broken versions of the problems and answer accordingly, but the LLMs would just output incoherent gibberish.

    But of course, as more and more ways to disprove LLM reasoning were found, the developers just found ways to fix them. I strongly suspect these issues are not being fixed by any introduction of actual logic or reasoning, but by sub-models built to address specific problems. If this is the case, I’d argue we’re moving away from AGI and back towards building problem specific ML models, which is how “AI” has worked for decades.

    Bonus: Check the Wikipedia page of Marcus Hutchins.

  • Inevitabilism

    Tom Renner explaining inevitabilism.

    People advancing an inevitabilist world view state that the future they perceive will inevitably come to pass. It follows, relatively straightforwardly, that the only sensible way to respond to this is to prepare as best you can for that future.

    This is a fantastic framing method. Anyone who sees the future differently to you can be brushed aside as “ignoring reality”, and the only conversations worth engaging are those that already accept your premise.

    “We are entering a world where we will learn to coexist with AI, not as its masters, but as its collaborators.” – Mark Zuckerberg

    “AI is the new electricity.” – Andrew Ng

    “AI will not replace humans, but those who use AI will replace those who don’t.” – Ginni Rometty

    These are some big names in the tech world, all framing the conversation in a very specific way. Rather than “is this the future you want?”, the question is instead “how will you adapt to this inevitable future?”. Note also the threatening tone present, a healthy psychological undercurrent encouraging you to go with the flow, because you’d otherwise be messing with scary powers way beyond your understanding.

  • Summary vs Shortening

    Scott Jenson talking about the anthropomorphizing of LLMs and touching upon the difference between summary and shortening. I recommend reading the entire post to avoid taking the excerpt below out of context.

    […] we say they can “summarize” a document. But LLMs don’t summarize, they shorten, and this is a critical distinction. A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text.

    Here is an example using the movie The Matrix:

    Summary

    A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.

    Shortening

    A computer hacker finds out reality is fake and learns Kung Fu.

    There’s a key difference between summarizing and simply shortening. A summary enriches a text by providing context and external concepts, creating a broader framework for understanding. Shortening, in contrast, only reduces the original text; it removes information without adding any new perspective.

  • Do more with same rather than doing same with less

    Thomas Dohmke, ex-CEO of GitHub, shares his take on the AI vs Developer+AI argument. He still thinks developers will need to get their fundamentals right, review and verify AI-generated code, and understand and design systems. But he also acknowledges that AI is going to bring a significant change in the way developers code in the future.

    Developers rarely mentioned “time saved” as the core benefit of working in this new way with agents. They were all about increasing ambition. We believe that means that we should update how we talk about (and measure) success when using these tools, and we should expect that after the initial efficiency gains our focus will be on raising the ceiling of the work and outcomes we can accomplish, which is a very different way of interpreting tool investments. This helps explain the – perhaps unintuitive at first – observation that many of the developers we interviewed were paying for top-tier subscriptions. When you move from thinking about reducing effort to expanding scope, only the most advanced agentic capabilities will do.

    The last sentence of the quote ties back to the title of this post.

  • 10x productivity

    Colton Voege arguing that AI is not making software engineers 10x as productive.

    10x productivity means ten times the outcomes, not ten times the lines of code. This means what you used to ship in a quarter you now ship in a week and a half. These numbers should make even the truest AI believer pause. The amount of product ideation, story point negotiation, bugfixing, code review, waiting for deployments, testing, and QA that go into what was traditionally 3 months of work is now getting done in 7 work days? For that to happen, each and every one of these bottlenecks has to also have seen 10x productivity gains.

    Any software engineer who has worked on actual code in an actual company knows this isn’t possible.

    AI is making coding, which is a small portion of what software engineers do, 10x more productive. And even that only sometimes. Colton Voege touches upon quite a few other topics. A worthwhile read.

  • Credit card and vibe coding

    Steve Krouse sharing an analogy that vibe coding is like giving a child a credit card. The child gets instant gratification, but at the end of the month you need to pay the bill.

    The worst possible situation is to have a non-programmer vibe code a large project that they intend to maintain. This would be the equivalent of giving a credit card to a child without first explaining the concept of debt.

    As you can imagine, the first phase is ecstatic. I can wave this little piece of plastic in stores and take whatever I want!

    Which is a lot like AI can build anything now! Nobody needs to learn how to code! Look at what it just made for me!

    But if you wait a month, you’ll get the credit card bill. Did I actually need to buy all those things? How will I get myself out of this hole?

    It’s similar for the vibe coder. My code is broken. What do all these files and folders even do? How will I ever get this fixed? Can I get a refund for the $400 I spent vibe coding?

    If you don’t understand the code, your only recourse is to ask AI to fix it for you, which is like paying off credit card debt with another credit card.

    I saw this post on Hacker News and there was this comment that caught my eye.

    Non-technical or junior people developed and deployed applications, emboldened by the relative ease of Microsoft Access and Excel. There were all kinds of limitations, scaling problems, and maintenance nightmares. But there were a lot of upsides too, and it made the “professionals” up their game to obviate the need for such ad hoc and unsanctioned developments.

    Come to think of it, the exact same thing happened when the PC became popular. Mainframe people were aghast at all the horrible unprofessional mess that the PC people were creating.

    This in turn reminded me of the quote from Micha Kaufman.

    You must understand that what was once considered ‘easy tasks’ will no longer exist; what was considered ‘hard tasks’ will be the new easy, and what was considered ‘impossible tasks’ will be the new hard.

    These historical perspectives and statements drive me to a conclusion—vibe coding is here to stay. We will have people on both ends of the spectrum. Some folks will rack up huge credit card debt and go bankrupt. Others will use the credit card wisely and travel for free with the accumulated reward points.

  • Almost right, but not quite

    The results of Stack Overflow Developer Survey 2025 are in.

    No need to bury the lede: more developers are using AI tools, but their trust in those tools is falling.

    And why is the trust falling?

    The number-one frustration, cited by 45% of respondents, is dealing with “AI solutions that are almost right, but not quite,” which often makes debugging more time-consuming. In fact, 66% of developers say they are spending more time fixing “almost-right” AI-generated code. When the code gets complicated and the stakes are high, developers turn to people. An overwhelming 75% said they would still ask another person for help when they don’t trust AI’s answers.

  • Jagged intelligence

    Andrej Karpathy explaining what jagged intelligence is in AI along with some examples.

    Jagged Intelligence. Some things work extremely well (by human standards) while some things fail catastrophically (again by human standards), and it’s not always obvious which is which, though you can develop a bit of intuition over time. Different from humans, where a lot of knowledge and problem solving capabilities are all highly correlated and improve linearly all together, from birth to adulthood.

    Personally I think these are not fundamental issues. They demand more work across the stack, including not just scaling. The big one I think is the present lack of “cognitive self-knowledge”, which requires more sophisticated approaches in model post-training instead of the naive “imitate human labelers and make it big” solutions that have mostly gotten us this far.

    It’s from a year ago, and some of those jags have been smoothed out.

  • Virtuosity in the world of AI

    Drew Breunig talking about virtuosity, and how quickly amazing new developments in the world of AI are becoming meh.

    virtuosity can only be achieved when the audience can perceive the risks being taken by the performer.

    A DJ that walks on stage and hits play is not likely to be perceived as a virtuoso. While a pianist who is able to place their fingers perfectly among a minefield of clearly visible wrong keys is without question a virtuoso. I think this idea carries over to sports as well and can partially explain the decline of many previously popular sports and the rise of video game streaming. We watch the things that we have personally experienced as being difficult. That is essential context to appreciate a performance.

    Initially, many AI applications were, surprisingly, embraced as incredible performances. The images generated by DALL-E were usually not more impressive than those of professional illustrators. They were instead incredibly impressive because they had been achieved by a computer program. The same goes for video generating AI demos; none of their video clips are aesthetic or narrative achievements. They are impressive because they were generated by software. But even here, the AI is not the virtuoso. The virtuosos are the teams and companies building these models.

    We’ve been able to watch this sheen come off very quickly. Generating an image from a chatbot is no longer very impressive to our friends. It is a novelty. And this half-life, the time it takes for a model’s output to become merely novel, is shortening with every release.

  • Proof of thought

    Alex Martsinovich talking about how writing has become incredibly cheap, and going on to discuss the AI etiquette we need to start following.

    For the longest time, writing was more expensive than reading. If you encountered a body of written text, you could be sure that at the very least, a human spent some time writing it down. The text used to have an innate proof-of-thought, a basic token of humanity.

    Now, AI has made text very, very, very cheap. Not only text, in fact. Code, images, video. All kinds of media. We can’t rely on proof-of-thought anymore. Any text can be AI slop. If you read it, you’re injured in this war. You engaged and replied – you’re as good as dead. The dead internet is not just dead, it’s poisoned. So what do we do?