Category: Artificial Intelligence

  • Dividing a job into tasks

    Arvind Narayanan talking about dividing a job into tasks and the boundaries between them.

    …if you define jobs in terms of tasks maybe you’re actually defining away the most nuanced and hardest-to-automate aspects of jobs, which are at the boundaries between tasks.

    Can you break up your own job into a set of well-defined tasks such that if each of them is automated, your job as a whole can be automated? I suspect most people will say no. But when we think about other people’s jobs that we don’t understand as well as our own, the task model seems plausible because we don’t appreciate all the nuances.

    If this is correct, it is irrelevant how good AI gets at task-based capability benchmarks. If you need to specify things precisely enough to be amenable to benchmarking, you will necessarily miss the fact that the lack of precise specification is often what makes jobs messy and complex in the first place. So benchmarks can tell us very little about automation vs augmentation.

  • Judgement over technical skill

    Alexander Kohlhofer quoting Brian Eno and explaining how judgement becomes more important than technical skill in the age of AI.

    The great benefit of computer sequencers is that they remove the issue of skill, and replace it with the issue of judgement. 

    With Cubase or Photoshop, anybody can actually do anything, and you can make stuff that sounds very much like stuff you’d hear on the radio, or looks very much like anything you see in magazines. 

    So the question becomes not whether you can do it or not, because any drudge can do it if they’re prepared to sit in front of the computer for a few days, the question then is, “Of all the things you can now do, which do you choose to do?”

  • ChatGPT and students

    A professor’s rant, posted on Reddit, about his struggles with ChatGPT use among students.

    I actually get excited when I find typos and grammatical errors in their writing now.

    My constant struggle is how to convince them that getting an education in the humanities is not about regurgitating ideas/knowledge that already exist. It’s about generating new knowledge, striving for creative insights, and having thoughts that haven’t been had before. I don’t want you to learn facts. I want you to think. To notice. To question. To reconsider. To challenge. Students don’t yet get that ChatGPT only rearranges preexisting ideas, whether they are accurate or not.

  • Input risk in LLMs

    Doug Slater talking about input risk when using an LLM to code.

    An LLM does not challenge a prompt which is leading or whose assumptions are flawed or context is incomplete. Example: An engineer prompts, “Provide a thread-safe list implementation in C#” and receives 200 lines of flawless, correct code. It’s still the wrong answer, because the question should have been, “How can I make this code thread-safe?” and whose answer is “Use System.Collections.Concurrent” and 1 line of code. The LLM is not able to recognize an instance of the XY problem because it was not asked to.
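
    To make the contrast concrete, here is a minimal sketch of the “1 line of code” answer, assuming the list only needs to be filled and drained from multiple threads. This is my own illustration, not code from Slater’s post:

    ```csharp
    // Answering the question as asked yields ~200 lines of hand-rolled locking;
    // answering the real question usually means reaching for a ready-made type
    // from System.Collections.Concurrent.
    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    // was: var pending = new List<string>();  // not safe for concurrent writers
    var pending = new ConcurrentQueue<string>();

    // Many threads can now add items safely, with no hand-written locking.
    Parallel.For(0, 100, i => pending.Enqueue($"job-{i}"));

    while (pending.TryDequeue(out var job))
        Console.WriteLine(job);
    ```

    Whether ConcurrentQueue<T>, ConcurrentBag<T>, or ConcurrentDictionary<TKey, TValue> is the right swap depends on how the list is actually used, which is exactly the kind of context the leading prompt never surfaced.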

    The post covers a lot more ground on the risks involved with LLM-generated code. Another thing that caught my attention was:

    LLMs accelerate incompetence.

    Simon Willison talks about the other side when he says:

    LLMs amplify existing expertise

    The conclusion is: If you are smart, LLMs can make you—or at least make you sound—smarter. If you are dumb, LLMs will make you dumber, without you ever knowing.

  • Addiction to… vibe coding

    Fred Benenson talking about how you can get addicted to vibe coding. Yes, vibe coding.

    I’ve been using AI coding assistants like Claude Code for a while now, and I’m here to say (with all due respect to people who have substance abuse issues), I may be an addict. And boy is this an expensive habit.

    Its “almost there” quality — the feeling we’re just one prompt away from the perfect solution — is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern (“the code works! it’s brilliant! it just broke! wtf!”) triggers stronger dopamine responses in our brain’s reward pathways, similar to gambling behaviors.

    What makes this especially effective with AI is the minimal effort required for potentially significant rewards — creating what neuroscientists call an “effort discounting” advantage. Combined with our innate completion bias — the drive to finish tasks we’ve started — this creates a compelling psychological loop that keeps us prompting.

    However, the post is less about addiction and more about the perverse incentives AI companies have to generate verbose code (a rough token-cost sketch follows the list):

    1. The AI generates verbose, procedural code for a given task
    2. This code becomes part of the context when you ask for further changes or additions (this is key)
    3. The AI now has to read (and you pay for) this verbose code in every subsequent interaction
    4. More tokens processed = more revenue for the company behind the AI
    5. The LLM developers have no incentive to “fix” the verbose code problem because doing so will meaningfully impact their bottom line
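
    As a rough back-of-the-envelope illustration of points 2 to 4, here is a sketch with made-up token counts and pricing (my own assumptions, not figures from Benenson’s post):

    ```csharp
    // Hypothetical numbers only: re-sending verbose code as context on every
    // follow-up prompt multiplies the input tokens you are billed for.
    using System;

    const int verboseCodeTokens = 20_000;  // assumed size of a verbose AI-generated module
    const int conciseCodeTokens = 4_000;   // assumed size of a concise equivalent
    const int followUpPrompts = 50;        // assumed number of later "now change this" prompts
    const double dollarsPerMillionInputTokens = 3.00; // assumed input-token price

    long verboseTotal = (long)verboseCodeTokens * followUpPrompts;
    long conciseTotal = (long)conciseCodeTokens * followUpPrompts;

    Console.WriteLine($"Verbose: {verboseTotal:N0} re-read input tokens (~${verboseTotal * dollarsPerMillionInputTokens / 1_000_000:F2})");
    Console.WriteLine($"Concise: {conciseTotal:N0} re-read input tokens (~${conciseTotal * dollarsPerMillionInputTokens / 1_000_000:F2})");
    ```

    The per-session dollar amounts are small, but the same multiplier applies across every user and every project, which is where the incentive the post describes comes from.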

    Don’t miss the chuckle-inducing postscript.

  • Thinking

    Dustin Curtis talking about how AI is impacting his—and possibly others’—thinking.

    I thought I was using AI in an incredibly positive and healthy way, as a bicycle for my mind and a way to vastly increase my thinking capacity. But LLMs are insidious–using them to explore ideas feels like work, but it’s not real work. Developing a prompt is like scrolling Netflix, and reading the output is like watching a TV show. Intellectual rigor comes from the journey: the dead ends, the uncertainty, and the internal debate. Skip that, and you might still get the insight–but you’ll have lost the infrastructure for meaningful understanding. Learning by reading LLM output is cheap. Real exercise for your mind comes from building the output yourself.

    The Netflix analogy hit hard.

  • Shallowness of LLMs

    Jason Cohen talking about how LLMs’ knowledge is wider than any human’s but shallower than the best humans’.

    It’s interesting how LLMs are wider than any human, but shallower than the best humans. (At many things; not, e.g. chess)

    It can’t do customer service as well as the best humans, but it can do it in 100 languages, which no human can.

    It can’t program as well as the best humans, but it can program in 100 languages and 1000 libraries, which no human can.

    It’s not as good at math or law or medicine or research or history as the best humans in each of those fields, but it is better than the median human in those fields.

  • Quoting Micha Kaufman

    Micha Kaufman’s email that is doing the rounds on the internet.

    You must understand that what was once considered ‘easy tasks’ will no longer exist; what was considered ‘hard tasks’ will be the new easy, and what was considered ‘impossible tasks’ will be the new hard.

  • LLMs as index funds

    Venkatesh Rao drawing an analogy between LLMs and index funds.

    Foundation models like GPT and Claude now serve as the index funds of language. Trained on enormous corpora of human text, they do not try to innovate. Instead, they track the center of linguistic gravity: fluent, plausible, average-case language. They provide efficient, scalable access to verbal coherence, just as index funds offer broad exposure to market returns. For most users, most of the time, this is enough. LLMs automate fluency the way passive investing automates exposure. They flatten out risk and elevate reliability.

    But they also suppress surprise. Like index funds, LLMs are excellent at covering known territory but incapable of charting new ground. The result is a linguistic landscape dominated by synthetic norms: smooth, predictable, uncontroversial. Writing with an LLM is increasingly like buying the market—safe, standardized, and inherently unoriginal.

    In this new environment, the act of writing raw, unassisted text begins to resemble picking penny stocks. It’s risky, inefficient, and potentially seen as naïve. Yet it remains the only place where genuine linguistic alpha—the surplus value of originality—can be found. Alpha lives in human voice, conceptual invention, emotional charge, and expressive risk. It emerges from the irreducible tensions of context, personality, and thought. And like financial alpha, it is quickly absorbed and neutralized by the systems it disrupts. What begins as a surprise becomes a template; what once felt radical becomes the new benchmark.

    As a result, the most original language is retreating into private markets. In Substacks, Signal threads, Discord servers, and private memos, new forms are being tested in semi-anonymous, high-context settings. These are the linguistic equivalents of venture capital and private equity—spaces of risk, scarcity, and concentrated attention. Just as companies now avoid going public too soon, writers may delay or even refuse public release, fearing dilution or misappropriation. Only once an idea matures might it “IPO” into the public sphere—perhaps as a viral tweet, a manifesto, or a cultural phrase. But even then, its time is limited: LLMs will soon flatten it into beta.

    This is part of Venkatesh Rao’s AI slop writing, where he shares a “recipe”—a set of high-level ideas—that he uses to generate posts with an LLM. I didn’t realise I was reading AI slop until I reached the Recipe section.

  • AI can empower developers to rewrite code without regret

    Matthew Sinclair talks about how AI can help programmers scrap the code they—or rather, the AI—have written and start over when they realise the approach won’t work, will lead to technical debt, or is wrong for any of a thousand other reasons, because new code can be regenerated fairly quickly.

    Working with Claude Code has fundamentally shifted how I think about the economics of programming time. Traditionally, coding involves three distinct “time buckets”:

    • Why am I doing this? Understanding the business problem and value 
    • What do I need to do? Designing the solution conceptually 
    • How am I going to do it? Actually writing the code 

    For decades, that last bucket consumed enormous amounts of our time. We’d spend hours, days or weeks writing, debugging, and refining. With Claude, that time cost has plummeted to nearly zero. I can generate thousands of lines of functional code in a sitting—something that is, frankly, mind-blowing.

    And there’s a new skill that emerges: wielding the knife. With code generation being essentially free, we need to become much more comfortable with throwing away entire solutions. The sunk cost fallacy hits programmers hard—we hate discarding code we’ve invested in, fearing we might break something important or never get back to a working state.

    But when your assistant can rewrite everything in minutes, that calculus changes completely. Three times during my backend project, I looked at substantial amounts of code—thousands of lines that technically worked—and decided to scrap it entirely because the approach wasn’t right. This wasn’t easy. My instinct was still to try to salvage and refactor. But the right move was to step back, rethink the approach, and direct the AI down a different path.

    This willingness to cut ruthlessly is a muscle most developers haven’t developed yet. It requires confidence in your architectural judgment and a radical shift in how you value implementation time.