Benedict Evans talks about how AI is getting better at giving better answers, but still lags behind when it comes to giving the right answer.
Here’s a practical example of the kind of thing that I do quite often, that I’d like to be able to automate. I asked ChatGPT 4o how many people were employed as elevator operators in the USA in 1980. The US Census collected this data and published it: the answer is 21,982
First, I try the answer cold, and I get an answer that’s specific, unsourced, and wrong. Then I try helping it with the primary source, and I get a different wrong answer with a list of sources, that are indeed the US Census, and the first link goes to the correct PDF… but the number is still wrong. Hmm. Let’s try giving it the actual PDF? Nope. Explaining exactly where in the PDF to look? Nope. Asking it to browse the web? Nope, nope, nope…
I faced this issue when I asked AI “what is the composition of nifty 500 between large caps mid caps and small caps“. ChatGPT came close by the getting the right document to refer, but ended up picking the wrong value. I needed the right answer. As Benedict Evans calls it, a deterministic task and not a probablistic task.
The useful critique of my ‘elevator operator’ problem is not that I’m prompting it wrong or using the wrong version of the wrong model, but that I am in principle trying to use a non-deterministic system for a a deterministic task. I’m trying to use a LLM as though it was SQL: it isn’t, and it’s bad at that.
But don’t write off AI so soon. Benedict Evans goes on to talk about how disruption happens.
Part of the concept of ‘Disruption’ is that important new technologies tend to be bad at the things that matter to the previous generation of technology, but they do something else important instead. Asking if an LLM can do very specific and precise information retrieval might be like asking if an Apple II can match the uptime of a mainframe, or asking if you can build Photoshop inside Netscape. No, they can’t really do that, but that’s not the point and doesn’t mean they’re useless. They do something else, and that ‘something else’ matters more and pulls in all of the investment, innovation and company creation. Maybe, 20 years later, they can do the old thing too – maybe you can run a bank on PCs and build graphics software in a browser, eventually – but that’s not what matters at the beginning. They unlock something else.