Om Malik explaining the fusion architecture in Apple’s M5 Pro and M5 Max.
For years, Apple’s narrative around its “M-series” chips was about integration. One chip. One die. Everything on the same piece of silicon. Unified memory so the CPU, GPU, and Neural Engine could all access the same data without copying it around. It worked beautifully for the M1 and M2. But now with the rise of AI, chips need to get bigger. AI demands more cores, more memory bandwidth, more compute. So, making one really big honking chip gets really expensive.
The larger a single die gets, the harder it is to manufacture. One tiny defect anywhere on the silicon and you toss the whole thing. Yields drop. Costs climb. AMD’s CEO Lisa Su recently showed that a design using four smaller chiplets delivered more total capability at 59 percent of the cost of one big chip.
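The yield argument can be made concrete with the classic Poisson defect model, where the chance a die has zero fatal defects falls exponentially with its area. The defect density and die sizes below are illustrative assumptions, not Apple’s or AMD’s actual figures; this is a sketch of the economics, not a reconstruction of Lisa Su’s numbers.

```python
import math

def die_yield(area_mm2: float, defect_density_per_mm2: float) -> float:
    """Poisson yield model: probability that a die of the given
    area contains zero fatal defects."""
    return math.exp(-defect_density_per_mm2 * area_mm2)

# Illustrative assumptions (not real process data):
DEFECT_DENSITY = 0.002   # fatal defects per mm^2
BIG_DIE = 600            # mm^2, one monolithic die
CHIPLET = 150            # mm^2, four chiplets covering the same area

big_yield = die_yield(BIG_DIE, DEFECT_DENSITY)        # exp(-1.2) ~ 30%
chiplet_yield = die_yield(CHIPLET, DEFECT_DENSITY)    # exp(-0.3) ~ 74%

# Rough silicon cost per working product, proportional to
# total area divided by yield (ignores packaging overhead).
monolithic_cost = BIG_DIE / big_yield
chiplet_cost = 4 * CHIPLET / chiplet_yield

print(f"monolithic yield:  {big_yield:.1%}")
print(f"per-chiplet yield: {chiplet_yield:.1%}")
print(f"chiplet cost as share of monolithic: {chiplet_cost / monolithic_cost:.0%}")
```

With these made-up numbers, each small chiplet yields far better than the big die, so four of them together cost a fraction of one monolithic part even though the total silicon area is the same. That is the mechanism behind the kind of savings Su described.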
Apple, too, faced a fork in the road. Keep building bigger and bigger single chips. Or break the big chip into smaller pieces and connect them together fast enough that software barely notices the split. They chose the second option, but made it their own. They call it Fusion Architecture.
[…]
This approach comes with its own tradeoffs. Split a chip into pieces, and those pieces need to talk to each other. That means data traveling between dies, which adds latency. Memory gets divided up between chiplets. You solve the manufacturing problem, but you compromise the architecture. Apple decided to do it its own way.
Johny Srouji, who has led Apple’s silicon efforts for years, says Apple kept unified memory intact. In the press release he said Fusion Architecture would “scale the capabilities of Apple silicon while preserving its core tenets of performance, power efficiency, and unified memory architecture.” While Apple says unified memory is preserved across both dies, the technical details of how memory actually works across two dies versus one die aren’t spelled out.