From miles away across the desert, the Great Pyramid looks like a perfect, smooth shape: a sleek triangle pointing to the stars. Stand at the base, however, and the illusion of smoothness vanishes. You see massive, jagged blocks of limestone. It is not a slope; it is a staircase.
Remember this the next time you hear futurists talking about exponential growth.
Intel's co-founder Gordon Moore is famously quoted as predicting in 1965 that the transistor count on a microchip would double every year, the observation now known as Moore's Law. Another Intel executive, David House, later revised this to compute power doubling every 18 months. For a while, Intel's CPUs were the poster child of this law. That is, until growth in CPU performance flattened out like a block of limestone.
If you zoom out, though, the next limestone block was already there: the growth in compute merely shifted from CPUs to the world of GPUs. Jensen Huang, Nvidia's CEO, played a long game and came out a strong winner, building his own stepping stones initially with gaming, then computer vision and, recently, generative AI.
The illusion of smooth growth
Technology growth is full of sprints and plateaus, and gen AI is not immune. The current wave is driven by the transformer architecture. To quote Anthropic's CEO and co-founder Dario Amodei: "The exponential continues until it doesn't. And every year we've been like, 'Well, this can't possibly be the case that things will continue on the exponential,' and then every year it has."
But just as the CPU plateaued and GPUs took the lead, we are seeing signs that LLM growth is shifting paradigms again. For example, late in 2024, DeepSeek surprised the world by training a world-class model on an impossibly small budget, in part by using the mixture-of-experts (MoE) technique.
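To make the idea concrete, here is a minimal, illustrative sketch of MoE routing in Python. Everything in it (expert count, dimensions, random weights) is a toy placeholder; a real MoE layer sits inside a transformer and is trained end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total experts in the layer
TOP_K = 2         # experts activated per token: the source of the savings
D_MODEL = 16      # toy hidden size

# Each "expert" is a small feed-forward weight matrix; the router is a
# learned gate in practice, random here purely for illustration.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]      # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the chosen k only
    # Only TOP_K of NUM_EXPERTS experts run per token, so compute is roughly
    # k/N of a dense layer holding the same total parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(D_MODEL)).shape)  # -> (16,)
```

The punchline is economic: parameter count scales with the number of experts, but per-token compute scales only with the handful that fire.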
Do you remember where you recently saw this technique mentioned? Nvidia's Rubin press release: The technology includes "…the latest generations of Nvidia NVLink interconnect technology… to accelerate agentic AI, advanced reasoning and massive-scale MoE model inference at up to 10x lower cost per token."
Jensen knows that achieving that coveted exponential growth in compute doesn't come from pure brute force anymore. Sometimes you need to shift the architecture entirely to place the next stepping stone.
The latency crisis: Where Groq fits in
This long introduction brings us to Groq.
The biggest gains in AI reasoning capabilities in 2025 were driven by "inference-time compute" or, in lay terms, letting the model think for longer before it answers. But time is money. Consumers and businesses do not like waiting.
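A hedged sketch of what inference-time compute means in code: sample several candidate chains of thought and keep the best one, trading tokens (and therefore time) for quality. The model and scorer below are hypothetical stand-ins; in production the scorer might be a verifier model or a test suite:

```python
import random

random.seed(7)

def sample_chain(prompt: str) -> tuple[str, float]:
    """Stand-in for one sampled chain of thought plus a quality score."""
    quality = random.random()
    return f"candidate (score={quality:.2f}) for {prompt!r}", quality

def best_of_n(prompt: str, n: int) -> str:
    """More samples = more inference compute = a better expected answer."""
    return max((sample_chain(prompt) for _ in range(n)), key=lambda c: c[1])[0]

# Raising n buys quality with wall-clock time, which is exactly why raw
# tokens-per-second throughput has become a competitive weapon.
print(best_of_n("plan my itinerary", n=1))
print(best_of_n("plan my itinerary", n=16))
```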
Groq comes into play here with its lightning-speed inference. If you bring together the architectural efficiency of models like DeepSeek and the sheer throughput of Groq, you get frontier intelligence at your fingertips. By executing inference faster, you can "out-reason" competitive models, offering a "smarter" system to customers without the penalty of lag.
From universal chip to inference optimization
For the last decade, the GPU has been the universal hammer for every AI nail. You use H100s to train the model; you use H100s (or trimmed-down versions) to run the model. But as models shift toward "System 2" thinking, where the AI reasons, self-corrects and iterates before answering, the computational workload changes.
Training requires massive parallel brute force. Inference, especially for reasoning models, requires fast sequential processing: the system must generate tokens nearly instantly to sustain complex chains of thought without the user waiting minutes for an answer. Groq's LPU (Language Processing Unit) architecture removes the memory-bandwidth bottleneck that plagues GPUs during small-batch inference.
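Why is small-batch inference bandwidth-bound? At batch size 1, generating each token requires streaming essentially all of the model's weights through memory, so decode speed is capped by bandwidth rather than FLOPs. A back-of-envelope sketch, using rough ballpark figures that are assumptions rather than vendor specs:

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on single-stream decode speed for a memory-bound model."""
    return bandwidth_gb_s / model_gb

MODEL_GB = 140  # e.g., a ~70B-parameter model stored in 16-bit weights

# ~3.3 TB/s is in the ballpark of a modern HBM GPU; tens of TB/s is the
# kind of aggregate on-chip SRAM bandwidth Groq has cited for its LPUs.
print(f"HBM GPU:  {max_tokens_per_sec(3_300, MODEL_GB):6.0f} tok/s ceiling")
print(f"SRAM LPU: {max_tokens_per_sec(80_000, MODEL_GB):6.0f} tok/s ceiling")
```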
The engine for the next wave of growth
For the C-suite, this potential convergence solves the "thinking time" latency crisis. Consider the expectations of AI agents: We want them to autonomously book flights, code entire apps and research legal precedent. To do this reliably, a model might need to generate 10,000 internal "thought tokens" to verify its own work before it outputs a single word to the user. The arithmetic, sketched after this list, is unforgiving:
- On a standard GPU: 10,000 thought tokens might take 20 to 40 seconds. The user gets bored and leaves.
- On Groq: That same chain of thought happens in less than 2 seconds.
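Those bullet-point numbers fall out of simple division: wait time is thought tokens over decode throughput. The throughput figures below are illustrative assumptions chosen to match the ranges above, not benchmarks:

```python
THOUGHT_TOKENS = 10_000  # hidden reasoning before the first visible word

for label, tokens_per_sec in [
    ("standard GPU serving (~350 tok/s assumed)", 350),
    ("Groq-class inference (~5,500 tok/s assumed)", 5_500),
]:
    print(f"{label}: {THOUGHT_TOKENS / tokens_per_sec:5.1f} s of silence")
```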
If Nvidia integrates Groq's technology, they solve the "waiting for the robot to think" problem. They preserve the magic of AI. Just as they moved from rendering pixels (gaming) to rendering intelligence (gen AI), they would now move to rendering reasoning in real time.
Furthermore, this creates a formidable software moat. Groq's biggest hurdle has always been the software stack; Nvidia's biggest asset is CUDA. If Nvidia wraps its ecosystem around Groq's hardware, they effectively dig a moat so wide that competitors cannot cross it. They would offer the universal platform: the best environment to train and the most efficient environment to run (Groq/LPU).
Consider what happens when you couple that raw inference power with a next-generation open source model (like the rumored DeepSeek 4): You get an offering that would rival today's frontier models in cost, performance and speed. That opens up opportunities for Nvidia, from directly entering the inference business with its own cloud offering to continuing to power an expanding base of fast-growing customers.
The next step on the pyramid
Returning to our opening metaphor: The "exponential" growth of AI is not a smooth line of raw FLOPs; it is a staircase of bottlenecks being smashed.
- Block 1: We couldn't calculate fast enough. Solution: The GPU.
- Block 2: We couldn't train deep enough. Solution: Transformer architecture.
- Block 3: We can't "think" fast enough. Solution: Groq's LPU.
Jensen Huang has never been afraid to cannibalize his own product lines to own the future. By validating Groq, Nvidia wouldn't just be buying a faster chip; they would be bringing next-generation intelligence to the masses.
Andrew Filev, founder and CEO of Zencoder