(Image: Olaf makes an appearance. Source: NVIDIA GTC 2026 Keynote.)
✳️ Viewing GTC 2026 Through a Token Economy Lens
Every year after GTC, the community races to compare specs. Which chip has the highest compute, how much NVLink bandwidth, how many times faster Vera Rubin is than Blackwell. But the slide Jensen held up this year and called “my best slide” wasn’t a spec sheet for any single chip — it was an architecture diagram spanning the entire structured data ecosystem. It listed Snowflake, Databricks, Amazon EMR, Google BigQuery and other CSP engines alongside various data storage solutions, with NVIDIA’s cuDF acceleration engine at the bottom. He said his team always tells him “don’t show this one” — too complex — but he insists on presenting it every time. The core message: “Structured data is the ground truth of enterprise computing” — the foundation of all AI.
The point of that diagram isn’t about any individual platform. It’s about the reason for acceleration. In the past, accelerating structured data was about doing more, cheaper, more frequently — “good enough was good enough.” But going forward, AI understands and consumes data far faster than humans can. Without accelerating data preparation, you simply can’t keep up. Nestle used Watson X to accelerate their supply chain — 5x faster, 83% cost reduction — speed, scale, and cost benefits all at once. NVIDIA built two foundational platforms for this: cuDF is “RTX for data frames,” handling structured data; cuVS handles unstructured data. The latter is arguably more critical: 90% of the world’s data is unstructured, previously “completely useless to the world,” until AI’s multimodal understanding made it searchable. (Earlier this quarter, we helped a manufacturing client re-explore their ERP data. We had no idea what we’d find until we dove in — the general-purpose model’s ability to understand that data was jaw-dropping. Happy to chat more about this separately if you’re interested.)
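Since cuDF comes up again below, a concrete sense of what "RTX for data frames" means may help: cuDF mirrors the pandas API, so existing data-frame code can be accelerated without a rewrite. Here is a minimal sketch in plain pandas; the order data and column names are invented, and the claim is only that cuDF's `cudf.pandas` mode is designed to run this style of code on the GPU unmodified:

```python
import pandas as pd

# Invented order-to-cash events, the kind of structured data cuDF targets.
orders = pd.DataFrame({
    "region": ["EMEA", "EMEA", "APAC", "APAC", "AMER"],
    "status": ["shipped", "delayed", "shipped", "shipped", "delayed"],
    "value":  [120.0, 80.0, 200.0, 50.0, 90.0],
})

# A typical refresh query: total delayed order value per region.
delayed = (
    orders[orders["status"] == "delayed"]
    .groupby("region")["value"]
    .sum()
)

print(delayed.to_dict())  # {'AMER': 90.0, 'EMEA': 80.0}
```

Launched as `python -m cudf.pandas script.py` on a machine with an NVIDIA GPU, the same script is intended to execute accelerated; that drop-in compatibility is the whole pitch.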
OK, so let’s say your organization has the data layer acceleration in place. The next question is: how do you price what AI produces?
Jensen said “tokens are the new commodity,” and all commodities develop tiered pricing once they mature. He showed a chart with two axes — throughput efficiency and interaction speed — defining four tiers: free tier at zero cost, medium at $3, high at $6, premium at $45, all priced per million tokens. The larger the model, the longer the context, the faster the speed, the higher the price. He even extended this verbally: in the future, there could be a $150-per-million-token tier, which would be nothing for research teams. This follows the same logic as any commodity market we’re familiar with — except the commodity has shifted from oil and electricity to tokens.
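The tier chart is just arithmetic once you fix the prices. A quick sketch using the per-million-token prices quoted in the talk; the 500M-token monthly workload is an invented example:

```python
# Per-million-token prices for each tier, as quoted in the keynote.
TIERS = {"free": 0.0, "medium": 3.0, "high": 6.0, "premium": 45.0}

def monthly_token_cost(tokens: int, tier: str) -> float:
    """Dollar cost of a monthly token volume at a given tier."""
    return tokens / 1_000_000 * TIERS[tier]

# A hypothetical agent burning 500M tokens per month:
for tier in ("medium", "high", "premium"):
    print(tier, monthly_token_cost(500_000_000, tier))
# medium 1500.0 / high 3000.0 / premium 22500.0
```

The spread matters: the same workload costs 15x more at the premium tier, which is why tier selection becomes a real architectural decision, not a checkout option.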
(On pricing, this reminds me of Werner Vogels’ “The Frugal Architect” framework from AWS re:Invent 2023. His Law 2: “Sustainable systems align cost dimensions with revenue dimensions.” He used the example of unlimited mobile data plans — carriers priced on flat monthly fees, but costs scaled with data usage. It worked fine at first, until consumers started streaming Netflix. Data consumption skyrocketed, revenue stayed flat, costs climbed linearly with usage, and the pricing dimension decoupled from the cost structure — the business model imploded. Bringing this back to the token economy: token costs are already tiered by volume, but many enterprises still anchor their revenue models to per-seat or flat monthly fees. The “alignment” Vogels talked about means converging these two axes onto the same dimension. Otherwise, what scales is the gap, not the profit.)
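Vogels' decoupling can be made concrete with a toy model: revenue priced per seat, cost scaling per token. All numbers below are invented; the point is only that when usage grows and the fee does not, the margin flips sign:

```python
def margin_flat_fee(seats: int, fee_per_seat: float,
                    tokens_per_seat: int, cost_per_mtok: float) -> float:
    """Monthly margin when revenue is per-seat but cost is per-token."""
    revenue = seats * fee_per_seat
    cost = seats * tokens_per_seat / 1_000_000 * cost_per_mtok
    return revenue - cost

# 100 seats at $30/month; agent usage triples as workflows get chattier.
light = margin_flat_fee(100, 30.0, 2_000_000, 6.0)
heavy = margin_flat_fee(100, 30.0, 6_000_000, 6.0)
print(light, heavy)  # 1800.0 -600.0
```

Revenue stayed at $3,000 in both cases; only the cost dimension moved. That is the gap that scales.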
Jensen dropped an even bigger thesis: every SaaS company will become a GaaS company — Generation-as-a-Service, or agentic as a service. He drew an analogy to Linux, Kubernetes, and HTML, arguing that every era has a critical piece of infrastructure that appears at the right moment, allowing the entire industry to build on top of it. In the past, enterprise IT was humans using tools to process data and documents. In the future, it’s agentic AI executing workflows, and the core resource agents consume is tokens. When evaluating AI infrastructure, enterprises can’t just look at compute specs — they need to examine the entire chain: data layer processing efficiency at the bottom, token cost structure in the middle, service pricing model at the top. All three must align to sustain scale. (McKinsey and Accenture reports indicate 90% of large organizations plan to increase AI investment, but only about 20% have redesigned their workflows. Buying GPUs and burning tokens without alignment is like buying a gym membership you never use. But hey — budget approval comes first! If you haven’t even secured the budget and you’re already window-shopping, that’s just freeloading!)
So next time you see news about AI infrastructure, try asking yourself a few things. Is my data layer ready for high-frequency access by agents? Is my token cost structure aligned with my revenue model, or are they going their separate ways? If every SaaS company is becoming a GaaS company, where does my service stand on these three axes — structured data foundation, token economic efficiency, and cost alignment? Thinking through these questions is far more practical than chasing spec sheets. (Start with the basics — audit your data, audit your workflows. The real anxiety in the AI era shouldn’t be about which of the many tools to learn. It should be about whether you’re willing to build solid fundamentals rather than looking for shortcuts and swallowing everything whole. If you choke on it, well, maybe that’s on you.) (And there’s your reality check for the day.)
Contents
✳️ Knowledge Graph
Token Economy Tiered Model
Hardware Architecture Evolution Timeline
Agentic AI Software Stack
✳️ Further Reading
✳️ My Study Notes
Opening
Pre-Show Music
- This is how intelligence is made.
- A new kind of factory.
- Generator of tokens.
Opening Video: Tokens
Tokens: The Building Blocks of AI
- The building blocks of AI.
- Tokens have opened a new frontier.
- Turning data into knowledge.
- And drawing on all we have learned.
- Tokens are harnessing a new wave of clean energy.
- And unlocking the secrets of the stars.
- In virtual worlds, they help robots learn.
- And in the physical world, perfect.
- Forging new paths.
- And clearing the way for a bountiful harvest.
- In the moments that matter, tokens are already there.
Impact and Potential of Tokens
- And in the miles between, they never stop.
- They work where human hands cannot.
- So we may all breathe easier.
- And the smallest hearts beat stronger.
- Tokens are helping us break new ground.
- On a scale never attempted.
- To empower the world.
Welcome and Conference Overview
- So we can reach One separation confirmed
- Well beyond it We take the next great leap Into a bright new future Built for all mankind
- Is where it all begins Welcome to the stage NVIDIA founder and CEO
- Welcome to GTC I just want to remind you This is a tech conference
- All these people lining up So early in the morning All of you in here It’s great to see you
- GTC GTC We’re going to talk about technology We’re going to talk about platforms
- NVIDIA has three You think that we Mostly talk about one of them It’s related to CUDA X
- Our systems are another platform And now we have a new platform Factories
- We’re going to talk about all of them And most importantly we’re going to talk about ecosystems
- But before I start talking about ecosystems Let me thank our pre-game show hosts
- I thought they did a great job Capital NVIDIA's first venture capitalist Baker NVIDIA's first major institutional investor
- These three people are deep in technology Deep in what’s going on
- And of course They have just a really broad reach Of technology ecosystem
- And then of course All of the VIPs that I hand selected To join us today All-star team
- I want to thank all of you for that
- I also want to thank all the companies That are here
- NVIDIA as you know Is a platform company We have technology We have our platforms We have a rich ecosystem
- There are probably $100 trillion of industry here
AI Layer Cake and Applications
- 450 companies sponsored this event I want to thank you
- 1,000 Technical sessions 2,000 speakers
- This conference is going to cover Every single layer Of the five layer cake of artificial intelligence
- From land power and shell To infrastructure To the platforms
- And of course the most important And ultimately What’s going to get this industry taken off Is all of the applications
- Where it all began It all began here
1️⃣ CUDA Platform - 20th Anniversary
CUDA’s 20-Year Journey and Ecosystem
- This is the 20th anniversary of CUDA We've been working on CUDA for 20 years
- For 20 years we’ve been dedicated to this architecture
- This revolutionary invention SIMT Single Instruction Multiple Threads
- Writing scalar code Could spawn off into Multi-threaded application Much much easier to program
- We recently added tiles So that we could help people program Tensor cores
- And these structures of mathematics That are so foundational To artificial intelligence today
- Thousands of people have been working on this for decades
- Thousands of tools And compilers And frameworks And libraries
- In open source There’s a couple of hundred thousand Public projects
- CUDA literally is integrated Into every single ecosystem
- Basically describes A hundred percent of NVIDIA’s strategies
- You’ve been watching me talk about this slide From the very beginning
- And ultimately The single hardest thing to achieve Is the thing on the bottom Installed base
- It has taken us 20 years To now have built up Hundreds of millions of GPUs And computing systems around the world That run CUDA
- We are in every cloud We’re in every computer company We serve just about every single industry
- The installed base Is the reason why The flywheel is accelerating
- The installed base is what attracts Developers to the cloud Who then creates new algorithms That achieves a breakthrough
- For example Deep learning There are so many others
- Those breakthroughs Leads to entirely new markets Which builds new ecosystems around them
- With other companies that join Which creates a larger installed base
- This flywheel This flywheel Is now accelerating
- The number of downloads of NVIDIA libraries Is incredibly accelerating
- It’s at a very large scale And growing faster than ever
- This flywheel Is what makes This computing platform Able to sustain So many applications So many new breakthroughs
- But most importantly It also enables These infrastructures To have extraordinarily useful life
Flywheel Effect and Platform Strategy
- And the reason for that is very obvious There are so many applications That you can run on NVIDIA CUDA
- We address every single phase of the AI life cycle
- We address every single data processing platform
- We accelerate scientific principle solvers Of all different kinds
- And so the application reach Is so great That once you install NVIDIA GPUs The useful life of it Is incredibly high
- It is also one of the reasons why the cloud pricing of Ampere, which we shipped six years ago, is going up
- And so all of that is made possible Fundamentally because the Install base is high The flywheel is high The developer reach is great
- And when all of that happens And we continuously update our software
- The computing cost The combination of accelerated computing Speeding up applications tremendously
- As we continue to nurture And continue to update software Over its life
- Not only do you get the first-time pop, you also get the continuous cost reduction Of accelerated computing
- And we are willing to nurture Willing to support Every single one of these GPUs In the world
- Because they are all architecturally compatible
- We are willing to do so Because the install base is so large
- If we release a new optimization It benefits millions
- This applies to everybody in the world
- This combination of dynamics Is what makes the NVIDIA architecture Expand its reach Accelerating its growth
- At the same time down computing cost Which ultimately Encourages new growth
- So CUDA is at the center of it
- But our journey to CUDA Actually started 25 years ago GeForce
- I know how many of you grew up with GeForce
- GeForce is NVIDIA’s Greatest marketing campaign And it’s the biggest marketing campaign in the world.
The House of GeForce and CUDA’s Foundation
- relax This is the house that GeForce made. 25 years ago, we started our journey which led to CUDA. 25 years ago, we invented the programmable shader.
- A perfectly unobvious invention to make an accelerator programmable.
- The world’s first programmable accelerator, the pixel shader. That led us to explore further and further, and 5 years later, the invention of CUDA.
- One of the biggest investments that we made, and we couldn’t afford it at the time, and it consumed the vast majority of our company’s profits, was to take CUDA on the backs of GeForce to every single computer.
- We dedicated ourselves to create this platform because we felt so strongly about it.
- We felt so strongly about its potential.
- But ultimately, the company's dedication to it, despite the hardships in the beginning, believing in every single day, for 13 generations or 20 years, we now have CUDA installed everywhere.
Evolution of Graphics and AI’s Impact
- The pixel shader led to, of course, the revolution of GeForce.
- And then 10 years ago, we introduced, about 10 years ago, what is it, 8 years ago?
- We introduced RTX.
- A complete redesign of our architecture for the modern era of computer graphics.
- GeForce brought CUDA to the world.
- GeForce, therefore, enabled Alex Krizhevsky and Ilya Sutskever and Geoff Hinton, Andrew Ng, and so many others to discover that the GPU could be their friend in accelerating deep learning.
- It started the big bang of AI. 10 years ago, we decided that we would fuse 10 years ago, we decided that we would fuse programmable shading and introduce two new ideas.
- Ray tracing, hardware ray tracing, which is incredibly hard to do, and a new idea at the time.
- Imagine, about 10 years ago, we thought that AI would revolutionize computer graphics.
- Just as GeForce brought AI to the world, AI is now going to go back and revolutionize how computer graphics is done altogether.
Introduction to Neural Rendering
- Well, today, I’m going to show you something of the future.
- This is our next generation of graphics technology.
- We call it neural rendering.
Visualizing Neural Rendering
- The fusion, the fusion of 3D graphics and artificial intelligence.
- This is DLSS 5.
- Take a look at it.
- Enter whatever comes to your mind as shown to it.
- Hey look at that.
- We also have our expert VR.
- We’ll learn today how it all works.
- Go to our project, Is that incredible?
- Computer graphics comes to life.
The Fusion Concept and Trustworthy AI
- Now, what did we do?
- We fused controllable 3D graphics, the ground truth of virtual worlds, the structured data.
- Remember this word, the structured data of virtual worlds, of generated worlds.
- We combined 3D graphics, structured data, with generative AI, probabilistic computing.
- One of them is completely predictive.
- The other one, probabilistic, yet highly realistic.
- We combined these two ideas, combined these two ideas, controlled through structured data, controlled perfectly, and yet generating at the same time.
- And as a result, the content is beautiful, amazing, as well as controllable.
- This concept of fusing structured information and generative AI will repeat itself in the future.
- In one industry after another industry after another industry.
- Structured data is the foundation of trustworthy AI.
Structured and Unstructured Data Platforms
The ‘Best Slide’ and Structured Data Platforms
- Well, this is going to scare you a little bit.
- I'm going to flip the slide, and don't gasp.
- So we’re going to go through this schematic for the rest of the time.
- This is my best slide.
- Every time I ask the team, what’s my best slide?
- Repeatedly, this was it.
- They say, don’t do it, Jensen.
- Don’t do it.
- I said, no.
- These seats are free for some of you.
- So this is your price of admission.
- So this is structured data.
- You’ve heard of it.
- SQL, Spark, Pandas, Velox, some of these really, really important, very large platforms.
Data Frames and Enterprise Computing
- Snowflake.
- Databricks.
- Amazon EMR.
- Azure Fabric.
- Google Cloud BigQuery.
- All of these platforms are processing data frames.
- These data frames are giant spreadsheets, and they hold all of life's information.
- This is the structured data, the ground truth of business.
- This is the ground truth of enterprise computing.
- Well.
Future AI Agents and Database Use
- Now we’re going to have AI use structured data.
- And we better accelerate the living daylights out of it.
- It used to be OK.
- And we would, of course, we would accelerate structured data so that we could do more.
- We could do it more cheaply.
- We could do it more frequently per day and keep the company running at a much more synchronized way.
- However, in the future, what’s going to happen is these data structures are going to be used by AI.
- And AI is going to be much, much faster than us.
- Future agents are going to use structured databases as well.
Unstructured Data Challenges and AI Solutions
- And then, of course, the unstructured database, the generative database.
- This database represents the vast majority of the world.
- Vector databases, unstructured data, PDFs, videos, speeches, all of the world's information, about 90% of what's generated every single year is unstructured data.
- Until now, this data has been completely useless to the world.
- We read it.
- We put it into our file system.
- And that’s it.
- Unfortunately, we can't query it.
- We can’t search for it.
- It’s hard to do that.
- And the reason for that is because there's no easy indexing of unstructured data.
- You have to understand its meaning, its purpose.
- And so now we have AI do that.
- Just as AI was able to solve multimodality perception and understanding, you can use that same technology, multimodality perception and understanding, to go read a PDF.
- To understand its meaning.
- And from that meaning, embed it into a larger structure that we can search into, we can query into.
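The mechanism being described (embed meaning, then query the embedding space) is what a vector store does, and what the article above says cuVS accelerates. Here is a minimal pure-Python sketch with invented 3-dimensional "embeddings"; a real system would get these vectors from a multimodal model and search them with a GPU index:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Invented embeddings standing in for model output over unstructured files.
store = {
    "supply_report.pdf": [0.9, 0.1, 0.0],
    "earnings_call.mp3": [0.1, 0.9, 0.1],
    "factory_tour.mp4":  [0.0, 0.2, 0.9],
}

def query(vec: list[float], k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    ranked = sorted(store, key=lambda doc: cosine(store[doc], vec), reverse=True)
    return ranked[:k]

print(query([0.8, 0.2, 0.1]))  # ['supply_report.pdf']
```

The indexing problem the keynote names is exactly this: without the embedding step, there is no `vec` to search with, and the PDF stays opaque.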
- NVIDIA created two foundational libraries.
cuDF and cuVS Platforms
- Just like we created RTX for 3D graphics, we created cuDF for data frames, structured data.
- We created cuVS for vector stores, semantic data, unstructured data, AI data.
- These two platforms are going to be two of the most important platforms in the future.
- Super excited to see its adoption throughout the network, this complicated network of the world’s data processing systems.
- And the reason for that is because data processing has been around a long time.
- And therefore, so many different companies and platforms and services.
- It has taken us a long time to integrate deeply into this ecosystem.
- I’m super proud of the work that we’re doing here.
- And then today, we’re announcing several of them.
- IBM, the inventor of SQL, one of the most important domain-specific languages of all time, is accelerating Watson X.data with cuDF.
Accelerated Computing Partnerships
Historical Context and AI Era
- Let’s take a look at it.
- Sixty years ago, IBM introduced the System/360, the first modern platform for general-purpose computing, launching the computing era.
- Then SQL.
- SQL merged data into a system that uses a declarative language to query data without requiring the computer to be instructed step by step.
- And the data warehouse.
- Each, the foundations of modern enterprise computing.
- Today, IBM and NVIDIA are reinventing data processing for the era of AI by accelerating IBM Watson X.data SQL engines with NVIDIA GPU computing libraries.
Accelerated Computing for AI
- AI needs rapid access to massive datasets.
- Today's CPU data processing systems can't keep up.
- Nestle makes thousands of supply chain decisions every day.
- Their order-to-cash data mart aggregates every supply, order, and delivery event across global operations in 185 countries.
- On CPUs, Nestle refreshed the data mart a few times a day.
- With accelerated Watson X.data running on NVIDIA GPUs, Nestle can run the same workload five times faster at 83% lower cost.
- The next computing platform has arrived: accelerated computing for the era of AI.
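If the two Nestle figures are independent (5x the throughput, and 17% of the cost for the same workload), they compound into roughly 29x more work per dollar. Quick arithmetic, with that independence explicitly assumed since the keynote does not spell it out:

```python
speedup = 5.0
cost_ratio = 1.0 - 0.83  # "83% lower cost" means paying 17% of baseline

# Work per dollar relative to the CPU baseline, assuming the speed and
# cost gains are independent of each other.
perf_per_dollar = speedup / cost_ratio
print(round(perf_per_dollar, 1))  # 29.4
```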
Dell AI Data Platform
- NVIDIA accelerates data processing in the cloud.
- We also accelerate data processing on-prem.
- As you know, Dell is the world-leading computer systems maker, and they also are one of the world's leading storage providers.
- And they worked with us to create the Dell AI data platform that integrates cuDF and cuVS to create an accelerated data platform, well, for the era of AI.
- And this is an example of what they did with NTT data.
- Huge speed-up.
Google Cloud and Cost Benefits
- This is cloud, Google Cloud.
- As you know, we’ve been working with Google Cloud for a very long time.
- We accelerate Google’s Vertex AI.
- We now accelerate BigQuery, really important framework and really important platform.
- And this is an example of our work together with Snapchat, where we reduced their cost of computing by nearly 80%.
- When you accelerate data processing, when you accelerate computing, you get the benefit of speed, you get the benefit of scale.
- But most importantly, you also get the benefit of cost.
- And so all of those come together as one.
Beyond Moore’s Law: Accelerated Computing
- It was originally called Moore’s Law.
- Moore’s Law was about getting performance doubling every couple of years.
- It's another way of saying that, so long as the price remains about the same, you're getting twice the performance every couple of years.
- Or you're reducing the cost of computing every single year.
- Well, Moore’s Law has run out of steam.
- It’s not a new thing.
- And that’s why we need a new approach.
- Accelerated computing allows us to take these giant leaps forward.
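The claim is a compounding one: doubling every couple of years at flat price means the cost of a unit of compute halves on the same schedule, and a decade of that is about a 32x gain. A quick sketch of the arithmetic, illustrative only:

```python
def relative_compute_cost(years: float, doubling_period: float = 2.0) -> float:
    """Relative cost of a fixed unit of compute after `years`, if
    performance doubles every `doubling_period` years at constant price."""
    return 0.5 ** (years / doubling_period)

# A decade of healthy Moore's Law: ~32x more compute per dollar.
print(round(1 / relative_compute_cost(10), 1))  # 32.0
```

When that curve flattens, the only remaining lever is the domain-specific acceleration the rest of this section argues for.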
Algorithm Optimization and Cost Reduction
- And as you will see later, because we continue to optimize the algorithms (NVIDIA is an algorithm company), and because our reach is so large and our install base is so large, we can reduce the computing cost while increasing the performance of our software.
- You can see this pattern I just mentioned.
- I just wanted to show you three versions of it.
- NVIDIA built the accelerated computing platform.
- It has a bunch of libraries on top.
- I gave you three examples.
- RTX is one of them.
- cuDF is another.
- And cuVS.
- And we’ll show you a few more.
- These libraries sit on top of our platform.
- But ultimately, we integrate into the world’s cloud services, into the world’s OEMs.
- Together, and other platforms that I’ll show you, together we’re able to reach the world.
- This pattern, NVIDIA, Google Cloud, Snapchat, will repeat over and over again and kind of looks like this.
- And so this is one example, NVIDIA with Google Cloud.
- We accelerated Vertex AI.
- We accelerated BigQuery.
- I’m super proud of the work that we’ve done with JAX and XLA.
- We're the only accelerator in the world that's incredible on PyTorch and incredible on JAX and XLA.
- And the customers that we support, the Basetens, the CrowdStrikes, Puma, Salesforce: they're not just our customers, they're developers of ours.
- We’ve integrated the NVIDIA technologies in so that we can then land on the clouds.
- Our relationship with cloud service providers are essentially us bringing customers to them.
- We integrate our libraries.
- We accelerate workloads.
- And we land those customers in the clouds.
- And so, as you can see, most of our cloud service providers love working with us.
- And they’re always asking us to land the next customer on their cloud.
Customer Landing and Cloud Growth
- And I just want to let you know, there are a lot of customers.
- We’re going to accelerate everybody.
- And so there will be lots and lots of customers who will be able to land in your cloud.
- Just be patient with us.
Google Cloud and AWS
- And so this is Google Cloud.
- This is AWS.
- We've been working with AWS a long time.
- And one of the areas, one of the things I'm super excited about this year is we're going to bring OpenAI to AWS.
- And so it’s going to drive enormous consumption of cloud computing at AWS.
- It’s going to expand the reach and expand the compute of OpenAI.
- And as you know, they are completely compute constrained.
- And so AWS, we accelerate EMR.
- We accelerate SageMaker.
- We accelerate Bedrock.
- NVIDIA’s integrated it really deeply into AWS.
Microsoft Azure and Confidential Computing
- They were our first cloud partner.
- Microsoft Azure.
- NVIDIA’s A100 supercomputer: the first one we built and installed was at Azure.
- And that led to the big successful partnership with OpenAI.
- But we’ve been working with Azure for quite a long time.
- We accelerate Azure cloud now, and we partner deeply with their AI Foundry.
- We accelerate Bing search.
- We work with them on Azure regions.
- This is one of the areas that is incredibly important as we continue to expand AI throughout the world.
- One of the capabilities that we offer is confidential computing.
- In confidential computing, you want to make sure that even the operator cannot see your data, that even the operator cannot touch or see your models.
- NVIDIA's GPUs were the first ones in the world to do that.
- They're now able to support confidential computing and protected deployment of these very valuable OpenAI and Anthropic models throughout clouds.
- Across different types of AI, different regions.
- All because of confidential computing. Confidential computing is super important.
Synopsys, Oracle, and AI Clouds
- And here’s an example where we have different customers that we work with.
- Synopsys, a great partner of ours: we're accelerating all of their EDA (Electronic Design Automation) workflows, and then we landed at Microsoft Azure.
- We were Oracle’s first AI customer.
- Most people would have thought we were their first supplier, we’re their first supplier also, but we were their first AI customer.
- I’m quite proud of the fact that I explained AI clouds to Oracle for the first time, and we were their first customer.
- Since then, they’ve really taken off.
- We’ve landed a whole bunch of our partners, their Cohere and Fireworks and of course, very famously, OpenAI.
- A great partnership with CoreWeave.
- CoreWeave, they're the world's first AI-native cloud.
- A company that was built with only one singular purpose: to provision and host GPUs as the era of accelerated computing showed up, and to host AI clouds.
- They’ve got some fantastic customers and they’re growing incredibly.
- One of the platforms that I’m quite excited about is Palantir and Dell.
- The three of our companies have made it possible to stand up a brand new type of AI platform, the Palantir Ontology platform and AI platform.
- And we could stand up these platforms in any country, in any air-gapped region, completely on-prem, completely on-site, completely in the field.
Palantir, Dell, and AI Platforms
- AI can be deployed literally everywhere. Without our confidential computing capability, without our ability to build the end-to-end system, as well as offer the entire accelerated computing and AI stack, from data processing (whether it's vectors or structures) all the way to AI, it wouldn't have been possible.
- I wanted to show you these examples.
- This is.
NVIDIA’s Vertical Integration Strategy
- Our special working relationship with the world's cloud service providers, and many, well, all of them are here, and I get the benefit of seeing them during the booth tour, and it's just so incredibly exciting.
- I just want to thank all of you for the hard work. What NVIDIA has done is this:
- You're going to see this theme over and over again.
- NVIDIA is vertically integrated, the world's first vertically integrated,
- But horizontally open.
- And the reason that’s necessary is very simple.
- Accelerated computing is not a chip problem.
- Accelerated computing is not a systems problem. Accelerated computing has a missing word, we just never say it anymore: application acceleration.
- If I could make a computer run everything faster, that's called a CPU.
- But that's run out of steam.
- The only way for us to accelerate applications going forward, and continue to bring tremendous speed-up and tremendous cost reduction, is through application- or domain-specific acceleration.
- We dropped that phrase from the front, and it just became "accelerated computing."
- And that is the reason why NVIDIA has to go library after library, domain after domain, vertical after vertical.
- We are a vertically integrated computing company.
- There is.
- No other way.
- We have to understand the applications, we have to understand the domain, we have to understand fundamentally the algorithms, and we have to figure out how to deploy the algorithms.
- In whatever scenario it wants to be deployed whether it’s a data center cloud on Prem at the edge or in a robotic system.
- All of those computing systems are different and finally the systems and chips.
- We are vertically integrated.
- What makes it incredibly powerful.
- And the reason why you saw all those slides is that NVIDIA is horizontally open.
- We'll work and integrate NVIDIA's technology into whatever platform you would like us to integrate into. We offer you the software, we offer you libraries.
- We integrate with your technology so that we can bring accelerated computing to everybody in the world.
GTC Demonstrations and Industries
- Well.
- This GTC is really a great demonstration of that.
- Most of the time you'll see me talk about these verticals and I'll use some examples, but in every single case,
- Whether it's automotive or, by the way, financial services: the largest percentage of attendees at this GTC is from the financial services industry.
- I know. I'm hoping it's developers, not traders.
Ecosystem and Supply Chain
- Here's one thing I wanted to say.
- The audience represents NVIDIA's ecosystem, upstream of our supply chain and downstream of our supply chain, and we think of our supply chain both upstream and downstream.
- And it's just so exciting.
- Our entire upstream supply chain this last year, irrespective of whether you're a 50-year-old company (we have 70-year-old companies, we have a 150-year-old company), is now part of the NVIDIA supply chain and partners with us either upstream or downstream.
- And last year, you had your record year. Did you not? Congratulations.
- We're on to something here.
- This is the beginning of something very, very big.
Domain-Specific Libraries and Verticals
- If you look at accelerated computing we’ve now set the computing platform.
- But in order for us.
- To activate those computing platforms we need to have domain specific libraries.
- That solve very important problems in each one of the verticals that we address you see us addressing every single one of this.
- Autonomous vehicles.
- Our reach, our breadth, our impact: incredible. We have a track on that.
- Financial services, I just mentioned: algorithmic trading is going from classical
- Machine learning
- With human feature engineering (the quants did that) to now
- Supercomputers studying massive amounts of data, discovering insight and discovering patterns by themselves. And so this is going through its deep learning and its transformer moment.
- Healthcare is going through its ChatGPT moment; some really exciting work there.
- We have a great keynote track here; Kimberly Powell is in the keynote track
- For healthcare.
- We’re talking about AI physics, AI biology
- For drug discovery; AI agents for customer service
- And support,
- And for diagnosis. And of course physical AI, robotic systems.
- All these different vectors of AI have different platforms that NVIDIA provides. Industrial: we are completely resetting and starting the largest build-out in human history. And
- Most of the world’s industries,
Industry-Specific AI Deployments
- Building AI factories, building chip plants, building computer plants, are represented here today.
- Media and entertainment, gaming: of course, a real-time AI platform,
- So that we could do
- Translation and broadcast support, live games and live video.
- An enormous amount of it will be augmented with AI. We have a platform called Holoscan. Quantum: 35 different companies are
- Here building with us the next generation of quantum-GPU hybrid systems.
- Retail and CPG: using NVIDIA for supply chain, creating agentic shopping systems,
- AI agents for customer support; a lot of work being done here.
- A $35 trillion industry. Robotics and manufacturing: a $50 trillion industry. NVIDIA has been working in this area for a decade now, building the three fundamental
- Computers necessary to
- Build robotic systems. We are integrated with, working with, literally every single company that we know of building robots. We have 110 robots here at the show. And then telecommunications:
- About as large as the world’s IT industry, about $2 trillion.
- We see base stations everywhere, of course. It’s one of the world’s infrastructures; it was the infrastructure of the last generation of computing.
- That infrastructure.
- Is going to get completely reinvented, and the reason for that is very simple: that base station,
- Which does one thing, which is be a base station,
Telecommunications Infrastructure
- Is going to be an AI infrastructure platform in the future. AI will run at the edge.
- And so lots and lots of great, great discussion there. Our platform there is called Aerial, or AI-RAN. Big partnership with Nokia, big partnership with T-Mobile, and many others.
- At the core of our business.
- Everything that I just mentioned.
CUDA X Libraries and Algorithms
- Computing platforms, but very importantly, our CUDA-X libraries. The CUDA-X libraries are the algorithms, the algorithms NVIDIA invents. We are an algorithm company.
- That’s what makes us special. That’s what makes it possible for me to be able to go into every single one of these industries,
- Imagine the future and have the world’s best computer scientists.
- Describe and solve problems.
- Refactor and re express it.
- And turn it into a library. We have so many. I think we have,
- At this show, we’re announcing a hundred.
- A hundred:
- 70 libraries, maybe 40 models.
- And that’s just at the show. We’re updating these all the time; we’re updating them all the time.
- The libraries are the crown jewels of our company. They are what make it possible for the computing platform to be
- Activated in service of solving a problem, making impact. One of the biggest, one of the most important libraries that we ever created:
- cuDNN.
- CUDA Deep Neural Networks. It completely revolutionized artificial intelligence, caused the big bang of modern AI. Let me show you a short video about CUDA-X. Twenty years ago, we built CUDA.
CUDA Evolution and Library Impact
- A single architecture for accelerated computing.
- Today we’ve reinvented computing a thousand CUDA X libraries help developers make breakthroughs in every field of science and engineering.
- cuOpt for decision optimization.
- cuLitho for computational lithography.
- cuDSS for direct sparse solvers.
- cuEquivariance for geometry aware neural networks.
- Look around: every place you see was completely simulated.
Simulation and Algorithmic Understanding
- I’m sorry.
- Everything you saw was a simulation.
- Some of it was principle solvers.
- Fundamental physics solvers.
- Some of it was AI surrogate.
- AI physical models, and some of it was physical AI robotics models.
- Everything was simulated.
- Nothing was animated.
- Nothing was articulated.
- Everything was completely simulated.
- That is what fundamentally NVIDIA does.
Vertical Integration and Openness
- It is through the connection of understanding of the algorithms with our computing platforms that we’re able to open up to unlock these opportunities.
- NVIDIA is a vertically integrated computing company with open horizontal integration with the world.
2️⃣ Inference Inflection
AI Ecosystem: Established and Native Companies
- So that’s CUDA X.
- Well, just now you saw a whole bunch of companies.
- You saw Walmart and, you know, there’s L'Oreal and incredible companies, established companies, JP Morgan and Roche.
- These are companies that define society today.
- Toyota.
- These are some of the largest companies in the world.
- It is also true that there’s a whole bunch of companies you’ve never heard of.
- These are companies, we call them AI natives, a whole bunch of small companies.
- The list is gigantic.
- This is just a little tiny bit of it, and I couldn’t decide whether to show you more or show you less, and so I made it so that you couldn’t see any.
- And nobody’s feelings are hurt.
- However, inside this list are a bunch of brand new companies.
- They’re companies like, for example, you might have heard a couple of them, OpenAI, Anthropic, but there’s a whole bunch of others.
- There’s a whole bunch of others, and they serve different verticals.
Investment Boom and AI Natives
- Something happened in the last two years, particularly this last year.
- We’ve been working with the AI natives for a long time, and this last year it just skyrocketed.
- I’ll explain to you why it happened.
- This is a big deal.
- The industry has skyrocketed: $150 billion of venture investment into startups, the largest in human history.
- This is also the first time that the scale of the investments went from millions of dollars, tens of millions of dollars, to hundreds of millions of dollars and billions of dollars.
- And the reason for that is this is the first time in history that every single one of these companies needs compute.
- They need compute and lots and lots of it.
- They need tokens, lots and lots of it.
- They’re either going to create and build and create tokens and generate tokens, or they’re going to integrate, add value to tokens that are available, created by Anthropic and OpenAI and others.
- And so this industry is different in so many different ways, but the one thing that is very clear.
- The impact that they’re making, the incredible value that they’re delivering already is quite tangible.
- AI natives.
Reinventing Computing and Generative AI
- All because we reinvented computing.
- Just like during the PC revolution, a whole bunch of new companies were created.
- Just as during the Internet revolution, a whole bunch of companies were created, and in mobile cloud, a whole bunch of companies were created.
- Each one of them had their own standards, and we’re talking about one of the major standards that just happened, incredibly important.
- And this generation, we also have our own large number of very, very special companies.
- We reinvented computing.
- It stands to reason there’s going to be a whole new crop of really important companies, consequential companies for the future of the world.
- The Googles, the Amazons, the Metas, consequential companies that have come as a result of the last computing platform shift.
- We are now at the beginning of a new platform shift.
- But what happened in the last couple of years?
- Well, we’ve been watching, as you know, we’ve been working on deep learning and working on AI.
- The big bang of modern AI, we were right there at the spot, and we’ve been advancing this field for quite some time.
- But why the last two years?
- What happened in the last two years?
- Well, three things.
- ChatGPT, of course, started the generative AI era.
- It’s able to not just understand, perceive and understand.
- It’s able to also translate and generate: generation of unique content. I showed you the fusion of generative AI with computer graphics, and it brought computer graphics to life.
- You guys, everybody in the world should be using ChatGPT.
- I know I use it every single morning.
- I used it plenty this morning.
- And so ChatGPT was the generative AI era.
- The second, by the way: generative computing, versus the way we used to do computing. Generative AI is a capability of software,
- But it has profoundly changed, it has completely changed, how computing is done.
- Computing used to be retrieval-based; now it’s generative.
- Keep that thought in mind when I talk about certain things.
- And you’ll realize why it is that everything that we do is going to change how computers are architected, how computers are provided.
- How computers are going to be built out, and what is the meaning of computing altogether.
Generative, Reasoning, and Agentic AI
- Generative AI: end of 2022, into 2023.
- Reasoning AI: o1, which then took off with o3.
- Reasoning allowed it to reflect, to think to itself, to plan, to break down problems: to decompose a problem it couldn’t understand into steps or parts that it could understand.
- It could ground itself on research. o1 made generative AI trustworthy and grounded on truth.
- That caused ChatGPT to simply take off, and that was a very, very big moment.
- The amount of input tokens that was necessary, and the amount of output tokens it generated in order to reason, grew.
- Of course, you could have much larger models.
- o1 was a little bit larger, not much larger, but its input token usage for context and its output tokens for context and for thinking increased the amount of computation tremendously.
- Then came Claude Code, the first agentic model.
- It was able to read files, code, compile it, test it, evaluate it, go back and iterate on it.
- Claude Code has revolutionized software engineering, as all of you know. 100% of NVIDIA is using a combination of, oftentimes all three of, Claude Code, Codex, and Cursor, all over NVIDIA.
- There’s not one software engineer today who is not assisted by one or many AI agents helping them code.
- Claude Code marks the new inflection.
- For the first time, you don’t ask an AI what, where, when, how.
- You ask it to create, do, build; you ask it to use tools, take your context, read files.
- It’s able to agentically break down a problem, reason about it, reflect on it.
- It’s able to solve problems and actually perform tasks.
- An AI that was able to perceive became an AI that could generate.
- An AI that could generate became an AI that could reason.
- An AI that could reason now became an AI that can actually do work.
- Very productive work.
Inference Inflection Point
- The amount of computation in the last two years: everybody in this room knows the computing demand for NVIDIA GPUs is off the charts.
- Spot pricing is skyrocketing.
- You couldn’t find a GPU if you tried, and yet in the meantime, we’re shipping GPUs out.
- Incredible amounts of it.
- And demand just keeps on going up.
- There’s a reason for that.
- This fundamental inflection.
- Finally, AI is able to do productive work, and therefore, the inflection point of inference has arrived.
- AI now has to think.
- In order to think, it has to inference.
- AI now has to do.
- In order to do, it has to inference.
- AI has to read.
- In order to do so, it has to inference.
- It has to reason.
- It has to inference.
- Every part of AI, every time it has to think, it has to reason, it has to do, it has to generate tokens, it has to inference.
- It's way past training now.
- It’s in the field of inference.
- So the inference inflection has arrived.
Massive Compute Demand and Revenue Projections
- At the time when the amount of tokens, the amount of compute necessary, increased by roughly 10,000 times.
- Now, when I combine these two, the fact that since the last two years, the computing demand of the work has gone up by 10,000 times.
- And the amount of usage, the amount of usage has probably gone up by 100 times.
- People have heard me say, I believe that computing demand has increased by 1 million times in the last two years.
- It is the feeling that we all have.
- It is the feeling every startup has.
- It’s the feeling that OpenAI has.
- It’s the feeling that Anthropic has.
- If they could just get more capacity, they could generate more tokens.
- Their revenues would go up.
- More people could use it.
- The more advanced, the smarter the AI could become.
- We are now at that positive flywheel system.
- We have reached that moment.
- The inflection, the inference inflection has arrived.
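The "million times" figure is just the product of the two growth factors Jensen names. A minimal sketch using the round numbers from the talk (both factors are the keynote's estimates, not measured data):

```python
# Round numbers from the keynote, not measured data:
# reasoning/agentic workloads need ~10,000x the compute per task,
# and usage has grown ~100x. The two multiply.
compute_per_task_growth = 10_000
usage_growth = 100

total_demand_growth = compute_per_task_growth * usage_growth
print(f"{total_demand_growth:,}x")  # prints 1,000,000x
```

The two factors compound rather than add, which is why demand outruns even aggressive capacity planning.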
- Last year at this time, I said that, from where I stood at that moment in time, we saw about $500 billion.
- We saw $500 billion of very high confidence demand and purchase orders for Blackwell and Rubin through 2026.
- I said that last year.
- Now, I don’t know if you guys feel the same way, but $500 billion is an enormous amount of revenue.
- No one’s impressed.
- I know why you’re not impressed, because all of you had record years.
- Well, I’m here to tell you that right now, where I stand, a few short months after GTC DC, one year after last GTC, I see, through 2027, at least $1 trillion.
Inference Focus and Universal AI Platform
- Now, does it make any sense?
- And that’s what I’m going to spend the rest of the time talking about.
- In fact, we are going to be short.
- I am certain computing demand will be much higher than that.
- And there’s a reason for that.
- So the first thing is, we did a lot of work in the last year.
- Of course, as you know, 2025 was NVIDIA's year of inference.
- We wanted to make sure that not only were we good at training and post-training, that we were incredibly good at every single phase of AI.
- So that the investments that were made, investments made in our infrastructure, could scale out for as long as they would like to use it.
- And the useful life of NVIDIA’s infrastructure would be long, and therefore the cost would be incredibly low.
- The longer you could use it, the lower the cost.
- There’s no question in my mind, NVIDIA systems are the lowest cost infrastructure you could get for AI infrastructure in the world.
- And so the first part was, last year was all about AI for inference.
- And it drove this inflection point.
- Simultaneously, we were very pleased last year that Anthropic has come to NVIDIA.
- That MSL, Meta Superintelligence Labs, has chosen NVIDIA.
- Meanwhile, as a collection, as a group, this represents one third of the world’s AI compute.
- Open source models.
- Open source models have reached near the frontier, and they are literally everywhere.
- And NVIDIA, as you know, today, we’re the only platform in the world today that runs every single domain of AI across every single one of these AI models.
- In language, in biology, in computer graphics, computer vision, in speech, proteins and chemicals, robotics and otherwise, edge or cloud, any language.
- NVIDIA’s architecture is fungible for all of that, and we’re incredible for all of that.
- That allows us to be the lowest cost, the highest confidence platform.
- Because when you’re building these systems, as I mentioned, a trillion dollars is an enormous amount of infrastructure.
- You have to have complete confidence, that the trillion dollars you're putting down will be utilized, would be performant, would be incredibly cost effective.
- And have useful life for as long as you could see.
- That infrastructure investment you could make on NVIDIA, you could make with complete confidence.
- We have now proven that.
- It is the only infrastructure in the world that you could go anywhere in the world and build with complete confidence.
- You want to put it in any of the companies, you want to put it in any of the clouds, we’re delighted by that.
- You want to put it on-prem, we’re happy about that.
- You want to put it in any country anywhere, we’re delighted to support you.
- We are now a computing platform that runs all of AI.
Business Segments and AI Resilience
- Now, our business is already starting to show that. 60% of our business is hyperscalers, the top five hyperscalers.
- However, even within that top five, some of it is internal AI consumption.
- The internal AI consumption is really important work, like recsys, which is moving from recommender systems of tables and collaborative filtering and content filtering
- Towards deep learning and large language models.
- Search, moving to deep learning, large language models.
- Almost all of these different hyperscale workloads are now moving, shifting towards a workload that NVIDIA GPUs are incredibly good at.
- But on top of that, because we work with every AI lab, because we accelerate every AI model, and because we have a large ecosystem of AI natives that we work with, that we can bring to the clouds.
- That investment, no matter how large, no matter how quick, that compute will be consumed.
- And that represents 60% of our business.
- The other 40% is just everywhere.
- Regional clouds, sovereign clouds, enterprise, industrial, robotics, edge, big systems, super computing systems, small servers, enterprise servers, the number of systems, incredible.
- The diversity of AI is also its resilience.
- The span of reach of AI is its resilience.
- There is no question this is not a one app technology.
- This is now fundamental.
- This is absolutely a new computing platform shift.
Year of Inference: Hardware Innovations
- Well, our job is to continue to advance the technology.
- And one of the most important things that I mentioned last year was last year was our year of inference.
- We dedicated everything.
- We took a giant chance and reinvented while Hopper was at its prime, and it was just cooking.
- We decided that the Hopper architecture, the NVLink by 8, had to be taken to the next level.
- We completely re-architected the system, disaggregated the computing system altogether, and created NVLink 72.
- The way that it’s built, the way it’s manufactured, the way it’s programmed, completely changed.
- Grace Blackwell NVLink 72 was a giant bet.
- And it wasn’t easy for anybody.
- And many of my partners here in the room, I want to thank all of you for the hard work that you guys did.
- Thank you.
- Thank you.
- NVLink 72.
- NVFP4.
- Not just FP4 precision:
- NVFP4 is a whole different type of tensor core and computational unit.
- We’ve demonstrated now that we can inference NVFP4 without loss of precision, but gigantic boost in performance and energy efficiency.
- We’ve also been able to use NVFP4 for training.
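For intuition about why 4-bit inference can avoid accuracy loss, here is a toy sketch of block-scaled 4-bit quantization in plain Python. The E2M1 magnitude set and the per-block scale are in the spirit of formats like NVFP4, but this is an illustration only; the actual format and its tensor-core implementation are NVIDIA's and are not shown here.

```python
# Toy block-scaled 4-bit quantization (illustration only, not NVFP4 itself).
# E2M1 gives 8 representable magnitudes; a shared per-block scale
# stretches them over the block's own dynamic range.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(xs):
    # Map the block's largest magnitude to 6.0, then snap each value
    # to the nearest representable magnitude, keeping the sign.
    scale = max(abs(x) for x in xs) / 6.0 or 1.0
    q = [min(E2M1, key=lambda m: abs(abs(x) / scale - m)) * (1 if x >= 0 else -1)
         for x in xs]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

block = [0.11, -0.62, 0.40, 1.20]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)  # close to the original block
```

The per-block scale is what keeps such a tiny value set usable: the quantization error is bounded relative to each block's own range rather than a single global one.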
- So, NVLink 72.
- NVFP4.
- The invention of Dynamo, TensorRT-LLM, a whole bunch of new algorithms.
- We even built a supercomputer to help us optimize kernels and help us optimize our complete stack.
- We call it DGX Cloud.
- We invested billions of dollars of supercomputing capability to help us create the kernels, the software that made inference possible.
- Well, the results all came together, and people used to tell me, but Jensen, inference is so easy.
- Inference is the ultimate hard problem.
- Inference is the ultimate hard problem.
- It is also ultimately important, because it drives your revenues.
- And so this is the outcome.
- This is from SemiAnalysis.
- This is the largest, most comprehensive sweep of AI inference that has ever been done.
- And what you see here on the left, on this side, on this side, is tokens per watt.
- Tokens per watt is important because every data center, every single factory, by definition, is power constrained.
- A one gigawatt factory will never become two.
- It's physically constrained.
- The laws of atoms, the laws of physicality.
- And so that one gigawatt of data center, you want to drive the maximum number of tokens, which is the production, the product of that factory.
- So you want to be on top of that curve, as high as you can get.
Performance, Cost, and Token Factory
- This, the x-axis, is the interactivity, the speed of inference, the speed of each inference.
- The faster you can inference, the faster you could, of course, respond.
- But very importantly, the faster you can inference, the larger the models, the more context you could process, the more tokens you can think through.
- This axis is the same as smartness of the AI.
- And so, this is the throughput of the AI.
- This is the smartness of the AI.
- Notice, the smarter the AI, the lower your throughput.
- Makes sense.
- You’re thinking longer.
- Okay?
- And so, this axis is the speed, and I’m going to come back to this.
- This is important.
- This is where I torture all of you.
- But it’s too important.
- Every CEO in the world, you watch, every CEO in the world will study their business from now on in the way I'm about to describe.
- Because this is your token factory.
- This is your AI factory.
- This is your revenues.
- There’s no question about that going forward.
- And so, this is the throughput.
- This is the intelligence.
- The better your tokens per watt, for a given data center power, the more throughput, the more tokens you can produce.
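The power-constraint argument can be written down directly: at fixed power, yearly token output is entirely determined by efficiency. A back-of-envelope sketch, where the 1 GW figure is from the talk and the tokens-per-joule value is an arbitrary placeholder:

```python
# A factory is power-limited, so throughput = power x efficiency.
# 1 GW is the talk's example; the tokens/joule figure is a made-up placeholder.
power_watts = 1e9                 # 1 gigawatt factory
tokens_per_joule = 10.0           # i.e. tokens/sec per watt (assumed)

tokens_per_second = power_watts * tokens_per_joule
tokens_per_year = tokens_per_second * 365 * 24 * 3600

# Double the efficiency and yearly output doubles; the power budget doesn't.
print(f"{tokens_per_year:.3e} tokens/year")  # prints 3.154e+17 tokens/year
```

Note that "tokens per watt" on the keynote's y-axis is really tokens per second per watt, which is the same thing as tokens per joule; that is the only lever on yearly output once the shell is built.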
- On this side is cost.
- Notice, NVIDIA is the highest performance in the world.
- Nobody would be surprised by that.
- They would be surprised by the fact that, in one generation, whereas Moore’s Law, through transistors, would have given us 50%, maybe two times;
- Moore’s Law would probably give us one and a half times more performance;
- You would have expected, from Hopper H200, one and a half times higher.
- Nobody would have expected 35 times higher.
- I said last year, at this time, that NVIDIA's Grace Blackwell, NVLink 72, was 35 times perf per watt.
- Nobody believed me.
- And then, SemiAnalysis came out, and Dylan Patel had a quote.
- He accused me of sandbagging.
- He says, Jensen sandbagged.
- It's actually 50 times.
- And he’s not wrong.
- He’s not wrong.
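To put 35x (or SemiAnalysis's 50x) in perspective against a roughly 1.5x-per-generation cadence, one can ask how many conventional generations a single jump is worth. A quick sketch; the 1.5x figure is the talk's, the rest is simple algebra:

```python
import math

# One Moore's-Law-style generation buys ~1.5x (figure from the talk).
PER_GENERATION = 1.5

def equivalent_generations(speedup: float) -> float:
    """How many 1.5x generations compound to the given speedup."""
    return math.log(speedup) / math.log(PER_GENERATION)

print(round(equivalent_generations(35), 1))  # prints 8.8
print(round(equivalent_generations(50), 1))  # prints 9.6
```

In other words, the claimed one-generation jump is worth roughly nine conventional generations of transistor scaling.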
- And so, our cost per token is the lowest in the world.
- You can’t beat it.
- I've said before, if you have the wrong architecture, even if it's free, it's not cheap enough.
- And the reason for that is because no matter what happens, you still have to build a gigawatt data center.
- You still have to build a gigawatt factory.
- And that gigawatt factory, it’s not cheap.
- It’s not cheap.
- That gigawatt factory, amortized across 15 years, is about $40 billion.
- Even when you put nothing in it, it’s $40 billion in.
- You better make for darn sure you put the best computer system on that thing so that you could have the best token cost.
- NVIDIA’s token cost is world-class.
- Basically, untouchable at the moment.
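The "even free isn't cheap enough" logic follows from the fixed cost: the roughly $40 billion shell amortizes over the factory's life regardless of what silicon sits inside, so cost per token is dominated by how many tokens the installed system can produce. A sketch using the talk's figures plus one assumed baseline output:

```python
# $40B over 15 years is from the talk; the baseline token output is assumed.
factory_cost_usd = 40e9
lifetime_years = 15
tokens_per_year = 1e17            # assumed baseline output

fixed_cost_per_year = factory_cost_usd / lifetime_years
usd_per_million_tokens = fixed_cost_per_year / tokens_per_year * 1e6

# A 7x-faster system in the same shell cuts cost per token ~7x,
# because the $40B denominator doesn't move.
usd_per_million_tokens_7x = fixed_cost_per_year / (7 * tokens_per_year) * 1e6
```

The design choice follows: with the shell cost fixed, every dollar spent on a faster system divides the amortized cost across more tokens, which is why "free but slow" loses.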
NVIDIA’s Extreme Co-Design Strategy
- And the reason that’s true is because of extreme co-design.
- NVIDIA’s extreme co-design starts from an observation: for our customers, the effectiveness, the performance, and the token production cost of their factories is everything.
- And this is what happened.
- This is: we updated their software, same system, and notice their token speeds.
- Incredible.
- The difference before and after NVIDIA updated everything, all of our algorithms and software and all the technology that we bring to bear:
- About 700 tokens per second average went to nearly 5,000, seven times higher.
- And so this is the incredible power of extreme co-design.
- I mentioned earlier the importance of factories.
- This is the importance of the factory.
- Your data center, it used to be a data center for files.
The Data Center as a Token Factory
- It’s now a factory to generate tokens.
- Your factory is limited no matter what.
- Everybody is looking for land, power, and shell.
- Once you build it, you are power limited.
- Within that power-limited infrastructure, you better make for darn sure, because you know inference is your workload and tokens are your new commodity,
- And that compute is your revenue,
- That the architecture is as optimized as it can be.
- In the future.
- Every single CSP,
- Every single computer company, every single cloud company, every single AI company, every single
- Company, period,
- Is going to be thinking about their token factory effectiveness.
- This is your factory in the future, and the reason why I know that is because everybody in this room is powered by intelligence.
- And in the future that intelligence will be augmented by tokens.
- So let me show you how we got here.
- On April 6th, 2016, a decade ago, we introduced DGX-1, the world’s first computer designed for deep learning.
- Eight Pascal GPUs connected with first-generation NVLink, 170 teraflops in one computer: the world’s first computer designed for AI researchers.
- With Volta, we introduced the NVLink switch: 16 GPUs connected with full all-to-all bandwidth, operating as one giant GPU.
- A giant step forward, but model sizes continued to grow.
- The data center needed to become a single unit of computing.
- So Mellanox joined Nvidia.
- In 2020, DGX A100 SuperPod became the first GPU supercomputer combining scale up and scale out architecture.
- NVLink 3 for scale-up; ConnectX-6 and Quantum InfiniBand for scale-out.
- Then Hopper, the first GPU with the FP8 Transformer Engine, launched the generative AI era.
- NVLink 4, ConnectX-7, BlueField-3 DPU, second-generation Quantum InfiniBand: it revolutionized computing.
- Blackwell redefined AI supercomputing system architecture with NVLink 72: 72 GPUs connected by an NVLink spine, 130 terabytes per second of all-to-all bandwidth.
- Compute trays integrate Blackwell GPUs, Grace CPUs, ConnectX-8, and BlueField-3; scale-out runs over Spectrum-4 Ethernet.
- With three scaling laws in full force,
- Pre-training, post-training, and inference, and now agentic systems, compute demand continues to grow exponentially.
- And now, Vera Rubin: architected for every phase of agentic AI, advancing every pillar of computing, including CPU, storage, networking, and security.
- Vera Rubin NVLink 72: 3.6 exaflops of compute, 260 terabytes per second of all-to-all NVLink bandwidth.
- The engine supercharging the era of agentic AI. The Vera CPU rack:
- Designed for orchestrating an agentic workforce. The STX rack: AI-native storage built with BlueField-4. Scale-out with Spectrum-X co-packaged optics, increasing energy efficiency and resilience.
- And an incredible new addition:
- The Groq 3 LPX rack, tightly connected to Vera Rubin. Groq’s LPUs, with massive on-chip SRAM, add a token accelerator to the already incredibly fast system.
- Together: 35 times more throughput per megawatt.
The Vera Rubin Platform
- The new Vera Rubin platform: seven chips, five rack-scale computers, one revolutionary AI supercomputer for agentic AI. 40 million times more compute
- In just 10 years.
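"40 million times in 10 years" is easier to feel as an annual rate. A quick conversion; the 40-million figure is the keynote's, and the math is just a tenth root:

```python
# Convert a 10-year growth factor into an equivalent annual growth factor.
growth_over_decade = 40e6                  # keynote figure: 40,000,000x
annual_factor = growth_over_decade ** (1 / 10)

print(round(annual_factor, 1))  # prints 5.8 (~5.8x per year)
```

Sustaining that means nearly sextupling delivered compute every single year, far beyond transistor scaling alone.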
- You know, in the good old days, when I would say Hopper, I would hold up a chip.
- That’s just adorable.
- This is Vera Rubin.
- When we think Vera Rubin, we think the entire system, vertically integrated, completely with software, extended end-to-end, optimized as one giant system.
- The reason why it’s designed for agentic systems is very clear: for agents, of course, the most important workload is the thinking, the large language model.
- The large language models are going to get larger and larger and larger.
- It’s going to generate more and more tokens more quickly so it can think more quickly, but it also has to access memory.
- It’s going to pound on memory really hard.
- KV cache; structured data, cuDF; unstructured data, cuVS.
- It's going to be pounding on the storage system really, really hard, which is the reason why we reinvented the storage system.
- It is also going to use tools.
- And unlike humans that are more tolerant to slower computers, AI wants the tools to be as fast as possible.
- These tools: web browsers, and in the future, they could also be virtual PCs in the cloud.
- Those PCs have to be fast.
- Those computers have to be as fast as possible.
- We created a brand new CPU.
- A brand new CPU that's designed for extremely high single-threaded performance, incredibly high data output, incredibly good at data processing, and extreme energy efficiency.
- It is the only data center CPU in the world that uses LPDDR5.
- LPDDR5.
- And incredible single-thread performance and performance per watt that is unrivaled.
- And so that’s, we built that so that it could go along with the rest of these racks for agentic processing.
- And so here it is.
- This is the Grace Blackwell, no, Vera Rubin.
- Where is it?
- Here it is.
- Okay.
System Architecture and Innovations
- So this is the Vera Rubin system.
- Notice, since the last time, 100% liquid cooled.
- All of the cables are gone.
- What used to take two days to install now takes two hours.
- Incredible.
- And so the manufacturing cycle time is going to dramatically reduce.
- This is also a supercomputer that is cooled by hot water, 45 degrees, which takes the pressure off of the data center: it takes all of that cost and all of that energy that’s used to cool the data center and makes it available for the system.
- This is the secret sauce.
- It is the only one; we’re the only company in the world that has, today, built the sixth-generation scale-up switching system.
- This is not Ethernet.
- This is not InfiniBand.
- This is NVLink.
- This is the 6th generation NVLink.
- This is insanely hard to do.
- Well, it is insanely hard to do, period.
- And I’m just super proud of the team. NVLink: completely liquid cooled.
- This.
- Is the brand new Groq system and I’ll show you a little bit more about it.
- This system. 8 Groq chips.
- This is the LPU 3; everything the world has ever seen before is V1.
- This is the third generation.
- And we’re in volume production now and I’ll show you more about that in just a second.
- The world’s first
- CPO
- Spectrum-X switch.
- This is also in full production.
- Co packaged optics.
- The optics come directly onto this chip and interface directly to the silicon: electrons get translated to photons, and it gets directly connected to this chip.
- We invented the process technology with TSMC; we’re the only one in production with it today. It’s called COUPE.
- It’s completely revolutionary.
- Nvidia is in full production with Spectrum X.
- This is the Vera system.
- Twice the performance per watt of any CPU in the world today.
- It is also in production.
- Well, you know, we never thought we would be selling CPUs standalone.
- We are selling a lot of CPU standalone.
- This is already for sure going to be a multi billion dollar business for us.
- So I’m very very pleased with our CPU architects we’ve designed a revolutionary CPU.
- And this.
- Is the.
- ConnectX-9.
- Powered with the Vera CPU and BlueField-4: STX, our new storage platform.
- Okay, so these are the four racks, and they’re connected.
3️⃣ AI Factory - Vera Rubin
NVLink and Ethernet Racks
- Each one of these racks: the NVLink rack.
- This is,
- I’ve shown you guys this before, super heavy, and it seems to get heavier every year,
- Because I think there’s just more cables in there every year.
- And so this is the NVLink rack. We’ve also taken this technology, because it is so
- Efficient to create a data center with these structured cabling systems, and decided to do the same for Ethernet. So this is Ethernet: 256
- Liquid-cooled nodes in one rack, and it is also connected with
- These incredible connectors.
- You guys want to see.
Rubin Ultra and Kyber Rack System
- Rubin Ultra. This is the Rubin Ultra compute node.
- Unlike Rubin, which slides in horizontally, Rubin Ultra goes into a whole new rack called Kyber, which enables us to connect 144 GPUs in one NVLink domain.
- And so the Kyber rack... I could lift it, I’m sure, but I won’t. It’s quite heavy.
- This is one compute node, and it slides into the Kyber rack vertically. This is where it connects into: this is the mid-plane.
- In the Kyber rack, those four top NVLink connectors slide in and connect into this, and this becomes one of the nodes. Each one of these is a different compute node.
- And this is the amazing part: on the back of the mid-plane, instead of the cabling system, which has its limit in terms of how far we can drive copper cables, we now have this system to connect 144 GPUs.
- This is the new NVLink. It also sits vertically, and it connects into the mid-plane on the back.
- Compute in the front, NVLink switches in the back: one giant computer.
- Okay? So that is Rubin Ultra.
- As I mentioned, how about we take this back down? I need the rest of my slides.
- Oh, it’s coming down? Okay. Thank you, Janine.
- This is what happens when you don’t practice.
- Okay, all right. Take your time, just don’t get hurt.
Token Factory Economics
AI Factories Throughput and Token Speed
- You saw this slide. You know, only in NVIDIA’s keynote will you see last year’s slide presented again.
- And the reason for that is that last year I told you something very, very important, and it’s so important it’s worthwhile to tell you again.
- This is probably the single most important chart for the future of AI factories. Every CEO in the world will be tracking it and studying it very deeply.
- It’s much, much more complicated than this; it’s multi-dimensional. But you will be studying the throughput and the token speed of your AI factories.
- Throughput and token speed at ISO power, because that’s all the power you have.
Token Pricing and Market Segmentation
- And that analysis is going to lead directly to your revenues. What you do this year will show up precisely next year as your revenues. This chart is what it’s all about.
- On the vertical axis is throughput. On the horizontal axis is token rate.
- Today I’m going to show you this, because we’re now able to increase the token speed, and because model sizes are increasing.
- The context length, depending on the grade of the application use case, continues to grow, from maybe a hundred thousand tokens of input to maybe millions. The input token length is growing, and the output token length is growing too.
- All of these play into, ultimately, the marketing and the pricing of future tokens. Tokens are the new commodity.
- And like all commodities, once it reaches an inflection, once it matures, it will segment into different parts.
- High throughput at low speed could be used for the free tier.
- The next could be the medium tier: a larger model, maybe higher speed, and for sure a larger input context length. That translates to a different price point. You can see this across the different services: this one is free, it’s the free tier.
- The first paid tier could be $3 per million tokens; the next could be $6 per million tokens. You would like to be able to keep pushing this boundary.
- Because the larger the model, the smarter it is, and the more input context length, the more relevant.
- The higher the speed, the longer you can think and iterate: smarter AI models. So this is about smarter AI models.
- And when you have smarter models, each one of these clicks allows you to increase the price. So this is $45.
- And maybe one day there’ll be a premium service that allows you to generate incredibly high token speeds, because you’re on a critical path, or maybe you’re doing really long research, and $150 per million tokens is just not a thing. So let’s translate that.
- Suppose you were to use 50 million tokens per day as a researcher at $150 per million tokens. As it turns out, for a research team, that’s not even a thing.
- So we believe that this is the future. This is where AI wants to go; this is where it is today.
Performance Improvements Across Tiers
- It had to start here, to establish the value, establish its usefulness, and get better and better and better.
- In the future you’re going to see most services encompass all of that. This is Hopper.
- Hopper started here, and I’ve moved the chart: this is 50, this is 100.
- Hopper looks like this, and you would have expected the next generation to be higher, but nobody would have expected it to be that much higher. This is Grace Blackwell.
- What Grace Blackwell did is, at the free tier, increase your throughput tremendously. However, where you mostly monetize your service, it increased your throughput by 35 times.
- This is no different than any product that any company makes: the higher the tier, the higher the quality and performance, the lower the volume and capacity. And so it is no different than any other business in the world.
- So now we’re able to increase this tier by 35x, and we introduced a whole new tier. This is the benefit of Grace Blackwell: a huge jump over Hopper.
- Well, this is what we’re doing next. Okay, so this is Grace Blackwell. Let me just reset this.
- And this is Vera Rubin.
- Now just think what just happened. At every single tier, we increased the throughput.
- And at the tier with your highest ASP, your most valuable segment, we increased it by 10x.
- That is the hard work; this is incredibly hard to do out here. This is the benefit of NVLink 72, the benefit of extremely low latency, the benefit of extreme co-design that lets us shift the entire curve.
- Now, what does it mean from a customer perspective? Suppose I were to take all of that and multiply it out.
- Suppose I took 25% of my power for the free tier, 25% for the medium tier, 25% for the high tier, and 25% for the premium tier. My data center only has a gigawatt, so I get to decide how to distribute it.
- The free tier allows me to attract more customers; this tier allows me to serve my most valuable customers.
- And the product of all that is basically your revenues. Assuming this simplistic example, Blackwell generates 5 times more revenue, and Vera Rubin generates 5 times more again.
- Yeah. So with Vera Rubin, you should get there as soon as you can, because your cost of tokens goes down and your throughput goes up. But we want even more, so let me just show you. Back to this.
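The gigawatt-budget arithmetic above can be sketched in a few lines. This is a toy model, not NVIDIA’s: the tier prices echo the keynote’s examples, but the throughput figures (tokens per joule) are invented placeholders.

```python
# Toy model of tiered token economics: a fixed power budget split
# across pricing tiers, each with its own throughput operating point.
# All throughput numbers are made up for illustration.

def daily_revenue(tiers, total_power_watts):
    """Revenue per day when power is split across pricing tiers.

    tiers: list of dicts with
      share            - fraction of total power given to the tier
      tokens_per_joule - throughput achieved at that tier's
                         (throughput, token-speed) operating point
      usd_per_mtok     - price charged per million tokens
    """
    seconds_per_day = 86_400
    revenue = 0.0
    for t in tiers:
        watts = t["share"] * total_power_watts
        tokens_per_day = watts * t["tokens_per_joule"] * seconds_per_day
        revenue += tokens_per_day / 1e6 * t["usd_per_mtok"]
    return revenue

# 25% of a 1 GW factory to each tier, as in the keynote's example.
# Faster, smarter tiers yield fewer tokens per joule but charge more.
tiers = [
    {"share": 0.25, "tokens_per_joule": 50.0, "usd_per_mtok": 0.0},    # free
    {"share": 0.25, "tokens_per_joule": 20.0, "usd_per_mtok": 3.0},    # medium
    {"share": 0.25, "tokens_per_joule": 5.0,  "usd_per_mtok": 45.0},   # high
    {"share": 0.25, "tokens_per_joule": 1.0,  "usd_per_mtok": 150.0},  # premium
]
print(f"${daily_revenue(tiers, 1e9):,.0f} per day")
```

A new architecture that lifts `tokens_per_joule` at the high-ASP tiers multiplies revenue directly, which is the whole argument of the chart.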
Throughput vs. Latency Challenges
- As I told you, this throughput requires a ton of FLOPS, and this latency, this interactivity, requires an enormous amount of bandwidth.
- Computers don’t like an extreme amount of FLOPS plus an extreme amount of bandwidth, because there’s only so much surface area for chips in any system. And so optimizing for high throughput and optimizing for low latency are, in fact, enemies of each other.
- And so this is what happened when we combined with Groq. We acquired the team that worked on the Groq chips, licensed the technology, and we’ve been working together to integrate the system.
- This is what that looks like.
Groq Integration for Throughput
- So at the most valuable tier, we’re now going to increase performance by 35x.
- Now, this very simple chart reveals exactly why NVIDIA is so strong in the vast majority of workloads so far: because up in this area, throughput matters so much.
- You’re not going to be able to see this, but if you look at the chart we’ve just shown you, you can see that NVLink 72 is so game-changing. It is exactly the right architecture, and it’s hard to beat even as you add Groq to it.
- However, if you extended this chart way out here and said you wanted services that deliver not 400 tokens per second but 1,000 tokens per second, all of a sudden NVLink 72 runs out of steam and simply can’t get there.
- And this is what happens when we push that out.
- Thank you.
- It goes out beyond even the limits of what NVLink 72 can do.
- And if you were to translate that into revenues: relative to Blackwell, Vera Rubin is 5x.
- If most of your workload is high throughput, I would stick with 100 percent Vera Rubin.
Optimizing for Coding and Engineering Tasks
- If a lot of your workload is coding and very high-value engineering token generation, I would add Groq to it.
- I would add Groq to maybe 25% of my total data center; the rest is 100% Vera Rubin.
- And so that gives you a sense of how you would add Groq to Vera Rubin and extend its performance and its value even more.
- This is what happens.
Groq Architecture and Disaggregated Inference
Groq’s Deterministic Data Flow Architecture
- This is a contrast. The reason why Groq was so attractive to me is that their computing system is a deterministic data-flow processor: statically compiled, compiler-scheduled.
- Meaning the compiler figures out in advance when to do the compute; the data arrives at exactly the right time; all of that is done statically and scheduled completely in software. There’s no dynamic scheduling.
- The architecture is designed with massive amounts of SRAM, and it is designed just for inference: this one workload. And this one workload, as it turns out, is the workload of AI factories.
- And as the world continues to increase the amount of high-speed tokens it wants to generate with super-smart models, the value of disaggregation is going to get even higher.
- And so these are two extreme processors, as you can see.
Disaggregating Inference with Dynamo
- One Groq chip: 500 megabytes. One Rubin chip: 288 gigabytes.
- It would take a lot of Groq chips to hold the parameter size of a Rubin-class model, as well as all of the context, the KV cache, that has to go along with it. That limited Groq’s ability to reach the mainstream, to really take off.
- Until we had a great idea: what if we disaggregated inference altogether, with a piece of software called Dynamo?
- We re-architected the way inference is done in the pipeline, so that we could put the work that makes perfect sense on Rubin, and then offload the decode generation, the low-latency, bandwidth-limited part of the workload, to Groq.
- And so we united, unified, two processors of extreme differences: one for high throughput, one for low latency.
- It still doesn’t change the fact that we need a lot of memory, and so we’re just going to add a whole bunch of Groq chips.
Unified Processor Architecture and Production
- Which expands the amount of memory it has. Just imagine: a trillion-parameter model, and we have to store all of that in Groq chips. However, it sits next to NVIDIA Vera Rubin, where we can hold the massive amounts of KV cache necessary for processing all of these agentic AI systems.
- It’s based on this idea of disaggregated inference. We do the prefill, that’s the easy part, but we also tightly integrate the decode.
- The attention part of decode is done on NVIDIA’s Vera Rubin, which needs a lot of math, and the feed-forward-network part of decode, the token generation part, is done on the Groq chip.
- The two of them work tightly coupled together, today over Ethernet, with a special mode to reduce its latency by about half.
- And so that capability allows us to integrate these two systems. We run Dynamo, this incredible operating system for AI factories, on top of it.
- And you get a 35-times increase, not to mention additional new tiers of inference performance for token generation the world’s never seen.
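The prefill/decode split described above can be sketched as a toy pipeline. All class and method names here are invented for illustration, not Dynamo’s actual API; the point is just the shape of the routing: compute-bound steps stay on the FLOPS-heavy engine that holds the KV cache, while the bandwidth-bound token-generation step goes to the SRAM-heavy engine.

```python
# Toy sketch of disaggregated inference: prefill and decode-attention
# on a FLOPS-heavy engine, the feed-forward/token-generation step on
# a large-SRAM engine. Names are hypothetical stand-ins.

from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list
    kv_cache: dict = field(default_factory=dict)   # lives with the FLOPS engine
    output_tokens: list = field(default_factory=list)

class FlopsEngine:               # stands in for the GPU (prefill + attention)
    def prefill(self, req):
        # Compute-bound pass over the whole prompt; populates the KV cache.
        req.kv_cache["len"] = len(req.prompt_tokens)
    def attention(self, req):
        # Attention over everything cached so far.
        return ("attn", req.kv_cache["len"] + len(req.output_tokens))

class SramEngine:                # stands in for the LPU (FFN / sampling)
    def ffn_and_sample(self, attn_out):
        _, pos = attn_out
        return f"tok{pos}"

def generate(req, n_tokens, gpu, lpu):
    gpu.prefill(req)                          # prefill: the "easy" part, on GPU
    for _ in range(n_tokens):
        attn = gpu.attention(req)             # needs the big KV cache
        tok = lpu.ffn_and_sample(attn)        # bandwidth-bound step, on LPU
        req.output_tokens.append(tok)
    return req.output_tokens
```

The interconnect between the two engines (Ethernet with a low-latency mode, per the keynote) would sit on the `attn` hand-off inside the loop.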
- So this is it.
- This is Groq.
- The Vera Rubin systems include Groq, and I want to thank Samsung, who manufactures the Groq LP30 chip for us; they’re cranking as hard as they can. I really appreciate you guys.
- We’re in production with the Groq chip, and we’ll ship it in the second half, probably about the Q3 timeframe.
Vera Rubin System Components
Vera Rubin Systems and CPUs
- Okay? Groq LP30.
- Vera Rubin... you know, it’s kind of hard to imagine any more customers. You know? And I’m not going to say that. I’m not going to say that.
- And the really great thing is: early sampling of Grace Blackwell was really complicated, because it was coming together with NVLink 72. But the sampling of Vera Rubin is going incredibly well.
- In fact, Satya, I think, has already posted that the first Vera Rubin rack is up and running at Microsoft Azure. And so I’m super excited for them.
- We’re going to keep cranking these things out. We have now set up a supply chain that can manufacture thousands a week, and we have a very big chunk of these systems, essentially multiple gigawatts of AI factories per month, inside our supply chain.
- And so we’re going to crank out these Vera Rubin racks while we’re cranking out the GB300 racks. We are in full production.
- We are in full production.
- The Vera CPUs: incredibly successful. And the reason for that is that AI needs CPUs for tool use, and the Vera CPU was designed perfectly for that sweet spot.
- For the next generation of data processing, the Vera CPU is ideal.
- The Vera CPU plus CX9, connected into the BlueField-4 stack.
Storage Industry Adoption and KV Caching
- 100% of the world’s storage industry is joining us on this system, and the reason is that they see exactly the same thing: the storage system is going to get pounded.
- It’s going to get pounded because we used to have humans using the storage systems, humans using SQL. Now we’re going to have AIs using the storage systems, and they will need cuDF-accelerated storage, cuVS-accelerated storage, and, very importantly, KV caching.
- OK, so this is the Vera Rubin system.
Performance Gains and Roadmap
- Now, what’s amazing is this: in just two years’ time, in a one-gigawatt factory, using the mathematics I showed you earlier...
- Moore’s law would have given us a couple of steps: we would have X-factored the number of transistors, the number of FLOPS, the amount of bandwidth.
- But with this architecture, we’re going to take our token generation rate from 2 million to 700 million: a 350-times increase.
- This is the power of extreme co-design. This is what I mean when I say we integrate and optimize vertically, but then open it horizontally for everybody to enjoy.
- This is our roadmap, very quickly.
Hardware Roadmap: Rubin Ultra to Feynman
Future Architectures and Upgrades
- Blackwell is here: the Oberon system. In the case of Rubin, we also have the Oberon system.
- We’re always going to be backwards compatible, so that if you wanted to not change anything and just keep moving through with the new architecture, you could do so.
- The old system, the standard rack system, Oberon, is still available. Oberon is copper scale-up, and with Oberon we can also use optical scale-up to expand to NVLink 576.
- Okay. And so there’s a lot of conversation about whether NVIDIA is going copper scale-up or optical scale-up. We’re going to do both.
- So we’re going to have NVLink 144 with Kyber, and then with Oberon we’re going to have NVLink 72 plus optical to get to NVLink 576.
- In the next generation of Rubin, with Rubin Ultra, we have the Rubin Ultra chip, which is taping out, and we have a brand-new chip, the LP35.
- The LP35 will, for the first time, incorporate NVIDIA’s NVFP4 computing structure, giving you another few-X-factor speed-up.
- Okay, and so this is Oberon, NVLink 72.
Next Generation and New CPUs
- Optical scale-up. And it uses Spectrum-6, the world’s first co-packaged-optics switch. All of this is in production.
- The next generation from here is Feynman.
- Feynman has a new GPU, of course. It also has a new LPU, the LP40: a big step up, incredible new technology.
- Now, uniting the scale of NVIDIA and the Groq team, building the LP40 together: it’s going to be incredible.
- And a brand-new CPU called Rosa.
Connectivity and Scaling Solutions
- Short for Rosalind.
- BlueField-5, which connects the next CPU with the next SuperNIC, the CX10.
- We will have Kyber, which is copper scale-up, and we will also have Kyber CPO scale-up.
- So, for the first time, we will scale up with both copper and co-packaged optics.
- Okay?
- And so.
Capacity Expansion and Ecosystem Growth
- And so, a lot of people have been asking: you know, Jensen, is copper still going to be important? The answer is yes.
- Jensen, are you going to scale up optical? Yes. Are you going to scale out optical? Yes.
- And so, for everybody who is in our ecosystem: we need a lot more capacity. And that’s really the key.
- We need a lot more capacity for copper, a lot more capacity for optics, a lot more capacity for CPO.
- And that’s the reason why we’ve been working with all of you: to lay the foundation for this level of growth.
- And so we’ll have all of that.
Nvidia’s Transformation to AI Infrastructure
- Let me see if I missed anything. That’s it.
- Every single year, a brand-new architecture.
- Very quickly: NVIDIA went from a chip company to a factory company, an infrastructure company, an AI computing company.
- These systems... and now...
4️⃣ DSX and Omniverse
AI Factory Design and Omniverse
- We’re building entire AI factories. There’s so much power that is squandered in these AI factories, and we want to make sure these AI factories come together designed in the best possible way.
- Most of these components never meet each other. Most of us technology vendors... now we all know each other, but in the past we never met each other until the data center. That can’t happen. We’re building super-complex systems, and so we have to meet each other virtually somewhere else.
- And so we created Omniverse, and the Omniverse DSX world: a platform where all of us can meet and design these gigawatt AI factories, virtually.
- In simulation: we have simulation systems for the racks, for mechanical, thermal, electrical, and networking, integrated into the incredible tools of all of our ecosystem partners.
- We also connect the operating data center to the grid, so that we can interact with each other and send each other information, adjusting grid power and data center power accordingly and saving energy.
DSX Platform Components and APIs
- And then, inside the data center, using Max-Q, we can adjust the system dynamically across power and cooling and all of the different technologies we work on together, so that we leave no power squandered and run at the most optimal rate to deliver an enormous amount of token throughput.
- There’s no question in my mind there’s a factor of two in here, and a factor of two at the scale we’re talking about is gigantic.
- We call this the NVIDIA DSX platform. Just as with all of our platforms, there’s the hardware layer, the library layer, and the ecosystem layer. It’s exactly the same way. Let’s show it to you.
- The greatest infrastructure build-out in history is underway. The world is racing to build chips, systems, and AI factories, and every month of delay costs billions in lost revenues.
- AI factory revenues are equal to tokens per watt. So under power constraints, every unused watt is revenue lost.
- NVIDIA DSX is an Omniverse digital-twin blueprint for designing and operating AI factories for maximum token throughput, resilience, and energy efficiency.
- Developers connect through several APIs: DSX Sim, for physical, electrical, thermal, and network simulation; DSX Exchange, for AI factory operational data; DSX Flex, for secure dynamic power management with the grid; and DSX Max-Q, to dynamically maximize token throughput.
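The Max-Q idea above can be sketched as a simple control computation: given a fixed power budget, find the compute/cooling split that maximizes token throughput. The plant model and every constant below are invented for illustration; this is not the DSX Max-Q API.

```python
# Toy Max-Q-style optimizer: re-split a fixed power budget between
# compute and cooling to maximize token throughput. The plant model
# (40 tokens/s per watt, ~25% cooling overhead) is made up.

def tokens_per_second(compute_watts, cooling_watts):
    """Toy plant model: throughput scales with compute power but is
    derated if cooling can't keep up with the compute load."""
    derate = min(1.0, cooling_watts / (0.25 * compute_watts))
    return compute_watts * 40.0 * derate

def max_q_split(total_watts, steps=1000):
    """Brute-force the compute/cooling split that maximizes throughput."""
    best = (0.0, 0.0)
    for i in range(1, steps):
        compute = total_watts * i / steps
        cooling = total_watts - compute
        tps = tokens_per_second(compute, cooling)
        if tps > best[0]:
            best = (tps, compute)
    return best  # (throughput, compute watts)

tps, compute = max_q_split(1_000_000)  # optimize a 1 MW slice of the factory
```

In this toy model the optimum lands where cooling exactly covers the thermal load (80% compute, 20% cooling); a real controller would adjust continuously against live telemetry and grid signals rather than a closed-form model.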
AI Factory Design and Operational Workflow
- It starts with sim-ready assets from NVIDIA and equipment manufacturers, managed by PTC Windchill PLM. Then model-based systems engineering is done in Dassault Systèmes 3DEXPERIENCE.
- Jacobs brings the data into their custom Omniverse app to finalize the design. It’s tested with leading simulation tools: Siemens Star-CCM+ for external thermals, Cadence Reality for internal, ETAP for electrical, and NVIDIA’s network simulator, DSX Air.
- It’s virtually commissioned through Procore to ensure accelerated construction time.
- When the site goes live, the digital twin becomes the operator. AI agents work with DSX Max-Q to dynamically orchestrate infrastructure.
- Phaidra’s agent oversees cooling and electrical systems, sending signals to Max-Q, which continuously optimizes compute throughput and energy efficiency.
- Emerald AI agents interpret live grid demand and stress signals and adjust power dynamically.
Ecosystem and Infrastructure Goals
- With DSX, NVIDIA and our ecosystem of partners are racing to build AI infrastructure around the world, ensuring extreme resiliency, efficiency, and throughput.
- It’s incredible, right?
Omniverse and AI Factory Platform
- Well, Omniverse was designed to hold the world’s digital twin, starting from the Earth, and it’s going to hold digital twins of all sizes.
- And so we have just such a great ecosystem of partners; I want to thank all of you. Many of these companies are brand new to our world. We didn’t know many of you just a couple of years ago, and now we’re working so closely together to build the largest computer the world’s ever seen, and to do it at planetary scale.
- So NVIDIA DSX is our new AI factory platform. I’ll spend very little time on this this time.
Space-Based Computing and Data Centers
- However, we’re going to space. We’ve already been out in space: Thor is radiation-qualified, and we’re in satellites. Today you do imaging from satellites; in the future, we’ll also build data centers in space.
- Obviously it’s very complicated to do. We’re working with our partners on a new computer called Vera Rubin Space 1, and it’s going to go out to space and start data centers there.
- Now, of course, in space there’s no conduction and no convection; there’s just radiation. And so we have to figure out how to cool these systems out in space, but we’ve got lots of great engineers working on it.
- Let me tell you a little bit about the future of data.
5️⃣ OpenClaw Agent Revolution
Support for OpenClaw Project
- So Peter Steinberger is here, and he wrote a piece of software called OpenClaw. I don’t know if he realized how successful it was going to be.
- But what matters is what it’s able to do, and its importance is profound.
- OpenClaw is the number one, the most popular open-source project in the history of humanity, and it got there in just a few weeks. It exceeded what Linux did in 30 years.
- It’s that important. It is that important. It will do well.
- This is all you do. Okay? So I’m announcing our support of it.
OpenClaw Functionality and Applications
- Let me just quickly go through this; I want to show you a couple of things.
- You simply type this into a console, and it goes out, finds OpenClaw, downloads it, and builds you an AI agent. Then you can tell it whatever else you need it to do.
- Okay? So let’s take a look.
- Research is a huge deal. You give an AI agent a task and go to sleep; it runs 100 experiments overnight, keeping what works and killing what doesn’t.
- I really love what my stuff enables people to do. One guy told me he installed it as a 60-year-old dad: they made beer, connected the machine via Bluetooth, and then automated everything, including the whole website for people to order the lobster lager.
- Hundreds of people are queuing up for lobsters in St. Jeff.
- OpenClaw. OpenClaw. We want to build OpenClaw with OpenClaw.
- Everyone is talking about OpenClaw, but what the f* is OpenClaw?
- Believe it or not, there’s already a ClawCon.
- OpenClaw. OpenClaw. OpenClaw.
- Incredible. Incredible.
- Now, I’ve illustrated what OpenClaw is in this way, so that all of you can understand it.
OpenClaw as Agentic Operating System
- But let’s just think about what happened. What is OpenClaw? It’s an agentic system: it calls and connects to large language models.
- So, first, it has resources, resources that it manages: it can access tools, it can access file systems, it can access large language models.
- It’s able to do scheduling; it can run cron jobs. It can decompose a problem, a prompt you gave it, into step-by-step tasks, and it can spawn off and call upon other sub-agents.
- It has I/O: you can talk to it in any modality you want. You can wave at it and it understands you. It sends you messages, texts you, sends you email. So it’s got I/O.
- What else does it have? Well, based on that, you could say it is, in fact, an operating system. I’ve just used the same syntax I would use to describe an operating system.
- OpenClaw has open-sourced, essentially, the operating system of agentic computers.
- It is no different from how Windows made it possible for us to create personal computers. Now, OpenClaw has made it possible for us to create personal agents.
- The implication is incredible.
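The "operating system" analogy above can be made concrete with a toy agent loop: resources (tools), decomposition of a prompt into steps, sub-agent spawning, and dispatch. Everything here is invented for illustration; it is not OpenClaw’s actual code.

```python
# Toy agentic-OS skeleton: registered tools as "resources", a fake
# planner that decomposes a prompt into tool calls, and a run loop
# that dispatches them. All names are hypothetical.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    tools: dict = field(default_factory=dict)   # name -> Callable[[str], str]
    log: list = field(default_factory=list)

    def decompose(self, prompt: str):
        # A real agent would ask an LLM to plan; here we fake a plan by
        # mapping each prompt word that names a registered tool to a step.
        return [(w, w) for w in prompt.split() if w in self.tools]

    def spawn(self) -> "Agent":
        # Sub-agents inherit the parent's resources.
        return Agent(tools=dict(self.tools))

    def run(self, prompt: str):
        results = []
        for tool_name, arg in self.decompose(prompt):
            out = self.tools[tool_name](arg)    # dispatch to the tool
            self.log.append(f"{tool_name} -> {out}")
            results.append(out)
        return results

agent = Agent(tools={"search": lambda q: f"results for {q}",
                     "email": lambda m: "sent"})
agent.run("search the web then email me")   # executes the two matching steps
```

The OS framing maps cleanly: `tools` are device drivers, `decompose` is the scheduler, `spawn` is process creation, and `run` is the dispatch loop.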
Strategic Importance of OpenClaw
- First of all, the adoption says something all by itself. However, the most important thing is this: every single company, every software company, every technology company now realizes that, for the CEOs, the question is: what’s your OpenClaw strategy?
- Just as we all needed a Linux strategy, an HTTP and HTML strategy, which started the internet, and a Kubernetes strategy, which made mobile cloud possible, every company in the world today needs an OpenClaw strategy, an agentic-system strategy.
- This is the number one thing we need to do. This is the new computer.
Enterprise IT Transformation
Evolution of Enterprise IT with Agentic Systems
- Now, this is just the exciting part. This is enterprise IT before OpenClaw.
- As I mentioned earlier, this is the way enterprise IT works. The reason they’re called data centers is that these large rooms, these large buildings, held data: the files of people, the structured data of business.
- That data would pass through software with tools, systems of record, and all kinds of workflow codified into it, and that turned into tools that humans, digital workers, would use.
- That is the old IT industry: software companies creating tools and saving files, and, of course, GSIs, consultants who help companies figure out how to use and integrate those tools.
- These tools are incredibly valuable for governance, security, privacy, and compliance, and all of that continues to be true. It’s just that post-OpenClaw, post-agentic, this is what it’s going to look like.
- This is the extraordinary part: every single IT company, every SaaS company, will become a GaaS company, an agentic-as-a-service company. No question about it.
- And what’s amazing is this: OpenClaw gave the industry exactly what it needed at exactly the right time, just as Linux gave the industry exactly what it needed at exactly the right time, just as Kubernetes showed up at exactly the right time, just as HTML showed up.
- It made it possible for the entire industry to grab onto this open-source stack and go do something with it.
Agentic Systems Security Considerations
- There’s just one catch: agentic systems in the corporate network can have access to sensitive information, can execute code, and can communicate externally.
Security and Privacy Concerns
- Just say that out loud. Okay, think about it: access sensitive information, execute code, communicate externally.
- It could, of course, access employee information, supply-chain data, finance information, sensitive information, and send it out. Communicate externally.
- Obviously, this can’t possibly be allowed.
Enterprise Secure and Private OpenClaw
- And so, here’s what we did.
NVIDIA OpenClaw Reference
- We took some of the world’s best security and computing experts, and we worked with Peter to make OpenClaw enterprise-secure and enterprise-private capable.
- We call this the NVIDIA OpenClaw reference, NemoClaw: a reference for OpenClaw with all of these agentic AI toolkits.
- The first part of it is technology we call Open Shell, which has now been integrated into OpenClaw. Now it’s enterprise-ready.
Policy Engines and Network Guards
- This stack, this stack, with a reference design we call NemoClaw, okay, with a reference stack we call NemoClaw.
- You could download it, play with it, and you could connect to it the policy engine of all of the SAS companies in the world.
- And your policy engines are super important, super valuable.
- So the policy engines could be connected, NemoClaw, or OpenClaw with Open Shell would be able to execute that policy engine.
- It has a network guardrail.
- It has a privacy router.
- And as a result, we can protect the company and let the CLAWs execute inside it safely.
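The three capabilities above (access sensitive data, execute code, communicate externally) are exactly what a policy engine plus network guardrail has to gate. As a rough illustration only, here is a minimal default-deny gate in Python; all names (`PolicyEngine`, `AgentAction`) are invented for this sketch and are not Open Shell’s actual API:

```python
# Hypothetical sketch of a policy gate for agent actions.
# Names and rules are illustrative, not NVIDIA's Open Shell API.
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentAction:
    kind: str        # "read_data" | "execute_code" | "network_egress"
    target: str      # e.g. table name, script path, or destination host

class PolicyEngine:
    def __init__(self, allowed_hosts, restricted_tables):
        self.allowed_hosts = set(allowed_hosts)
        self.restricted_tables = set(restricted_tables)

    def allows(self, action: AgentAction) -> bool:
        if action.kind == "network_egress":
            # Network guardrail: only approved destinations.
            return action.target in self.allowed_hosts
        if action.kind == "read_data":
            # Privacy router: sensitive tables are off-limits.
            return action.target not in self.restricted_tables
        if action.kind == "execute_code":
            # Code execution stays inside a sandboxed path.
            return action.target.startswith("/sandbox/")
        return False  # default deny for unknown action kinds

engine = PolicyEngine(allowed_hosts={"api.internal"},
                      restricted_tables={"payroll"})
print(engine.allows(AgentAction("read_data", "payroll")))            # False
print(engine.allows(AgentAction("network_egress", "evil.example")))  # False
```

The point of the sketch is the default-deny shape: every agent action passes through one chokepoint, and anything the policy does not explicitly permit is refused.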
NVIDIA’s Open Model Initiative
- We also added several things to the agentic system.
- And one of the most important things you want for your own custom CLAWs is your own custom models.
- And this is NVIDIA’s open model initiative.
- We are now at the frontier of every single domain of AI models.
Diverse AI Model Domains
- Whether it’s Nemotron, the Cosmos world foundation model, Groot for artificial general robotics and humanoid robot models, Alpamayo for autonomous vehicles, BioNemo for digital biology, or Earth-2 for AI physics.
- We are at the frontier on every single one.
- Take a look.
- The world is diverse.
- No single model can serve every industry.
Key Open Model Families
- Open models form one of the largest and most diverse AI ecosystems in the world.
- Nearly 3 million open models across language, vision, biology, physics, and autonomous systems enable AI builders in specialized domains.
- NVIDIA is one of the largest contributors to open source AI.
- We build and release six families of open frontier models, plus the training data, recipes, and frameworks to help developers customize and adapt them.
- New leaderboard-topping models are launching for every family.
- At the core, Nemotron, reasoning models for language, visual understanding, RAG, safety, and speech.
Open Models and Nemotron
- Cosmos, frontier models for physical AI world generation and understanding.
- Alpamayo, the world’s first thinking and reasoning autonomous vehicle AI.
- Groot, foundation models for general purpose robots.
- BioNemo, open models for biology, chemistry, and molecular design.
- Earth-2, models for weather and climate forecasting rooted in AI physics.
- NVIDIA open models give researchers and developers the foundation to build and deploy AI for their own specialized domains.
- Thank you.
- Our models are valuable to all of you because, number one, they’re at the top of the leaderboards.
- It’s world class.
- But most importantly, it’s because we are not going to give up working on it.
Continuous Model Advancement
- We’re going to keep on working on it every single day.
- Nemotron 3 is going to be followed by Nemotron 4.
- Cosmos 1 is followed by Cosmos 2.
- Groot is at generation 2.
- Each one of these will continue to advance.
Vertical Integration and Horizontal Openness
- Vertical integration, horizontal openness, so that we can enable everybody to join the AI revolution.
- Number one on the leaderboards across research, voice, world models, artificial general robotics, self-driving cars, and reasoning.
- And of course, one of the most important ones.
- This is Nemotron 3 in OpenClaw.
- And look at the top three.
- Those are the three best models in the world.
- Okay?
- So, we are at the frontier.
Customization and Sovereign AI
- It is also true that we want to create the foundation models so that all of you can fine-tune them, post-train them into exactly the intelligence you need.
- This is Nemotron 3 Ultra.
- It is going to be the best base model the world has ever created.
- This allows us to help every country build their sovereign AI.
Nemotron Coalition and Partners
- And we are working with so many different companies out there.
- And one of the most exciting things I am announcing today is the Nemotron coalition.
- We are so dedicated to this.
- We have invested billions of dollars in AI infrastructure so that we could develop the core engines of AI, the libraries necessary for inference and so on.
- But also to create the AI models to activate every single industry in the world.
- Large language models are really important.
- Of course it is important.
- How could human intelligence not be?
- However, in different industries and different countries around the world, you need the ability to customize your own models for the domains that are there.
- The domain of the models is radically different.
- From biology, to physics, to self-driving cars, to general robotics, to of course human language.
- And we have the ability to work with every single region to create their domain specific, their sovereign AI.
- Today, we are announcing a coalition to partner with us to make Nemotron 4 even more amazing.
- And that coalition has some amazing companies in it.
- Black Forest Labs, the imaging company. Cursor, the famous coding company; we use lots of it.
- LangChain, a billion downloads, for creating custom agents.
- Mistral; Arthur, whom I mentioned, I think is here.
Coalition of Leading Companies
- Incredible, incredible company.
- Perplexity, Perplexity’s Comet.
- Absolutely use it.
- Everybody uses it.
- It is so good.
- A multi-modal agentic system.
- Reflection, Sarvam from India, Thinking Machines, Mira Murati’s lab.
- Incredible companies.
- Incredible companies joining us.
- Thank you.
Enterprise Agentic Transformation
Enterprise Adoption of Agentic Systems
- I said that every single enterprise company, every single software company in the world, needs an agentic system, needs an agent strategy.
- You need to have an OpenClaw strategy.
- And they all agree.
- And they’re all partnering with us to integrate Nemo, the NemoClaw reference design, the NVIDIA agentic AI toolkit, and of course, all of our open models.
- One company after another.
- There’s so many.
- And we’re partnering with all of you.
- I’m really grateful for that.
The Renaissance of Enterprise IT
- And this is our moment.
- This is a reinvention.
- This is a renaissance.
- A renaissance of the enterprise IT.
- From what is a $2 trillion industry, this is going to become a multi-trillion-dollar industry.
- Offering not just tools for people to use, but agents, specialized in the domains you’re expert in, that we can rent.
Engineer Productivity and Token Budgets
- I could totally imagine in the future, every single engineer in our company will need an annual token budget.
- They’re going to make a few hundred thousand dollars a year in base pay.
- I’m going to give them probably half of that on top of it, as tokens.
- So that they can be amplified 10x.
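The token-budget idea is easy to put rough numbers on. A back-of-envelope sketch, where the salary, the 50% split, and the $5-per-million-token price are all illustrative assumptions, not figures from the keynote:

```python
# Back-of-envelope math for the engineer token-budget idea.
# All figures are illustrative assumptions, not NVIDIA numbers.
base_pay = 300_000           # "a few hundred thousand dollars a year"
token_budget = base_pay / 2  # "probably half of that on top of it, as tokens"

# If a blended price of, say, $5 per million tokens held, that budget buys:
price_per_m_tokens = 5.00
tokens = token_budget / price_per_m_tokens * 1_000_000
print(f"${token_budget:,.0f} budget -> {tokens / 1e9:.0f} billion tokens/year")
# -> $150,000 budget -> 30 billion tokens/year
```

Even at an assumed few dollars per million tokens, half a salary buys tens of billions of tokens a year, which is the scale at which "tokens as compensation" starts to make sense.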
Tokens as a Recruiting and Productivity Tool
- Of course we would.
- It is now one of the recruiting tools in Silicon Valley.
- How many tokens come along with my job?
- And the reason for that is very clear.
- Because every engineer that has access to tokens will be more productive.
- And those tokens, as you know, will be produced by AI factories that all of you and us, we partner to build.
Future of Enterprise and Software Companies
- So every single enterprise company today sits on top of file systems and data centers.
- Every single software company of the future will be agentic, and they will be token manufacturers.
- They’ll be token users for their engineers, and they’ll be token manufacturers for all of their customers.
- The significance of the OpenClaw event cannot be overstated.
The Unmatched Significance of OpenClaw
- This is as big of a deal as HTML.
- This is as big of a deal as Linux.
- We now have a world-class open agentic framework that all of us can use to build our OpenClaw strategies.
- And we’ve created a reference design we call NemoClaw that all of you can use: it’s optimized, performant, safe, and secure.
6️⃣ Physical AI
Understanding Agents: Digital vs. Physical
- Speaking of agents, agents as you know, perceive, reason, and act.
- Most of the agents in the world today that I’ve spoken about are digital agents.
- They act in the digital world.
- They reason, they write software.
- It’s all digital.
- But we also have been working on physically embodied agents for a long time.
Physically Embodied Agents: Robots
- We call them robots.
- And the AIs that they need are physical AIs.
- We have some big announcements here.
Robotics Ecosystem and Partners
- I’m going to just walk through a few of them. There are 110 robots here.
- Almost every single company in the world that is building robots is working with NVIDIA; I can’t think of one that isn’t.
Robotics Compute Infrastructure
- We have three computers.
- The training computer, the synthetic data generation and simulation computer, and of course the robotics computer that sits inside the robot itself.
- We have all the software stacks necessary to do so.
Software and Ecosystem Integration
- The AI models to help you.
- And all of this is integrated into ecosystems around the world.
- And all of our partners from Siemens to Cadence.
New Robotics Partnerships
- Incredible partners everywhere.
- And today we’re announcing a whole bunch of new partners.
Self-Driving Cars and RoboTaxi Deployments
- As you know, we’ve been working on self-driving cars for a long time.
- The ChatGPT moment of self-driving cars has arrived.
- We now know we can successfully drive cars autonomously.
- And today we are announcing four new partners for NVIDIA’s RoboTaxi Ready platform.
- BYD, Hyundai, Nissan, Geely.
- Altogether, 18 million cars built each year.
- Joining our partners from before, Mercedes, Toyota, GM.
- The number of RoboTaxi Ready cars in the future is going to be incredible.
- And we’re announcing also a big partnership with Uber.
- Multiple cities, we’re going to be deploying and connecting these RoboTaxi Ready vehicles into their network.
- And so, a whole bunch of new cars.
Additional Robotics Company Partners
- We have ABB, Universal Robots, KUKA.
- So many robotics companies here.
Physical AI Integration in Manufacturing and Infrastructure
- And we’re working with them to implement our physical AI models integrated into simulation systems so that we could deploy these robots into manufacturing lines all over.
- We have Caterpillar here.
- We even have T-Mobile here.
- And the reason for that is that in the future, what used to be a radio tower is going to be an NVIDIA Aerial AI-RAN.
- And so this is going to be a robotics radio tower.
- Meaning it can reason about the traffic and figure out how to adjust its beamforming so that it saves as much energy as possible while maximizing fidelity.
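To make the "reasoning radio tower" idea concrete, here is a toy model of the energy-saving half of it: scale transmit power with traffic load instead of running at full power around the clock. The linear power model and the load numbers are invented purely for illustration:

```python
# Toy illustration of load-adaptive transmit power for a radio tower.
# The linear model and all numbers are invented for illustration only.
def beam_power(load: float, p_min: float = 0.1, p_max: float = 1.0) -> float:
    """Return normalized transmit power for a traffic load in [0, 1]."""
    load = min(max(load, 0.0), 1.0)          # clamp out-of-range loads
    return p_min + (p_max - p_min) * load    # idle floor plus load-scaled power

# Sampled traffic load over part of a day (fractions of peak demand).
hourly_load = [0.1, 0.05, 0.6, 0.95, 0.7, 0.2]
adaptive = sum(beam_power(l) for l in hourly_load)
always_on = len(hourly_load) * 1.0           # baseline: full power every hour
print(f"energy saved: {(1 - adaptive / always_on):.0%}")
```

With these made-up loads the adaptive schedule uses roughly half the baseline energy, which is the qualitative point: off-peak hours are where a reasoning tower wins.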
- There are so many humanoid robots here.
- But one of my favorites is a Disney robot.
- You know what?
Physical AI and Robotics Demonstrations
- Tell you what.
- Let me just show you some of the videos.
- Let’s look at that first.
Autonomous Vehicles
- The first global rollout of physical AI at scale is here.
The Age of Physical AI and Robotics
- Autonomous vehicles.
- And with NVIDIA Alpamayo, vehicles now have reasoning, helping them operate safely and intelligently across scenarios.
- We ask the car to narrate its actions.
- I’m changing lanes to the right to follow my route.
- Explain its thinking as it makes decisions.
- There’s a double parked vehicle in my lane.
- I’m going around it.
- And follow instructions.
- Hey Mercedes, can you speed up?
- Sure, I’ll speed up.
Robot Training and Simulation
- This is the age of physical AI and robotics.
Robot Training: From Simulation to Reality
Challenges in Real-World Robot Training
- Around the world, developers are building robots of every kind.
- But the real world is massively diverse.
- Unpredictable.
- Full of edge cases.
- Real world data will never be enough to train for every scenario.
- We need data generated from AI and simulation.
Compute as Data and Simulation
- For robots, compute is data.
- Developers pre-train world foundation models on internet-scale video and human demonstrations, then evaluate their performance to prepare them for post-training.
- Using classical and neural simulation, they generate massive amounts of synthetic data and train policies at scale.
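The synthetic-data step above can be sketched as a domain-randomization loop: sample many randomized variations of a scene so the trained policy covers the real world’s diversity. This is a generic illustration of the technique, not the Isaac Lab or Cosmos API:

```python
# Generic domain-randomization sketch for synthetic robot training data.
# Scene parameters and ranges are invented; this is not Isaac Lab's API.
import random

def make_scene(rng: random.Random) -> dict:
    """Sample one synthetic training scene with randomized parameters."""
    return {
        "friction": rng.uniform(0.2, 1.0),     # surface properties vary
        "object_mass": rng.uniform(0.1, 2.0),  # payloads vary
        "light_level": rng.uniform(0.3, 1.0),  # perception conditions vary
    }

def generate_dataset(n: int, seed: int = 0) -> list[dict]:
    """Generate n scenes reproducibly from a seeded RNG."""
    rng = random.Random(seed)
    return [make_scene(rng) for _ in range(n)]

scenes = generate_dataset(10_000)
print(len(scenes), "synthetic scenes")
```

The policy is then trained (or hardened with reinforcement learning, as the Skild example below describes) across all of these variations, so that no single real-world condition is out of distribution.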
NVIDIA’s Open-Source Tools for Robot Training
- To accelerate developers, NVIDIA built open-source Isaac Lab for robot training, evaluation, and simulation.
- Newton for extensible and GPU-accelerated differentiable physics simulation.
- Cosmos World Models for neural simulation.
- And Groot Open Robotics Foundation models for robot reasoning and action generation.
- With enough compute, developers everywhere are closing the physical AI data gap.
Developer Success Stories
- Peritas AI trains their operating room assistant robot in NVIDIA Isaac Lab, multiplying their data with NVIDIA Cosmos world models.
- Skild AI uses Isaac Lab and Cosmos to generate post-training data for their Skild Brain.
- They use reinforcement learning to harden the model across thousands of variations.
- Humanoid uses Isaac Lab to train whole-body control and manipulation policies.
- Hexagon Robotics uses Isaac Lab for training and data generation.
- Foxconn fine-tunes Groot models in Isaac Lab, as does Noble Machines.
- Disney Research uses their Camino physics simulator in Newton and Isaac Lab to train policies across their character robots in every universe.
Disney Robot Demonstration
- Ladies and gentlemen, Olaf.
- Wahoo!
- Snowman coming through.
- Newton works.
- Wow.
- Omniverse works.
- Olaf, how are you?
- I’m so happy now that I’m meeting you.
- I know, because I gave you your computer.
- Jetson.
- What’s that?
- Well, it’s in your tummy.
- That’s going to be amazing.
- And you learn how to walk inside Omniverse.
- I love walking.
- This is so much better than riding on a reindeer gazing up at a beautiful sky.
- And it was because of physics, using this Newton solver that runs on top of NVIDIA Warp, that we joined together.
- And it was the technology that was completely developed with Disney and with DeepMind that made it possible for you to be able to adapt to the physical world.
- Check that out.
- I used to not to say that.
- That’s how smart you are.
- I’m a snowman, not a snowclopedia.
- Could you imagine this?
- The future of Disneyland?
- All these robots, all these characters wandering around.
- Oh!
- You know, I have to admit though.
- I thought you were going to be taller.
- I’ve never seen such a short snowman, to be honest.
- Nope.
- Hey, tell you what, you want to help me out?
- Hooray!
- Okay.
Wrap Up
Key Takeaways Recap
- Usually I close the keynote by telling you what I told you.
- We talked about inference inflection.
- We talked about the AI factory.
- We talked about the OpenClaw agent revolution that’s happening.
- And of course, we talked about physical intelligence.
- And that’s what I talked about today.
GTC Closing Performance
- But tell you what, why don’t we get some friends to help us close it out?
- Of course!
- All right, play it.
- Come on.
- Terminating simulation.
- Hello?
- Anybody here?
- I’m here.
- Coming alive, agents learning how to drive.
- From open models to robots too, now we’ll break it all down for you.
- Compute exploded, what we saw, from CNNs to OpenClaw.
- Agents working across the land, but they need the power to meet demand.
- So we solved the problem, it was brilliant, we multiplied compute by 40 million.
- Once upon an AI time, training was the paradigm.
- Sure it taught the models how, but inference runs the whole world now.
- There shows us who’s the boss, at 35 times less the cost.
- Blackwell makes the token sing, NVIDIA the inference king.
- Our factories once took years, vendors pulling racks and gears.
- Built up slowly, piece by piece, no clear way to scale this beast.
- DSX and Dynamo know what to do, turning power into revenue.
- Agents used to wait and see, now act autonomously.
- From sim to streets, now watch them drive.
- Throw your hands up for physical AI.
- Industrial age, built what came before.
- Now we’re built for AI even more.
- Vera Rubin plus Groq, make the inference splash.
- Put them together, now it’s raining cash.
- We build new architecture every year.
- ’Cause CLAWs keep yelling “more tokens here.”
- The AI stacks for all to make.
- So let us all eat five layer cake.
- The moment’s bright, the path is clear.
- Cause open models led us here.
- When data’s missing, there’s no dispute.
- We just generate more with compute.
- Robots learning without flaw.
- Fueling the force, scaling laws.
- The future’s here, won’t you come and see?
- Welcome all to GTC.
Closing Remarks
- We’re here to help.
- Alright, have a great GTC!
- Wave.
- Thank you everybody.
- See you all!
- Thank you.
- Thank you.