(Illustration: AWS re:Invent 2025 Keynote with Peter DeSantis and Dave Brown. Image source: AWS.)
Peter DeSantis, SVP of Utility Computing at AWS, and Dave Brown took the stage at re:Invent 2025 to deliver a keynote that went deep into the infrastructure fundamentals powering the AI era. Rather than chasing the latest AI hype, they made a compelling case that the core cloud attributes we have relied on for two decades (security, availability, elasticity, agility, and cost) matter more than ever. The announcements ranged from Graviton5, with 192 cores in a single package and 5x more L3 cache, to Lambda Managed Instances, which bridge the EC2-Lambda divide, to S3 Vectors hitting GA with sub-100ms queries over 2 billion vectors. On the AI acceleration front, the Trainium3 UltraServer delivers 5x output tokens per megawatt, and native PyTorch support means porting GPU code to Trainium can be as simple as a one-line change.
✳️ tl;dr
A single theme, “Infrastructure Fundamentals for the AI Era,” runs throughout the keynote, organized into six sections:
- Core Cloud Attributes: Security, availability, elasticity, agility, and cost remain the foundation. These attributes guided every AWS decision for 20 years and are even more critical in the AI era.
- Nitro and Graviton Evolution: From custom silicon (Nitro) eliminating virtualization jitter to Graviton5 delivering 192 cores with 5x L3 cache. M9g instances offer up to 25% better performance than M8g. Guest: Payam Mirrashidi (Apple) on Swift + Graviton achieving 40% performance gains.
- Serverless Expansion: Lambda Managed Instances bridges the gap between EC2 performance and Lambda simplicity. Your Lambda functions run on EC2 instances you choose, while Lambda manages provisioning, patching, and scaling.
- Inference and Bedrock Architecture: Project Mantle inference engine powers Bedrock with service tiers (priority, standard, flexible), per-customer queue fairness, Journal for fault tolerance, and confidential computing.
- Vector Search and S3 Vectors: Nova Multimodal Embeddings unifies text, image, video, audio into shared vector space. S3 Vectors GA achieves sub-100ms queries on 2 billion vectors. 250K+ vector indexes created in 4 months. Guest: Jae Lee (TwelveLabs) on video intelligence.
- Trainium3 and AI Acceleration: Trainium3 UltraServer with 144 chips, 360 PetaFLOPS, 20TB HBM. 5x output tokens per megawatt. NKI GA and Neuron Explorer for performance profiling. PyTorch native support. Guest: Dean Leitersdorf (Decart) on real-time visual intelligence.
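The last bullet's "PyTorch native support" refers to the keynote claim that porting existing GPU code to Trainium is essentially a one-line device change. A minimal sketch of what that swap might look like, assuming the PyTorch/XLA-style `"xla"` device string used by AWS Neuron (the exact string is an assumption, not confirmed by the keynote); the CPU fallback keeps the sketch runnable without any accelerator:

```python
import torch

def pick_device(prefer: str) -> torch.device:
    """Pick an accelerator if present, else fall back to CPU so this runs anywhere."""
    if prefer == "cuda" and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device("cuda")       # before: existing GPU code path
# device = torch.device("xla")     # after: Trainium via PyTorch/XLA (the claimed one-line change)

# Everything downstream is unchanged: the model and tensors simply follow `device`.
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)
print(model(x).shape)  # torch.Size([3, 2])
```

The point of the claim is that model, optimizer, and data-loading code stay untouched; only the device selection line differs between the GPU and Trainium paths.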
✳️ Live Experience
(Caption: This session used to be held on Monday evenings, originally known as Monday Night Live, and I often missed it due to dinner or meeting conflicts, only catching up later. This year it clashed with other commitments again, so here are some stand-in photos: the Amazon EC2 UltraServer displayed on the CEO Keynote stage, and the cute, mischievous Kiro who loves playing hide-and-seek. They offer a glimpse of the physical and protocol world that truly exists behind the cloud's abstraction.)