re:Invent 2025 Day 4: The Infrastructure Behind Agentic AI

When the curtain lifts on infrastructure, you understand why AI has a future at all.

There is a rhythm to a Peter DeSantis keynote.
You sit down expecting a product announcement, and you end up in a degree-level lecture on why the future of AI rests on the parts nobody sees. This morning at re:Invent was exactly that. A full room, a quiet intro, then a dive straight across the boundaries of computer science, silicon strategy, and the economics of AI.


What does AI mean for the cloud?

Peter opened with five fundamentals that are not changing simply because we now live in an agentic world.

Security
Bad actors use the same AI we do. The cloud provider that wins is the one that assumes attackers are equipped, clever, and fast.

Availability
Scale has never mattered more. If our workloads become model-driven and real-time, downtime hits harder.

Elasticity
Nobody wants to capacity plan a GPU fleet. AWS is already scaling GPU infrastructure exponentially. That growth was charted live on stage.

Cost
AI is expensive. The way to lower the cost curve is not discounts. It is silicon. That is why AWS invests in Trainium and Graviton instead of waiting for somebody else to fix the economics.

Agility
AI is evolving at a pace that kills slow decision making. If teams cannot move quickly, they cannot adopt or compete.

The message was simple.
Infrastructure is not a background feature. It is the product now.

A trip back to 2010

We went back in time to EC2’s early days. Everything ran well, except a frustrating subset of workloads. Latency spikes. Jitter. The virtualisation layer was the enemy. The answer was not a faster hypervisor, but a new direction.

This became Nitro.
Virtualisation offloaded to dedicated hardware. Noisy-neighbour problems removed. Predictable performance without giving up the cloud. Nitro is now literally in computer science textbooks. Peter held one up and then handed out a thousand copies.

That was the energy of the morning. Build the hard things yourself or someone else will own your destiny.

Graviton5 and M9g instances

Dave Brown took the stage with architecture diagrams. Cache sizes. Thermal design. And performance curves. If you like silicon, this was your Disneyland.

The next generation of AWS silicon introduces a surprising amount of innovation outside the chip itself. AWS redesigned the cooling path by removing unnecessary layers inside the package, cutting fan power and improving heat transfer. Small change. Big impact at hyperscale.

New instance family

Announcing Amazon EC2 M9g instances powered by Graviton5
Preview URL: https://aws.amazon.com/about-aws/whats-new/2025/12/ec2-m9g-instances-graviton5-processors-preview/

AWS is reporting:

  • Up to 35 percent faster for machine learning inference
  • Up to 35 percent faster for web applications
  • Up to 30 percent faster database performance

Teams like Airbnb, Atlassian, and SAP are already using it. In truth this was a double announcement: a new chip and a new instance family.
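For teams that want to kick the tires once the preview opens, launching one is the same boto3 call as any other family. A minimal sketch, assuming the preview exposes a size named m9g.large; the size, AMI, and region below are my placeholders, not confirmed details:

```python
import boto3

# Region, AMI, and instance size are placeholders. "m9g.large" follows
# the m8g naming convention but is an assumption; check the preview
# documentation for the actual sizes on offer.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # any arm64 AMI, e.g. Amazon Linux 2023
    InstanceType="m9g.large",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```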

Innovate faster by shrinking the compute

The story shifted to serverless. The early philosophy behind Lambda. The era when compute had to be attached to storage. Why people wanted a service where they could just upload code and let AWS do the rest. Then the most interesting line of the morning.

Lambda and EC2 are now under one team.
Compute is no longer boxes. It is a spectrum.

The question is no longer whether serverless can replace servers.
The question is what happens when serverless adopts the flexibility of EC2.

This is where Lambda Managed Instances fits. It was not a new announcement, but today it was given its positioning: Lambda on EC2 is not a workaround. It is one point on a spectrum of compute moving toward a single mental model.
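The cleanest way to see that mental model is from inside a function. A minimal sketch of an ordinary Python handler: nothing here would need to change if the same function were placed on EC2-backed managed capacity rather than classic Lambda workers, which is the whole point of the spectrum argument.

```python
import json

def handler(event, context):
    # An ordinary Lambda handler. The placement decision (classic Lambda
    # workers vs. EC2-backed managed instances) lives in configuration,
    # not in the code.
    body = json.loads(event.get("body") or "{}")
    result = {"echo": body, "request_id": context.aws_request_id}
    return {"statusCode": 200, "body": json.dumps(result)}
```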

AI elasticity at cloud scale

Running inference is not about plugging models into GPUs. Traffic is irregular. Agent workloads do not behave like batch jobs. And the classic ELB-to-GPU workflow wastes capacity.

This is where Project Mantle was introduced. The idea is simple.
Use tiers. Automatically shift AI workloads across different accelerator pools and cost boundaries.

Elasticity for AI, the way EC2 once gave elasticity to compute.
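AWS did not show Mantle's internals, so the following is purely my own illustration of the tiering idea, not AWS code: requests land on the cheapest accelerator pool with headroom and spill upward when a tier saturates.

```python
from dataclasses import dataclass

@dataclass
class AcceleratorPool:
    name: str
    cost_per_hour: float  # relative cost of the tier
    capacity: int         # concurrent requests the pool can absorb
    in_flight: int = 0

    def has_room(self) -> bool:
        return self.in_flight < self.capacity

# Tiers ordered cheapest first; the names and numbers are illustrative.
tiers = [
    AcceleratorPool("reserved-trainium", cost_per_hour=1.0, capacity=4),
    AcceleratorPool("shared-gpu", cost_per_hour=2.5, capacity=8),
    AcceleratorPool("on-demand-gpu", cost_per_hour=5.0, capacity=64),
]

def route(request_id: str) -> str:
    """Place a request on the cheapest tier with headroom, spilling upward."""
    for pool in tiers:
        if pool.has_room():
            pool.in_flight += 1
            return f"{request_id} -> {pool.name}"
    raise RuntimeError("all tiers saturated: queue or shed load")

for i in range(6):
    print(route(f"req-{i}"))
```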

This ties in with announcements across Bedrock this week, including:

  • Identity for agents
  • Lambda functions inside AI agents (see the sketch after this list)
  • CloudWatch observability for Bedrock

It is not a collection of features. It is a foundation.
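The Lambda-inside-agents piece is something you can already wire up with Agents for Amazon Bedrock. A minimal sketch, assuming an existing agent and a deployed function; the IDs, ARN, and schema below are placeholders, and the exact parameter shape should be checked against the current boto3 docs:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Attach a deployed Lambda function to an existing Bedrock agent as an
# action group, so the agent can invoke it as a tool. The agent ID,
# function ARN, and schema are placeholders.
bedrock_agent.create_agent_action_group(
    agentId="AGENT_ID",
    agentVersion="DRAFT",
    actionGroupName="order-lookup",
    actionGroupExecutor={
        "lambda": "arn:aws:lambda:us-east-1:123456789012:function:order-lookup"
    },
    functionSchema={
        "functions": [
            {
                "name": "get_order_status",
                "description": "Look up the status of an order by its ID.",
                "parameters": {
                    "order_id": {
                        "type": "string",
                        "description": "Order identifier",
                        "required": True,
                    }
                },
            }
        ]
    },
)
```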

Vectors and the rise of workload specific infrastructure

The final section was a full whiteboard session on vector systems. Multimodal embeddings. Approximate search performance. Why a vector database is only as fast as the structure of the embedding itself.
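To make the whiteboard argument concrete: exact nearest-neighbour search is a full scan over every stored vector, which is why approximate indexes, and well-structured embeddings, matter at scale. A minimal numpy sketch of the exact version:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 768)).astype(np.float32)  # 10k embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)     # unit-normalise

def top_k(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact cosine-similarity search: O(N * d) work on every query."""
    query = query / np.linalg.norm(query)
    scores = corpus @ query          # cosine similarity on unit vectors
    return np.argsort(-scores)[:k]   # indices of the k nearest embeddings

print(top_k(rng.normal(size=768).astype(np.float32)))
```

Approximate indexes such as HNSW exist precisely to avoid that full scan, trading a little recall for sublinear query time.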

That argument led straight to S3 Vectors and OpenSearch acceleration. AWS is pushing vector performance deeper into native services. Not another product. Infrastructure becoming smarter because it needs to be.

Customer story: TwelveLabs.
High-velocity video search. A real example of complex embeddings turning into production workloads.

The throughline

By the end of the keynote the lesson was obvious.
AI is only as intelligent as the infrastructure that carries it.

We can talk about agents all week, but if we cannot train, scale, cool, recover, reason, and maintain performance, then agentic AI becomes a toy. Today was a reminder that model innovation and silicon are now the same story.

This was the final full day of re:Invent. We had two keynotes today, and the second still needs telling. Werner's closing chapter deserves a post of its own.

Stay tuned.