Context Breakdown: AWS re:Invent 2023 Monday Night Live Keynote with Peter DeSantis

Post Title Image

Background

Since being selected as an AWS Community Hero in early 2020, I was honored to once again be invited to AWS's annual developer conference, re:Invent 2023, in Las Vegas this year. Getting to experience all the on-site activities again was a real pleasure.

This is the fourth year in a row that I have broken down the AWS re:Invent keynotes. Beyond the CEO keynote that draws the most attention, last year I started covering the other keynote sessions as well; I found different takeaways in the process, and I am sharing those notes here.

As in previous years, I start by sketching out the structure of the talk, add some observations and inferences, and then include running notes for each segment to make future searches easier. Further-reading links are collected throughout the article and at the end to give you more background on the topics covered. I also hope to reason through why a given product or feature is being launched at this particular moment and where trends across global industries are heading. The topics Taiwan's industry cares about are not necessarily the same, but looking more, comparing more, and referencing more may help us avoid some misjudgments; with limited resources, if we head in the wrong direction, the market will not necessarily give us a chance to tear it down and start over. These inferences are not necessarily correct; treat them as practice and sharing.

New services and new capabilities are marked with [NEW 🚀] throughout this article, so you can find them quickly with Command/Ctrl+F.

I have deliberately removed most product links from this article so you can read without distraction (we are all a little short on focus these days, right?). If you need product links, see the AWS product list I maintain.

Feel free to leave a comment and share your thoughts. If you are interested in how I organize this knowledge and material, see my personal knowledge management system - Ernest PKM.

Let's get started!


Table of Contents


Summary (tl;dr)

  • Monday Night Live = Lager, Laugh, Launch, Learn, and Lager (beer).
  • Six attributes of a cloud computing service = Elasticity, Cost, Sustainability, Security, Availability, Performance.
  • Serverless & Quantum computing.
  • Riot Games: Make a product ask to AWS.

Talk Structure

  • Opening
  • Serverless
    • The promise
    • 6 attributes
    • Why isn’t everything we do serverless?
    • Pushing in new directions
  • Road to Serverless (the major part)
    • Database Sharding
    • Isolation & Consistency
    • Time Sync
    • AI-driven scaling and optimizations
  • Quantum Computing
    • Bit vs Qubit
    • History
    • Error Rate
    • Quantum Device, Equipment, and Computer
  • Closing

Full Notes


Opening

  • (Peter walks on the stage)
  • As many of you might know, we like to do things a little differently on Monday night,
    • but for those of you that don’t know, just remember the five Ls:
    • Lager, Laugh, Launch, Learn, and Lager (beer).

Serverless


ℹ️ The promise

  • Last couple years, we’ve been talking about some of the largest serverless services that we build on top of AWS’s massive infrastructure: S3, Lambda, if you include DynamoDB, these are the canonical examples of delivering on the promise of serverless computing.
  • Serverless promises to remove the muck of caring for servers.
  • No need to upgrade software, patch operating systems, retire old hosts, qualify new hosts.
  • With serverless, it all goes away, but this is just the beginning.

ℹ️ 6 attributes

  • the six most important attributes of a cloud computing service.
  • These are the things that we spend a ton of time putting into all of our services, but serverless services let us take these things a step further.

1) Elasticity

  • For example, serverless is more elastic.
  • Because we run over our vast infrastructure and share our capacity across a large number of customers, each customer workload represents a small fraction of the capacity.

2) Cost

  • Serverless is more cost effective.
  • Unlike a service where you have to pay for what you provision, with a serverless capability, you only pay for what you use.

3) Sustainability

  • Serverless computing allows us to run our infrastructure more efficiently, and that means better sustainability, because the most efficient power is the power you don’t use.

4) Security

5) Availability

  • And because these services are built from the ground up to run on AWS’s infrastructure, they deliver better security and better availability, taking advantage of native capabilities, like AWS Nitro and our Availability Zone architecture.

6) Performance

  • ℹ️ Further reading: AWS Well-Architected and the Six Pillars 1

ℹ️ Why isn’t everything we do serverless?

  • Well, I think there’s a couple of reasons for this.

1) Familiarity and legacy code

  • The first is familiarity of legacy code.
  • One of the things we know is that, over the long term, change is inevitable, but change over the short term is hard.
  • Example - mainframe
    • If you doubt this, consider the mainframe.
    • It’s not that developers love writing and maintaining applications on the mainframe or the innovation of the mainframe that keeps people there.
  • It’s because it’s hard and expensive to move things off these systems, and even in less extreme examples, developers need time to learn new systems and approaches.

2) Richness of capability.

  • To deliver on the promise of serverless capabilities, we deliberately introduced more targeted product offerings initially.
  • For example, at launch, DynamoDB 2 offered high performance reads and writes, but with minimal query semantics, useful for a broad range of applications, but a far cry from what a traditional SQL database could do at the time.
  • Now, we started off pretty simple, literally, with Simple Storage Service 3 and Simple Queue Service in 2006, but we’ve been working really hard over the years to add serverless capabilities.
  • In fact, we’ve accelerated our pace of innovation, and some of these features and capabilities changed the way these services can be used entirely.
  • For example, with DynamoDB in 2018, we added the ability to have transactions, and this made DynamoDB a much better replacement for a traditional relational database, and of course, we launched completely new capabilities, including EFS, our Elastic File System, Lambda, which pioneers serverless computing as a service, and Fargate, which allows you to run serverless containers.
  • Now, there’s amazing stories of technical innovations underpinning all of these capabilities, but tonight, I want to focus on a different story.

ℹ️ Pushing in new directions

  • One of the things that I love about working at Amazon is we reject the need to only pursue a single path.
  • ℹ️ Further reading: The Fallacy of a Single Perspective, Part 1 (單一觀點的謬誤,之一) - Ernest Talks
  • When you push us in different directions, our normal response is let’s go, and while customers love our serverless offerings, they also love when we innovate around the tools and software that they use today, and that’s why we support the broadest range of managed databases, file systems, operating systems, and open-source software.

Road to Serverless (the major part)


— Relational Databases —

  • We’re committed to making sure that AWS is the best place to run any of the software that you need, and this commitment is why we’ve been making a large investment in delivering the value of serverless computing to the server-full software that you love, and that’s the journey that I want to go on tonight.

✳️ Amazon RDS (2009)

  • And what better place to start this journey than with the relational database?
  • Historically, working with a database involved picking an instance configuration, SSHing onto the instance, installing a database, setting up the database, probably configuring complex replication schemes, and don’t forget the daily joy of patching and maintaining the database, or the occasional thrill of upgrading the database to a new version.
  • And that’s why, in 2009, we announced Amazon Relational Database Service.
  • As we described in the launch post, RDS’s goal was to make it easier to set up, operate, and scale a relational database.

✳️ Amazon Aurora (2014)

  • In the same way that EC2 removes the muck of managing an instance, RDS removes the muck of running a database, but how do you go from a managed database offering to something serverless?
  • Well, with years of innovation. There are a lot of innovations underpinning Aurora, which provides a fully Postgres-compatible and MySQL-compatible database, but the biggest innovation of Aurora is its internal database-optimized distributed storage system, something we internally refer to as Grover.

Grover

  • Grover allows us to disaggregate our database from the storage itself.
  • Now, at first blush, this might not sound all that impressive.
  • RDS uses EBS, and EBS is disaggregated storage, right?
  • True, but EBS just provides you with the ability to configure a better instance for your database.
  • That’s nice, but Grover does a lot more.

Main components of a DBMS

  • This diagram represents how most modern relational databases are built.

The log is the database

  • Aurora and Grover are built in much the same way, but the focus of their architecture is on one component, and that component’s the log.
  • The database log is an essential element to the capabilities that we expect from a relational database, and perhaps less obviously, to the performance of the database itself.
  • The log is a meticulous record of everything that's happened inside the database.
  • Everything else is just a manifestation of that log that allows you to run queries quickly and transactions efficiently.
  • Rather than assure that every modified memory page is immediately synced to durable media, the database engine carefully logs every step it takes using a technique called write-ahead logging, and this log liberates the rest of the database engine to focus on performance while not needing to worry about maintaining consistency and durability, things that we treasure in our database.
  • The log can also be used to restore a database to any point in time.
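
To make the "the log is the database" idea concrete, here is a minimal sketch of write-ahead logging in Python. The class and field names are invented for illustration and are not Aurora's internals; the point is simply that a change is made durable in the log before any page is modified, so replaying the log reconstructs the database.

```python
import json, os, tempfile

class WriteAheadLog:
    """Minimal write-ahead log: every change is made durable in the log
    before the in-memory page is touched (illustrative only)."""

    def __init__(self, path):
        self.f = open(path, "a+", encoding="utf-8")

    def append(self, lsn, page_id, key, value):
        record = {"lsn": lsn, "page": page_id, "key": key, "value": value}
        self.f.write(json.dumps(record) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())           # durable before the page is modified
        return record

    def replay(self):
        """Recovery: the log alone is enough to rebuild every page."""
        self.f.seek(0)
        pages = {}
        for line in self.f:
            r = json.loads(line)
            pages.setdefault(r["page"], {})[r["key"]] = r["value"]
        return pages

# Usage: log first, then mutate the in-memory page cache.
wal = WriteAheadLog(os.path.join(tempfile.mkdtemp(), "db.log"))
pages = {}
rec = wal.append(lsn=1, page_id="p1", key="user:42", value="Ada")
pages.setdefault(rec["page"], {})[rec["key"]] = rec["value"]
assert wal.replay() == pages                # the log *is* the database
```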

Grover has the log

  • If you have the log, you have the database, and with Aurora, Grover has the log.
  • Rather than logging locally, Aurora databases send each of their log entries to Grover, and Grover immediately assures the durability and availability of those log entries by replicating them to multiple availability zones, but this is only part of what makes Grover so powerful.
  • Grover doesn’t just store the log.

Grover log rewrites

  • It actually processes the log, and it creates an identical copy of the database’s internal memory structures on the remote system, and these data structures can be sent back to the Aurora database anytime they need it, so they can be loaded into the database’s memory.
  • Now, the primary benefit of this is that it significantly reduces the IO on the main database.
  • Unlike a traditional database, Aurora no longer needs to write its dirty memory pages to durable storage.
  • It only needs to log to Grover, and writing a log involves a relatively small amount of sequential IO, something that can be done quite efficiently.
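
Here is a toy sketch of the idea described above, assuming nothing about how Grover is actually implemented: a remote storage tier consumes the log stream, replicates it, and materializes page images, so the primary database only ever performs small sequential log writes.

```python
class GroverLikeStore:
    """Toy model of a log-consuming storage tier (not the real Grover):
    it replicates each log record and keeps materialized page images."""

    def __init__(self, replicas=3):
        self.logs = [[] for _ in range(replicas)]    # stand-in for multi-AZ copies
        self.pages = {}                              # materialized page images

    def append(self, record):
        for log in self.logs:                        # replicate before acknowledging
            log.append(record)
        page = self.pages.setdefault(record["page"], {})
        page[record["key"]] = record["value"]        # apply log entry -> page image

    def fetch_page(self, page_id):
        """The database can pull a ready-made page instead of rebuilding it."""
        return dict(self.pages.get(page_id, {}))

store = GroverLikeStore()
store.append({"page": "p1", "key": "user:42", "value": "Ada"})
store.append({"page": "p1", "key": "user:43", "value": "Grace"})
print(store.fetch_page("p1"))   # {'user:42': 'Ada', 'user:43': 'Grace'}
```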

80% less I/O

  • In fact, Grover can reduce the IO demands of the Aurora database's storage system by 80%,

1) 3~5x price performance

  • and that’s why, with Aurora, you get three to five times price performance over the equivalent open-source managed databases.

2) Multi-AZ durability

  • Grover also provides the durability of multiple availability zones without needing to setup database replication.
  • If your Aurora database or even a whole availability zone goes down, you can relaunch your Aurora database in any of the other availability zones.

3) Ability to scale out read replicas

  • Aurora also allows you to easily and efficiently scale out by adding read replicas, and of course, you get serverless scaling of your database storage.

4) Serverless scaling of database storage

  • Because each Aurora database has access to Grover’s multi-tenant distributed storage service, it can scale seamlessly and efficiently from a single table to a massive database, and when the database gets smaller, that’s taken care of too.
  • Drop a large index; stop paying for the index.
  • That’s how serverless works.

ℹ️ Are we there yet?

    • With our launch of Aurora, we took a big step forward on our journey to making the relational database less server-full and more serverless, but it’s still a far cry from a real serverless service.
    • For example, while Aurora helps you easily scale out the read capacity of your database by adding read replicas, you still need to update the primary database if you need more write capacity, and an upgrade or a downgrade of a server size requires you to failover your database, and that is not very serverless, dude.

✳️ Amazon Aurora Serverless (2018)

  • So, that’s what led us to launch Aurora Serverless.
  • Aurora Serverless scales up and down seamlessly as your database load changes, without needing to resize or failover the database.

Databases on baremetal

  • Now, how do we make an elastic relational database that can grow and shrink without a failover?
  • Well, one obvious way to do this is to run the database on a very large physical server and let it grow as needed, and databases are good at this.
  • They’re used to asking the operating system for more memory when they need it.
  • In our example here, our database is running on a physical host with 256 gigabytes of memory.
  • When it needs more memory, it simply asks the operating system, and this is great, except with all that extra memory sitting around, there’s going to be a lot of waste.
  • So, that’s not going to work.

Insufficient isolation

  • So, we could try running multiple databases on that same large server, allowing each database to grow and shrink as necessary, and this will make the shared pool of resources more efficient, but there’s a problem.
  • This involves sharing, and as we’ve discussed in the past, sharing is complicated.
  • At AWS, we believe the only way to share server resources securely is with a hypervisor.
  • Others may have different views on this, but for us, processes simply aren't an adequate security boundary.
  • We also don’t consider containers, which are really processes under the covers, to be adequate ways to isolate workloads.

Strong hypervisor level isolation

  • Our only abstraction for isolating customer workload is to use a hypervisor.
  • Our Nitro Hypervisor provides purpose-built capabilities to deliver consistent performance on our EC2 instances.

Example - 8GB instance with a Nitro Hypervisor

  • Let’s look at what happens when we create an eight-gigabyte instance with a Nitro Hypervisor.
  • The hypervisor allocates eight gigabytes of physical memory to the instance, and the guest thinks it has eight gigabytes of memory.
  • When you launch an EC2 instance, you can rest assured that the Nitro Hypervisor is reserving the memory and CPU resources that your instance comes with.
  • This is why EC2 instances provide such consistent performance, regardless of what other instances are doing.
  • And this is great for a database.

Scaleup/down still requires reboot

  • Databases love consistency, but what happens when our database requests more memory?
  • Well, the OS simply doesn’t have anything to give.
  • Even though there’s available resources on this host, the guest is already configured statically with a smaller memory allotment.
  • So, our only option to grow this database would be to reboot, and that’s not much better than getting a whole new instance and failing over.
  • So, Nitro isn’t going to help us.

Caspian - Cooperative oversubscription

  • We need an entirely different approach, and that is why we built Caspian.
  • Caspian is a combination of innovations that span a new hypervisor, a heat management planning system, and a few changes to the database engine itself.
  • Together, these innovations enable Aurora Serverless databases to resize in milliseconds in response to changing load on the database, using an approach, which we’ll look at, called cooperative oversubscription.

Dynamic resource allocation

  • A Caspian instance is always set up to support the maximum amount of memory available on the host that it's running on.
  • In our example here, that’ll be 256 gigabytes, but unlike Nitro, these resources are not allocated to the hypervisor on the physical host.
  • Instead, physical memory is allocated separately based on the actual needs of the database running on the instance, and this process is controlled by the Caspian Heat Management System.
  • So, let’s see what happens when we add a database to our instance.

Example - 16GB instances with Caspian hypervisor

  • Here, we see a single Caspian instance running on our host with 256 gigabytes of memory.
  • The instance only needs 16 gigabytes of memory to run its database.
  • So, it’s asked and been granted that memory by the heat management system.
  • The important thing to take away here is that the database is running on an OS that believes it has 256 gigabytes of memory, but under the covers, we’re only using 16 gigabytes of memory.

Example - multiple instances with Caspian hypervisor

  • Just like in our original process-based example, Caspian can run multiple databases and allow them to efficiently share the resources of the underlying host, but unlike our original example, with Caspian, we get all the security and isolation of a hypervisor.
  • So, this seems to work great, but what happens when our databases need more memory?

Caspian Heat Management System

  • The Caspian Heat Management System is responsible for managing the resources of the underlying physical host.
  • When a database wants to grow, it must first ask the heat management system for resources, and when additional resources are available, the heat management system can simply say, yes, and the database can instantly scale.
  • But what happens when we run out of memory?
  • In this case, the Caspian Heat Management system replies with, please wait, and then it proceeds to migrate one of the Caspian instances to another physical host with available capacity.
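
Below is a simplified sketch of the grant-or-migrate decision described above. `HeatManager` and all of its fields are invented for illustration; the real system adds prediction and live migration, which this toy version only simulates by moving a dictionary entry between hosts.

```python
class HeatManager:
    """Toy cooperative-oversubscription planner: grant memory if the host
    has headroom, otherwise migrate an instance to a host that does."""

    def __init__(self, hosts):
        # hosts: {host_id: {"capacity_gb": int, "instances": {inst_id: used_gb}}}
        self.hosts = hosts

    def used(self, host_id):
        return sum(self.hosts[host_id]["instances"].values())

    def request_memory(self, host_id, inst_id, extra_gb):
        host = self.hosts[host_id]
        if self.used(host_id) + extra_gb <= host["capacity_gb"]:
            host["instances"][inst_id] += extra_gb
            return "granted"
        # "please wait": move this instance to a host with enough headroom
        need = host["instances"][inst_id] + extra_gb
        for other_id, other in self.hosts.items():
            if other_id != host_id and self.used(other_id) + need <= other["capacity_gb"]:
                other["instances"][inst_id] = need
                del host["instances"][inst_id]
                return f"migrated to {other_id}, then granted"
        return "denied (no capacity anywhere)"

hm = HeatManager({
    "host-a": {"capacity_gb": 256, "instances": {"db-1": 200, "db-2": 40}},
    "host-b": {"capacity_gb": 256, "instances": {"db-3": 64}},
})
print(hm.request_memory("host-a", "db-2", 32))  # no headroom -> migrate to host-b
```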

Amazon EC2 live migration

  • Doing this migration quickly is enabled by high bandwidth, low jitter networking, provided by the EC2 instance, and it results in almost no performance impact to the database while it is being migrated, and after the migration’s complete, our database can scale again. —> WHY? Any deep dive material?

Caspian Heat Management System Prediction

  • Now, of course, the best time to have resources available is when you need them, and so, the Caspian Heat Management System is actually constantly predicting which databases are going to need memory and optimizing the fleet.
  • Here you can see a portion of our production Caspian fleet scaling up and scaling down as load changes, and you can see things are changing all the time, but the heat stays balanced across the fleet.
  • So, Caspian allows us to provide a scalable database and run our infrastructure efficiently.

ℹ️ Are we there yet?

  • Now, we’re getting pretty close to serverless, but are we there yet?
  • No.
  • We haven’t quite reached our destination.
  • What happens when the resources we require extend beyond the limits of the physical host that we’re running on?
  • Well, there’s nothing we can do.
  • We're still limited by the size of the physical server, and that's not serverless.

✳️ Database Sharding —> the problem

Definition

  • Database sharding is a well-known technique for improving a database’s performance beyond the limits of a single server.
  • It involves horizontally partitioning your data into subsets and distributing it to a bunch of physically separated database servers, called shards.
  • To shard a database effectively, the goal is to identify a way to partition your data such that all the data needed for frequent accesses resides on one shard.
  • In this way, the shard’s able to execute the transaction locally and as efficiently as a monolithic database.
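
Here is a minimal sketch of the partitioning idea, assuming a simple hash-based shard key; real systems, including the routing layer described later in the talk, add rebalancing and cross-shard transactions on top of this.

```python
import hashlib

class ShardedTable:
    """Toy horizontal partitioning: a shard key maps each row to one shard,
    so frequent single-key transactions stay local to that shard."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, shard_key):
        digest = hashlib.sha256(str(shard_key).encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, shard_key, row):
        self.shards[self._shard_for(shard_key)][shard_key] = row

    def get(self, shard_key):
        return self.shards[self._shard_for(shard_key)].get(shard_key)

orders = ShardedTable(num_shards=4)
orders.put("customer-17", {"order": 1, "total": 42})
print(orders.get("customer-17"))          # served entirely by one shard
```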

Schema Design and Access Patterns

  • With a little thought in your data schema and application design, database sharding offers a powerful tool to eliminate the scaling limitations of a scale-up database and remove the limits of a server.

Challenges of Sharding

  • However, there’s a bunch of operational complexity involved in managing a sharded database.
  • 1) Managing custom routing and orchestration layer
    • First, you need to write your own routing and orchestration layer.
  • 2) Re-sharding and managing scale-out
    • Next, you need to set up and manage all those shards, and if you need massive scalability, you might be managing dozens or even hundreds of shards, and with most applications, load is not uniform across all the shards.
    • So, you need to worry about scaling each of these shards up and down based on load, and at some point, you’re likely to need to repartition the shards, and moving this data around while the database is operating is a complicated operational task.
  • 3) Handling distributed transactions
    • Finally, things get really complicated when you need to make transactional changes across multiple shards.
  • What would sharding look like in a serverless world?
  • So, we’ve asked ourselves, what would database sharding look like in a serverless world, and that’s why, tonight, I’m excited to announce Aurora Limitless Database.

✳️ [NEW 🚀] Amazon Aurora Limitless Database - Managed horizontal scale-out beyond the limits of a single instance (Available In Preview Today)

Product Details

  • With Limitless Database, there’s no need to worry about the routing of your queries to the correct database shard.
  • Your application just connects to a single endpoint and has the scalability of a sharded database.
  • Aurora Limitless Database automatically distributes data across multiple shards, and you can configure Aurora Limitless Database to co-locate rows from different tables on the same shard to minimize having to query multiple shards and maximize your performance, but unlike common sharding approaches, Aurora Limitless Database provides transactional consistency across all your shards.
  • For peak performance, you still want to localize transactions on shards as much as possible, but as I’ll show you shortly, Aurora Limitless Database uses a unique approach to making these cross-shard transactions perform very well.
  • This probably sounds too good to be true.
  • So, let’s have a quick look at how it works.

Innovation 1: New Request Routing Layer

  • As I mentioned earlier, to distribute each of your queries to a sharded database, you need a routing and query orchestration layer.
    • So, of course, we built one of those, and with Aurora Limitless Database, we made a couple of important design decisions.
  • First, we designed our routing layer to require as little database state as possible.
    • Our routers only need a small bit of slowly changing data to understand the schema of the database and the shard partition scheme, and keeping this layer lightweight means that we can scale quickly, and it allows us to run across multiple availability zones efficiently, providing high availability without the need for customers to manage complex replication.
  • Second, each of the routers is actually an Aurora database.
    • So, we can orchestrate complex queries across multiple database shards and combine the results, allowing you to run distributed transactions across your entire sharded database.

Innovation 2: Fully Elastic Shards

  • Now, the second big challenge of operating a sharded database is managing all the shards.
  • Because load varies across the shards, traditional sharded databases require considerable operational work to optimize performance and cost.
  • Fortunately, every one of the Limitless Database shards runs on Caspian, and this allows each shard to scale up and down as needed, to a point.
  • What happens when we get to the largest database that we can support on a Caspian server?
  • We’ve been here before.
  • Well, fortunately, we have a better option than we do with a non-sharded database.
  • We can split our shard into two new shards, and this is easy to do, because Grover makes it easy for us to clone our database and repartition, and once created, we can use our router fleet to easily and transparently update the routing layer without the database clients seeing any change at all.
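
A toy sketch of that split, using an invented range-partitioned routing table: splitting a shard is just copying its rows into two halves and inserting one new boundary, and clients keep calling the same router without seeing any change.

```python
import bisect

class RangeRouter:
    """Toy routing table for range-partitioned shards: bounds[i] is the
    (exclusive) upper key bound of shard i.  Splitting a hot shard clones
    its data into two halves and inserts one new boundary."""

    def __init__(self):
        self.bounds = [float("inf")]            # one shard covering all keys
        self.shards = [dict()]

    def shard_for(self, key):
        return self.shards[bisect.bisect_right(self.bounds, key)]

    def split(self, index, split_key):
        old = self.shards[index]
        low = {k: v for k, v in old.items() if k < split_key}    # "clone" + repartition
        high = {k: v for k, v in old.items() if k >= split_key}
        self.shards[index:index + 1] = [low, high]
        self.bounds.insert(index, split_key)

router = RangeRouter()
for k in (10, 20, 30, 40):
    router.shard_for(k)[k] = f"row-{k}"
router.split(0, split_key=25)                   # the full shard splits in two
print(router.shard_for(30)[30])                 # 'row-30', no client-side change
```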

ℹ️ Are we there yet?

  • Now things are getting really serverless, but we have one more thing to think about.

✳️ Challenge: Isolation & Consistency

Sequence Number

  • We started off our discussion of relational databases this evening talking about how important an ordered log is to building a high-performance relational database, but how can you do this on a sharded distributed database?
  • On a single server, it’s pretty easy and efficient to maintain a sequence number and use it to order everything that’s happening on the database, but how can you accomplish this on a distributed database?

Options for distributed timekeeping

  • 1) Single Timekeeper
    • Well, it turns out you have a few options.
    • The first option is to have a single server maintain a sequence number and have all the databases coordinate with this server, but this is going to slow things down quite a bit, and it’s definitely not going to scale.
  • 2) Logical Clock
    • So, a second option is to use a logical clock.
    • A logical clock is basically a counter that gets passed around and incremented every time two servers interact.
    • Logical clocks avoid the scaling limitations of a serialization server, but these distributed logical clocks are very different than a simple sequence number, and to implement a traditional relational database on top of a logical clock would be quite an undertaking (a minimal logical-clock sketch follows this list).
  • 3) Wall Clock
    • So, fortunately, there’s a third option, and that’s to use wall clock time.
    • If we can have a synchronized clock across all the servers in our sharded database, then it would be easy to establish the order of events by simply comparing timestamps.
    • Now, this sounds like a promising solution, unless you’ve spent time with a clock.
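
For the logical-clock option (2 above), here is a minimal Lamport-clock sketch; it shows why such clocks give only a causal order driven by message exchange, which is quite different from a simple global sequence number.

```python
class LamportClock:
    """Toy logical clock: a counter that is incremented locally and merged
    on every message, giving a causal (not total) order of events."""

    def __init__(self):
        self.time = 0

    def tick(self):                 # local event
        self.time += 1
        return self.time

    def send(self):                 # stamp an outgoing message
        return self.tick()

    def receive(self, msg_time):    # merge the sender's clock, then tick
        self.time = max(self.time, msg_time)
        return self.tick()

a, b = LamportClock(), LamportClock()
a.tick(); a.tick()                  # a is "ahead" locally
stamp = a.send()                    # a -> b message carries timestamp 3
print(b.receive(stamp))             # b jumps to 4, preserving causal order
```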

Wall clock variance results in unreliable timekeeping

    • The average clock in a server drifts by about one second per month.
    • Some will gain time, some will lose time.
    • Some will drift a little less, some will drift a little more.

✳️ Amazon Time Sync Service (2017)

Product Details

  • Now, a second in a month may not sound like that much, but it’s more than enough to make the clock pretty much worthless as a sequence number, and of course, the solution to this is to sync your clocks, and that’s why, five years ago, we launched Amazon Time Sync Service to help EC2 users sync their clocks.
  • Time Sync provides an easy way to keep EC2 instances accurate to within a millisecond.

Impact of clock skew on performance

  • So, how would this sort of accuracy, one millisecond, help with our database ordering problem?
  • In a distributed system, if you want to use a clock to order actions, you’re constrained by the accuracy of your clock.
  • You actually have to wait until you’re certain that your local clock is ahead of all the other clocks in the system, and it turns out that this actually requires you to wait for twice the amount of time that your clock could be inaccurate, because you have to account for some clocks being faster and some clocks being slower.
  • So, with our one-millisecond clock sync, we can only order 500 things per second, and that’s not a very high number when you’re trying to build a high-performance database.
  • So, a few years ago, we asked ourselves if we could find a better solution to synchronize EC2 clocks, and this is the third innovation underpinning Aurora Limitless Database.
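
The 500-per-second figure follows directly from the wait rule above; here is a quick back-of-the-envelope check (the microsecond case anticipates the improved Time Sync discussed later in the talk):

```python
def max_ordered_events_per_second(clock_error_seconds):
    """Each event must wait ~2x the clock error bound before it can be
    safely ordered, so throughput is at most 1 / (2 * error)."""
    return 1.0 / (2 * clock_error_seconds)

print(max_ordered_events_per_second(1e-3))   # 1 ms sync  -> 500 events/s
print(max_ordered_events_per_second(1e-6))   # 1 us sync  -> 500,000 events/s
```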

Innovation 3: Reducing clock error bounds

  • We’ve changed the database to use wall clock to create a distributed database log that achieves very high performance, and it’s made possible by a very novel approach to synchronizing our EC2 clocks.

Clock Synchronization

  • Syncing the clock sounds like it should be as simple as one server telling another server what time it is, but of course, it's not that simple, because the time it takes to send a message from one server to another server varies, and without knowing this propagation time, it's impossible for those clocks to be synced with great precision.
  • Now, Time Sync protocols calculate this propagation by sending roundtrip messages and subtracting the time spent on one server from the time spent on the other server, and this sounds easy enough, but there are caveats.
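
The roundtrip calculation Peter refers to is the classic NTP-style estimate; here is a small sketch, assuming the outbound and return paths take equal time, which is exactly the assumption that the variability described next undermines.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """Classic NTP-style estimate from four timestamps (all in ms here):
    t0 client sends request, t1 server receives it,
    t2 server sends reply,   t3 client receives it.
    Assumes the outbound and return paths take equal time."""
    round_trip = (t3 - t0) - (t2 - t1)          # time spent on the wire, both ways
    offset = ((t1 - t0) + (t2 - t3)) / 2        # how far the client clock is behind
    return offset, round_trip

# Client clock 5 ms behind the server, 2 ms propagation each way:
print(estimate_offset_and_delay(t0=100.0, t1=107.0, t2=107.5, t3=104.5))
# -> (5.0, 4.0)
```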

Limitations of NTP/PTP

  • The first thing to understand about these clock sync protocols is they work over the same network that you’re sending your data.
  • They’re running on the same operating system, using the same network cards, traversing the same network devices, running over the same network fibers, and each of these things can create variability in the propagation time, and while it’s small, that variability limits how closely you can synchronize your clocks.
  • Additionally, these protocols rely on being able to update the timestamps in your network packets at the very instant they leave a server or a switch, and most hardware is not optimized to do this.
  • So, this too introduces variability.
  • Now, while the downsides I just discussed make things hard, the reality is you can do a pretty good job of syncing your clock on a small network, if you’re willing to devote a significant amount of your network to doing so, but it gets really hard to run these protocols at regional or global scale.
  • So, we decided to do something a bit more custom, and we were inspired by how clocks are synced in some of the most demanding environments, like particle accelerators, where having clocks synced as closely as possible all the time is a must.

AWS Nitro

  • It might not surprise you to hear this all begins with Nitro.
  • After all, many of my favorite stories begin with Nitro.
  • Nitro is the reason that AWS got started building custom chips, and it remains one of the most important reasons why AWS is leading the way with respect to performance and security in the cloud.
  • One of the things that Nitro enables, that would be really hard and expensive to do without Nitro, is that we can add very specialized capabilities to our EC2 instances at low cost, and that’s exactly what we did here.
  • Our latest generation Nitro chips have custom hardware to support accurately syncing their local clock based on a time pulse delivered by a custom designed time synchronization network.

Time Sync Infra Rack (Sync in all hardware!!)

  • Here, you can see a picture of one of our time synchronization racks.
  • At the top of the rack is a specialized reference clock that receives a very precise timing signal from a satellite-based atomic clock.
  • Now, these reference clocks provide incredible accuracy.
  • They can provide a synchronized clock anywhere in the world to within a few nanoseconds.
  • That’s a clock accurate to billionths of a second anywhere in the world.
  • Each of these racks also has a local atomic clock to keep time, in case that satellite is momentarily unavailable, and each of our availability zones has multiple of these time distribution racks.
  • Now, at the bottom of the rack, you can see the specialized time synchronization network that distributes this timing pulse.
  • Let’s have a look at that.
  • This is one of those Time Sync appliances, and in the middle, you see a Nitro chip, and on the right, you see an FPGA.
  • Together, these are used to implement our time synchronization network, but what you don't see on this slide is a network ASIC, and that’s because these devices don’t route packets.
  • Instead, they do one thing, and they do one thing only, and that is they synchronize clocks.
  • In combination with the specialized Nitro cards in our EC2 host, this network distributes a timing pulse directly to every EC2 server, and every step of this distribution is done in hardware.
  • There's no drivers or operating systems or network buffers to add variability.
  • How cool is this, a custom designed network for synchronizing clocks?
  • Well, it’s cool enough that we couldn’t keep it to ourselves, and so, a couple of weeks ago, we announced a new version of Amazon Time Sync.
  • Amazon Time Sync now gives you a way to synchronize to within microseconds of UTC on supported EC2 instances.
  • That’s a clock that’s accurate to millionths of a second anywhere in the world.
  • These accurate clocks can be used to more easily order application events, measure one-way network latency, and increase distributed application transaction speed, and of course, with Aurora Limitless Database, this means we can support hundreds of thousands of ordered events per second, and this is why we’re able to run those distributed transactions so efficiently, but our journey is far from over.

— Cache —

  • Relational databases are not the only server-full things that we’re investing in and reinventing as serverless.
  • Caches are another powerful tool; they’re used for improving latency and are also critical to cost-effectively scaling your services.

✳️ Amazon ElastiCache (2011)

Product Details

  • Amazon ElastiCache is our managed caching service.
  • Much like RDS, ElastiCache minimizes the muck of managing popular caching applications, like Redis or Memcached, and while ElastiCache greatly reduces the work in managing a cache, it’s not very serverless.
  • In fact, the first thing you do with ElastiCache is you select an instance, because unsurprisingly, caches are very tied to the servers that host them, and that’s because the performance of the cache relies on the memory of the server.

Challenges with predicting cache sizing

  • If your cache is too small, you're going to evict data that is going to be useful, and if your cache is too large, you're wasting money on memory you don't need.
  • Like Goldilocks searching for the perfect porridge, finding the perfect cache is an elusive goal.
  • Typically, you end up provisioning for peak, to assure that you have enough memory when you need it most, and this means that, most of the time, you’re likely running on too large a cache and wasting money, but with a serverless cache, you wouldn’t need to worry about this at all.
  • Well, today, I’m happy to tell you, you have a serverless cache: Amazon ElastiCache Serverless.

✳️ [NEW 🚀] Amazon ElastiCache Serverless - Serverless configuration for Amazon ElastiCache for Redis and Memcached (GA)

Product Details

  • With ElastiCache Serverless, there’s no infrastructure to manage and no capacity planning to do.
  • We get it right on your behalf.
  • A critical feature of any cache is speed, and we’ve covered you there as well.
  • The median latency of an ElastiCache lookup on serverless is about half a millisecond, and outlier latencies are great as well, and ElastiCache Serverless can scale well beyond what any single server could possibly do: five terabytes of memory.
  • So, I bet you think I’m going to tell you how this works.

How it works? (Caspian, Sharding)

  • I would, except you already know.
  • One of the things that I find exciting about building infrastructure in AWS is that, when we solve an interesting problem, we can also often use the solution to solve problems in other places, and Caspian’s ability to right-size in place is exactly what we need.
  • Now, hopefully, you recognize this diagram, and if you don’t, it’s probably time to lay off the extra lager you took on the way in, but what we’re looking at here isn’t Aurora Limitless Database.
  • It’s ElastiCache Serverless (Peter misspoke here and said Aurora Serverless), with the cache shards running on Caspian rather than the database shards, but everything else here works the same.
  • Shards can grow and shrink, and the Caspian Heat Management Service works its magic to keep the fleet well utilized. Of course, there’s one difference here, which is the request routing layer.
  • With ElastiCache, the key to this routing layer is assuring it’s really, really fast, because we can’t afford to add latency to any of the sharded cache requests, and that’s exactly what we did.
  • Customers are excited about the performance and agility of ElastiCache Serverless.
  • We know that removing the undifferentiated heavy lifting and muck is critical to enabling our customers to innovate, and one of my favorite parts of Monday Night Live is hearing from these customers.
  • One of those customers is Riot Games, who are making use of several AWS services to power their new experiences for gamers globally.

ℹ️ Customer: Head of Global Infrastructure and Operations, Riot Games

Introduction

  • To tell us more about their journey on AWS, please welcome Brent Rich, head of Global Infrastructure and Operations at Riot Games.
  • Hello, everyone.
  • As Peter said, my name is Brent Rich with Riot Games.
  • I was just in South Korea at our League of Legends World Championship that you saw at the end of the video, and it still takes my breath away.
  • It is my honor to be up here today to talk about our journey behind what you just saw and how, with AWS, Riot was able to supercharge our purpose to make it better to be a player.
  • Now, at times, the story may seem like it’s jumping back and forth, and things were happening all at once, and then that’s because they were, but with the help of AWS, we ended up doing it all, even when we had to change timelines or pivot priorities, and spoiler alert, it happened a lot.
  • For me, though, it all started five and a half years ago when I joined Riot Games.

Colocation (Colo) Datacenters - League of Legends (2009)

  • The thing is, back then, we were a single game company with all our eggs in one basket: League of Legends, and back in 2009, when League launched, it was completely reliant on Colo Datacenters, which we managed, because, let’s be honest, we didn’t trust that anyone else could meet our bar for making live games great, and that worked for about a decade,

Migrate to AWS (2017)

  • but by 2017, it was taking forever to get things done, and we knew that if over 100 million players worldwide didn’t see Riot investing in and loving our own game, they would leave, even if it is free to play.
  • So, we started looking at options, and we quickly settled on going all in on cloud.
  • That meant we’d migrate to AWS, which had the broad set of services that we needed to run Riot.
  • We also decided that all new things would be born in the cloud, and for us, that meant new games.
  • Now, let’s pause for a second, because if you didn’t notice, my intro stated that I work at a place called Riot Games, with an S, right?
  • Well, that S was the next big opportunity in early 2020 to collaborate on with AWS.

The global launch of Valorant (2020) —> Peeker’s Advantage

  • Valorant, a tactical shooter game, had very specific design goals, and one in particular presented a unique challenge, which we were able to address thanks to AWS’s Global Cloud infrastructure: Peeker's Advantage.
  • Peeker’s advantage occurs when a player in motion can push a corner, knowing there is a split-second delay for the defender on the other side, which allows the peeker to see the defender first. One way to address this in first-person shooter games is by adding more places for defenders to hide, but that increases the role of luck in the game versus it resting on the player’s skill alone, and to us, that just feels bad.
  • So, we made it a priority to mitigate this in Valorant, and we determined that, at highly competitive levels of the game, if the server tick rate is at least 128 per second (which requires a ton of compute) and network latency to players is under 35 milliseconds (requiring all those AWS locations), peekers have no advantage. If you’re not familiar with how this works technically: in the game, player actions and movements are sent to the server.
  • The server then updates and sends the simulation state of the ten players back to all of them, and it does this 128 times per second across the internet.
  • It’s this speed of updates between all ten players to the server and back again that effectively mitigates peeker’s advantage and makes playing much more fair, which we hope is more fun as a result, and by using AWS, Riot was able to launch Valorant globally with fairly low risk.
  • We were able to provision a bunch of cloud capacity all over the world with no long-term commits, leveraging AWS Regions, Local Zones, and Outposts.
  • This meant that, if a launch didn’t work out, we would simply shut it down and fail fast, and I’d go cry in a corner, and so, after all of that, 2020 was great, right?

Had a new problem (2020)

  • Not quite, unfortunately.
  • While we did have a massive success putting that S in riot games, the pandemic was causing havoc in the world of eSports.
  • All our competitions, like Worlds, had ground to a halt, and while games were doing great, eSports had no new content for them.
  • So, we had a new problem.
  • How could we reinvent remote broadcast and eliminate the need for onsite staff sitting in cramped trucks?
  • Of course, AWS had a solution for that.
  • We ended up enabling video encoding and production to happen in the cloud, with our folks accessing it through AWS Workspaces from home.
  • It was pretty wild.
  • From the proposal to rollout, it ended up taking all of 11 days, and today, we’ve matured even further.
  • We leverage AWS to remotely produce events all over the world, all from our remote broadcasting centers in Dublin, Ireland and Seattle, Washington.
  • This allows producers, editors, and casters to be in one place while all our events are another.

Migration Challenges

  • Okay.
  • So, now that we had eSports back producing world-class competitions, and Valorant was successfully launched and doing amazingly, it was time to modernize League and migrate to the cloud, but there were some challenges.
  • Like first, we had to figure out how to safely deploy, configure, and test 30+ microservices in a world where every service team was used to managing their services independently and how they wanted, and second, using Amazon EKS out of the box just didn’t meet our needs, because some of our games can run, on average, 35 minutes, and we couldn’t just pause or take a container out of service within 15 minutes if required for AWS maintenance.
  • So, we made a product ask, and AWS delivered a short-term solve and then followed through with a long-term solution for us, but we also knew there’d be some additional expected benefits along the way, and one was uptime.

Benefit: Uptime

  • In the old days when anything in a Riot datacenter failed in an unexpected way, it was often a large outage that lasted one to three hours, but once we got up and running at AWS, those outages instead turned to hiccups that players barely notice, and the other one was visibility.

Timeline

  • Anyone here ever spent an unreasonable amount of time trying to figure out what you have, how it’s configured, and who consumed what for whom?

Outcome Numbers

  • So did we, but with AWS, retrieving this data is pretty much now all an API call away, and so, with all these challenges and benefits, what did we get for it?
  • Well, we did migrate 14 datacenters.
  • We modernized a decade-old game for hundreds of millions of very loyal players around the world.
  • We basically rebuilt the plane in flight.
  • We modernized remote broadcasting, and we launched several global games in cloud over 36 months.
  • It was a lot.

Takeaway 1: Make the ask

  • Okay, as I wrap this up, I’d like you to remember that Riot didn’t actually do anything groundbreaking here.
  • When there were problems, we looked for solutions, and it just so happened that AWS had them for us, and so, a quick tip: make the ask.
  • It doesn’t matter what size company you are.
  • If it makes sense for the broader customer base, AWS just might do it, or like us, there might be a short-term solve,

Takeaway 2: With cloud, don’t assume what was true six months ago is true now

  • and one final takeaway: when you're looking at cloud and whether it serves your needs, don't assume that what was true six months ago is true now.
  • Cloud moves very fast and it’s always changing.
  • I know for a fact that we would’ve saved quite a few headaches if we were willing to reevaluate more often versus pursuing our own solutions.
  • So, please keep an open mind, and with that, thank you very much for listening to our journey, and I can’t wait to see what AWS comes out with next.
  • Thank you.

— Data Warehouse —

  • It’s exciting to see the growth of Riot, and how we worked together to enable amazing experiences for gamers.
  • Well, we’re coming to the end of our journey, but we have one more stop, and it’s a big one.
  • Let’s look at something that has a reputation for needing the biggest servers around: the data warehouse.
  • Data warehouses are specialized databases that are designed to work over massive datasets.
  • They serve a large number of users and process millions of queries that range from routine dashboard requests to ETL processes to complex ad-hoc queries.
  • And traditionally, they have some of the biggest servers available to ensure they have the resources they need for this business-critical workload, and if you’re taking care of a database, it’s not just picking the right server and storage you need to worry about.
  • Data warehouses come with a ton of knobs and options that can be used to optimize the performance of your data warehouse and achieve your cost objectives, and all of these knobs need to be manually tuned and retuned to achieve optimal results, which is why we challenged ourselves to remove this undifferentiated heavy lifting for you, so that you can focus on getting business value out of your data warehouse and not on getting a PhD in database systems management.

✳️ Amazon Redshift Serverless (2021)

  • That’s why we launched Redshift Serverless in 2021.
  • It automatically scales based on workload, optimizes the data layout of the data warehouse, and automates common data management operations, and while we’re happy with our early success, our most demanding Redshift customers have told us that, for some workloads, they still need to intervene.
  • Let me show you why.
  • A day in the life of a large production data warehouse might look something like this.
  • Most of the time, the data warehouse is running small, well-tuned queries to help with routine business processes and serve reports and dashboards.
  • It’s important to make sure those queries happen quickly and with predictable latency.
  • Nothing is worse than having the boss’s favorite dashboard load slowly, and there’s usually large periodic workloads, like ETL jobs, that happen hourly or maybe daily, and these need to be carefully managed to avoid interfering with those smaller workloads, and to make things even more interesting, every once in a while, some really smart data scientist runs a massive unexpected query that takes forever to finish and impacts everything else while it’s running.
  • So, let’s look at how Redshift Serverless manages these challenges.
  • Today, Redshift Serverless scales based on query volume, and in a world where all the queries are similar, this works really well.
  • Here, you can see a number of new queries coming into our data warehouse.
  • Each query is a circle.
  • Bigger circles are more complex queries, and the container at the bottom is the pre-configured Redshift capacity scaling unit, called an RPU.
  • The base RPU is set by the database administrator and is a unit of capacity that Redshift Serverless will use to scale your data warehouse.
  • Currently, we see a bunch of small-and-medium-sized queries coming in and being run successfully, but what happens when our load increases?
  • Redshift Serverless uses reactive scaling.
  • When a number of concurrent queries gets above a certain threshold, additional capacity is provisioned.
  • However, provisioning takes time, and this adds additional latency while the database is setting up additional capacity, and while this barely matters to long-running queries, it can make a big difference to those short-running queries.
  • Let’s hope none of those queries are on the boss’s dashboard.
  • Once the new cluster capacity comes online, everything’s back to healthy,
  • but as I mentioned, queries aren’t uniform, and there’s no magic baseline capacity unit that will optimize for all the work that a data warehouse will see.
  • Here’s our Redshift Serverless cluster database again, and everything’s running quite well, but what happens when that big ETL job starts?
  • It ends up in the same capacity as the boss’s dashboard, and that’s not great, because it’s a big query, and everything’s going to slow down a bit while it runs, and even though everything’s going slowly, the data warehouse doesn’t scale up, because we haven’t passed our query threshold.
  • How do customers deal with this today?
  • Well, with Redshift and other data warehouses, many customers create a second data warehouse to separate their ETL jobs.
  • This works, but it’s expensive, inefficient, and complex to manage.
  • Okay, let’s get back to our happy place.
  • Everything’s running great again, but not so fast.
  • Your favorite data scientist, Dave, just fired off the biggest, baddest query you've ever seen, and well, let’s just say everyone is probably calling you right about now, because everything is running really slowly.
  • How do we make sure that Dave can’t ruin your weekend?
  • Well, today, I’m excited to announce a massive set of improvements to Amazon Redshift Serverless using next generation AI-driven scaling and optimization.

✳️ [NEW 🚀] Amazon Redshift Serverless Next-generation AI-driven scaling and optimizations - New AI scaling techniques to automatically meet your performance targets (Available In Preview Today)

  • This is going to be exciting.
  • Our first innovation is building a detailed model of the expected load of the data warehouse to proactively scale our capacity.
  • By building a machine learning-based forecasting model trained on the historical data of the data warehouse, we can automatically and proactively adjust capacity based on anticipated query load, but of course, no matter how good we are at predicting what’ll happen next, we’re always going to be surprised.
  • So, we need to improve our ability to react and make good decisions in the moment.
  • For example, when Dave sends his next complex masterpiece, we want to avoid bogging down the production cluster and instead schedule that big query on its own capacity, and because Dave’s query is long-running, the extra time probably won’t be noticed by him, and actually, he might get better results, because by optimizing the infrastructure for Dave’s query, he might get his results sooner, but how do we understand each query in real-time as it arrives?
  • Like before, we use AI.
  • However, this time our machine learning models help us understand the resource demands and performance of each query. To do this, the query analyzer creates a feature embedding of each query, which takes into account over 50 unique features, things like query structure, types of joins used, and dataset statistics. These embeddings enable us to better identify the complexity of each query compared to a more naive approach, like just looking at the SQL text, because as datasets change, what was simple yesterday might not be so simple today (a sketch of this analysis cascade follows these notes).
  • Now, it turns out that 80% of the queries that run on a data warehouse have been seen before, and for these, we’re able to use our encoding to quickly look up information about the query and know, with high confidence, how it’ll perform and how it might be optimized to run faster.
  • But what if we haven’t seen a query before?
  • In this case, we have a second small model that’s trained on all the queries that have ever been run on your data warehouse, and because this model is small, it can generate a reasonable prediction about how quickly a query will execute, and speed on this part of the process is important, because if it’s a small query, like a dashboard request, we don’t want to delay it for long while we do more complex analysis.
  • We want to immediately execute it, but what if the query is complex?
  • What if Dave is sending us another masterpiece?
  • If our small model estimates that the query is complex, we perform another analysis with a much larger model that we’ve trained on all the data warehouse queries that we’ve ever run, enabling us to understand the likely performance of a query in more detail, and one of the things we can use this large model to predict is how a particular query will respond to different cluster resource levels, and it’s worth doing this additional work, because how queries scale is actually fairly hard to predict.
  • If everything scaled like this line, linearly, life would be easy, but sometimes adding more resources doesn’t result in significantly better performance.
  • We actually call this sublinear scaling, and with enough resources, every query exhibits this kind of scaling, but other times, adding resources can actually yield greater than linear performance benefits, or super-linear scaling.
  • This can happen, for example, if a query is memory constrained and constantly swapping memory in and out.
  • When you add enough memory to prevent the swapping, you can see a big performance gain.
  • So, when we're analyzing a query, we need to decide how much resource to allocate to the query on behalf of our customer, and if the query's exhibiting super-linear scaling, or even linear scaling, we want to add more resources, because the query will run faster, and there'll be no additional cost, win-win, but at a certain point with every query, adding more resources has diminishing returns.
  • You can still get performance improvements, but you do so at additional cost.
  • So, this is a harder question for AWS to answer.
  • When should we add resources, and when do we stop?
  • You need to be in charge of that decision, and that’s why you can now tell Redshift Serverless how to optimize your data warehouse query in situations like this.
  • You specify your cost and performance sweet spot.
  • On the far left, the data warehouse will run cost optimized and will only add resources if it’s cost neutral.
  • You move the slider to the right to tell us to more aggressively scale, even if adding additional capacity to go faster adds additional cost, and you pick the spot on this slider that makes sense for your business.
  • Now that we understand what’s happening, let’s look at how all these pieces come together.
  • Here’s our data warehouse, perfectly sized for steady state workloads of short-running queries.
  • Things are quiet, almost too quiet.
  • Ah, we can see that Redshift Serverless is actually adjusting its base capacity down.
  • We must be going into the weekend, and I guess the boss doesn’t work on the weekend.
  • So, Redshift Serverless is saving us some money by reducing its base capacity.
  • Uh-oh.
  • Looks like Dave does work on the weekend.
  • That’s a real doozy of a query, but you can see it’s not being run yet.
  • It’s actually being analyzed by Redshift Serverless to determine how to best run it, and it looks like we’re going to have to spin up a really big cluster.
  • So, Dave had to wait while we figured that out, but now, Dave’s query is running much faster than it did last time, and there was no impact to the rest of the data warehouse.
  • Dave’s happy, the boss is happy, you’re happy.
  • We’re excited by the results that customers are seeing with the new Redshift Serverless.
  • This is truly the promise of serverless computing, and it’s getting smarter all the time.
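
Here is a rough sketch of the analysis cascade described in the notes above: reuse the profile of a previously seen query, fall back to a small fast model, escalate only complex queries to a large model, and then choose capacity against the customer's cost/performance slider. Every name, threshold, and field here is invented for illustration; this is not Redshift's actual implementation.

```python
def embed(query):
    """Stand-in for the 50+ feature embedding described above."""
    return hash(query.strip().lower())

def plan_query(query, profile_cache, small_model, large_model, slider=0.5):
    """Toy scaling cascade (invented names, not Redshift internals):
    1) a previously seen query reuses its known profile;
    2) otherwise a small model gives a fast complexity estimate;
    3) only complex queries pay for the large model, which predicts how the
       query scales so capacity can be chosen against the customer's
       cost/performance slider (0 = cost optimized, 1 = performance)."""
    key = embed(query)
    if key in profile_cache:                       # ~80% of queries: seen before
        profile = profile_cache[key]
    else:
        estimate = small_model(query)              # fast, rough estimate
        if estimate["complexity"] < 0.7:           # small query: just run it
            return {"capacity": "base", "isolate": False}
        profile = large_model(query)               # detailed scaling prediction
        profile_cache[key] = profile
    # Add capacity freely while scaling is ~linear or better; beyond that,
    # the slider decides how much extra cost the customer will accept.
    units = profile["linear_capacity"] + slider * profile["diminishing_capacity"]
    return {"capacity": round(units), "isolate": profile["complexity"] > 0.9}

# Hypothetical usage with stubbed-out models:
cache = {}
small = lambda q: {"complexity": 0.95}
large = lambda q: {"complexity": 0.95, "linear_capacity": 8, "diminishing_capacity": 8}
print(plan_query("SELECT * FROM events JOIN users USING (id)", cache, small, large, 0.25))
# -> {'capacity': 10, 'isolate': True}
```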

ℹ️ The road keeps on going

  • Well, this wraps up our journey from server-full to serverless, at least for now.
  • Something tells me that there’ll be more destinations down the road, but for now, let’s shift gears a little and talk about something else.

Quantum Computing

✳️ Introduction

  • Normally, we spend a few moments on Monday night diving into our latest AWS chip innovations, like Graviton and Trainium, but this year, I want to talk to you about an entirely different type of chip: the type of chip that runs a quantum computer.
  • What is a quantum computer?
  • Well, it’s complicated.
  • It might be tempting to conceptualize a quantum computer as a really, really fast supercomputer, and while it’d be great to have a super supercomputer, that mental model is not useful.
  • A quantum computer is really something else entirely.

✳️ Binary Bit

  • We all know the binary bit.
  • We build computers from transistors that store a bit, a zero or a one, with the presence or absence of a charge inside of a transistor, and we compose these bits into structures and manipulate them with gates to do things like produce floating point numbers and integers, and then we compose those into things like programming languages and databases and operating systems.

✳️ Qubit (Quantum Bit) - superposition + entanglement

  • On the other hand, quantum computers are constructed out of quantum bits, or qubits, and a qubit is not a transistor, but rather, a quantum object, like an electron or a photon.
  • We often use the image of a sphere to represent a qubit, with the states one and zero at the poles of the sphere, and in this way, it’s similar to a classic bit, but you can also have combinations of the zero and one, what’s called superposition, which means the state of the qubit can be at any point on the surface of this sphere, and qubits interact with each other in strange ways, which is called entanglement, and it’s the combination of these two characteristics, superpositions and entanglement, that give quantum computers the ability to solve some very hard problems unbelievably quickly.

✳️ Quantum Computing Usages

  • So, while we can't compose faster general-purpose computers with qubits, we can use them to solve some really interesting problems in areas like chemistry, cryptography, and process optimization, and this, in turn, can help us in fields like agriculture, renewable energy, and drug discovery.

✳️ AWS Center for Quantum Computing at Caltech

  • That's why, in 2019, we established the AWS Center for Quantum Computing on the Caltech campus, and tonight, I want to give you a little peek into what we've been working on.

ℹ️ History

  • Now, Caltech is a good place to start our discussion, because it’s where quantum computing started 40 years ago.
  • Richard Feynman (費曼先生) first proposed the idea of building a quantum computer, and he did this, because he knew that a classical computer would never be powerful enough to simulate the interactions of particles in the quantum world.
  • So, he postulated that the only way to do this would be to use quantum particles themselves.
  • Now, Feynman knew it would take us a number of scientific breakthroughs before we could actually realize a quantum computer.
  • About ten years after this, Peter Shor, a mathematician at Bell Labs, surprised everyone with the discovery of a factoring algorithm: a quantum algorithm that could provide exponential speedup over the best classical number-factoring algorithms.
  • If you’ve heard that quantum computers are going to destroy the internet, it’s because of this algorithm.
  • Now, personally, I'm not losing sleep quite yet, and I'll show you why in a minute, but Shor's algorithm was a seminal moment in quantum computing, because it showed that quantum computers could be useful for solving problems beyond just simulating the quantum world.
  • Several years later, physicists first began experimenting in the lab with small quantum systems: two qubits that could interact and operate with computational gates.
  • Ten years later, scientists figured out how to produce qubits on the same electrical circuits that we use for classical computing, and this marked the beginning of the engineering race to build a useful quantum computer.

ℹ️ How many qubits would we need?

  • So, how many qubits would we need to do something useful?
  • We can probably start doing some interesting things in chemistry and physics with a few hundred high-quality qubits, but breaking something like RSA is going to require many thousands of qubits.
  • You may have seen that quantum computers with hundreds or even thousands of qubits are being produced today.
  • So, it's reasonable to ask why quantum computers haven't started to change the world yet.
  • Like many things, you have to read the fine print, and the fine print with quantum computers says they're noisy and prone to error.

✳️ Bit Flip —> Error —> Error Correction

  • In all the computers we use today, we do occasionally experience errors, bit flips, a zero turns into a one, or a one turns into a zero, and we use error correction to protect ourselves from these sorts of errors.
  • For example, we use ECC memory, which automatically protects against bit flips in our memory system, and the overhead of error correction in a classical computing system is pretty low, because bit flips are very rare, and because bit flips only happen in one dimension, the zero and the one.
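
As a concrete (if simplified) picture of how cheap classical bit-flip protection is, here is a single-parity-bit sketch; real ECC memory uses stronger Hamming-style codes that can also correct the flip, but the overhead story is similar.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for one byte: 1 if the byte has an odd number of 1s."""
    return bin(byte & 0xFF).count("1") % 2

def bit_flip_detected(byte: int, stored_parity: int) -> bool:
    """True if the parity no longer matches, i.e. an odd number of bits flipped."""
    return parity_bit(byte) != stored_parity

data = 0b10110010
p = parity_bit(data)              # one extra bit per eight data bits
corrupted = data ^ 0b00001000     # flip a single bit

print(bit_flip_detected(data, p))        # False: nothing flipped
print(bit_flip_detected(corrupted, p))   # True: the single flip is detected
```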

✳️ Phase Flip

  • So, with memory, for example, we can typically protect it with a single parity bit for every eight bits of data, a very small overhead, but in the quantum world, noise is a much harder problem.
  • Quantum objects are far more sensitive to noise from their environment, and qubits store more than the simple zero and one, as we’ve seen.
  • Qubits can actually experience errors on two dimensions.
  • They can have bit flips, the one and the zero, but they can also have phase flips.
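
To see why this second error dimension matters, a tiny numpy illustration (again, an aside, not from the talk): a bit flip is the Pauli X operator and a phase flip is the Pauli Z operator, and a phase flip leaves the 0/1 measurement probabilities untouched, so anything that only checks zero-versus-one values cannot see it.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)    # bit flip:   swaps |0> and |1>
Z = np.array([[1, 0], [0, -1]], dtype=complex)   # phase flip: |1> -> -|1>

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # (|0> + |1>)/sqrt(2)

print(X @ psi)   # ~[0.707, 0.707]: this particular state survives a bit flip
print(Z @ psi)   # ~[0.707, -0.707]: same 0/1 probabilities, different state
```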

✳️ Qubit Error Rate

  • So, where are we in the quest to minimize qubit error rate?
  • Well, 15 years ago, the state of the art was a qubit with one error in every ten quantum operations.
  • Five years later (this should probably be ten years later), we could achieve one error in 100 operations,
  • and today, the state-of-the-art is probably about one error per 1,000 quantum operations.
  • So, we’ve improved 100X in 15 years.
  • This is pretty good news.
  • The problem is qubits are still far too noisy to be useful.

✳️ Goal: 1 error in every 100 billion quantum operations

  • The quantum algorithms that we get excited about require billions of operations without an error.
  • We’d need about 22 million more of these screens to show this animation.

ℹ️ What about using error correction?

  • We can do qubit error correction by encoding a block of physical qubits into what we call a logical qubit, but because the underlying qubits still have a pretty high error rate, we need a lot of physical qubits to create one single logical qubit.
  • With our current 0.1% error rate, each logical qubit requires a few thousand physical qubits.
  • Here we see the same chart that we looked at earlier, but now, we’re showing the number of physical qubits that we would need to solve a problem with today's 0.1% error rate, and now, you can see why I’m not losing sleep at night.
  • Shor's algorithm will require a very large number of qubits.
  • So, today’s quantum computers aren’t close to where we need them to be to start solving these big, hard, interesting problems, but the good news is, with error correction, things can improve quite quickly.
  • With a further improvement in the physical error rate, we can reduce the overhead of error correction significantly (see the rough sketch after this list).
  • Here, I've adjusted the graph for another 100X improvement in physical qubit error rate, and you can see that this starts to bring the qubit counts down to something you can get your head around, and this is maybe something we can do in the next ten to fifteen years.
  • But how do we go faster?
  • Well, another way we can speed up the quest for a reliable qubit is to implement quantum error correction more efficiently, and that’s what our team at the AWS Center for Quantum Computing has been hard at work doing.
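
A rough back-of-the-envelope sketch of where figures like "a few thousand physical qubits per logical qubit" come from, using a textbook surface-code-style heuristic rather than the AWS scheme described on stage; the threshold, constant, and 2·d² qubit count below are assumptions for illustration only.

```python
# Heuristic: logical error per operation ~ A * (p / p_th) ** ((d + 1) / 2)
# for code distance d, physical error rate p, threshold p_th ~ 1%, and a
# constant A ~ 0.1; one logical qubit needs on the order of 2 * d**2
# physical qubits. All constants here are illustrative assumptions.

def physical_qubits_per_logical(p, target_logical_error, p_th=1e-2, A=0.1):
    d = 3
    while A * (p / p_th) ** ((d + 1) / 2) > target_logical_error:
        d += 2                      # surface-code distances are odd
    return d, 2 * d * d

# Today's ~0.1% physical error rate, aiming for ~1 error in 100 billion ops:
print(physical_qubits_per_logical(1e-3, 1e-11))   # distance 19, ~700 qubits
# With a further 100x improvement in the physical error rate:
print(physical_qubits_per_logical(1e-5, 1e-11))   # distance 7, ~100 qubits
```

Depending on the constants and the extra ancilla/routing overhead you assume, the first case lands somewhere between the high hundreds and a few thousand physical qubits per logical qubit, the ballpark quoted in the talk; the second case shows why a better physical error rate shrinks the overhead so dramatically.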

✳️ Quantum Device

  • Today, I'm happy to give you a sneak peek at that work.
  • This is a quantum device.
  • It’s a custom-designed chip that’s totally fabricated in-house by our AWS Quantum Team, and the unique thing about this chip is how it approaches error correction by separating the bit flips from the phase flips.
  • With this prototype device, we've been able to suppress bit-flip errors by 100X using a passive error correction approach.
  • This allows us to focus our active error correction on just those phase flips we looked at, and by combining both approaches, we've shown that we can theoretically achieve quantum error correction six times more efficiently than with standard error correction approaches.
  • Now, while we should be mindful that we're still early in this journey to the error-corrected quantum computer, this is an important step toward the hardware-efficient, scalable quantum error correction we need to solve interesting problems on a quantum computer.
  • We’re going to be sharing more details about these experimental results soon.
  • So, if you’re interested, stay tuned.

✳️ The Equipment

  • Now, I originally asked my team to bring some really cool equipment here to show us how these chips are built, and then the team showed me how much it would cost if I broke some of that equipment.
  • So, instead, I have some really cool pictures of that equipment.
  • With a strong emphasis on reducing noise in our systems, quantum chips need to be developed very carefully, and like most chips, quantum chips begin their life on a silicon wafer.
  • One of the challenges of building a quantum computer is it needs to operate inside of a refrigerated environment near absolute zero, but we need to access the qubits and connect them to a classical computer outside of the refrigerator.
  • So, we have to start by bonding multiple chips together.
  • One chip contains the sensitive qubits, and the other chip contains the wiring used to read the qubits.
  • From here, the chip is bonded to the external printed circuit board, and the bonded chip and package is then mounted into a gold-plated copper mount, the thing I had in my hand two seconds ago.
  • It provides thermal anchoring to the refrigerator, and it also provides the first level of electromagnetic shielding, which in turn protects the chip from bit flips.
  • The assembly is then carefully mounted onto a dilution refrigerator, where it's cooled to within a few tens of thousandths of a degree above absolute zero.

✳️ Assembling Quantum Computer (2.5hr time-lapse video)

  • Now, many of you have probably assembled your own PC.
  • So, this is a familiar process.
  • This is a time-lapse video of Cody from the AWS Center for Quantum Computing, shot over two and a half hours, and when he's done with this process, we can try out the new quantum chip.
  • Of course, there’s a lot of cool engineering that goes into any chip before you ever lay hands on the silicon, and having good design tools for laying the chip out is also a key part of the development process and an important area of innovation for us.
  • For example, the teams developed a full-scale electromagnetic simulation of the chip, and that helps us bring down the environmental noise and produce a higher quality qubit.
  • They’ve even open sourced their toolkit to run these electromagnetic simulations, and I’m told it works best on Graviton.
  • Now, we’re excited to share our quantum computing milestone, and we believe the industry is at the beginning of an exciting new period of quantum innovation: the period of the error corrected qubit.
  • It’ll be an exciting journey, and we’ll be sure to continue to update you as we go.

Closing

  • So, this about seals the deal for another Monday Night Live.
  • Enjoy re:Invent, and thank you for coming.

Further reading
