AWS re:Invent 2019 New Launch: Amazon ECS Cluster Capacity Providers

Introduction

It has been a long time since Amazon Elastic Container Service (Amazon ECS) got a new feature! This time there is a great one, named Amazon ECS Cluster Capacity Providers. This post shares my notes from the AWS re:Invent 2019 session CON312 [NEW LAUNCH!] Automatic cluster scaling with Amazon ECS.

Amazon ECS is a fully managed container orchestration service. Amazon uses Amazon ECS for a number of its own products and services. Meanwhile, Duolingo, Samsung, GE, and Cookpad are also Amazon ECS users.



People may spend most of their time and attention on K8S or Amazon Elastic Kubernetes Service (Amazon EKS)[1], but the small and beautiful Amazon ECS can simplify deployment and improve operational efficiency for certain application scenarios and for small and medium-sized enterprise applications, letting teams focus on the application itself and on profitability. As an Amazon ECS enthusiast (and someone who enjoys taking things apart and not following the trend), let me share the new feature that AWS finally released for Amazon ECS at re:Invent 2019: “Capacity Providers”.


Before diving into the reason for the launch of Capacity Providers, we can look back at the existing scaling capabilities of Amazon ECS.

Originally, Amazon ECS put its scaling functions in the ECS Service layer, where Service Auto Scaling handles the auto scaling work.

However, scaling at the ECS Cluster layer is handed over to the Auto Scaling group (ASG) that CloudFormation creates when it sets up the ECS Cluster, which is quite inconvenient and unintuitive to configure. Even the original official documentation has you manually adjust the number of ECS Instances through the ECS Cluster page in the AWS Management Console. Moreover, you cannot directly change the launch configuration (LC) parameters of that ASG, because the ASG is created by CloudFormation. So if we want to update the AMI ID to upgrade the ECS Container Agent, it turns into a struggle. (If you don’t believe it, you can refer to this and this. One of them will even mislead the reader: in the end the change is overwritten by CloudFormation’s parameters, which amounts to wasted time. But now that we have Capacity Providers, there is no need to say which one is better left unread. We will come back to this issue at the end of the article.)

Now, with the Capacity Provider layer, you can easily connect the ECS Cluster and ECS Service layers dynamically, which makes Amazon ECS more convenient and reduces maintenance effort. The following is a brief introduction based on my attendance at the re:Invent 2019 session CON312. For more details, please see the official documentation or other re:Invent breakout session content. (You could say this post is just a travelogue…)

CON312: Automatic cluster scaling with Amazon ECS

The original title of CON312 was [NEW LAUNCH!] Automatic cluster scaling with Amazon ECS. It was a new, hidden session that only appeared after the keynote announced Amazon ECS Cluster Capacity Providers at re:Invent 2019 on Tuesday. I registered for the session as soon as I found it in the AWS Event app, and got a chance to meet members of the Amazon ECS team face to face. If you attend AWS re:Invent in the future, remember not to fill your agenda too tightly from Wednesday onward, so that you have the opportunity to talk first-hand with the team behind a new service.

After arriving at the session room, I saw that the title of the session had been adjusted to Simpler Application Management with Amazon ECS. It may seem unrelated to the original agenda topic, but it brings in the concept of application-first thinking. For more on application-first thinking, see the session CON325 - [NEW LAUNCH!] Enabling application-first thinking with Amazon ECS capacity providers.

Overview

CON312 was presented by Prasad Sristi and David Westbrook.

Many of Amazon’s own services also run on Amazon ECS. Amazon SageMaker, which was mentioned so many times in the re:Invent keynote, runs on Amazon ECS as well.

On the same day at re:Invent, AWS also released Amazon ECS support for AWS Fargate Spot, which is the line marked New! in the slide.

When applications run on Amazon ECS, there are often many aspects and details that need to be considered and taken care of.

But what if these requirements, scenarios, and conditions could be tuned so that our applications run across four environments: EC2 Spot, EC2 On-Demand, AWS Fargate, and AWS Fargate Spot?

Capacity Providers

This is where Capacity Providers and Capacity Provider Strategies come in.

The setup process is to create the ECS Cluster, the ASG, and the Capacity Provider in turn. All three are one-time actions.

Then define the ECS Tasks and run them, which causes some instances to start.

Finally, the ECS Tasks run on the ECS Instances under the conditions set by the strategies.
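The one-time setup above can be sketched as the request payloads that tie an existing ASG to a cluster. This is only a sketch: the field names follow the ECS CreateCapacityProvider and PutClusterCapacityProviders APIs as exposed by boto3 today, while the helper functions, the provider and cluster names, and the placeholder ARN are hypothetical and not taken from the session.

```python
# Placeholder ARN for illustration only; substitute the ARN of a real ASG.
ASG_ARN = "arn:aws:autoscaling:REGION:ACCOUNT:autoScalingGroup:UUID:autoScalingGroupName/demo-asg"


def capacity_provider_request(name, asg_arn, target_capacity=100):
    """Build keyword arguments for an ECS CreateCapacityProvider call."""
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            # Managed scaling lets ECS drive the ASG's desired count.
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            "managedTerminationProtection": "DISABLED",
        },
    }


def cluster_association_request(cluster, provider_names):
    """Build keyword arguments for an ECS PutClusterCapacityProviders call."""
    names = list(provider_names)
    return {
        "cluster": cluster,
        "capacityProviders": names,
        # Assumption: give every provider an equal weight by default.
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": n, "weight": 1} for n in names
        ],
    }


provider = capacity_provider_request("demo-cp", ASG_ARN)
association = cluster_association_request("demo-cluster", ["demo-cp"])
```

With a boto3 ECS client, these dictionaries would be passed as `ecs.create_capacity_provider(**provider)` and `ecs.put_cluster_capacity_providers(**association)`; building them separately keeps the sketch runnable without AWS credentials.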

Capacity Provider Strategies

Next came Capacity Provider Strategies. The basic part uses weights to decide where tasks are deployed. David also explained in CON325 that Capacity Provider Strategies are meant to replace the original ECS Launch Types.

Playing with the weights gives results like this, with some ECS Tasks running on Spot to save costs.
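To make the weight idea concrete, here is a rough sketch of how a strategy with weights (and an optional base) could split a number of tasks across providers. The function name, the provider names, and the largest-remainder rounding are my own assumptions for illustration; this is not the actual ECS placement algorithm.

```python
def split_tasks(total_tasks, strategy):
    """Split tasks across capacity providers by base, then by weight ratio.

    `strategy` is a list of dicts with "capacityProvider", "weight",
    and an optional "base" (minimum tasks placed on that provider first).
    """
    counts = {}
    remaining = total_tasks
    # Step 1: satisfy each base value first.
    for item in strategy:
        base = min(item.get("base", 0), remaining)
        counts[item["capacityProvider"]] = base
        remaining -= base

    total_weight = sum(item.get("weight", 0) for item in strategy)
    if remaining == 0 or total_weight == 0:
        return counts

    # Step 2: distribute the rest in proportion to the weights.
    remainders = []
    assigned = 0
    for item in strategy:
        share = remaining * item.get("weight", 0)
        whole = share // total_weight
        counts[item["capacityProvider"]] += whole
        assigned += whole
        remainders.append((share % total_weight, item["capacityProvider"]))

    # Step 3: hand out leftover tasks to the largest fractional shares.
    remainders.sort(key=lambda pair: pair[0], reverse=True)
    for _, name in remainders[: remaining - assigned]:
        counts[name] += 1
    return counts


strategy = [
    {"capacityProvider": "on-demand", "weight": 1, "base": 2},
    {"capacityProvider": "spot", "weight": 3},
]
result = split_tasks(10, strategy)  # {'on-demand': 4, 'spot': 6}
```

In this example, two tasks are pinned to On-Demand by the base, and the remaining eight are split 1:3, so most of the tasks land on Spot.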

With IAM policies, you can have fun adding more restrictions, such as only allowing tasks to run on Spot, or forbidding them from running on the second capacity provider, and so on.

Managed Auto Scaling

As with most fully managed services, the purpose is to let everyone focus on developing applications and leave some of the infrastructure operations and security work to the managed service. (The vaguer the explanation, the more people want to ask, so in the final Q&A of this Chalk Talk everyone was concerned about the rules, reaction speed, frequency, and so on of this auto scaling.)

Live Demo

But it doesn’t matter, because this Chalk Talk also came with a live demo! The feature became generally available only yesterday (the day before this session), and today the ECS team showed it to everyone.

You can also refer to the CON325 video: starting at about 34:00, David gives the same live demo.

Let’s start with an ASG. He set it to 0 instances at first, and the demo would later bring it up to 2 instances. (You can create either the ECS Cluster or the ASG first, then build a Capacity Provider to associate the two later.)

Define a demo ECS Service and ECS Task. The task is initially in the PROVISIONING state.

The next step is to wait for the ASG to pick up the request. In the meantime, he introduced the new CloudWatch metrics under AWS/ECS/ManagedScaling that correspond to the Capacity Provider feature.

The metric jumped from 100 to 200, which is the signal used to raise the desired number of instances in the ASG.

Managed Auto Scaling changed the ASG’s desired count from 0 to 2.

After the 2 instances started, the ECS Task also ran successfully on them, and its state was displayed as RUNNING.

Connecting to this ECS Task works. A very smooth live demo.

Discussion

  • According to the Q&A that day, the Amazon ECS Capacity Provider feature could only be operated through the AWS Management Console and was not yet supported in ECS CLI v2, CloudFormation, CDK, etc., but support should be ready in 2020. You can follow these issues on GitHub in the future:

  • One of the common problems in Amazon ECS Cluster maintenance, mentioned at the beginning of this article, is updating the AMI ID. Whether you use a self-built AMI or the official ECS-optimized AMI, the AMI ID has to be updated for ECS Container Agent updates or other package upgrades. Before the Capacity Provider layer was launched, this was quite troublesome. Even the official blogs had tutorials that could not be used in practice (fortunately, a later article corrected this), but now the Capacity Provider basically hits the operational sweet spot!

    • AWS Principal Product Manager Nick Coult drew the picture for everyone directly:

  • Calculation of CapacityProviderReservation (Reference):

    • We will be publishing a deep dive blog that covers how the metric is calculated, but the simple version of it is that CapacityProviderReservation = M/N x 100, where N = the number of instances already in your ASG, and M = the estimated number of instances required to run all existing and provisioning tasks. (A provisioning task is a task that was run using a capacity provider strategy and was assigned to a capacity provider that did not have sufficient capacity to run the task immediately. Tasks run using launch type will not reach this state).
    • If you have provisioning tasks assigned to that capacity provider then M>=1. If you have no provisioning tasks, then M=the number of instances running at least one non-daemon service task.
    • Special cases:
      • If N=0 and M>0, then CapacityProviderReservation = 200.
      • If N=0 and M=0, then CapacityProviderReservation = 100.
  • (20200103 updated) Nick Coult (Principal Product Manager for Amazon Elastic Container Service) published a great and detailed post named Deep Dive on Amazon ECS Cluster Auto Scaling on AWS Container Blog. This is a great reference.

    • If the capacity provider is enabled, a task started through the RunTask API has an additional “provisioning” state (originally it would go to the “failing” state instead).
    • Currently, each cluster can hold up to 100 tasks in the “provisioning” state.
    • A task that stays in the “provisioning” state for more than 15 minutes without being placed on an instance switches to the “stopped” state.
    • Target tracking scaling policy:
      • the smaller the target value, the more spare capacity you will have available in your ASG.
      • For example, a target value of 10 means that the scaling policy will adjust N (within the limits available) so that about 90% of your ASG’s instances will not be running any tasks, regardless of how many tasks you run.
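The CapacityProviderReservation formula and the target value example above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are my own, and the steady-state estimate ignores the ASG's minimum and maximum size and any scaling step limits.

```python
def capacity_provider_reservation(n_instances, m_required):
    """CapacityProviderReservation = M / N x 100, with the special cases for N = 0."""
    if n_instances == 0:
        return 200.0 if m_required > 0 else 100.0
    return m_required / n_instances * 100


def instances_for_target(m_required, target):
    """Estimate the N at which M / N x 100 no longer exceeds the target (rounded up)."""
    return (m_required * 100 + target - 1) // target


# Demo scenario: empty ASG, tasks waiting in the provisioning state.
metric_before = capacity_provider_reservation(0, 2)  # 200.0
# After the ASG reaches 2 instances, the metric settles at the target of 100.
metric_after = capacity_provider_reservation(2, 2)   # 100.0

# Target value of 10: one busy instance implies about ten instances in total,
# so roughly 90% of the ASG runs no tasks.
n_total = instances_for_target(1, 10)                # 10
idle_fraction = (n_total - 1) / n_total              # 0.9
```

A target of 100 keeps N equal to M, so no spare instances are kept; lowering the target trades cost for headroom.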

Finally, I compiled a list of ECS-related breakout sessions from AWS re:Invent 2019. You can use the session ID to find the corresponding slides or video for cross-reference. The order and logic of the agenda planning can be seen from the 200, 300, and 400 series of session IDs. Some sessions were not recorded, possibly because they were Chalk Talks or Workshops. Slides should still be available for those sessions [2].

  • CON213 - Using containers & serverless to accelerate application development (youtube)
  • CON217 - Roadmaps for containers, application networking & Amazon Linux at AWS (youtube) (slide)
  • CON218 - How Amazon Lex uses Amazon ECS to process batches at scale (youtube) (slide)
  • CON312 - [NEW LAUNCH!] Automatic cluster scaling with Amazon ECS (slide)
  • CON313 - Life of your containerized application with new Amazon ECS CLI v2 (slide)
  • CON324 - [NEW LAUNCH!] Cost Optimization with Containers and Spot (youtube) (slide)
  • CON325 - [NEW LAUNCH!] Enabling application-first thinking with Amazon ECS capacity providers (youtube) (slide)
  • CON328 - Improving observability of your containers (youtube) (slide)
  • CON333 - Best practices for CI/CD using AWS Fargate and Amazon ECS (youtube)
  • CON407 - CI/CD and deployment strategies for containers running on AWS
  • CON423 - AWS Fargate under the hood (youtube) (slide)

Remarks

[1] You can see which topics everyone cares about on the AWS Container Roadmap (ECS, ECR, Fargate, and EKS).

[2] AWS Event Content re:Invent Slides: Slide download site.

