CLD110 · Module 6 — Compute Services

Amazon EC2
Auto Scaling

Right-size your fleet automatically — match capacity to demand
Auto Scaling · High Availability · Cost Optimization · EC2
Mesa Community College · AWS Academy

The capacity planning problem

Real traffic is never flat. Fixed fleets force a painful tradeoff.

Over-provisioned fleet
  • Sized for peak traffic — handles Black Friday
  • Most of the year, instances sit idle
  • You pay for unused compute constantly
  • Wasted spend on every quiet night and weekend
Under-provisioned fleet
  • Sized to save money — lean at off-peak
  • Traffic spike → instances overwhelmed
  • Slow response times, errors, outages
  • Revenue lost during the moments that matter most

Auto Scaling solves this: add instances when demand rises, remove them when it falls — automatically, within minutes.

Core concepts

Auto Scaling Group (ASG)

A logical collection of EC2 instances managed as a unit.


  • Minimum capacity
  • Maximum capacity
  • Desired capacity

ASG maintains the desired count and replaces unhealthy instances.
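
The relationship between the three capacity values can be sketched in a few lines of Python. This is a simplified model rather than AWS code: the function name `clamp_desired` and the example numbers are assumptions chosen for illustration. It shows that whatever capacity a scaling policy asks for, the group stays between its minimum and maximum.

```python
def clamp_desired(requested: int, minimum: int, maximum: int) -> int:
    """Return the capacity the group settles on for a requested count.

    Simplified model: the result is always kept within [minimum, maximum].
    """
    if minimum > maximum:
        raise ValueError("minimum capacity cannot exceed maximum capacity")
    return max(minimum, min(requested, maximum))


# Illustrative group with minimum 2 and maximum 8.
print(clamp_desired(3, 2, 8))   # 3: already within bounds
print(clamp_desired(12, 2, 8))  # 8: scale-out request capped at the maximum
print(clamp_desired(0, 2, 8))   # 2: scale-in request floored at the minimum
```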

Launch template

Blueprint used to launch new instances in the group.


  • AMI, instance type
  • Key pair, security groups
  • EBS volumes, user data script

Versioned — you can update and roll out changes.
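
As a rough illustration, the settings listed above can be gathered into a plain Python dictionary. The key names are intended to follow the LaunchTemplateData structure of the EC2 CreateLaunchTemplate API; the helper name `build_launch_template_data`, the device name, the volume size, and every ID below are placeholder assumptions, and nothing here calls AWS.

```python
import base64


def build_launch_template_data(ami_id, instance_type, key_name,
                               security_group_ids, user_data_script):
    """Assemble launch template settings as a plain dictionary.

    All values are placeholders supplied by the caller; this sketch only
    builds the structure and does not contact AWS.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "BlockDeviceMappings": [
            # Assumed root volume: 8 GiB gp3 on /dev/xvda.
            {"DeviceName": "/dev/xvda",
             "Ebs": {"VolumeSize": 8, "VolumeType": "gp3"}},
        ],
        # User data is passed as base64-encoded text.
        "UserData": base64.b64encode(user_data_script.encode()).decode(),
    }


template = build_launch_template_data(
    ami_id="ami-EXAMPLE",             # placeholder, not a real AMI
    instance_type="t3.micro",
    key_name="example-key",           # placeholder key pair name
    security_group_ids=["sg-EXAMPLE"],
    user_data_script="#!/bin/bash\necho hello\n",
)
print(sorted(template))
```

Because templates are versioned, a change such as a new AMI would be saved as a new version of the same template rather than edited in place.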

Scaling policy

Rules that tell the ASG when to add or remove instances.


  • Target tracking
  • Step scaling
  • Scheduled scaling
  • Predictive scaling

Scaling policy types

Policy type | How it works | Best for
Target tracking | You set a target metric value (e.g., 60% CPU). The ASG adjusts the fleet to maintain it, like a thermostat. | Most workloads. Simplest to configure.
Step scaling | Alarm thresholds trigger different step sizes. CPU 70–80% → add 1. CPU >80% → add 3. | Workloads with sharp, fast spikes needing a proportional response.
Scheduled scaling | Add or remove capacity on a cron schedule. | Known traffic patterns, e.g. scale up at 8 AM, down at 10 PM.
Predictive scaling | ML-driven forecast based on historical patterns. Proactively launches capacity before the spike hits. | Recurring traffic cycles: daily peaks, weekly patterns.

Combine policies: Predictive scaling handles anticipated load; target tracking handles unexpected spikes on top.
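
The arithmetic behind target tracking and step scaling can be sketched as follows. This is a simplified model under stated assumptions, not AWS's internal algorithm: the proportional formula for target tracking is an approximation, the step thresholds are the ones from the table, and all function names are hypothetical. When more than one policy applies, the sketch takes the largest resulting capacity, which mirrors the documented behavior of following the policy that yields the most capacity.

```python
import math


def target_tracking_capacity(current: int, metric: float, target: float) -> int:
    """Proportional estimate of the capacity that brings the metric to target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return max(1, math.ceil(current * metric / target))


def step_scaling_capacity(current: int, cpu: float) -> int:
    """Step adjustments from the table: 70-80% adds 1, above 80% adds 3."""
    if cpu > 80:
        return current + 3
    if cpu >= 70:
        return current + 1
    return current


def combined_capacity(current, cpu, target, minimum, maximum):
    """Take the largest capacity any policy asks for, then clamp to min/max."""
    wanted = max(target_tracking_capacity(current, cpu, target),
                 step_scaling_capacity(current, cpu))
    return max(minimum, min(wanted, maximum))


print(target_tracking_capacity(3, 90, 60))  # 5: 3 * 90 / 60 = 4.5, rounded up
print(step_scaling_capacity(3, 75))         # 4: the 70-80% step adds 1
print(step_scaling_capacity(3, 90))         # 6: the >80% step adds 3
print(combined_capacity(3, 90, 60, 2, 8))   # 6: the larger of 5 and 6
```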

Auto Scaling + load balancer

An ASG works with an Application Load Balancer (ALB) to distribute traffic across healthy instances automatically.

[Diagram: Auto Scaling with ALB] Users send HTTP/S traffic to an Application Load Balancer, which distributes it across the EC2 instances in an Auto Scaling Group (min: 2 · desired: 3 · max: 8). Instances 1 to 3 are healthy, and Instance 4 is launching after a CloudWatch alarm (CPU > 70%) triggered a scale-out. Instances added to or removed from the group are automatically registered with or deregistered from the ALB.

When an instance launches, the ASG registers it with the ALB automatically. When an instance is terminated, it is deregistered first, and connection draining (called deregistration delay on an ALB) lets in-flight requests complete.

Health checks and self-healing

Auto Scaling continuously monitors instance health and automatically replaces failed instances — even without a scaling event.

Health check types

  • EC2 status checks — is the instance running? Did the OS boot correctly?
  • ELB health checks — is the application responding to HTTP requests correctly?
  • Custom health checks — signal from your application via the AWS SDK

ELB health checks are more thorough — they verify the application is healthy, not just that the instance is running.

Self-healing flow

  • Instance fails health check
  • ASG marks instance unhealthy
  • Instance is terminated (connection draining first)
  • New instance launched from launch template
  • New instance registers with load balancer
  • Traffic flows to new instance
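
The flow above can be sketched as a small simulation. This is a toy model, not how AWS implements it: the `Instance` class, the `heal` function, and the instance IDs are all hypothetical, and draining and load balancer registration appear only as comments.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def heal(fleet, new_ids):
    """Replace every unhealthy instance with a freshly launched one.

    Returns (new_fleet, terminated_ids). The fleet size is unchanged,
    so the desired count is preserved.
    """
    new_fleet, terminated = [], []
    for inst in fleet:
        if inst.healthy:
            new_fleet.append(inst)
            continue
        # Failed health check: mark unhealthy, drain connections, terminate.
        terminated.append(inst.instance_id)
        # Launch a replacement from the launch template; it would then
        # register with the load balancer and start receiving traffic.
        new_fleet.append(Instance(next(new_ids)))
    return new_fleet, terminated


ids = (f"i-new{n}" for n in count(1))
fleet = [Instance("i-a"), Instance("i-b", healthy=False), Instance("i-c")]
fleet, terminated = heal(fleet, ids)
print([i.instance_id for i in fleet])  # ['i-a', 'i-new1', 'i-c']
print(terminated)                      # ['i-b']
```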

Multi-AZ for high availability

Auto Scaling Groups can span multiple Availability Zones. AWS automatically distributes instances to maintain balance.

Why multi-AZ matters

  • An AZ is one or more physically separate data centers with independent power and networking
  • If one AZ has an outage, traffic is served from others
  • ASG rebalances instances across AZs automatically
  • No single point of failure at the infrastructure level

Rebalancing behavior

  • If AZ-A has 3 instances and AZ-B has 1, ASG launches in AZ-B first
  • When an instance is terminated, ASG checks AZ balance before launching replacement
  • Rebalancing runs as the AZRebalance process, which you can suspend and resume if needed
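
The balancing rule can be sketched as choosing the Availability Zone with the fewest instances when launching and, under the default termination policy, the one with the most instances when scaling in. The function names and the alphabetical tie-break are assumptions for illustration; the real service applies further criteria.

```python
from collections import Counter


def az_for_launch(instance_azs, enabled_azs):
    """Pick the enabled AZ with the fewest instances (ties: alphabetical)."""
    counts = Counter(instance_azs)
    return min(sorted(enabled_azs), key=lambda az: counts[az])


def az_for_scale_in(instance_azs, enabled_azs):
    """Pick the enabled AZ with the most instances (ties: alphabetical)."""
    counts = Counter(instance_azs)
    return max(sorted(enabled_azs), key=lambda az: counts[az])


placement = ["AZ-A", "AZ-A", "AZ-A", "AZ-B"]
zones = ["AZ-A", "AZ-B"]
print(az_for_launch(placement, zones))    # AZ-B: it has 1 instance versus 3
print(az_for_scale_in(placement, zones))  # AZ-A: removing there improves balance
```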

Best practice: Always configure your ASG across at least 2 Availability Zones, and attach an ALB so traffic can route around an AZ failure.

Cooldowns and instance warm-up

Auto Scaling uses timing controls to avoid thrashing — launching and terminating instances repeatedly in rapid succession.

Cooldown period

After a scaling action, the ASG waits before evaluating policies again.


  • Default: 300 seconds
  • Prevents a new metric reading from triggering another scaling action while the first batch of instances is still initializing
  • Applies to simple scaling policies

Step scaling and target tracking do not use the cooldown; they rely on instance warm-up instead.
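
A cooldown check amounts to comparing the time since the last scaling action with the cooldown length. The sketch below is a minimal model; the function name `can_scale` and the timestamps are hypothetical, and the 300-second default is the value given above.

```python
def can_scale(now: float, last_scaling_time, cooldown_seconds: float = 300) -> bool:
    """Return True if a new scaling action may start.

    `now` and `last_scaling_time` are timestamps in seconds;
    `last_scaling_time` is None if the group has never scaled.
    """
    if last_scaling_time is None:
        return True
    return now - last_scaling_time >= cooldown_seconds


print(can_scale(400, 200))  # False: only 200 s since the last action
print(can_scale(500, 200))  # True: the 300 s cooldown has elapsed
```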

Instance warm-up

Time to wait before a new instance is counted toward ASG metrics.


  • A freshly launched instance has low CPU — don't let it hide the real load
  • During warm-up, the instance is not counted in metrics that trigger further scaling
  • Set to match your actual boot + initialization time
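
The effect of warm-up on the fleet metric can be shown with a small calculation. This is a simplified model with made-up numbers: `fleet_average_cpu` is a hypothetical helper, and each instance is represented as a (launch_time, cpu_percent) pair.

```python
def fleet_average_cpu(instances, now, warmup_seconds):
    """Average CPU over instances that have finished warming up.

    Returns None if no instance has finished warm-up yet.
    """
    warmed = [cpu for launched, cpu in instances
              if now - launched >= warmup_seconds]
    if not warmed:
        return None
    return sum(warmed) / len(warmed)


# Two busy instances launched long ago, one idle instance launched 50 s ago.
fleet = [(0, 90), (0, 90), (950, 5)]

naive = sum(cpu for _, cpu in fleet) / len(fleet)
print(round(naive, 1))                                       # 61.7: new instance hides the load
print(fleet_average_cpu(fleet, now=1000, warmup_seconds=120))  # 90.0: real load is visible
```

With the naive average, a 70% CPU alarm would stay quiet even though the serving instances are at 90%.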

Key takeaways

  • Auto Scaling Groups define min/desired/max — the ASG enforces desired count at all times
  • Launch templates define what to launch; scaling policies define when
  • Target tracking is the simplest policy — set a target metric, let AWS do the math
  • Self-healing is automatic — failed instances are replaced without human intervention
  • Multi-AZ ASGs are required for production high availability

The mental model

Auto Scaling is a thermostat for your fleet. You set the target temperature; it adds or removes heaters to maintain it.

Auto Scaling + ALB + multi-AZ is the classic AWS pattern for elastic, highly available web applications.

Review questions

Recall
  • What three capacity values does every Auto Scaling Group define?
  • What is the difference between a launch template and a scaling policy?
  • Which scaling policy type works like a thermostat?
  • Why is the instance warm-up period important?
Apply
  • A news site experiences a massive traffic spike every morning at 8 AM. Which scaling policy type is best suited to handle this pattern?
  • An instance in an ASG fails its ELB health check. What happens next?
  • An ASG has 3 instances in AZ-A and 0 in AZ-B. A scale-in event removes one instance. Which AZ does it come from, and why?

Next: AWS Elastic Beanstalk — how Auto Scaling, EC2, and load balancing are wired together automatically so you can focus on your application code.