CLD110 · Module 6 — Compute Services

Amazon EC2
Auto Scaling

Right-size your fleet automatically — match capacity to demand

Auto Scaling · High Availability · Cost Optimization · EC2

Mesa Community College · AWS Academy

The capacity planning problem

Real traffic is never flat. Fixed fleets force a painful tradeoff.

Over-provisioned fleet

Sized for peak traffic — handles Black Friday
Most of the year, instances sit idle
You pay for unused compute constantly
Wasted spend on every quiet night and weekend

Under-provisioned fleet

Sized to save money — lean at off-peak
Traffic spike → instances overwhelmed
Slow response times, errors, outages
Revenue lost during the moments that matter most

Auto Scaling solves this: add instances when demand rises, remove them when it falls — automatically, within minutes.

Core concepts

Auto Scaling Group (ASG)

A logical collection of EC2 instances managed as a unit.

Minimum capacity
Maximum capacity
Desired capacity

ASG maintains the desired count and replaces unhealthy instances.
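
How the three capacity values interact can be sketched in a few lines. This is a simplified model written for illustration, not AWS code; the function names `clamp_desired` and `instances_to_launch` are hypothetical.

```python
def clamp_desired(desired: int, minimum: int, maximum: int) -> int:
    """Keep a requested desired capacity inside the group's min/max bounds."""
    if minimum > maximum:
        raise ValueError("minimum capacity cannot exceed maximum capacity")
    return max(minimum, min(desired, maximum))


def instances_to_launch(desired: int, healthy_running: int) -> int:
    """How many instances the group must launch to get back to desired."""
    return max(0, desired - healthy_running)


# A scaling policy asks for 12 instances, but the group allows 2 to 10.
print(clamp_desired(12, minimum=2, maximum=10))   # 10
# Desired is 4 and one instance has failed, leaving 3 healthy.
print(instances_to_launch(4, healthy_running=3))  # 1
```

The point of the model: scaling policies move the desired count, but only within the minimum and maximum you set.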

Launch template

Blueprint used to launch new instances in the group.

AMI, instance type
Key pair, security groups
EBS volumes, user data script

Versioned — you can update and roll out changes.
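
As a rough illustration, the launch template settings listed above can be written as a Python dictionary. The key names are intended to mirror the launch template data accepted by the EC2 API, but treat them as a sketch and check the API reference before relying on them. The helper name, every ID, the device name, and the script are placeholders invented for this example.

```python
import base64


def build_launch_template_data(ami_id, instance_type, key_name,
                               security_group_ids, volume_size_gib,
                               user_data_script):
    """Assemble the settings a launch template carries (illustrative only)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "BlockDeviceMappings": [
            {
                # Assumed root device name; the real one depends on the AMI.
                "DeviceName": "/dev/xvda",
                "Ebs": {"VolumeSize": volume_size_gib, "VolumeType": "gp3"},
            }
        ],
        # Launch templates expect the user data script base64-encoded.
        "UserData": base64.b64encode(
            user_data_script.encode("utf-8")
        ).decode("ascii"),
    }


# All values below are hypothetical placeholders.
template_data = build_launch_template_data(
    ami_id="ami-EXAMPLE",
    instance_type="t3.micro",
    key_name="example-key",
    security_group_ids=["sg-EXAMPLE"],
    volume_size_gib=8,
    user_data_script="#!/bin/bash\necho hello\n",
)
print(sorted(template_data))
```

Because templates are versioned, changing any of these values means publishing a new version rather than editing instances in place.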

Scaling policy

Rules that tell the ASG when to add or remove instances.

Target tracking
Step scaling
Scheduled scaling
Predictive scaling

Scaling policy types

Policy type	How it works	Best for
Target tracking	You set a target metric value (e.g., 60% CPU). ASG adjusts fleet to maintain it — like a thermostat.	Most workloads. Simplest to configure.
Step scaling	Alarm thresholds trigger different step sizes. CPU 70–80% → add 1. CPU >80% → add 3.	Workloads with sharp, fast spikes needing a proportional response.
Scheduled scaling	Add or remove capacity on a cron schedule.	Known traffic patterns — scale up at 8 AM, down at 10 PM.
Predictive scaling	ML-driven forecast based on historical patterns. Proactively launches capacity before the spike hits.	Recurring traffic cycles — daily peaks, weekly patterns.

Combine policies: Predictive scaling handles anticipated load; target tracking handles unexpected spikes on top.
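
The arithmetic behind the first two policy types can be approximated as follows. The target tracking function is a simple proportional model assumed for teaching purposes, not the algorithm AWS actually runs; the step function hard-codes the thresholds from the table row above. Both function names are hypothetical.

```python
import math


def target_tracking_capacity(current_capacity: int,
                             current_metric: float,
                             target_metric: float) -> int:
    """Estimate the fleet size that brings the metric back to its target."""
    if target_metric <= 0:
        raise ValueError("target metric must be positive")
    return math.ceil(current_capacity * current_metric / target_metric)


def step_adjustment(cpu_percent: float) -> int:
    """Instances to add under the example steps: 70-80% adds 1, >80% adds 3."""
    if cpu_percent > 80:
        return 3
    if cpu_percent >= 70:
        return 1
    return 0


# 4 instances at 90% CPU with a 60% target: scale out to 6.
print(target_tracking_capacity(4, 90, 60))  # 6
# 5 instances at 30% CPU with a 60% target: scale in to 3.
print(target_tracking_capacity(5, 30, 60))  # 3
print(step_adjustment(75), step_adjustment(85))  # 1 3
```

Target tracking sizes its response from how far the metric is from the target, while step scaling uses whatever step sizes you define for each alarm range.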

Auto Scaling + load balancer

An ASG works with an Application Load Balancer (ALB) to distribute traffic across healthy instances automatically.

When an instance launches, it registers with the ALB automatically. When it is terminated, it deregisters first, and connection draining (called deregistration delay on an ALB) gives in-flight requests time to complete.

Health checks and self-healing

Auto Scaling continuously monitors instance health and automatically replaces failed instances — even without a scaling event.

Health check types

EC2 status checks — is the instance running? Did the OS boot correctly?
ELB health checks — is the application responding to HTTP requests correctly?
Custom health checks — signal from your application via the AWS SDK

ELB health checks are more thorough — they verify the application is healthy, not just that the instance is running.

Self-healing flow

Instance fails health check
ASG marks instance unhealthy
Instance is terminated (connection draining first)
New instance launched from launch template
New instance registers with load balancer
Traffic flows to new instance
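
The self-healing steps above can be traced with a small in-memory model. It is a toy simulation under simplifying assumptions (instant launches, no draining delay), and the class and method names are hypothetical, not part of any AWS SDK.

```python
import itertools


class AutoScalingGroupModel:
    """Toy model of an ASG that keeps a fixed desired count."""

    def __init__(self, desired: int):
        self.desired = desired
        self.running = set()            # instance ids currently running
        self.in_load_balancer = set()   # instance ids receiving traffic
        self._ids = itertools.count(1)
        self.reconcile()

    def _launch(self) -> str:
        # Launch from the "template" and register with the load balancer.
        instance_id = f"i-{next(self._ids):04d}"
        self.running.add(instance_id)
        self.in_load_balancer.add(instance_id)
        return instance_id

    def reconcile(self) -> list:
        """Launch instances until the running count matches desired."""
        launched = []
        while len(self.running) < self.desired:
            launched.append(self._launch())
        return launched

    def replace_unhealthy(self, instance_id: str) -> list:
        """Deregister, terminate, then launch a replacement."""
        self.in_load_balancer.discard(instance_id)  # stop sending traffic
        self.running.discard(instance_id)           # terminate
        return self.reconcile()


group = AutoScalingGroupModel(desired=3)
print(group.replace_unhealthy("i-0002"))  # ['i-0004']
print(sorted(group.in_load_balancer))     # ['i-0001', 'i-0003', 'i-0004']
```

Notice that no scaling policy is involved: the replacement happens only because the running count dropped below desired.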

Multi-AZ for high availability

Auto Scaling Groups can span multiple Availability Zones. AWS automatically distributes instances to maintain balance.

Why multi-AZ matters

An AZ is one or more physically separate data centers with independent power, cooling, and networking
If one AZ has an outage, traffic is served from others
ASG rebalances instances across AZs automatically
No single point of failure at the infrastructure level

Rebalancing behavior

If AZ-A has 3 instances and AZ-B has 1, ASG launches in AZ-B first
When an instance is terminated, the ASG checks AZ balance before launching a replacement
Rebalancing runs as the AZRebalance process, which can be suspended and resumed on the group
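
The balancing rule can be expressed as two small functions. This is a simplified sketch that looks only at instance counts per AZ; the real service weighs other factors too, and the function names are hypothetical.

```python
def az_for_launch(instance_counts: dict) -> str:
    """Pick the AZ with the fewest instances (ties broken by name)."""
    return min(sorted(instance_counts), key=instance_counts.get)


def az_for_scale_in(instance_counts: dict) -> str:
    """Pick the AZ with the most instances (ties broken by name)."""
    return max(sorted(instance_counts), key=instance_counts.get)


counts = {"AZ-A": 3, "AZ-B": 1}
print(az_for_launch(counts))    # AZ-B
print(az_for_scale_in(counts))  # AZ-A
```

Launching into the emptiest AZ and removing from the fullest one both push the group toward an even spread.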

Best practice: Always configure your ASG across at least 2 Availability Zones, and attach an ALB so traffic can route around an AZ failure.

Cooldowns and instance warm-up

Auto Scaling uses timing controls to avoid thrashing — launching and terminating instances repeatedly in rapid succession.

Cooldown period

After a scaling action, the ASG waits before evaluating policies again.

Default: 300 seconds
Prevents a new metric reading from triggering another scaling action while the first batch of instances is still initializing
Applies to simple scaling policies

Target tracking and step scaling do not wait for the cooldown; they rely on instance warm-up instead.
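
For a policy that honors a cooldown, the check amounts to comparing timestamps. A minimal sketch, assuming times are plain seconds and using a hypothetical function name:

```python
from typing import Optional


def can_scale(now_s: float,
              last_scaling_activity_s: Optional[float],
              cooldown_s: float = 300) -> bool:
    """Return True once the cooldown since the last scaling action is over."""
    if last_scaling_activity_s is None:
        return True  # nothing has scaled yet, so nothing to wait for
    return now_s - last_scaling_activity_s >= cooldown_s


print(can_scale(now_s=100, last_scaling_activity_s=0))  # False
print(can_scale(now_s=300, last_scaling_activity_s=0))  # True
```

A cooldown that is shorter than your instances' boot time lets the same alarm fire twice for one spike, which is the thrashing this control exists to prevent.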

Instance warm-up

Time to wait before a new instance is counted toward ASG metrics.

A freshly launched instance has low CPU — don't let it hide the real load
During warm-up, the instance is not counted in metrics that trigger further scaling
Set to match your actual boot + initialization time
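
A short calculation shows why warm-up matters. The sketch below averages CPU only over instances that have finished warming up; the function name and the sample numbers are made up for illustration.

```python
def fleet_average_cpu(instances, now_s, warmup_s):
    """Average CPU over warmed-up instances.

    instances: list of (launch_time_s, cpu_percent) tuples.
    Returns None when no instance has finished warming up.
    """
    warmed = [cpu for launched_s, cpu in instances
              if now_s - launched_s >= warmup_s]
    if not warmed:
        return None
    return sum(warmed) / len(warmed)


# Three busy instances launched long ago, one launched 60 seconds ago.
fleet = [(0, 90), (0, 90), (0, 90), (540, 10)]

print(fleet_average_cpu(fleet, now_s=600, warmup_s=0))    # 70.0
print(fleet_average_cpu(fleet, now_s=600, warmup_s=120))  # 90.0
```

Without warm-up, the idle newcomer drags the average down to 70% and hides the fact that the serving instances are still at 90%.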

Key takeaways

Auto Scaling Groups define min/desired/max — the ASG enforces desired count at all times
Launch templates define what to launch; scaling policies define when
Target tracking is the simplest policy — set a target metric, let AWS do the math
Self-healing is automatic — failed instances are replaced without human intervention
Multi-AZ ASGs are required for production high availability

The mental model

Auto Scaling is a thermostat for your fleet. You set the target temperature; it adds or removes heaters to maintain it.

Auto Scaling + ALB + multi-AZ is the classic AWS pattern for elastic, highly available web applications.

Review questions

Recall

What three capacity values does every Auto Scaling Group define?
What is the difference between a launch template and a scaling policy?
Which scaling policy type works like a thermostat?
Why is the instance warm-up period important?

Apply

A news site experiences a massive traffic spike every morning at 8 AM. Which scaling policy type is best suited to handle this pattern?
An instance in an ASG fails its ELB health check. What happens next?
An ASG has 3 instances in AZ-A and 0 in AZ-B. A scale-in event removes one instance. Which AZ does it come from, and why?

Next: AWS Elastic Beanstalk — how Auto Scaling, EC2, and load balancing are wired together automatically so you can focus on your application code.
