Amazon EC2 Auto Scaling
The capacity planning problem
Real traffic is never flat. Fixed fleets force a painful tradeoff.
Sized for peak traffic (handles Black Friday):
- Most of the year, instances sit idle
- You pay for unused compute constantly
- Wasted spend on every quiet night and weekend
Sized to save money (lean at off-peak):
- Traffic spike → instances overwhelmed
- Slow response times, errors, outages
- Revenue lost during the moments that matter most
Auto Scaling solves this: add instances when demand rises, remove them when it falls — automatically, within minutes.
Core concepts
Auto Scaling Group (ASG): a logical collection of EC2 instances managed as a unit.
- Minimum capacity
- Maximum capacity
- Desired capacity
The ASG keeps the instance count at the desired capacity, always within the minimum and maximum, and replaces unhealthy instances.
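As a rough illustration of how the three capacity values relate, the sketch below clamps a requested capacity into the min/max range. It assumes scaling requests are capped at the bounds; the `CapacityLimits` class and its field names are hypothetical and not part of any AWS SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityLimits:
    """Hypothetical stand-in for an ASG's minimum and maximum capacity."""
    minimum: int
    maximum: int

    def clamp(self, requested: int) -> int:
        # Keep the requested desired capacity within [minimum, maximum].
        return max(self.minimum, min(self.maximum, requested))


limits = CapacityLimits(minimum=2, maximum=10)
print(limits.clamp(1))   # request below the minimum
print(limits.clamp(6))   # request inside the range
print(limits.clamp(25))  # request above the maximum
```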
Launch template: the blueprint used to launch new instances in the group.
- AMI, instance type
- Key pair, security groups
- EBS volumes, user data script
Versioned — you can update and roll out changes.
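To make the launch template fields concrete, here is a minimal sketch that assembles them into a dictionary shaped like the `LaunchTemplateData` structure of the EC2 API. The function name, the device name, and the volume settings are illustrative assumptions; all IDs passed in are placeholders.

```python
import base64


def build_launch_template_data(ami_id, instance_type, key_name,
                               security_group_ids, user_data_script):
    """Return a dict shaped like EC2 LaunchTemplateData (placeholder values)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": list(security_group_ids),
        "BlockDeviceMappings": [
            # Assumed root device name and size, for illustration only.
            {"DeviceName": "/dev/xvda",
             "Ebs": {"VolumeSize": 20, "VolumeType": "gp3"}},
        ],
        # The EC2 API expects user data as base64-encoded text.
        "UserData": base64.b64encode(user_data_script.encode()).decode(),
    }


data = build_launch_template_data(
    ami_id="ami-0123456789abcdef0",          # placeholder AMI ID
    instance_type="t3.micro",
    key_name="example-key",                  # placeholder key pair name
    security_group_ids=["sg-0123456789abcdef0"],
    user_data_script="#!/bin/bash\necho hello\n",
)
print(sorted(data))
```

With boto3, a dictionary like this could be passed as `LaunchTemplateData` to `create_launch_template`, and later changes could be published with `create_launch_template_version`.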
Scaling policies: rules that tell the ASG when to add or remove instances.
- Target tracking
- Step scaling
- Scheduled scaling
- Predictive scaling
Scaling policy types
| Policy type | How it works | Best for |
|---|---|---|
| Target tracking | You set a target metric value (e.g., 60% CPU). ASG adjusts fleet to maintain it — like a thermostat. | Most workloads. Simplest to configure. |
| Step scaling | Alarm thresholds trigger different step sizes. CPU 70–80% → add 1. CPU >80% → add 3. | Workloads with sharp, fast spikes needing a proportional response. |
| Scheduled scaling | Add or remove capacity on a cron schedule. | Known traffic patterns — scale up at 8 AM, down at 10 PM. |
| Predictive scaling | ML-driven forecast based on historical patterns. Proactively launches capacity before the spike hits. | Recurring traffic cycles — daily peaks, weekly patterns. |
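The arithmetic behind the first two rows of the table can be sketched as follows. The target tracking function is a simplified proportional model that assumes load spreads evenly across instances; it is not AWS's actual algorithm. The step function hard-codes the example thresholds from the table, and both function names are hypothetical.

```python
import math


def target_tracking_capacity(current_capacity, metric_value, target_value):
    """Capacity that would bring the average metric back to the target,
    assuming load spreads evenly across instances."""
    return math.ceil(current_capacity * metric_value / target_value)


def step_adjustment(cpu_percent):
    """Instances to add, using the example thresholds from the table."""
    if cpu_percent > 80:
        return 3
    if cpu_percent >= 70:
        return 1
    return 0


# 4 instances averaging 90% CPU against a 60% target.
print(target_tracking_capacity(4, 90, 60))
# CPU in the 70-80% band versus above 80%.
print(step_adjustment(75), step_adjustment(85))
```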
Combine policies: Predictive scaling handles anticipated load; target tracking handles unexpected spikes on top.
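One way to picture the combination: when several policies suggest different capacities, the group follows whichever asks for the most, still bounded by min and max. The sketch below models that rule; `combined_capacity` and its parameters are hypothetical names.

```python
def combined_capacity(predicted, reactive, minimum, maximum):
    """Pick the larger of the forecast-driven and metric-driven capacities,
    then keep the result within the group's bounds."""
    wanted = max(predicted, reactive)
    return max(minimum, min(maximum, wanted))


# Forecast expects 6 instances, but an unexpected spike calls for 9.
print(combined_capacity(predicted=6, reactive=9, minimum=2, maximum=10))
```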
Auto Scaling + load balancer
An ASG works with an Application Load Balancer (ALB) to distribute traffic across healthy instances automatically.
When an instance launches, it registers with the ALB automatically. When it is terminated, it deregisters, and connection draining (called deregistration delay on an ALB) lets in-flight requests complete first.
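The register, drain, deregister lifecycle can be modeled with a small in-memory sketch. The `TargetGroup` class here is a hypothetical teaching model, not the ALB API: it only tracks which instances are eligible for new requests.

```python
class TargetGroup:
    """Hypothetical in-memory model of a load balancer target group."""

    def __init__(self):
        self.state = {}  # instance id -> "healthy" or "draining"

    def register(self, instance_id):
        self.state[instance_id] = "healthy"

    def start_draining(self, instance_id):
        # Draining targets finish in-flight requests but receive no new ones.
        self.state[instance_id] = "draining"

    def deregister(self, instance_id):
        self.state.pop(instance_id, None)

    def routable(self):
        """Instances that may receive new requests."""
        return sorted(i for i, s in self.state.items() if s == "healthy")


tg = TargetGroup()
tg.register("i-a")
tg.register("i-b")
tg.start_draining("i-a")   # scale-in begins for i-a
print(tg.routable())
tg.deregister("i-a")       # draining finished, instance can be terminated
```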
Health checks and self-healing
Auto Scaling continuously monitors instance health and automatically replaces failed instances — even without a scaling event.
Health check types
- EC2 status checks — is the instance running? Did the OS boot correctly?
- ELB health checks — is the application responding to HTTP requests correctly?
- Custom health checks — signal from your application via the AWS SDK
ELB health checks are more thorough — they verify the application is healthy, not just that the instance is running.
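A custom health check could look roughly like the sketch below, which reports an instance as unhealthy through an Auto Scaling client. The client is passed in as a parameter so the example runs without AWS access; `RecordingClient` is a hypothetical test double, and the instance ID is a placeholder.

```python
def report_unhealthy(autoscaling_client, instance_id):
    """Tell the ASG that the application on this instance is unhealthy."""
    autoscaling_client.set_instance_health(
        InstanceId=instance_id,
        HealthStatus="Unhealthy",
        # Respect the health check grace period of a freshly launched instance.
        ShouldRespectGracePeriod=True,
    )


class RecordingClient:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def set_instance_health(self, **kwargs):
        self.calls.append(kwargs)


fake = RecordingClient()
report_unhealthy(fake, "i-0123456789abcdef0")
print(fake.calls)
```

With boto3 installed and credentials configured, `boto3.client("autoscaling")` could be passed in place of the recording client.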
Self-healing flow
- Instance fails health check
- ASG marks instance unhealthy
- Instance is terminated (connection draining first)
- New instance launched from launch template
- New instance registers with load balancer
- Traffic flows to new instance
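The flow above can be sketched as a single reconciliation pass: drop the unhealthy instances, then launch replacements until the desired count is restored. This is a simplified model with hypothetical names; draining and load balancer registration are omitted.

```python
import itertools

_new_ids = itertools.count(1)  # generates ids for simulated launches


def heal(instances, desired):
    """One reconciliation pass over the fleet.

    `instances` maps instance id -> True (healthy) or False (unhealthy).
    Returns a new mapping with unhealthy instances replaced.
    """
    healthy = {i: ok for i, ok in instances.items() if ok}
    while len(healthy) < desired:
        # Simulates launching a replacement from the launch template.
        healthy[f"i-new-{next(_new_ids)}"] = True
    return healthy


fleet = {"i-a": True, "i-b": False, "i-c": True}
fleet = heal(fleet, desired=3)
print(sorted(fleet))
```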
Multi-AZ for high availability
Auto Scaling Groups can span multiple Availability Zones. AWS automatically distributes instances to maintain balance.
Why multi-AZ matters
- An AZ is one or more physically separate data centers with independent power, cooling, and networking
- If one AZ has an outage, traffic is served from others
- ASG rebalances instances across AZs automatically
- No single point of failure at the infrastructure level
Rebalancing behavior
- If AZ-A has 3 instances and AZ-B has 1, ASG launches in AZ-B first
- When an instance is terminated, ASG checks AZ balance before launching replacement
- Can also explicitly trigger rebalancing via the Rebalance action
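The balancing rule in the bullets above can be sketched with two small helpers: launch into the AZ with the fewest instances, and scale in from the AZ with the most. This is a simplified model with hypothetical function names; the real service also weighs other factors.

```python
def az_for_launch(counts):
    """Pick the AZ with the fewest instances (ties go to the first name)."""
    return min(sorted(counts), key=lambda az: counts[az])


def az_for_scale_in(counts):
    """Pick the AZ with the most instances (ties go to the first name)."""
    return max(sorted(counts), key=lambda az: counts[az])


counts = {"AZ-A": 3, "AZ-B": 1}
print(az_for_launch(counts))    # where the next instance would go
print(az_for_scale_in(counts))  # where the next termination would come from
```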
Best practice: Always configure your ASG across at least 2 Availability Zones, and attach an ALB so traffic can route around an AZ failure.
Cooldowns and instance warm-up
Auto Scaling uses timing controls to avoid thrashing — launching and terminating instances repeatedly in rapid succession.
Cooldown period: after a scaling action, the ASG waits before evaluating policies again.
- Default: 300 seconds
- Prevents a new metric reading from triggering another scale while the first batch of instances is still initializing
- Applies to simple scaling policies
Step scaling and target tracking policies do not use the cooldown; they rely on instance warm-up instead.
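The cooldown check itself is simple arithmetic on timestamps. A minimal sketch, assuming times are expressed in seconds and using the 300-second default mentioned above; `in_cooldown` is a hypothetical helper name.

```python
def in_cooldown(last_scaling_time, now, cooldown_seconds=300):
    """True while the cooldown from the last scaling action is still running."""
    return (now - last_scaling_time) < cooldown_seconds


# A scaling action happened at t=1000 seconds.
print(in_cooldown(1000, 1200))  # 200 seconds later
print(in_cooldown(1000, 1300))  # 300 seconds later
```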
Instance warm-up: the time to wait before a new instance is counted toward ASG metrics.
- A freshly launched instance has low CPU — don't let it hide the real load
- During warm-up, the instance is not counted in metrics that trigger further scaling
- Set to match your actual boot + initialization time
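To see why warm-up matters, the sketch below compares the fleet's average CPU with and without a freshly launched instance. It is a simplified model: the tuple layout and the `average_cpu` name are assumptions made for illustration.

```python
def average_cpu(instances, now, warmup_seconds):
    """Average CPU over instances that have finished warming up.

    Each instance is a (launch_time, cpu_percent) tuple; times are in seconds.
    Returns None when no instance has finished warming up.
    """
    warmed = [cpu for launched, cpu in instances
              if now - launched >= warmup_seconds]
    return sum(warmed) / len(warmed) if warmed else None


# Two busy instances plus one launched 50 seconds ago that is still booting.
fleet_metrics = [(0, 90.0), (0, 85.0), (950, 5.0)]
print(average_cpu(fleet_metrics, now=1000, warmup_seconds=120))  # new instance excluded
print(average_cpu(fleet_metrics, now=1000, warmup_seconds=0))    # new instance included
```

Without a warm-up period, the idle new instance pulls the average down and can mask the load on the rest of the fleet.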
Key takeaways
- Auto Scaling Groups define min/desired/max — the ASG enforces desired count at all times
- Launch templates define what to launch; scaling policies define when
- Target tracking is the simplest policy — set a target metric, let AWS do the math
- Self-healing is automatic — failed instances are replaced without human intervention
- Multi-AZ ASGs are required for production high availability
Auto Scaling is a thermostat for your fleet. You set the target temperature; it adds or removes heaters to maintain it.
Auto Scaling + ALB + multi-AZ is the classic AWS pattern for elastic, highly available web applications.
Review questions
- What three capacity values does every Auto Scaling Group define?
- What is the difference between a launch template and a scaling policy?
- Which scaling policy type works like a thermostat?
- Why is the instance warm-up period important?
- A news site experiences a massive traffic spike every morning at 8 AM. Which scaling policy type is best suited to handle this pattern?
- An instance in an ASG fails its ELB health check. What happens next?
- An ASG has 3 instances in AZ-A and 0 in AZ-B. A scale-in event removes one instance. Which AZ does it come from, and why?
Next: AWS Elastic Beanstalk — how Auto Scaling, EC2, and load balancing are wired together automatically so you can focus on your application code.