Horizontal (Scale Out): Add more instances. Requires stateless app design. Works with ELB. No downtime. Preferred for web tiers.
Vertical (Scale Up): Increase instance size. Requires downtime (stop/start). Has an upper limit. Use for databases that can't distribute horizontally.
Warm Pools
Pre-initialized instances waiting in a stopped state. Reduces cold-start latency for scale-out events. Useful for apps with long initialization.
Target Tracking is the recommended default policy — it adjusts scaling continuously, not just at alarm thresholds. Use Scheduled Scaling for known traffic patterns. Never rely on Simple Scaling alone.
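To make the contrast with threshold-only policies concrete, the sketch below adjusts capacity in proportion to how far a metric sits from its target. The function name, the proportional formula, and the min/max bounds are illustrative assumptions, not the actual Auto Scaling algorithm.

```python
import math

def target_tracking_desired(current_capacity, metric_value, target_value,
                            min_size=1, max_size=10):
    """Proportional estimate of desired capacity (illustrative only)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Scale capacity by the ratio of the observed metric to the target.
    raw = math.ceil(current_capacity * metric_value / target_value)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, raw))

# 4 instances averaging 75% CPU against a 50% target
print(target_tracking_desired(4, 75.0, 50.0))  # 6
```

The same function scales in when the metric falls below the target, which is the continuous adjustment that a single alarm threshold does not provide.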
Task 2.1 — Serverless
Serverless Compute Options
Lambda · Fargate · API Gateway · Step Functions
AWS Lambda
Event-driven functions. Max 15 min execution, up to 10 GB memory. Scales to thousands of concurrent executions automatically. Triggered by S3, SQS, DynamoDB Streams, API Gateway, EventBridge, and more. Billed per request and per millisecond of execution time.
AWS Fargate
Serverless container compute for ECS and EKS. No EC2 instances to manage. Define CPU/memory per task. Scales automatically. Use when workloads need containers but you want to avoid cluster management overhead.
Amazon API Gateway
Fully managed API layer. REST, HTTP, and WebSocket APIs. Integrates directly with Lambda, Step Functions, and AWS services. Handles throttling, auth (Cognito, Lambda authorizer), caching, and TLS termination. Endpoint types: Regional, edge-optimized, or private (the last two for REST APIs only).
AWS Step Functions
Serverless workflow orchestration using state machines. Coordinates Lambda, ECS, DynamoDB, SNS, SQS, and more. Standard workflows (1-year max) for audit trails; Express workflows (5-minute max) for high-volume, short-duration tasks. Handles retries, error catching, parallel branches.
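The retry and error-catching behavior of Step Functions can be sketched as a small Amazon States Language definition, written here as a Python dictionary. The state names, the Lambda ARN, and the retry numbers are hypothetical placeholders, and the helper only computes the wait schedule implied by the retrier fields.

```python
import json

# Hypothetical state machine: state names and the function ARN are placeholders.
state_machine = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # After retries are exhausted, any error is routed to NotifyFailure.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "OrderFailed",
            "Cause": "Retries exhausted",
        },
    },
}

def retry_delays(retrier):
    """Seconds waited before each retry: IntervalSeconds * BackoffRate**n."""
    return [retrier["IntervalSeconds"] * retrier["BackoffRate"] ** n
            for n in range(retrier["MaxAttempts"])]

retrier = state_machine["States"]["ProcessOrder"]["Retry"][0]
print(retry_delays(retrier))          # [2.0, 4.0, 8.0]
definition_json = json.dumps(state_machine, indent=2)  # what would be submitted
```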
The default Lambda concurrency limit is 1,000 per Region (can be raised). Reserved concurrency guarantees capacity for critical functions and also caps them at that level; provisioned concurrency keeps execution environments initialized to avoid cold starts. Fargate runs containers with no EC2 instances to manage; Lambda is function-level serverless.
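A quick way to check whether a workload fits under the concurrency limit is to multiply the request rate by the average duration. The sketch below assumes a steady request rate; the function names and sample numbers are illustrative.

```python
import math

DEFAULT_REGIONAL_LIMIT = 1000  # default limit quoted above; can be raised

def required_concurrency(requests_per_second, avg_duration_ms):
    """Concurrent executions needed = request rate x average duration in seconds."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)

def fits_default_limit(requests_per_second, avg_duration_ms):
    return required_concurrency(requests_per_second, avg_duration_ms) <= DEFAULT_REGIONAL_LIMIT

print(required_concurrency(500, 300))   # 150
print(fits_default_limit(4000, 500))    # False (needs 2000)
```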
2.2 Highly Available & Fault-Tolerant Architectures
Elastic Load Balancing · Multi-AZ · Global Accelerator · Health Checks
Task 2.2 — Load Balancing
Elastic Load Balancer Types
ALB · NLB · CLB · GWLB — choosing the right load balancer
Static Elastic IP per AZ, millions of req/sec, PrivateLink
CLB (Classic) | L4 + L7 | HTTP, HTTPS, TCP, SSL | Legacy applications built for EC2-Classic | Previous generation; migrate to ALB or NLB
GWLB (Gateway) | Layer 3 | IP (GENEVE) | Third-party virtual appliances (firewalls, IDS) | Bump-in-the-wire; transparent traffic inspection
ALB for HTTP/S web traffic and microservices routing. NLB for TCP/UDP, static IPs, or extreme performance. GWLB for routing traffic through security appliances. CLB is legacy — always migrate away.
Task 2.2 — ALB
ALB Routing, Target Groups & Sticky Sessions
Advanced ALB patterns for microservices and containers
Routing Rules
Path-based: /api/* → API service, /images/* → media service
Host-based: app.example.com → app servers, api.example.com → API servers
Header-based: Route by HTTP header value
Query string: Route by URL query params
Source IP: Route by CIDR range
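The routing rules above can be sketched as a priority-ordered rule table in which the first matching rule wins. This is a simplified model, not the ALB implementation: the rule set, target group names, and the use of `fnmatchcase` for wildcard matching are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules; the lowest priority number is evaluated first.
RULES = [
    {"priority": 10, "host": "api.example.com", "path": "*", "target_group": "api-servers"},
    {"priority": 20, "host": "*", "path": "/api/*", "target_group": "api-service"},
    {"priority": 30, "host": "*", "path": "/images/*", "target_group": "media-service"},
]
DEFAULT_TARGET_GROUP = "app-servers"  # stands in for the listener's default action

def route(host, path):
    """Return the target group of the first rule whose host and path both match."""
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if fnmatchcase(host, rule["host"]) and fnmatchcase(path, rule["path"]):
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP

print(route("app.example.com", "/api/users"))        # api-service
print(route("api.example.com", "/v1/orders"))        # api-servers
print(route("app.example.com", "/images/logo.png"))  # media-service
print(route("app.example.com", "/"))                 # app-servers
```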
Target Group Types
Instances: EC2 instances (by ID)
IP addresses: Any IP (including on-premises)
Lambda: Invoke Lambda per request
ALB: An ALB registered as the target of an NLB (NLB target groups only)
Sticky Sessions (Session Affinity)
ALB-generated cookie (AWSALB) or app-based cookie
Routes same user to same target for session duration
Can cause uneven load distribution
Avoid when possible — prefer stateless apps with ElastiCache for session state
Connection Draining (Deregistration Delay)
ALB waits for in-flight requests before deregistering a target
Default: 300 seconds (5 min)
Reduce for short-lived requests or fast-cycling targets
Ensures graceful scale-in without dropped connections
"Route different URL paths to different microservices behind one ALB" → path-based routing with multiple target groups. "Users lose session state when hitting different instances" → add ElastiCache for distributed session storage, not sticky sessions.
Task 2.2 — Global Routing
Global Accelerator vs. CloudFront
Two very different global distribution services
AWS Global Accelerator
2 static anycast IP addresses globally
Routes traffic via AWS private backbone (not public internet)
Works with TCP & UDP (non-HTTP too)
Fast automatic failover to a healthy endpoint (~30 sec)
Targets: ALB, NLB, EC2, Elastic IPs
No caching — pure routing acceleration
Best for: gaming, IoT, VoIP, non-HTTP workloads, static IP requirement
Amazon CloudFront
400+ edge locations worldwide
Content caching at the edge (CDN)
HTTP/S only
Reduces origin load by serving cached content
Integrates with WAF, Shield, ACM, S3
Custom cache behaviors by path
Best for: web content, API acceleration, static asset delivery, S3 distribution
CloudFront caches content at the edge — reduces origin load and latency for repeat requests. Global Accelerator does NOT cache — it just routes traffic faster via AWS backbone. Static IP requirement → Global Accelerator. HTTP caching → CloudFront.
DLQ must be same type as source queue (FIFO DLQ for FIFO queue)
Set up CloudWatch alarm on DLQ depth
Task 2.3 — SNS
Amazon SNS & the Fan-out Pattern
Pub/Sub messaging and one-to-many delivery
SNS Core Concepts
Pub/Sub: publishers send to topics; subscribers receive
Push-based delivery (unlike SQS pull)
Up to 12.5M subscriptions per topic
Message filtering by attribute
FIFO topics available (ordered + deduplication)
SNS Subscribers
SQS queues (most common)
Lambda functions
HTTP/S endpoints
Email / SMS / Mobile push
Amazon Data Firehose (formerly Kinesis Data Firehose)
SNS + SQS Fan-out Pattern
The canonical multi-consumer architecture
Publisher → SNS Topic
↓ fan-out ↓
SQS Queue A → Email Service
SQS Queue B → Analytics
Lambda → Real-time
One publish → delivered to all subscribers in parallel
Each SQS queue buffers for its own consumer independently
Failure in one consumer doesn't affect others
Add new consumers without changing publisher
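The fan-out behavior described above, including attribute-based filtering, can be modeled in memory. The `Topic` class, queue names, and message attributes below are hypothetical stand-ins for SNS and SQS, and the filter supports exact-match values only.

```python
from collections import deque

class Topic:
    """In-memory stand-in for an SNS topic with per-subscription filters."""

    def __init__(self):
        self._subscriptions = []  # list of (queue, filter_policy)

    def subscribe(self, queue, filter_policy=None):
        self._subscriptions.append((queue, filter_policy or {}))

    def publish(self, message, attributes=None):
        """Deliver to every subscriber whose filter matches; return delivery count."""
        attributes = attributes or {}
        delivered = 0
        for queue, policy in self._subscriptions:
            # Every key in the policy must match one of its allowed values.
            if all(attributes.get(key) in allowed for key, allowed in policy.items()):
                queue.append(message)
                delivered += 1
        return delivered

email_queue, analytics_queue = deque(), deque()
topic = Topic()
topic.subscribe(email_queue, {"event_type": ["order_placed"]})
topic.subscribe(analytics_queue)  # no filter: receives every message

topic.publish("order 1001 placed", {"event_type": "order_placed"})
topic.publish("order 1001 shipped", {"event_type": "order_shipped"})
print(len(email_queue), len(analytics_queue))  # 1 2
```

The publisher never references the queues directly, so adding a third subscriber is one more `subscribe` call with no publisher change.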
"Send one event to multiple downstream services" → SNS + SQS fan-out. SNS alone doesn't buffer — add SQS between SNS and slow consumers to absorb bursts. Message filtering avoids creating separate topics per subscriber.
Task 2.3 — Event Streaming
EventBridge & Kinesis
Event-driven routing and real-time data streaming
Amazon EventBridge
Serverless event bus for AWS services, SaaS, custom apps
Routes events via rules to targets (Lambda, SQS, SNS, Step Functions, etc.)
Schema registry: discover and validate event shapes
Matching access patterns to cost-optimal storage tiers
Storage Class | Availability | Min Duration | Retrieval | Use Case
S3 Standard | 99.99% (3+ AZs) | None | Instant | Frequently accessed data
S3 Intelligent-Tiering | 99.9% | None | Instant / async | Unknown or changing access patterns
S3 Standard-IA | 99.9% (3+ AZs) | 30 days | Instant | Infrequently accessed; backups, DR copies
S3 One Zone-IA | 99.5% (1 AZ) | 30 days | Instant | Re-creatable infrequent data; lower cost
S3 Glacier Instant Retrieval | 99.9% | 90 days | Instant (ms) | Archive accessed once per quarter
S3 Glacier Flexible Retrieval | 99.99% | 90 days | Minutes–hours | Archival with flexible retrieval time
S3 Glacier Deep Archive | 99.99% | 180 days | 12–48 hours | Long-term regulatory compliance archives
Lifecycle Policies automate transitions: Standard → Standard-IA (30+ days) → Glacier (90+ days) → Deep Archive (180+ days). Minimum storage duration charges apply even if an object is deleted or transitioned early. Standard-IA has per-GB retrieval fees, so it costs more than Standard if the data is accessed frequently.
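The transition chain above can be written as a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID and prefix are hypothetical, no AWS call is made, and the helper is only an illustration of which class an object of a given age would be in.

```python
# Hypothetical rule: the ID and prefix are placeholders.
lifecycle_configuration = {
    "Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
            {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }]
}

def storage_class_at(age_days, rule):
    """Return the storage class an object would have reached at a given age."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current

rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 365):
    print(age, storage_class_at(age, rule))
```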
Task 2.4 — Storage Types
EBS · EFS · FSx — Block & File Storage
Choosing the right persistent storage for compute workloads
Amazon EBS
Block storage for a single EC2 instance. AZ-scoped: same AZ as the instance. Snapshot to S3 for backup. Types: gp3 (general, recommended), io2 Block Express (highest IOPS, up to 256,000), st1 (throughput HDD), sc1 (cold HDD). Encrypt with KMS. Multi-Attach for io1/io2 (up to 16 Nitro instances, same AZ).
Amazon EFS
Managed NFS file system. Multi-AZ, multi-instance access simultaneously. Grows and shrinks automatically (no provisioning). Linux only (NFSv4). Performance modes: General Purpose and Max I/O. Throughput modes: Bursting, Provisioned, Elastic. EFS Infrequent Access tier for cost savings.
Amazon FSx for Windows
Managed Windows file server (SMB protocol). Active Directory integration. Supports DFS namespaces, NTFS, ACLs. Multi-AZ deployment available. Use for Windows workloads that need shared file storage (home directories, SQL Server backups).
Amazon FSx for Lustre
High-performance parallel file system for HPC, ML, financial modeling. Sub-millisecond latency, hundreds of GB/s throughput. Can link to S3 as a data repository. Ideal for compute-intensive workloads needing fast shared storage.
EBS = one instance, one AZ, block storage. EFS = many Linux instances, multi-AZ, file storage. FSx for Windows = Windows SMB, AD-integrated. FSx for Lustre = HPC / ML high-performance. If you see "shared file system for Linux EC2 fleet" → EFS.
Task 2.4 — Databases
RDS High Availability & Read Scaling
Multi-AZ · Read Replicas · RDS Proxy · Aurora
RDS Multi-AZ
Purpose: High availability (not read scaling)
Replication: Synchronous — standby is always in sync
Failover: Automatic, ~1–2 minutes. DNS endpoint flips.
Standby: Not readable (no traffic served)
Cost: 2× instance cost (active + standby)
RDS Read Replicas
Purpose: Read scaling + cross-region DR
Replication: Asynchronous — small replication lag
Failover: Manual promotion — not automatic
Readable: Yes — serve SELECT queries from replica
Cross-region: Yes — enables cross-region DR
Up to 15 replicas per instance for RDS MySQL, MariaDB, and PostgreSQL (5 for Oracle and SQL Server); up to 15 Aurora Replicas
Amazon Aurora
MySQL/PostgreSQL compatible; AWS claims up to 5× the throughput of standard MySQL
Shared storage: 6 copies across 3 AZs automatically
Up to 15 Aurora Replicas (sub-10ms replica lag)
Auto-scales storage 10GB → 128TB
Aurora Serverless v2: scale to zero (dev/test)
Aurora Global Database: ~1s cross-region replication
Failover to replica: <30 seconds
RDS Proxy
Connection pooling for Lambda → RDS patterns
Reduces connection overhead during burst
Automatic failover routing (faster than DNS TTL)
Stores database credentials in Secrets Manager; supports IAM authentication
Multi-AZ = HA (sync, automatic failover, standby not readable). Read Replica = scale reads (async, manual failover, replica is readable). Never use Multi-AZ for read scaling; use Read Replicas. Aurora for low-lag read replicas (up to 15), global DR, or faster failover.
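On the application side, read scaling means sending SELECT traffic to replicas and everything else to the writer. The sketch below is a minimal illustration: `EndpointRouter`, the endpoint hostnames, and the SELECT-prefix check are hypothetical, and a real application would also need to account for replication lag on reads that follow a write.

```python
import itertools

class EndpointRouter:
    """Send reads to replicas (round-robin) and writes to the primary."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers) if readers else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._readers is not None:
            return next(self._readers)
        # Writes, and reads when no replica exists, go to the writer.
        return self.writer

# Hypothetical endpoint names.
router = EndpointRouter(
    "mydb.writer.example.internal",
    ["mydb.replica-1.example.internal", "mydb.replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```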
Task 2.4 — NoSQL & Caching
DynamoDB & ElastiCache Resilience Patterns
NoSQL HA · Global Tables · Redis vs. Memcached
DynamoDB Resilience
Built-in Multi-AZ: Data stored across 3 AZs by default
Global Tables: Multi-Region active/active; last-writer-wins conflict resolution; replication typically within one second
PITR: Point-in-time recovery; restore to any second in last 35 days
On-demand backups: Full table backup anytime; no performance impact
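Last-writer-wins means that when two Regions update the same item concurrently, the write with the later timestamp survives and the other is discarded. The sketch below is a conceptual model only; `ItemVersion` and its fields are hypothetical and do not reflect DynamoDB internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ItemVersion:
    value: str
    written_at: float  # write timestamp in seconds (hypothetical field)
    region: str

def last_writer_wins(a, b):
    """Keep whichever version carries the later write timestamp."""
    return a if a.written_at >= b.written_at else b

us = ItemVersion("status=shipped", 1700000000.20, "us-east-1")
eu = ItemVersion("status=cancelled", 1700000000.45, "eu-west-1")
winner = last_writer_wins(us, eu)
print(winner.value, winner.region)  # status=cancelled eu-west-1
```

The earlier write is lost without an error, so applications that cannot tolerate this should avoid writing the same item from multiple Regions at once.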
Redis: Persistence, replication, pub/sub, Lua scripting, complex data structures (sorted sets, lists). Multi-AZ with automatic failover. Global Datastore for cross-region. Use for sessions, leaderboards, pub/sub, distributed locks.
Memcached: Simple key-value, multi-threaded, no persistence, no replication. Pure caching, simpler ops. Use when you only need a dumb cache and don't need any Redis features.
Caching Patterns
Lazy Loading (Cache-Aside): Check cache first; miss → load from DB → write to cache
Write-Through: Write to cache AND DB simultaneously; always fresh
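The two patterns above can be sketched with plain dictionaries standing in for the cache and the database. `CachedStore` and its method names are hypothetical; a real deployment would use an ElastiCache client and add TTLs and error handling.

```python
class CachedStore:
    def __init__(self, db):
        self.db = db        # dict standing in for the database
        self.cache = {}     # dict standing in for the cache cluster
        self.db_reads = 0   # counts how often reads fall through to the database

    def get_lazy(self, key):
        """Lazy loading: check the cache, and on a miss load from the DB and populate."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put_write_through(self, key, value):
        """Write-through: update the database and the cache in the same operation."""
        self.db[key] = value
        self.cache[key] = value

store = CachedStore({"user:1": "alice"})
store.get_lazy("user:1")                 # miss: reads the DB, fills the cache
store.get_lazy("user:1")                 # hit: served from the cache
store.put_write_through("user:2", "bob")
store.get_lazy("user:2")                 # hit: write-through already cached it
print(store.db_reads)                    # 1
```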
Backup & Restore: AWS Backup to S3 / cross-region. High RTO (hours).
Pilot Light: RDS read replica in DR region; EC2 AMIs ready to launch. Scale up on disaster.
Warm Standby: Scaled-down ASG + DB in DR region running. Fast scale-up on failover.
Active/Active: Route 53 latency routing; Aurora Global DB; DynamoDB Global Tables. Near-zero RTO/RPO.
AWS Backup is the single-pane-of-glass answer for "centrally manage backups across services and accounts." Data Lifecycle Manager (DLM) covers only EBS snapshots and EBS-backed AMIs. For a cross-region RPO of minutes or less: Aurora Global DB + Route 53 failover routing.
Quick Review
Exam Checklist — Domain 2
Can you answer these?
Task 2.1 — Scalable & Loosely Coupled
Target Tracking vs. Step vs. Scheduled scaling policies
Why horizontal scaling is preferred over vertical
Lambda limits: 15 min max, concurrency, cold starts
Fargate (containers) vs. Lambda (functions) tradeoff
Step Functions for workflow orchestration with retry logic