Throughput · Transfer Acceleration · Multipart Upload · Byte-Range Fetch
S3 Baseline Performance
3,500 PUT/COPY/POST/DELETE requests/sec per prefix
5,500 GET/HEAD requests/sec per prefix
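No limit on the number of prefixes: spreading requests across N prefixes scales throughput roughly N× (e.g., 10 prefixes ≈ 55,000 GET/sec)
SSE-KMS can bottleneck on the KMS request quota, since each encrypted upload or download calls KMS (5,500–30,000 req/sec depending on region)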
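The per-prefix limits above translate directly into a prefix count for a target request rate. The following is a minimal sketch that assumes evenly distributed traffic. It computes that count and derives a stable prefix for each key from a hash. The `shard-NN/` naming scheme and both helper names are hypothetical. S3 partitions new prefixes gradually as load grows, so some 503 Slow Down responses are possible while it scales.

```python
import hashlib
import math

# Per-prefix request rates quoted above
PUT_RPS_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE
GET_RPS_PER_PREFIX = 5_500   # GET/HEAD


def prefixes_needed(target_put_rps: float, target_get_rps: float) -> int:
    """Smallest prefix count that keeps both request classes under the per-prefix rate."""
    return max(
        1,
        math.ceil(target_put_rps / PUT_RPS_PER_PREFIX),
        math.ceil(target_get_rps / GET_RPS_PER_PREFIX),
    )


def sharded_key(key: str, num_prefixes: int) -> str:
    """Prepend a stable, hash-derived shard prefix (hypothetical "shard-NN/" scheme).

    The same key always maps to the same prefix, so readers can recompute
    the full key without a lookup table.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % num_prefixes
    return f"shard-{shard:02d}/{key}"


# 12,000 writes/sec and 20,000 reads/sec need 4 prefixes (4 x 3,500 = 14,000 PUT/sec).
n = prefixes_needed(12_000, 20_000)
print(n, sharded_key("2024/05/01/events.json", n))
```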
Multipart Upload
Required for objects >5 GB; recommended >100 MB
Upload parts in parallel — maximize bandwidth
Resilient to network interruptions (retry individual parts)
Set an S3 Lifecycle rule to abort incomplete multipart uploads; uploaded parts are billed as storage until the upload is completed or aborted
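Multipart uploads must respect S3's part limits. An upload can have at most 10,000 parts, and every part except the last must be at least 5 MiB, up to 5 GiB per part. Here is a minimal Python sketch of a part planner under those limits. The 100 MiB preferred part size is an arbitrary assumption, and the actual `UploadPart` calls are not shown.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB          # minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # 5 GiB per part
MAX_PARTS = 10_000               # maximum parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 100 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering the whole object.

    The part size starts at preferred_part_size, is raised to at least MIN_PART_SIZE,
    and is raised further if needed so the upload fits in MAX_PARTS parts.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single multipart upload")
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be smaller
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


parts = plan_parts(250 * MIB)
# 250 MiB at 100 MiB per part gives 3 parts; the last holds the remaining 50 MiB
print(len(parts), parts[-1])
```

Each tuple corresponds to one part upload. Parts can be sent in parallel and retried individually, which is where the bandwidth and resilience benefits come from.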
S3 Transfer Acceleration
Routes uploads through CloudFront edge locations → AWS backbone
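Best for: clients far from the bucket's region uploading large files
Extra per-GB charge, billed only for transfers that are actually accelerated; enable only when the speedup justifies the cost
Benchmark with the S3 Transfer Acceleration Speed Comparison tool before enabling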
Byte-Range Fetches
Download specific byte ranges of an object in parallel
Speeds up large downloads, and a failed range can be retried without restarting the whole transfer
Also useful to fetch only the header of a file (e.g., first 50 bytes of a custom file format)
Pair with multipart upload for parallel throughput in both directions
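A byte-range download splits the object into HTTP `Range` headers and fetches them concurrently. The end offsets in these headers are inclusive. In this sketch, `fetch_range` is a hypothetical callable that stands in for an S3 `GetObject` request with the `Range` parameter set (boto3's `get_object` accepts, for example, `Range="bytes=0-49"`). An in-memory fake keeps the example runnable offline.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Split an object into HTTP Range header values (end offsets are inclusive)."""
    ranges = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


def parallel_download(fetch_range, object_size: int, chunk_size: int, workers: int = 8) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order.

    fetch_range is any callable taking a Range header value and returning bytes.
    """
    ranges = byte_ranges(object_size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so concatenation restores the object
        chunks = list(pool.map(fetch_range, ranges))
    return b"".join(chunks)


# Offline stand-in for S3: serves slices of an in-memory 10,240-byte object.
OBJECT = bytes(range(256)) * 40


def fake_fetch(range_header: str) -> bytes:
    start, end = range_header.removeprefix("bytes=").split("-")
    return OBJECT[int(start): int(end) + 1]


data = parallel_download(fake_fetch, len(OBJECT), chunk_size=1_000)
print(len(data), data == OBJECT)
```

Fetching only a file header is the single-range case. For example, `byte_ranges(50, 50)` yields `["bytes=0-49"]`.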
"Users globally are uploading large files slowly" → S3 Transfer Acceleration. "Download a 10 GB object faster" → Byte-Range Fetches in parallel. "S3 throttling at high request rates" → distribute objects across multiple prefixes (randomize key names).
Task 3.1 — EBS
EBS Volume Types & Performance
Choosing the right block storage for your IOPS and throughput needs
Volume Type | Category | Max IOPS | Max Throughput | Use Case
gp3 | SSD, general purpose | 16,000 | 1,000 MB/s | Default choice: boot volumes, dev/test, most workloads
gp2 | SSD, general purpose | 16,000 | 250 MB/s | Previous generation; migrate to gp3 for a higher baseline at up to 20% lower cost per GB
io2 Block Express | SSD, provisioned IOPS | 256,000 | 4,000 MB/s | Mission-critical: SAP HANA, large Oracle, high-IOPS databases
io1 | SSD, provisioned IOPS | 64,000 | 1,000 MB/s | I/O-intensive databases requiring consistent IOPS
st1 | HDD, throughput optimized | 500 | 500 MB/s | Big data, data warehousing, log processing (sequential I/O)
sc1 | HDD, cold | 250 | 250 MB/s | Infrequently accessed data; lowest-cost HDD
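gp3 decouples performance from size. Every gp3 volume gets a 3,000 IOPS / 125 MB/s baseline, and you can provision up to 16,000 IOPS (at most 500 IOPS per GiB) and 1,000 MB/s independently of capacity. gp2, by contrast, ties IOPS to size at 3 IOPS/GB (minimum 100, maximum 16,000). HDD volumes (st1, sc1) cannot be used as boot volumes. Need >64K IOPS → io2 Block Express only.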
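The gp2 and gp3 size-to-IOPS rules can be expressed as two small functions. This sketch is based on the limits in the table above; the function names are illustrative.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2: 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000.

    Volumes under 1 TiB can also burst to 3,000 IOPS on credits; not modeled here.
    """
    return min(max(3 * size_gib, 100), 16_000)


def gp3_max_iops(size_gib: int) -> int:
    """gp3: 3,000 IOPS included at any size; provisioned IOPS capped at 500 per GiB and 16,000."""
    return min(max(500 * size_gib, 3_000), 16_000)


for size in (8, 100, 1_000):
    print(size, gp2_baseline_iops(size), gp3_max_iops(size))
```

Consider a 100 GiB volume. As gp2 it has a baseline of 300 IOPS. As gp3 it starts at 3,000 IOPS and can be provisioned up to 16,000 without adding capacity.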
Task 3.1 — DynamoDB
DynamoDB Performance Optimization
Capacity Modes · GSIs · Partition Keys · DAX
Capacity Modes
Provisioned: Set RCUs and WCUs manually. Use Auto Scaling to adjust. Predictable traffic, lower cost at steady state.
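On-Demand: Pay-per-request, no capacity planning. Instantly accommodates up to 2× the table's previous traffic peak; exceeding double the previous peak within 30 minutes can cause throttling. Best for spiky or unknown traffic. Higher per-unit cost.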
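In provisioned mode, required capacity follows from item size and request rate. One RCU covers one strongly consistent read per second of an item up to 4 KB; eventually consistent reads cost half as much, and transactional reads cost double. One WCU covers one write per second of an item up to 1 KB, and transactional writes cost double. The following is a sketch of that arithmetic; the function names are illustrative.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_sec: int, consistency: str = "strong") -> int:
    """RCUs for a read workload; item size is rounded up to the next 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)
    factor = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}[consistency]
    return math.ceil(units_per_read * reads_per_sec * factor)


def write_capacity_units(item_size_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    """WCUs for a write workload; item size is rounded up to the next 1 KB."""
    units_per_write = math.ceil(item_size_kb)
    return units_per_write * writes_per_sec * (2 if transactional else 1)


# 6 KB items read 100 times/sec: each read needs 2 RCUs (6 KB rounds up to 8 KB)
print(read_capacity_units(6, 100), read_capacity_units(6, 100, "eventual"))
```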
EC2 Instance Families
Matching instance type to workload characteristics
Family | Examples | Optimized For | Use Cases
General Purpose | t3, t4g, m6i, m7g | Balanced CPU / RAM / network | Web servers, small DBs, dev environments
Compute Optimized | c6i, c7g, c6gn | High CPU-to-memory ratio | Batch processing, HPC, gaming, media encoding, scientific modeling
Memory Optimized | r6i, r7g, x2idn, u- | High RAM for large in-memory datasets | In-memory DBs (Redis, SAP HANA), real-time big data, large caches
Storage Optimized | i3, i4i, d3, h1 | High I/O on local instance storage (NVMe SSD on i3/i4i; HDD on d3/h1) | NoSQL DBs (Cassandra), data warehousing, distributed file systems
Accelerated Computing | p4, p5, g5, inf2, trn1 | GPUs / custom ML chips (Inferentia, Trainium) | ML training and inference, deep learning, GPU rendering, video transcoding
Burstable | t3, t4g | Baseline CPU + burst credits | Variable workloads; Unlimited mode (the default for T3/T4g) sustains bursts beyond earned credits for an extra charge
Compute (C) = CPU-intensive batch. Memory (R, X) = large in-memory datasets. Storage (I, D) = fast local disk. Accelerated (P, G) = GPU / ML. General (M, T) = default web tier. Match the workload bottleneck to the instance family's strength.
Task 3.2 — Placement Groups
EC2 Placement Groups
Controlling physical placement for performance or fault isolation
Cluster Placement Group
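Instances packed close together in a single AZ (typically the same rack or network segment)
Lowest inter-instance latency and highest per-flow throughput (up to 10 Gbps single-flow, vs. 5 Gbps outside a placement group)
Highest risk: a shared hardware failure can affect every instance in the group at once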
Best for: HPC, tightly-coupled distributed computing, MPI workloads
Use with Enhanced Networking (ENA) for max throughput
Use when: ultra-low latency between instances is required
Spread Placement Group
Each instance on a different physical rack
Max 7 instances per AZ per group
Maximizes fault isolation — single hardware failure affects max 1 instance
Can span multiple AZs
Best for: small critical instance sets (primary + replicas), quorum clusters
Use when: critical instances must survive hardware failures independently
Partition Placement Group
Instances divided across logical partitions
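Partitions never share racks, so a rack failure affects only one partition
Up to 7 partitions per AZ; hundreds of instances per partition
Topology-aware: each instance can read its partition number from instance metadata, so applications know which partition they're in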
Best for: large distributed systems — HDFS, HBase, Kafka, Cassandra
Use when: large distributed workload needs fault domain awareness
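Partition-aware software uses the partition number to keep replicas of the same data out of the same fault domain. This minimal sketch assumes each instance has reported its partition number; instance metadata exposes it at `placement/partition-number`. The instance IDs and the `place_replicas` helper are hypothetical.

```python
def place_replicas(instance_partitions: dict[str, int], replicas: int) -> list[str]:
    """Choose `replicas` instances, each in a different partition.

    instance_partitions maps instance ID -> partition number, e.g. as each
    instance read it from metadata (placement/partition-number).
    """
    if replicas <= 0:
        return []
    chosen: list[str] = []
    used_partitions: set[int] = set()
    # Sorted for deterministic output; real systems would also weigh load, capacity, etc.
    for instance_id, partition in sorted(instance_partitions.items()):
        if partition in used_partitions:
            continue  # same fault domain as an existing replica: skip
        chosen.append(instance_id)
        used_partitions.add(partition)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct partitions for the requested replica count")


# Hypothetical 4-instance group spread over 3 partitions
cluster = {"i-aaa": 1, "i-bbb": 1, "i-ccc": 2, "i-ddd": 3}
print(place_replicas(cluster, 3))  # one instance from each of partitions 1, 2, 3
```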
Cluster = performance (low latency, high throughput, same rack, same AZ). Spread = max HA (different rack per instance, 7-instance limit). Partition = large distributed systems needing fault domain awareness (Hadoop, Kafka). Performance vs. resilience is the tradeoff for Cluster vs. Spread.
Task 3.2 — Lambda
Lambda Performance Optimization
Cold Starts · Memory · Provisioned Concurrency · Power Tuning
Cold Starts & Mitigation
Cold start: a new execution environment is initialized, adding latency (roughly 100 ms to 1 s or more, worst for JVM runtimes and large packages)
Warm invocation: Execution env reused — fast
Provisioned Concurrency: Pre-warms N execution environments — eliminates cold starts. Use for latency-sensitive APIs.
Move initialization code (SDK clients, DB connections, config) outside the handler so it runs once per execution environment and is reused by warm invocations
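In a Lambda function, module-scope code runs once when an execution environment initializes, while the handler body runs on every invocation. This Python sketch simulates that behavior with a hypothetical `_create_client` setup function and a counter. Calling the handler repeatedly, as warm invocations would, shows the setup running only once.

```python
import time

INIT_COUNT = 0  # only here to make the once-per-environment behavior observable


def _create_client():
    """Hypothetical expensive setup: an SDK client, DB connection pool, or config load."""
    global INIT_COUNT
    INIT_COUNT += 1
    time.sleep(0.01)  # stand-in for setup latency paid during a cold start
    return {"created_at": time.time()}


# Module scope: runs once per execution environment (during the cold start)
CLIENT = _create_client()


def handler(event, context):
    """Runs on every invocation; reuses CLIENT instead of rebuilding it."""
    return {
        "statusCode": 200,
        "client_created_at": CLIENT["created_at"],
        "echo": event.get("id"),
    }


# Simulated warm invocations in the same environment: setup ran only once.
for i in range(3):
    handler({"id": i}, None)
print(INIT_COUNT)  # 1
```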
Memory & CPU Scaling
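Lambda CPU is proportional to memory: more memory = more vCPU (one full vCPU at 1,769 MB, up to 6 vCPUs at 10,240 MB)
Range: 128 MB → 10,240 MB (10 GB)
Billing is per GB-second (memory × duration), so more memory can lower cost when it shortens duration enough
AWS Lambda Power Tuning (open-source tool): runs a function at several memory settings to find the best speed/cost trade-off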
Ephemeral storage: 512 MB → 10,240 MB (/tmp)
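Lambda bills compute in GB-seconds, so the cost effect of a memory change is simple arithmetic. The durations below are hypothetical, not measurements. The `price_per_gb_s` parameter is a placeholder for the rate that applies to your region and architecture.

```python
def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billed compute for one invocation: memory in GB x duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def compute_cost(memory_mb: int, duration_ms: float, invocations: int, price_per_gb_s: float) -> float:
    """Compute charge only; the per-request charge is the same at any memory size."""
    return gb_seconds(memory_mb, duration_ms) * invocations * price_per_gb_s


# Hypothetical measurements: doubling memory cuts duration from 800 ms to 350 ms.
small = gb_seconds(512, 800)    # 0.5 GB x 0.8 s
large = gb_seconds(1024, 350)   # 1.0 GB x 0.35 s
print(small, large, large < small)
```

In this example, doubling the memory cuts duration by more than half. The larger configuration is therefore both faster and cheaper. Power Tuning automates this comparison with real measurements.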
Concurrency Controls
Reserved Concurrency: Caps a function's concurrent executions and reserves that capacity for it, so it can neither consume all account concurrency nor be starved by other functions
Provisioned Concurrency: Pre-initializes N instances for instant invocation
Account limit: 1,000 concurrent executions per region (default, can be raised)
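Throttled invocations: synchronous calls return HTTP 429 (TooManyRequestsException); asynchronous events are retried for up to 6 hours, then sent to a DLQ or on-failure destination if one is configured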
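A common way to estimate required concurrency is to multiply requests per second by average duration in seconds. The estimate helps decide whether reserved concurrency or a quota increase is needed. This sketch uses the default account limit of 1,000 mentioned above; the traffic numbers are hypothetical.

```python
import math


def required_concurrency(requests_per_sec: float, avg_duration_sec: float) -> int:
    """Steady-state concurrent executions ~= request rate x average duration."""
    return math.ceil(requests_per_sec * avg_duration_sec)


def will_throttle(requests_per_sec: float, avg_duration_sec: float, limit: int = 1_000) -> bool:
    """True if estimated concurrency exceeds `limit` (account limit or a function's reserved concurrency)."""
    return required_concurrency(requests_per_sec, avg_duration_sec) > limit


print(required_concurrency(100, 0.5))      # 50
print(will_throttle(2_000, 0.75))          # True: needs 1,500 > 1,000
print(will_throttle(100, 0.5, limit=40))   # True: reserved concurrency of 40 is too low
```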
"Lambda API has intermittent latency spikes" → Provisioned Concurrency (eliminates cold starts). "Lambda consuming too many concurrent executions" → Reserved Concurrency (caps it). Increasing memory increases CPU proportionally — often the fastest way to improve Lambda performance.
Elastic Network Adapter (ENA)
Current-generation enhanced networking interface
Up to 100 Gbps network throughput on supported instances
Higher bandwidth, lower latency, lower CPU overhead vs. legacy virtualized networking
Uses SR-IOV, which gives the instance direct access to a virtual function of the physical NIC and avoids most hypervisor overhead
Available on most current-gen instances (C5, M5, R5, etc.)
No extra cost: choose an ENA-capable instance type and an AMI with the ENA driver and ENA support enabled
Elastic Fabric Adapter (EFA)
Network interface for HPC and ML workloads
OS bypass: application communicates directly with NIC, bypassing OS kernel
Enables MPI and NCCL (ML) inter-node communication at near bare-metal speeds
Required for tightly coupled HPC jobs (weather modeling, CFD, genomics)
Supported on selected instance types, e.g., C5n, P4d, Trn1, Hpc6a
Works with Cluster Placement Group for max performance
Network Performance by Tier
Standard virtualized: Up to 10 Gbps — most instances
Enhanced Networking (ENA): Up to 25–100 Gbps — C5, M5, R5 etc.
Elastic Fabric Adapter: 100 Gbps + OS bypass — HPC/ML
Placement Group (Cluster): up to 10 Gbps per single flow between instances in the same group (vs. 5 Gbps outside a placement group)
"HPC cluster needs lowest inter-node latency for MPI" → EFA + Cluster Placement Group. "High network throughput between EC2 instances" → ENA-enabled instance types. EFA is a superset of ENA — it includes all ENA capabilities plus OS bypass.
CloudFront Performance Optimization
Cache Hit Ratio: share of requests served from the edge cache without going to the origin; maximize this
TTL (Time to Live): Longer TTL = better cache hit ratio; shorter = fresher content
Cache Key: What CloudFront uses to identify a cached object. Include headers, query strings, or cookies only if they change the response; otherwise they fragment the cache unnecessarily
Cache Policies: Managed policies (CachingOptimized, CachingDisabled) or custom
Compression: Enable Gzip/Brotli compression at edge to reduce transfer size
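CloudFront's cache key is set in a cache policy (for example, a query-string allowlist), not in code. This Python sketch only models why unnecessary parameters lower the hit ratio. The allowlisted parameter names and the request URLs are hypothetical, and the simulation assumes a single edge cache that starts empty.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical: the only query parameters that actually change the response
ALLOWED_PARAMS = {"lang", "size"}


def cache_key(url: str, allowed: set[str] = ALLOWED_PARAMS) -> str:
    """Path plus allowlisted query params, sorted so parameter order doesn't matter."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed)
    return parts.path + ("?" + urlencode(kept) if kept else "")


def hit_ratio(urls: list[str], key_fn) -> float:
    """Simulate one initially empty edge cache: first request per key misses, repeats hit."""
    seen: set[str] = set()
    hits = 0
    for url in urls:
        key = key_fn(url)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(urls)


sample_urls = [
    "/img/logo.png?size=s&utm_source=mail",
    "/img/logo.png?utm_source=ads&size=s",
    "/img/logo.png?size=s",
    "/img/logo.png?size=s&session=abc123",
]
print(hit_ratio(sample_urls, lambda u: u))  # 0.0: every raw URL is a distinct key
print(hit_ratio(sample_urls, cache_key))    # 0.75: all four share "/img/logo.png?size=s"
```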
Origin Shield
Additional caching layer between edge locations and your origin
Reduces origin load: cache misses from all edge locations are consolidated through one regional caching layer, so fewer requests reach the origin
Especially useful for origins that are slow or expensive to query (e.g., on-premises, cross-region)
Adds a small charge per request routed through Origin Shield
Lambda@Edge & CloudFront Functions
CloudFront Functions: Lightweight JS at edge (sub-ms). URL rewrites, header manipulation, A/B testing. Runs at all 400+ PoPs.
Lambda@Edge: Full Lambda capabilities at regional edge locations. HTTP auth, custom redirects, body transformation, dynamic content. Higher latency and cost than CF Functions.
Low cache hit ratio? → Check cache key — remove unnecessary query strings/headers that fragment caching. "Origin getting hammered despite CloudFront" → Enable Origin Shield. "Run auth logic at edge before reaching origin" → Lambda@Edge or CloudFront Functions (for simpler logic).
Direct Connect vs. Site-to-Site VPN
Direct Connect, best for: large data migration, consistent throughput, hybrid cloud, compliance
Site-to-Site VPN, best for: quick setup, redundant backup, lower cost
Direct Connect Gateway
Connect one Direct Connect circuit to multiple VPCs across regions and accounts. Avoids needing a separate circuit per VPC or region.
Resilient DX Pattern
Primary: Direct Connect for performance. Backup: Site-to-Site VPN over the internet. With BGP on both paths, AWS prefers the Direct Connect route and fails over to the VPN automatically if Direct Connect goes down. Best practice for production hybrid connectivity.
"Consistent, high-bandwidth connectivity to on-premises" → Direct Connect. "Fastest setup for hybrid connectivity" → Site-to-Site VPN. "Backup for Direct Connect" → Site-to-Site VPN. Direct Connect is NOT encrypted — add MACsec or an IPsec VPN over the DX connection for encryption.
3.4 · 3.5
Elastic Solutions & Purpose-Built Databases
Elastic Architecture Patterns · Right Database for the Right Workload
Task 3.4 — Elastic Architectures
Designing Elastic Solutions Across All Tiers
Web · App · Database · Caching — scaling together
Web Tier: CloudFront + WAF at edge. ALB distributes across AZs. EC2 ASG (target tracking on ALB request count per target) or Fargate tasks auto-scale. Static assets served from S3 via CloudFront consume no EC2 or container compute as traffic grows.
Application Tier: EC2 ASG or ECS/EKS with auto scaling on CPU/memory/custom metrics. Lambda for event-driven processing scales from zero to thousands of concurrent executions, within account concurrency limits. SQS queue depth as a scaling metric for async worker fleets.
Database Tier: Aurora Serverless v2 scales capacity in fine-grained 0.5-ACU increments between 0.5 and 128 ACUs within seconds. DynamoDB On-Demand scales reads/writes automatically with traffic. RDS Read Replicas offload reads. Aurora Auto Scaling adds/removes Aurora Replicas based on average CPU utilization or connections.
Caching Tier: ElastiCache Redis Cluster Mode distributes data across shards; add shards to scale horizontally. DAX cluster nodes add read capacity for DynamoDB. CloudFront scales caching globally at the edge with no capacity management.
SQS queue depth is the canonical scaling signal for async worker fleets: as messages accumulate, scale out workers; as the queue drains, scale in. Use an ASG target tracking policy on a custom CloudWatch metric. The usual metric is backlog per instance (ApproximateNumberOfMessagesVisible ÷ in-service instances) rather than raw queue depth.
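AWS documents a backlog-per-instance approach for SQS-driven scaling. Each instance's share of the queue is published as a metric, and the target is the backlog one worker can clear within your latency goal. This sketch of the arithmetic assumes each worker processes messages one at a time. It does not show publishing the metric (for example, with CloudWatch `PutMetricData`) or the ASG policy itself, and the numbers are hypothetical.

```python
import math


def backlog_per_instance(visible_messages: int, in_service_instances: int) -> float:
    """The custom metric to publish: queue depth divided by current worker count."""
    return visible_messages / max(in_service_instances, 1)


def target_backlog(acceptable_latency_sec: float, avg_processing_sec: float) -> float:
    """Messages one worker can have queued and still finish within the latency goal."""
    return acceptable_latency_sec / avg_processing_sec


def desired_workers(visible_messages: int, acceptable_latency_sec: float,
                    avg_processing_sec: float, min_size: int = 1, max_size: int = 100) -> int:
    """Workers needed to bring backlog per instance down to the target, clamped to ASG bounds."""
    target = target_backlog(acceptable_latency_sec, avg_processing_sec)
    needed = math.ceil(visible_messages / target)
    return min(max(needed, min_size), max_size)


# 1,500 queued messages, 0.5 s per message, 10 s acceptable latency
# -> target backlog of 20 per worker -> 75 workers
print(desired_workers(1_500, 10, 0.5))
```

With target tracking, the target backlog value from this calculation is the policy's target value. The ASG then adds or removes instances to keep the published backlog per instance close to it.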
Task 3.5 — Purpose-Built Databases
Right Database for the Right Workload
The AWS database portfolio — each engine optimized for a specific access pattern