AWS Certified Solutions Architect — Associate · SAA-C03

Design High-Performing Architectures

Domain 3 — Comprehensive Study Guide
Task Statements 3.1 · 3.2 · 3.3 · 3.4 · 3.5

📋 24% of Exam Score — Third Highest-Weight Domain
Domain 3 Overview

What You Need to Know

Task 3.1 — Storage & Database Performance
  • S3 performance patterns & transfer acceleration
  • EBS volume types and IOPS optimization
  • EFS throughput modes
  • RDS performance: IOPS, read replicas, caching
  • DynamoDB capacity modes, GSIs, DAX
Task 3.2 — Compute Performance
  • EC2 instance families and use cases
  • EC2 placement groups
  • Lambda performance optimization
  • Container performance: ECS vs. EKS
  • Spot Instances and Savings Plans
Task 3.3 — Networking Performance
  • Enhanced Networking & ENA / SR-IOV
  • Elastic Fabric Adapter (EFA) for HPC
  • VPC flow optimization patterns
  • CloudFront caching configuration
  • Direct Connect vs. VPN throughput
Task 3.4 & 3.5 — Elastic & Purpose-Built DB
  • Elastic solutions: auto scaling across tiers
  • Relational: RDS, Aurora
  • Key-value & wide column: DynamoDB
  • In-memory: ElastiCache Redis & MemoryDB
  • Search: OpenSearch. Graph: Neptune. Ledger: QLDB
3.1

High-Performing Storage & Databases

S3 Optimization · EBS Types · EFS Modes · RDS Tuning · DynamoDB Scaling

Task 3.1 — S3

S3 Performance Optimization

Throughput · Transfer Acceleration · Multipart Upload · Byte-Range Fetch
S3 Baseline Performance
  • 3,500 PUT/COPY/POST/DELETE requests/sec per prefix
  • 5,500 GET/HEAD requests/sec per prefix
  • No limit on the number of prefixes: parallelize requests across prefixes to scale throughput linearly
  • SSE-KMS may bottleneck at KMS quota (5,500–30,000 req/sec depending on region)
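The per-prefix arithmetic above can be sketched as follows. This is a minimal illustration, not an AWS API: the `shard-NN/` prefix layout and both function names are hypothetical, and the rates are the per-prefix figures quoted in the list.

```python
import math
import zlib

# Per-prefix request rates quoted in the list above.
PUT_RATE_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE per second
GET_RATE_PER_PREFIX = 5_500   # GET/HEAD per second


def prefixes_needed(target_rps: int, per_prefix_rps: int) -> int:
    """Smallest number of prefixes whose combined limit covers target_rps."""
    return max(1, math.ceil(target_rps / per_prefix_rps))


def spread_key(object_id: str, n_prefixes: int) -> str:
    """Place an object under one of n_prefixes hypothetical 'shard-NN/' prefixes.

    A CRC32 of the object id picks the prefix, so the same id always
    maps to the same key and load spreads roughly evenly.
    """
    shard = zlib.crc32(object_id.encode("utf-8")) % n_prefixes
    return f"shard-{shard:02d}/{object_id}"


if __name__ == "__main__":
    n = prefixes_needed(50_000, GET_RATE_PER_PREFIX)
    print(n, spread_key("invoice-123.pdf", n))
```

For example, sustaining 50,000 GET requests per second needs at least 10 prefixes at 5,500 requests per second each.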
Multipart Upload
  • Required for objects >5 GB; recommended >100 MB
  • Upload parts in parallel — maximize bandwidth
  • Resilient to network interruptions (retry individual parts)
  • Set S3 Lifecycle rule to abort incomplete multipart uploads to avoid cost accumulation
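As a rough sketch of how an object is split for parallel transfer, the helper below plans part boundaries and formats the matching HTTP Range header. The function names and the 100 MiB default part size are illustrative assumptions. The 5 MiB minimum part size and 10,000-part ceiling are the S3 multipart limits. No AWS calls are made.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000        # S3 maximum number of parts per multipart upload


def plan_parts(object_size: int, part_size: int = 100 * MIB):
    """Return (part_number, first_byte, last_byte) tuples covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    count = -(-object_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    parts = []
    for number in range(1, count + 1):
        start = (number - 1) * part_size
        end = min(start + part_size, object_size) - 1  # inclusive last byte
        parts.append((number, start, end))
    return parts


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    for number, start, end in plan_parts(250 * MIB):
        print(number, range_header(start, end))
```

Each tuple can be handed to a separate worker, which is what makes a failed part cheap to retry on its own.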
S3 Transfer Acceleration
  • Routes uploads through CloudFront edge locations → AWS backbone
  • Benefits: geographically distant clients uploading large files
  • Extra cost per GB transferred; only use when benefit exceeds cost
  • Test with speed comparison tool before enabling
Byte-Range Fetches
  • Download specific byte ranges of an object in parallel
  • Dramatically speeds up large file downloads
  • Also useful to fetch only the header of a file (e.g., first 50 bytes of a custom file format)
  • Pair with multipart upload so both uploads and downloads of large objects run in parallel
"Users globally are uploading large files slowly" → S3 Transfer Acceleration. "Download a 10 GB object faster" → Byte-Range Fetches in parallel. "S3 throttling at high request rates" → distribute objects across multiple prefixes; each prefix gets its own request-rate limit, and randomized key names are no longer required.
Task 3.1 — EBS

EBS Volume Types & Performance

Choosing the right block storage for your IOPS and throughput needs
Volume Type | Category | Max IOPS | Max Throughput | Use Case
gp3 | SSD, General Purpose | 16,000 | 1,000 MB/s | Default choice: boot volumes, dev/test, most workloads
gp2 | SSD, General Purpose | 16,000 | 250 MB/s | Legacy; migrate to gp3 for better performance at lower cost
io2 Block Express | SSD, Provisioned IOPS | 256,000 | 4,000 MB/s | Mission-critical: SAP HANA, large Oracle, high-IOPS DB
io1 | SSD, Provisioned IOPS | 64,000 | 1,000 MB/s | I/O-intensive databases requiring consistent IOPS
st1 | HDD, Throughput Optimized | 500 | 500 MB/s | Big data, data warehousing, log processing (sequential)
sc1 | HDD, Cold | 250 | 250 MB/s | Infrequently accessed data, lowest cost HDD
gp3 decouples IOPS from size: you can provision up to 16,000 IOPS independently of capacity (subject to a ceiling of 500 IOPS per GiB), unlike gp2, which ties baseline IOPS to size at 3 IOPS/GB. HDD volumes (st1, sc1) cannot be used as boot volumes. Need >64K IOPS → io2 Block Express only.
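The gp2 versus gp3 sizing difference can be worked through with a little arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, the gp2 floor of 100 IOPS and the gp3 baseline of 3,000 IOPS are taken from the published EBS volume specifications, and burst credits are ignored.

```python
import math

GP_MAX_IOPS = 16_000


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor of 100, cap of 16,000."""
    return max(100, min(GP_MAX_IOPS, 3 * size_gib))


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline reaches target_iops."""
    return math.ceil(min(target_iops, GP_MAX_IOPS) / 3)


def gp3_max_iops(size_gib: int) -> int:
    """gp3: 3,000 IOPS baseline at any size; provisionable up to
    500 IOPS per GiB, capped at 16,000."""
    return max(3_000, min(GP_MAX_IOPS, 500 * size_gib))


if __name__ == "__main__":
    for size in (20, 100, 1_000, 5_334):
        print(size, gp2_baseline_iops(size), gp3_max_iops(size))
```

A 100 GiB gp2 volume has a 300 IOPS baseline, while a 100 GiB gp3 volume can be provisioned to the full 16,000. Reaching 16,000 on gp2 takes a volume of about 5,334 GiB.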
Task 3.1 — DynamoDB

DynamoDB Performance Optimization

Capacity Modes · GSIs · Partition Keys · DAX
Capacity Modes
  • Provisioned: Set RCUs and WCUs manually. Use Auto Scaling to adjust. Predictable traffic, lower cost at steady state.
  • On-Demand: Pay-per-request with no capacity planning. Instantly accommodates up to 2× the table's previous peak; traffic that more than doubles within 30 minutes can be throttled. Best for spiky or unknown traffic. Higher per-unit cost.
  • 1 RCU = 1 strongly consistent 4 KB read/sec (or 2 eventually consistent)
  • 1 WCU = 1 write of up to 1 KB/sec
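The RCU and WCU definitions above translate into a small sizing calculation. The function names below are hypothetical; the rounding rules (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost) come from the two bullets above.

```python
import math


def read_capacity_units(reads_per_sec: int, item_size_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each read costs ceil(size / 4 KB) units,
    halved for eventually consistent reads."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_sec * units_per_read
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_sec: int, item_size_kb: float) -> int:
    """WCUs needed: each write costs ceil(size / 1 KB) units."""
    return writes_per_sec * math.ceil(item_size_kb)


if __name__ == "__main__":
    print(read_capacity_units(100, 6))          # strongly consistent
    print(read_capacity_units(100, 6, False))   # eventually consistent
    print(write_capacity_units(50, 2.5))
```

Item sizes round up to the next whole unit, so a 6 KB item costs the same to read as an 8 KB item.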
Partition Key Design
  • High cardinality = better distribution across partitions
  • Bad: using "status" as PK (only a few values → hot partition)
  • Good: UserID, OrderID (many unique values → even spread)
  • Write sharding: Append a random suffix (user#1, user#2…) to spread writes across partitions for high-volume keys
  • Composite sort key: Enables range queries within a partition
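Write sharding as described above might look like the sketch below. The helper names and the `#` separator are illustrative assumptions, and no DynamoDB calls are made. Writers pick a random shard suffix, and readers must query every shard and merge the results.

```python
import random


def sharded_key(base_key: str, n_shards: int, rng=random) -> str:
    """Pick a random shard suffix so writes for one hot key spread
    across n_shards partition key values."""
    return f"{base_key}#{rng.randrange(n_shards)}"


def all_shard_keys(base_key: str, n_shards: int) -> list:
    """Every partition key a reader must query to see all items
    written under base_key."""
    return [f"{base_key}#{i}" for i in range(n_shards)]


if __name__ == "__main__":
    print(sharded_key("user", 4))
    print(all_shard_keys("user", 4))
```

The trade-off is that reads fan out across all shards, so the shard count should be only as large as the write volume requires.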
Global Secondary Indexes (GSIs)
  • Alternative access pattern without full table scan
  • Can be created and deleted anytime (unlike LSIs)
  • Separate provisioned throughput from the base table
  • A throttled GSI back-pressures writes to the base table, so provision GSI write capacity at least as generously as the table's
  • Max 20 GSIs per table; project only needed attributes
DynamoDB Accelerator (DAX)
  • In-memory write-through cache for DynamoDB
  • Reduces read latency: milliseconds → microseconds
  • Fully API-compatible — change endpoint only
  • Cluster with 1–10 nodes; Multi-AZ for HA
  • Ideal for read-heavy, eventually consistent workloads (gaming, social, product catalog)
  • Does NOT speed up strongly consistent reads: they pass through to DynamoDB and are not cached
Hot partition = uneven key distribution → some partitions throttled. Fix with high-cardinality keys or write sharding. "Microsecond DynamoDB reads" → DAX. "Query by non-primary-key attribute" → GSI. Strongly consistent reads pass through DAX uncached and consume 2× the RCUs of eventually consistent reads.
3.2

High-Performing Compute

EC2 Instance Families · Placement Groups · Lambda Optimization · Containers

Task 3.2 — EC2

EC2 Instance Families

Matching instance type to workload characteristics
Family | Examples | Optimized For | Use Cases
General Purpose | t3, t4g, m6i, m7g | Balanced CPU / RAM / network | Web servers, small DBs, dev environments
Compute Optimized | c6i, c7g, c6gn | High CPU, low memory ratio | Batch processing, HPC, gaming, media encoding, scientific modeling
Memory Optimized | r6i, r7g, x2idn, u- | High RAM, large in-memory datasets | In-memory DBs (Redis, SAP HANA), real-time big data, large caches
Storage Optimized | i3, i4i, d3, h1 | High local NVMe I/O, fast storage | NoSQL DBs (Cassandra), data warehousing, distributed file systems
Accelerated Computing | p4, p5, g5, inf2, trn1 | GPUs / custom ML chips | ML training, deep learning, GPU rendering, video transcoding
Burstable | t3, t4g | Baseline CPU + burst credits | Variable workloads; use T3 Unlimited for sustained burst
Compute (C) = CPU-intensive batch. Memory (R, X) = large in-memory datasets. Storage (I, D) = fast local disk. Accelerated (P, G) = GPU / ML. General (M, T) = default web tier. Match the workload bottleneck to the instance family's strength.
Task 3.2 — Placement Groups

EC2 Placement Groups

Controlling physical placement for performance or fault isolation
Cluster Placement Group
  • All instances in same rack, same AZ
  • Lowest possible network latency (10 Gbps+ between instances)
  • Highest risk: if rack fails, all instances fail
  • Best for: HPC, tightly-coupled distributed computing, MPI workloads
  • Use with Enhanced Networking (ENA) for max throughput
Use when: ultra-low latency between instances is required
Spread Placement Group
  • Each instance on a different physical rack
  • Max 7 instances per AZ per group
  • Maximizes fault isolation — single hardware failure affects max 1 instance
  • Can span multiple AZs
  • Best for: small critical instance sets (primary + replicas), quorum clusters
Use when: critical instances must survive hardware failures independently
Partition Placement Group
  • Instances divided across logical partitions
  • Each partition on separate rack
  • Up to 7 partitions per AZ; hundreds of instances per partition
  • Topology-aware: applications know which partition they're in
  • Best for: large distributed systems — HDFS, HBase, Kafka, Cassandra
Use when: large distributed workload needs fault domain awareness
Cluster = performance (low latency, high throughput, same rack, same AZ). Spread = max HA (different rack per instance, 7-instance limit). Partition = large distributed systems needing fault domain awareness (Hadoop, Kafka). Performance vs. resilience is the tradeoff for Cluster vs. Spread.
Task 3.2 — Lambda

Lambda Performance Optimization

Cold Starts · Memory · Provisioned Concurrency · Power Tuning
Cold Starts & Mitigation
  • Cold start: New execution environment initialized — adds latency (100ms–1s+ for JVM/large packages)
  • Warm invocation: Execution env reused — fast
  • Provisioned Concurrency: Pre-warms N execution environments — eliminates cold starts. Use for latency-sensitive APIs.
  • SnapStart (Java): Restores from a cached snapshot of the initialized execution environment. Can reduce cold-start time by ~90%.
  • Minimize package size — load SDK modules selectively
  • Move initialization code outside handler (init once per env)
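The "initialize outside the handler" advice can be illustrated with a plain Python module shaped like a Lambda function. This is a sketch only: `_load_config`, `CONFIG`, `INIT_COUNT`, and the table name are hypothetical stand-ins for real setup work such as creating SDK clients or opening database connections.

```python
INIT_COUNT = 0


def _load_config() -> dict:
    """Stand-in for expensive setup (SDK clients, DB connections, model loading)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"table": "example-table"}  # hypothetical value


# Module scope: runs once per execution environment, during the cold start.
CONFIG = _load_config()


def handler(event, context):
    """Warm invocations reuse CONFIG instead of rebuilding it."""
    return {"statusCode": 200, "body": CONFIG["table"]}


if __name__ == "__main__":
    handler({}, None)
    handler({}, None)
    print("initializations:", INIT_COUNT)
```

Calling the handler repeatedly in the same process leaves the initialization count at one, which mirrors how a warm execution environment behaves.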
Memory & CPU Scaling
  • Lambda CPU is proportional to memory — more memory = more vCPU
  • Range: 128 MB → 10,240 MB (10 GB)
  • Increasing memory can reduce duration, potentially reducing cost
  • AWS Lambda Power Tuning tool: find the optimal memory/cost trade-off
  • Ephemeral storage: 512 MB → 10,240 MB (/tmp)
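Why more memory can cost the same or less follows from Lambda billing compute in GB-seconds. The sketch below is illustrative: the function name is hypothetical, and the default price per GB-second is an assumption based on a published x86 rate that varies by Region and architecture. Per-request charges are ignored.

```python
# Assumption: illustrative x86 price per GB-second; check current regional pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def compute_cost(memory_mb: int, avg_duration_ms: float, invocations: int,
                 price_per_gb_second: float = PRICE_PER_GB_SECOND) -> float:
    """Duration charge only: memory (GB) x duration (s) x invocations x price."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


if __name__ == "__main__":
    million = 1_000_000
    print(round(compute_cost(512, 800, million), 2))    # baseline
    print(round(compute_cost(1024, 400, million), 2))   # 2x memory, half duration
    print(round(compute_cost(1024, 300, million), 2))   # 2x memory, faster still
```

If doubling memory halves the duration, the duration charge is unchanged while latency improves. If the duration drops by more than half, the function becomes cheaper as well.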
Concurrency Controls
  • Reserved Concurrency: Limits max concurrent executions for a function — prevents it from consuming all account concurrency
  • Provisioned Concurrency: Pre-initializes N instances for instant invocation
  • Account limit: 1,000 concurrent executions per region (default, can be raised)
  • Throttled invocations → synchronous callers receive a 429 error; asynchronous events are retried and then sent to a DLQ if one is configured
"Lambda API has intermittent latency spikes" → Provisioned Concurrency (eliminates cold starts). "Lambda consuming too many concurrent executions" → Reserved Concurrency (caps it). Increasing memory increases CPU proportionally — often the fastest way to improve Lambda performance.
3.3

High-Performing Networking

Enhanced Networking · EFA · CloudFront Caching · Direct Connect · VPN

Task 3.3 — Instance Networking

Enhanced Networking & Elastic Fabric Adapter

Maximizing instance-to-instance network throughput
Enhanced Networking (ENA)
  • Elastic Network Adapter — modern enhanced networking standard
  • Up to 100 Gbps network throughput on supported instances
  • Higher bandwidth, lower latency, lower CPU overhead vs. legacy virtualized networking
  • Uses SR-IOV: hardware virtualization allows direct NIC access from VM
  • Available on most current-gen instances (C5, M5, R5, etc.)
  • No extra cost — enable via instance type selection
Elastic Fabric Adapter (EFA)
  • Network interface for HPC and ML workloads
  • OS bypass: application communicates directly with NIC, bypassing OS kernel
  • Enables MPI and NCCL (ML) inter-node communication at near bare-metal speeds
  • Required for tightly coupled HPC jobs (weather modeling, CFD, genomics)
  • Supported on: C5n, P4, Trn1, Hpc6a families
  • Works with Cluster Placement Group for max performance
Network Performance by Tier
  • Standard virtualized: Up to 10 Gbps — most instances
  • Enhanced Networking (ENA): Up to 25–100 Gbps — C5, M5, R5 etc.
  • Elastic Fabric Adapter: 100 Gbps + OS bypass — HPC/ML
  • Placement Group (Cluster): 10 Gbps+ between instances in same PG
"HPC cluster needs lowest inter-node latency for MPI" → EFA + Cluster Placement Group. "High network throughput between EC2 instances" → ENA-enabled instance types. EFA is a superset of ENA — it includes all ENA capabilities plus OS bypass.
Task 3.3 — CloudFront

CloudFront Performance Configuration

Cache Behaviors · TTL · Origin Shield · Lambda@Edge
Cache Behavior Optimization
  • Cache Hit Ratio: % of requests served from cache — maximize this
  • TTL (Time to Live): Longer TTL = better cache hit ratio; shorter = fresher content
  • Cache Key: What CloudFront uses to identify a cached object. Add headers, query strings, cookies only if they change the response — otherwise they fragment the cache unnecessarily
  • Cache Policies: Managed policies (CachingOptimized, CachingDisabled) or custom
  • Compression: Enable Gzip/Brotli compression at edge to reduce transfer size
Origin Shield
  • Additional caching layer between edge locations and your origin
  • Reduces origin load: fewer cache misses hit the origin directly
  • Especially useful for origins that are slow or expensive to query (e.g., on-premises, cross-region)
  • Adds a small per-request charge for traffic routed through Origin Shield
Lambda@Edge & CloudFront Functions
  • CloudFront Functions: Lightweight JS at edge (sub-ms). URL rewrites, header manipulation, A/B testing. Runs at all 400+ PoPs.
  • Lambda@Edge: Full Lambda capabilities at regional edge locations. HTTP auth, custom redirects, body transformation, dynamic content. Higher latency and cost than CF Functions.
Low cache hit ratio? → Check cache key — remove unnecessary query strings/headers that fragment caching. "Origin getting hammered despite CloudFront" → Enable Origin Shield. "Run auth logic at edge before reaching origin" → Lambda@Edge or CloudFront Functions (for simpler logic).
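Cache key fragmentation can be demonstrated with a small model. This is not CloudFront's implementation: `cache_key` and `distinct_cache_entries` are hypothetical helpers, and `cdn.example.com` is a placeholder host. They treat the path plus an allow-list of query parameters as the key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, allowed_params=None) -> str:
    """Path plus only the query parameters in allowed_params.
    allowed_params=None keeps every parameter (most fragmentation)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if allowed_params is not None:
        pairs = [(k, v) for k, v in pairs if k in allowed_params]
    query = urlencode(pairs)
    return parts.path + ("?" + query if query else "")


def distinct_cache_entries(urls, allowed_params=None) -> int:
    """How many separate cached objects the given requests would create."""
    return len({cache_key(u, allowed_params) for u in urls})


if __name__ == "__main__":
    urls = [
        "https://cdn.example.com/img/logo.png?size=2&utm_source=mail",
        "https://cdn.example.com/img/logo.png?size=2&utm_source=ads",
        "https://cdn.example.com/img/logo.png?size=2&utm_source=social",
    ]
    print(distinct_cache_entries(urls))            # all parameters in the key
    print(distinct_cache_entries(urls, {"size"}))  # only 'size' in the key
```

With every parameter in the key, three tracking variants of the same image become three cache entries. Restricting the key to `size` collapses them into one.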
Task 3.3 — Hybrid Connectivity

AWS Direct Connect vs. Site-to-Site VPN

High-throughput, low-latency hybrid network connectivity
Feature | AWS Direct Connect | Site-to-Site VPN
Connection type | Dedicated physical circuit | IPsec tunnel over public internet
Bandwidth | 1 Gbps, 10 Gbps, 100 Gbps; or sub-1G via hosted | Up to ~1.25 Gbps per tunnel (ECMP for more)
Latency | Consistent, low (dedicated path) | Variable (internet routing)
Setup time | Weeks to months (physical provisioning) | Minutes to hours (software config)
Cost | Higher: port hours + data transfer | Lower: per-hour + data transfer
Encryption | Not encrypted by default; add IPsec or MACsec | Encrypted by default (IPsec)
Use cases | Large data migration, consistent throughput, hybrid cloud, compliance | Quick setup, redundant backup, lower cost
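To make the bandwidth comparison concrete, a back-of-the-envelope transfer-time estimate is sketched below. The function name is hypothetical, the 80% default link utilization is an assumption, and the calculation uses decimal units (1 TB = 10^12 bytes) while ignoring protocol overhead.

```python
def transfer_hours(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Hours to move data_tb terabytes over a link of link_gbps,
    assuming a sustained fraction `utilization` of line rate."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    for name, gbps in (("VPN tunnel", 1.25), ("DX 10 Gbps", 10), ("DX 100 Gbps", 100)):
        print(f"{name}: {transfer_hours(100, gbps):.1f} h for 100 TB")
```

At full line rate, 100 TB takes roughly 22 hours over a 10 Gbps Direct Connect link and roughly 178 hours over a single 1.25 Gbps VPN tunnel.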
Direct Connect Gateway

Connect one Direct Connect circuit to multiple VPCs across regions and accounts. Avoids needing a separate circuit per VPC or region.

Resilient DX Pattern

Primary: Direct Connect for performance. Backup: Site-to-Site VPN over the internet. With BGP routing on both paths, traffic fails over to the VPN automatically if DX fails. Best practice for production hybrid connectivity.

"Consistent, high-bandwidth connectivity to on-premises" → Direct Connect. "Fastest setup for hybrid connectivity" → Site-to-Site VPN. "Backup for Direct Connect" → Site-to-Site VPN. Direct Connect is NOT encrypted by default: add MACsec or run an IPsec VPN over the DX connection for encryption.
3.4 · 3.5

Elastic Solutions & Purpose-Built Databases

Elastic Architecture Patterns · Right Database for the Right Workload

Task 3.4 — Elastic Architectures

Designing Elastic Solutions Across All Tiers

Web · App · Database · Caching — scaling together
Web Tier: CloudFront + WAF at edge. ALB distributes across AZs. EC2 ASG (target tracking on ALB request count) or Fargate tasks auto-scale. Static assets are served from S3 via CloudFront, so they add no compute cost at scale.
Application Tier: EC2 ASG or ECS/EKS with auto scaling on CPU/memory/custom metrics. Lambda handles event-driven processing and scales from zero to thousands of concurrent executions. SQS queue depth serves as a scaling metric for async worker fleets.
Database Tier: Aurora Serverless v2 scales database compute capacity (for reads and writes) from 0.5 to 128 ACUs in fine-grained increments. DynamoDB On-Demand scales reads/writes without capacity planning. RDS Read Replicas offload reads. Aurora Auto Scaling adds/removes read replicas based on CPU or connections.
Caching Tier: ElastiCache Redis Cluster Mode distributes data across shards; add shards to scale horizontally. DAX cluster nodes add read capacity for DynamoDB. CloudFront scales caching globally at the edge with no capacity management required.
SQS queue depth as a scaling metric is the canonical pattern for auto-scaling async worker fleets — as messages accumulate, scale out workers; as queue drains, scale in. Use ASG target tracking policy with a custom CloudWatch metric.
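The queue-depth pattern above is usually expressed as a "backlog per worker" target. The sketch below shows the arithmetic only. The function names and the min/max bounds are hypothetical, and in practice the backlog value would be published as the custom CloudWatch metric that the target tracking policy follows.

```python
import math


def acceptable_backlog_per_worker(max_latency_s: float, avg_processing_s: float) -> float:
    """Messages one worker can have queued and still meet the latency goal."""
    return max_latency_s / avg_processing_s


def desired_workers(queue_depth: int, max_latency_s: float, avg_processing_s: float,
                    min_workers: int = 1, max_workers: int = 100) -> int:
    """Workers needed so backlog per worker stays at or below the target,
    clamped to the fleet's min and max size."""
    target = acceptable_backlog_per_worker(max_latency_s, avg_processing_s)
    needed = math.ceil(queue_depth / target)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # 1,000 queued messages, 0.5 s per message, 20 s acceptable latency
    print(desired_workers(1_000, 20, 0.5))
```

With 0.5 s of processing per message and a 20 s latency goal, each worker can carry a backlog of 40 messages, so 1,000 queued messages call for 25 workers.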
Task 3.5 — Purpose-Built Databases

Right Database for the Right Workload

The AWS database portfolio — each engine optimized for a specific access pattern
Relational (RDBMS)
  • RDS: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB
  • Aurora: MySQL/PostgreSQL-compatible; AWS cites up to 5× the throughput of standard MySQL and 3× that of standard PostgreSQL
  • ACID transactions, complex queries, structured data
  • Use for: ERP, CRM, financial systems, traditional apps
Key-Value / Wide Column
  • DynamoDB: Fully managed NoSQL, single-digit ms latency, virtually unlimited scale
  • High throughput at low latency, no complex joins
  • Use for: shopping carts, session state, user profiles, leaderboards, IoT
  • DAX for microsecond read latency
In-Memory
  • ElastiCache Redis: Cache, sessions, pub/sub, leaderboards
  • ElastiCache Memcached: Simple distributed cache
  • MemoryDB for Redis: Durable in-memory DB (not just cache). Redis API + Multi-AZ durability
  • Use for: <1ms latency reads, caching DB results, real-time dashboards
Document
  • DocumentDB: MongoDB-compatible managed document store
  • JSON documents, flexible schema
  • Use for: content management, catalogs, user data, mobile backends
  • Scales storage automatically, up to 128 TiB per cluster
Analytics & Search
  • Redshift: Petabyte-scale data warehouse. Columnar. OLAP.
  • Redshift Serverless: Auto-scales warehouse capacity
  • OpenSearch Service: Full-text search, log analytics (Elasticsearch-compatible)
  • Athena: SQL queries on S3 data — serverless, pay per query
Specialized
  • Neptune: Graph database — social networks, fraud detection, knowledge graphs
  • QLDB: Ledger database — immutable, cryptographically verifiable transaction log
  • Timestream: Time-series database — IoT, DevOps metrics, telemetry
  • Keyspaces: Managed Apache Cassandra
Task 3.5 — Decision Guide

Database Selection Decision Tree

Map workload characteristics to the right engine
  • Complex SQL queries, transactions, structured relational data → RDS or Aurora
  • High-throughput key-value or simple queries at any scale → DynamoDB (+ DAX for microsecond reads)
  • Sub-millisecond reads; caching layer in front of DB → ElastiCache Redis or Memcached
  • Redis as primary durable database (not just cache) → MemoryDB for Redis
  • Full-text search, log analytics, Elasticsearch workloads → Amazon OpenSearch Service
  • Petabyte analytics, complex OLAP queries, data warehouse → Amazon Redshift
  • SQL queries on S3 data without loading into a DB → Amazon Athena
  • Graph relationships (social, fraud, recommendations) → Amazon Neptune
  • IoT sensor data, metrics, time-ordered events → Amazon Timestream
  • Immutable audit log, cryptographic verification → Amazon QLDB
Quick Review

Exam Checklist — Domain 3

Can you answer these?
Task 3.1 — Storage & Database Performance
  • S3 request rate limits per prefix and how to distribute load
  • S3 Transfer Acceleration vs. Byte-Range Fetches use cases
  • gp3 (decouple IOPS/size) vs. io2 (ultra-high IOPS) vs. st1 (throughput HDD)
  • DynamoDB On-Demand vs. Provisioned capacity mode trade-offs
  • Hot partition problem: cause, detection, and fix (key design / sharding)
  • DAX: write-through cache, microsecond reads, doesn't help strong consistency
Task 3.2 — Compute Performance
  • EC2 instance families: C (compute), R/X (memory), I/D (storage), P/G (GPU)
  • Placement Groups: Cluster (perf) vs. Spread (HA, 7-instance limit) vs. Partition (large distributed)
  • Lambda: Provisioned Concurrency eliminates cold starts; memory ∝ CPU
  • T3 Unlimited for sustained burst; Reserved Concurrency caps Lambda executions
Task 3.3 — Networking Performance
  • ENA = Enhanced Networking (SR-IOV, up to 100 Gbps, reduced latency)
  • EFA = ENA + OS bypass for HPC/MPI workloads
  • CloudFront cache hit ratio: minimize cache key fragmentation
  • Origin Shield reduces origin requests by centralizing cache misses
  • Direct Connect (dedicated, consistent, not encrypted) vs. VPN (quick, encrypted, variable)
  • DX + VPN backup = resilient hybrid connectivity best practice
Tasks 3.4 & 3.5 — Elastic & Purpose-Built DB
  • SQS queue depth as auto scaling metric for worker fleets
  • Aurora Serverless v2 and DynamoDB On-Demand for elastic database tiers
  • Redshift (OLAP warehouse) vs. Athena (serverless S3 SQL) vs. OpenSearch (full-text)
  • Neptune (graph), QLDB (immutable ledger), Timestream (IoT time-series)
  • MemoryDB (durable Redis primary DB) vs. ElastiCache (cache only)
Quick Reference

Service → Performance Scenario Quick Map

Storage Performance
  • S3 Transfer Acceleration → global upload speed
  • S3 Byte-Range Fetches → parallel large downloads
  • S3 Multipart Upload → parallel large uploads
  • EBS gp3 → general purpose, decoupled IOPS
  • EBS io2 → highest IOPS for critical DBs
  • EBS st1 → high sequential throughput (HDD)
Compute Performance
  • C family → CPU-bound batch / HPC
  • R / X family → large in-memory datasets
  • I / D family → fast local NVMe storage
  • P / G family → GPU / ML training
  • Cluster PG → lowest inter-instance latency
  • Lambda Provisioned Concurrency → no cold starts
Network Performance
  • ENA → 25–100 Gbps instance networking
  • EFA → OS-bypass for HPC / MPI / NCCL
  • CloudFront → global HTTP cache & edge compute
  • Origin Shield → reduce origin request volume
  • Direct Connect → dedicated consistent bandwidth
  • Global Accelerator → static IP + AWS backbone routing
Database Performance
  • DAX → microsecond DynamoDB reads
  • ElastiCache Redis → sub-ms app-level caching
  • MemoryDB → durable primary Redis DB
  • RDS Read Replicas → scale SQL read workloads
  • Aurora → 15 replicas, faster failover, auto-storage
  • Aurora Serverless v2 → elastic database compute capacity
Analytics & Search
  • Redshift → petabyte OLAP warehouse
  • Athena → serverless SQL on S3
  • OpenSearch → full-text search & log analytics
  • Timestream → IoT / metrics time-series
  • Glue → ETL pipeline for data lake
  • EMR → Hadoop / Spark big data processing
Specialized Workloads
  • Neptune → graph relationships
  • QLDB → immutable audit ledger
  • DocumentDB → MongoDB-compatible documents
  • Keyspaces → Apache Cassandra-compatible
  • EFA + Cluster PG → HPC tightly-coupled MPI
  • FSx for Lustre → HPC parallel file system

Domain 3 Complete

You're ready for Domain 3

24% of SAA-C03 · Design High-Performing Architectures
Good luck on the exam!
