AWS Solutions Architect — Domain 3: Design High-Performing Architectures

Domain 3 Overview

What You Need to Know

Task 3.1 — Storage & Database Performance

S3 performance patterns & transfer acceleration
EBS volume types and IOPS optimization
EFS throughput modes
RDS performance: IOPS, read replicas, caching
DynamoDB capacity modes, GSIs, DAX

Task 3.2 — Compute Performance

EC2 instance families and use cases
EC2 placement groups
Lambda performance optimization
Container performance: ECS vs. EKS
Spot Instances and Savings Plans

Task 3.3 — Networking Performance

Enhanced Networking & ENA / SR-IOV
Elastic Fabric Adapter (EFA) for HPC
VPC flow optimization patterns
CloudFront caching configuration
Direct Connect vs. VPN throughput

Task 3.4 & 3.5 — Elastic & Purpose-Built DB

Elastic solutions: auto scaling across tiers
Relational: RDS, Aurora
Key-value & wide column: DynamoDB
In-memory: ElastiCache Redis & MemoryDB
Search: OpenSearch. Graph: Neptune. Ledger: QLDB

Task 3.1 — S3

S3 Performance Optimization

Throughput · Transfer Acceleration · Multipart Upload · Byte-Range Fetch

S3 Baseline Performance

3,500 PUT/COPY/POST/DELETE requests/sec per prefix
5,500 GET/HEAD requests/sec per prefix
No limits on number of prefixes — parallelise across prefixes for linear throughput scaling
SSE-KMS may bottleneck at KMS quota (5,500–30,000 req/sec depending on region)

Multipart Upload

Required for objects >5 GB; recommended >100 MB
Upload parts in parallel — maximize bandwidth
Resilient to network interruptions (retry individual parts)
Set S3 Lifecycle rule to abort incomplete multipart uploads to avoid cost accumulation

S3 Transfer Acceleration

Routes uploads through CloudFront edge locations → AWS backbone
Benefits: geographically distant clients uploading large files
Extra cost per GB transferred; only use when benefit exceeds cost
Test with speed comparison tool before enabling

Byte-Range Fetches

Download specific byte ranges of an object in parallel
Dramatically speeds up large file downloads
Also useful to fetch only the header of a file (e.g., first 50 bytes of a custom file format)
Combine with multipart upload for full symmetric throughput

"Users globally are uploading large files slowly" → S3 Transfer Acceleration. "Download a 10 GB object faster" → Byte-Range Fetches in parallel. "S3 throttling at high request rates" → distribute objects across multiple prefixes (randomize key names).

S3 performance questions focus on three main scenarios:
1. Global upload speed: Transfer Acceleration routes uploads through CloudFront PoPs onto the AWS backbone. It is not always faster, so check the speed comparison tool at accelerate-speed-comparison.s3-accelerate.amazonaws.com and pay for it only if it actually improves speed.
2. Large object uploads: Multipart upload splits an object into parts, uploads them in parallel, then assembles them. The minimum part size is 5 MB (except for the last part). A single PUT is capped at 5 GB, so anything larger must use multipart upload.
3. Request rate throttling: S3 auto-scales to at least 3,500 writes and 5,500 reads per second per prefix. A prefix is anything before the object name in the key. If all your keys share the same prefix, you hit the limit sooner. Old advice was to randomize key names to spread across prefixes; with modern S3 partitioning this matters less, but it is still occasionally tested.
KMS throttling: if you use SSE-KMS at high request rates, KMS API quotas can become the bottleneck. Request a quota increase if needed.
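The part-splitting arithmetic behind multipart upload and byte-range fetches can be sketched as follows. This is a minimal illustration rather than an SDK call: the function names and the 64 MB preferred part size are assumptions made for this example, and the 10,000-part cap is an S3 limit that the slide does not mention.

```python
import math

MIN_PART_SIZE = 5 * 1024**2   # 5 MB minimum part size, as stated in the notes
MAX_PARTS = 10_000            # S3 cap on parts per multipart upload (not on the slide)


def choose_part_size(object_size: int, preferred: int = 64 * 1024**2) -> int:
    """Pick a part size that respects the 5 MB minimum and the 10,000-part cap.

    `preferred` is a hypothetical default chosen for this sketch.
    """
    smallest_allowed = math.ceil(object_size / MAX_PARTS)
    return max(preferred, smallest_allowed, MIN_PART_SIZE)


def byte_ranges(object_size: int, part_size: int) -> list[str]:
    """Split an object into HTTP Range header values (end offsets are inclusive)."""
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges
```

Each returned range can be fetched by a separate worker with a ranged GET and the pieces reassembled in order; the same offsets describe the parts of a parallel upload.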

Task 3.1 — EBS

EBS Volume Types & Performance

Choosing the right block storage for your IOPS and throughput needs

Volume Type	Category	Max IOPS	Max Throughput	Use Case
gp3	SSD — General	16,000	1,000 MB/s	Default choice: boot volumes, dev/test, most workloads
gp2	SSD — General	16,000	250 MB/s	Legacy; migrate to gp3 for better baseline performance at a lower per-GB price
io2 Block Express	SSD — Provisioned	256,000	4,000 MB/s	Mission-critical: SAP HANA, large Oracle, high-IOPS DB
io1	SSD — Provisioned	64,000	1,000 MB/s	I/O-intensive databases requiring consistent IOPS
st1	HDD — Throughput	500	500 MB/s	Big data, data warehousing, log processing (sequential)
sc1	HDD — Cold	250	250 MB/s	Infrequently accessed data, lowest cost HDD

gp3 decouples IOPS from size — you can provision up to 16,000 IOPS regardless of volume size (unlike gp2 which ties IOPS to size at 3 IOPS/GB). HDD volumes (st1, sc1) cannot be used as boot volumes. Need >64K IOPS → io2 Block Express only.

EBS volume type selection is a classic exam comparison. Key decision points:
gp3 vs gp2: Always recommend gp3. It costs less per GB than gp2 (about 20% lower) and offers a better baseline of 3,000 IOPS and 125 MB/s at any size. With gp3 you provision IOPS and throughput independently, so you are not locked into gp2's 3 IOPS/GB formula.
io1 vs io2 vs io2 Block Express: io2 provides better durability than io1 (99.999% vs 99.8-99.9%). io2 Block Express is for extreme workloads needing over 64,000 IOPS. These are for latency-sensitive databases where consistent, predictable I/O matters.
st1 and sc1: HDD-based, sequential workloads only. st1 is for active workloads like log streaming and big data. sc1 is for cold data that is accessed infrequently. Neither can be a boot volume.
IOPS vs Throughput: IOPS is the number of I/O operations per second (important for random I/O, such as databases). Throughput is MB/s (important for sequential I/O, such as log streaming and video processing). The right metric depends on the workload access pattern.
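The gp2 sizing formula and the relationship between IOPS and throughput come down to simple arithmetic. The sketch below assumes the gp2 floor of 100 IOPS (a detail not stated on the slide), and both function names are made up for illustration.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, with a floor of 100 and a cap of 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


def throughput_mib_s(iops: int, io_size_kib: int) -> float:
    """Throughput implied by an IOPS figure at a given I/O size."""
    return iops * io_size_kib / 1024
```

For example, `gp2_baseline_iops(500)` gives 1,500 IOPS, so a 500 GiB gp2 volume sits well below what gp3 allows you to provision at the same size. `throughput_mib_s(16000, 16)` gives 250.0, which shows why a small-block random workload is limited by IOPS while a large-block sequential one is limited by MB/s.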

Task 3.1 — DynamoDB

DynamoDB Performance Optimization

Capacity Modes · GSIs · Partition Keys · DAX

Capacity Modes

Provisioned: Set RCUs and WCUs manually. Use Auto Scaling to adjust. Predictable traffic, lower cost at steady state.
On-Demand: Pay-per-request with no capacity planning. Instantly accommodates up to 2× the previous traffic peak; sharper jumps may be throttled briefly while capacity is added. Best for spiky or unknown traffic. Higher per-unit cost.
1 RCU = 1 strongly consistent 4 KB read/sec (or 2 eventually consistent)
1 WCU = 1 write of up to 1 KB/sec
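The RCU and WCU definitions above translate into a short calculation. This is a sketch with hypothetical function names; it rounds each item up to the next 4 KB (reads) or 1 KB (writes) boundary, which is how the unit definitions apply to items larger than one unit.

```python
import math


def read_capacity_units(item_size_bytes: int, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: one unit per 4 KB per strongly consistent read per second."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_bytes: int, writes_per_sec: int) -> int:
    """WCUs needed: one unit per 1 KB per write per second."""
    return math.ceil(item_size_bytes / 1024) * writes_per_sec
```

A 6,000-byte item read 100 times per second needs 200 RCUs with strong consistency and 100 RCUs with eventual consistency; writing a 1,500-byte item 10 times per second needs 20 WCUs.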

Partition Key Design

High cardinality = better distribution across partitions
Bad: using "status" as PK (only a few values → hot partition)
Good: UserID, OrderID (many unique values → even spread)
Write sharding: Append a random suffix (user#1, user#2…) to spread writes across partitions for high-volume keys
Composite sort key: Enables range queries within a partition

Global Secondary Indexes (GSIs)

Alternative access pattern without full table scan
Can be created and deleted anytime (unlike LSIs)
Separate provisioned throughput from the base table
GSI throttling causes base table throttling — provision GSI capacity generously
Max 20 GSIs per table; project only needed attributes

DynamoDB Accelerator (DAX)

In-memory write-through cache for DynamoDB
Reduces read latency: milliseconds → microseconds
Fully API-compatible — change endpoint only
Cluster of up to 11 nodes (1 primary + up to 10 read replicas); spread nodes across AZs for HA
Ideal for read-heavy, eventually consistent workloads (gaming, social, product catalog)
NOT useful for strongly consistent reads — bypasses DAX

Hot partition = uneven key distribution → some partitions throttled. Fix with high-cardinality keys or write sharding. "Microsecond DynamoDB reads" → DAX. "Query by non-primary-key attribute" → GSI. Strongly consistent reads are not served from the DAX cache (they go to DynamoDB) and cost 2× the RCUs of eventually consistent reads.

DynamoDB performance is a deep topic. Key concepts: Partitions: DynamoDB automatically partitions data. Each partition has a limit of 3,000 RCUs and 1,000 WCUs. If all your traffic goes to one partition key value (hot partition), that partition becomes a bottleneck. Good key design avoids this. On-Demand vs Provisioned: On-Demand is not always more expensive. For very spiky workloads with long quiet periods, on-demand can be cheaper because you don't pay for idle capacity. For steady workloads, provisioned + auto scaling is cheaper. DAX caching: DAX is a write-through cache. Writes go to both DAX and DynamoDB. Reads check DAX first (item cache). Cache hit avoids DynamoDB read entirely. Cache miss fetches from DynamoDB and caches the result. DAX is NOT useful for strongly consistent reads — those must go to DynamoDB directly. If your application requires strong consistency, DAX doesn't help. GSIs: Think of them as separate tables with different primary keys, updated asynchronously. They enable "query by email address" even if your base table uses UserID as the partition key.
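Write sharding for a hot partition key can be sketched as below. Note the swap: the slide describes a random suffix, while this example derives the suffix from a hash of the item ID so that a reader can recompute which shard holds a given item. The function names, the `#` separator, and the shard count of 10 are assumptions for illustration.

```python
import hashlib


def sharded_key(partition_key: str, item_id: str, shard_count: int = 10) -> str:
    """Append a deterministic shard suffix so writes spread across partitions."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{partition_key}#{shard}"


def all_shard_keys(partition_key: str, shard_count: int = 10) -> list[str]:
    """Keys to query (and merge) when reading everything for one logical key."""
    return [f"{partition_key}#{n}" for n in range(shard_count)]
```

The trade-off is on the read side: fetching all items for the logical key means querying every shard key and merging the results.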

Task 3.2 — EC2

EC2 Instance Families

Matching instance type to workload characteristics

Family	Examples	Optimized For	Use Cases
General Purpose	t3, t4g, m6i, m7g	Balanced CPU / RAM / network	Web servers, small DBs, dev environments
Compute Optimized	c6i, c7g, c6gn	High CPU, low memory ratio	Batch processing, HPC, gaming, media encoding, scientific modeling
Memory Optimized	r6i, r7g, x2idn, u-	High RAM, large in-memory datasets	In-memory DBs (Redis, SAP HANA), real-time big data, large caches
Storage Optimized	i3, i4i, d3, h1	High local I/O: NVMe SSD (I family) or dense HDD (D, H)	NoSQL DBs (Cassandra), data warehousing, distributed file systems
Accelerated Computing	p4, p5, g5, inf2, trn1	GPUs / custom ML chips	ML training, deep learning, GPU rendering, video transcoding
Burstable	t3, t4g	Baseline CPU + burst credits	Variable workloads; use T3 Unlimited for sustained burst

Compute (C) = CPU-intensive batch. Memory (R, X) = large in-memory datasets. Storage (I, D) = fast local disk. Accelerated (P, G) = GPU / ML. General (M, T) = default web tier. Match the workload bottleneck to the instance family's strength.

EC2 instance family selection appears in scenario-based performance questions. The key is matching the bottleneck. "CPU is always maxed out" → Compute Optimized (C family). "Application loads the entire dataset into RAM" → Memory Optimized (R or X family). "In-memory database like Redis is the bottleneck" → Memory Optimized. "Cassandra cluster needs fast local disk I/O" → Storage Optimized (I family with NVMe SSD). "ML model training needs GPUs" → Accelerated Computing (P or G family). "Variable web traffic, mostly light load with occasional spikes" → Burstable (T family). T3 Unlimited is important: in standard mode, a burstable instance that exhausts its CPU credits is throttled to its baseline. Unlimited mode keeps bursting for an extra charge. T3 and T4g launch in unlimited mode by default, while T2 defaults to standard. On the exam, if a web server "randomly becomes slow" and runs on burstable instances, the answer is often "enable unlimited mode or move to a larger instance type."

Task 3.2 — Placement Groups

EC2 Placement Groups

Controlling physical placement for performance or fault isolation

Cluster Placement Group

All instances in same rack, same AZ
Lowest network latency and highest per-flow throughput (up to 10 Gbps per flow between instances)
Highest risk: if rack fails, all instances fail
Best for: HPC, tightly-coupled distributed computing, MPI workloads
Use with Enhanced Networking (ENA) for max throughput

Use when: ultra-low latency between instances is required

Spread Placement Group

Each instance on a different physical rack
Max 7 instances per AZ per group
Maximizes fault isolation — single hardware failure affects max 1 instance
Can span multiple AZs
Best for: small critical instance sets (primary + replicas), quorum clusters

Use when: critical instances must survive hardware failures independently

Partition Placement Group

Instances divided across logical partitions
Each partition on separate rack
Up to 7 partitions per AZ; hundreds of instances per partition
Topology-aware: applications know which partition they're in
Best for: large distributed systems — HDFS, HBase, Kafka, Cassandra

Use when: large distributed workload needs fault domain awareness

Cluster = performance (low latency, high throughput, same rack, same AZ). Spread = max HA (different rack per instance, 7-instance limit). Partition = large distributed systems needing fault domain awareness (Hadoop, Kafka). Performance vs. resilience is the tradeoff for Cluster vs. Spread.

Placement groups are tested in performance AND resilience questions. Know all three: Cluster: The performance winner. Same rack means incredibly low latency (sub-millisecond) and high bandwidth between instances. The tradeoff is all eggs in one basket — a rack failure takes all instances. Best for MPI-based HPC jobs where inter-node communication is the bottleneck. Spread: The resilience winner. Each instance is on separate underlying hardware. The hard limit of 7 instances per AZ limits this to small sets of critical instances like a primary database and its replicas. Partition: The scale winner for distributed systems. Think Hadoop clusters or Kafka brokers where you want fault isolation (a rack failure shouldn't take out more than one partition) but you need more than 7 instances. AWS tells each instance which partition it's in, so topology-aware applications can make smart replication decisions. Common exam trap: "Deploy a Hadoop cluster with 100 nodes for fault isolation" → Partition group (not Spread — Spread is limited to 7 per AZ).
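The selection logic in these notes, including the Hadoop exam trap, can be written as a small decision function. This is only a study aid under simplifying assumptions: the function name and its three inputs are invented for this sketch, and real designs weigh more factors.

```python
SPREAD_LIMIT_PER_AZ = 7  # Spread placement group limit from the slide


def choose_placement_group(instances_per_az: int,
                           needs_lowest_latency: bool,
                           needs_fault_isolation: bool) -> str:
    """Map workload requirements to a placement group strategy."""
    if needs_lowest_latency and not needs_fault_isolation:
        return "cluster"
    if needs_fault_isolation:
        # Spread only works for small sets; larger fleets need partitions.
        if instances_per_az <= SPREAD_LIMIT_PER_AZ:
            return "spread"
        return "partition"
    return "none"
```

A 100-node Hadoop cluster that needs fault isolation returns "partition", while a three-node quorum cluster returns "spread".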

Task 3.2 — Lambda

Lambda Performance Optimization

Cold Starts · Memory · Provisioned Concurrency · Power Tuning

Cold Starts & Mitigation

Cold start: New execution environment initialized — adds latency (100ms–1s+ for JVM/large packages)
Warm invocation: Execution env reused — fast
Provisioned Concurrency: Pre-warms N execution environments — eliminates cold starts. Use for latency-sensitive APIs.
SnapStart (Java): Caches a snapshot of the initialized environment. Can reduce cold start time by ~90%.
Minimize package size — load SDK modules selectively
Move initialization code outside handler (init once per env)

Memory & CPU Scaling

Lambda CPU is proportional to memory — more memory = more vCPU
Range: 128 MB → 10,240 MB (10 GB)
Increasing memory can reduce duration, potentially reducing cost
AWS Lambda Power Tuning tool: find the optimal memory/cost trade-off
Ephemeral storage: 512 MB → 10,240 MB (/tmp)

Concurrency Controls

Reserved Concurrency: Limits max concurrent executions for a function — prevents it from consuming all account concurrency
Provisioned Concurrency: Pre-initializes N instances for instant invocation
Account limit: 1,000 concurrent executions per region (default, can be raised)
Throttled invocations → 429 error (synchronous); async events are retried, then sent to a DLQ if one is configured

"Lambda API has intermittent latency spikes" → Provisioned Concurrency (eliminates cold starts). "Lambda consuming too many concurrent executions" → Reserved Concurrency (caps it). Increasing memory increases CPU proportionally — often the fastest way to improve Lambda performance.

Lambda performance questions focus on two main issues: cold starts and concurrency limits. Cold starts: when Lambda needs a new execution environment, it must download your package, start the runtime, and run your initialization code. For Python/Node, this is usually fast (often a few hundred milliseconds or less). For Java or .NET with large packages, this can be 1-2 seconds. Solutions: Provisioned Concurrency (most effective), SnapStart for Java, minimize package size, move initialization outside the handler function. Memory and CPU: Lambda doesn't let you configure CPU directly. When you increase memory, you also get proportionally more vCPU. So if your function is CPU-bound (heavy computation, image processing), increasing memory is the lever to improve performance. The AWS Lambda Power Tuning tool helps find the sweet spot. Reserved vs Provisioned Concurrency: Reserved = ceiling (max N executions, excess throttled), and that capacity is also set aside for the function. Provisioned = floor (N environments pre-warmed). Both are types of concurrency management but solve different problems.
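The claim that more memory can lower cost follows from how Lambda bills compute: memory multiplied by duration. The sketch below makes that arithmetic explicit. The per-GB-second price is an assumed placeholder (check current pricing for your region and architecture), the function name is hypothetical, and the per-request charge is ignored.

```python
ASSUMED_PRICE_PER_GB_SECOND = 0.0000166667  # placeholder value, not authoritative


def invocation_cost(memory_mb: int, duration_ms: float,
                    price_per_gb_second: float = ASSUMED_PRICE_PER_GB_SECOND) -> float:
    """Compute charge for one invocation: GB of memory x seconds x unit price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second
```

If doubling memory from 512 MB to 1,024 MB halves a CPU-bound function's duration from 800 ms to 400 ms, the cost is unchanged and latency is halved; any speed-up beyond 2× makes the larger setting cheaper. This is the trade-off that the Power Tuning tool measures empirically.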

Task 3.3 — Instance Networking

Enhanced Networking & Elastic Fabric Adapter

Maximizing instance-to-instance network throughput

Enhanced Networking (ENA)

Elastic Network Adapter — modern enhanced networking standard
Up to 100 Gbps network throughput on supported instances
Higher bandwidth, lower latency, lower CPU overhead vs. legacy virtualized networking
Uses SR-IOV: hardware virtualization allows direct NIC access from VM
Available on most current-gen instances (C5, M5, R5, etc.)
No extra cost — enable via instance type selection

Elastic Fabric Adapter (EFA)

Network interface for HPC and ML workloads
OS bypass: application communicates directly with NIC, bypassing OS kernel
Enables MPI and NCCL (ML) inter-node communication at near bare-metal speeds
Required for tightly coupled HPC jobs (weather modeling, CFD, genomics)
Supported on: C5n, P4, Trn1, Hpc6a families
Works with Cluster Placement Group for max performance

Network Performance by Tier

Standard virtualized: Up to 10 Gbps — most instances
Enhanced Networking (ENA): Up to 25–100 Gbps — C5, M5, R5 etc.
Elastic Fabric Adapter: 100 Gbps + OS bypass — HPC/ML
Placement Group (Cluster): 10 Gbps+ between instances in same PG

"HPC cluster needs lowest inter-node latency for MPI" → EFA + Cluster Placement Group. "High network throughput between EC2 instances" → ENA-enabled instance types. EFA is a superset of ENA — it includes all ENA capabilities plus OS bypass.

Enhanced Networking and EFA are tested in HPC and performance-sensitive architecture questions. Standard networking goes through software virtualization — multiple context switches between application, OS, and hypervisor. SR-IOV (single-root I/O virtualization) allows the VM to access the hardware NIC more directly, reducing latency and CPU overhead. This is Enhanced Networking. EFA takes this further with OS-bypass: the application communicates with the network hardware directly, skipping the OS kernel entirely. This enables the inter-node communication latencies needed for tightly coupled HPC workloads where thousands of nodes need to synchronize frequently. MPI (Message Passing Interface) is the standard for HPC inter-node communication. NCCL (NVIDIA Collective Communications Library) is equivalent for distributed deep learning. Both benefit enormously from EFA. For the exam: if you see "HPC," "MPI," "tightly coupled simulation," or "distributed ML training needing fast inter-node communication" → EFA + Cluster Placement Group.

Task 3.3 — CloudFront

CloudFront Performance Configuration

Cache Behaviors · TTL · Origin Shield · Lambda@Edge

Cache Behavior Optimization

Cache Hit Ratio: % of requests served from cache — maximize this
TTL (Time to Live): Longer TTL = better cache hit ratio; shorter = fresher content
Cache Key: What CloudFront uses to identify a cached object. Add headers, query strings, cookies only if they change the response — otherwise they fragment the cache unnecessarily
Cache Policies: Managed policies (CachingOptimized, CachingDisabled) or custom
Compression: Enable Gzip/Brotli compression at edge to reduce transfer size

Origin Shield

Additional caching layer between edge locations and your origin
Reduces origin load: fewer cache misses hit the origin directly
Especially useful for origins that are slow or expensive to query (e.g., on-premises, cross-region)
Adds a small cost per request through Origin Shield

Lambda@Edge & CloudFront Functions

CloudFront Functions: Lightweight JS at edge (sub-ms). URL rewrites, header manipulation, A/B testing. Runs at all 400+ PoPs.
Lambda@Edge: Full Lambda capabilities at regional edge locations. HTTP auth, custom redirects, body transformation, dynamic content. Higher latency and cost than CF Functions.

Low cache hit ratio? → Check cache key — remove unnecessary query strings/headers that fragment caching. "Origin getting hammered despite CloudFront" → Enable Origin Shield. "Run auth logic at edge before reaching origin" → Lambda@Edge or CloudFront Functions (for simpler logic).

CloudFront performance is tested from two angles: maximizing cache hit ratio and reducing origin load. Cache hit ratio is the key performance metric. If 90% of requests are served from cache, only 10% hit your origin. If the ratio is low, the cache key is likely too specific — removing unnecessary cookies, query strings, and headers from the cache key lets more requests match the same cached object. Origin Shield acts as a central cache region between edge locations and your origin. Without it, each of CloudFront's 400+ PoPs might independently request the same object from your origin on a cache miss. With Origin Shield, those requests consolidate to one shield location that caches aggressively. Lambda@Edge vs CloudFront Functions: both run code at the edge, but at different scales. CloudFront Functions run at all PoPs with sub-millisecond latency (but limited JS runtime, no network access). Lambda@Edge runs at ~13 regional edge locations with full Lambda capabilities (can make network requests, use Node/Python, access DynamoDB etc.). Use CF Functions for URL rewrites and header manipulation; use Lambda@Edge for complex auth, body transformation, or external API calls.
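The effect of cache key fragmentation on hit ratio can be modeled with a toy cache. This is not how CloudFront is implemented; it is a simplified sketch that assumes an unbounded cache with no TTL expiry, and the function names and example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, allowed_params: frozenset) -> str:
    """Build a cache key from the path plus only the allowed query parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in allowed_params]
    query = urlencode(kept)
    return f"{parts.path}?{query}" if query else parts.path


def hit_ratio(urls: list, allowed_params: frozenset) -> float:
    """Fraction of requests served from a simple unbounded cache."""
    seen = set()
    hits = 0
    for url in urls:
        key = cache_key(url, allowed_params)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(urls)
```

With four requests for `/p` that differ mostly in a tracking parameter, including that parameter in the key yields no hits, while keying on `id` alone serves half of the requests from cache.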

Task 3.3 — Hybrid Connectivity

AWS Direct Connect vs. Site-to-Site VPN

High-throughput, low-latency hybrid network connectivity

Feature	AWS Direct Connect	Site-to-Site VPN
Connection type	Dedicated physical circuit	IPsec tunnel over public internet
Bandwidth	1 Gbps, 10 Gbps, 100 Gbps; or sub-1G via hosted	Up to ~1.25 Gbps per tunnel (ECMP for more)
Latency	Consistent, low — dedicated path	Variable — internet routing
Setup time	Weeks to months (physical provisioning)	Minutes to hours (software config)
Cost	Higher — port hours + data transfer	Lower — per-hour + data transfer
Encryption	Not encrypted by default — add IPsec or MACsec	Encrypted by default (IPsec)
Use cases	Large data migration, consistent throughput, hybrid cloud, compliance	Quick setup, redundant backup, lower cost

Direct Connect Gateway

Connect one Direct Connect circuit to multiple VPCs across regions and accounts. Avoids needing a separate circuit per VPC or region.

Resilient DX Pattern

Primary: Direct Connect for performance. Backup: Site-to-Site VPN over internet. Failover is automatic if DX fails. Best practice for production hybrid connectivity.

"Consistent, high-bandwidth connectivity to on-premises" → Direct Connect. "Fastest setup for hybrid connectivity" → Site-to-Site VPN. "Backup for Direct Connect" → Site-to-Site VPN. Direct Connect is NOT encrypted — add MACsec or an IPsec VPN over the DX connection for encryption.

Direct Connect vs VPN is one of the most tested hybrid connectivity comparisons on the exam. Direct Connect (DX): a dedicated physical circuit from your data center to an AWS Direct Connect location. Data never traverses the public internet, providing consistent bandwidth and predictable latency. Perfect for large data transfers (petabytes) or workloads requiring deterministic network performance. Lead time is weeks to months for physical provisioning. Site-to-Site VPN: configured in minutes. Creates an encrypted tunnel over the public internet. Bandwidth is limited by your internet connection and the IPsec overhead. Latency varies with internet conditions. Cheap and fast to set up. Best practice: use both. DX for primary connectivity (performance), VPN as backup (if DX circuit fails). This is the resilient hybrid architecture pattern. Direct Connect encryption: the physical circuit is not encrypted by default. For data-in-transit compliance, enable MACsec (Layer 2 encryption on the physical connection) or run an IPsec VPN tunnel over the DX connection. Direct Connect Gateway: a single DX connection can reach VPCs in multiple regions through DX Gateway — you don't need a separate circuit per region.
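The bandwidth difference matters most for bulk transfers, and a back-of-the-envelope estimate makes it concrete. The sketch below uses decimal units (1 TB = 10^12 bytes) and an assumed 80% sustained link utilization; both the function name and that utilization figure are illustrative assumptions.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move data over a link at a sustained utilization."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400
```

Moving 100 TB takes roughly 1.2 days on a 10 Gbps Direct Connect port versus about 9.3 days through a single 1.25 Gbps VPN tunnel, before accounting for the variability of internet paths.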

Task 3.4 — Elastic Architectures

Designing Elastic Solutions Across All Tiers

Web · App · Database · Caching — scaling together

Web Tier CloudFront + WAF at edge. ALB distributes across AZs. EC2 ASG (target tracking on ALB request count) or Fargate tasks auto-scale. Static assets served from S3 via CloudFront — zero compute cost at scale.

Application Tier EC2 ASG or ECS/EKS with auto scaling on CPU/memory/custom metrics. Lambda for event-driven processing — scales from 0 to thousands instantly. SQS queue depth as a scaling metric for async worker fleets.

Database Tier Aurora Serverless v2 scales compute capacity (for both reads and writes) between 0.5 and 128 ACUs in fine-grained increments. DynamoDB On-Demand scales reads/writes instantly. RDS Read Replicas offload reads. Aurora Auto Scaling adds/removes read replicas based on CPU or connections.

Caching Tier ElastiCache Redis Cluster Mode distributes data across shards — add shards to scale horizontally. DAX cluster nodes add read capacity for DynamoDB. CloudFront scales caching globally at edge — no management required.

SQS queue depth as a scaling metric is the canonical pattern for auto-scaling async worker fleets — as messages accumulate, scale out workers; as queue drains, scale in. Use ASG target tracking policy with a custom CloudWatch metric.

Elastic architecture means every tier can scale independently based on demand. The exam tests whether you know the scaling mechanisms at each tier. Web tier: The key insight is that static content (HTML, CSS, JS, images) served from S3 via CloudFront scales with zero EC2 capacity. Only dynamic requests need to hit compute. Application tier: SQS queue depth as a scaling metric is a high-value pattern. The question usually describes a variable workload with a message queue, and the answer is to scale workers based on the number of messages in the queue (ApproximateNumberOfMessagesVisible CloudWatch metric). Scaling on backlog per worker, rather than raw queue length, keeps capacity proportional to the work waiting. Database tier: Aurora Serverless v2 is elastic in a way standard RDS isn't: it scales compute up and down seamlessly in response to load. Aurora Auto Scaling automatically adds and removes Aurora Replicas. DynamoDB On-Demand requires no pre-planning at all. Caching tier: properly sized caches at each tier reduce database load, enabling the database to scale less aggressively. Caching is the highest-leverage performance optimization in most web applications.
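The queue-depth scaling pattern is usually expressed as a backlog-per-worker target: decide how many messages one worker may have waiting while still meeting a latency goal, then size the fleet to match. The sketch below shows that calculation; the function name, parameters, and the min/max bounds are assumptions for illustration, and in practice the resulting ratio would be published as a custom CloudWatch metric for a target tracking policy.

```python
import math


def desired_workers(visible_messages: int,
                    avg_processing_seconds: float,
                    acceptable_latency_seconds: float,
                    min_workers: int = 1,
                    max_workers: int = 100) -> int:
    """Size a worker fleet from queue backlog and a latency target."""
    # How many queued messages one worker can hold and still meet the target.
    acceptable_backlog_per_worker = acceptable_latency_seconds / avg_processing_seconds
    needed = math.ceil(visible_messages / acceptable_backlog_per_worker)
    return max(min_workers, min(needed, max_workers))
```

With 1,500 visible messages, 2 seconds per message, and a 60-second latency target, each worker can carry a backlog of 30 messages, so the fleet should be 50 workers.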

Task 3.5 — Purpose-Built Databases

Right Database for the Right Workload

The AWS database portfolio — each engine optimized for a specific access pattern

Relational (RDBMS)

RDS: MySQL, PostgreSQL, Oracle, SQL Server, MariaDB
Aurora: MySQL/PostgreSQL-compatible; AWS cites up to 5× MySQL and 3× PostgreSQL throughput
ACID transactions, complex queries, structured data
Use for: ERP, CRM, financial systems, traditional apps

Key-Value / Wide Column

DynamoDB: Fully managed NoSQL, single-digit ms latency, virtually unlimited scale
High throughput at low latency, no complex joins
Use for: shopping carts, session state, user profiles, leaderboards, IoT
DAX for microsecond read latency

In-Memory

ElastiCache Redis: Cache, sessions, pub/sub, leaderboards
ElastiCache Memcached: Simple distributed cache
MemoryDB for Redis: Durable in-memory DB (not just cache). Redis API + Multi-AZ durability
Use for: <1ms latency reads, caching DB results, real-time dashboards

Document

DocumentDB: MongoDB-compatible managed document store
JSON documents, flexible schema
Use for: content management, catalogs, user data, mobile backends
Scales storage automatically to 64 TB

Analytics & Search

Redshift: Petabyte-scale data warehouse. Columnar. OLAP.
Redshift Serverless: Auto-scales warehouse capacity
OpenSearch Service: Full-text search, log analytics (Elasticsearch-compatible)
Athena: SQL queries on S3 data — serverless, pay per query

Specialized

Neptune: Graph database — social networks, fraud detection, knowledge graphs
QLDB: Ledger database — immutable, cryptographically verifiable transaction log
Timestream: Time-series database — IoT, DevOps metrics, telemetry
Keyspaces: Managed Apache Cassandra

Task 3.5 tests whether you can match a described workload to the right database engine. Know the trigger phrases for each. "Need to store and query relationships between entities" → Neptune (graph DB). "Regulatory requirement: need immutable, auditable financial transaction log" → QLDB. "Store time-series IoT sensor data efficiently" → Timestream. "Full-text search across millions of documents" → OpenSearch Service. "Complex analytical queries on petabytes of historical data" → Redshift. "Run SQL queries directly on files stored in S3 without loading into a database" → Athena. "Existing application uses MongoDB" → DocumentDB (MongoDB-compatible). "Existing application uses Apache Cassandra" → Keyspaces (Cassandra-compatible). MemoryDB for Redis is commonly confused with ElastiCache Redis. MemoryDB is a primary database with durability — it persists all data across a Multi-AZ cluster using a distributed transaction log. ElastiCache is a cache in front of a database — it's not a source of truth. Use MemoryDB when the Redis data IS the primary database, not just a cache.

Task 3.5 — Decision Guide

Database Selection Decision Tree

Map workload characteristics to the right engine

Complex SQL queries, transactions, structured relational data → RDS or Aurora
High-throughput key-value or simple queries at any scale → DynamoDB (+ DAX for microsecond reads)
Sub-millisecond reads; caching layer in front of DB → ElastiCache Redis or Memcached
Redis as primary durable database (not just cache) → MemoryDB for Redis
Full-text search, log analytics, Elasticsearch workloads → Amazon OpenSearch Service
Petabyte analytics, complex OLAP queries, data warehouse → Amazon Redshift
SQL queries on S3 data without loading into a DB → Amazon Athena
Graph relationships (social, fraud, recommendations) → Amazon Neptune
IoT sensor data, metrics, time-ordered events → Amazon Timestream
Immutable audit log, cryptographic verification → Amazon QLDB

This decision tree is the most direct mapping from exam question description to correct answer for Task 3.5. Memorize it. The most frequently missed: Athena vs. Redshift. Athena queries data IN PLACE in S3 using serverless SQL — you pay per query and there's no infrastructure. Redshift requires loading data into a cluster and is for complex, repeated analytical workloads. Use Athena for ad-hoc queries; use Redshift for sustained analytical workloads with complex joins. MemoryDB vs ElastiCache is the other common confusion point — covered in the notes on the previous slide. QLDB: the keyword is "cryptographically verifiable." QLDB maintains a journal of every change, and uses cryptographic hashing to make it impossible to alter historical records without detection. Used for financial ledgers, supply chain records, medical records. Neptune: the keyword is "relationships" — social graphs, recommendation engines, fraud detection networks, knowledge graphs. Standard relational databases are very inefficient at traversing many-hop relationships; graph databases are optimized exactly for this.
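As a memorization aid, the trigger phrases can be encoded as an ordered keyword lookup. This is a toy sketch: the phrase list is a hand-picked subset of the wording in these notes, the function name is hypothetical, and first-match ordering is a simplification that real exam questions will not always respect.

```python
from typing import Optional

# Ordered rules: more specific phrases come first.
RULES = [
    ("cryptographically verifiable", "Amazon QLDB"),
    ("immutable", "Amazon QLDB"),
    ("graph", "Amazon Neptune"),
    ("relationships", "Amazon Neptune"),
    ("time-series", "Amazon Timestream"),
    ("full-text search", "Amazon OpenSearch Service"),
    ("queries on s3", "Amazon Athena"),
    ("data warehouse", "Amazon Redshift"),
    ("durable redis", "MemoryDB for Redis"),
    ("cache", "ElastiCache Redis or Memcached"),
    ("key-value", "DynamoDB"),
    ("transactions", "RDS or Aurora"),
]


def pick_database(scenario: str) -> Optional[str]:
    """Return the first engine whose trigger phrase appears in the scenario."""
    text = scenario.lower()
    for phrase, engine in RULES:
        if phrase in text:
            return engine
    return None
```

For example, `pick_database("Run ad-hoc SQL queries on S3 logs")` returns "Amazon Athena", while an unrecognized scenario returns `None`.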

Quick Review

Exam Checklist — Domain 3

Can you answer these?

Task 3.1 — Storage & Database Performance

S3 request rate limits per prefix and how to distribute load
S3 Transfer Acceleration vs. Byte-Range Fetches use cases
gp3 (decouple IOPS/size) vs. io2 (ultra-high IOPS) vs. st1 (throughput HDD)
DynamoDB On-Demand vs. Provisioned capacity mode trade-offs
Hot partition problem: cause, detection, and fix (key design / sharding)
DAX: write-through cache, microsecond reads, doesn't help strong consistency

Task 3.2 — Compute Performance

EC2 instance families: C (compute), R/X (memory), I/D (storage), P/G (GPU)
Placement Groups: Cluster (perf) vs. Spread (HA, 7-instance limit) vs. Partition (large distributed)
Lambda: Provisioned Concurrency eliminates cold starts; memory ∝ CPU
T3 Unlimited for sustained burst; Reserved Concurrency caps Lambda executions

Task 3.3 — Networking Performance

ENA = Enhanced Networking (SR-IOV, up to 100 Gbps, reduced latency)
EFA = ENA + OS bypass for HPC/MPI workloads
CloudFront cache hit ratio: minimize cache key fragmentation
Origin Shield reduces origin requests by centralizing cache misses
Direct Connect (dedicated, consistent, not encrypted) vs. VPN (quick, encrypted, variable)
DX + VPN backup = resilient hybrid connectivity best practice

Tasks 3.4 & 3.5 — Elastic & Purpose-Built DB

SQS queue depth as auto scaling metric for worker fleets
Aurora Serverless v2 and DynamoDB On-Demand for elastic database tiers
Redshift (OLAP warehouse) vs. Athena (serverless S3 SQL) vs. OpenSearch (full-text)
Neptune (graph), QLDB (immutable ledger), Timestream (IoT time-series)
MemoryDB (durable Redis primary DB) vs. ElastiCache (cache only)
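
Scaling a worker fleet on SQS queue depth is usually framed as a backlog-per-instance target. The sketch below shows only the arithmetic, under stated assumptions: the function names, the latency and processing-time inputs, and the min/max bounds are hypothetical, and in a real setup the queue depth would come from the queue's approximate message count metric rather than a function argument.

```python
import math


def acceptable_backlog_per_instance(max_latency_s: float, avg_processing_s: float) -> float:
    """Messages one worker can have queued and still meet the latency goal."""
    return max_latency_s / avg_processing_s


def desired_workers(queue_depth: int, target_backlog: float,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed so backlog per worker stays at or below the target,
    clamped to the fleet's configured bounds."""
    needed = math.ceil(queue_depth / target_backlog)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    target = acceptable_backlog_per_instance(max_latency_s=10, avg_processing_s=0.5)
    print(target, desired_workers(150, target, min_workers=1, max_workers=50))
```

Scaling on backlog per worker rather than raw queue depth keeps the target meaningful as the fleet grows or shrinks.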

Quick Reference

Service → Performance Scenario Quick Map

Storage Performance

S3 Transfer Acceleration → global upload speed
S3 Byte-Range Fetches → parallel large downloads
S3 Multipart Upload → parallel large uploads
EBS gp3 → general purpose, decoupled IOPS
EBS io2 → highest IOPS for critical DBs
EBS st1 → high sequential throughput (HDD)

Compute Performance

C family → CPU-bound batch / HPC
R / X family → large in-memory datasets
I / D family → high local storage (I: NVMe SSD, D: dense HDD)
P / G family → GPU / ML training
Cluster PG → lowest inter-instance latency
Lambda Provisioned Concurrency → no cold starts

Network Performance

ENA → 25–100 Gbps instance networking
EFA → OS-bypass for HPC / MPI / NCCL
CloudFront → global HTTP cache & edge compute
Origin Shield → reduce origin request volume
Direct Connect → dedicated consistent bandwidth
Global Accelerator → static IP + AWS backbone routing

Database Performance

DAX → microsecond DynamoDB reads
ElastiCache Redis → sub-ms app-level caching
MemoryDB → durable primary Redis DB
RDS Read Replicas → scale SQL read workloads
Aurora → 15 replicas, faster failover, auto-storage
Aurora Serverless v2 → elastic compute capacity (scales in ACUs)

Analytics & Search

Redshift → petabyte OLAP warehouse
Athena → serverless SQL on S3
OpenSearch → full-text search & log analytics
Timestream → IoT / metrics time-series
Glue → ETL pipeline for data lake
EMR → Hadoop / Spark big data processing

Specialized Workloads

Neptune → graph relationships
QLDB → immutable audit ledger
DocumentDB → MongoDB-compatible documents
Keyspaces → Apache Cassandra-compatible
EFA + Cluster PG → HPC tightly-coupled MPI
FSx for Lustre → HPC parallel file system

This reference card maps every major Domain 3 scenario to its answer service. Use it for final review. The cells in this table correspond directly to how exam questions are categorized. When you read a question, identify the workload type first, then find the matching service. Three services appear across multiple categories and are especially important:
1. ElastiCache Redis: appears in caching, session management, real-time analytics, pub/sub, and leaderboard scenarios.
2. CloudFront: appears in performance (caching), security (WAF integration), and cost optimization scenarios.
3. Aurora: appears in performance (read replicas, fast failover), resilience (Global Database), and elasticity (Serverless v2) scenarios.
Knowing these versatile services deeply pays off on questions across multiple domains.

Design High-Performing
Architectures

What You Need to Know

High-Performing Storage & Databases

S3 Performance Optimization

EBS Volume Types & Performance

DynamoDB Performance Optimization

High-Performing Compute

EC2 Instance Families

EC2 Placement Groups

Lambda Performance Optimization

High-Performing Networking

Enhanced Networking & Elastic Fabric Adapter

CloudFront Performance Configuration

AWS Direct Connect vs. Site-to-Site VPN

Elastic Solutions & Purpose-Built Databases

Designing Elastic Solutions Across All Tiers

Right Database for the Right Workload

Database Selection Decision Tree

Exam Checklist — Domain 3

Service → Performance Scenario Quick Map

You're ready for Domain 3
