Optional: Archive Access and Deep Archive Access tiers (after 90 and 180 days of inactivity)
Small monthly monitoring fee per object (~$0.0025/1,000 objects)
Best for: data lakes, user uploads, logs — when access pattern is unknown or variable
No retrieval fees between Frequent/Infrequent tiers
When to Use Each Tier
Standard: Accessed daily/weekly. No minimum duration.
Standard-IA: Monthly backups, DR copies. ≥128 KB objects. 30-day min.
One Zone-IA: Re-creatable data only (thumbnails). 20% cheaper than Standard-IA. Single AZ risk.
Glacier Instant: Medical images, media archives. Accessed ~quarterly. 90-day min.
Glacier Flexible: Backups. 3–5 hour retrieval OK. 90-day min.
Deep Archive: 7–10 year compliance archive. 12–48h retrieval. 180-day min.
Standard-IA and the Glacier tiers charge per-GB retrieval fees. If data is read frequently, those fees can exceed the storage savings, so estimate retrieval volume before moving hot data to an IA tier.
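The retrieval-fee trade-off can be checked with simple arithmetic. The sketch below is a rough estimate only: the three prices are illustrative assumptions (not taken from this guide), and it ignores request charges, minimum object size, and minimum storage duration.

```python
# Illustrative per-GB prices. These are assumptions for the sketch,
# not figures quoted from this guide; check current regional pricing.
STANDARD_STORAGE = 0.023   # $/GB-month (assumed)
IA_STORAGE = 0.0125        # $/GB-month (assumed)
IA_RETRIEVAL = 0.01        # $/GB retrieved (assumed)


def monthly_cost_standard(stored_gb: float) -> float:
    """Storage-only monthly cost in S3 Standard (no retrieval fee)."""
    return stored_gb * STANDARD_STORAGE


def monthly_cost_ia(stored_gb: float, retrieved_gb: float) -> float:
    """Monthly cost in Standard-IA: cheaper storage plus a per-GB retrieval fee."""
    return stored_gb * IA_STORAGE + retrieved_gb * IA_RETRIEVAL


def breakeven_reads_per_month() -> float:
    """How many times the whole dataset can be read per month before IA costs more."""
    return (STANDARD_STORAGE - IA_STORAGE) / IA_RETRIEVAL


if __name__ == "__main__":
    print(f"Break-even: {breakeven_reads_per_month():.2f} full reads per month")
    print(f"1 TB, read twice: IA ${monthly_cost_ia(1000, 2000):.2f} "
          f"vs Standard ${monthly_cost_standard(1000):.2f}")
```

Under these assumed prices, data read in full more than about once a month is cheaper in Standard.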
"Unknown access pattern" → S3 Intelligent-Tiering. "Log files that must be retained 7 years, rarely accessed" → Lifecycle to Deep Archive. "Application backups stored for 30 days, then deleted" → Lifecycle expiration rule. Standard-IA has 30-day minimum storage charge — don't use for short-lived objects.
Cost Allocation Tags: Tag buckets and volumes by team/project for chargeback
"EC2 instances terminated but costs still high" → check for unattached EBS volumes and old EBS snapshots. "Shared file system costs too high" → enable EFS Intelligent-Tiering or switch to EFS One Zone for dev environments. gp3 costs the same as gp2 but gives more control — always prefer gp3.
Pay per second (60-second minimum) for Linux and Windows instances; some other operating systems bill per hour. No commitment. Most flexible, highest cost. Use for: short-term, unpredictable workloads; development/testing; new applications being sized. Baseline price with no discount.
Reserved Instances (RI)
1- or 3-year commitment to a specific instance type, OS, and region. Up to 72% discount vs On-Demand. Standard RI: locked to instance type. Convertible RI: can change family/OS/tenancy (up to 66% discount). Payment options: All Upfront (max discount), Partial Upfront, No Upfront.
Savings Plans
Flexible commitment to a $/hour spend level. Compute SP: applies to EC2 (any family/region), Lambda, Fargate — up to 66% discount. EC2 Instance SP: specific instance family + region — up to 72% discount. Recommended over RIs for most use cases due to flexibility.
Spot Instances
Use spare EC2 capacity at the current Spot price; there is no bidding, though you can set an optional maximum price. Up to 90% discount vs On-Demand. AWS can reclaim with a 2-minute warning. Use for: fault-tolerant, stateless workloads with flexible start/end times, such as batch jobs, data processing, rendering, CI/CD, and containerized tasks.
Dedicated Hosts / Instances
Physical server dedicated to you. Dedicated Host: you control socket/core placement, bring your own license (BYOL) for SQL Server, Oracle, Windows. Most expensive option. Use only when licensing or compliance requires physical isolation.
"Steady-state 24/7 workload, 1-year plan" → Reserved Instance or Savings Plan. "Batch jobs that can be interrupted" → Spot Instances. "Need to bring SQL Server license from on-premises" → Dedicated Host. "Mixed fleet flexibility across families and regions" → Compute Savings Plan over Standard RI.
Task 4.2 — Commitment Discounts
Savings Plans vs. Reserved Instances
Choosing the right commitment model for long-running workloads
Exact instance type known long-term; want max discount
Savings Plans are recommended over RIs for most new workloads due to flexibility. Use Standard RI only when the exact instance type is known for years and you want to use the RI Marketplace. Compute Savings Plan covers Lambda and Fargate — RIs do not.
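A commitment only pays off if the capacity is actually used. As a rough sketch (an illustrative model, not an AWS formula): a commitment is billed for every hour at the discounted rate, while On-Demand is billed only for hours used, so the break-even utilization is one minus the discount. The hourly rate in the example is an arbitrary assumption.

```python
def breakeven_utilization(discount: float) -> float:
    """Fraction of hours a workload must run for a commitment to break even."""
    return 1.0 - discount


def cheaper_option(hourly_on_demand: float, discount: float,
                   utilization: float) -> str:
    """Compare average hourly cost of a commitment vs. On-Demand."""
    committed = hourly_on_demand * (1.0 - discount)   # billed every hour
    on_demand = hourly_on_demand * utilization        # billed only when running
    return "commit" if committed < on_demand else "on-demand"


if __name__ == "__main__":
    # Discounts are the "up to" figures quoted above.
    for name, discount in [("Compute SP", 0.66), ("EC2 Instance SP / Standard RI", 0.72)]:
        print(f"{name}: break-even at {breakeven_utilization(discount):.0%} utilization")
    print(cheaper_option(0.10, 0.66, utilization=1.0))   # 24/7 workload
    print(cheaper_option(0.10, 0.66, utilization=0.25))  # runs 6 hours a day
```

Real discounts depend on term, payment option, instance family, and region, so the actual break-even point is usually higher than this best case.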
Task 4.2 — Spot Instances
Spot Instances — Maximum Savings for Flexible Workloads
Up to 90% discount · Interruption handling · Spot Fleet strategies
Spot — Good Fit ✅
Batch data processing (ETL, analytics, log processing)
Image / video rendering and transcoding
CI/CD build and test pipelines
Containerized microservices (stateless)
HPC / scientific simulation workloads
Machine learning model training
Background workers behind an SQS queue
Dev / test environments (acceptable downtime)
Spot — Poor Fit ❌
Stateful databases (RDS, production databases)
Production web servers serving live user traffic on Spot alone (no On-Demand base)
Applications with no checkpointing or restart capability
Long-running jobs that cannot be safely interrupted mid-way
Any workload with an SLA requiring high availability
Spot + On-Demand mix: ASG with base capacity On-Demand/Reserved, overflow on Spot
Capacity Rebalancing: Proactively replace Spot instances at elevated interruption risk
"Batch jobs that can be interrupted" → Spot. "Workers behind SQS processing images" → Spot ASG on queue depth. The pattern for resilient Spot is: diversify across instance types + AZs, handle the 2-minute warning, and use SQS/checkpointing so interrupted work can resume. Never use Spot as the only compute for user-facing production services.
Task 4.2 — Right-Sizing & Architecture
Right-Sizing, Graviton & Serverless Cost Models
Compute Optimizer · ARM instances · Lambda pay-per-use
AWS Compute Optimizer
Analyzes CloudWatch metrics to identify over-provisioned resources
AWS Graviton (ARM-based instances)
Up to 40% better price/performance vs equivalent Intel/AMD x86 instances
Works with: Linux workloads, containerized apps, JVM, Python, Node, Go
Not compatible with: Windows, some legacy binaries requiring x86
Supported for Lambda (arm64 architecture — 20% cheaper than x86)
Serverless Cost Model
Lambda: Pay per invocation ($0.20/1M) + duration ($0.0000166667/GB-sec). Zero cost when idle. Scales to zero automatically.
Fargate: Pay per vCPU-hour + GB-hour per task. No idle cluster cost.
vs. EC2: Serverless eliminates cost of idle compute. Break-even depends on utilization — EC2 RI is cheaper at sustained high utilization; serverless is cheaper at low/variable utilization.
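The break-even can be estimated from the Lambda prices quoted above. In this sketch the EC2 hourly rate is an assumed placeholder, the free tier is ignored, and the comparison assumes a single always-on instance could handle the same load.

```python
LAMBDA_PER_REQUEST = 0.20 / 1_000_000    # $ per invocation (from the text above)
LAMBDA_PER_GB_SECOND = 0.0000166667      # $ per GB-second (from the text above)
EC2_HOURLY = 0.0416                      # $ per hour, assumed placeholder rate
HOURS_PER_MONTH = 730


def lambda_monthly_cost(invocations: int, avg_duration_s: float,
                        memory_gb: float) -> float:
    """Request charge plus duration charge; zero when there are no invocations."""
    gb_seconds = invocations * avg_duration_s * memory_gb
    return invocations * LAMBDA_PER_REQUEST + gb_seconds * LAMBDA_PER_GB_SECOND


def ec2_monthly_cost(hourly_rate: float = EC2_HOURLY) -> float:
    """An always-on instance costs the same whether busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


if __name__ == "__main__":
    for monthly_invocations in (1_000_000, 100_000_000):
        cost = lambda_monthly_cost(monthly_invocations, avg_duration_s=0.2, memory_gb=0.5)
        print(f"{monthly_invocations:>11,} invocations: Lambda ${cost:,.2f} "
              f"vs EC2 ${ec2_monthly_cost():,.2f}")
```

With these inputs Lambda is far cheaper at one million invocations a month and more expensive at one hundred million, which is the low-versus-sustained utilization pattern described above.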
"Identify underutilized EC2 instances" → AWS Compute Optimizer or Trusted Advisor. "20–40% cost reduction without code changes" → migrate to Graviton (ARM) instances. "Workload runs only occasionally, not 24/7" → Lambda or Fargate eliminates idle compute cost vs. always-on EC2.
4.3
Cost-Effective Database Solutions
RDS Reserved · Aurora Serverless · DynamoDB Cost Modes · Caching to Reduce DB Spend
RDS Reserved Instances: 1- or 3-year commitment on RDS instance class. Up to 69% discount vs. On-Demand. Apply to Multi-AZ deployments too.
Right-size instance class: Use Performance Insights + CloudWatch to find underutilized RDS instances. CPU <40% and low connections → downsize.
Stop dev/test instances: RDS can be stopped for up to 7 days at a time (it restarts automatically afterwards). This saves compute cost; storage is still billed.
Aurora vs. RDS MySQL: Aurora storage is auto-scaled and billed per GB used; RDS requires pre-allocated storage. For large DBs with variable growth, Aurora storage billing can be more cost-effective.
Single-AZ for dev/test: Multi-AZ doubles instance cost. Use Single-AZ for non-production.
Aurora Serverless v2
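Scales in fine-grained increments between a configured minimum and maximum, from 0.5 to 128 Aurora Capacity Units (ACUs)
Billed per ACU-second for the capacity currently allocated, never below the configured minimum
Can auto-pause to 0 ACUs when idle (dev/test), giving zero compute cost while paused
A non-zero minimum ACU setting, typical in production, means compute is never completely free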
DynamoDB: On-Demand vs. Provisioned
On-Demand: Pay per request. No capacity planning. Higher per-RCU/WCU cost but zero idle cost. Best for unpredictable traffic.
Provisioned + Auto Scaling: Set min/max RCUs/WCUs; auto scaling adjusts. Cheaper at sustained, predictable load. Best for consistent traffic patterns.
Reserved Capacity: Commit to provisioned throughput for 1 or 3 years. Up to 76% discount. Use for stable read/write-heavy tables.
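The choice between the two capacity modes comes down to how close average traffic is to peak traffic. The sketch below compares them for writes only; both prices are placeholder assumptions rather than current AWS rates, and auto scaling and reserved capacity are left out.

```python
# Placeholder prices for illustration only (assumptions, not quoted rates).
ON_DEMAND_PER_MILLION_WRITES = 1.25   # $ per million write requests
PROVISIONED_PER_WCU_HOUR = 0.00065    # $ per WCU per hour
HOURS_PER_MONTH = 730
SECONDS_PER_MONTH = HOURS_PER_MONTH * 3600


def on_demand_monthly(avg_writes_per_second: float) -> float:
    """On-Demand bills for requests actually made, so cost follows the average."""
    requests = avg_writes_per_second * SECONDS_PER_MONTH
    return requests / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES


def provisioned_monthly(peak_writes_per_second: float) -> float:
    """Provisioned (without auto scaling) bills for capacity sized to the peak."""
    return peak_writes_per_second * PROVISIONED_PER_WCU_HOUR * HOURS_PER_MONTH


if __name__ == "__main__":
    # Steady table: average equals peak. Spiky table: average far below peak.
    for label, average, peak in [("steady", 100, 100), ("spiky", 2, 100)]:
        print(f"{label}: on-demand ${on_demand_monthly(average):,.2f}, "
              f"provisioned ${provisioned_monthly(peak):,.2f}")
```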
"RDS running 24/7 for production, stable load" → RDS Reserved Instance. "Aurora database for a new SaaS app with unknown initial traffic" → Aurora Serverless v2. "DynamoDB table serving consistent high traffic" → Provisioned capacity + Reserved Capacity for max savings. Add ElastiCache in front of any read-heavy RDS/Aurora to dramatically reduce DB instance sizing.
4.4 · 4.5
Network Cost Optimization & Managed Services
Data Transfer Pricing · VPC Endpoints · CloudFront · Managed Service TCO
Task 4.4 — Network Costs
Data Transfer Cost Patterns
What's free · What costs money · How to reduce it
Free Data Transfer $0
Inbound to AWS from the internet (ingress is always free)
Between resources in the same AZ using private IPs
S3 → CloudFront (origin fetch from S3 is free)
EC2 ↔ S3 in same region (via internet endpoint or Gateway Endpoint)
Between services in the same region using Gateway Endpoints (S3, DynamoDB)
Direct Connect data-in from on-premises to AWS
Charged Data Transfer $$
Outbound from AWS to the internet (egress) — charged per GB
Cross-AZ traffic within the same region — charged both ways
Cross-region data transfer — charged per GB
NAT Gateway processing — charged per GB processed + hourly
VPC Peering cross-region — charged per GB
Direct Connect data-out from AWS to on-premises — charged per GB
VPC Endpoint Cost Savings
Gateway Endpoints (S3, DynamoDB): Free. Eliminates NAT Gateway processing charges for traffic to S3/DynamoDB from private subnets — potentially hundreds of dollars/month saved.
Interface Endpoints: Hourly charge per AZ + per GB. Cheaper than routing through NAT Gateway for high-volume service traffic.
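To see where the "hundreds of dollars/month" figure can come from, the sketch below estimates the NAT Gateway processing charge avoided by sending S3 traffic through a Gateway Endpoint. The per-GB and hourly rates are assumed placeholders; the hourly NAT charge remains if other traffic still needs the gateway.

```python
NAT_PER_GB = 0.045      # $ per GB processed (assumed placeholder)
NAT_PER_HOUR = 0.045    # $ per hour per NAT Gateway (assumed placeholder)
HOURS_PER_MONTH = 730


def nat_gateway_monthly(gb_processed: float) -> float:
    """Total monthly NAT Gateway cost: fixed hourly charge plus per-GB processing."""
    return NAT_PER_HOUR * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def gateway_endpoint_savings(gb_to_s3_or_dynamodb: float) -> float:
    """Processing charge avoided by routing this traffic via a free Gateway Endpoint."""
    return gb_to_s3_or_dynamodb * NAT_PER_GB


if __name__ == "__main__":
    total_gb, s3_gb = 12_000, 10_000
    print(f"NAT bill before: ${nat_gateway_monthly(total_gb):,.2f}")
    print(f"NAT bill after:  ${nat_gateway_monthly(total_gb - s3_gb):,.2f}")
    print(f"Saved:           ${gateway_endpoint_savings(s3_gb):,.2f}")
```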
CloudFront to Reduce Egress
CloudFront → internet egress is priced lower than direct EC2/S3 → internet
S3 → CloudFront origin fetch is free — only egress from CloudFront to users is charged
Cache hit ratio: every cache hit eliminates both origin compute and egress cost
Price Classes: restrict CloudFront to cheaper edge regions (e.g., North America only) if users are concentrated geographically
"EC2 in private subnet accessing S3 — high NAT Gateway costs" → Add S3 Gateway VPC Endpoint (free) to bypass NAT Gateway. "High internet egress costs for static assets" → Put CloudFront in front — cheaper egress rates + cache hits eliminate repeat egress. Cross-AZ traffic costs money — consolidate into fewer AZs only if HA requirements allow.
Task 4.5 — Managed Services
Managed Services vs. Self-Managed TCO
Reducing operational overhead as a cost optimization strategy
Amazon RDS vs. EC2 + MySQL
RDS manages patching, backups, Multi-AZ failover, and parameter tuning. Self-managed MySQL on EC2 requires DBAs for these tasks. Even at a higher sticker cost, RDS often has lower TCO once engineering time is counted.
AWS Lambda vs. EC2 workers
Lambda eliminates idle compute cost and all server management. No OS patching, capacity planning, or scaling configuration. Engineers focus on business logic. Best for variable, event-driven workloads.
Amazon ECS/Fargate vs. self-managed Kubernetes
Fargate removes EC2 cluster management for containers. With EKS, AWS runs the Kubernetes control plane, and managed node groups reduce worker-node upkeep. Fargate has a higher per-vCPU cost but no cluster management cost; weigh it against the team time spent managing clusters.
Amazon OpenSearch vs. self-managed Elasticsearch
OpenSearch Service handles cluster provisioning, patching, snapshots, and scaling. Self-managed Elasticsearch requires dedicated DevOps effort. Managed service pricing is often justified by the operational overhead it removes.
AWS Cost Management Tools
AWS Cost Explorer: Visualize spend over time; forecast future costs; identify top cost drivers by service, region, tag
AWS Budgets: Set cost/usage/RI/Savings Plan thresholds; SNS alerts when exceeded
Cost Allocation Tags: Tag resources by team, project, environment; enable in billing console for per-tag cost breakdown
Compute Optimizer: Right-sizing recommendations for EC2, Lambda, ECS, EBS
Billing Alarms: CloudWatch alarm on the EstimatedCharges metric (billing metrics are published in us-east-1)
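A billing alarm of this kind could be defined roughly as follows. The helper builds keyword arguments for boto3's CloudWatch `put_metric_alarm`; the helper name, alarm name, and SNS topic ARN are hypothetical, and the account must have billing alerts enabled for the metric to exist.

```python
def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Build put_metric_alarm arguments for an EstimatedCharges threshold alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{int(threshold_usd)}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,            # 6 hours; the metric updates only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],   # notify this SNS topic when breached
    }


params = billing_alarm_params(
    1000, "arn:aws:sns:us-east-1:123456789012:billing-alerts")  # hypothetical ARN

# To create it (needs boto3 and credentials):
# import boto3
# boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

AWS Budgets covers the same "alert at $1,000" need with forecasting and per-tag filters, so this alarm is the simpler, account-wide alternative.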
Cost Optimization Mindset
Adopt Cloud Financial Management (CFM) practices
Treat cost as a non-functional requirement
Right-size before reserving capacity
Use Spot/serverless before reserving
Monitor continuously — cost drifts over time
"Reduce operational overhead" → managed service answer (RDS over EC2+MySQL, Fargate over self-managed Kubernetes, Lambda over EC2 workers). "Alert when monthly spend exceeds $1,000" → AWS Budgets + SNS. "Identify which team is spending the most" → Cost Allocation Tags + Cost Explorer.
Domain 4 — Decision Guide
Cost Optimization Scenario Decision Tree
Map the requirement to the cost-optimal solution
Steady-state EC2 workload running 24/7, 1–3 year horizon → Savings Plan or Reserved Instance
Batch / background jobs that can be interrupted → Spot Instances (up to 90% savings)
Event-driven, infrequent, or variable compute workload → AWS Lambda (pay only per invocation and execution time)
EC2 instances appear oversized / CPU consistently low → AWS Compute Optimizer, then right-size to a smaller type
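The mappings above can be read as an ordered set of checks. The sketch below is one possible encoding, not an official decision procedure: the function name, flag names, and the ordering (right-size first, per the "right-size before reserving" guidance) are choices made for illustration.

```python
def recommend(oversized: bool = False, interruptible: bool = False,
              event_driven: bool = False, steady_state: bool = False) -> str:
    """Map workload traits to the cost-optimal option from the decision guide."""
    if oversized:
        # Fix sizing before committing to anything.
        return "AWS Compute Optimizer, then right-size"
    if interruptible:
        return "Spot Instances"
    if event_driven:
        return "AWS Lambda"
    if steady_state:
        return "Savings Plan or Reserved Instance"
    # No trait matched: stay on the flexible baseline.
    return "On-Demand"


if __name__ == "__main__":
    print(recommend(steady_state=True))
    print(recommend(interruptible=True))
    print(recommend(oversized=True, steady_state=True))
```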