Domain 1 — Comprehensive Study Guide
Task Statements 1.1 · 1.2 · 1.3
Domain 1 is 20% of scored content: roughly 10 of the 50 scored questions (the 65-question exam also includes 15 unscored questions). The domain leans heavily on definitions and use-case selection.
Definitions · Comparisons · Data Types · Learning Paradigms
All deep learning is ML, all ML is AI — but not all AI is ML, and not all ML is deep learning.
| Inference type | How it works | Best for |
|---|---|---|
| Real-time | Synchronous — request waits for immediate response | Chatbots, fraud detection, live recommendations |
| Batch | Processes large datasets on a schedule, no live user | Monthly reports, offline scoring, bulk predictions |
| Asynchronous | Request submitted; result retrieved later via callback/poll | Long-running jobs, document processing, video analysis |
| Serverless | Scales to zero; compute spun up on demand | Sporadic/unpredictable traffic, cost-optimised inference |
SageMaker supports all four modes. Serverless inference is cost-effective for low/variable traffic; real-time endpoints suit latency-sensitive apps.
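The selection logic in the inference table can be sketched as a small decision function. This is a minimal sketch for revision purposes: the function name `choose_inference_mode`, its three boolean parameters, and the order of the rules are assumptions made for illustration, not AWS guidance or a SageMaker API.

```python
def choose_inference_mode(needs_immediate_response: bool,
                          long_running: bool,
                          sporadic_traffic: bool) -> str:
    """Map workload traits to one of the four inference types in the table.

    Hypothetical helper: the rule order is an illustrative assumption.
    """
    if long_running:
        # Request is submitted now; the result is retrieved later.
        return "asynchronous"
    if not needs_immediate_response:
        # No live user is waiting, so score large datasets on a schedule.
        return "batch"
    if sporadic_traffic:
        # Scale to zero between bursts to keep cost down.
        return "serverless"
    # Steady, latency-sensitive traffic.
    return "real-time"


print(choose_inference_mode(True, False, False))   # chatbot-style workload
print(choose_inference_mode(False, False, False))  # monthly-report workload
```

Exam questions usually name one dominant trait (latency, job length, or traffic pattern), so reading the scenario for that trait first is typically enough to pick the mode.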
Labeled data: each example has a known output (target). Required for supervised learning; more expensive to produce.
Unlabeled data: no output tag. Used in unsupervised and self-supervised learning; abundant and cheaper.
Tabular data: rows and columns (CSV, SQL). Classic ML territory.
Time-series data: ordered by time; used for forecasting and anomaly detection.
Image data: pixel arrays. Powers computer vision tasks.
Text data: sequences of tokens. Used for NLP, LLMs, and text classification.
Structured data: defined schema. Databases, spreadsheets.
Unstructured data: no schema. PDFs, audio, video, social posts.
RLHF (Reinforcement Learning from Human Feedback) is used to fine-tune LLMs. Know that supervised learning needs labeled data, unsupervised learning does not, and reinforcement learning learns from reward signals rather than labels.
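The difference between supervised and unsupervised learning shows up in the function signatures. The sketch below uses plain Python and two deliberately tiny stand-in algorithms (a 1-nearest-neighbor classifier and a two-cluster k-means on 1-D values). The function names and the sample data are hypothetical and chosen only for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: needs labels (train_y) to predict a label for x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def two_means(values, iterations=10):
    """Unsupervised: splits 1-D values into two groups with no labels."""
    low, high = min(values), max(values)  # initial centroids
    group_low, group_high = list(values), []
    for _ in range(iterations):
        # Assign each value to the closer centroid.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_high:  # all values identical: nothing to split
            break
        # Move each centroid to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# Supervised: labels are supplied alongside the inputs.
label = nearest_neighbor_predict([1.0, 1.2, 8.0, 8.3],
                                 ["small", "small", "large", "large"], 7.5)
print(label)

# Unsupervised: only the inputs are supplied; groups are discovered.
groups = two_means([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])
print(groups)
```

Note that the supervised function cannot run without `train_y`, while the clustering function never sees a label at all.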
When to Use · Technique Selection · AWS Services · FM vs Traditional ML
If the question says "a specific, deterministic outcome is needed", the answer is likely NOT to use ML; a rule-based solution fits better.
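A deterministic requirement is usually better served by an explicit rule than by a trained model. As a minimal sketch, assume a hypothetical library late-fee policy of 0.50 per overdue day, capped at 10.00 (the policy and the function name `late_fee` are invented for illustration):

```python
def late_fee(days_overdue: int) -> float:
    """Hypothetical rule: 0.50 per overdue day, capped at 10.00."""
    if days_overdue <= 0:
        return 0.0
    return min(days_overdue * 0.50, 10.00)


print(late_fee(3))
print(late_fee(30))
```

The same input always produces the same output, and the rule can be audited line by line, which is why ML adds cost and uncertainty here without adding value.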
| Technique | Output | Use When | Example |
|---|---|---|---|
| Regression | Continuous number | Predicting a quantity | House price, sales forecast |
| Classification | Category / class | Assigning a label | Spam/not spam, image labeling |
| Clustering | Groups (unlabeled) | Finding natural groupings | Customer segments, document topics |
| Recommendation | Ranked items | Personalised suggestions | Product rec, content discovery |
| Forecasting | Future value (time) | Predicting time-series data | Demand planning, stock prices |
| Anomaly Detection | Normal / anomaly flag | Finding outliers | Fraud detection, network intrusion |
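Two rows of the technique table can be contrasted by looking at what each returns. The sketch below, in plain Python, fits a least-squares line (regression, continuous output) and flags outliers by z-score (anomaly detection, boolean flag). The function names, the sample data, and the z-score threshold of 2.0 are assumptions for illustration; production systems would normally use a library or managed service.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def flag_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values far from the mean (z-score rule)."""
    center, spread = mean(values), pstdev(values)
    if spread == 0:
        return [False] * len(values)
    return [abs(v - center) / spread > threshold for v in values]


# Regression returns a continuous number.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)

# Anomaly detection returns a normal/anomaly flag per value.
print(flag_anomalies([10, 11, 9, 10, 12, 50]))
```

When a question describes the desired output (a number, a label, a group, a ranked list, a future value, or a flag), matching it to the Output column is usually the fastest route to the right technique.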
| Dimension | Traditional ML | Foundation Models (FMs) |
|---|---|---|
| Training data | Task-specific labeled dataset | Massive multi-domain corpus (pre-trained) |
| Explainability | Higher (e.g. decision trees, SHAP) | Lower — "black box" at scale |
| Customisation | Trained end-to-end per task | Fine-tuning, RAG, prompt engineering |
| Regulatory fit | Better for strict auditability | Needs extra work for compliance |
| Operational cost | Lower inference cost | Higher compute; managed APIs offset this |
| Use when… | Explainability required, structured data, regulated domain | Generative tasks, low labeled data, multi-modal output |
Pipeline · FM Sources · Deployment · MLOps · Metrics
Amazon SageMaker: training, tuning, hosting, pipelines, Feature Store, Data Wrangler.
Amazon Bedrock: FM access, fine-tuning, knowledge bases, agents.
Amazon Q: generative AI developer productivity tools within AWS.
Experimentation: track experiments with metrics, parameters, and artifacts. SageMaker Experiments automates this.
Repeatable processes: CI/CD pipelines for ML, with reproducible training runs based on version-controlled code and data.
Scalable systems: auto-scaling endpoints, distributed training, and Feature Store for shared feature reuse.
Managing technical debt: poorly managed ML pipelines accumulate debt fast through unversioned models, undocumented features, and manual steps.
Production readiness: shadow deployments, canary releases, and A/B testing before full rollout.
Model monitoring and retraining: monitor for data drift, concept drift, and performance degradation, and trigger retraining automatically.
SageMaker Model Monitor detects drift. SageMaker Pipelines provides the CI/CD for ML. Retraining is needed when data distribution changes (data drift) or the relationship between inputs and outputs changes (concept drift).
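The two retraining triggers can be sketched as simple checks. This is a minimal illustration in plain Python, not how SageMaker Model Monitor works internally: the function names, the mean-shift test, the accuracy-drop proxy for concept drift, and both thresholds are assumptions chosen for clarity.

```python
from statistics import mean, pstdev


def data_drift(baseline, live, threshold=3.0):
    """Data drift: the input distribution has moved.

    Assumed rule: the live mean is more than `threshold` baseline
    standard deviations away from the baseline mean.
    """
    spread = pstdev(baseline)
    if spread == 0:
        return mean(live) != mean(baseline)
    return abs(mean(live) - mean(baseline)) / spread > threshold


def concept_drift(baseline_accuracy, live_accuracy, max_drop=0.05):
    """Concept drift proxy: accuracy on fresh labeled data has dropped,
    suggesting the input-output relationship has changed."""
    return baseline_accuracy - live_accuracy > max_drop


def should_retrain(baseline, live, baseline_accuracy, live_accuracy):
    """Trigger retraining when either kind of drift is detected."""
    return (data_drift(baseline, live)
            or concept_drift(baseline_accuracy, live_accuracy))


training_ages = [50, 52, 48, 51, 49]  # hypothetical feature values
print(should_retrain(training_ages, [70, 72, 69], 0.92, 0.91))  # inputs moved
print(should_retrain(training_ages, [50, 51, 49], 0.92, 0.80))  # accuracy fell
print(should_retrain(training_ages, [50, 51, 49], 0.92, 0.90))  # stable
```

The first call fires on data drift alone (inputs changed, accuracy held), and the second on the concept drift proxy alone (inputs unchanged, accuracy fell), which mirrors the distinction the exam expects.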
Domain 1 · Key Points to Lock In
20% of AIF-C01 · Fundamentals of AI & ML
Good luck on the exam!