AWS Certified AI Practitioner · AIF-C01

Fundamentals of
AI & Machine Learning

Domain 1 — Comprehensive Study Guide
Task Statements 1.1 · 1.2 · 1.3

20% of Exam Score

Domain 1 OverviewWhat You Need to Know

Task 1.1
  • Core AI/ML terminology
  • AI vs ML vs GenAI vs Deep Learning
  • Inferencing types
  • Data types & learning paradigms
Task 1.2
  • When to use AI/ML
  • Technique selection
  • Real-world applications
  • AWS managed AI services
  • Traditional ML vs FMs
Task 1.3
  • ML pipeline components
  • FM sources & deployment
  • MLOps fundamentals
  • Performance metrics
📋 Exam Weight

Domain 1 is 20% of scored content — approximately 13–14 questions on a 65-question exam. Strong on definitions and use-case selection.

1.1

Basic AI Concepts
& Terminology

Definitions · Comparisons · Data Types · Learning Paradigms

Task 1.1 — DefinitionsThe AI Vocabulary Stack

Artificial Intelligence Broad field enabling machines to simulate human-like intelligence and decision-making.
Machine Learning Subset of AI; systems learn patterns from data without being explicitly programmed.
Deep Learning Subset of ML using multi-layered neural networks to learn hierarchical representations.
Neural Network Interconnected layers of nodes (neurons) that transform inputs into outputs via learned weights.
Generative AI AI that creates new content (text, images, code, audio) by learning patterns from training data.
Large Language Model A GenAI model trained on massive text corpora; understands and generates human language.
Agentic AI AI that autonomously plans, uses tools, and executes multi-step tasks toward a goal.

Task 1.1 — DefinitionsModel Concepts & Quality Terms

Model A mathematical function trained on data to map inputs to predictions or outputs.
Algorithm The procedure or set of rules a model uses to learn from data (e.g. gradient descent).
Training The process of fitting model parameters to a dataset by minimising a loss function.
Inferencing Using a trained model to generate predictions on new, unseen data.
Bias Systematic error in predictions; can stem from skewed training data or model assumptions.
Fairness The property of a model producing equitable outcomes across different demographic groups.
Overfitting / Underfitting Overfit: memorises training data, fails on new data. Underfit: too simple, misses patterns.

Task 1.1 — ComparisonsThe AI Hierarchy

ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP LEARNING
GENERATIVE AI · LLMs

Key distinctions

  • AI — any technique making machines "smart"
  • ML — learns from data without explicit rules
  • Deep Learning — uses neural nets with many layers
  • GenAI — creates new content; powered by DL
  • Agentic AI — autonomous multi-step task execution
⚡ Exam Note

All deep learning is ML, all ML is AI — but not all AI is ML, and not all ML is deep learning.

Task 1.1 — InferencingTypes of Inferencing

Type How it works Best for
Real-time Synchronous — request waits for immediate response Chatbots, fraud detection, live recommendations
Batch Processes large datasets on a schedule, no live user Monthly reports, offline scoring, bulk predictions
Asynchronous Request submitted; result retrieved later via callback/poll Long-running jobs, document processing, video analysis
Serverless Scales to zero; compute spun up on demand Sporadic/unpredictable traffic, cost-optimised inference
⚡ Exam Note

SageMaker supports all four modes. Serverless inference is cost-effective for low/variable traffic; real-time endpoints suit latency-sensitive apps.

Task 1.1 — DataData Types in AI/ML

By labelling

Labeled Data

Each example has a known output/target. Required for supervised learning. More expensive to produce.

Unlabeled Data

No output tag. Used in unsupervised and self-supervised learning. Abundant and cheaper.

By structure & modality

Tabular

Rows & columns (CSV, SQL). Classic ML territory.

Time-series

Ordered by time; forecasting, anomaly detection.

Image

Pixel arrays. Powers computer vision tasks.

Text

Sequences of tokens. NLP, LLMs, classification.

Structured

Defined schema. Databases, spreadsheets.

Unstructured

No schema. PDFs, audio, video, social posts.

Task 1.1 — LearningTypes of ML Learning

Supervised Learning
  • Uses labeled data
  • Learns input → output mapping
  • Tasks: classification, regression
  • Examples: spam filter, price prediction
Unsupervised Learning
  • Uses unlabeled data
  • Discovers hidden structure
  • Tasks: clustering, dimensionality reduction
  • Examples: customer segmentation, anomaly detection
Reinforcement Learning
  • Agent learns via reward signals
  • Trial-and-error in an environment
  • Tasks: game playing, robotics, RLHF
  • Examples: AlphaGo, chatbot alignment (RLHF)
⚡ Exam Note

RLHF (Reinforcement Learning from Human Feedback) is used to fine-tune LLMs. Know that supervised needs labels; unsupervised doesn't.

1.2

Practical Use Cases
for AI

When to Use · Technique Selection · AWS Services · FM vs Traditional ML

Task 1.2 — ApplicabilityWhen to Use (and Not Use) AI/ML

✅ Use AI/ML When…
  • Pattern recognition at scale is needed
  • Human decision augmentation required
  • Solution must scale beyond human capacity
  • Automating repetitive cognitive tasks
  • Problem has lots of historical data
  • Exact rules are unknown or too complex
🚫 Avoid AI/ML When…
  • A deterministic/exact output is required
  • Insufficient training data exists
  • Rules are simple enough to code directly
  • Full explainability is mandated (regulatory)
  • Cost of building > value delivered (ROI negative)
  • Low error tolerance with life-critical outcomes
⚡ Exam Note

If the question says "a specific, deterministic outcome is needed" — the answer is likely NOT to use ML.

Task 1.2 — TechniquesSelecting the Right ML Technique

TechniqueOutputUse WhenExample
Regression Continuous number Predicting a quantity House price, sales forecast
Classification Category / class Assigning a label Spam/not spam, image labeling
Clustering Groups (unlabeled) Finding natural groupings Customer segments, document topics
Recommendation Ranked items Personalised suggestions Product rec, content discovery
Forecasting Future value (time) Predicting time-series data Demand planning, stock prices
Anomaly Detection Normal / anomaly flag Finding outliers Fraud detection, network intrusion

Task 1.2 — ApplicationsReal-World AI Use Cases

Computer Vision
  • Object detection & recognition
  • Medical image analysis
  • Quality control in manufacturing
  • Autonomous vehicles
NLP / Speech
  • Sentiment analysis
  • Machine translation
  • Speech-to-text transcription
  • Chatbots & virtual assistants
Recommendation
  • E-commerce product suggestions
  • Streaming content recommendations
  • Personalised ads targeting
Fraud Detection
  • Real-time transaction scoring
  • Identity verification
  • Anomaly flagging in access logs
Forecasting
  • Demand & inventory planning
  • Energy load prediction
  • Financial market modelling
Knowledge Bases & Agentic
  • RAG-powered Q&A systems
  • Autonomous research agents
  • Multi-step workflow automation

Task 1.2 — AWS ServicesAWS Managed AI/ML Services

SageMaker AI End-to-end ML platform for building, training, and deploying models at scale.
Amazon Bedrock Fully managed API access to foundation models (Anthropic, Meta, Mistral, Amazon Titan, etc.).
Amazon Transcribe Automatic speech recognition (ASR) — converts audio/video to text.
Amazon Translate Neural machine translation between 75+ languages.
Amazon Comprehend NLP service — entity recognition, sentiment, key phrases, PII detection.
Amazon Lex Builds conversational chatbots with ASR + NLU (powers Alexa).
Amazon Polly Text-to-speech (TTS) — lifelike voice synthesis in multiple languages.
Amazon Q GenAI-powered assistant for business productivity, code, and analytics.

Task 1.2 — Model SelectionTraditional ML vs. Foundation Models

DimensionTraditional MLFoundation Models (FMs)
Training data Task-specific labeled dataset Massive multi-domain corpus (pre-trained)
Explainability Higher (e.g. decision trees, SHAP) Lower — "black box" at scale
Customisation Trained end-to-end per task Fine-tuning, RAG, prompt engineering
Regulatory fit Better for strict auditability Needs extra work for compliance
Operational cost Lower inference cost Higher compute; managed APIs offset this
Use when… Explainability required, structured data, regulated domain Generative tasks, low labelled data, multi-modal output
1.3

The AI/ML
Development Lifecycle

Pipeline · FM Sources · Deployment · MLOps · Metrics

Task 1.3 — PipelineThe ML Development Pipeline

01
Business
Problem
Define goal, success metrics, feasibility
02
Data
Collection
S3, databases, APIs, labeling tools
03
Data
Preparation
Cleaning, feature engineering, splitting
04
Model
Training
Algorithm selection, hyperparameter tuning
05
Model
Evaluation
Accuracy, precision, recall, F1, ROC-AUC
06
Deployment
Real-time / batch / serverless endpoint
07
Monitor
& Retrain
Drift detection, re-training triggers
SageMaker AI

Training, tuning, hosting, pipelines, Feature Store, Data Wrangler

Amazon Bedrock

FM access, fine-tuning, knowledge bases, agents

Amazon Q / Kiro

GenAI developer productivity tools within AWS

Task 1.3 — FM SourcesFoundation Model Sources & Deployment

FM Sources

Open Source Pre-trained Models
  • Hugging Face Hub, Meta Llama, Mistral
  • Available for download and self-hosting
  • Full control; requires infra management
Training Custom Models
  • Built from scratch on proprietary data
  • Maximum customisation; very high cost
  • Use SageMaker Training + large GPU clusters

Deployment Methods

Managed API Service
  • Amazon Bedrock — no infra management
  • Pay-per-token pricing model
  • Fastest path to production
Self-Hosted API
  • SageMaker Endpoint or EC2 + container
  • Full control over latency & cost
  • Required for data residency / air-gap

Task 1.3 — MLOpsML Operations (MLOps) Fundamentals

Experimentation

Track experiments with metrics, parameters, and artifacts. SageMaker Experiments automates this.

Repeatable Processes

CI/CD pipelines for ML. Reproducible training runs with version-controlled code and data.

Scalable Systems

Auto-scaling endpoints; distributed training; Feature Store for shared feature reuse.

Technical Debt

Poorly managed ML pipelines accumulate debt fast — unversioned models, undocumented features, manual steps.

Production Readiness

Shadow deployments, canary releases, A/B testing before full rollout.

Model Monitoring & Retraining

Monitor for data drift, concept drift, and performance degradation. Trigger retraining automatically.

⚡ Exam Note

SageMaker Model Monitor detects drift. SageMaker Pipelines provides the CI/CD for ML. Retraining is needed when data distribution changes (data drift) or the relationship between inputs and outputs changes (concept drift).

Task 1.3 — MetricsModel Performance Metrics

Accuracy
Correct / Total
Overall correctness. Misleading with imbalanced classes.
Precision
TP / (TP + FP)
Of predicted positives, how many are real? Optimise when false positives are costly.
Recall
TP / (TP + FN)
Of actual positives, how many did we catch? Optimise when false negatives are costly.
F1 Score
2 × P × R / (P + R)
Harmonic mean of Precision & Recall. Best for imbalanced datasets.
When to use each
  • Fraud detection — prioritise Recall (catch all fraud)
  • Spam filter — prioritise Precision (avoid false positives)
  • Balanced dataset — Accuracy is meaningful
Business Metrics
  • Cost per user · Development cost
  • Return on Investment (ROI)
  • Customer feedback / satisfaction (CSAT/NPS)
  • Time-to-market, model retraining frequency

Quick Review &
Exam Checklist

Domain 1 · Key Points to Lock In

Exam ChecklistCan You Answer These?

Task 1.1 — Must Know
  • Define AI, ML, DL, GenAI, LLM, Agentic AI
  • Hierarchy: AI ⊃ ML ⊃ DL ⊃ GenAI
  • Batch vs. real-time vs. async inferencing
  • Labeled vs. unlabeled data
  • Supervised vs. unsupervised vs. RL
Task 1.2 — Must Know
  • When NOT to use ML (deterministic/exact outcomes)
  • Regression → number; classification → category; clustering → groups
  • AWS service → task mapping (Transcribe, Translate, Comprehend, Lex, Polly)
  • FM vs traditional ML tradeoffs (explainability, cost, regulatory)
Task 1.3 — Must Know
  • 7-step ML pipeline order
  • Managed API (Bedrock) vs self-hosted (SageMaker)
  • Data drift vs. concept drift
  • Precision/Recall tradeoff and when to use F1
  • Business metrics: ROI, cost per user, customer feedback
AWS Services — Quick Map
  • SageMaker AI → full ML lifecycle
  • Bedrock → FM API access + agents
  • Amazon Q → GenAI productivity assistant
  • Transcribe → speech-to-text
  • Comprehend → NLP analytics
  • Lex → chatbot builder · Polly → TTS
Domain 1 Complete

You're ready for
Domain 1

20% of AIF-C01 · Fundamentals of AI & ML
Good luck on the exam!

Task 1.1 — Terminology
Task 1.2 — Use Cases
Task 1.3 — Lifecycle