AWS Certified AI Practitioner · AIF-C01

Fundamentals of
AI & Machine Learning

Domain 1 — Comprehensive Study Guide
Task Statements 1.1 · 1.2 · 1.3

20% of Exam Score

Domain 1 OverviewWhat You Need to Know

Task 1.1

Core AI/ML terminology
AI vs ML vs GenAI vs Deep Learning
Inferencing types
Data types & learning paradigms

      Task 1.2
      When to use AI/ML
Technique selection
Real-world applications
AWS managed AI services
Traditional ML vs FMs

    

Task 1.3

ML pipeline components
FM sources & deployment
MLOps fundamentals
Performance metrics

📋 Exam Weight

Domain 1 is 20% of scored content — approximately 13–14 questions on a 65-question exam. Strong on definitions and use-case selection.

1.1

Basic AI Concepts
& Terminology

Definitions · Comparisons · Data Types · Learning Paradigms

Task 1.1 — DefinitionsThe AI Vocabulary Stack

Artificial Intelligence Broad field enabling machines to simulate human-like intelligence and decision-making.

Machine Learning Subset of AI; systems learn patterns from data without being explicitly programmed.

Deep Learning Subset of ML using multi-layered neural networks to learn hierarchical representations.

Neural Network Interconnected layers of nodes (neurons) that transform inputs into outputs via learned weights.

Generative AI AI that creates new content (text, images, code, audio) by learning patterns from training data.

Large Language Model A GenAI model trained on massive text corpora; understands and generates human language.

Agentic AI AI that autonomously plans, uses tools, and executes multi-step tasks toward a goal.

Task 1.1 — DefinitionsModel Concepts & Quality Terms

Model A mathematical function trained on data to map inputs to predictions or outputs.

Algorithm The procedure or set of rules a model uses to learn from data (e.g. gradient descent).

Training The process of fitting model parameters to a dataset by minimising a loss function.

Inferencing Using a trained model to generate predictions on new, unseen data.

Bias Systematic error in predictions; can stem from skewed training data or model assumptions.

Fairness The property of a model producing equitable outcomes across different demographic groups.

Overfitting / Underfitting Overfit: memorises training data, fails on new data. Underfit: too simple, misses patterns.

Task 1.1 — ComparisonsThe AI Hierarchy

ARTIFICIAL INTELLIGENCE

MACHINE LEARNING

DEEP LEARNING

GENERATIVE AI · LLMs

Key distinctions

AI — any technique making machines "smart"
ML — learns from data without explicit rules
Deep Learning — uses neural nets with many layers
GenAI — creates new content; powered by DL
Agentic AI — autonomous multi-step task execution

⚡ Exam Note

All deep learning is ML, all ML is AI — but not all AI is ML, and not all ML is deep learning.

Task 1.1 — InferencingTypes of Inferencing

Type	How it works	Best for
Real-time	Synchronous — request waits for immediate response	Chatbots, fraud detection, live recommendations
Batch	Processes large datasets on a schedule, no live user	Monthly reports, offline scoring, bulk predictions
Asynchronous	Request submitted; result retrieved later via callback/poll	Long-running jobs, document processing, video analysis
Serverless	Scales to zero; compute spun up on demand	Sporadic/unpredictable traffic, cost-optimised inference

⚡ Exam Note

SageMaker supports all four modes. Serverless inference is cost-effective for low/variable traffic; real-time endpoints suit latency-sensitive apps.

Task 1.1 — DataData Types in AI/ML

By labelling

Labeled Data

Each example has a known output/target. Required for supervised learning. More expensive to produce.

Unlabeled Data

No output tag. Used in unsupervised and self-supervised learning. Abundant and cheaper.

By structure & modality

Tabular

Rows & columns (CSV, SQL). Classic ML territory.

Time-series

Ordered by time; forecasting, anomaly detection.

Image

Pixel arrays. Powers computer vision tasks.

Text

Sequences of tokens. NLP, LLMs, classification.

Structured

Defined schema. Databases, spreadsheets.

Unstructured

No schema. PDFs, audio, video, social posts.

Task 1.1 — LearningTypes of ML Learning

Supervised Learning

Uses labeled data
Learns input → output mapping
Tasks: classification, regression
Examples: spam filter, price prediction

      Unsupervised Learning
      Uses unlabeled data
Discovers hidden structure
Tasks: clustering, dimensionality reduction
Examples: customer segmentation, anomaly detection

    

Reinforcement Learning

Agent learns via reward signals
Trial-and-error in an environment
Tasks: game playing, robotics, RLHF
Examples: AlphaGo, chatbot alignment (RLHF)

⚡ Exam Note

RLHF (Reinforcement Learning from Human Feedback) is used to fine-tune LLMs. Know that supervised needs labels; unsupervised doesn't.

1.2

Practical Use Cases
for AI

When to Use · Technique Selection · AWS Services · FM vs Traditional ML

Task 1.2 — ApplicabilityWhen to Use (and Not Use) AI/ML

✅ Use AI/ML When…

Pattern recognition at scale is needed
Human decision augmentation required
Solution must scale beyond human capacity
Automating repetitive cognitive tasks
Problem has lots of historical data
Exact rules are unknown or too complex

      🚫 Avoid AI/ML When…
      A deterministic/exact output is required
Insufficient training data exists
Rules are simple enough to code directly
Full explainability is mandated (regulatory)
Cost of building > value delivered (ROI negative)
Low error tolerance with life-critical outcomes

    

⚡ Exam Note

If the question says "a specific, deterministic outcome is needed" — the answer is likely NOT to use ML.

Task 1.2 — TechniquesSelecting the Right ML Technique

Technique	Output	Use When	Example
Regression	Continuous number	Predicting a quantity	House price, sales forecast
Classification	Category / class	Assigning a label	Spam/not spam, image labeling
Clustering	Groups (unlabeled)	Finding natural groupings	Customer segments, document topics
Recommendation	Ranked items	Personalised suggestions	Product rec, content discovery
Forecasting	Future value (time)	Predicting time-series data	Demand planning, stock prices
Anomaly Detection	Normal / anomaly flag	Finding outliers	Fraud detection, network intrusion

Task 1.2 — ApplicationsReal-World AI Use Cases

Computer Vision

Object detection & recognition
Medical image analysis
Quality control in manufacturing
Autonomous vehicles

NLP / Speech

Sentiment analysis
Machine translation
Speech-to-text transcription
Chatbots & virtual assistants

Recommendation

E-commerce product suggestions
Streaming content recommendations
Personalised ads targeting

Fraud Detection

Real-time transaction scoring
Identity verification
Anomaly flagging in access logs

Forecasting

Demand & inventory planning
Energy load prediction
Financial market modelling

Knowledge Bases & Agentic

RAG-powered Q&A systems
Autonomous research agents
Multi-step workflow automation

Task 1.2 — AWS ServicesAWS Managed AI/ML Services

SageMaker AI End-to-end ML platform for building, training, and deploying models at scale.

Amazon Bedrock Fully managed API access to foundation models (Anthropic, Meta, Mistral, Amazon Titan, etc.).

Amazon Transcribe Automatic speech recognition (ASR) — converts audio/video to text.

Amazon Translate Neural machine translation between 75+ languages.

Amazon Comprehend NLP service — entity recognition, sentiment, key phrases, PII detection.

Amazon Lex Builds conversational chatbots with ASR + NLU (powers Alexa).

Amazon Polly Text-to-speech (TTS) — lifelike voice synthesis in multiple languages.

Amazon Q GenAI-powered assistant for business productivity, code, and analytics.

Task 1.2 — Model SelectionTraditional ML vs. Foundation Models

Dimension	Traditional ML	Foundation Models (FMs)
Training data	Task-specific labeled dataset	Massive multi-domain corpus (pre-trained)
Explainability	Higher (e.g. decision trees, SHAP)	Lower — "black box" at scale
Customisation	Trained end-to-end per task	Fine-tuning, RAG, prompt engineering
Regulatory fit	Better for strict auditability	Needs extra work for compliance
Operational cost	Lower inference cost	Higher compute; managed APIs offset this
Use when…	Explainability required, structured data, regulated domain	Generative tasks, low labelled data, multi-modal output

1.3

The AI/ML
Development Lifecycle

Pipeline · FM Sources · Deployment · MLOps · Metrics

Task 1.3 — PipelineThe ML Development Pipeline

Business
Problem

Define goal, success metrics, feasibility

Data
Collection

S3, databases, APIs, labeling tools

Data
Preparation

Cleaning, feature engineering, splitting

Model
Training

Algorithm selection, hyperparameter tuning

Model
Evaluation

Accuracy, precision, recall, F1, ROC-AUC

Deployment

Real-time / batch / serverless endpoint

Monitor
& Retrain

Drift detection, re-training triggers

SageMaker AI

Training, tuning, hosting, pipelines, Feature Store, Data Wrangler

Amazon Bedrock

FM access, fine-tuning, knowledge bases, agents

Amazon Q / Kiro

GenAI developer productivity tools within AWS

Task 1.3 — FM SourcesFoundation Model Sources & Deployment

FM Sources

Open Source Pre-trained Models

Hugging Face Hub, Meta Llama, Mistral
Available for download and self-hosting
Full control; requires infra management

        Training Custom Models
        Built from scratch on proprietary data
Maximum customisation; very high cost
Use SageMaker Training + large GPU clusters

      

Deployment Methods

Managed API Service

Amazon Bedrock — no infra management
Pay-per-token pricing model
Fastest path to production

Self-Hosted API

SageMaker Endpoint or EC2 + container
Full control over latency & cost
Required for data residency / air-gap

Task 1.3 — MLOpsML Operations (MLOps) Fundamentals

Experimentation

Track experiments with metrics, parameters, and artifacts. SageMaker Experiments automates this.

Repeatable Processes

CI/CD pipelines for ML. Reproducible training runs with version-controlled code and data.

Scalable Systems

Auto-scaling endpoints; distributed training; Feature Store for shared feature reuse.

Technical Debt

Poorly managed ML pipelines accumulate debt fast — unversioned models, undocumented features, manual steps.

Production Readiness

Shadow deployments, canary releases, A/B testing before full rollout.

Model Monitoring & Retraining

Monitor for data drift, concept drift, and performance degradation. Trigger retraining automatically.

⚡ Exam Note

SageMaker Model Monitor detects drift. SageMaker Pipelines provides the CI/CD for ML. Retraining is needed when data distribution changes (data drift) or the relationship between inputs and outputs changes (concept drift).

Task 1.3 — MetricsModel Performance Metrics

Accuracy

Correct / Total

Overall correctness. Misleading with imbalanced classes.

Precision

TP / (TP + FP)

Of predicted positives, how many are real? Optimise when false positives are costly.

Recall

TP / (TP + FN)

Of actual positives, how many did we catch? Optimise when false negatives are costly.

F1 Score

2 × P × R / (P + R)

Harmonic mean of Precision & Recall. Best for imbalanced datasets.

When to use each

Fraud detection — prioritise Recall (catch all fraud)
Spam filter — prioritise Precision (avoid false positives)
Balanced dataset — Accuracy is meaningful

      Business Metrics
      Cost per user · Development cost
Return on Investment (ROI)
Customer feedback / satisfaction (CSAT/NPS)
Time-to-market, model retraining frequency

    

✓

Quick Review &
Exam Checklist

Domain 1 · Key Points to Lock In

Exam ChecklistCan You Answer These?

Task 1.1 — Must Know

Define AI, ML, DL, GenAI, LLM, Agentic AI
Hierarchy: AI ⊃ ML ⊃ DL ⊃ GenAI
Batch vs. real-time vs. async inferencing
Labeled vs. unlabeled data
Supervised vs. unsupervised vs. RL

      Task 1.2 — Must Know
      When NOT to use ML (deterministic/exact outcomes)
Regression → number; classification → category; clustering → groups
AWS service → task mapping (Transcribe, Translate, Comprehend, Lex, Polly)
FM vs traditional ML tradeoffs (explainability, cost, regulatory)

    

Task 1.3 — Must Know

7-step ML pipeline order
Managed API (Bedrock) vs self-hosted (SageMaker)
Data drift vs. concept drift
Precision/Recall tradeoff and when to use F1
Business metrics: ROI, cost per user, customer feedback

AWS Services — Quick Map

SageMaker AI → full ML lifecycle
Bedrock → FM API access + agents
Amazon Q → GenAI productivity assistant
Transcribe → speech-to-text
Comprehend → NLP analytics
Lex → chatbot builder · Polly → TTS

Domain 1 Complete

You're ready for
Domain 1

20% of AIF-C01 · Fundamentals of AI & ML
Good luck on the exam!

Task 1.1 — Terminology

Task 1.2 — Use Cases

Task 1.3 — Lifecycle

Fundamentals ofAI & Machine Learning

Domain 1 OverviewWhat You Need to Know

Basic AI Concepts& Terminology

Task 1.1 — DefinitionsThe AI Vocabulary Stack

Task 1.1 — DefinitionsModel Concepts & Quality Terms

Task 1.1 — ComparisonsThe AI Hierarchy

Key distinctions

Task 1.1 — InferencingTypes of Inferencing

Task 1.1 — DataData Types in AI/ML

By labelling

By structure & modality

Task 1.1 — LearningTypes of ML Learning

Practical Use Casesfor AI

Task 1.2 — ApplicabilityWhen to Use (and Not Use) AI/ML

Task 1.2 — TechniquesSelecting the Right ML Technique

Task 1.2 — ApplicationsReal-World AI Use Cases

Task 1.2 — AWS ServicesAWS Managed AI/ML Services

Task 1.2 — Model SelectionTraditional ML vs. Foundation Models

The AI/MLDevelopment Lifecycle

Task 1.3 — PipelineThe ML Development Pipeline

Task 1.3 — FM SourcesFoundation Model Sources & Deployment

FM Sources

Deployment Methods

Task 1.3 — MLOpsML Operations (MLOps) Fundamentals

Task 1.3 — MetricsModel Performance Metrics

Quick Review &Exam Checklist

Exam ChecklistCan You Answer These?

You're ready forDomain 1

Fundamentals of
AI & Machine Learning

Basic AI Concepts
& Terminology

Practical Use Cases
for AI

The AI/ML
Development Lifecycle

Quick Review &
Exam Checklist

You're ready for
Domain 1