Domain 4 — Comprehensive Study Guide
Task Statements 4.1 · 4.2
Domain 4 is 14% of scored content, which is approximately 7 of the exam's 50 scored questions. Questions focus on recognising responsible AI features, picking the right AWS tool for bias/transparency, and understanding legal risks.
Pillars · Tools · Legal Risks · Datasets · Bias & Variance · Detection
Know all six pillars by name. Questions may describe a scenario (e.g. "model performs worse for one ethnic group") and ask which pillar is violated — that's fairness / bias. "Model generates false medical advice" → veracity / safety.
Sustainability questions may ask about choosing smaller models, using managed services to reduce idle compute, or selecting AWS Regions powered by renewable energy. Weigh environmental cost alongside performance and financial cost.
IP infringement: GenAI models trained on copyrighted content may reproduce protected text, code, or images, which exposes the deployer to infringement claims from the original creators. Mitigation: use models with clear training-data provenance and IP indemnification policies.
Biased outputs: Discriminatory outputs can violate anti-discrimination laws (e.g. Fair Housing Act, Equal Credit Opportunity Act). AI systems that make hiring, lending, or housing decisions face regulatory scrutiny.
Hallucinations: False outputs presented as fact are especially harmful in medical, legal, or financial contexts and may constitute misrepresentation or negligence. Mitigation: RAG, grounding, and human review for high-stakes decisions.
Privacy and trust: Data privacy breaches, exposure of PII in outputs, or manipulative AI behaviour erode user trust and may violate GDPR/CCPA. Users may also be harmed by over-reliance on AI advice without human oversight.
Four key legal risk categories: IP infringement, biased outputs, hallucinations, and privacy/trust. For each, know the mitigation: IP → licensing/indemnification; bias → Clarify + subgroup analysis; hallucinations → RAG + Guardrails; privacy → PII redaction + A2I.
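The PII redaction mitigation can be illustrated with a minimal sketch. This is not how Bedrock Guardrails or any AWS service implements redaction; the two regular expressions and the `redact_pii` helper are hypothetical and cover only email addresses and US-style Social Security numbers, so treat it as an illustration of the idea rather than a production filter.

```python
import re

# Hypothetical patterns for illustration only; real PII detection
# needs far broader coverage (names, addresses, phone numbers, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a placeholder naming the PII type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact jane.doe@example.com, SSN 123-45-6789."))
```

In practice a filter like this would sit between the model and the user, so that model output is scanned before it is displayed or logged.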
Inclusivity: The dataset represents all relevant demographic groups (age, gender, ethnicity, language, geography). Gaps lead to disparate model performance.
Diversity: The dataset covers a wide variety of examples, styles, and scenarios. A dataset that is technically balanced but narrow in scope still produces brittle models.
Curated sources: Data comes from verified, licensed, high-quality sources. Unvetted web scrapes can embed noise, bias, and copyright issues.
Balance: No single class dominates. Imbalanced datasets cause models to underperform on minority classes, a common source of fairness failures.
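The points above about group coverage and majority-class dominance can be checked with simple counts before any training. The sketch below is a rough illustration: `shares` and `normalized_imbalance` are hypothetical helper names, and the second formula is modelled on the class imbalance measure that SageMaker Clarify reports for pre-training data, computed here by hand rather than through the Clarify API. The sample data is made up.

```python
from collections import Counter


def shares(values):
    """Return each distinct value's share of the dataset (0.0 to 1.0)."""
    counts = Counter(values)
    total = len(values)
    return {value: n / total for value, n in counts.items()}


def normalized_imbalance(n_a, n_d):
    """Normalized size difference between two groups.

    0 means the groups are equally represented; values near +1 or -1
    mean one group dominates the dataset.
    """
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical dataset: 800 rows from group A, 200 rows from group B.
groups = ["A"] * 800 + ["B"] * 200
print(shares(groups))
print(normalized_imbalance(800, 200))
```

The same `shares` helper works on the label column to spot majority-class dominance, and on a demographic column to spot under-represented groups.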
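Distinguish statistical bias (systematic error from a model that is too simple, i.e. underfitting) from data/societal bias (unfair outcomes for groups). Variance is the opposite statistical failure: a model that is too sensitive to its training data (overfitting). The exam tests both senses of bias. SageMaker Clarify detects bias in data and predictions; statistical bias and variance are managed through model complexity, regularisation, and the amount of training data.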
| Tool | Purpose | When to Use |
|---|---|---|
| SageMaker Clarify | Bias detection in data & predictions; SHAP explainability | Pre-training data audit, post-training fairness evaluation, explainability reports |
| SageMaker Model Monitor | Live endpoint monitoring for data drift, model drift, bias drift | Production models — alert when real traffic diverges from training baseline |
| Amazon A2I | Human-in-the-loop review for low-confidence predictions | Safety-critical decisions; content moderation; cases requiring human judgment |
| Bedrock Guardrails | FM output safety: content filter, PII, grounding, topic controls | GenAI applications requiring policy enforcement at inference time |
| Label Quality Analysis | Validate annotator consistency; detect labeling errors | Before fine-tuning; after importing third-party labeled data |
| Human Audits | Expert review of model outputs for systematic errors or bias patterns | Periodic compliance reviews; regulated domains; high-stakes deployment |
| Subgroup Analysis | Compute metrics (accuracy, FPR, recall) per demographic subgroup | Fairness assessment for any model making decisions affecting people |
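The subgroup analysis row in the table above amounts to computing the usual confusion-matrix metrics once per group instead of once overall. A minimal sketch, assuming binary labels where 1 is the positive class; the function name `subgroup_metrics` and the sample data are hypothetical.

```python
from collections import defaultdict


def subgroup_metrics(y_true, y_pred, groups):
    """Compute accuracy, recall, and false positive rate per group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1 and pred == 1:
            key = "tp"
        elif truth == 0 and pred == 1:
            key = "fp"
        elif truth == 0 and pred == 0:
            key = "tn"
        else:
            key = "fn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        # Return None when a metric is undefined for the group.
        return numerator / denominator if denominator else None

    report = {}
    for group, c in counts.items():
        total = sum(c.values())
        report[group] = {
            "accuracy": ratio(c["tp"] + c["tn"], total),
            "recall": ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
        }
    return report


# Hypothetical predictions for two demographic groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
for group, metrics in subgroup_metrics(y_true, y_pred, groups).items():
    print(group, metrics)
```

In this made-up data the model is perfect for group A but only 50% accurate for group B, which is the kind of gap an overall accuracy figure (75% here) would hide.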
Transparency Spectrum · AWS Tools · Safety Tradeoffs · Human-Centred Design
The most accurate models (LLMs, deep nets) are often the least interpretable. Choosing a simpler, explainable model may mean accepting lower accuracy, which is a real business tradeoff.
Publishing model weights enables inspection and research but also enables misuse (jailbreaking, fine-tuning for harmful purposes). Closed models reduce misuse risk but limit auditability.
Post-hoc explanation tools (SHAP, LIME) approximate what complex models "think" — they are not exact. A high-fidelity explanation may still not fully capture the model's true decision process.
| Use Case | Interpretability need | Recommended approach |
|---|---|---|
| Credit scoring / lending | High — legal requirement to explain decisions | Logistic regression or XGBoost + SHAP; avoid LLMs |
| Medical diagnosis support | High — clinician must understand reasoning | Explainable model + Clarify + human-in-the-loop (A2I) |
| Customer chatbot | Low — conversational fluency matters more | LLM via Bedrock + Guardrails for safety |
| Fraud detection | Medium — need to investigate flagged cases | XGBoost + SHAP explanations per prediction |
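To make "explanations per prediction" concrete, here is a minimal sketch for the simplest case in the table, a logistic regression. For a linear model with features treated as independent, each feature's SHAP value on the log-odds scale reduces to weight × (feature value − average feature value). The sketch computes that by hand and does not use the `shap` library or Clarify; the weights, baseline averages, and applicant values are all hypothetical.

```python
import math


def log_odds(weights, bias, x):
    """Linear score of a logistic regression before the sigmoid."""
    return bias + sum(weights[name] * x[name] for name in weights)


def predict_proba(weights, bias, x):
    """Probability of the positive class."""
    return 1 / (1 + math.exp(-log_odds(weights, bias, x)))


def linear_contributions(weights, x, baseline):
    """Per-feature contribution to the log-odds, relative to the baseline."""
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}


# Hypothetical credit model: all numbers are made up for illustration.
weights = {"debt_to_income": -4.0, "years_employed": 0.3}
bias = 1.0
baseline = {"debt_to_income": 0.3, "years_employed": 5.0}   # dataset averages
applicant = {"debt_to_income": 0.5, "years_employed": 2.0}

contributions = linear_contributions(weights, applicant, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:+.2f} log-odds")
print(f"approval probability: {predict_proba(weights, bias, applicant):.2f}")
```

The contributions sum to the difference between the applicant's log-odds and the baseline's, which is what lets a reviewer say how much each feature pushed this particular decision up or down. Tree ensembles and deep models need the approximate methods (SHAP, LIME) because no such closed form exists for them.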
Domain 4 · Key Points to Lock In
Credit / hiring / healthcare → explainable model required (logistic regression, XGBoost + SHAP). LLMs alone are not sufficient where "right to explanation" laws apply.
14% of AIF-C01 · Guidelines for Responsible AI
All four domains covered — good luck on the exam!