CLD120 Module 15 Knowledge Check

Planning for Disaster

Question 1

Which scenario describes a challenge to data velocity?

This represents a data velocity challenge. The application cannot process the incoming data fast enough to keep up with the pace of requests.
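
The core of a velocity problem can be shown with simple arithmetic. When data arrives faster than it is processed, the backlog grows without limit. The sketch below assumes constant rates, and `backlog_after` is a hypothetical helper. It is not tied to any AWS service.

```python
def backlog_after(seconds: int, arrival_rate: float, service_rate: float) -> float:
    """Return the number of unprocessed requests after `seconds`.

    Assumes constant arrival and processing rates (requests per second)
    and an empty queue at time 0.
    """
    # The queue only grows when data arrives faster than it is processed.
    growth_per_second = max(float(arrival_rate - service_rate), 0.0)
    return growth_per_second * seconds


# Hypothetical numbers: 1,200 requests/s arrive, the application handles 1,000/s.
print(backlog_after(60, arrival_rate=1200, service_rate=1000))  # 12000.0
```

With these rates the application falls 200 requests further behind every second. No amount of waiting fixes that. Only faster processing or more capacity does.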

Question 2

Which statement describes the goal of a modern data architecture?

See: What is a Data Architecture? Modern Data Architectures Explained

Question 3

Which analytic workload scenario can be a use case for the batch ingestion process?

This is a good use case for batch ingestion because the data is pooled together and processed overnight.
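
The batch pattern can be sketched in a few lines. Records are pooled as they arrive, and all of them are processed together in one scheduled run. `BatchIngestor` and the record fields are hypothetical. In practice a scheduler would trigger `run_batch` overnight.

```python
from collections import defaultdict


class BatchIngestor:
    """Hypothetical batch ingestor: records are pooled on arrival and
    processed together in a single run (for example, an overnight job)."""

    def __init__(self) -> None:
        self.pool: list[dict] = []

    def receive(self, record: dict) -> None:
        # No processing happens on arrival; the record just waits in the pool.
        self.pool.append(record)

    def run_batch(self) -> dict[str, float]:
        # Process everything pooled so far in one pass, then clear the pool.
        totals: dict[str, float] = defaultdict(float)
        for record in self.pool:
            totals[record["store"]] += record["amount"]
        self.pool.clear()
        return dict(totals)


ingestor = BatchIngestor()
ingestor.receive({"store": "A", "amount": 10.0})
ingestor.receive({"store": "B", "amount": 5.5})
ingestor.receive({"store": "A", "amount": 2.5})
print(ingestor.run_batch())  # {'A': 12.5, 'B': 5.5}
```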

Question 4

A medical research company has a ribonucleic acid (RNA) sequencing machine that stores its private results on the lab's on-premises network-attached storage (NAS). The company's data science team wants to ingest these results into its AWS account. How should the team ingest this data?

AWS DataSync is a purpose-built service for transferring data from file stores, such as on-premises NAS, and it can write that data to Amazon S3.

Question 5

A data engineer has ingested a new JSON file into an Amazon S3 bucket in their data lake. An AWS Glue Data Catalog maintains metadata about data in the lake. Which feature of AWS Glue can the data engineer use to discover the JSON data schema with the fewest steps in a code-free way?

When an AWS Glue crawler runs on an S3 bucket, it automatically infers a schema and stores it, along with other metadata, in the Data Catalog. This makes the data discoverable for data lake consumers.
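
The idea behind schema discovery can be illustrated with a small inference routine. It reads JSON records and derives a column-to-type mapping. This is only a simplified sketch and is not the crawler's actual algorithm, which relies on built-in and custom classifiers. The Hive-style type names and the widening rules are assumptions.

```python
import json


def infer_type(value) -> str:
    # Map a JSON value to a Hive-style type name (assumed mapping).
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "string"


def infer_schema(json_lines: str) -> dict[str, str]:
    """Infer {column: type} from newline-delimited JSON records."""
    schema: dict[str, str] = {}
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        for key, value in json.loads(line).items():
            if value is None:  # nulls carry no type information; skip them
                continue
            new_type = infer_type(value)
            old_type = schema.get(key)
            if old_type is None:
                schema[key] = new_type
            elif old_type != new_type:
                # Widen conflicts: bigint + double -> double, anything else -> string.
                numeric = {old_type, new_type} == {"bigint", "double"}
                schema[key] = "double" if numeric else "string"
    return schema


data = '{"id": 1, "price": 9.99, "tags": ["a"]}\n{"id": 2, "price": 10, "in_stock": true}'
print(infer_schema(data))
# {'id': 'bigint', 'price': 'double', 'tags': 'array', 'in_stock': 'boolean'}
```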

Question 6

A data pipeline will ingest clickstream data from a shopping website. The data engineer must transform data as it arrives to feed a real-time analytics Amazon OpenSearch Service dashboard. They must also generate a monthly report based on the dashboard. Which configuration meets this need?

Amazon Kinesis Data Streams can ingest the data. Amazon Managed Service for Apache Flink can consume the data from the stream and transform it immediately to feed the OpenSearch Service dashboard. Amazon Data Firehose can deliver data to storage and analytics destinations, such as OpenSearch Service, where the report can be produced.
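
The two paths in this configuration can be imitated in memory. Each record is transformed as it arrives and updates a live view immediately. The same record is also buffered and delivered in batches for later reporting. `ClickstreamPipeline` is hypothetical and calls no AWS APIs. The dashboard counter stands in for the OpenSearch Service index, and the batch delivery stands in for Firehose buffering. The buffer size is an assumption.

```python
from collections import Counter


class ClickstreamPipeline:
    """Hypothetical in-memory stand-in for a streaming pipeline:
    - process(): per-record transform, like a Flink application reading a stream
    - dashboard: running counts, standing in for a real-time dashboard index
    - delivered: batches flushed to storage for later (e.g. monthly) reports
    """

    def __init__(self, buffer_size: int = 3) -> None:
        self.buffer_size = buffer_size  # records per delivery batch (assumed)
        self.dashboard: Counter = Counter()
        self.buffer: list[dict] = []
        self.delivered: list[list[dict]] = []

    def process(self, raw_event: dict) -> None:
        # Transform as the record arrives: normalize the page path.
        event = {"page": raw_event["page"].strip().lower(), "user": raw_event["user"]}
        self.dashboard[event["page"]] += 1  # real-time view updates immediately
        self.buffer.append(event)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        # Deliver buffered records as one batch, then start a new buffer.
        if self.buffer:
            self.delivered.append(self.buffer)
            self.buffer = []


pipeline = ClickstreamPipeline(buffer_size=2)
for e in [{"page": "/Home ", "user": "u1"}, {"page": "/cart", "user": "u2"},
          {"page": "/home", "user": "u3"}]:
    pipeline.process(e)
pipeline.flush()  # deliver whatever is left in the buffer
print(pipeline.dashboard["/home"], len(pipeline.delivered))  # 2 2
```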

Question 7

Which statement accurately describes a consideration for designing pipeline storage?

It is common practice to archive data out of a relational database into a more cost-efficient storage option. Amazon S3 storage classes are purpose-built for varying access patterns at corresponding costs.
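
Archived data can be moved through progressively cheaper S3 storage classes with a lifecycle rule. The dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The rule ID, the `db-archive/` prefix, and the day thresholds are assumptions chosen for illustration. `storage_class_at` is a hypothetical helper that shows which class an object would reach at a given age. It ignores the fact that S3 applies transitions asynchronously.

```python
# Hypothetical archive policy: the ID, prefix, and day thresholds are assumptions.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "archive-exported-db-data",
            "Filter": {"Prefix": "db-archive/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Return the storage class an object of this age would be in under `rule`."""
    current = "STANDARD"  # objects start in S3 Standard unless uploaded otherwise
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = LIFECYCLE_CONFIGURATION["Rules"][0]
for age in (7, 45, 200, 400):
    print(age, storage_class_at(age, rule))
```

Rarely accessed data moves to cheaper, slower-to-retrieve classes as it ages. This reflects the trade-off between access pattern and cost described above.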

Question 8

A data engineer is designing a low-cost infrastructure to store both structured and unstructured data directly in a central repository. Which option meets the data engineer's needs?

This scenario describes a data lake. Data lakes are centralized repositories that developers can use to store structured and unstructured data regardless of scale. Amazon S3 is a good choice for this purpose.

Question 9

A DevOps engineer is migrating an on-premises Apache Hadoop cluster to AWS. The cluster runs scheduled jobs by using parallel processing. Which AWS service is the MOST appropriate choice?

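Amazon EMR supports multiple Apache Hadoop applications, including frameworks such as Hadoop MapReduce and Apache Spark. This makes it the most appropriate service for running the migrated cluster's scheduled parallel-processing jobs.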
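
The MapReduce programming model that Hadoop jobs follow has three phases. Input splits are mapped in parallel, the intermediate pairs are shuffled by key, and each key is reduced. The word-count sketch below shows those phases in plain Python. It is not Hadoop or EMR code, and the threads only stand in for a cluster's mappers. CPython threads do not speed up CPU-bound work, so the sketch illustrates the model rather than the performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(line: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all counts emitted for a single key.
    return key, sum(values)


def word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    # Map: process input splits concurrently, as a cluster's mappers would.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, lines))
    # Shuffle: group intermediate pairs by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce: one reduce call per key.
    return dict(reduce_phase(k, v) for k, v in groups.items())


print(word_count(["big data on EMR", "Big data with Spark"]))
# {'big': 2, 'data': 2, 'on': 1, 'emr': 1, 'with': 1, 'spark': 1}
```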

Question 10

A marketing manager needs quick, one-time insights about the number of leads and closed deals across multiple postal codes. Which service would be the MOST cost-effective method to query daily aggregates of sales data stored in Amazon S3?

Amazon Athena can be used for one-time querying and quick analysis of data directly in Amazon S3, and it uses a pay-as-you-go pricing model.
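
The manager's question reduces to a single SQL aggregation. The sketch below runs that kind of standard SQL locally, with Python's built-in `sqlite3` standing in for Athena's query engine. The table name `daily_sales`, its columns, and the sample rows are all hypothetical. In Athena, the rows would live in files on S3 and be exposed as a table through the Data Catalog.

```python
import sqlite3

# Hypothetical daily aggregates: (sale_date, postal_code, leads, closed_deals).
ROWS = [
    ("2025-02-01", "98101", 12, 3),
    ("2025-02-01", "10001", 8, 1),
    ("2025-02-02", "98101", 10, 4),
    ("2025-02-02", "10001", 5, 2),
]

QUERY = """
SELECT postal_code,
       SUM(leads)        AS total_leads,
       SUM(closed_deals) AS total_closed_deals
FROM daily_sales
GROUP BY postal_code
ORDER BY postal_code
"""


def run_query() -> list[tuple[str, int, int]]:
    # sqlite3 stands in for Athena's SQL engine so the query can run locally.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales "
        "(sale_date TEXT, postal_code TEXT, leads INTEGER, closed_deals INTEGER)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", ROWS)
    results = conn.execute(QUERY).fetchall()
    conn.close()
    return results


print(run_query())  # [('10001', 13, 3), ('98101', 22, 7)]
```

Because the question is answered with one ad hoc query and no cluster to provision, a pay-per-query service fits this one-time need.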

Created 17 February 2025 by Dennis Kibbe. Last modified 2025/02/26 10:33:11 by dnk.