Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: B. Resilient Distributed Dataset
Explanation:
An RDD is an immutable collection of records split into partitions that are distributed across the cluster.
Because Spark records its lineage, a lost partition can be recomputed from the source data.
Correct Answer: B. Produces zero or more output elements per input and flattens them
Explanation:
flatMap is useful for expanding records, such as splitting lines into words.
Returning an empty collection for an input drops that record from the result.
Correct Answer: B. An RDD containing the keys
Explanation:
Each (key, value) pair is projected onto its first component, the key.
The result remains a distributed RDD, and duplicate keys are not removed.
Correct Answer: B. A distributed collection of rows organized into named columns
Explanation:
A DataFrame has a schema describing column names and data types.
Its schema lets Spark's Catalyst optimizer analyze and optimize queries before running them.
Correct Answer: C. Spark's library for scalable machine learning and related utilities
Explanation:
MLlib provides algorithms, feature tools, pipelines, and evaluators.
Its APIs operate on distributed Spark data structures.
Correct Answer: C. The time at which an event actually occurred at its source
Explanation:
Event time is embedded in or derived from each event.
It can differ from arrival time and processing time because of network or buffering delays, so events may arrive out of order.
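The sketch below is a plain-Python model, not Spark code. It shows why the choice of timestamp matters. The Event class, the 60-second tumbling window, and the timestamps are hypothetical. Event e3 happens before e2 but arrives after it. Grouping by event time puts e3 in the window where it actually occurred, while grouping by arrival time does not. In Spark Structured Streaming, the same idea is usually expressed by grouping on a window over the event-time column.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    event_time: int    # seconds: when the event happened at its source
    arrival_time: int  # seconds: when the processor received it


WINDOW = 60  # hypothetical tumbling-window size in seconds


def window_start(ts: int, size: int = WINDOW) -> int:
    """Start of the tumbling window that contains timestamp ts."""
    return ts - ts % size


events = [
    Event("e1", event_time=10, arrival_time=12),
    Event("e2", event_time=70, arrival_time=71),
    Event("e3", event_time=55, arrival_time=95),  # delayed: occurred before e2, arrived after it
]

by_event_time: dict[int, list[str]] = {}
by_arrival_time: dict[int, list[str]] = {}
for e in events:
    by_event_time.setdefault(window_start(e.event_time), []).append(e.event_id)
    by_arrival_time.setdefault(window_start(e.arrival_time), []).append(e.event_id)

print(by_event_time)    # {0: ['e1', 'e3'], 60: ['e2']}
print(by_arrival_time)  # {0: ['e1'], 60: ['e2', 'e3']}
```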
Correct Answer: C. A class parameterized by one or more types
Explanation:
Type parameters allow one class design to work safely with multiple data types.
For example, Box[T] holds a value of type T, so Box[Int] and Box[String] reuse the same class definition.
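The Box[T] notation above is Scala syntax (roughly class Box[T](val value: T)). The sketch below shows the same idea in Python's typing module. The map method is a hypothetical addition for illustration. Unlike Scala, Python does not enforce these type parameters at runtime. A static checker such as mypy uses them instead.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Box(Generic[T]):
    """A container whose element type is the type parameter T."""

    def __init__(self, value: T) -> None:
        self.value = value

    def map(self, f: Callable[[T], U]) -> "Box[U]":
        # Apply f to the contents and wrap the result in a new Box.
        return Box(f(self.value))


int_box: Box[int] = Box(21)
str_box: Box[str] = Box("spark")

doubled = int_box.map(lambda v: v * 2)  # Box[int]
length = str_box.map(len)               # Box[int]
print(doubled.value, length.value)      # 42 5
```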
Correct Answer: C. Lost partitions can be recomputed from lineage
Explanation:
Spark records how an RDD was derived from earlier datasets.
After data loss, it reruns only the transformations needed to rebuild the missing partitions.
Correct Answer: C. Retains records for which a predicate returns true
Explanation:
filter is a narrow transformation: each output partition depends on exactly one input partition, so no shuffle is needed.
It keeps only the records for which the predicate returns true.
Correct Answer: C. Combines values having the same key
Explanation:
A standard (inner) join returns, for each key present in both datasets, every pair of matching values.
Unless both sides are already partitioned the same way, it requires a shuffle.
Correct Answer: C. A Dataset can provide compile-time type information for its records
Explanation:
Typed Datasets use encoders to map Scala (or Java) objects, such as case class instances, to Spark's internal representation.
A DataFrame is conceptually a Dataset of generic Row objects.
Correct Answer: D. A component that converts one DataFrame into another using an existing model or rule
Explanation:
A Transformer implements transform(), which typically appends predictions or derived features as new columns.
Fitted models and feature-processing stages such as Tokenizer are both Transformers.