MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: D. Its partitions can reside and be processed across multiple executors
Explanation:
Partitioning enables parallel computation over cluster workers.
Each task normally handles one partition.
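The one-task-per-partition idea can be sketched in plain Python. This is only a model of the concept, not the Spark API: the helper names split_into_partitions and run_one_task_per_partition are hypothetical, and a thread pool stands in for cluster executors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Slice data into num_partitions contiguous chunks (hypothetical helper)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_one_task_per_partition(partitions, func):
    """Apply func to each partition in its own worker, keeping partition order."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(func, partitions))


partitions = split_into_partitions(list(range(10)), 3)
partial_sums = run_one_task_per_partition(partitions, sum)
total = sum(partial_sums)  # combine the per-partition results
```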
Correct Answer: D. Removes duplicate elements from an RDD
Explanation:
Spark must group equal elements to determine uniqueness.
Therefore, distinct commonly requires a shuffle.
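A rough sketch of why uniqueness needs data movement, in plain Python rather than Spark: equal values are first routed to the same bucket by hash (the shuffle-like step), after which each bucket can deduplicate on its own. The function name distinct_via_shuffle and the bucket count are assumptions for illustration.

```python
def distinct_via_shuffle(partitions, num_buckets):
    """Deduplicate across partitions by hashing equal values into one bucket."""
    # Shuffle-like step: equal elements always land in the same bucket.
    buckets = [[] for _ in range(num_buckets)]
    for part in partitions:
        for x in part:
            buckets[hash(x) % num_buckets].append(x)
    # Local step: each bucket removes its own duplicates independently.
    return [sorted(set(bucket)) for bucket in buckets]


input_partitions = [[1, 2, 2], [2, 3, 1], [3, 4]]
deduplicated = distinct_via_shuffle(input_partitions, num_buckets=2)
flattened = sorted(x for bucket in deduplicated for x in bucket)
```

Without the routing step, the value 2 would survive once in each of the first two input partitions.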
Correct Answer: D. A read-only value efficiently distributed to executors
Explanation:
Broadcast variables avoid sending the same large reference object with every task.
Each executor caches a single local copy that all of its tasks share.
Correct Answer: D. A definition of column names, data types, and nullability
Explanation:
The schema gives structure to each row.
Spark uses it for validation, planning, and optimized execution.
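To make the three parts of a schema concrete, here is a minimal plain-Python sketch of row validation. Field and validate_row are hypothetical names, and the checks are far simpler than what Spark actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One schema entry: column name, expected Python type, nullability."""
    name: str
    dtype: type
    nullable: bool


def validate_row(row, schema):
    """Return a list of schema violations for a dict-shaped row."""
    errors = []
    for field in schema:
        value = row.get(field.name)
        if value is None:
            if not field.nullable:
                errors.append(f"{field.name}: null not allowed")
        elif not isinstance(value, field.dtype):
            errors.append(f"{field.name}: expected {field.dtype.__name__}")
    return errors


schema = [Field("id", int, False), Field("name", str, True)]
ok = validate_row({"id": 1, "name": None}, schema)
missing_id = validate_row({"id": None, "name": "x"}, schema)
wrong_type = validate_row({"id": "1", "name": "x"}, schema)
```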
Correct Answer: A. A component that fits a model or Transformer from a DataFrame
Explanation:
An Estimator exposes fit() and learns parameters from input data.
The resulting object is usually a Transformer.
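The fit-then-transform relationship can be illustrated without Spark. In this sketch the class names MeanCenterEstimator and MeanCenterModel are made up, and plain lists stand in for DataFrames.

```python
class MeanCenterModel:
    """Transformer-like object holding a parameter learned during fit()."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, values):
        # Apply the learned parameter to new data.
        return [v - self.mean for v in values]


class MeanCenterEstimator:
    """Estimator-like object: fit() learns from data and returns a model."""

    def fit(self, values):
        return MeanCenterModel(sum(values) / len(values))


model = MeanCenterEstimator().fit([1.0, 2.0, 3.0])
centered = model.transform([2.0, 4.0])
```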
Correct Answer: A. A bounded time range over which events are grouped or aggregated
Explanation:
Windows turn an unbounded stream into finite analytical groups.
They may be tumbling, sliding, or session based.
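As an illustration of the tumbling case, the sketch below assigns each event to a fixed, non-overlapping window by integer timestamp. It is plain Python, not a streaming API; tumbling_window_counts and the (timestamp, payload) event shape are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per [start, end) window of fixed length window_size."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Each timestamp belongs to exactly one tumbling window.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, window_start + window_size)] += 1
    return dict(sorted(counts.items()))


events = [(1, "a"), (4, "b"), (5, "c"), (11, "d")]
window_counts = tumbling_window_counts(events, window_size=5)
```

A sliding window would let one event fall into several overlapping ranges, and a session window would close after a gap of inactivity instead of at a fixed boundary.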
Correct Answer: A. If A is a subtype of B, then F[B] is a subtype of F[A]
Explanation:
Contravariance reverses the subtype relation.
It is commonly useful for consumer-like abstractions.
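The F[...] notation in the answer resembles Scala, but the same rule can be shown with Python's typing module. In this sketch Animal, Dog, Printer, and describe_dog are hypothetical names. Python does not enforce variance at runtime; the contravariant TypeVar matters to a static type checker, which should accept a Printer[Animal] where a Printer[Dog] is expected.

```python
from typing import Generic, TypeVar


class Animal:
    pass


class Dog(Animal):
    pass


# Declared contravariant: Printer[Animal] is a subtype of Printer[Dog].
T_contra = TypeVar("T_contra", contravariant=True)


class Printer(Generic[T_contra]):
    """A consumer: it only accepts values of type T_contra, never returns them."""

    def __init__(self, label: str) -> None:
        self.label = label

    def show(self, item: T_contra) -> str:
        return f"{self.label} saw {type(item).__name__}"


def describe_dog(printer: "Printer[Dog]", dog: Dog) -> str:
    return printer.show(dog)


# A printer that can handle any Animal can safely be used for Dogs.
animal_printer: Printer[Animal] = Printer("animal-printer")
result = describe_dog(animal_printer, Dog())
```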
Correct Answer: A. Transformations create new RDDs rather than modifying existing ones
Explanation:
Immutability simplifies fault recovery and concurrent execution.
Lineage records the chain of transformations that produced each RDD, so lost partitions can be recomputed from their parents.
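A toy model of immutability and lineage, in plain Python: each transformation returns a new object that remembers its parent and function, and results are recomputed by walking that chain. The Dataset class here is hypothetical and much simpler than an RDD.

```python
class Dataset:
    """Immutable toy dataset that remembers how it was derived."""

    def __init__(self, source=None, parent=None, func=None):
        self._source = tuple(source) if source is not None else None
        self._parent = parent
        self._func = func

    def map(self, func):
        # Never mutate self: return a new dataset that points back to it.
        return Dataset(parent=self, func=func)

    def compute(self):
        # Recompute from lineage; nothing is stored between calls.
        if self._parent is None:
            return list(self._source)
        return [self._func(x) for x in self._parent.compute()]


base = Dataset(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)
shifted = doubled.map(lambda x: x + 1)
```

Because compute() rebuilds values from the parent chain, losing an intermediate result costs only recomputation, which mirrors the recovery argument in the explanation.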
Correct Answer: A. Returns an RDD containing elements from both inputs
Explanation:
Union logically concatenates the partitions of both RDDs and does not require a shuffle.
Duplicates are retained unless distinct is applied.
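A small plain-Python sketch of this behavior, with lists of partitions standing in for RDDs. The function name union is chosen only to mirror the operation being described.

```python
def union(left_partitions, right_partitions):
    """Concatenate the partition lists; no element is moved or compared."""
    return left_partitions + right_partitions


left = [[1, 2], [3]]
right = [[3, 4]]
combined = union(left, right)
flattened = [x for part in combined for x in part]
unique_values = sorted(set(flattened))  # a separate deduplication step
```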
Correct Answer: A. When one side of the join is small enough to distribute to all executors
Explanation:
Broadcasting the small relation avoids shuffling the large relation.
The broadcast side must still fit in the memory of the driver and of every executor.
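The idea can be sketched as a map-side hash join in plain Python. The small table becomes a lookup dictionary that stands in for the broadcast copy, and each partition of the large side is joined in place without being regrouped. broadcast_hash_join is a hypothetical name, and the sketch assumes unique keys on the small side and inner-join semantics.

```python
def broadcast_hash_join(large_partitions, small_table):
    """Inner-join each large partition against a dict built from the small side."""
    lookup = dict(small_table)  # stands in for the copy sent to each executor
    joined = []
    for part in large_partitions:
        # The large side is processed partition by partition, never reshuffled.
        joined.append([(k, v, lookup[k]) for k, v in part if k in lookup])
    return joined


large = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
small = [(1, "x"), (3, "z")]
joined = broadcast_hash_join(large, small)
```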
Correct Answer: A. Automatically determining column types from input data
Explanation:
Spark can inspect data or file metadata to infer a schema.
Explicit schemas are often faster and more reliable for production.
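One way to see why inference costs more than an explicit schema is that it has to scan the data. The sketch below is plain Python with made-up rules (fall back to "str" when types conflict). It does not reproduce Spark's actual inference logic, and infer_schema is a hypothetical name.

```python
def infer_schema(rows):
    """Scan dict-shaped rows and guess (type name, nullable) per column."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            dtype, nullable = schema.get(name, (None, False))
            if value is None:
                nullable = True
            elif dtype is None:
                dtype = type(value).__name__
            elif dtype != type(value).__name__:
                dtype = "str"  # assumption: conflicting types widen to string
            schema[name] = (dtype, nullable)
    return schema


rows = [
    {"id": 1, "score": None},
    {"id": 2, "score": 0.5},
    {"id": "x3", "score": 1.5},
]
inferred = infer_schema(rows)
```

Note how one unexpected value in the last row changes the inferred type of id for the whole column, which is the kind of surprise an explicit schema avoids.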
Correct Answer: B. To organize multiple feature and model stages into one reproducible workflow
Explanation:
Pipelines preserve the sequence of transformations and model fitting.
They reduce training-serving inconsistencies.
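A minimal sketch of the workflow idea in plain Python, assuming lists in place of DataFrames. All class names (Scale, Center, Shift, Pipeline, PipelineModel) are hypothetical. Fitting walks the stages in order, fits any estimator-like stage on the data as transformed so far, and returns a model that replays exactly the same sequence.

```python
class Scale:
    """Transformer-like stage with a fixed parameter."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]


class Shift:
    """Transformer-like stage produced by fitting Center."""

    def __init__(self, offset):
        self.offset = offset

    def transform(self, values):
        return [v + self.offset for v in values]


class Center:
    """Estimator-like stage: learns the mean and returns a Shift."""

    def fit(self, values):
        return Shift(-sum(values) / len(values))


class PipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, values):
        for stage in self.stages:
            values = stage.transform(values)
        return values


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, values):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(values)  # learn from data seen at this point
            values = stage.transform(values)
            fitted.append(stage)
        return PipelineModel(fitted)


model = Pipeline([Scale(2.0), Center()]).fit([1.0, 2.0, 3.0])
train_out = model.transform([1.0, 2.0, 3.0])
serve_out = model.transform([5.0])
```

Because serving reuses the fitted stages in the same order, new data passes through the same scaling and the same learned offset as the training data.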