Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: D. It documents intent and prevents accidental interface changes
Explanation:
A clear signature helps users and tools understand an API.
It also keeps internal implementation changes from altering the exposed type unexpectedly.
Correct Answer: D. A unit of work that processes one partition within a stage
Explanation:
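Spark runs one task per partition in a stage, and the scheduler launches those tasks on executors.
A stage's parallelism is therefore largely determined by its partition count, limited by the executor cores available.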
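To make the task model concrete, here is a toy single-machine sketch in Python, not Spark code. The helpers split_into_partitions, task, and run_stage are hypothetical, and a thread pool with two workers stands in for two executor cores. Each partition becomes exactly one task, so four partitions yield four tasks even though only two can run at a time.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous, nearly equal chunks."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def task(partition):
    """One task processes exactly one partition (here: sum of squares)."""
    return sum(x * x for x in partition)


def run_stage(partitions, executor_slots):
    """Launch one task per partition; at most executor_slots run concurrently."""
    with ThreadPoolExecutor(max_workers=executor_slots) as pool:
        return list(pool.map(task, partitions))


partitions = split_into_partitions(list(range(10)), 4)
results = run_stage(partitions, executor_slots=2)
print(len(results), sum(results))  # prints: 4 285
```

Raising the partition count adds tasks; raising executor_slots only changes how many of them run at once.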
Correct Answer: D. fold supplies a neutral initial value used within partitions and across results
Explanation:
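The zero value must be an identity for the operation (for example, 0 for addition or 1 for multiplication).
It is applied once per partition and again when the partition results are combined, so a non-identity value would be counted several times.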
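The effect of a non-identity zero value can be shown with a small Python simulation modelled on how RDD.fold applies the zero value: once at the start of each partition, then once more when merging the partition results. distributed_fold is a hypothetical helper, not a Spark function.

```python
import operator
from functools import reduce


def distributed_fold(partitions, zero, op):
    """Fold each partition starting from zero, then fold the
    per-partition results, again starting from zero."""
    partial = [reduce(op, part, zero) for part in partitions]
    return reduce(op, partial, zero)


partitions = [[1, 2], [3, 4], [5]]
print(distributed_fold(partitions, 0, operator.add))  # 15: 0 is the identity for +
print(distributed_fold(partitions, 1, operator.add))  # 19: the 1 is added 3 + 1 = 4 times
```

With three partitions the stray zero value is counted four times, and the error would change if the same data were split differently.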
Correct Answer: D. Saving it to reliable storage and truncating its lineage
Explanation:
Checkpointing helps when lineage grows long, for example in iterative algorithms that run many iterations.
It requires materializing the data to reliable storage such as HDFS.
Correct Answer: A. Reading or processing only columns required by the query
Explanation:
Columnar formats such as Parquet and ORC let the reader skip unused columns entirely.
This lowers I/O, memory use, and CPU work.
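A toy Python sketch of column pruning over a hypothetical in-memory columnar table (one list per column). The query needs only user_id and country, so the pruned scan never touches the bio column; the counters compare how many values are read with and without pruning.

```python
# Hypothetical columnar table: each column is stored as its own list.
table = {
    "user_id": [1, 2, 3, 4],
    "country": ["DE", "US", "US", "FR"],
    "bio": ["data engineer", "analyst", "student", "researcher"],
}


def scan(table, required_columns):
    """Read only the requested columns; return rows and the number of values read."""
    columns = {name: table[name] for name in required_columns}
    values_read = sum(len(col) for col in columns.values())
    rows = list(zip(*(columns[name] for name in required_columns)))
    return rows, values_read


# Query: SELECT user_id FROM table WHERE country = 'US'
rows, pruned_reads = scan(table, ["user_id", "country"])
us_users = [uid for uid, country in rows if country == "US"]

_, full_reads = scan(table, list(table))  # reading every column instead
print(us_users, pruned_reads, full_reads)  # prints: [2, 3] 8 12
```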
Correct Answer: A. BinaryClassificationEvaluator
Explanation:
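It evaluates score-based binary classification metrics such as the area under the ROC curve or the area under the precision-recall (PR) curve.
The metric should match the application's priorities; for example, area under PR is often more informative when positives are rare.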
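For intuition about area under ROC, here is a pairwise Python sketch: the AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. area_under_roc is a hypothetical helper; Spark's evaluator works from points on the ROC curve rather than from pairs, and this O(P*N) loop is only meant for small inputs.

```python
def area_under_roc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(area_under_roc(scores, labels))  # 8 of the 9 pairs are ordered correctly
```

Only the ordering of scores matters here, which is why ROC AUC is described as a score-based (ranking) metric.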
Correct Answer: A. A distributed computing engine designed for fast, general-purpose data processing
Explanation:
Spark supports batch, SQL, machine learning, and streaming workloads.
It can run under cluster managers such as YARN and Kubernetes, or with Spark's built-in standalone cluster manager.
Correct Answer: A. Each child partition depends on a small number of parent partitions
Explanation:
Transformations such as map and filter are narrow.
Spark can pipeline consecutive narrow transformations within one stage without shuffling data.
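The pipelining of narrow transformations can be mimicked in plain Python. This is a model, not Spark's implementation, and run_narrow_stage is hypothetical. Each output partition is computed from exactly one input partition, and a generator lets each element flow through map and filter in a single pass without a materialized intermediate list.

```python
def run_narrow_stage(partitions, map_fn, filter_fn):
    """Apply map then filter to each partition independently.
    Output partition i depends only on input partition i (a narrow dependency)."""
    out = []
    for part in partitions:
        mapped = (map_fn(x) for x in part)          # lazy: nothing computed yet
        kept = [y for y in mapped if filter_fn(y)]  # one pass performs map + filter
        out.append(kept)
    return out


parts = [[1, 2, 3], [4, 5], [6]]
result = run_narrow_stage(parts, lambda x: x * 10, lambda y: y % 20 == 0)
print(result)  # prints: [[20], [40], [60]]
```

No element moves between partitions, which is exactly what a wide transformation such as a groupByKey would require.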
Correct Answer: A. Different types for input elements and the final accumulator
Explanation:
aggregate uses one function (seqOp) to fold elements into the accumulator within each partition and another (combOp) to merge the partition results.
Its zero value can have a different type from RDD elements.
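A Python simulation of aggregate computing a mean, where the elements are ints but the accumulator is a (sum, count) tuple. distributed_aggregate is a hypothetical helper that mirrors the seqOp/combOp split; in PySpark the corresponding call would be rdd.aggregate((0, 0), seq_op, comb_op).

```python
from functools import reduce


def distributed_aggregate(partitions, zero, seq_op, comb_op):
    """seq_op folds elements into an accumulator inside each partition;
    comb_op merges the per-partition accumulators."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


# Elements are ints, but the accumulator is a (sum, count) tuple.
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

total, count = distributed_aggregate([[1, 2, 3], [4, 5]], (0, 0), seq_op, comb_op)
print(total, count, total / count)  # prints: 15 5 3.0
```

The zero value (0, 0) is an identity for comb_op, so it is safe even though it is reused in every partition.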
Correct Answer: A. Checkpointing cuts lineage and uses reliable storage, while caching keeps lineage for recomputation
Explanation:
Cached blocks can be evicted or lost, for example when an executor fails, and are then rebuilt from lineage.
Checkpoint data becomes a new reliable starting point.
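The difference can be illustrated with a hypothetical ToyDataset class in Python; it is not Spark's API. Its lineage is a list of functions replayed over the source data. cache keeps an in-memory copy but leaves lineage intact, while checkpoint writes the result to a file and truncates lineage so later reads start from that file. A JSON file in a temporary directory stands in for reliable storage.

```python
import json
import os
import tempfile


class ToyDataset:
    """Hypothetical model: data is recomputed by replaying `lineage`
    over `source` unless a cached or checkpointed copy exists."""

    def __init__(self, source, lineage=()):
        self.source = source             # base records
        self.lineage = list(lineage)     # transformations to replay
        self.cached = None               # in-memory copy; may be lost
        self.checkpoint_path = None      # durable copy on disk

    def map(self, fn):
        return ToyDataset(self.source, self.lineage + [fn])

    def compute(self):
        if self.cached is not None:
            return self.cached
        if self.checkpoint_path is not None:
            with open(self.checkpoint_path) as f:
                return json.load(f)      # start from the checkpoint, no replay
        data = self.source
        for fn in self.lineage:
            data = [fn(x) for x in data]
        return data

    def cache(self):
        self.cached = self.compute()     # lineage is kept as a fallback
        return self

    def evict(self):
        self.cached = None               # simulate losing cached blocks

    def checkpoint(self, path):
        with open(path, "w") as f:
            json.dump(self.compute(), f)
        self.checkpoint_path = path
        self.lineage = []                # lineage is truncated
        return self


ds = ToyDataset([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 2)

ds.cache()
ds.evict()
print(ds.compute(), len(ds.lineage))   # prints: [4, 6, 8] 2  (rebuilt by replay)

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
ds.checkpoint(path)
print(ds.compute(), len(ds.lineage))   # prints: [4, 6, 8] 0  (read from the file)
```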
Correct Answer: B. They are columnar and support compression, column pruning, and predicate statistics
Explanation:
Columnar storage reads only relevant fields and compresses similar values well.
Metadata such as per-column min/max statistics also lets readers skip data that cannot match a filter.
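A sketch of statistics-based skipping in Python, assuming hypothetical row groups that each carry min/max statistics for a column, loosely modelled on the footer metadata of Parquet files. For the filter price > 95, two of the three row groups are ruled out from their max values alone, so only one is actually read.

```python
# Hypothetical row groups with per-column (min, max) statistics.
row_groups = [
    {"stats": {"price": (1, 40)}, "rows": [{"price": p} for p in (1, 15, 40)]},
    {"stats": {"price": (55, 90)}, "rows": [{"price": p} for p in (55, 70, 90)]},
    {"stats": {"price": (100, 300)}, "rows": [{"price": p} for p in (100, 250, 300)]},
]


def scan_greater_than(groups, column, threshold):
    """Return rows with column > threshold, skipping row groups whose
    max statistic shows that no row can match."""
    matches, groups_read = [], 0
    for group in groups:
        _, col_max = group["stats"][column]
        if col_max <= threshold:
            continue                     # skipped using metadata alone
        groups_read += 1
        matches.extend(r for r in group["rows"] if r[column] > threshold)
    return matches, groups_read


matches, groups_read = scan_greater_than(row_groups, "price", 95)
print([r["price"] for r in matches], groups_read)  # prints: [100, 250, 300] 1
```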
Correct Answer: B. A silhouette-based measure of cluster cohesion and separation
Explanation:
Silhouette compares within-cluster similarity with separation from other clusters.
Values range from -1 to 1, and higher values generally indicate clearer cluster assignments.
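A direct Python implementation of the mean silhouette score for small inputs, using Euclidean distance via math.dist (Python 3.8+). For each point, a is the mean distance to the other members of its cluster, b is the lowest mean distance to any other cluster, and s = (b - a) / max(a, b). silhouette_score here is a hypothetical helper; Spark's ClusteringEvaluator defaults to squared Euclidean distance, so its values can differ from this sketch.

```python
import math


def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [q for j, (q, lab) in enumerate(zip(points, labels))
                if lab == label and j != i]
        if not same:
            scores.append(0.0)           # common convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # two tight, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # each cluster spans both groups
print(round(good, 3), round(bad, 3))
```

The well-separated labelling scores close to 1, while the mixed labelling scores below 0 because points sit closer to the other cluster than to their own.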