MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: B. Blocked threads cannot perform other useful work
Explanation:
Composing callbacks or transformations preserves asynchronous execution.
Excessive blocking limits throughput and can cause deadlocks when every thread in a pool is waiting on work that needs a thread from the same pool.
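A minimal sketch of composing a follow-up step instead of blocking on the first result, using Python's standard concurrent.futures module. The helper `then` and the function `slow_square` are hypothetical names introduced for illustration; they are not part of any framework discussed here.

```python
from concurrent.futures import Future, ThreadPoolExecutor
import time


def slow_square(x):
    # Stand-in for a slow remote call or computation.
    time.sleep(0.05)
    return x * x


def then(future, fn):
    """Return a new Future holding fn(result), without blocking the caller."""
    out = Future()

    def _on_done(done):
        # Runs once `done` has finished, so done.result() returns immediately.
        try:
            out.set_result(fn(done.result()))
        except BaseException as exc:
            out.set_exception(exc)

    future.add_done_callback(_on_done)
    return out


with ThreadPoolExecutor(max_workers=2) as pool:
    doubled = then(pool.submit(slow_square, 3), lambda v: v * 2)
    # Block only once, at the edge of the program.
    result = doubled.result(timeout=5)
```

The calling thread stays free between `submit` and the final `result` call, which is the property the explanation describes.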
Correct Answer: B. The computation triggered by an action
Explanation:
An action submits a job to the scheduler.
A Spark application can run many jobs.
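The relationship between actions and jobs can be sketched in plain Python. This is a toy model, not the Spark API: `LazyDataset` and its `jobs_submitted` counter are hypothetical stand-ins for a dataset and the scheduler.

```python
class LazyDataset:
    jobs_submitted = 0  # hypothetical counter standing in for the scheduler

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: only records the step, nothing is computed yet.
        return LazyDataset(self._source, self._ops + (fn,))

    def _run_job(self):
        # Every action ends up here, so each action counts as one job.
        LazyDataset.jobs_submitted += 1
        for item in self._source:
            for fn in self._ops:
                item = fn(item)
            yield item

    def collect(self):
        # Action: evaluates the recorded steps and returns all elements.
        return list(self._run_job())

    def count(self):
        # Action: evaluates the recorded steps and counts the elements.
        return sum(1 for _ in self._run_job())


data = LazyDataset(range(5)).map(lambda x: x + 1)
jobs_before = LazyDataset.jobs_submitted
first = data.collect()
second = data.count()
jobs_after = LazyDataset.jobs_submitted
```

Building `data` submits nothing; each of the two actions submits one job, so a single program runs two jobs.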
Correct Answer: B. Returns the number of elements in an RDD
Explanation:
count triggers distributed evaluation of the required partitions.
Per-partition counts are summed into a single result returned to the driver.
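The two-step shape of a distributed count can be sketched as follows. This is plain Python, with lists standing in for partitions; `split_into_partitions` and `distributed_count` are hypothetical helper names.

```python
def split_into_partitions(items, num_partitions):
    # Round-robin split; real partitioning schemes vary.
    return [items[i::num_partitions] for i in range(num_partitions)]


def distributed_count(partitions):
    # Step 1: each partition is counted independently (one task each).
    partial_counts = [len(part) for part in partitions]
    # Step 2: the partial counts are combined into a single number.
    return sum(partial_counts)


partitions = split_into_partitions(list(range(10)), 3)
total = distributed_count(partitions)
```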
Correct Answer: B. When the same expensive dataset is reused by multiple actions
Explanation:
Caching trades storage for reduced recomputation.
It is valuable in iterative algorithms and interactive analysis.
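The storage-for-recomputation trade can be illustrated with a toy class that counts how often the underlying computation runs. `CachedDataset` and `expensive_rows` are hypothetical names; this is a sketch of the idea, not a framework API.

```python
class CachedDataset:
    """Toy dataset that recomputes on every action unless cache() was called."""

    def __init__(self, compute):
        self._compute = compute       # zero-argument function producing rows
        self._cache_enabled = False
        self._rows = None
        self.evaluations = 0          # how many times compute() actually ran

    def cache(self):
        self._cache_enabled = True
        return self

    def _materialize(self):
        if self._cache_enabled and self._rows is not None:
            return self._rows         # reuse stored rows
        self.evaluations += 1
        rows = list(self._compute())
        if self._cache_enabled:
            self._rows = rows         # pay storage to avoid recomputation
        return rows

    def count(self):
        return len(self._materialize())

    def total(self):
        return sum(self._materialize())


def expensive_rows():
    return [x * x for x in range(1000)]


uncached = CachedDataset(expensive_rows)
uncached.count()
uncached.total()

cached = CachedDataset(expensive_rows).cache()
cached.count()
cached.total()
```

After two actions each, `uncached.evaluations` is 2 and `cached.evaluations` is 1.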
Correct Answer: B. Grouped aggregation over one or more columns
Explanation:
Rows sharing grouping values are combined for aggregate functions.
The operation generally requires a shuffle.
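Why grouped aggregation usually needs a shuffle can be sketched in plain Python: rows with the same key may start in different partitions and must be brought together first. The function names below are hypothetical, and hashing keys into buckets is one common scheme, assumed here for illustration.

```python
from collections import defaultdict


def shuffle_by_key(partitions, num_reducers):
    # Route every (key, value) row to the bucket that owns its key.
    buckets = [[] for _ in range(num_reducers)]
    for part in partitions:
        for key, value in part:
            buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def grouped_sum(partitions, num_reducers=2):
    result = {}
    for bucket in shuffle_by_key(partitions, num_reducers):
        sums = defaultdict(int)
        for key, value in bucket:
            sums[key] += value
        # Keys never span buckets, so the per-bucket results do not collide.
        result.update(sums)
    return result


rows = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
totals = grouped_sum(rows)
```

Key "a" appears in both input partitions; only after the shuffle step can its values be summed in one place.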
Correct Answer: C. To compare hyperparameter settings using held-out performance
Explanation:
Model-selection tools fit each candidate pipeline on training subsets and score it on held-out validation data.
They help select settings that generalize.
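A small k-fold sketch in plain Python shows the fit-on-training, score-on-validation loop. The toy model (`fit_mean`, a constant prediction scaled by a hypothetical `shrink` hyperparameter) and the data values are assumptions made up for illustration.

```python
def k_fold_indices(n, k):
    # Yield (train_indices, validation_indices) for each of k folds.
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        yield train, held_out


def fit_mean(values, shrink):
    # Hypothetical toy model: the training mean scaled by `shrink`.
    return shrink * sum(values) / len(values)


def cv_score(data, shrink, k=3):
    # Mean squared error on held-out points, pooled over all folds.
    errors = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        prediction = fit_mean([data[i] for i in train_idx], shrink)
        errors.extend((data[i] - prediction) ** 2 for i in val_idx)
    return sum(errors) / len(errors)


data = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5]
candidates = [0.5, 1.0, 1.5]
best_shrink = min(candidates, key=lambda s: cv_score(data, s))
```

The candidate with the lowest held-out error is selected, so the choice reflects performance on data the model was not fitted on.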
Correct Answer: C. Converting an object into a form that can be stored or transmitted
Explanation:
Distributed frameworks serialize closures and data for network transfer.
Non-serializable fields can cause runtime failures when the enclosing object has to be sent to another process.
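Python's standard pickle module gives a quick illustration of both points. This is an analogy rather than the serialization path of any particular framework; `can_serialize` is a hypothetical helper.

```python
import pickle
import threading


def can_serialize(obj):
    # True if pickle can turn obj into bytes, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


# Plain data round-trips through bytes.
record = {"name": "clicks", "value": 42}
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# A lock wraps an OS-level resource and cannot be pickled,
# so any state that holds one fails at serialization time.
task_state = {"name": "clicks", "lock": threading.Lock()}
record_ok = can_serialize(record)
task_state_ok = can_serialize(task_state)
```

The failure only appears when serialization is attempted, which is why such bugs tend to surface at runtime rather than when the code is written.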
Correct Answer: C. A set of tasks that can run without crossing a shuffle boundary
Explanation:
Wide (shuffle) dependencies mark the boundaries that divide a job into stages.
Within a stage, narrow transformations can be pipelined.
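Stage splitting can be sketched over a simple list of operation names. This is a deliberate simplification: here a wide operation just starts a new stage, and the set `WIDE_OPS` is an assumed, incomplete list used only for illustration.

```python
# Operations assumed (for this sketch) to need a shuffle.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition"}


def split_into_stages(operations):
    stages, current = [], []
    for op in operations:
        if op in WIDE_OPS and current:
            # Shuffle boundary: close the running stage, start a new one.
            stages.append(current)
            current = []
        # Narrow operations are appended, i.e. pipelined into the stage.
        current.append(op)
    if current:
        stages.append(current)
    return stages


plan = ["map", "filter", "reduceByKey", "map", "groupByKey"]
stages = split_into_stages(plan)
```

Here `stages` is `[["map", "filter"], ["reduceByKey", "map"], ["groupByKey"]]`: two shuffle boundaries give three stages.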
Correct Answer: C. Aggregates elements using an associative binary function
Explanation:
Partial results are combined across partitions.
Associativity is required because execution order can vary.
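The need for associativity can be checked with a small experiment: reduce the same values under two partition layouts. `partitioned_reduce` is a hypothetical helper built on functools.reduce, not a framework API.

```python
from functools import reduce
import operator


def partitioned_reduce(partitions, fn):
    # Reduce inside each non-empty partition, then combine the partials.
    partials = [reduce(fn, part) for part in partitions if part]
    return reduce(fn, partials)


layout_a = [[1, 2, 3], [4, 5, 6]]
layout_b = [[1, 2], [3, 4], [5, 6]]

# Addition is associative: the layout does not change the answer.
sum_a = partitioned_reduce(layout_a, operator.add)
sum_b = partitioned_reduce(layout_b, operator.add)

# Subtraction is not: the answer depends on how the data was split.
diff_a = partitioned_reduce(layout_a, operator.sub)
diff_b = partitioned_reduce(layout_b, operator.sub)
```

Both sums are 21, while the two differences come out as 3 and 1.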
Correct Answer: C. Removes cached RDD blocks from executor storage
Explanation:
Calling unpersist frees the memory or disk space occupied by cached partitions.
Call it once the dataset will no longer be reused.
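The effect of removing cached blocks can be modelled with a small in-memory store. `BlockStore` and its methods are hypothetical and only mimic the bookkeeping; they are not the executor storage API.

```python
class BlockStore:
    """Toy block store keyed by (dataset_id, partition_index)."""

    def __init__(self):
        self._blocks = {}

    def put(self, dataset_id, partition, rows):
        self._blocks[(dataset_id, partition)] = list(rows)

    def cached_rows(self):
        # Rough stand-in for storage in use.
        return sum(len(rows) for rows in self._blocks.values())

    def unpersist(self, dataset_id):
        # Drop every block that belongs to the given dataset.
        for key in [k for k in self._blocks if k[0] == dataset_id]:
            del self._blocks[key]


store = BlockStore()
store.put("events", 0, range(100))
store.put("events", 1, range(100))
store.put("users", 0, range(10))

rows_before = store.cached_rows()
store.unpersist("events")
rows_after = store.cached_rows()
```

Only the blocks of the named dataset are released; the unrelated "users" block stays cached.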
Correct Answer: C. The framework's logical and physical query optimizer
Explanation:
Catalyst applies analysis and optimization rules to query plans.
It then selects a physical plan that can be executed.
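A very small sketch of the two ideas, rule-based rewriting of a logical expression and a size-based choice of physical strategy. The tuple-based expression format, the function names, and the row-count threshold are all assumptions for illustration; Catalyst's real rules and cost model are far richer.

```python
def fold_constants(expr):
    # Expressions are tuples: ("lit", value), ("col", name),
    # or (op, left, right) with op in {"add", "mul"}.
    if expr[0] in ("lit", "col"):
        return expr
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if left[0] == "lit" and right[0] == "lit":
        # Both sides are known, so evaluate once at planning time.
        if op == "add":
            return ("lit", left[1] + right[1])
        if op == "mul":
            return ("lit", left[1] * right[1])
    return (op, left, right)


def choose_join_strategy(left_rows, right_rows, broadcast_threshold=1000):
    # Hypothetical rule: broadcast the smaller side if it is small enough.
    if min(left_rows, right_rows) <= broadcast_threshold:
        return "broadcast_hash_join"
    return "sort_merge_join"


logical = ("add", ("col", "x"), ("mul", ("lit", 2), ("lit", 3)))
optimized = fold_constants(logical)
strategy = choose_join_strategy(5_000_000, 200)
```

The rewritten expression is `("add", ("col", "x"), ("lit", 6))`, and the small right-hand side leads to the broadcast strategy.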
Correct Answer: D. Defining combinations of hyperparameter values to evaluate
Explanation:
The grid supplies candidate parameter maps to validation tools.
Large grids can be computationally expensive.
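How a grid expands into candidate parameter maps, and why its size grows quickly, can be sketched with itertools.product. `build_param_grid` is a hypothetical helper, and the parameter names and values are illustrative only.

```python
from itertools import product


def build_param_grid(options):
    # options maps each parameter name to the list of values to try.
    names = list(options)
    combos = product(*(options[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


grid = build_param_grid({
    "regParam": [0.01, 0.1],
    "maxIter": [10, 50, 100],
})
grid_size = len(grid)
```

Two values times three values gives six candidates, and each one typically means fitting at least one model; adding a third parameter with four values would raise that to 24.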