Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: D. A placeholder for a result that may become available asynchronously
Explanation:
Futures allow nonblocking composition of asynchronous computations.
An execution context schedules the underlying work.
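Python's asyncio futures are not Scala futures, but they model the same idea: a placeholder filled in later. The sketch below composes a new future from an existing one without the caller waiting, by registering a callback. The helper map_future and the coroutine slow_sum are hypothetical names; Scala's Future provides map directly, whereas here the composition is built by hand with add_done_callback, and the asyncio event loop plays the scheduling role an execution context plays in Scala.

```python
import asyncio


def map_future(fut, fn):
    """Return a new future that will hold fn(result of fut); the caller never waits."""
    out = asyncio.get_running_loop().create_future()

    def on_done(src):
        # Runs on the event loop once src settles.
        if out.done():
            return  # e.g. the derived future was already cancelled
        if src.cancelled():
            out.cancel()
        elif src.exception() is not None:
            out.set_exception(src.exception())  # propagate the failure
        else:
            try:
                out.set_result(fn(src.result()))
            except Exception as exc:
                out.set_exception(exc)

    fut.add_done_callback(on_done)
    return out


async def slow_sum(n):
    await asyncio.sleep(0.01)  # stands in for I/O or a long computation
    return sum(range(n))


async def main():
    placeholder = asyncio.create_task(slow_sum(10))     # value not available yet
    doubled = map_future(placeholder, lambda x: x * 2)  # composed without waiting
    return await doubled


print(asyncio.run(main()))  # 90
```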
Correct Answer: D. Spark delays transformations until an action is requested
Explanation:
Laziness allows Spark to optimize and pipeline operations.
Transformation chains that never reach an action perform no work.
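A toy model can make laziness visible. The class LazyDataset below is hypothetical, not Spark's API: map and filter only record steps, and the collect action finally pushes each element through the whole recorded chain, one element at a time, which is a simple form of pipelining. A call log shows that no real work happens before the action.

```python
class LazyDataset:
    """Toy model (not Spark's API): transformations are recorded, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # recorded (kind, function) pairs

    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now does each element flow through every recorded step.
        out = []
        for x in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    x = fn(x)
                elif not fn(x):
                    keep = False
                    break
            if keep:
                out.append(x)
        return out


calls = []


def square(x):
    calls.append(x)  # record every time real work happens
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(len(calls))    # 0: no transformation has run yet
print(ds.collect())  # [0, 4, 16]
print(len(calls))    # 5: work happened only when the action ran
```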
Correct Answer: D. Returns all RDD elements to the driver
Explanation:
collect is safe only when the complete result fits in driver memory.
Large results can crash or overload the driver.
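The risk with collect is that every partition's data ends up in a single process. The toy function below, toy_collect, is hypothetical: it gathers all elements into one list and refuses once a made-up item limit is exceeded. Spark itself has a real driver-side safeguard, the spark.driver.maxResultSize setting, which limits the total serialized size of results rather than an item count.

```python
def toy_collect(partitions, max_result_items):
    """Toy model of collect(): copy every element of every partition into one list.

    max_result_items is a hypothetical stand-in for a driver-side size limit.
    """
    result = []
    for part in partitions:
        for item in part:
            result.append(item)
            if len(result) > max_result_items:
                raise MemoryError(
                    f"collect would return more than {max_result_items} items to the driver"
                )
    return result


small = [[1, 2], [3, 4], [5]]
print(toy_collect(small, max_result_items=10))  # [1, 2, 3, 4, 5]

big = [range(1_000_000) for _ in range(4)]
try:
    toy_collect(big, max_result_items=100_000)
except MemoryError as err:
    print("refused:", err)
```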
Correct Answer: D. Keeps an RDD using a selected storage level for reuse
Explanation:
Persistence avoids recomputing reused RDDs across actions.
Storage levels can keep data in memory, on disk, or both, in deserialized or serialized form.
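The benefit of persistence shows up when two actions need the same data. In the sketch below, the level names are borrowed from Spark's StorageLevel constants, but the ToyRDD class, its count and total methods, and the recomputation counter are hypothetical. The toy treats the level only as a label and always keeps data in a Python list.

```python
class ToyRDD:
    """Toy stand-in for an RDD (not Spark's API) that counts recomputations."""

    LEVELS = {"MEMORY_ONLY", "MEMORY_AND_DISK", "DISK_ONLY"}

    def __init__(self, compute):
        self._compute = compute   # function that regenerates the data
        self._level = None        # None means "not persisted"
        self._stored = None
        self.computations = 0

    def persist(self, level="MEMORY_ONLY"):
        if level not in self.LEVELS:
            raise ValueError(f"unknown storage level: {level}")
        # In this toy the level is only a label; data is always kept in a list.
        self._level = level
        return self

    def _materialize(self):
        if self._stored is not None:
            return self._stored
        self.computations += 1
        data = list(self._compute())
        if self._level is not None:
            self._stored = data
        return data

    # Two "actions" that both need the full data.
    def count(self):
        return len(self._materialize())

    def total(self):
        return sum(self._materialize())


plain = ToyRDD(lambda: range(1, 5))
print(plain.count(), plain.total(), plain.computations)  # 4 10 2

kept = ToyRDD(lambda: range(1, 5)).persist("MEMORY_AND_DISK")
print(kept.count(), kept.total(), kept.computations)     # 4 10 1
```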
Correct Answer: D. Projects specified columns or expressions
Explanation:
Projection chooses and computes output columns.
It is analogous to the SELECT list in SQL.
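Projection can be sketched over plain Python rows. The function select below is hypothetical: each keyword names an output column, and its value is either an input column to copy or a function that computes a new value, much like SELECT name, age + 1 AS age_next_year in SQL. In PySpark a comparable projection could be written as df.select("name", (df.age + 1).alias("age_next_year")).

```python
def select(rows, **columns):
    """Toy projection: keep only the named output columns for every row.

    Each value is either an input column name (copied as-is) or a function
    that computes the output value from the whole row.
    """
    out = []
    for row in rows:
        new = {}
        for name, expr in columns.items():
            new[name] = expr(row) if callable(expr) else row[expr]
        out.append(new)
    return out


people = [
    {"name": "Ana", "age": 34, "city": "Oslo"},
    {"name": "Ben", "age": 27, "city": "Rome"},
]
projected = select(people, name="name", age_next_year=lambda r: r["age"] + 1)
print(projected)
# [{'name': 'Ana', 'age_next_year': 35}, {'name': 'Ben', 'age_next_year': 28}]
```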
Correct Answer: A. Mapping categorical string labels to numeric indices
Explanation:
Many machine-learning algorithms require numeric representations.
The fitted indexer applies the same label-to-index mapping to every dataset it transforms.
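The fit-then-transform pattern can be sketched in plain Python. The functions fit_string_indexer and apply_indexer are hypothetical. They loosely mirror the default behaviour of Spark's StringIndexer: the most frequent label gets index 0, ties are broken alphabetically, indices are floating-point values, and an unseen label is treated as an error (here, a KeyError).

```python
from collections import Counter


def fit_string_indexer(labels):
    """Toy fit: order labels by descending frequency, then alphabetically."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {label: float(i) for i, label in enumerate(ordered)}


def apply_indexer(mapping, labels):
    # Reusing the fitted mapping keeps indices consistent across datasets;
    # an unseen label raises KeyError in this sketch.
    return [mapping[label] for label in labels]


train = ["red", "blue", "red", "green", "blue", "red"]
mapping = fit_string_indexer(train)
print(mapping)                                   # {'red': 0.0, 'blue': 1.0, 'green': 2.0}
print(apply_indexer(mapping, ["green", "red"]))  # [2.0, 0.0]
```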
Correct Answer: A. Scheduling asynchronous tasks such as Future computations
Explanation:
An ExecutionContext provides threads or an execution service.
Poorly chosen contexts can cause starvation or excessive concurrency.
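Python has no ExecutionContext, but concurrent.futures.ThreadPoolExecutor plays a loosely comparable role: it supplies the threads that run submitted work. The hypothetical helper run_bounded below submits several tasks and records the peak number running at once, showing how max_workers caps concurrency. A pool that is too small for tasks that wait on each other can starve, while an oversized one can oversubscribe the machine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_bounded(num_tasks, max_workers):
    """Submit num_tasks jobs to a pool and report results plus peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def task(i):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.02)  # simulated work
        with lock:
            state["active"] -= 1
        return i * i

    # The executor owns the threads and decides when each task runs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(task, range(num_tasks)))
    return results, state["peak"]


results, peak = run_bounded(6, max_workers=2)
print(results)    # [0, 1, 4, 9, 16, 25]
print(peak <= 2)  # True: the pool never ran more than two tasks at once
```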
Correct Answer: A. The sequence of transformations used to derive an RDD
Explanation:
Lineage is a logical dependency graph.
Spark uses it for planning and fault recovery.
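A toy lineage graph shows why recorded transformations are enough for recovery: any result can be rebuilt by replaying the steps from the source. The Dataset class and its derive, lineage, and compute methods are hypothetical; in Spark, the real RDD.toDebugString method prints an RDD's lineage. Each recorded function here takes a whole list and returns a new list.

```python
class Dataset:
    """Toy lineage node: remembers its parent and the step that produced it."""

    def __init__(self, name, parent=None, fn=None, data=None):
        self.name, self.parent, self.fn, self.data = name, parent, fn, data

    def derive(self, name, fn):
        return Dataset(name, parent=self, fn=fn)

    def lineage(self):
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))  # source first

    def compute(self):
        # Rebuild from the source by replaying each recorded step; this is
        # what lets lost results be recovered without replicating the data.
        if self.parent is None:
            return list(self.data)
        return self.fn(self.parent.compute())


source = Dataset("textFile", data=["a b", "b c", "c"])
words = source.derive("flatMap", lambda lines: [w for line in lines for w in line.split()])
upper = words.derive("map", lambda ws: [w.upper() for w in ws])
print(upper.lineage())  # ['textFile', 'flatMap', 'map']
print(upper.compute())  # ['A', 'B', 'B', 'C', 'C']
```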
Correct Answer: A. It returns only a limited number of elements to the driver
Explanation:
take limits driver-side data volume.
It is useful for checking records without retrieving the whole RDD.
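The difference from collect is that take can stop early. In the sketch below, toy_take and make_partition are hypothetical; partitions are produced lazily, and a log records which ones were actually read. Spark's own take works in a roughly similar spirit, scanning a few partitions first and reading more only if it has not yet found enough elements.

```python
from itertools import islice

touched = []  # which partitions were actually read


def make_partition(i, size=1000):
    touched.append(i)
    return range(i * size, (i + 1) * size)


def toy_take(partitions, n):
    """Toy model of take(n): read elements in order and stop after n of them."""
    def elements():
        for part in partitions:
            yield from part
    return list(islice(elements(), n))


partitions = (make_partition(i) for i in range(100))  # built only on demand
print(toy_take(partitions, 3))  # [0, 1, 2]
print(touched)                  # [0]: the other 99 partitions were never read
```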
Correct Answer: A. Persists an RDD using the default storage level
Explanation:
cache is shorthand for persist with the default storage level, which for RDDs is MEMORY_ONLY.
Actual materialization occurs when an action evaluates the RDD.
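The point that caching is itself lazy can be shown with a toy. The class CachedToyRDD is hypothetical: cache only records a storage level, and the first collect both computes and stores the data, so the second collect reuses it. The default level string follows Spark's RDD default, but everything else is illustrative.

```python
class CachedToyRDD:
    """Toy model (not Spark's API): cache() marks data for reuse but computes nothing."""

    def __init__(self, compute):
        self._compute = compute
        self.level = None
        self._store = None
        self.computations = 0

    def persist(self, level="MEMORY_ONLY"):
        self.level = level
        return self

    def cache(self):
        # Shorthand for persist() with the default level
        # (MEMORY_ONLY is Spark's default for RDDs).
        return self.persist()

    def collect(self):
        # The first action after cache() both computes and stores the data.
        if self._store is not None:
            return list(self._store)
        self.computations += 1
        data = list(self._compute())
        if self.level is not None:
            self._store = data
        return list(data)


rdd = CachedToyRDD(lambda: [x * 10 for x in range(3)]).cache()
print(rdd.computations)  # 0: cache() alone did no work
print(rdd.collect())     # [0, 10, 20]
print(rdd.collect())     # [0, 10, 20]
print(rdd.computations)  # 1: the second action reused the cached data
```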
Correct Answer: A. Retains rows satisfying a Boolean condition
Explanation:
Filtering reduces the row set.
Spark may push supported predicates down to the data source.
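Predicate pushdown changes where a filter runs, not which rows survive. The ToySource class below is a hypothetical data source that can apply an equality predicate while reading; the sketch compares how many rows it hands to the engine with and without pushdown, and checks that the retained rows are identical.

```python
class ToySource:
    """Hypothetical data source that can apply an equality predicate while reading."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_emitted = 0  # rows handed to the engine

    def scan(self, pushed=None):
        for row in self.rows:
            if pushed is not None:
                column, value = pushed
                if row[column] != value:
                    continue  # dropped inside the source, never emitted
            self.rows_emitted += 1
            yield row


rows = [{"city": c, "n": i} for i, c in enumerate(["Oslo", "Rome", "Oslo", "Lima"])]

# Without pushdown: read everything, then filter in the engine.
no_push = ToySource(rows)
kept_a = [r for r in no_push.scan() if r["city"] == "Oslo"]

# With pushdown: the source applies the predicate itself.
push = ToySource(rows)
kept_b = list(push.scan(pushed=("city", "Oslo")))

print(kept_a == kept_b)                         # True: same rows retained
print(no_push.rows_emitted, push.rows_emitted)  # 4 2
```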
Correct Answer: B. A sparse binary vector representing categorical levels
Explanation:
One-hot encoding avoids imposing arbitrary numeric order on categories.
Sparse storage is efficient when there are many levels.
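A one-hot vector has a single non-zero entry, so storing only its position is cheap. The hypothetical function one_hot below returns a (size, indices, values) triple, the same layout used by PySpark's SparseVector constructor. Its drop_last option follows Spark's OneHotEncoder, whose dropLast parameter is true by default, so the last category becomes the all-zeros vector.

```python
def one_hot(index, num_categories, drop_last=True):
    """Toy one-hot encoder returning a sparse (size, indices, values) triple.

    With drop_last=True, the last category maps to the all-zeros vector, so
    only num_categories - 1 slots are needed.
    """
    if not 0 <= index < num_categories:
        raise ValueError(f"index {index} outside 0..{num_categories - 1}")
    size = num_categories - 1 if drop_last else num_categories
    if index < size:
        return (size, [index], [1.0])  # a single 1.0 at the category's slot
    return (size, [], [])              # dropped last category: no non-zeros


# Indices as a string indexer might assign them: red=0, blue=1, green=2.
for label, idx in [("red", 0), ("blue", 1), ("green", 2)]:
    print(label, one_hot(idx, 3))
# red (2, [0], [1.0])
# blue (2, [1], [1.0])
# green (2, [], [])
print(one_hot(2, 3, drop_last=False))  # (3, [2], [1.0])
```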