Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Question 1.
A. A collection containing exactly two records
B. A computation with one of two possible result types, often error or success
C. A Boolean feature
D. A pair of HDFS replicas
Correct Answer: B. A computation with one of two possible result types, often error or success
Explanation:
By convention, Left holds an error and Right holds a success value.
Either carries error information as an ordinary return value instead of throwing an exception.
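A minimal Scala sketch of this convention follows. It assumes Scala 2.13 (for `toIntOption`), and `parseAge` is a hypothetical helper. Since Scala 2.12, `Either` is right-biased, so `map` transforms only a `Right` and passes a `Left` through unchanged.

```scala
// Hypothetical helper: Left carries an error message, Right carries the parsed value.
def parseAge(s: String): Either[String, Int] =
  s.toIntOption match {
    case None             => Left(s"not a number: $s")
    case Some(n) if n < 0 => Left(s"negative age: $n")
    case Some(n)          => Right(n)
  }

// Right-biased: map only touches the Right case.
val nextYear: Either[String, Int] = parseAge("41").map(_ + 1)  // Right(42)
val failed: Either[String, Int]   = parseAge("abc").map(_ + 1) // Left("not a number: abc")

// fold handles both cases explicitly, with no exception thrown anywhere.
val message: String = failed.fold(err => s"error: $err", age => s"age: $age")
```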
Question 2.
A. An operation that always returns a driver-side value
B. A lazy operation that defines a new RDD
C. A command that starts a YARN cluster
D. A method that changes an RDD in place
Correct Answer: B. A lazy operation that defines a new RDD
Explanation:
Transformations such as map and filter add steps to the lineage graph.
They are not executed until an action requires results.
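The following sketch shows Spark building lineage without running a job. It assumes Spark 3.x with Scala 2.13 on the classpath and runs in local mode as a script (for example with scala-cli). The application name is arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("lazy-transformations").getOrCreate()
val sc = spark.sparkContext

val numbers = sc.parallelize(1 to 10)

// Transformations: each call returns a new RDD and only extends the lineage.
val evens   = numbers.filter(_ % 2 == 0)
val squares = evens.map(n => n * n)

// toDebugString describes the recorded lineage; no job has run yet.
println(squares.toDebugString)

// collect is an action, so this line triggers the actual computation.
val result = squares.collect().toList // List(4, 16, 36, 64, 100)

spark.stop()
```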
Question 3.
A. Returns all elements including duplicates
B. Returns elements present in both RDDs
C. Pairs records by key
D. Subtracts numeric values
Correct Answer: B. Returns elements present in both RDDs
Explanation:
intersection returns the elements that appear in both RDDs. The result contains no duplicates, even if the inputs do.
It requires a shuffle so that equal values from different partitions can be compared.
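Here is a small local-mode sketch of `RDD.intersection` that shows the de-duplicated result. It assumes Spark 3.x with Scala 2.13 on the classpath. The input values are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("intersection-demo").getOrCreate()
val sc = spark.sparkContext

val a = sc.parallelize(Seq(1, 2, 2, 3, 4))
val b = sc.parallelize(Seq(2, 2, 4, 5))

// intersection shuffles both sides so equal values meet in the same partition;
// the output has no duplicates even though 2 appears twice in each input.
val common = a.intersection(b).collect().sorted.toList // List(2, 4)

spark.stop()
```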
Question 4.
A. A mutable value executors can reliably read and update for program logic
B. A variable executors can add to for aggregated metrics
C. A cached RDD partition
D. A SQL table
Correct Answer: B. A variable executors can add to for aggregated metrics
Explanation:
Accumulators are useful for counters and diagnostics: executors add to them, and the driver reads the total.
Their updates should not determine core computation, because task retries or re-executed stages can apply an update more than once.
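The sketch below uses an accumulator as a diagnostic counter for unparseable records. It assumes Spark 3.x with Scala 2.13 in local mode. The accumulator name `badRecords` and the sample data are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("accumulator-demo").getOrCreate()
val sc = spark.sparkContext

// Executors can only add to a LongAccumulator; the driver reads its value.
val badRecords = sc.longAccumulator("badRecords")

val lines = sc.parallelize(Seq("1", "2", "x", "4", "y"))
val parsed = lines.flatMap { s =>
  val n = Try(s.toInt).toOption
  if (n.isEmpty) badRecords.add(1) // diagnostic counter only
  n
}

// The action runs the job; the driver reads the accumulator afterwards.
val total = parsed.sum()              // 1 + 2 + 4 = 7.0
val badCount: Long = badRecords.value // 2 after this single action

spark.stop()
```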
Question 5.
A. It disables query optimization
B. It avoids an inference pass and prevents incorrect type guesses
C. It forces every field to be nullable
D. It collects the file to the driver
Correct Answer: B. It avoids an inference pass and prevents incorrect type guesses
Explanation:
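Inference may require an extra pass over the data and can misread unusual values, for example treating zip codes like 02134 as integers and dropping the leading zero.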
An explicit schema gives predictable types and lower startup cost.
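This sketch reads a CSV file with an explicit schema so that a zip-code column keeps its leading zeros. It assumes Spark 3.x with Scala 2.13 in local mode. The two-column file is a hypothetical example written to a temporary path.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("schema-demo").getOrCreate()

// Hypothetical input: an id column and a zip column with leading zeros.
val path = Files.createTempFile("people", ".csv")
Files.write(path, "id,zip\n1,02134\n2,10001\n".getBytes("UTF-8"))

// Declaring zip as a string keeps "02134" intact; no inference pass is needed.
val schema = StructType(Seq(
  StructField("id", IntegerType),
  StructField("zip", StringType)
))

val df = spark.read.option("header", "true").schema(schema).csv(path.toString)
val zipType = df.schema("zip").dataType                       // StringType
val zips = df.select("zip").collect().map(_.getString(0)).toList // List("02134", "10001")

spark.stop()
```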
Question 6.
A. Splits one vector into separate files
B. Assigns class labels
C. Combines multiple input columns into one feature vector
D. Creates YARN containers
Correct Answer: C. Combines multiple input columns into one feature vector
Explanation:
Many MLlib algorithms expect a single vector-valued features column.
VectorAssembler constructs it from selected fields.
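A short sketch of `VectorAssembler` building a `features` column follows. It assumes Spark 3.x with Scala 2.13 and the spark-mllib module on the classpath. The column names `age`, `income`, and `visits` are hypothetical.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x with spark-mllib on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("assembler-demo").getOrCreate()
import spark.implicits._

// Hypothetical numeric input columns.
val df = Seq((25.0, 50000.0, 3.0), (40.0, 72000.0, 1.0)).toDF("age", "income", "visits")

// Combine the selected columns into a single vector-valued column.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "income", "visits"))
  .setOutputCol("features")

val assembled = assembler.transform(df)
val firstFeatures = assembled.select("features").head().getAs[Vector](0).toArray.toList

spark.stop()
```

Downstream estimators can then read the `features` column directly instead of the individual input columns.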
Question 7.
A. Repeating a computation forever
B. Creating a YARN task attempt
C. Representing a computation that may succeed or throw an exception
D. Testing an association rule
Correct Answer: C. Representing a computation that may succeed or throw an exception
Explanation:
Try wraps the outcome as Success(value) or Failure(exception), capturing non-fatal exceptions instead of letting them propagate.
Functional operations can then handle failures explicitly.
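A minimal Scala sketch of `Try` follows. `parsePort` and `describe` are hypothetical helpers. `Try.apply` catches non-fatal exceptions, and `toEither` (available since Scala 2.12) converts the result to an `Either`.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper: toInt may throw NumberFormatException, which Try captures.
def parsePort(s: String): Try[Int] = Try(s.trim.toInt)

val ok  = parsePort(" 8080 ") // Success(8080)
val bad = parsePort("eighty") // Failure(NumberFormatException)

// Pattern matching handles both outcomes explicitly.
def describe(t: Try[Int]): String = t match {
  case Success(port) => s"port $port"
  case Failure(e)    => s"invalid: ${e.getClass.getSimpleName}"
}

// Functional combinators keep composing without try/catch blocks.
val withDefault: Int = bad.getOrElse(80)
val asEither: Either[Throwable, Int] = ok.toEither
```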
Question 8.
A. A lazy definition with no computation
B. A Scala class declaration
C. An operation that triggers execution and returns or writes a result
D. An HDFS replication command
Correct Answer: C. An operation that triggers execution and returns or writes a result
Explanation:
Actions include count, collect, reduce, and save operations such as saveAsTextFile.
They cause Spark to evaluate the necessary lineage.
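The sketch below runs several actions over one lazily defined RDD. It assumes Spark 3.x with Scala 2.13 in local mode. The output directory is a hypothetical temporary path, because `saveAsTextFile` fails if the target already exists.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("actions-demo").getOrCreate()
val sc = spark.sparkContext

val words = sc.parallelize(Seq("spark", "rdd", "action", "lazy"))
val lengths = words.map(_.length) // transformation: nothing runs yet

// Each action below triggers a job that evaluates the lineage.
val n      = lengths.count()          // 4
val total  = lengths.reduce(_ + _)    // 5 + 3 + 6 + 4 = 18
val asList = lengths.collect().toList // List(5, 3, 6, 4)

// A save action writes results to storage instead of returning them to the driver.
val outDir = Files.createTempDirectory("lengths").resolve("out").toString
lengths.saveAsTextFile(outDir)

spark.stop()
```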
Question 9.
A. Computes numeric subtraction for paired records
B. Combines every key
C. Returns elements of one RDD that are not present in another
D. Returns only shared elements
Correct Answer: C. Returns elements of one RDD that are not present in another
Explanation:
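subtract performs a set-difference-style operation and generally requires a shuffle.
Unlike a strict set difference, RDD subtract keeps duplicate elements from the first RDD, while DataFrame except removes duplicates, like SQL EXCEPT DISTINCT.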
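This sketch compares `RDD.subtract` with `Dataset.except` on the same values. It assumes Spark 3.x with Scala 2.13 in local mode. The sample values are illustrative.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("subtract-demo").getOrCreate()
val sc = spark.sparkContext
import spark.implicits._

val a = sc.parallelize(Seq(1, 1, 2, 3, 3))
val b = sc.parallelize(Seq(3, 4))

// RDD subtract: every element of `a` whose value is absent from `b` is kept, repeats included.
val rddDiff = a.subtract(b).collect().sorted.toList // List(1, 1, 2)

// DataFrame except follows SQL EXCEPT DISTINCT, so duplicates are removed.
val dfDiff = a.toDF("n").except(b.toDF("n")).as[Int].collect().sorted.toList // List(1, 2)

spark.stop()
```

When duplicates must survive on the DataFrame side, `exceptAll` is the multiset variant.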
Question 10.
A. Accumulators cannot contain numbers
B. Executors cannot access them
C. Task retries or speculation can cause update behavior that is unsuitable for exact logic
D. They always reset after each record
Correct Answer: C. Task retries or speculation can cause update behavior that is unsuitable for exact logic
Explanation:
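Accumulators are best for observation rather than dataflow decisions; Spark guarantees that each task's update is applied only once only for accumulators updated inside actions.
Any value the program depends on should be computed as part of the RDD result itself.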
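The sketch below shows why an accumulator updated inside a transformation is unreliable for exact logic. Running two actions on an uncached RDD recomputes the transformation, so the counter is incremented twice. A count derived from the data stays stable. It assumes Spark 3.x with Scala 2.13 in local mode. The sample data are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("accumulator-pitfall").getOrCreate()
val sc = spark.sparkContext

val bad = sc.longAccumulator("bad")
val raw = sc.parallelize(Seq("1", "x", "3", "y"))

// Update inside a transformation, and the RDD is not cached.
val parsed = raw.flatMap { s =>
  val n = Try(s.toInt).toOption
  if (n.isEmpty) bad.add(1)
  n
}

parsed.count() // first job: bad is now 2
parsed.count() // second job recomputes flatMap: bad is now 4
val accumulatorCount: Long = bad.value

// Encoding the count in the RDD result gives the same answer on every evaluation.
val badCount = raw.filter(s => Try(s.toInt).isFailure).count() // 2

spark.stop()
```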
Question 11.
A. A permanently replicated database
B. A cached RDD on every cluster
C. A session-scoped table-like name associated with a DataFrame
D. A YARN application report
Correct Answer: C. A session-scoped table-like name associated with a DataFrame
Explanation:
Temporary views allow SQL queries over DataFrame data.
They normally disappear when the Spark session ends.
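A short sketch of registering and querying a temporary view follows, along with a check that the view is scoped to its session. It assumes Spark 3.x with Scala 2.13 in local mode. The view name `sales` and its data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 3.x on the classpath, running in local mode.
val spark = SparkSession.builder().master("local[2]").appName("temp-view-demo").getOrCreate()
import spark.implicits._

val sales = Seq(("north", 120), ("south", 80), ("north", 30)).toDF("region", "amount")

// Registers a session-scoped name for the DataFrame; no data is copied or persisted.
sales.createOrReplaceTempView("sales")

// SUM over an integer column yields a bigint, hence getLong.
val totals = spark.sql(
  "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).collect().map(r => (r.getString(0), r.getLong(1))).toList

// A separate session on the same SparkContext does not see this view.
val visibleInOwnSession = spark.catalog.tableExists("sales")              // true
val visibleElsewhere    = spark.newSession().catalog.tableExists("sales") // false

spark.stop()
```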
Question 12.
A. Scaling adds more training examples
B. It guarantees perfect accuracy
C. It removes all categorical variables
D. Features with large numeric ranges can dominate distances or optimization
Correct Answer: D. Features with large numeric ranges can dominate distances or optimization
Explanation:
Comparable scales improve numerical behavior and feature balance.
Tree-based algorithms are generally less sensitive to scaling.
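The effect on distance-based methods can be shown in plain Scala with min-max scaling, which maps each column to [0, 1]. The points (age in years, income in dollars) are hypothetical. Spark MLlib offers `MinMaxScaler` and `StandardScaler` for DataFrame columns, but this sketch does not use Spark. Without scaling, the nearest neighbour of the first point is chosen almost entirely by income.

```scala
// Hypothetical points: (age in years, income in dollars).
val points: Vector[Vector[Double]] = Vector(
  Vector(20.0, 50000.0), // query point
  Vector(60.0, 50500.0), // very different age, similar income
  Vector(21.0, 52000.0), // similar age, somewhat different income
  Vector(40.0, 90000.0),
  Vector(40.0, 10000.0)
)

def euclidean(a: Vector[Double], b: Vector[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

// Min-max scaling per column; assumes no column is constant (max > min).
def minMaxScale(rows: Vector[Vector[Double]]): Vector[Vector[Double]] = {
  val cols = rows.transpose
  val mins = cols.map(_.min)
  val maxs = cols.map(_.max)
  rows.map(row => row.indices.map(j => (row(j) - mins(j)) / (maxs(j) - mins(j))).toVector)
}

// Index of the point closest to rows(0), excluding rows(0) itself.
def nearestToFirst(rows: Vector[Vector[Double]]): Int =
  (1 until rows.length).minBy(i => euclidean(rows(0), rows(i)))

val rawNearest    = nearestToFirst(points)              // 1: income differences dominate
val scaledNearest = nearestToFirst(minMaxScale(points)) // 2: age now contributes comparably
```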