MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Choose an option to check your answer.
A. A fixed-size chunk into which a file is divided
B. A complete database row
C. A YARN container
D. A Scala object
Correct Answer: A. A fixed-size chunk into which a file is divided
Explanation:
Files are split into blocks that can be stored on different DataNodes.
The final block may be smaller than the configured block size.
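The splitting rule in this explanation can be sketched as a small calculation. The function name `split_into_blocks` and the 300 MB / 128 MB figures are illustrative assumptions, not values taken from the quiz.

```python
def split_into_blocks(file_size, block_size):
    """Return the size in bytes of each block for a file of file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the final block holds only the leftover bytes
    return sizes


MB = 1024 * 1024
# Assumed example: a 300 MB file with a 128 MB block size.
blocks = split_into_blocks(300 * MB, 128 * MB)
print([size // MB for size in blocks])  # [128, 128, 44]
```

Only the last block is shorter than the configured size; every other block is full.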
Choose an option to check your answer.
A. Obtaining block locations from the NameNode and reading data directly from DataNodes
B. Reading all bytes through the NameNode
C. Allocating YARN containers
D. Compiling the mapper code
Correct Answer: A. Obtaining block locations from the NameNode and reading data directly from DataNodes
Explanation:
The NameNode supplies metadata rather than streaming user data.
The client contacts suitable DataNodes for each block.
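A toy model may make the division of labor clearer. The classes `ToyNameNode` and `ToyDataNode`, the path, and the block IDs below are hypothetical and do not correspond to the real Hadoop client API. The sketch only shows that the metadata lookup and the byte transfer go to different components.

```python
class ToyNameNode:
    """Metadata only: maps a path to its blocks and the DataNodes holding them."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, [datanode names])

    def get_block_locations(self, path):
        return self.block_map[path]


class ToyDataNode:
    """Stores the actual block bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, namenode, datanodes):
    """Ask the NameNode where blocks live, then read bytes from DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        chosen = datanodes[holders[0]]  # toy choice: first listed replica
        data += chosen.read_block(block_id)
    return data


dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"

namenode = ToyNameNode()
namenode.block_map["/logs/a.txt"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]

content = read_file("/logs/a.txt", namenode, {"dn1": dn1, "dn2": dn2})
print(content)  # b'hello world'
```

Note that `ToyNameNode` never touches block contents, which mirrors the point that the NameNode is not in the data path.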
Choose an option to check your answer.
A. The framework does not guarantee that a combiner will run
B. The combiner always runs after the reducer
C. Combiners cannot emit keys
D. Combiners operate on final output only
Correct Answer: A. The framework does not guarantee that a combiner will run
Explanation:
A combiner is an optimization rather than a required semantic stage.
Job correctness must not depend on its execution.
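One way to check this property is to run the same job with and without the combiner and compare results. The following is a minimal in-memory sketch of a word count, not Hadoop code; the function names and the sample input are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_words(line):
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Optional local pre-aggregation of one mapper's output."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_counts(all_pairs):
    totals = defaultdict(int)
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


def run_job(splits, use_combiner):
    shuffled = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)  # fewer records cross the "network"
        shuffled.extend(pairs)
    return reduce_counts(shuffled)


splits = [["a b a", "b a"], ["a c"]]
with_combiner = run_job(splits, use_combiner=True)
without_combiner = run_job(splits, use_combiner=False)
print(with_combiner == without_combiner)  # True
```

The combiner only changes how many intermediate records are shuffled, not the final counts, so the job stays correct if the framework skips it.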
Choose an option to check your answer.
A. Adding an extra value to a hot key to spread its records across partitions
B. Encrypting keys for security
C. Sorting keys by length
D. Removing repeated keys
Correct Answer: A. Adding an extra value to a hot key to spread its records across partitions
Explanation:
Salting creates multiple temporary versions of a frequent key.
A later stage combines the partial results.
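The two stages can be sketched in plain Python. The key names, the number of salts, and the helper `salt` are illustrative assumptions. A real job would apply the same idea in the mapper and in a follow-up aggregation.

```python
import random
from collections import Counter

HOT_KEYS = {"hot"}   # assumed to be known in advance, e.g. from sampling
NUM_SALTS = 4        # assumed number of temporary versions per hot key


def salt(key, rng):
    """Return (key, salt). Only hot keys get a random salt."""
    if key in HOT_KEYS:
        return (key, rng.randrange(NUM_SALTS))
    return (key, 0)


records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
rng = random.Random(0)

# Stage 1: aggregate per salted key, so the hot key is spread over several groups.
stage1 = Counter()
for key, value in records:
    stage1[salt(key, rng)] += value

# Stage 2: drop the salt and merge the partial results.
stage2 = Counter()
for (key, _salt), partial in stage1.items():
    stage2[key] += partial

print(dict(stage2))  # {'hot': 1000, 'cold': 10}
```

This works here because the aggregation is a sum, which can be merged from partial results.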
Choose an option to check your answer.
A. A key made from multiple fields
B. A key shared by all records
C. An encrypted HDFS path
D. A reducer identifier
Correct Answer: A. A key made from multiple fields
Explanation:
Composite keys support partitioning, grouping, and secondary sort logic.
For example, they may contain customer ID and timestamp.
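The customer ID and timestamp example could look like the sketch below. `EventKey` and `partition` are hypothetical names, and CRC32 is used only as a stable stand-in for a partitioner's hash function.

```python
import zlib
from typing import NamedTuple


class EventKey(NamedTuple):
    """Composite key: sorts by customer_id first, then by timestamp."""
    customer_id: str
    timestamp: int


def partition(key, num_partitions):
    # Partition on customer_id alone so all events of one customer
    # land in the same partition, whatever their timestamps are.
    return zlib.crc32(key.customer_id.encode()) % num_partitions


keys = [EventKey("c2", 30), EventKey("c1", 20), EventKey("c1", 10)]
ordered = sorted(keys)  # tuple comparison uses both fields
print(ordered[0])  # EventKey(customer_id='c1', timestamp=10)
```

Sorting uses the whole key, while partitioning deliberately looks at only one of its fields.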
Choose an option to check your answer.
A. Splitting a record into columns
B. Storing multiple copies of data on different nodes
C. Sorting all data globally
D. Replacing raw data with summaries
Correct Answer: B. Storing multiple copies of data on different nodes
Explanation:
Replication improves availability and fault tolerance.
If one node fails, another copy can still be used.
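The failover behavior can be illustrated with a short sketch. The node names, the block ID, and the function `pick_replica` are assumptions for illustration and not part of any Hadoop API.

```python
def pick_replica(block_id, replicas, live_nodes):
    """Return the first live node that holds block_id, or raise if none remain."""
    for node in replicas[block_id]:
        if node in live_nodes:
            return node
    raise IOError(f"all replicas of {block_id} are unavailable")


replicas = {"blk_1": ["dn1", "dn2", "dn3"]}
live = {"dn1", "dn2", "dn3"}

before_failure = pick_replica("blk_1", replicas, live)
live.discard("dn1")  # simulate the failure of one node
after_failure = pick_replica("blk_1", replicas, live)

print(before_failure, after_failure)  # dn1 dn2
```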
Choose an option to check your answer.
A. The number of blocks in a file
B. The number of copies maintained for each block
C. The number of NameNodes
D. The number of users accessing a file
Correct Answer: B. The number of copies maintained for each block
Explanation:
Replication protects data from disk and node failures.
The default replication factor in HDFS (the dfs.replication setting) is three.
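The trade-off behind the replication factor is simple arithmetic, sketched below. The helper names and the 10 TB figure are assumptions, and the failure count assumes each replica sits on a different node.

```python
def raw_storage(logical_bytes, replication_factor):
    """Physical bytes consumed when every block is stored replication_factor times."""
    return logical_bytes * replication_factor


def tolerated_node_failures(replication_factor):
    """A block stays readable while at least one replica survives."""
    return replication_factor - 1


TB = 1024 ** 4
print(raw_storage(10 * TB, 3) // TB)  # 30
print(tolerated_node_failures(3))     # 2
```

A higher factor buys more tolerance at a proportional storage cost.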
Choose an option to check your answer.
A. A SQL execution plan
B. A chain of DataNodes through which a block and its replicas are written
C. A list of NameNode commands only
D. A Scala collection transformation
Correct Answer: B. A chain of DataNodes through which a block and its replicas are written
Explanation:
The client sends packets to the first DataNode, which forwards them along the pipeline.
Acknowledgments flow back after replicas receive the data.
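The forward-then-acknowledge flow can be modeled with a short recursive function. This is a simplified sketch with hypothetical names. Real pipelines stream many packets concurrently and handle failures, which is omitted here.

```python
def write_packet(packet, pipeline, storage):
    """Store packet on each node in order and return acks in arrival order.

    pipeline is a list of DataNode names; storage maps node name -> packets.
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    storage.setdefault(head, []).append(packet)       # this node keeps its copy
    downstream = write_packet(packet, rest, storage)  # forward along the chain
    return downstream + [head]                        # ack after downstream acks


storage = {}
acks = write_packet(b"pkt-0", ["dn1", "dn2", "dn3"], storage)
print(acks)  # ['dn3', 'dn2', 'dn1']
```

The client talks only to the first node, and the acknowledgment it receives reflects every node further down the chain.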
Choose an option to check your answer.
A. Computing an unweighted average from partial averages
B. Summing counts for the same key
C. Selecting a globally ordered median
D. Assigning unique sequence numbers
Correct Answer: B. Summing counts for the same key
Explanation:
Addition is associative and commutative, so partial sums can be merged safely.
The same logic works whether aggregation occurs locally or only in reducers.
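A small numeric check shows why sums merge safely while unweighted averages of averages do not. The partition contents below are made-up sample data.

```python
partitions = [[1, 2, 3], [10]]  # values for one key, split across two mappers

# Sums: merging partial sums gives the same total as summing everything.
partial_sums = [sum(p) for p in partitions]
merged_sum = sum(partial_sums)
direct_sum = sum(v for p in partitions for v in p)

# Averages: an unweighted mean of partial means ignores partition sizes.
partial_avgs = [sum(p) / len(p) for p in partitions]
avg_of_avgs = sum(partial_avgs) / len(partial_avgs)
true_avg = direct_sum / sum(len(p) for p in partitions)

# A combiner-safe alternative: carry (sum, count) pairs and divide at the end.
pairs = [(sum(p), len(p)) for p in partitions]
safe_avg = sum(s for s, _ in pairs) / sum(c for _, c in pairs)

print(merged_sum == direct_sum)  # True
print(avg_of_avgs, true_avg)     # 6.0 4.0
print(safe_avg)                  # 4.0
```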
Choose an option to check your answer.
A. Executing every job twice from the start
B. Running duplicate attempts of unusually slow tasks
C. Running reducers before mappers finish
D. Copying HDFS data to every node
Correct Answer: B. Running duplicate attempts of unusually slow tasks
Explanation:
The framework may launch another attempt on a different node.
The first attempt to finish successfully is accepted and the remaining attempts are killed, which reduces the impact of stragglers.
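The effect on job duration can be estimated with a deterministic toy model. The task names, durations, and the fixed `slow_threshold` are assumptions. Real schedulers decide when to speculate from progress estimates, not from a fixed cutoff.

```python
def finish_times(task_durations, backup_durations, slow_threshold):
    """Return per-task finish times when slow tasks get one backup attempt.

    A backup starts at time slow_threshold for any task still running then;
    whichever attempt finishes first determines the task's finish time.
    """
    result = {}
    for task, duration in task_durations.items():
        if duration > slow_threshold and task in backup_durations:
            backup_finish = slow_threshold + backup_durations[task]
            result[task] = min(duration, backup_finish)
        else:
            result[task] = duration
    return result


durations = {"m1": 10, "m2": 12, "m3": 60}  # m3 is the straggler
backups = {"m3": 11}                        # a backup on a healthy node takes 11

without_speculation = max(durations.values())
with_speculation = max(finish_times(durations, backups, slow_threshold=20).values())
print(without_speculation, with_speculation)  # 60 31
```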
Choose an option to check your answer.
A. To compare job durations
B. To define which sorted intermediate keys are treated as one reducer group
C. To choose the number of mappers
D. To verify HDFS checksums
Correct Answer: B. To define which sorted intermediate keys are treated as one reducer group
Explanation:
Grouping can ignore part of a composite key while sort order uses all fields.
This is central to secondary-sort patterns.
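The split between sort order and grouping can be imitated with `sorted` and `itertools.groupby`. The records below are invented sample data, and the two lambda functions stand in for a sort comparator and a grouping comparator.

```python
from itertools import groupby

# Each record: ((customer_id, timestamp), value)
records = [(("c1", 20), "b"), (("c2", 5), "x"), (("c1", 10), "a")]

# Sort comparator: order by the full composite key.
sorted_records = sorted(records, key=lambda kv: kv[0])

# Grouping comparator: treat keys with the same customer_id as one group,
# ignoring the timestamp part of the key.
groups = {
    customer_id: [value for _key, value in group]
    for customer_id, group in groupby(sorted_records, key=lambda kv: kv[0][0])
}

print(groups)  # {'c1': ['a', 'b'], 'c2': ['x']}
```

Each reducer call sees one customer, with that customer's values already in timestamp order.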
Choose an option to check your answer.
A. Defining a rigid schema before storing any data
B. Rejecting semi-structured data
C. Applying structure when data is queried or analyzed
D. Encrypting data before use
Correct Answer: C. Applying structure when data is queried or analyzed
Explanation:
Schema-on-read keeps raw data flexible until analysis time.
It is common in data lakes and heterogeneous Big Data environments.
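A minimal sketch of the idea: raw lines are stored untouched, and a schema is applied only when a query reads them. The sample records, the field names, and the helper `read_with_schema` are assumptions for illustration.

```python
import json

# Raw data is kept as-is; records do not all share the same fields.
raw_lines = [
    '{"user": "u1", "amount": "12.50", "country": "DE"}',
    '{"user": "u2", "amount": "7"}',
    '{"user": "u3", "amount": "3.25", "extra": {"device": "ios"}}',
]


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto schema at read time.

    schema maps field name -> cast function; missing fields become None,
    and fields not named in the schema are ignored.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


schema = {"user": str, "amount": float, "country": str}
rows = list(read_with_schema(raw_lines, schema))
total = sum(row["amount"] for row in rows)
print(total)  # 22.75
```

A different analysis could read the same raw lines with a different schema, for example one that includes the nested `extra` field.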