MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: C. The NameNode schedules creation of another replica from an existing copy
Explanation:
The NameNode monitors replica counts using DataNode reports.
Under-replicated blocks are copied from a healthy replica to restore the target replication factor.
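The replica-count check can be illustrated with a minimal sketch. It is not HDFS code: the `block_locations` map, the node names, and the `under_replicated` helper are hypothetical stand-ins for the state a NameNode builds from DataNode reports.

```python
def under_replicated(block_locations, target_replication):
    """Return {block_id: missing_replica_count} for blocks below the target."""
    return {
        block_id: target_replication - len(nodes)
        for block_id, nodes in block_locations.items()
        if len(nodes) < target_replication
    }

# Hypothetical block map, as if assembled from DataNode block reports.
block_locations = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn4"},  # one replica was lost
}

to_repair = under_replicated(block_locations, target_replication=3)
print(to_repair)  # {'blk_2': 1}
```

Each entry in `to_repair` is a block for which one more copy would be scheduled from a surviving replica.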
Correct Answer: C. It prefers a nearby replica to reduce network cost
Explanation:
Topology-aware selection can favor local or same-rack copies.
This improves throughput and reduces cross-rack traffic.
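A rough sketch of topology-aware selection follows. The `(rack, node)` tuples and the distance values 0, 2, and 4 are illustrative assumptions loosely modeled on a two-level rack topology, not an actual Hadoop API.

```python
def distance(reader, replica):
    """Illustrative network distance: 0 same node, 2 same rack, 4 other rack."""
    if reader == replica:
        return 0
    if reader[0] == replica[0]:
        return 2
    return 4

def pick_replica(reader, replicas):
    """Choose the replica with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))

# Hypothetical (rack, node) locations.
reader = ("rack1", "node3")
replicas = [("rack2", "node7"), ("rack1", "node5"), ("rack2", "node9")]

chosen = pick_replica(reader, replicas)
print(chosen)  # ('rack1', 'node5')
```

The same-rack copy wins here because no replica sits on the reader's own node.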
Correct Answer: C. Groups may have different counts, so partial averages need weights
Explanation:
An average of averages is wrong when group sizes differ.
Emitting partial sums and counts enables a correct final average.
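A small worked example shows the difference. The function names `map_partial` and `reduce_average` are hypothetical; they only mimic the idea of emitting (sum, count) pairs and combining them at the end.

```python
def map_partial(values):
    """Emit a (sum, count) pair for one group of values."""
    return (sum(values), len(values))

def reduce_average(partials):
    """Combine (sum, count) pairs into one correctly weighted average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

groups = [[10, 20], [30, 40, 50, 60]]  # group sizes differ: 2 and 4

partials = [map_partial(g) for g in groups]
correct = reduce_average(partials)

# Naive approach: average the per-group averages without weights.
naive = sum(sum(g) / len(g) for g in groups) / len(groups)

print(correct)  # 35.0
print(naive)    # 30.0
```

The naive result treats the two-element group as if it carried as much weight as the four-element group, which is why it is too low.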
Correct Answer: C. When tasks write to external systems with non-idempotent side effects
Explanation:
Duplicate attempts can repeat external updates or messages.
Side effects must be idempotent or speculation should be controlled.
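The following sketch contrasts a non-idempotent update with an idempotent one when two attempts of the same task both run. `ExternalStore` and its methods are hypothetical, standing in for any external database or message sink.

```python
class ExternalStore:
    """Hypothetical external system written to by task attempts."""

    def __init__(self):
        self.rows = {}
        self.counter = 0

    def increment(self):
        # Non-idempotent: every call changes the result.
        self.counter += 1

    def upsert(self, record_id, value):
        # Idempotent: repeating the call leaves the same final state.
        self.rows[record_id] = value

store = ExternalStore()

# Simulate the original attempt plus one speculative duplicate.
for _attempt in range(2):
    store.increment()
    store.upsert("order-42", 100)

print(store.counter)  # 2, the update was applied twice
print(store.rows)     # {'order-42': 100}
```

Keyed upserts tolerate duplicate attempts, while the counter shows the kind of double update that speculation can cause.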
Correct Answer: C. To define the ordering of intermediate keys
Explanation:
The sort comparator controls key sequence within each partition.
It may use all fields of a composite key.
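As an illustration, the comparator below orders hypothetical `(user, timestamp)` composite keys by user ascending and then by timestamp descending. It uses Python's `functools.cmp_to_key` and is only an analogy for a MapReduce sort comparator, not the Hadoop interface itself.

```python
from functools import cmp_to_key

def compare_keys(a, b):
    """Order by user ascending, then by timestamp descending."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return 0

# Hypothetical intermediate keys within one partition.
keys = [("bob", 5), ("alice", 1), ("alice", 9), ("bob", 7)]

ordered = sorted(keys, key=cmp_to_key(compare_keys))
print(ordered)  # [('alice', 9), ('alice', 1), ('bob', 7), ('bob', 5)]
```

Both fields of the key take part in the ordering, so each user's records arrive newest first.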
Correct Answer: D. Transforming data into a predefined structure before storage
Explanation:
Schema-on-write validates and organizes data during ingestion.
It supports predictable queries but is less flexible when the structure of the data changes.
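A minimal sketch of validation at ingestion time is shown below. The `SCHEMA` dictionary, the field names, and `write_record` are hypothetical and exist only to illustrate rejecting data that does not fit the predefined structure.

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"user_id": int, "amount": float}

def write_record(table, record):
    """Append the record only if it matches the schema exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    table.append(record)

table = []
write_record(table, {"user_id": 7, "amount": 19.5})

rejected = False
try:
    write_record(table, {"user_id": "seven", "amount": 19.5})
except ValueError:
    rejected = True  # the malformed record never reaches storage

print(len(table), rejected)  # 1 True
```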
Correct Answer: D. Evidence that the DataNode is alive and available
Explanation:
DataNodes send regular heartbeats to the NameNode.
Missing heartbeats can cause the node to be marked unavailable.
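Heartbeat-based liveness can be sketched as a timeout check. The timestamps, the 30-second timeout, and the `live_nodes` helper are illustrative assumptions, not HDFS defaults.

```python
def live_nodes(last_heartbeat, now, timeout):
    """Return the nodes whose most recent heartbeat is within the timeout."""
    return {
        node
        for node, seen_at in last_heartbeat.items()
        if now - seen_at <= timeout
    }

# Hypothetical last-heartbeat times in seconds.
last_heartbeat = {"dn1": 100.0, "dn2": 40.0}

alive = live_nodes(last_heartbeat, now=105.0, timeout=30.0)
print(alive)  # {'dn1'}
```

Here `dn2` has been silent for 65 seconds, so it would be treated as unavailable.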
Correct Answer: D. Detecting data corruption during storage or transfer
Explanation:
Checksums allow clients and DataNodes to verify block integrity.
Corrupt replicas can be replaced from healthy copies.
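The idea can be sketched with Python's `zlib.crc32` as a stand-in checksum. The choice of CRC-32 and the helper names `store_block` and `is_intact` are assumptions made for illustration only.

```python
import zlib

def store_block(data):
    """Return the data together with a checksum computed at write time."""
    return data, zlib.crc32(data)

def is_intact(data, checksum):
    """Recompute the checksum at read time and compare it to the stored one."""
    return zlib.crc32(data) == checksum

block, checksum = store_block(b"hello hdfs")
corrupted = b"hellO hdfs"  # a single flipped bit

print(is_intact(block, checksum))      # True
print(is_intact(corrupted, checksum))  # False
```

A reader that sees the mismatch can discard that copy and fetch the block from another replica.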
Correct Answer: D. A logical portion of input assigned to one map task
Explanation:
InputFormat creates splits that define parallel map work.
A split may correspond closely to an HDFS block but is a logical concept.
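A simplified sketch of split computation follows. It ignores details a real InputFormat handles, such as record boundaries, and the sizes and the `compute_splits` name are hypothetical.

```python
def compute_splits(file_size, split_size):
    """Return (offset, length) pairs covering the file, one per map task."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

splits = compute_splits(file_size=300, split_size=128)
print(splits)  # [(0, 128), (128, 128), (256, 44)]
```

Each pair describes a byte range rather than stored data, which is what makes a split a logical concept.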
Correct Answer: D. One execution instance of a map or reduce task
Explanation:
A task can have multiple attempts because of failures or speculation.
Only a successful committed attempt contributes final output.
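The commit rule can be sketched as a first-committer-wins check. `TaskOutput`, `try_commit`, and the attempt IDs are hypothetical and do not reproduce the Hadoop output committer.

```python
class TaskOutput:
    """Accept output from only one successful attempt of a task."""

    def __init__(self):
        self.committed_by = None
        self.result = None

    def try_commit(self, attempt_id, result):
        """Commit if no attempt has committed yet; otherwise refuse."""
        if self.committed_by is not None:
            return False
        self.committed_by = attempt_id
        self.result = result
        return True

output = TaskOutput()
first = output.try_commit("attempt_0", ["a", "b"])
second = output.try_commit("attempt_1", ["a", "b"])  # speculative duplicate

print(first, second, output.committed_by)  # True False attempt_0
```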
Correct Answer: D. Keys that belong to the same reducer group must reach the same reducer
Explanation:
If grouped keys are split across reducers, no reducer sees the complete group.
The partitioner commonly uses the primary grouping fields.
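A minimal sketch of partitioning on the primary field only is given below. The `(user, timestamp)` keys and the use of `zlib.crc32` as a stable hash are illustrative assumptions.

```python
import zlib

def partition(key, num_reducers):
    """Pick a reducer from the primary field only, ignoring secondary fields."""
    primary = key[0]
    return zlib.crc32(primary.encode("utf-8")) % num_reducers

# Hypothetical composite keys: same user, different timestamps.
p1 = partition(("alice", 1), num_reducers=4)
p2 = partition(("alice", 9), num_reducers=4)

print(p1 == p2)  # True
```

Hashing the whole composite key instead could send the two `alice` records to different reducers, so neither would see the complete group.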
Correct Answer: A. A repository that stores large amounts of raw data in diverse formats
Explanation:
Data lakes retain structured, semi-structured, and unstructured data.
Governance and metadata are needed to prevent them from becoming disorganized.