MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Correct Answer: A. Many tiny files consume excessive NameNode metadata and reduce processing efficiency
Explanation:
Each file and block requires metadata in NameNode memory.
Large numbers of small files also create scheduling and I/O overhead.
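The metadata cost can be estimated with rough arithmetic. The sketch below assumes about 150 bytes of NameNode heap per namespace object (a commonly quoted rule of thumb, not a figure from this page) and a 128 MB block size; the function name is hypothetical.

```python
import math

BYTES_PER_OBJECT = 150          # assumed rule of thumb per file or block object
BLOCK_SIZE = 128 * 1024 ** 2    # assumed HDFS block size: 128 MB


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode heap used by num_files files of file_size bytes each."""
    blocks_per_file = max(1, math.ceil(file_size / BLOCK_SIZE))
    objects = num_files * (1 + blocks_per_file)  # one file object plus its block objects
    return objects * BYTES_PER_OBJECT


one_tb = 1024 ** 4
# The same 1 TB of data stored two ways.
small = namenode_bytes(one_tb // 1024 ** 2, 1024 ** 2)  # as 1 MB files
large = namenode_bytes(one_tb // 1024 ** 3, 1024 ** 3)  # as 1 GB files
print(small, large)
```

Under these assumptions the 1 MB layout needs a few hundred megabytes of heap for data that the 1 GB layout tracks in under two megabytes.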
Correct Answer: A. Key-value pairs
Explanation:
MapReduce represents input and output records as key-value pairs.
This common abstraction supports many data formats and algorithms.
Correct Answer: A. Writing buffered intermediate mapper data to local disk
Explanation:
Mapper output is buffered, partitioned, and sorted in memory.
When thresholds are reached, sorted runs spill to local storage.
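The buffer, sort, and spill cycle can be sketched as a toy model. This is not Hadoop's implementation: the class name, the record-count threshold, and the byte-sum partitioner are all assumptions made for illustration.

```python
import json
import os
import tempfile


class SpillingBuffer:
    """Toy map-output buffer that writes sorted runs to local disk."""

    def __init__(self, max_records: int, num_partitions: int, spill_dir: str):
        self.max_records = max_records      # hypothetical spill threshold
        self.num_partitions = num_partitions
        self.spill_dir = spill_dir
        self.records = []
        self.spill_files = []

    def collect(self, key: str, value: int) -> None:
        # Deterministic toy partitioner: sum of the key's bytes modulo partitions.
        partition = sum(key.encode()) % self.num_partitions
        self.records.append((partition, key, value))
        if len(self.records) >= self.max_records:
            self.spill()

    def spill(self) -> None:
        if not self.records:
            return
        self.records.sort()  # order by partition, then by key
        path = os.path.join(self.spill_dir, f"spill{len(self.spill_files)}.json")
        with open(path, "w") as f:
            json.dump(self.records, f)
        self.spill_files.append(path)
        self.records = []


with tempfile.TemporaryDirectory() as tmp:
    buf = SpillingBuffer(max_records=3, num_partitions=2, spill_dir=tmp)
    for word in ["pig", "hive", "pig", "spark", "hdfs"]:
        buf.collect(word, 1)
    buf.spill()  # flush the remainder, as at the end of a map task
    runs = []
    for path in buf.spill_files:
        with open(path) as f:
            runs.append(json.load(f))
```

Each entry in `runs` is one sorted run; a real framework would later merge such runs into a single partitioned file for the reducers to fetch.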
Correct Answer: A. Making read-only reference files available to task nodes
Explanation:
Small lookup tables or configuration files can be localized with tasks.
This supports efficient map-side enrichment.
Correct Answer: A. Parallel aggregation of occurrences by key
Explanation:
Mappers emit each word with count one.
Reducers sum the counts for identical words.
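The word-count pattern can be simulated in a single process. This is a minimal sketch with hypothetical function names; the `shuffle` function stands in for the grouping that the framework performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


lines = ["big data big clusters", "data moves"]
intermediate = [pair for line in lines for pair in mapper(line)]
counts = dict(reducer(word, ones) for word, ones in shuffle(intermediate).items())
print(counts)
```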
Correct Answer: B. Algorithms must account for distribution, failures, data movement, and parallel execution
Explanation:
Distributed execution introduces coordination, partitioning, serialization, and recovery concerns.
Efficient designs minimize communication and exploit parallelism.
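One way to see the communication cost is to count the records that would cross the network with and without local pre-aggregation on each node. The sketch below is illustrative only; the function name and the sample partitions are assumptions.

```python
from collections import Counter


def shuffled_records(partitions, pre_aggregate):
    """Count the (word, count) records each node would send to reducers."""
    total = 0
    for words in partitions:
        # With pre-aggregation a node sends one record per distinct word;
        # without it, one record per occurrence.
        total += len(Counter(words)) if pre_aggregate else len(words)
    return total


partitions = [["a", "a", "b", "a"], ["b", "b", "b", "c"]]
naive = shuffled_records(partitions, pre_aggregate=False)
combined = shuffled_records(partitions, pre_aggregate=True)
print(naive, combined)
```

Local aggregation is safe here because addition is associative and commutative; it would not be valid for every reduce function.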
Correct Answer: B. File and block metadata are kept primarily in memory for fast access
Explanation:
Metadata memory limits the practical number of files and blocks.
This is one reason small files can be problematic.
Correct Answer: B. Transforms input key-value pairs into intermediate key-value pairs
Explanation:
The map function processes input records independently.
It emits zero or more intermediate pairs for later grouping.
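A small sketch of a map function that emits zero, one, or several intermediate pairs per input record. The `user: tag, tag` record layout and the function name are hypothetical.

```python
def tag_mapper(offset, line):
    """Map one (offset, line) input pair to zero or more (tag, user) pairs."""
    # The input key (byte offset) is ignored, as is common for text input.
    user, _, tags = line.partition(":")
    for tag in tags.split(","):
        tag = tag.strip()
        if tag:  # records without tags emit nothing
            yield tag, user.strip()


records = [(0, "ana: hdfs, yarn"), (16, "bo:"), (20, "cy: yarn")]
pairs = [pair for offset, line in records for pair in tag_mapper(offset, line)]
print(pairs)
```

Each call depends only on its own record, which is what lets the framework run mappers in parallel on separate input splits.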
Correct Answer: B. It is temporary data that can be regenerated if the mapper fails
Explanation:
Intermediate output is consumed by reducers and is not final durable data.
Rerunning a failed map task is usually cheaper than replicating every intermediate file in HDFS, so this output is kept on the mapper's local disk.
Correct Answer: B. A join performed by mappers without sending both datasets through reducers
Explanation:
Map-side joins avoid the shuffle when the inputs allow it.
Either both inputs are already partitioned and sorted identically on the join key, or, more commonly, one is a small reference dataset cached on each mapper.
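The cached-reference case can be sketched as a mapper that joins against an in-memory lookup table. The dataset, field layout, and names below are hypothetical; on a cluster the lookup would be loaded from a file localized on each task node.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Small reference dataset, assumed to fit comfortably in each mapper's memory.
COUNTRY_NAMES: Dict[str, str] = {"IN": "India", "DE": "Germany"}


def join_mapper(
    records: Iterable[Tuple[str, str]], lookup: Dict[str, str]
) -> Iterator[Tuple[str, str]]:
    """Inner-join (user, country_code) records against the in-memory lookup."""
    for user, code in records:
        name = lookup.get(code)
        if name is not None:  # unmatched records emit nothing
            yield user, name


split = [("asha", "IN"), ("bernd", "DE"), ("chen", "CN")]
joined = list(join_mapper(split, COUNTRY_NAMES))
print(joined)
```

No reducer is involved: the large input is never shuffled, and each mapper produces its joined output directly.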
Correct Answer: B. Producing each unique key or value once
Explanation:
The shuffle groups duplicate occurrences under the same key.
A reducer can emit one result per distinct group.
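A single-process sketch of the distinct pattern, with hypothetical function names. Sorting followed by `itertools.groupby` stands in for the shuffle's grouping by key.

```python
from itertools import groupby


def distinct_mapper(value):
    """Emit the value as the key; the paired value carries no information."""
    yield value, None


def distinct_reducer(key, values):
    """Emit each key once, ignoring how many times it occurred."""
    yield key


values = ["a", "b", "a", "c", "b"]
intermediate = [pair for v in values for pair in distinct_mapper(v)]
intermediate.sort(key=lambda pair: pair[0])  # simulate the shuffle's sort by key

unique = [
    out
    for key, group in groupby(intermediate, key=lambda pair: pair[0])
    for out in distinct_reducer(key, (v for _, v in group))
]
print(unique)
```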
Correct Answer: C. An ecosystem for distributed storage and processing of large datasets
Explanation:
Hadoop provides HDFS for storage, YARN for cluster resource management, and MapReduce as a batch processing framework.
It is designed for clusters of commodity hardware.