MCQ Collection
Big Data Analytics MCQs
Practice Big Data Analytics questions with answers and explanations.
Choose an option to check your answer.
Correct Answer: B. A threshold used to limit how long late event-time data are retained for stateful operations
Explanation:
Watermarks express an assumption about maximum lateness.
They allow old state to be removed while still accepting reasonably late events.
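The watermark rule above can be sketched in plain Python, without a streaming engine. This is a simplified model, not any engine's actual implementation: the window length, the allowed lateness, the sample events, and the function name run are all assumptions made for illustration. The watermark is taken to be the largest event time seen so far minus the allowed lateness.

```python
from collections import defaultdict

WINDOW = 10    # tumbling window length in seconds (assumed value)
LATENESS = 5   # maximum lateness the watermark tolerates (assumed value)

def run(events):
    """Count events per window; events is (event_time, value) in arrival order."""
    open_windows = defaultdict(int)   # window start -> running count (the state)
    finalized = {}                    # window start -> final count
    dropped = []                      # events that arrived behind the watermark
    max_event_time = float("-inf")
    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - LATENESS
        # Evict state for every window that ends at or before the watermark.
        for start in list(open_windows):
            if start + WINDOW <= watermark:
                finalized[start] = open_windows.pop(start)
        start = event_time - event_time % WINDOW
        if start + WINDOW <= watermark:
            dropped.append((event_time, value))   # too late: window already closed
        else:
            open_windows[start] += 1              # on time, or late but tolerated
    return finalized, dict(open_windows), dropped

events = [(1, "a"), (12, "b"), (8, "c"), (17, "d"), (3, "e"), (26, "f")]
finalized, still_open, dropped = run(events)
```

In this run the event at time 8 arrives out of order but is still counted, because the watermark has only reached 7. The event at time 3 arrives after the watermark has reached 12, so its window has already been evicted and the event is dropped.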
Correct Answer: C. It stores progress and state needed to resume after failure
Explanation:
Checkpoint data records offsets, metadata, and state-store information.
A restarted query can continue consistently from the saved location.
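A minimal sketch of the resume behavior, assuming a checkpoint that holds only an offset and one running total. The JSON file layout, the process function, and the crash_after parameter are hypothetical and exist only to simulate a failure; a real engine stores considerably more metadata.

```python
import json
import os
import tempfile

def process(records, checkpoint_path, crash_after=None):
    """Sum records, saving offset and state after each one."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            ckpt = json.load(f)               # resume from saved progress
    else:
        ckpt = {"offset": 0, "total": 0}      # first run: start from the beginning
    handled = 0
    for offset in range(ckpt["offset"], len(records)):
        if crash_after is not None and handled == crash_after:
            raise RuntimeError("simulated failure")
        ckpt["total"] += records[offset]
        ckpt["offset"] = offset + 1
        # Write to a temporary file, then rename, so a crash never leaves
        # a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(ckpt, f)
        os.replace(tmp_path, checkpoint_path)
        handled += 1
    return ckpt["total"]

records = [3, 1, 4, 1, 5, 9]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    process(records, path, crash_after=2)     # fails after two records
except RuntimeError:
    pass
resumed_total = process(records, path)        # continues at offset 2
```

The second call does not reprocess the first two records, yet the final total matches a single uninterrupted pass, because the offset and the running total were saved together.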
Correct Answer: D. The destination to which processed results are written
Explanation:
Sinks include files, tables, message systems, and consoles.
Whether a sink supports transactional or idempotent writes determines which end-to-end delivery guarantees the pipeline can offer.
Correct Answer: A. A condition where incoming data arrives faster than the system can process it
Explanation:
Backlog and latency grow when processing capacity is insufficient.
Rate control, scaling, and optimization help restore stability.
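The growth of the backlog can be illustrated with a small discrete-time simulation. The rates and the function name simulate are assumptions chosen for the example, and the model ignores batching and variable load.

```python
def simulate(arrival_rate, service_rate, ticks):
    """Return the backlog size after each tick."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += arrival_rate                  # new records arrive
        backlog -= min(backlog, service_rate)    # process as many as capacity allows
        history.append(backlog)
    return history

overloaded = simulate(arrival_rate=120, service_rate=100, ticks=5)
scaled_out = simulate(arrival_rate=120, service_rate=150, ticks=5)
```

With 120 records arriving per tick and capacity for 100, the backlog grows by 20 every tick. Raising capacity to 150, which stands in for scaling or optimization, keeps the backlog at zero.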
Correct Answer: B. SQL-like querying and data warehousing on distributed storage
Explanation:
Hive provides tables, schemas, and a SQL interface over large datasets.
Queries may execute through engines such as Tez or Spark.
Correct Answer: C. A distributed column-family database for low-latency random access on Hadoop
Explanation:
HBase stores sparse tables over HDFS and supports row-key reads and writes.
It complements HDFS, which is optimized for large sequential reads rather than random access to individual records.
Correct Answer: D. Transfer bulk data between relational databases and Hadoop storage
Explanation:
Sqoop parallelized imports and exports using database connectors.
It was commonly used to move structured enterprise data into Hadoop.
Correct Answer: A. Collecting and transporting high-volume event and log data
Explanation:
Flume uses sources, channels, and sinks to build ingestion pipelines.
It was widely used for log delivery into HDFS or HBase.
Correct Answer: B. A workflow scheduler for coordinating Hadoop ecosystem jobs
Explanation:
Oozie defines workflows and time- or data-triggered coordinators.
It can sequence MapReduce, Hive, Sqoop, and related actions.
Correct Answer: C. Distributed coordination, naming, configuration, and leader election
Explanation:
ZooKeeper provides a small, consistent coordination service.
Distributed applications use it to manage shared state safely.
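Leader election is commonly built on ZooKeeper by having each participant create an ephemeral sequential node, with the lowest sequence number acting as leader. The sketch below models only that rule in memory. The Election class and its methods are hypothetical stand-ins, not the ZooKeeper client API, and leave() stands in for a session ending.

```python
class Election:
    """In-memory stand-in for a set of ephemeral sequential nodes."""

    def __init__(self):
        self._next_seq = 0
        self._members = {}            # sequence number -> participant name

    def join(self, name):
        """Register a participant and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = name
        return seq

    def leave(self, seq):
        """Remove a participant, as happens when its session ends."""
        self._members.pop(seq, None)

    def leader(self):
        """The participant holding the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]

election = Election()
a = election.join("worker-a")
b = election.join("worker-b")
c = election.join("worker-c")
first_leader = election.leader()
election.leave(a)                     # the leader's session ends
second_leader = election.leader()
```

When the first participant leaves, leadership passes to the next-lowest sequence number without any participant having to negotiate with the others.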
Correct Answer: D. A high-level dataflow language for large-scale data transformation
Explanation:
Pig Latin expresses pipelines such as load, filter, group, and join.
The system translates them into distributed execution jobs.
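The shape of such a pipeline can be shown in plain Python as a rough analogue. This is not Pig Latin and runs on a single machine; the sample records, field layout, and threshold are invented for illustration. Each step corresponds to one of the operations named above.

```python
from collections import defaultdict

# "Load": in-memory rows standing in for files on distributed storage.
visits = [
    ("alice", "home", 3),
    ("bob", "home", 12),
    ("alice", "cart", 20),
    ("carol", "cart", 7),
    ("bob", "cart", 15),
]
users = [("alice", "US"), ("bob", "DE")]

# Filter: keep visits of at least 10 seconds.
long_visits = [v for v in visits if v[2] >= 10]

# Group: collect durations per user, then aggregate each group.
grouped = defaultdict(list)
for user, page, seconds in long_visits:
    grouped[user].append(seconds)
totals = {user: sum(durations) for user, durations in grouped.items()}

# Join: attach each user's country, keeping only users present in both inputs.
country = dict(users)
joined = sorted(
    (user, country[user], total)
    for user, total in totals.items()
    if user in country
)
```

In a dataflow language each of these steps is a named relation, which lets the system plan the whole chain before running it across the cluster.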
Correct Answer: D. If A is a subtype of B, then F[A] is a subtype of F[B]
Explanation:
In Scala, covariance is declared with a plus sign on the type parameter, as in F[+A].
It is appropriate for producer-like immutable types.
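The same idea can be sketched with Python's typing module, where covariant=True on a TypeVar plays roughly the role of the plus sign. The Animal, Dog, and Box names are hypothetical. Python does not enforce variance at runtime, so the subtyping relationship is only checked by a static type checker.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)   # covariant type parameter

class Animal:
    def name(self) -> str:
        return "animal"

class Dog(Animal):
    def name(self) -> str:
        return "dog"

@dataclass(frozen=True)                  # immutable, so it only produces T_co
class Box(Generic[T_co]):
    item: T_co

def describe(box: Box[Animal]) -> str:
    """Accepts a box of any Animal subtype because Box is covariant."""
    return box.item.name()

dog_box: Box[Dog] = Box(Dog())
described = describe(dog_box)            # a Box[Dog] used where Box[Animal] is expected
```

Box is safe to treat covariantly because it never accepts a value of the parameter type after construction. A mutable container with a setter could be handed a Box[Dog], asked to store a different Animal, and break the Dog guarantee.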