MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. Data mining can only use text files
B. Data mining searches for hidden patterns, while querying retrieves explicitly requested records
C. Database querying always requires machine learning
D. Data mining never accesses databases
Correct Answer: B. Data mining searches for hidden patterns, while querying retrieves explicitly requested records
Explanation:
A query answers a predefined question about stored data.
Data mining often explores the data to uncover relationships not specified in advance.
Choose an option to check your answer.
A. An attribute stored in two tables
B. An attribute with exactly two possible states
C. A continuous variable split into many bins
D. A variable with missing values only
Correct Answer: B. An attribute with exactly two possible states
Explanation:
Binary attributes take two values such as yes/no or 0/1.
They may be symmetric or asymmetric depending on whether both states are equally important.
Choose an option to check your answer.
A. Kernel Distribution Design
B. Key Data Duplication
C. Knowledge Discovery in Databases
D. Knowledge-Driven Debugging
Correct Answer: C. Knowledge Discovery in Databases
Explanation:
KDD refers to the broader process of turning raw data into useful knowledge.
Data mining is one central step within that process.
Choose an option to check your answer.
A. It determines the sample mean
B. It changes the number of records
C. It affects how similarity should treat shared zeros
D. It makes the attribute continuous
Correct Answer: C. It affects how similarity should treat shared zeros
Explanation:
For asymmetric attributes, the presence state is often more informative than absence.
Measures such as Jaccard similarity ignore shared zeros for this reason.
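The effect of shared zeros can be seen by comparing Jaccard similarity with the simple matching coefficient on two sparse binary vectors. The following is a minimal sketch; the basket vectors are made-up illustration data, and returning 0.0 when neither vector contains a one is an assumed convention.

```python
def simple_matching(x, y):
    """Fraction of positions where the two binary vectors agree.

    Shared zeros count as agreement, which suits symmetric attributes.
    """
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard(x, y):
    """Shared ones divided by positions where at least one vector has a one.

    Positions where both vectors are zero are ignored entirely.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumed convention: two all-zero vectors get similarity 0.0.
    return both / either if either else 0.0


# Hypothetical purchase indicators over ten items (1 = bought).
basket_a = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
basket_b = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

smc = simple_matching(basket_a, basket_b)
jac = jaccard(basket_a, basket_b)
print(f"simple matching = {smc:.2f}, Jaccard = {jac:.2f}")
```

The two baskets share only one purchased item, yet simple matching rates them as highly similar because most items were bought by neither customer. Jaccard gives a much lower score because it discards those shared absences.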
Choose an option to check your answer.
A. Model deployment
B. Rule pruning after evaluation
C. Visualization of final results only
D. Data preprocessing
Correct Answer: D. Data preprocessing
Explanation:
Raw data usually contain missing values, noise, and inconsistent formats.
Preprocessing prepares the data so mining algorithms can operate reliably.
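As a rough illustration of what such preparation can involve, the sketch below fills a missing value with the column mean and normalizes inconsistent text formatting. The records, field names, and the choice of mean imputation are assumptions made for the example; real pipelines often need other strategies.

```python
# Hypothetical raw records: ages stored as strings, one missing,
# and city names with inconsistent case and whitespace.
raw = [
    {"age": "34", "city": " Paris "},
    {"age": None, "city": "paris"},
    {"age": "41", "city": "PARIS"},
    {"age": "29", "city": "Lyon"},
]


def clean(records):
    """Return records with numeric ages and normalized city names."""
    known_ages = [float(r["age"]) for r in records if r["age"] is not None]
    mean_age = sum(known_ages) / len(known_ages)

    cleaned = []
    for r in records:
        # Mean imputation is one simple option for missing numeric values.
        age = float(r["age"]) if r["age"] is not None else mean_age
        # Trim whitespace and unify capitalization.
        city = r["city"].strip().title()
        cleaned.append({"age": age, "city": city})
    return cleaned


cleaned = clean(raw)
for row in cleaned:
    print(row)
```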
Choose an option to check your answer.
A. A transaction set only
B. A graph adjacency list only
C. A dendrogram
D. A data matrix
Correct Answer: D. A data matrix
Explanation:
A data matrix is the standard tabular representation used by many algorithms.
Each row is an object and each column is a feature.
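A small sketch of this layout, using a plain list of lists, may help. The feature names and values are invented for illustration.

```python
# Hypothetical data matrix: each inner list is one object (row),
# each position within it is one feature (column).
feature_names = ["height_cm", "weight_kg", "age_years"]
data_matrix = [
    [170.0, 65.0, 30.0],
    [182.0, 80.0, 45.0],
    [158.0, 52.0, 27.0],
    [175.0, 70.0, 38.0],
]

n_objects = len(data_matrix)      # number of rows
n_features = len(data_matrix[0])  # number of columns

# Column-wise summaries are easy to compute in this representation.
column_means = [
    sum(row[j] for row in data_matrix) / n_objects
    for j in range(n_features)
]

for name, mean in zip(feature_names, column_means):
    print(f"{name}: mean = {mean}")
```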
Choose an option to check your answer.
A. Assessing whether discovered patterns are valid and useful
B. Collecting every possible variable
C. Converting all data to images
D. Deleting the target variable
Correct Answer: A. Assessing whether discovered patterns are valid and useful
Explanation:
Patterns must be judged for correctness, novelty, usefulness, and relevance.
A technically valid pattern may still have little practical value.
Choose an option to check your answer.
A. A transaction-item representation
B. A single continuous time series
C. A covariance matrix only
D. A labeled image grid
Correct Answer: A. A transaction-item representation
Explanation:
Each transaction contains a set of purchased items.
This representation supports frequent itemset and association rule mining.
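To make the representation concrete, the sketch below stores each transaction as a set of items and counts the support of every item pair by brute force. The transactions and the 0.6 support threshold are assumptions for the example; practical miners such as Apriori prune candidates rather than enumerating all pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_support(transactions):
    """Return the fraction of transactions containing each item pair."""
    counts = Counter()
    for items in transactions:
        # Sorting gives every pair a canonical (alphabetical) order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items()}


support = pair_support(transactions)

# Keep pairs that appear in at least 60% of transactions (assumed threshold).
frequent = {pair: s for pair, s in support.items() if s >= 0.6}
for pair, s in sorted(frequent.items()):
    print(pair, s)
```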
Choose an option to check your answer.
A. Clustering
B. Classification
C. Association analysis
D. Dimensionality reduction
Correct Answer: B. Classification
Explanation:
Classification learns from labeled examples to assign observations to predefined categories.
Examples include spam versus non-spam and disease versus no disease.
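The idea of learning from labeled examples can be sketched with a one-nearest-neighbor rule, chosen here only because it is short; the explanation above does not prescribe any particular algorithm. The two numeric features and the training points are invented for illustration.

```python
import math

# Hypothetical labeled examples: (features, label).
# Features might be, say, counts of suspicious words and of known contacts.
training = [
    ((8.0, 1.0), "spam"),
    ((7.0, 0.0), "spam"),
    ((1.0, 5.0), "not spam"),
    ((0.0, 6.0), "not spam"),
]


def classify(point, training):
    """Assign the label of the closest training example (1-nearest neighbor)."""
    _, label = min(training, key=lambda example: math.dist(point, example[0]))
    return label


pred_a = classify((6.5, 1.5), training)
pred_b = classify((1.5, 4.0), training)
print(pred_a, pred_b)
```

Both predictions come from a fixed set of predefined categories, which is what distinguishes classification from numeric prediction.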
Choose an option to check your answer.
A. A one-dimensional histogram
B. A graph of nodes and edges
C. A simple ordered list only
D. A single regression coefficient
Correct Answer: B. A graph of nodes and edges
Explanation:
Social actors are represented as nodes and relationships as edges.
Graph structure enables analysis of communities, influence, and connectivity.
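A minimal sketch of such a graph, assuming an undirected friendship network with made-up names, is shown below. It computes each node's degree and finds connected components with breadth-first search. Connected components are only a crude stand-in for communities; dedicated community detection methods are more refined.

```python
from collections import deque

# Hypothetical undirected relationships between social actors.
edges = [("ana", "ben"), ("ben", "cam"), ("ana", "cam"), ("dee", "eli")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def connected_components(adjacency):
    """Group nodes that are reachable from one another, using BFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


adjacency = build_adjacency(edges)
degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}
components = connected_components(adjacency)
print(degrees)
print(components)
```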
Choose an option to check your answer.
A. Frequent itemset mining
B. Hierarchical clustering
C. Regression or numeric prediction
D. Community detection only
Correct Answer: C. Regression or numeric prediction
Explanation:
Regression estimates a numeric response such as price, demand, or temperature.
It differs from classification, which predicts discrete labels.
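As an illustration of numeric prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict a price. Simple linear regression is just one possible regression method, and the size and price figures are invented so that the relationship is exactly linear.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area and price, constructed as price = 3 * size.
sizes = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 70.0 + intercept  # numeric output, not a class label
print(slope, intercept, predicted)
```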
Choose an option to check your answer.
A. The number of class labels only
B. The number of algorithms tested
C. The number of attributes or features
D. The physical file size
Correct Answer: C. The number of attributes or features
Explanation:
Dimensionality counts the variables used to describe each observation.
High-dimensional data can create computational and statistical challenges.