MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. Business usefulness
B. Support
C. Novelty to a manager
D. Ease of explanation to a specific audience
Correct Answer: B. Support
Explanation:
Support is computed directly from observed frequencies.
Usefulness and novelty depend partly on stakeholder goals and prior knowledge.
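As a minimal sketch of why support counts as objective, the function below computes it purely from transaction counts. The transaction data and the helper name `support` are illustrative assumptions, not part of the original question.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Hypothetical toy market-basket data (assumed for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
    {"milk"},
]

# Two of the four baskets contain both bread and milk.
bread_milk_support = support(transactions, {"bread", "milk"})
print(bread_milk_support)
```

No stakeholder input appears anywhere in the calculation; the same data always yields the same value.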
A. Converting all variables to categories
B. Rescaling numeric attributes to a common range or scale
C. Removing every duplicate row
D. Replacing missing values with labels
Correct Answer: B. Rescaling numeric attributes to a common range or scale
Explanation:
Normalization prevents large-scale variables from dominating scale-sensitive algorithms.
Common methods include min-max scaling and z-score standardization.
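The sketch below illustrates the domination effect with z-score standardization. The age and income values are hypothetical, and the population standard deviation is an assumed choice (a sample standard deviation would work similarly).

```python
import math
import statistics


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical records: age in years, income in currency units.
ages = [25, 35, 45, 55]
incomes = [30_000, 32_000, 90_000, 88_000]

# Raw Euclidean distance between the first two records is almost
# entirely determined by the income difference.
raw_distance = math.dist((ages[0], incomes[0]), (ages[1], incomes[1]))
income_gap = abs(incomes[0] - incomes[1])

# After standardization both attributes are on a comparable scale.
z_ages, z_incomes = z_scores(ages), z_scores(incomes)
scaled_age_gap = abs(z_ages[0] - z_ages[1])
scaled_income_gap = abs(z_incomes[0] - z_incomes[1])

print(raw_distance, income_gap)
print(scaled_age_gap, scaled_income_gap)
```

Before scaling, the ten-year age difference is invisible next to the income difference; after scaling, it is the larger of the two gaps.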
A. Creating synthetic class labels
B. Choosing a relevant subset of available attributes
C. Increasing all feature values
D. Duplicating informative columns
Correct Answer: B. Choosing a relevant subset of available attributes
Explanation:
Feature selection reduces dimensionality without transforming the retained variables.
It can improve speed, interpretability, and generalization.
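A minimal sketch of one simple filter-style selector follows. The variance criterion, the function name `select_by_variance`, and the column names are assumptions chosen for illustration; many other selection criteria exist. The point is that kept columns pass through unchanged.

```python
import statistics


def select_by_variance(rows, names, min_variance):
    """Keep columns whose population variance exceeds `min_variance`.

    Retained columns are copied as-is; nothing is rescaled or combined.
    """
    columns = list(zip(*rows))
    keep = [
        i for i, col in enumerate(columns)
        if statistics.pvariance(col) > min_variance
    ]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows


# Hypothetical data: the middle column is constant and carries no signal.
names = ["height_cm", "constant_flag", "weight_kg"]
rows = [
    [170, 1, 65],
    [182, 1, 80],
    [165, 1, 58],
]

kept_names, kept_rows = select_by_variance(rows, names, min_variance=0.0)
print(kept_names)
print(kept_rows)
```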
A. A frequent itemset
B. A negative association
C. A perfect rule
D. An invalid probability
Correct Answer: B. A negative association
Explanation:
The consequent occurs less often with the antecedent than expected from its baseline.
Such a rule describes avoidance or negative dependence.
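One common way to quantify this comparison is lift, the ratio of observed joint support to the support expected under independence. Naming lift here is an assumption, since the explanation above does not name the measure, and the coffee and tea data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Observed joint support divided by the product of the two supports."""
    joint = support(transactions, set(antecedent) | set(consequent))
    expected = support(transactions, antecedent) * support(transactions, consequent)
    return joint / expected


# Hypothetical baskets in which coffee and tea rarely appear together.
transactions = [
    {"coffee"}, {"coffee"}, {"coffee", "tea"}, {"tea"},
    {"tea"}, {"coffee"}, {"tea"}, {"coffee"},
]

# Confidence of coffee -> tea versus the baseline frequency of tea.
confidence = support(transactions, {"coffee", "tea"}) / support(transactions, {"coffee"})
baseline = support(transactions, {"tea"})
coffee_tea_lift = lift(transactions, {"coffee"}, {"tea"})

print(confidence, baseline, coffee_tea_lift)
```

Here tea appears in half of all baskets but in only one fifth of the coffee baskets, so the lift is below 1.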
A. Using no support threshold
B. Searching only for patterns that satisfy user-specified conditions
C. Mining only one-itemsets
D. Replacing transactions with clusters
Correct Answer: B. Searching only for patterns that satisfy user-specified conditions
Explanation:
Constraints can restrict items, rule length, aggregate values, or business conditions.
They reduce search and focus the output on relevant patterns.
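The sketch below shows constraints being checked before any support counting, so rejected candidates cost no database scan. The specific constraints (a required item, a length limit, a total-price budget), the function name, and the data are all hypothetical, and the brute-force enumeration is only suitable for toy inputs.

```python
from itertools import combinations


def mine_with_constraints(transactions, min_support, max_len,
                          required_item, prices, max_total_price):
    """Return {itemset: support} for itemsets meeting all constraints."""
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    results = {}
    for size in range(1, max_len + 1):          # rule-length constraint
        for combo in combinations(items, size):
            if required_item not in combo:      # item constraint
                continue
            if sum(prices[i] for i in combo) > max_total_price:
                continue                        # aggregate constraint
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                results[combo] = count / n
    return results


# Hypothetical baskets and prices.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"mouse", "bag"},
    {"laptop", "bag"},
]
prices = {"laptop": 900, "mouse": 20, "bag": 40}

patterns = mine_with_constraints(
    transactions, min_support=0.5, max_len=2,
    required_item="mouse", prices=prices, max_total_price=100,
)
print(patterns)
```

The pair of laptop and mouse is frequent in this data but is never counted, because it exceeds the price budget.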
A. Splitting items into classes for prediction
B. Mining local frequent itemsets in partitions and verifying their union globally
C. Partitioning features for normalization
D. Clustering transactions without support
Correct Answer: B. Mining local frequent itemsets in partitions and verifying their union globally
Explanation:
Any globally frequent itemset must be frequent in at least one partition.
This property limits global candidates and can reduce database scans.
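A minimal sketch of the two-phase idea follows, assuming a relative support threshold applied unchanged to each partition. The brute-force local miner, the function names, and the data are illustrative assumptions; a real implementation would use an efficient in-memory miner per partition.

```python
import math
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force miner used for each partition (toy-sized inputs only)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    found = set()
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                found.add(frozenset(combo))
    return found


def partition_mine(transactions, min_support, n_parts):
    """Phase 1: mine each partition. Phase 2: verify the union globally."""
    size = math.ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    candidates = set()
    for part in parts:
        candidates |= frequent_itemsets(part, min_support)

    n = len(transactions)
    verified = {}
    for cand in candidates:                     # one global counting pass
        count = sum(1 for t in transactions if cand <= t)
        if count / n >= min_support:
            verified[cand] = count / n
    return verified


# Hypothetical transactions.
transactions = [
    {"a", "b"}, {"a", "c"}, {"a", "b", "c"},
    {"b"}, {"a", "b"}, {"c"},
]
result = partition_mine(transactions, min_support=0.5, n_parts=2)
print(sorted(sorted(s) for s in result))
```

In this data the itemset {a, c} is locally frequent in the first partition but fails the global check, which is why the verification pass is needed.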
A. It guarantees zero model error
B. It replaces preprocessing
C. It helps define meaningful problems and interpret discovered patterns
D. It removes the need for validation
Correct Answer: C. It helps define meaningful problems and interpret discovered patterns
Explanation:
Domain knowledge guides feature selection, constraints, and practical interpretation.
Without context, an algorithm may find patterns that are statistically real but irrelevant.
A. Centers values at the median only
B. Makes the variance exactly one without shifting
C. Maps values into a specified interval such as 0 to 1
D. Converts numeric values to ranks
Correct Answer: C. Maps values into a specified interval such as 0 to 1
Explanation:
Min-max scaling uses the observed minimum and maximum.
It preserves order but can be sensitive to extreme values.
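The sketch below illustrates both properties: order is preserved, and a single extreme value compresses everything else. The function name `min_max_scale` and the sample values are assumptions for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the observed min and max hit the interval ends."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant column")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Hypothetical values, with and without one extreme observation.
clean = [10, 20, 30, 40]
with_outlier = [10, 20, 30, 40, 1000]

scaled_clean = min_max_scale(clean)
scaled_outlier = min_max_scale(with_outlier)

print(scaled_clean)
print(scaled_outlier)
```

With the outlier present, the four ordinary values are squeezed into a small fraction of the interval near 0, although their relative order is unchanged.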
A. Removing duplicate records
B. Scaling training features
C. Using information during training that would not be available at prediction time
D. Using cross-validation
Correct Answer: C. Using information during training that would not be available at prediction time
Explanation:
Leakage creates unrealistically high evaluation results.
Examples include fitting preprocessing steps such as scaling on the full dataset before splitting, or using features that encode future information.
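As a minimal sketch of the preprocessing case, the code below contrasts scaling statistics computed on the full dataset with statistics computed on the training split only. The data, the split, and the helper names are hypothetical.

```python
import statistics


def fit_standardizer(train_values):
    """Learn scaling statistics from the given values only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def transform(values, mean, std):
    """Apply previously learned statistics to any values."""
    return [(v - mean) / std for v in values]


# Hypothetical feature; the last value stands in for an unseen test record.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
train, test = values[:4], values[4:]

# Leaky: the statistics already "know" about the test record.
leaky_mean, leaky_std = fit_standardizer(values)

# Safe: fit on the training split, then reuse the statistics on the test split.
safe_mean, safe_std = fit_standardizer(train)
train_scaled = transform(train, safe_mean, safe_std)
test_scaled = transform(test, safe_mean, safe_std)

print(leaky_mean, safe_mean)
print(test_scaled)
```

The leaky mean is pulled toward the test value, so the training features would carry information that is not available at prediction time.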
A. The ratio of confidence to transaction length
B. The number of candidate itemsets
C. The difference between observed joint support and support expected under independence
D. The average number of rule items
Correct Answer: C. The difference between observed joint support and support expected under independence
Explanation:
Leverage is support(X ∪ Y) minus support(X) × support(Y).
Zero leverage corresponds to independence.
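The definition can be sketched directly in code. The two toy datasets below are hypothetical: one constructed so that the items are independent, one with positive dependence.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def leverage(transactions, x, y):
    """Observed joint support minus the support expected under independence."""
    joint = support(transactions, set(x) | set(y))
    return joint - support(transactions, x) * support(transactions, y)


# Hypothetical data where tea and lemon occur independently.
independent = [{"tea", "lemon"}, {"tea"}, {"lemon"}, set()]

# Hypothetical data where lemon appears only alongside tea.
dependent = [{"tea", "lemon"}, {"tea", "lemon"}, {"tea"}, {"bread"}]

independent_leverage = leverage(independent, {"tea"}, {"lemon"})
dependent_leverage = leverage(dependent, {"tea"}, {"lemon"})

print(independent_leverage, dependent_leverage)
```

In the second dataset the joint support is 0.5 while independence would predict 0.375, giving a positive leverage.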
A. Every superset of an infrequent itemset is frequent
B. Confidence always decreases with rule length
C. Every subset of a frequent itemset must also be frequent
D. All frequent itemsets have equal support
Correct Answer: C. Every subset of a frequent itemset must also be frequent
Explanation:
The Apriori property follows from support anti-monotonicity.
An itemset cannot occur more often than any of its subsets.
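The sketch below shows how this property is typically used for pruning in level-wise mining: a candidate is discarded without counting if any of its immediate subsets is not frequent. It is a simplified illustration on hypothetical data, not an optimized Apriori implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return (frequent itemsets, candidates pruned without counting)."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if is_frequent(frozenset([i]))}
    frequent, pruned = set(level), set()
    k = 2
    while level:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        survivors = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        pruned |= joined - survivors
        # Only the survivors are counted against the database.
        level = {c for c in survivors if is_frequent(c)}
        frequent |= level
        k += 1
    return frequent, pruned


# Hypothetical transactions: b and c never occur together.
transactions = [
    {"a", "b"}, {"a", "c"}, {"a", "b"},
    {"a", "c"}, {"b"}, {"c"},
]
frequent, pruned = apriori(transactions, min_support=0.3)
print(sorted(sorted(s) for s in frequent))
print(sorted(sorted(s) for s in pruned))
```

Because {b, c} is infrequent, the candidate {a, b, c} is rejected in the prune step and its support is never counted.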
A. Replacing support with confidence
B. Using only the last transaction
C. Mining a representative subset to obtain candidate patterns more cheaply
D. Generating random item names
Correct Answer: C. Mining a representative subset to obtain candidate patterns more cheaply
Explanation:
Sampling reduces computation by working on fewer transactions.
A verification step may be needed because some true patterns can be missed.
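A minimal sketch of sample-then-verify follows. Lowering the threshold on the sample by a `slack` margin is an assumed heuristic to reduce missed patterns; the function names, the brute-force miner, and the data are hypothetical.

```python
import random
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine(transactions, min_support, max_len=2):
    """Brute-force miner (toy-sized inputs only)."""
    items = sorted({i for t in transactions for i in t})
    return {
        frozenset(combo)
        for size in range(1, max_len + 1)
        for combo in combinations(items, size)
        if support(transactions, frozenset(combo)) >= min_support
    }


def sample_then_verify(transactions, min_support, sample_size, slack, seed):
    """Mine a random sample cheaply, then verify candidates on the full data."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    # A lowered threshold on the sample makes missed patterns less likely.
    candidates = mine(sample, max(min_support - slack, 0.0))
    # Verification removes false positives; false negatives can remain.
    return {c for c in candidates if support(transactions, c) >= min_support}


# Hypothetical transactions.
transactions = [
    {"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"},
    {"a", "b"}, {"c"}, {"a"}, {"a", "b"},
]
exact = mine(transactions, 0.5)
approx = sample_then_verify(transactions, 0.5, sample_size=4, slack=0.1, seed=0)
print(sorted(sorted(s) for s in approx))
```

After verification, every reported itemset is truly frequent, but an itemset that happened to be rare in the sample can still be missing from the result.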