MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. The number of nonempty subset splits grows exponentially
B. Support must be negative
C. Long itemsets contain no subsets
D. Confidence cannot be computed for them
Correct Answer: A. The number of nonempty subset splits grows exponentially
Explanation:
An itemset with k items yields 2^k - 2 candidate rules, one for each nonempty proper subset used as the antecedent.
Efficient pruning is needed to avoid evaluating every possible rule.
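The growth described above can be checked by brute force. The following is a minimal sketch; the helper name `candidate_rules` is hypothetical and only enumerates splits, without computing support or confidence.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split with both sides nonempty."""
    items = sorted(itemset)
    for size in range(1, len(items)):  # antecedent sizes 1 .. k-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

for k in (2, 3, 5, 10):
    n_rules = sum(1 for _ in candidate_rules(range(k)))
    # The enumerated count matches the closed form 2^k - 2.
    print(k, n_rules, 2 ** k - 2)
```

Each added item roughly doubles the number of splits, which is why rule generation relies on pruning rather than exhaustive evaluation.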
A. By recursively combining suffix items with frequent patterns in conditional trees
B. By testing random itemsets
C. By fitting a classifier for each item
D. By sorting confidence values only
Correct Answer: A. By recursively combining suffix items with frequent patterns in conditional trees
Explanation:
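For each suffix item, the algorithm builds a smaller conditional structure from the transactions that contain that item.
Each recursive step extends the current suffix with the items that remain frequent in that conditional structure.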
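The recursion can be sketched in a simplified form. The code below is an assumption-laden illustration: it uses projected transaction lists in place of an actual conditional tree, orders items alphabetically, and the function name `mine` and the toy database are hypothetical.

```python
from collections import Counter

def mine(db, min_support, suffix=frozenset(), out=None):
    """Return {itemset: support} by recursively growing patterns from a suffix."""
    if out is None:
        out = {}
    # Count each item once per transaction in the current (conditional) database.
    counts = Counter(item for items in db for item in set(items))
    for item in sorted(i for i, c in counts.items() if c >= min_support):
        pattern = suffix | {item}
        out[pattern] = counts[item]
        # Conditional database: transactions containing `item`, restricted to
        # the items that come before it in the fixed (alphabetical) order.
        conditional_db = [
            [i for i in items if i < item]
            for items in db
            if item in items
        ]
        mine(conditional_db, min_support, pattern, out)
    return out

db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent = mine(db, min_support=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Restricting each conditional database to earlier items ensures every itemset is generated exactly once. A real implementation would compress the conditional database into a tree with shared prefixes instead of storing lists.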
A. Learning training-specific noise that does not generalize
B. Using too little training accuracy
C. Choosing a simple model intentionally
D. Removing irrelevant features
Correct Answer: A. Learning training-specific noise that does not generalize
Explanation:
An overfit model performs well on training data but poorly on new data.
Excessive model complexity and data leakage are common causes.
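A small contrast between a memorizing model and a simple one illustrates the gap. This is a sketch on invented toy data: the true rule is assumed to be "label 1 when x >= 5", two training labels are deliberately flipped as noise, and the names `one_nn` and `threshold` are hypothetical.

```python
# Training points (x, label); labels at x=2 and x=7 are flipped to act as noise.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Clean test points that sit just beside each training point.
test = [(x + 0.4, int(x >= 5)) for x in range(10)]

def one_nn(x):
    """Memorizing model: copy the label of the nearest training point."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def threshold(x):
    """Simple model: a single cut at x = 5."""
    return int(x >= 5)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

for name, model in (("1-NN", one_nn), ("threshold", threshold)):
    print(name, accuracy(model, train), accuracy(model, test))
```

The nearest-neighbor model reproduces the noisy training labels perfectly and then repeats those mistakes on the clean test points, while the threshold model misses the noisy training points but generalizes.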
A. A test on an attribute or feature
B. A final class prediction only
C. A transaction itemset
D. A cluster centroid
Correct Answer: A. A test on an attribute or feature
Explanation:
Internal nodes split observations according to feature conditions.
Branches represent the outcomes of those tests.
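One way to picture this structure is a pair of small classes. The sketch below assumes binary threshold tests on numeric features; the class names, feature names, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    prediction: str

@dataclass
class Node:
    feature: str                 # attribute tested at this internal node
    threshold: float
    left: Union["Node", Leaf]    # branch taken when value <= threshold
    right: Union["Node", Leaf]   # branch taken when value > threshold

def predict(tree, row):
    """Follow the test outcomes from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if row[tree.feature] <= tree.threshold else tree.right
    return tree.prediction

tree = Node("age", 30, Leaf("no"),
            Node("income", 50_000, Leaf("no"), Leaf("yes")))
print(predict(tree, {"age": 25, "income": 80_000}))
print(predict(tree, {"age": 40, "income": 60_000}))
```

Only `Node` objects hold a test; the prediction lives entirely in the `Leaf` objects.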
A. An alternative split that approximates the chosen split when its feature is missing
B. A split generated from the target label
C. A rule with zero information gain
D. A branch used only for pruning
Correct Answer: A. An alternative split that approximates the chosen split when its feature is missing
Explanation:
Surrogate splits preserve routing when the primary split value is unavailable.
They are based on other features that produce similar partitions.
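The idea can be sketched by measuring how often a candidate surrogate sends rows to the same side as the primary split. The data, the thresholds, and the helper names below are hypothetical, and the agreement score is a simplified stand-in for the measures used by actual tree implementations.

```python
rows = [
    {"income": 20, "age": 22}, {"income": 25, "age": 30},
    {"income": 30, "age": 28}, {"income": 60, "age": 45},
    {"income": 70, "age": 50}, {"income": 80, "age": 31},
]
primary = ("income", 40)    # chosen split: income <= 40 goes left
surrogate = ("age", 32)     # candidate surrogate: age <= 32 goes left

def goes_left(row, feature, threshold):
    return row[feature] <= threshold

def agreement(rows, primary, surrogate):
    """Fraction of rows that both splits route to the same side."""
    same = sum(goes_left(r, *primary) == goes_left(r, *surrogate) for r in rows)
    return same / len(rows)

def route(row, primary, surrogate):
    """Use the primary split unless its feature is missing (None)."""
    split = primary if row[primary[0]] is not None else surrogate
    return "left" if goes_left(row, *split) else "right"

print(agreement(rows, primary, surrogate))
print(route({"income": None, "age": 25}, primary, surrogate))
```

A surrogate with high agreement lets a row with a missing primary feature follow the branch it most likely would have taken.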
A. Accurate class ranking may still result despite imperfect probability estimates
B. The algorithm tests dependence explicitly
C. It removes correlated features automatically
D. Independence is never used
Correct Answer: A. Accurate class ranking may still result despite imperfect probability estimates
Explanation:
Classification only requires the correct posterior ordering, not perfect probability calibration.
Errors from dependence may partially cancel across classes.
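A worked example makes the point concrete. The sketch assumes two classes with equal priors and one informative feature that appears twice in the data (a perfectly dependent duplicate); all numbers and names are invented for illustration.

```python
def posterior(likelihoods, priors):
    """Normalize prior * likelihood into class posteriors."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    total = sum(joint.values())
    return {c: value / total for c, value in joint.items()}

priors = {"spam": 0.5, "ham": 0.5}
single = {"spam": 0.8, "ham": 0.4}   # P(feature present | class)

# Correct posterior: the duplicate carries no extra evidence.
true_post = posterior(single, priors)
# Naive posterior: the duplicate is treated as independent, so the
# likelihood is multiplied in twice.
naive_post = posterior({c: p ** 2 for c, p in single.items()}, priors)

print(true_post)
print(naive_post)
print(max(true_post, key=true_post.get), max(naive_post, key=naive_post.get))
```

The naive estimate is overconfident (about 0.80 instead of about 0.67 for the leading class), yet both posteriors rank the classes in the same order, so the predicted class is unchanged.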
A. If an antecedent is frequent, every rule is valid
B. If a consequent fails minimum confidence, its supersets can be pruned in standard rule generation
C. Confidence rises whenever the consequent grows
D. All consequents have equal confidence
Correct Answer: B. If a consequent fails minimum confidence, its supersets can be pruned in standard rule generation
Explanation:
For rules from a fixed frequent itemset, enlarging the consequent shrinks the antecedent, whose support can only stay the same or grow, so confidence can only stay the same or fall.
This supports structured pruning of consequent candidates.
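The pruning can be sketched as level-wise growth of consequents. This is a minimal illustration, not a reference implementation: the function name `generate_rules`, the toy database, and the threshold are hypothetical, and supports are recounted from the transactions for simplicity.

```python
from itertools import combinations

db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"},
      {"a"}, {"b", "c"}, {"b", "c"}]

def support(items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in db if items <= t)

def generate_rules(itemset, support, min_conf):
    itemset = frozenset(itemset)
    rules, evaluated = [], 0
    candidates = [frozenset([item]) for item in sorted(itemset)]
    size = 1
    while candidates and size < len(itemset):
        passing = set()
        for consequent in candidates:
            antecedent = itemset - consequent
            evaluated += 1
            confidence = support(itemset) / support(antecedent)
            if confidence >= min_conf:
                rules.append((antecedent, consequent, confidence))
                passing.add(consequent)
        size += 1
        # Grow consequents only from those that passed; a candidate survives
        # only if every consequent one item smaller also passed.
        merged = {a | b for a in passing for b in passing if len(a | b) == size}
        candidates = [c for c in merged
                      if all(frozenset(s) in passing
                             for s in combinations(c, size - 1))]
    return rules, evaluated

rules, evaluated = generate_rules("abc", support, min_conf=0.6)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
print("rules evaluated:", evaluated, "of", 2 ** 3 - 2)
```

In this toy run the consequent {a} fails the threshold, so the larger consequents {a, b} and {a, c} are never evaluated.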
A. When every transaction contains completely unique items
B. When many transactions share common item prefixes
C. When no item is frequent
D. When items are continuous measurements
Correct Answer: B. When many transactions share common item prefixes
Explanation:
Shared prefixes are stored once with aggregated counts.
Dense overlap therefore produces a compact tree.
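The compression effect can be seen with a plain prefix tree. The sketch below assumes transactions are already sorted in one fixed item order and omits the header table and node links of a full FP-tree; the function names and toy data are hypothetical.

```python
def build_prefix_tree(transactions):
    """Insert each transaction as a path, incrementing counts on shared nodes."""
    root = {"count": 0, "children": {}}
    for items in transactions:
        node = root
        for item in items:
            node = node["children"].setdefault(
                item, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())

shared = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"], ["a", "c"]]
unique = [["a", "b"], ["c", "d"], ["e", "f"]]

for name, data in (("shared prefixes", shared), ("no overlap", unique)):
    tree = build_prefix_tree(data)
    total_items = sum(len(t) for t in data)
    print(name, "item occurrences:", total_items, "nodes:", count_nodes(tree))
```

With shared prefixes, ten item occurrences collapse into five nodes; with no overlap, every item occurrence needs its own node.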
A. Achieving perfect training accuracy
B. Using a model too simple to capture important patterns
C. Using too many validation folds
D. Having a large test set
Correct Answer: B. Using a model too simple to capture important patterns
Explanation:
An underfit model has high bias and performs poorly even on training data.
Better features or a more flexible model may be needed.
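A constant predictor on linear data shows the symptom. This is a sketch on invented data (y = 2x + 1 with no noise); the model names are hypothetical and the line is fitted with the closed-form least-squares formulas.

```python
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

def mse(predict):
    """Mean squared error on the training data itself."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Too simple: ignores x and always predicts the mean of y.
def constant_model(x):
    return mean_y

# Flexible enough: a least-squares line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def linear_model(x):
    return slope * x + intercept

print("constant model training MSE:", mse(constant_model))
print("linear model training MSE:", mse(linear_model))
```

The constant model has a large error even on the data it was fitted to, which is the signature that separates underfitting from overfitting.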
A. A feature split
B. A predicted class or class distribution
C. A support threshold
D. A missing-value mechanism
Correct Answer: B. A predicted class or class distribution
Explanation:
Leaves terminate decision paths and provide predictions.
They may store the majority class or estimated class probabilities.
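Both kinds of leaf output can be derived from the training labels that reach the leaf. The helper name `leaf_summary` and the labels below are hypothetical.

```python
from collections import Counter

def leaf_summary(labels):
    """Return the majority class and the class proportions for one leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    probabilities = {label: n / total for label, n in counts.items()}
    majority = counts.most_common(1)[0][0]
    return majority, probabilities

majority, probabilities = leaf_summary(["yes", "yes", "no", "yes"])
print(majority, probabilities)
```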
A. The maximum number of classes
B. The smallest number of training observations allowed in a leaf
C. The minimum number of features
D. The number of cross-validation folds
Correct Answer: B. The smallest number of training observations allowed in a leaf
Explanation:
Larger leaf-size requirements reduce highly specific partitions.
This acts as a regularization parameter.
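The effect on candidate splits can be sketched for a single numeric feature. The function name `valid_thresholds` is hypothetical; the parameter name `min_samples_leaf` is borrowed from common library usage purely as a label and does not call any library.

```python
def valid_thresholds(values, min_samples_leaf):
    """Midpoint thresholds that leave at least min_samples_leaf rows per side."""
    ordered = sorted(values)
    thresholds = []
    for i in range(1, len(ordered)):
        if ordered[i] == ordered[i - 1]:
            continue  # no threshold fits between equal values
        left, right = i, len(ordered) - i
        if left >= min_samples_leaf and right >= min_samples_leaf:
            thresholds.append((ordered[i - 1] + ordered[i]) / 2)
    return thresholds

values = [1, 2, 3, 4, 5, 6]
for leaf_size in (1, 3, 4):
    print(leaf_size, valid_thresholds(values, leaf_size))
```

Raising the requirement from 1 to 3 cuts the five candidate splits down to one, and a requirement of 4 forbids splitting these six rows at all.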
A. It always finds nonlinear clusters
B. Training and prediction are fast with relatively few parameters
C. It needs no labeled data
D. It searches every feature subset
Correct Answer: B. Training and prediction are fast with relatively few parameters
Explanation:
The conditional independence assumption lets each feature's statistics be estimated separately for each class, so training reduces to simple counting or averaging.
This scales well to high-dimensional sparse data.
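Training by counting can be sketched as follows. The example assumes a multinomial model over words with add-one (Laplace) smoothing, which is one common variant rather than the only one; the function names and the four toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """One pass over the data: count documents per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
    return class_counts, word_counts

def predict(words, class_counts, word_counts, vocab_size):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)

docs = [
    (["cheap", "pills"], "spam"), (["cheap", "offer"], "spam"),
    (["meeting", "notes"], "ham"), (["project", "meeting"], "ham"),
]
class_counts, word_counts = train(docs)
vocab_size = len({word for words, _ in docs for word in words})
print(predict(["cheap", "offer"], class_counts, word_counts, vocab_size))
print(predict(["meeting"], class_counts, word_counts, vocab_size))
```

Only words that actually occur in a document are touched during training and prediction, which is why the approach copes well with sparse, high-dimensional inputs.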