MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. A guarantee that more features improve accuracy
B. A database storage error
C. A rule that all features must be binary
D. The deterioration of distance, density, and sample coverage as feature count grows
Correct Answer: D. The deterioration of distance, density, and sample coverage as feature count grows
Explanation:
In high dimensions, data become sparse and distances often become less informative.
Algorithms may require much more data and careful feature selection.
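A rough way to see distances becoming less informative is to compare the nearest and farthest neighbour of a random query point. The sketch below assumes points drawn uniformly from the unit hypercube; the function name `relative_contrast` and the chosen sizes are illustrative assumptions, not part of the question set.

```python
import math
import random

def relative_contrast(n_points, n_dims, seed=0):
    """Return (max_dist - min_dist) / min_dist from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# Same number of points, very different dimensionality.
low_dim_contrast = relative_contrast(200, 2)
high_dim_contrast = relative_contrast(200, 500)

print(f"2 dimensions:   {low_dim_contrast:.2f}")
print(f"500 dimensions: {high_dim_contrast:.2f}")
```

Under these assumptions the contrast shrinks sharply in the high-dimensional case: the nearest and farthest points end up at similar distances, which is one reason nearest-neighbour reasoning degrades.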
Choose an option to check your answer.
A. A shortage of observations
B. A model with low variance
C. A variable with a true zero
D. Unnecessary duplication of information across attributes or records
Correct Answer: D. Unnecessary duplication of information across attributes or records
Explanation:
Redundancy can waste storage and overweight repeated information.
It may also create multicollinearity or duplicate counting.
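Record-level redundancy can be checked with a simple count of repeated rows. This is a minimal sketch on made-up records; the field values and variable names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical records: (name, age, city)
records = [
    ("alice", 30, "NY"),
    ("bob", 25, "LA"),
    ("alice", 30, "NY"),  # exact duplicate of the first record
]

# Count how often each record appears and keep those seen more than once.
counts = Counter(records)
duplicates = {rec: n for rec, n in counts.items() if n > 1}

# dict.fromkeys keeps the first occurrence of each record and preserves order.
deduplicated = list(dict.fromkeys(records))

print(duplicates)
print(deduplicated)
```

Attribute-level redundancy (for example, the same temperature stored in two units) is not caught by this check and usually needs a correlation or dependency analysis instead.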
Choose an option to check your answer.
A. The variables are always independent
B. The variables have zero variance
C. The data contain no noise
D. There is no linear association, though a nonlinear relationship may exist
Correct Answer: D. There is no linear association, though a nonlinear relationship may exist
Explanation:
Pearson correlation measures linear dependence only.
Curved relationships can be strong while producing a near-zero coefficient.
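To illustrate, the sketch below computes Pearson's coefficient by hand for a parabola over a symmetric range, where y is fully determined by x yet the coefficient is essentially zero. The helper `pearson` and the sample data are assumptions made for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [x / 10 for x in range(-50, 51)]   # symmetric around zero
curve = [x ** 2 for x in xs]            # deterministic but nonlinear
line = [2 * x + 1 for x in xs]          # perfectly linear

r_curve = pearson(xs, curve)
r_line = pearson(xs, line)

print(f"parabola: r = {r_curve:.6f}")
print(f"line:     r = {r_line:.6f}")
```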
Choose an option to check your answer.
A. Perfect independence
B. A negative association
C. A rule with zero support
D. A positive association between antecedent and consequent
Correct Answer: D. A positive association between antecedent and consequent
Explanation:
Lift above one means Y is more common when X occurs than in the dataset overall.
The rule reflects positive dependence.
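As a rough illustration, lift can be computed directly from transaction counts. The transactions, item names, and the helpers `support` and `lift` below are made up for this sketch.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "eggs"},
    {"bread"},
    {"eggs"},
    {"butter", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Joint support divided by the product of the individual supports."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))

rule_lift = lift({"bread"}, {"butter"}, transactions)
baseline = support({"butter"}, transactions)
confidence = support({"bread", "butter"}, transactions) / support({"bread"}, transactions)

print(f"support(butter) = {baseline:.2f}")
print(f"confidence(bread -> butter) = {confidence:.2f}")
print(f"lift(bread -> butter) = {rule_lift:.2f}")
```

In this toy data butter appears in half of all transactions but in 80% of those containing bread, so the lift is above one.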
Choose an option to check your answer.
A. The rule is strongly predictive
B. The consequent is rare
C. The itemset is invalid
D. The items are common but nearly independent
Correct Answer: D. The items are common but nearly independent
Explanation:
High support reflects frequent co-occurrence in absolute terms.
Lift near one shows the co-occurrence is close to what marginal popularity predicts.
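A small worked case makes the distinction concrete. The counts below are invented so that two popular items co-occur often, yet exactly as often as their individual popularity would predict.

```python
# Hypothetical counts over 100 transactions.
n_total = 100
n_both = 60      # contain X and Y
n_x_only = 20    # contain X but not Y
n_y_only = 15    # contain Y but not X

support_x = (n_both + n_x_only) / n_total   # X appears in 80 transactions
support_y = (n_both + n_y_only) / n_total   # Y appears in 75 transactions
support_xy = n_both / n_total               # high joint support

# Under independence the joint support would be support_x * support_y.
expected_if_independent = support_x * support_y
lift = support_xy / expected_if_independent

print(f"support(X, Y) = {support_xy:.2f}")
print(f"expected under independence = {expected_if_independent:.2f}")
print(f"lift = {lift:.2f}")
```

The pair is frequent, but knowing that X is present does not change the chance of seeing Y, so the rule carries little predictive value.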
Choose an option to check your answer.
A. A confusion matrix
B. A dendrogram
C. A support vector
D. A hash tree
Correct Answer: D. A hash tree
Explanation:
Hash trees organize candidates into buckets for efficient matching.
Transactions can be mapped to relevant candidate subsets during scans.
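The idea can be sketched as a simplified hash tree for candidate k-itemsets. This is an illustrative toy, not a reference implementation: the class names, the modulo hash, the bucket count, and the leaf size are all assumptions, and an overflowing leaf is split only once per insertion.

```python
from itertools import combinations

class HashTreeNode:
    def __init__(self):
        self.children = {}    # bucket id -> child node (used by interior nodes)
        self.candidates = []  # itemsets stored here (used by leaf nodes)
        self.is_leaf = True

class HashTree:
    def __init__(self, k, n_buckets=3, max_leaf_size=2):
        self.k = k
        self.n_buckets = n_buckets
        self.max_leaf_size = max_leaf_size
        self.root = HashTreeNode()

    def _bucket(self, item):
        return hash(item) % self.n_buckets

    def insert(self, itemset):
        itemset = tuple(sorted(itemset))
        node, depth = self.root, 0
        # Interior nodes at depth d hash on the d-th item of the candidate.
        while not node.is_leaf:
            node = node.children.setdefault(self._bucket(itemset[depth]), HashTreeNode())
            depth += 1
        node.candidates.append(itemset)
        # Split an overflowing leaf while there are still items left to hash on.
        if len(node.candidates) > self.max_leaf_size and depth < self.k:
            node.is_leaf = False
            old, node.candidates = node.candidates, []
            for cand in old:
                child = node.children.setdefault(self._bucket(cand[depth]), HashTreeNode())
                child.candidates.append(cand)

    def count(self, transactions):
        """Return {candidate: support count} for candidates found in transactions."""
        counts = {}
        for t in transactions:
            items = sorted(set(t))
            matched = set()  # avoids double counting when a leaf is reached twice
            self._visit(self.root, items, 0, set(items), matched)
            for cand in matched:
                counts[cand] = counts.get(cand, 0) + 1
        return counts

    def _visit(self, node, items, start, item_set, matched):
        if node.is_leaf:
            # Only candidates in the reached leaves are compared with the transaction.
            matched.update(c for c in node.candidates if item_set.issuperset(c))
            return
        for i in range(start, len(items)):
            child = node.children.get(self._bucket(items[i]))
            if child is not None:
                self._visit(child, items, i + 1, item_set, matched)

# Hypothetical candidates (all 2-itemsets over items 1..5) and transactions.
candidates = list(combinations(range(1, 6), 2))
transactions = [[1, 2, 3], [2, 3, 4], [1, 3, 5], [1, 2, 3, 5]]

tree = HashTree(k=2)
for cand in candidates:
    tree.insert(cand)
counts = tree.count(transactions)
print(counts)
```

Each transaction descends only into buckets reachable from its own items, so most candidates are never compared against it.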
Choose an option to check your answer.
A. A pattern that is valid, novel, useful, and understandable
B. Any pattern with many variables
C. A rule with the longest text
D. A result produced by the most complex algorithm
Correct Answer: A. A pattern that is valid, novel, useful, and understandable
Explanation:
Interestingness combines statistical strength with practical relevance.
A frequent pattern may still be unhelpful if it is obvious or unactionable.
Choose an option to check your answer.
A. Obtaining a smaller representation that preserves important information
B. Deleting all rare classes
C. Converting every value to zero
D. Increasing the number of dimensions
Correct Answer: A. Obtaining a smaller representation that preserves important information
Explanation:
Reduction can lower storage, computation, and noise.
Examples include sampling, aggregation, feature selection, and dimensionality reduction.
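Two of these techniques, aggregation and sampling, can be sketched in a few lines. The sales figures and variable names below are invented for illustration.

```python
import random
from collections import defaultdict

# Hypothetical daily sales records: (ISO date, amount)
daily_sales = [
    ("2024-01-03", 120.0),
    ("2024-01-17", 80.0),
    ("2024-02-02", 50.0),
    ("2024-02-20", 70.0),
    ("2024-02-28", 30.0),
]

# Aggregation: collapse daily rows into one total per month (YYYY-MM prefix).
monthly = defaultdict(float)
for day, amount in daily_sales:
    monthly[day[:7]] += amount
monthly_totals = dict(monthly)

# Sampling: keep a simple random sample of the rows (fixed seed for repeatability).
rng = random.Random(0)
sample = rng.sample(daily_sales, k=2)

print(monthly_totals)
print(sample)
```

Aggregation keeps the totals but discards day-level detail, while sampling keeps full detail for only some rows; which trade-off is acceptable depends on the analysis.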
Choose an option to check your answer.
A. Creating new attributes from existing data
B. Deleting the target variable
C. Changing row order only
D. Selecting a random algorithm
Correct Answer: A. Creating new attributes from existing data
Explanation:
Constructed features may expose useful relationships not directly present in raw variables.
Examples include ratios, interactions, and time-based indicators.
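The three kinds of constructed feature mentioned above might look like this in practice. The column names (`income`, `debt`, `signup`) and the values are hypothetical.

```python
from datetime import date

# Hypothetical raw rows.
rows = [
    {"income": 4000.0, "debt": 1000.0, "signup": date(2024, 3, 2)},
    {"income": 2500.0, "debt": 2000.0, "signup": date(2024, 3, 6)},
]

constructed = []
for row in rows:
    constructed.append({
        # Ratio feature: debt relative to income.
        "debt_to_income": row["debt"] / row["income"],
        # Interaction feature: product of two raw attributes.
        "income_x_debt": row["income"] * row["debt"],
        # Time-based indicators derived from the signup date.
        "signup_month": row["signup"].month,
        "signup_is_weekend": row["signup"].weekday() >= 5,  # Saturday=5, Sunday=6
    })

for features in constructed:
    print(features)
```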
Choose an option to check your answer.
A. The antecedent and consequent are independent under the observed frequencies
B. The rule is perfectly predictive
C. The items never co-occur
D. The confidence is zero
Correct Answer: A. The antecedent and consequent are independent under the observed frequencies
Explanation:
A lift of one means confidence equals the baseline support of the consequent.
Knowing X provides no association-based advantage for predicting Y.
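The link between lift and independence follows from the usual definitions. Writing supp for support and conf for confidence (notation assumed here, since the question set does not fix one):

```latex
\[
\mathrm{lift}(X \Rightarrow Y)
  = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
\]
\[
\mathrm{lift}(X \Rightarrow Y) = 1
  \iff \mathrm{supp}(X \cup Y) = \mathrm{supp}(X)\,\mathrm{supp}(Y)
\]
```

Reading supports as empirical probabilities, the second line is the product rule P(X, Y) = P(X) P(Y), which is the definition of independence for the observed frequencies.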
Choose an option to check your answer.
A. Lift
B. Antecedent length
C. Transaction ID
D. Support count of X alone
Correct Answer: A. Lift
Explanation:
Lift compares rule confidence with the consequent's baseline support.
A value above one indicates improvement over the baseline frequency.
Choose an option to check your answer.
A. Discarding transactions that cannot contain any later frequent itemset
B. Reducing every transaction to one item
C. Removing all frequent transactions
D. Replacing transaction IDs with means
Correct Answer: A. Discarding transactions that cannot contain any later frequent itemset
Explanation:
As itemset size grows, short or nonmatching transactions cannot contribute support.
Removing them reduces later scan cost.
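A minimal sketch of this pruning step, assuming an Apriori-style level-wise search: a transaction is dropped if it has too few items to hold a (k+1)-itemset, or if it contains no frequent k-itemset at all. The function name `reduce_transactions` and the sample data are assumptions for illustration.

```python
def reduce_transactions(transactions, frequent_k_itemsets, k):
    """Keep only transactions that could still support a (k+1)-itemset."""
    kept = []
    for t in transactions:
        if len(t) <= k:
            continue  # too short to contain any (k+1)-itemset
        if not any(itemset <= t for itemset in frequent_k_itemsets):
            continue  # no frequent k-subset, so no frequent (k+1)-superset
        kept.append(t)
    return kept

# Hypothetical data: frequent 2-itemsets found in the previous pass.
frequent_2 = [frozenset({"a", "b"}), frozenset({"b", "c"})]
transactions = [
    frozenset({"a", "b", "c"}),  # kept
    frozenset({"a", "b"}),       # dropped: only two items
    frozenset({"a", "d", "e"}),  # dropped: contains no frequent 2-itemset
    frozenset({"b", "c", "d"}),  # kept
]

reduced = reduce_transactions(transactions, frequent_2, k=2)
print(reduced)
```

The test used here is deliberately conservative: it never removes a transaction that could still matter, though stricter conditions could prune more.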