MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. Removes rare categories
B. Adds a small count to category frequencies before probability estimation
C. Standardizes continuous features
D. Increases the number of classes
Correct Answer: B. Adds a small count to category frequencies before probability estimation
Explanation:
Add-one smoothing prevents zero conditional probabilities.
It is common in multinomial and Bernoulli Naive Bayes.
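As a minimal sketch of the idea, the snippet below estimates category probabilities with additive smoothing. The function name `smoothed_probs`, the `alpha` parameter, and the toy colour values are assumptions made for illustration only.

```python
from collections import Counter

def smoothed_probs(values, categories, alpha=1.0):
    """Estimate P(category) with additive smoothing (alpha=1 is add-one)."""
    counts = Counter(values)
    # Each category receives alpha extra pseudo-counts, so the denominator grows too.
    total = len(values) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical feature values observed within one class; "green" never occurs.
observed = ["red", "red", "blue"]
probs = smoothed_probs(observed, ["red", "blue", "green"])
print(probs)
```

Without smoothing, "green" would get probability zero and wipe out the whole product of conditional probabilities for that class.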
Choose an option to check your answer.
A. By setting k equal to the number of features automatically
B. By comparing validation or cross-validation performance
C. By using the test set repeatedly
D. By choosing the largest possible k
Correct Answer: B. By comparing validation or cross-validation performance
Explanation:
The value of k is a hyperparameter that controls local smoothness.
Validation estimates which value generalizes best.
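The selection procedure might look like the sketch below, assuming that k here is the neighbor count of a k-nearest-neighbors classifier. The helper names, the one-dimensional toy data, and the use of leave-one-out cross-validation are all illustrative assumptions.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # hold out exactly one point
        hits += knn_predict(train, x, k) == y
    return hits / len(data)

# Hypothetical toy data: (feature, label)
data = [(1.0, "a"), (1.2, "a"), (1.4, "a"), (3.0, "b"), (3.2, "b"), (3.4, "b")]
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the largest k performs worst, because each held-out point is outvoted by the other class.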
Choose an option to check your answer.
A. Yes, adding items always lowers confidence
B. Yes, every subset has identical confidence
C. No, confidence is not anti-monotone in the same way as support
D. No, because confidence cannot be calculated
Correct Answer: C. No, confidence is not anti-monotone in the same way as support
Explanation:
Support is anti-monotone: adding items to an itemset can never increase its support. Confidence is a ratio whose denominator, the support of the antecedent, changes with the rule.
Adding or removing antecedent items can therefore raise or lower confidence.
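A small worked example makes this concrete. The transactions and the helper functions `support` and `confidence` below are hypothetical and chosen only to show both directions of change.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Hypothetical transaction database.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "d"},
    {"a", "d"},
    {"a", "c"},
]

base = confidence({"a"}, {"c"}, transactions)         # about 0.6
raised = confidence({"a", "b"}, {"c"}, transactions)  # adding b raises it
lowered = confidence({"a", "d"}, {"c"}, transactions) # adding d lowers it
print(base, raised, lowered)
```

Support, by contrast, can only stay the same or fall as the antecedent grows.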
Choose an option to check your answer.
A. All transactions not containing the item
B. The set of class labels predicted by the item
C. The collection of prefix paths leading to a selected item
D. A matrix of pairwise distances
Correct Answer: C. The collection of prefix paths leading to a selected item
Explanation:
Prefix paths summarize contexts in which the suffix item appears.
Their counts support construction of a conditional FP-tree.
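The sketch below collects those prefix paths for one suffix item. As a simplifying assumption it works directly on frequency-ordered transactions instead of walking an actual FP-tree; identical ordered prefixes would share a tree path, so the counts agree. The function name, the alphabetical tie-breaking, and the toy transactions are hypothetical.

```python
from collections import Counter

def conditional_pattern_base(transactions, item, min_count=1):
    """Return {prefix_path: count} for the prefixes that precede `item`."""
    freq = Counter(i for t in transactions for i in set(t))

    def order(i):
        # Descending frequency; ties broken alphabetically (an assumption).
        return (-freq[i], i)

    base = Counter()
    for t in transactions:
        ordered = sorted((i for i in set(t) if freq[i] >= min_count), key=order)
        if item in ordered:
            prefix = tuple(ordered[:ordered.index(item)])
            if prefix:                      # an empty prefix carries no context
                base[prefix] += 1
    return dict(base)

# Hypothetical transactions.
transactions = [["f", "c", "a", "m"], ["f", "c", "b"], ["f", "c", "a", "m"], ["c", "b"]]
print(conditional_pattern_base(transactions, "m"))
print(conditional_pattern_base(transactions, "b"))
```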
Choose an option to check your answer.
A. Selecting every hyperparameter repeatedly
B. Fitting preprocessing statistics
C. Estimating performance on unseen data after model selection
D. Increasing the training sample
Correct Answer: C. Estimating performance on unseen data after model selection
Explanation:
The test set simulates future cases not used in development.
Using it during tuning makes the estimate optimistically biased.
Choose an option to check your answer.
A. Training k models on the same full dataset
B. Using k nearest neighbors only
C. Repeatedly training on k-1 folds and validating on the remaining fold
D. Testing on the training set k times
Correct Answer: C. Repeatedly training on k-1 folds and validating on the remaining fold
Explanation:
Every fold serves once as validation data.
The average score provides a more stable estimate than a single split.
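The fold rotation can be sketched as follows. The generator name `k_fold_indices` and the interleaved fold assignment are assumptions; real workflows usually shuffle the data first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each index is validated exactly once."""
    # Interleaved assignment: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

splits = list(k_fold_indices(10, 5))
for train, val in splits:
    print("train:", train, "validate:", val)
```

A model would be fitted on each `train` list and scored on the matching `val` list, and the k scores would then be averaged.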
Choose an option to check your answer.
A. Replace entropy with Euclidean distance
B. Increase tree depth automatically
C. Adjust information gain for the intrinsic information of a split
D. Calculate rule confidence
Correct Answer: C. Adjust information gain for the intrinsic information of a split
Explanation:
Gain ratio penalizes splits that fragment data into many small groups.
It is used in algorithms such as C4.5.
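The penalty can be illustrated with a short calculation. The sketch assumes categorical features and a toy data set; the function names and the two example features are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain divided by the split's intrinsic information."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature)   # intrinsic information of the split
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
unique_id = ["r1", "r2", "r3", "r4"]   # fragments the data into singletons
binary = ["x", "x", "y", "y"]          # separates the classes with two groups

print(gain_ratio(unique_id, labels), gain_ratio(binary, labels))
```

Both features have the same information gain here, but the identifier-like feature is penalized because its split information is larger.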
Choose an option to check your answer.
A. They guarantee independent features
B. They remove class imbalance
C. They prevent numerical underflow and turn products into sums
D. They convert continuous values to categories
Correct Answer: C. They prevent numerical underflow and turn products into sums
Explanation:
Multiplying many small probabilities can underflow to zero.
Taking logs gives stable sums while preserving the ranking.
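A quick demonstration with made-up numbers: the per-feature likelihoods below are hypothetical, and the two helper functions are illustrative only.

```python
import math

# Hypothetical per-feature likelihoods for two classes (100 features each).
likelihoods = {"A": [1e-5] * 100, "B": [2e-5] * 100}

def product_score(ps):
    """Naive product of probabilities; underflows for long inputs."""
    score = 1.0
    for p in ps:
        score *= p
    return score

def log_score(ps):
    """Sum of log-probabilities; stays finite and keeps the ordering."""
    return sum(math.log(p) for p in ps)

products = {c: product_score(ps) for c, ps in likelihoods.items()}
logs = {c: log_score(ps) for c, ps in likelihoods.items()}
print(products)   # both products underflow to 0.0, so the classes tie
print(logs)       # the log scores still rank B above A
```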
Choose an option to check your answer.
A. A numeric variable with a large range
B. A balanced binary target
C. A table sorted by date
D. The same country recorded as 'Pakistan', 'PK', and 'Pak'
Correct Answer: D. The same country recorded as 'Pakistan', 'PK', and 'Pak'
Explanation:
Inconsistent coding represents the same concept in multiple forms.
Standardization is needed so equivalent values are treated as one category.
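One simple way to standardize such values is a lookup table of known variants, sketched below. The `CANONICAL` mapping and the `standardize` helper are hypothetical; a real pipeline would need a fuller mapping.

```python
# Hypothetical mapping from observed variants (lowercased) to one canonical label.
CANONICAL = {"pakistan": "Pakistan", "pk": "Pakistan", "pak": "Pakistan"}

def standardize(value):
    """Map a raw value to its canonical form; unknown values pass through trimmed."""
    key = value.strip().lower()
    return CANONICAL.get(key, value.strip())

raw = ["Pakistan", "PK", "Pak", " pak "]
cleaned = [standardize(v) for v in raw]
print(cleaned)
```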
Choose an option to check your answer.
A. Using identical numeric widths
B. Assigning every value to one bin
C. Making all bin means equal
D. Creating intervals with approximately equal numbers of observations
Correct Answer: D. Creating intervals with approximately equal numbers of observations
Explanation:
Quantile-based bins balance counts rather than widths.
Their numeric ranges may differ substantially.
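A rank-based sketch shows the effect. The function name and the skewed toy values are assumptions, and this simple version may place tied values in different bins.

```python
def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        # The rank, not the magnitude, decides the bin.
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical, heavily skewed values.
values = [1, 2, 3, 4, 50, 60, 700, 800, 9000]
bins = equal_frequency_bins(values, 3)
print(bins)
```

Each bin receives three values, although the first bin spans 1 to 3 and the last spans 700 to 9000.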
Choose an option to check your answer.
A. The probability of the consequent only
B. The number of items divided by features
C. The average transaction length
D. The proportion of transactions containing the itemset
Correct Answer: D. The proportion of transactions containing the itemset
Explanation:
Support divides an itemset's support count by the total number of transactions.
It therefore ranges from zero to one.
Choose an option to check your answer.
A. An itemset with maximum confidence only
B. The largest transaction in the database
C. An itemset with all available items
D. A frequent itemset with no frequent proper superset
Correct Answer: D. A frequent itemset with no frequent proper superset
Explanation:
Maximal itemsets mark the boundary between frequent and infrequent itemsets.
They compress the result more strongly than closed itemsets but do not preserve the supports of their subsets.
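The definition can be checked on a toy database with the brute-force sketch below. The function names, the transactions, and the minimum count of 2 are hypothetical, and exhaustive enumeration is only practical for very small item sets.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all itemsets meeting min_count."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = sum(set(combo) <= t for t in transactions)
            if count >= min_count:
                frequent[frozenset(combo)] = count
    return frequent

def maximal(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical transaction database.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
frequent = frequent_itemsets(transactions, min_count=2)
max_sets = maximal(frequent)
print(len(frequent), max_sets)
```

Seven itemsets are frequent here, yet a single maximal itemset summarizes them; the differing counts of its subsets (for example 4 for {a} and 3 for {a, b}) cannot be recovered from it.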