Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. A confusion matrix for classification
B. A table of itemset unions
C. A matrix of class priors
D. A visualization of distances between neighboring map units
Correct Answer: D. A visualization of distances between neighboring map units
Explanation:
Large neighboring distances can indicate cluster boundaries.
Small distances suggest homogeneous regions of the map.
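The distance visualization described in this answer is commonly called a U-matrix. As a minimal sketch of how it can be computed, the function below averages the Euclidean distance from each map unit to its 4-connected grid neighbors. The function name `u_matrix` and the toy weights are illustrative assumptions, not part of any particular SOM library.

```python
import math


def u_matrix(weights):
    """Average distance from each map unit to its 4-connected grid neighbors.

    weights is a rows x cols grid (list of lists) of weight vectors (tuples).
    """
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbors; report 0.0 in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result


# Hypothetical 1x4 map: two units near 0 and two units near 10.
toy_weights = [[(0.0,), (1.0,), (10.0,), (11.0,)]]
umat = u_matrix(toy_weights)
print(umat)
```

In this toy map the two middle units get the largest values, which is where the boundary between the two groups of prototypes sits.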
Choose an option to check your answer.
A. It measures only true negatives
B. It counts model parameters
C. It prevents class imbalance
D. Missed anomalies may cause serious loss or risk
Correct Answer: D. Missed anomalies may cause serious loss or risk
Explanation:
Recall measures the fraction of actual anomalies detected.
High recall is vital when false negatives are costly.
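Recall can be written as TP / (TP + FN), where TP counts detected anomalies and FN counts missed ones. The sketch below computes it on made-up labels; the helper name `recall` and the data are assumptions for illustration only.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (here: anomalies) that were detected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 1 = anomaly, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

anomaly_recall = recall(y_true, y_pred)
print(anomaly_recall)  # 3 of the 4 true anomalies were flagged
```

The false positive at the ninth position does not affect recall; only the missed anomaly does.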
Choose an option to check your answer.
A. Find a separating hyperplane with the maximum margin
B. Minimize itemset support
C. Create the deepest decision tree
D. Choose the nearest centroid
Correct Answer: A. Find a separating hyperplane with the maximum margin
Explanation:
SVM seeks a boundary that leaves the widest gap between classes.
A larger margin generally makes the classifier more robust to small perturbations of the data and tends to improve generalization.
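To make the idea of a margin concrete, the sketch below computes the geometric margin of a given hyperplane, that is, the smallest signed distance y(w·x + b)/‖w‖ over the training points. It only compares two hand-picked separating lines on toy data; it does not solve the SVM optimization problem, and all names and values are illustrative assumptions.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive result means every point is on the correct side.
    """
    norm = math.hypot(*w)
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Toy 2-D data: class -1 at x = 0, class +1 at x = 4.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [-1, -1, 1, 1]

centered = geometric_margin((1.0, 0.0), -2.0, points, labels)    # line x = 2
off_center = geometric_margin((1.0, 0.0), -1.0, points, labels)  # line x = 1
print(centered, off_center)
```

Both lines separate the classes, but the centered one leaves the wider gap, which is the one a maximum-margin classifier would prefer.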
Choose an option to check your answer.
A. Spreading labels through a similarity graph from labeled to unlabeled points
B. Generating labels randomly
C. Removing all unlabeled observations
D. Using association rules as class labels
Correct Answer: A. Spreading labels through a similarity graph from labeled to unlabeled points
Explanation:
Nearby or strongly connected observations are assumed likely to share labels.
The graph structure helps exploit unlabeled data.
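A minimal sketch of this idea for two classes, assuming a weighted undirected graph given as nested dictionaries: labeled nodes are clamped to +1 or -1, and each unlabeled node repeatedly takes the weighted average of its neighbors' scores. The function name, the graph, and the weights are hypothetical, and every node is assumed to have at least one neighbor.

```python
def propagate_labels(adjacency, seed_labels, n_iter=50):
    """Spread +1/-1 labels over a similarity graph by repeated averaging.

    adjacency:   dict node -> dict neighbor -> positive similarity weight
    seed_labels: dict node -> +1 or -1 for the labeled nodes
    """
    scores = {node: float(seed_labels.get(node, 0.0)) for node in adjacency}
    for _ in range(n_iter):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in seed_labels:
                # Labeled nodes keep their known label.
                updated[node] = float(seed_labels[node])
            else:
                total = sum(neighbors.values())
                updated[node] = sum(w * scores[n] for n, w in neighbors.items()) / total
        scores = updated
    return {node: (1 if s > 0 else -1) for node, s in scores.items()}


# Hypothetical chain a-b-c ~ d-e-f with a weak link between c and d.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"e": 1.0},
}
predicted = propagate_labels(graph, {"a": 1, "f": -1})
print(predicted)
```

With only two labeled nodes, the weak c-d link ends up as the boundary: nodes on the same strongly connected side inherit the same label.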
Choose an option to check your answer.
A. Assign all points to the empty cluster
B. Reduce every feature to zero
C. Reinitialize that centroid or use an implementation with an empty-cluster strategy
D. Treat it as a new class label
Correct Answer: C. Reinitialize that centroid or use an implementation with an empty-cluster strategy
Explanation:
An empty cluster has no points from which to compute a mean.
Algorithms handle this by relocating or replacing the centroid.
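One possible relocation strategy, sketched for 1-D data: during the centroid update, an empty cluster is moved to the point that lies farthest from its currently assigned centroid. This is an illustrative choice, not the only option, and the function name and data are assumptions.

```python
def update_centroids(points, assignments, centroids):
    """One k-means update step for 1-D points with an empty-cluster fallback."""
    new_centroids = []
    for k in range(len(centroids)):
        members = [p for p, a in zip(points, assignments) if a == k]
        if members:
            # Usual case: the centroid becomes the mean of its members.
            new_centroids.append(sum(members) / len(members))
        else:
            # Empty cluster: there is no mean to compute, so relocate the
            # centroid to the point farthest from its own assigned centroid.
            farthest_point, _ = max(
                zip(points, assignments),
                key=lambda pa: abs(pa[0] - centroids[pa[1]]),
            )
            new_centroids.append(farthest_point)
    return new_centroids


# Hypothetical state: centroid 1 started far away and attracted no points.
points = [1.0, 2.0, 3.0, 10.0]
assignments = [0, 0, 0, 0]
centroids = [4.0, 100.0]

new_centroids = update_centroids(points, assignments, centroids)
print(new_centroids)
```

The relocated centroid lands on the outlying point, so the next assignment step gives it at least one member.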
Choose an option to check your answer.
A. Nearby inputs tend to map to nearby grid units
B. All input distances are preserved exactly
C. Every class gets one unit
D. The grid has no boundaries
Correct Answer: A. Nearby inputs tend to map to nearby grid units
Explanation:
SOM aims to retain neighborhood relationships rather than exact geometry.
This makes the map useful for visual exploration.
Choose an option to check your answer.
A. Anomalies are often extremely rare
B. Accuracy cannot be computed for binary labels
C. Normal cases are always missing
D. Anomaly scores are categorical
Correct Answer: A. Anomalies are often extremely rare
Explanation:
Predicting every case as normal can achieve high accuracy.
Minority-focused metrics provide a more meaningful evaluation.
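The effect is easy to reproduce with made-up numbers. In the sketch below, a hypothetical data set has 10 anomalies among 1,000 cases, and a trivial model labels everything as normal.

```python
# Made-up data: 990 normal cases (0) and 10 anomalies (1).
y_true = [0] * 990 + [1] * 10
# A trivial "model" that predicts normal for every case.
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
actual_anomalies = sum(y_true)
anomaly_recall = true_positives / actual_anomalies

print(accuracy, anomaly_recall)
```

Accuracy looks excellent even though not a single anomaly is detected, which is why recall, precision, or similar minority-focused metrics are reported instead.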
Choose an option to check your answer.
A. All correctly classified points
B. Training points that determine the position of the decision boundary
C. The class centroids
D. The most frequent items
Correct Answer: B. Training points that determine the position of the decision boundary
Explanation:
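Support vectors lie on the margin boundaries or, in the soft-margin case, inside the margin or on the wrong side of it.
Moving a non-support-vector point slightly, as long as it stays outside the margin, does not change the classifier.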
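A 1-D, linearly separable toy case shows this directly. Assuming all negative points lie to the left of all positive points, the maximum-margin threshold is the midpoint between the rightmost negative and the leftmost positive point, and those two points are the support vectors. The function name and data are illustrative assumptions.

```python
def max_margin_threshold_1d(negatives, positives):
    """Hard-margin boundary for separable 1-D data (negatives left of positives).

    Returns the threshold and the pair of support vectors that determine it.
    """
    left, right = max(negatives), min(positives)
    if left >= right:
        raise ValueError("classes are not separable in the assumed order")
    return (left + right) / 2, (left, right)


base = max_margin_threshold_1d([0.0, 1.0, 2.0], [6.0, 7.0, 9.0])

# Moving a point that is not a support vector leaves the boundary unchanged.
moved_other = max_margin_threshold_1d([-3.0, 1.0, 2.0], [6.0, 7.0, 9.0])

# Moving a support vector shifts the boundary.
moved_support = max_margin_threshold_1d([0.0, 1.0, 3.0], [6.0, 7.0, 9.0])

print(base, moved_other, moved_support)
```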
Choose an option to check your answer.
A. Unlabeled data are always uniformly distributed
B. Nearby points or points on the same data manifold tend to share labels
C. Every feature is independent
D. Classes have identical sizes
Correct Answer: B. Nearby points or points on the same data manifold tend to share labels
Explanation:
Unlabeled observations reveal the geometry and density of the input space.
Labels can then be extended along that structure.
Choose an option to check your answer.
A. With all observations in one cluster
B. With k random centroids
C. With known class labels
D. With each observation as its own cluster
Correct Answer: D. With each observation as its own cluster
Explanation:
Agglomerative clustering repeatedly merges the closest clusters.
The process forms a bottom-up hierarchy.
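The bottom-up procedure can be sketched in a few lines for 1-D data. The version below assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters; the function name and data are hypothetical.

```python
def agglomerative_1d(values, target_clusters=1):
    """Single-linkage agglomerative clustering of 1-D values."""
    # Start with each observation as its own cluster.
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        # Merge the two closest clusters into one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


clusters, merges = agglomerative_1d([1.0, 2.0, 9.0, 10.0, 25.0], target_clusters=3)
print(clusters)
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

The recorded merge sequence is what a dendrogram visualizes; cutting it at different heights yields different numbers of clusters.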
Choose an option to check your answer.
A. The number of mislabeled classes
B. The average distance from inputs to their best matching units
C. The support lost during pruning
D. The variance of grid coordinates only
Correct Answer: B. The average distance from inputs to their best matching units
Explanation:
Quantization error measures how well prototypes represent the observations.
Lower values indicate closer representation, though map complexity also matters.
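As a minimal sketch, quantization error is the mean over inputs of the distance to the closest prototype (the best matching unit). The function name and toy data are assumptions for illustration.

```python
import math


def quantization_error(inputs, prototypes):
    """Mean Euclidean distance from each input to its best matching unit."""
    return sum(
        min(math.dist(x, w) for w in prototypes)  # distance to the BMU
        for x in inputs
    ) / len(inputs)


# Toy data: two inputs one unit away from the first prototype,
# and one input exactly on the second prototype.
inputs = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
prototypes = [(0.0, 1.0), (10.0, 0.0)]

qe = quantization_error(inputs, prototypes)
print(qe)
```

Adding more prototypes can only keep this value the same or lower it, so a low value on its own does not show that a map is better.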
Choose an option to check your answer.
A. Changing feature names
B. Choosing the anomaly-score cutoff that balances operational costs
C. Selecting the number of transactions
D. Making every score binary during training
Correct Answer: B. Choosing the anomaly-score cutoff that balances operational costs
Explanation:
Different thresholds trade false positives against missed anomalies.
The right choice depends on risk, capacity, and class prevalence.
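One simple way to pick a cutoff, sketched under the assumption that a cost can be attached to each false positive and each missed anomaly: try every observed score as the threshold and keep the one with the lowest total cost. The scores, labels, and cost values below are made up.

```python
def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost on the given data.

    A case is flagged as an anomaly when its score is >= the threshold.
    """
    best = None
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = fp * cost_fp + fn * cost_fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Made-up anomaly scores with labels (1 = anomaly, 0 = normal).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

# Missed anomalies are expensive: the cutoff drops to catch all of them.
cautious = best_threshold(scores, labels, cost_fp=1, cost_fn=10)
# False alarms are expensive: the cutoff rises and some anomalies are missed.
lenient = best_threshold(scores, labels, cost_fp=10, cost_fn=1)
print(cautious, lenient)
```

The same scores lead to very different cutoffs once the cost ratio changes, which is why the threshold is an operational decision rather than a property of the model alone.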