MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Choose an option to check your answer.
A. When all features are class labels
B. When coordinate-wise absolute differences are more appropriate or outlier sensitivity should be reduced
C. When no features are numeric
D. When association support is required
Correct Answer: B. When coordinate-wise absolute differences are more appropriate or outlier sensitivity should be reduced
Explanation:
Manhattan distance sums absolute differences rather than squared contributions.
It can be more robust in some high-dimensional settings.
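A minimal sketch of the difference, assuming two made-up points `p` and `q` whose last coordinate differs by a large amount. The function names are illustrative and do not come from any particular library.

```python
import math

def manhattan(a, b):
    """Sum of coordinate-wise absolute differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared differences (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points: the third coordinate has an outlier-like gap of 10.
p = [0.0, 0.0, 0.0]
q = [1.0, 1.0, 10.0]

print(manhattan(p, q))  # 12.0
print(euclidean(p, q))  # about 10.1

# Share of each distance's underlying sum that comes from the large gap.
l1_share = 10.0 / manhattan(p, q)            # 10 of 12
l2_share = 10.0 ** 2 / euclidean(p, q) ** 2  # 100 of 102
print(l1_share < l2_share)  # True
```

In this example the large gap accounts for a smaller share of the Manhattan sum than of the squared Euclidean sum, which is the sense in which squaring amplifies extreme coordinates.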
Choose an option to check your answer.
A. The number of clusters
B. The penalty for margin violations
C. The number of support vectors fixed in advance
D. The kernel dimension only
Correct Answer: B. The penalty for margin violations
Explanation:
In a soft-margin SVM, a large C emphasizes fitting the training data, while a small C permits more margin violations in exchange for a wider margin.
C is inversely related to regularization strength: a larger C means weaker regularization.
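The role of C can be seen in the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below evaluates that objective for a fixed, hand-picked linear boundary; the data, `w`, and `b` are made-up values for illustration, and nothing is being trained here.

```python
def soft_margin_objective(w, b, X, y, C):
    """Return 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w . x_i + b)))."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge_total += max(0.0, 1.0 - yi * score)
    return margin_term + C * hinge_total

# Hypothetical data: the third point lies on the wrong side of the boundary.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, -1]
w, b = [1.0, 0.0], 0.0

small_c = soft_margin_objective(w, b, X, y, C=0.1)
large_c = soft_margin_objective(w, b, X, y, C=100.0)
print(small_c)  # about 0.65: the violation barely matters
print(large_c)  # 150.5: the violation dominates the objective
```

With a large C the single violation dominates, so an optimizer would be pushed to change the boundary to fit that point, even at the cost of a narrower margin.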
Choose an option to check your answer.
A. The true label of every observation
B. The number of clusters k
C. The number of outliers
D. The association confidence threshold
Correct Answer: B. The number of clusters k
Explanation:
K-means creates exactly the requested number of clusters, so k is an input to the algorithm.
Validation methods such as the elbow method or silhouette analysis can help choose a reasonable k.
Choose an option to check your answer.
A. The largest distance between any cross-cluster pair
B. The smallest pairwise distance
C. The distance to the global mean
D. The median class label
Correct Answer: A. The largest distance between any cross-cluster pair
Explanation:
Complete linkage considers the farthest members of two clusters.
It favors compact groups with smaller diameters.
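A short sketch of the complete-linkage distance, shown next to single linkage for contrast. The two clusters are made-up one-dimensional examples written as 2D points, and the function names are illustrative.

```python
import math
from itertools import product

def complete_linkage(cluster_a, cluster_b):
    """Largest distance over all cross-cluster pairs."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

def single_linkage(cluster_a, cluster_b):
    """Smallest distance over all cross-cluster pairs (for comparison)."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

# Hypothetical clusters lying along the x-axis.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (6.0, 0.0)]

print(complete_linkage(a, b))  # 6.0, from (0, 0) to (6, 0)
print(single_linkage(a, b))    # 2.0, from (1, 0) to (3, 0)
```

Because the complete-linkage value equals the largest cross-cluster gap, merging the pair with the smallest such value tends to keep the diameter of the merged cluster small.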
Choose an option to check your answer.
A. A single extreme numeric value
B. A group of observations whose combined pattern is abnormal
C. The largest cluster
D. A missing transaction
Correct Answer: B. A group of observations whose combined pattern is abnormal
Explanation:
In a collective anomaly, individual members may appear normal in isolation.
It is their sequence or joint configuration that creates the anomaly.
Choose an option to check your answer.
A. The number of words on the page only
B. The quantity and quality of incoming links
C. The page's class label
D. The Euclidean distance between users
Correct Answer: B. The quantity and quality of incoming links
Explanation:
A link from an important page contributes more than a link from an unimportant page.
PageRank models a random surfer moving through the link graph.
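The random-surfer idea can be sketched as a simple power iteration. This is a minimal illustration under stated assumptions, not a production implementation: the four-page graph is made up, the damping factor of 0.85 is a commonly used value assumed here, and pages with no outgoing links are handled by spreading their rank evenly.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its rank equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Assumption: a page without links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C has the most incoming links, D has none.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph A ends up ranked above B even though each has a single incoming link, because A's link comes from the highly ranked page C.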
Choose an option to check your answer.
A. Pearson covariance
B. Euclidean similarity without modification
C. Jaccard similarity
D. Entropy gain
Correct Answer: C. Jaccard similarity
Explanation:
Jaccard focuses on shared presences and ignores shared absences.
This suits applications such as document-term or purchase data.
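A minimal sketch of Jaccard similarity on two made-up shopping baskets. Returning 1.0 for two empty sets is a convention assumed here, not something the source specifies.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(union)

# Hypothetical baskets; items that neither customer bought never appear,
# so shared absences cannot influence the score.
basket_1 = {"milk", "bread", "eggs"}
basket_2 = {"milk", "bread", "tea"}

print(jaccard(basket_1, basket_2))  # 0.5 (2 shared items out of 4 distinct)
```

Adding thousands of catalogue items that neither basket contains would leave the score unchanged, whereas a measure that counts matching zeros would drift toward 1.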
Choose an option to check your answer.
A. A very smooth boundary with many allowed errors
B. No support vectors
C. A narrower margin and increased risk of overfitting
D. Automatic feature selection
Correct Answer: C. A narrower margin and increased risk of overfitting
Explanation:
A large penalty forces the model to avoid training violations aggressively.
This can make the boundary sensitive to noise.
Choose an option to check your answer.
A. Each centroid is assigned a class label
B. Every point becomes a new cluster
C. Outliers are deleted
D. Each point is assigned to its nearest centroid
Correct Answer: D. Each point is assigned to its nearest centroid
Explanation:
Distance to the current centroids determines cluster membership.
With the centroids held fixed, this assignment step can only reduce or preserve the k-means objective (the within-cluster sum of squared distances).
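The assignment step can be sketched as follows. The points, centroids, and starting labels are made-up values, and only the assignment half of a k-means iteration is shown (the centroid update is omitted).

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))

# Hypothetical data and fixed centroids.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centroids = [(0.0, 0.5), (5.5, 5.0)]

old_labels = [0, 1, 1, 1]  # arbitrary starting assignment, one point misplaced
new_labels = assign(points, centroids)

print(new_labels)                          # [0, 0, 1, 1]
print(sse(points, centroids, old_labels))  # 47.0
print(sse(points, centroids, new_labels))  # 1.0
```

Each point independently picks the centroid that minimizes its own squared distance, so the total cannot increase while the centroids stay where they are.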
Choose an option to check your answer.
A. The number of dendrogram leaves
B. The SVM hinge loss
C. The increase in within-cluster variance caused by a merge
D. The association rule length
Correct Answer: C. The increase in within-cluster variance caused by a merge
Explanation:
Ward's method selects merges that cause the smallest loss of compactness.
It often produces relatively balanced, spherical clusters.
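A small sketch of the merge cost, computed directly as the increase in within-cluster sum of squared errors. The three clusters are made-up examples and the helper names are illustrative.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)

def ward_cost(a, b):
    """Increase in total within-cluster SSE if clusters a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical clusters along the x-axis.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(10.0, 0.0), (12.0, 0.0)]
c = [(3.0, 0.0), (5.0, 0.0)]

print(ward_cost(a, b))  # 100.0
print(ward_cost(a, c))  # 9.0
```

Here merging `a` with `c` costs far less than merging `a` with `b`, so the first merge would be preferred. The same values follow from the closed form n_a * n_b / (n_a + n_b) times the squared distance between the two centroids.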
Choose an option to check your answer.
A. Clustering entirely unlabeled data
B. Using only frequent items
C. Learning from examples labeled normal and anomalous
D. Detecting anomalies without features
Correct Answer: C. Learning from examples labeled normal and anomalous
Explanation:
When labels are available, anomaly detection becomes an imbalanced classification problem.
The main challenge is often that anomaly examples are scarce and change over time.
Choose an option to check your answer.
A. Predicting a continuous target
B. Removing duplicate user records only
C. Finding groups with dense internal connections and sparser external connections
D. Ranking features by variance
Correct Answer: C. Finding groups with dense internal connections and sparser external connections
Explanation:
Communities often correspond to social circles, interests, or functional groups.
They are based on network connectivity rather than ordinary feature distance alone.
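As a rough illustration of "dense inside, sparse between", the sketch below counts internal and cross-community edges for a made-up graph of two triangles joined by one bridge. The community labels are given by hand; no detection algorithm is run here.

```python
# Hypothetical undirected graph: two triangles connected by the edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

# Assumed community assignment, chosen by inspection for this example.
community = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}

# An edge is internal when both endpoints belong to the same community.
internal = sum(1 for u, v in edges if community[u] == community[v])
external = len(edges) - internal

print(internal)  # 6
print(external)  # 1
```

The grouping depends only on which nodes are connected, not on any feature vectors attached to them.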