MCQ Collection
Data Mining MCQs
Practice Data Mining questions with answers and explanations.
Correct Answer: B. It may need to compare a query with many training observations
Explanation:
A brute-force search scans the stored training set for each query.
Spatial indexes such as k-d trees or ball trees, or approximate nearest-neighbor methods, can improve speed.
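As a minimal sketch of the brute-force search described above, the snippet below compares one query against every stored training point before voting. The function name knn_predict and the toy data are illustrative assumptions, not part of the original question.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Brute-force k-NN: one distance computation per stored training point."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and return the most common label among them.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small groups of 2-D points.
train_points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_labels = ["a", "a", "b", "b"]
print(knn_predict(train_points, train_labels, (0.2, 0.1), k=3))
```

The cost per query grows linearly with the number of stored points, which is why an index or an approximate method pays off on large training sets.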
Correct Answer: B. A flexible nonlinear boundary
Explanation:
The radial basis kernel measures localized similarity between points.
Combining support-vector influences yields curved decision regions.
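The following sketch illustrates how summing localized RBF similarities to a few support vectors can produce a nonlinear decision value. The support vectors, signed coefficients, and bias here are hypothetical values chosen for illustration, not the output of a trained model.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Similarity that decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, coefficients, bias=0.0, gamma=1.0):
    """Weighted sum of kernel similarities to each support vector, plus a bias."""
    total = sum(
        coef * rbf_kernel(x, sv, gamma)
        for sv, coef in zip(support_vectors, coefficients)
    )
    return total + bias

# Hypothetical support vectors with signed weights (one per class).
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefficients = [1.0, -1.0]

for point in [(0.1, 0.1), (1.0, 1.0), (1.9, 1.9)]:
    print(point, decision_value(point, support_vectors, coefficients))
```

The sign of the decision value gives the predicted class, and the set of points where it equals zero forms the boundary.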
Correct Answer: C. Choose well-separated initial centroids probabilistically
Explanation:
K-means++ spreads initial centers across the data.
It usually improves convergence and solution quality over naive random starts.
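A minimal sketch of the k-means++ seeding step: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function name kmeans_pp_init and the fixed seed are assumptions for illustration, and the sketch assumes the data contains at least k distinct points.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers using squared-distance (D^2) weighting."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Already-chosen points have weight 0, so they cannot be drawn again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Hypothetical toy data: two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans_pp_init(points, k=2))
```

Because far-away points get larger weights, the second center is much more likely to land in the other group than with uniform random starts.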
Correct Answer: B. Early merge decisions generally cannot be undone
Explanation:
A mistaken early merge remains embedded in later clusters.
The algorithm's greedy nature affects the full hierarchy.
Correct Answer: B. It flags points far from their neighbors
Explanation:
Anomalies often occupy isolated regions of feature space.
Distance thresholds or k-neighbor distances quantify isolation.
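One way to quantify isolation, sketched below, is to flag any point whose distance to its k-th nearest neighbor exceeds a threshold. The values k=2 and threshold=3.0 and the toy data are illustrative assumptions; in practice both would be tuned to the dataset.

```python
import math

def kth_neighbor_distance(points, index, k):
    """Distance from points[index] to its k-th nearest other point."""
    dists = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return dists[k - 1]

def flag_outliers(points, k=2, threshold=3.0):
    """Return indices of points whose k-th neighbor lies beyond the threshold."""
    return [
        i for i in range(len(points))
        if kth_neighbor_distance(points, i, k) > threshold
    ]

# Hypothetical toy data: a tight square of points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(flag_outliers(points))
```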
Correct Answer: B. Using personal data beyond users' reasonable expectations or consent
Explanation:
Online data can be technically accessible yet still sensitive.
Responsible mining requires privacy protection, purpose limitation, and governance.
Correct Answer: C. Majority-class points are more likely to dominate local votes
Explanation:
A dense majority class can surround minority observations.
Resampling, class weights, or local threshold adjustments may help.
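The sketch below illustrates the effect on a toy imbalanced set and one possible remedy: weighting each neighbor's vote by the inverse of its class frequency. The helper knn_vote, the class_weighted flag, and the data are assumptions for illustration; this is one of several ways class weights could be applied.

```python
import math
from collections import Counter, defaultdict

def knn_vote(train, query, k, class_weighted=False):
    """k-NN vote over (point, label) pairs, optionally weighted by 1 / class size."""
    class_counts = Counter(label for _, label in train)
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for _, label in nearest:
        scores[label] += 1.0 / class_counts[label] if class_weighted else 1.0
    return max(scores, key=scores.get)

# Hypothetical data: 2 minority points ("rare") next to 6 majority points ("common").
train = [((0.0, 0.0), "rare"), ((0.1, 0.0), "rare")] + [
    ((0.5 + 0.1 * i, 0.0), "common") for i in range(6)
]
query = (0.05, 0.0)  # sits between the two minority points

print(knn_vote(train, query, k=5))                       # plain majority vote
print(knn_vote(train, query, k=5, class_weighted=True))  # frequency-weighted vote
```

With k=5 the plain vote includes three majority neighbors and only two minority ones, so the majority class wins even though the query lies directly between the minority points.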
Correct Answer: C. How far the influence of each training point extends
Explanation:
Large gamma gives narrow local influence and complex boundaries.
Small gamma creates smoother, broader effects.
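To make the role of gamma concrete, this small sketch evaluates the RBF similarity exp(-gamma * d^2) at a fixed distance for several gamma values. The specific gamma values and the distance of 1.0 are arbitrary choices for illustration.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by the given distance."""
    return math.exp(-gamma * distance ** 2)

# Same distance, different gamma: larger gamma makes influence fade faster.
for gamma in (0.1, 1.0, 10.0):
    print(f"gamma={gamma}: similarity at distance 1.0 = {rbf_similarity(1.0, gamma):.6f}")
```

At a large gamma the similarity is already close to zero at distance 1.0, so each training point affects only its immediate surroundings.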
Correct Answer: D. Large-scale variables otherwise dominate Euclidean distance
Explanation:
K-means depends directly on geometric distances.
Scaling ensures units do not determine feature importance unintentionally.
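A brief sketch of the scaling issue, using z-score standardization as one common choice. The two features (income and age) and their values are hypothetical, and the helper standardize is an assumption for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical rows: (income in currency units, age in years).
rows = [(30000.0, 25.0), (30500.0, 60.0), (60000.0, 26.0)]
scaled = standardize(rows)

# Raw distances are driven almost entirely by income.
print(math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))
# After scaling, the 35-year age gap counts about as much as the income gap.
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))
```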
Correct Answer: C. It requires many pairwise distances and repeated cluster comparisons
Explanation:
A full distance matrix grows quadratically with the number of observations.
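Naive implementations need O(n^2) memory for the matrix and up to O(n^3) time for the repeated cluster comparisons, which limits them to modest dataset sizes.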
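The cost can be seen in a deliberately naive sketch of agglomerative clustering. Single linkage is used here as an assumed example of a linkage rule, and the function name single_linkage and the toy data are illustrative.

```python
import math

def single_linkage(points, num_clusters):
    """Naive agglomerative clustering: merge the two closest clusters until done."""
    # Full pairwise distance matrix: n * n entries.
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None
        # Compare every pair of current clusters on every round.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # The merge is permanent; later rounds only build on it.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy data: two pairs of nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
print(single_linkage(points, num_clusters=3))
```

Each round rescans all cluster pairs, so the total work grows much faster than the number of observations.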
Correct Answer: C. A point's local density with the densities of its neighbors
Explanation:
A point is suspicious when it lies in a much sparser neighborhood than nearby points.
LOF adapts to regions with different densities.
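A compact sketch of the LOF computation follows. It assumes exactly k neighbors per point (ties broken by index) and no duplicate points, which simplifies the standard definition; the function name lof_scores and the data are illustrative.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor per point; values near 1 indicate typical density."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest other points for each point.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbor.
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def local_reachability_density(i):
        # Reachability distance to o is at least o's own k-distance.
        reach = [max(k_distance[o], dist[i][o]) for o in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical toy data: a tight square of points plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
for point, score in zip(points, lof_scores(points, k=2)):
    print(point, round(score, 3))
```

Because each score is a ratio against the neighbors' densities, a point is judged relative to its own region rather than against a single global distance cutoff.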
Correct Answer: C. It records preprocessing, modeling, and evaluation steps for verification
Explanation:
A scripted pipeline can be rerun, reviewed, and updated consistently.
This supports collaboration and reliable deployment.