Using the model $$\hat{y} = 3.2 + 1.4X_1 - 0.9X_2 + 0.6X_3$$, compute the RMSE for the following new points:
| $X_1$ | $X_2$ | $X_3$ | $y$ (actual) |
|---|---|---|---|
| 2 | 6 | 0 | 1 |
| 3 | 5 | 2 | 4 |
| 0 | 3 | 0 | 0.3 |
a) (6 pts) Fill in the table below with $\hat{y}$ and $(y - \hat{y})^2$.
b) (6 pts) Compute the RMSE and box your answer.
| $X_1$ | $X_2$ | $X_3$ | $y$ | $\hat{y}$ | $(y - \hat{y})^2$ |
|---|---|---|---|---|---|
| 2 | 6 | 0 | 1 | ? | ? |
| 3 | 5 | 2 | 4 | ? | ? |
| 0 | 3 | 0 | 0.3 | ? | ? |
| | | | | Sum = | ? |
RMSE = $\sqrt{\frac{1}{3} \sum (y - \hat{y})^2}$ = ?
Model: $$\log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = -1.8 + 0.7X_1 - 1.2X_2$$
a) (6 pts) Compute $P(Y=1)$ when $X_1 = 3$, $X_2 = 1$. Use $e \approx 2.718$.
b) (4 pts) Compute the odds $P(Y=1)/(1-P(Y=1))$.
c) (4 pts) By how much do the odds multiply when $X_1$ increases by 1 (holding $X_2$ fixed)?
| Predictor | OLS Coef | Lasso Coef |
|---|---|---|
| $X_1$ | 0.45 | 0.42 |
| $X_2$ | 0.03 | 0.00 |
| $X_3$ | −0.71 | −0.68 |
a) (4 pts) Which variable was shrunk to zero by Lasso?
b) (6 pts) The Lasso objective adds $\lambda \sum | \beta_j |$. With $\lambda = 0.1$, compute the $L_1$ penalty term using the Lasso coefficients.
The following 5 points are given with their coordinates and class labels:
| Point | $X_1$ | $X_2$ | Class |
|---|---|---|---|
| A | 1 | 1 | Red |
| B | 2 | 1 | Blue |
| C | 5 | 5 | Red |
| D | 6 | 5 | Blue |
| E | 3 | 3 | Red |
A new point is observed at: $P = (3, 3)$.
a) (6 pts) Compute the Euclidean distance from $P$ to each of the 5 points. Then, list the 3 nearest neighbors (in order of increasing distance).
b) (6 pts) Using $k=3$ and majority vote, predict the class of point $P$. Box your final answer.
Show your distance calculations below:
| Point | Distance to $P$ |
|---|---|
| A | ? |
| B | ? |
| C | ? |
| D | ? |
| E | ? |
Predicted class of $P$: ?
Assume $K=2$. At a certain iteration, cluster 1 has 25 observations with centroid at (2, 3) and
cluster 2 has 20 observations with centroid at (8, 7).
New point: P = (5, 4)
a) (6 pts) Find the Euclidean distance from the new point to each centroid.
b) (4 pts) Assign the new point to the closer cluster.
c) (4 pts) After assigning P, can you compute the new centroids? If yes, give the coordinates (rounded to two decimals). If not, explain why.
The table below shows the pairwise Euclidean distances between points A, B, and C:
| | A | B | C |
|---|---|---|---|
| A | – | 2 | 5 |
| B | 2 | – | 3 |
| C | 5 | 3 | – |
a) (6 pts) Using single linkage, which two points are merged first?
b) (4 pts) After the first merge, what is the single-linkage distance between the new cluster and the remaining point?
Parent node: 60 Class 0, 40 Class 1 (total 100).
a) (4 pts) Compute the entropy of the parent node. Use natural logarithm (base $e$). Round to three decimals.
b) (8 pts) A split produces:
Left child: 50 Class 0, 10 Class 1
Right child: 10 Class 0, 30 Class 1
Compute the weighted entropy of the children (round to three decimals).
Hyperplane: $w = (1, -1)$, $b = -3$. Decision function $f(x) = w \cdot x + b$.
a) (6 pts) Compute $f(3,3)$.
b) (4 pts) Predict the label (sign of $f$).
We train a neural network with two inputs $x_1=2$, $x_2=-1$ and a binary output ($Y=0$ or $Y=1$). The architecture has one hidden node (ReLU activation) and two output nodes (softmax).
Weights and biases:
| From | To | Weight | Bias (at target) |
|---|---|---|---|
| $x_1$ | Hidden | $w_1 = 0.5$ | $b_h = 0.3$ |
| $x_2$ | Hidden | $w_2 = -1.2$ | |
| Hidden | $Y=1$ node | $w_{o1} = 1.5$ | $b_o = 0.0$ |
| Hidden | $Y=0$ node | $w_{o0} = 0.8$ | |
Draw the neural network below with all weights and biases clearly labeled. Include input nodes, one hidden node (ReLU), and two output nodes (softmax). Show arrows and label each connection with its weight, and each node with its bias (if any).
a) (4 pts) Compute the pre-activation $z_h$ at the hidden node.
b) (3 pts) Compute the hidden node output after ReLU.
c) (4 pts) Compute the pre-softmax values $a_1$ and $a_0$ at the two output nodes.
d) (5 pts) Apply softmax to get $P(Y=1)$ and $P(Y=0)$. Round to three decimals and box both.
a)
| $X_1$ | $X_2$ | $X_3$ | $y$ | $\hat{y}$ | $(y - \hat{y})^2$ |
|---|---|---|---|---|---|
| 2 | 6 | 0 | 1 | 0.6 | 0.16 |
| 3 | 5 | 2 | 4 | 4.1 | 0.01 |
| 0 | 3 | 0 | 0.3 | 0.5 | 0.04 |
| | | | | Sum = | 0.21 |
b) $$\text{RMSE}=\sqrt{\frac{0.21}{3}}=\sqrt{0.07}\approx 0.265$$
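As an optional check (a minimal sketch using only the model and the three rows above), the same RMSE can be reproduced in Python:

```python
import math

# Model: y_hat = 3.2 + 1.4*X1 - 0.9*X2 + 0.6*X3
rows = [(2, 6, 0, 1.0), (3, 5, 2, 4.0), (0, 3, 0, 0.3)]  # (X1, X2, X3, y)

sq_errors = []
for x1, x2, x3, y in rows:
    y_hat = 3.2 + 1.4 * x1 - 0.9 * x2 + 0.6 * x3
    sq_errors.append((y - y_hat) ** 2)

# Squared errors ≈ 0.16, 0.01, 0.04; RMSE = sqrt(0.21 / 3) ≈ 0.265
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print([round(e, 2) for e in sq_errors], round(rmse, 3))
```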
logit = −1.8 + 0.7(3) − 1.2(1) = −0.9
a) $$P(Y=1)=\frac{1}{1+e^{0.9}}\approx 0.289$$
b) $$\text{Odds}=\frac{P}{1-P}\approx\frac{0.289}{0.711}\approx 0.406$$
c) $$e^{0.7}\approx 2.014$$
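For reference, a short Python sketch of the same logistic-regression arithmetic (it uses the exact value of $e$ rather than 2.718, so the last digit may differ slightly from the hand computation):

```python
import math

# Logit model: log-odds = -1.8 + 0.7*X1 - 1.2*X2, evaluated at X1 = 3, X2 = 1
x1, x2 = 3, 1
logit = -1.8 + 0.7 * x1 - 1.2 * x2      # -0.9
p = 1 / (1 + math.exp(-logit))          # P(Y=1) ≈ 0.289
odds = p / (1 - p)                      # ≈ 0.41 (equals e**logit)
odds_multiplier = math.exp(0.7)         # odds multiply by ≈ 2.014 per unit increase in X1
print(round(logit, 3), round(p, 3), round(odds, 3), round(odds_multiplier, 3))
```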
a) $X_2$
b) $$0.1 \times (|0.42| + |0.00| + |-0.68|) = 0.1 \times 1.10 = 0.11$$
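A one-line check of the penalty term (a sketch, using the Lasso column of the table above):

```python
# L1 penalty: lambda * sum of absolute Lasso coefficients
lasso_coefs = [0.42, 0.00, -0.68]   # X1, X2, X3
lam = 0.1
penalty = lam * sum(abs(b) for b in lasso_coefs)
print(round(penalty, 2))            # 0.1 * 1.10 = 0.11
```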
a) Distances: A ≈ 2.828, B ≈ 2.236, C ≈ 2.828, D ≈ 3.606, E = 0. Three nearest (increasing distance): E (0), B (≈ 2.236), then A or C (tie at ≈ 2.828).
b) Red (two of the three nearest neighbors are Red)
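The distances and the 3-NN vote can be verified with a minimal Python sketch (the tie at ≈ 2.828 is broken arbitrarily, which does not change the vote since A and C are both Red):

```python
import math
from collections import Counter

# Labeled training points and the query point P
points = {"A": (1, 1, "Red"), "B": (2, 1, "Blue"), "C": (5, 5, "Red"),
          "D": (6, 5, "Blue"), "E": (3, 3, "Red")}
p, k = (3, 3), 3

# Euclidean distance from P to every labeled point, sorted by distance
dists = sorted((math.dist(p, (x1, x2)), name, label)
               for name, (x1, x2, label) in points.items())
neighbors = dists[:k]                                   # E, B, then A or C
predicted = Counter(lbl for _, _, lbl in neighbors).most_common(1)[0][0]
print([(n, round(d, 3)) for d, n, _ in dists], predicted)   # predicted class: Red
```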
a) $d_1=\sqrt{(5-2)^2+(4-3)^2}=\sqrt{10}\approx3.162$, $d_2=\sqrt{(5-8)^2+(4-7)^2}=\sqrt{18}\approx4.243$
b) Cluster 1 (since $d_1 < d_2$)
c) Yes. Cluster 1 gains P, so its new centroid is $\left(\frac{25(2)+5}{26}, \frac{25(3)+4}{26}\right)\approx(2.12, 3.04)$; cluster 2 is unchanged at $(8, 7)$.
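The same assignment and centroid update, as a minimal Python sketch (it treats the stored centroids as the means of the 25 and 20 points stated above):

```python
import math

# Current centroids, cluster sizes, and the new point
c1, n1 = (2, 3), 25
c2, n2 = (8, 7), 20
p = (5, 4)

d1 = math.dist(p, c1)     # sqrt(10) ≈ 3.162
d2 = math.dist(p, c2)     # sqrt(18) ≈ 4.243

# P joins cluster 1 (the closer centroid); its centroid becomes the mean of 26 points
new_c1 = ((n1 * c1[0] + p[0]) / (n1 + 1), (n1 * c1[1] + p[1]) / (n1 + 1))
print(round(d1, 3), round(d2, 3), tuple(round(v, 2) for v in new_c1))  # (2.12, 3.04)
```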
a) A and B (the smallest pairwise distance, 2)
b) $\min(d(A,C), d(B,C)) = \min(5, 3) = 3$
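As a check, single linkage on this 3-point distance table in Python (a sketch of just the first merge, not a general clustering routine):

```python
# Pairwise distances between A, B, and C
d = {("A", "B"): 2, ("A", "C"): 5, ("B", "C"): 3}

first_merge = min(d, key=d.get)                  # ('A', 'B'), distance 2
# Single-linkage distance from the new cluster {A, B} to C: the smaller of d(A,C), d(B,C)
d_AB_C = min(d[("A", "C")], d[("B", "C")])       # 3
print(first_merge, d_AB_C)
```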
a) $$H = -0.6 \ln 0.6 - 0.4 \ln 0.4 \approx 0.673$$
b) Left: $H_L \approx 0.451$, Right: $H_R \approx 0.562$; weighted: $0.6(0.451) + 0.4(0.562) \approx 0.495$
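A small Python sketch of the entropy calculation (natural log, matching the question):

```python
import math

def entropy(counts):
    """Entropy in nats: -sum(p * ln p) over classes with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

parent = entropy([60, 40])                              # ≈ 0.673
h_left, h_right = entropy([50, 10]), entropy([10, 30])  # ≈ 0.451, ≈ 0.562
weighted = 0.6 * h_left + 0.4 * h_right                 # ≈ 0.495
print(round(parent, 3), round(h_left, 3), round(h_right, 3), round(weighted, 3))
```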
a) $f(3,3) = 1(3) + (-1)(3) - 3 = -3$
b) $-1$ (since $f < 0$)
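For reference, the decision function evaluated in Python (assuming the usual convention that the predicted label is the sign of $f$, with $f \ge 0$ mapped to $+1$):

```python
# Linear decision function f(x) = w . x + b
w, b = (1, -1), -3
x = (3, 3)

f = sum(wi * xi for wi, xi in zip(w, x)) + b   # 1*3 + (-1)*3 - 3 = -3
label = 1 if f >= 0 else -1                    # f < 0, so predicted label is -1
print(f, label)
```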
Neural network diagram (for reference): [diagram not reproduced here]
a) $z_h = 0.5·2 + (-1.2)·(-1) + 0.3 = 2.5$
b) ReLU(2.5) = 2.5
c)
$a_1 = 1.5·2.5 + 0.0 = 3.75$
$a_0 = 0.8·2.5 + 0.0 = 2.0$
d)
$e^{3.75} \approx 42.52$, $e^{2.0} \approx 7.39$
Sum = 42.52 + 7.39 = 49.91
$P(Y=1) = 42.52 / 49.91 \approx 0.852$
$P(Y=0) = 7.39 / 49.91 \approx 0.148$
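The full forward pass can be reproduced with a short Python sketch (it uses the exact exponential, so the intermediate values differ slightly from the rounded 42.52 and 7.39 above):

```python
import math

# Inputs, weights, and biases from the table above
x1, x2 = 2, -1
w1, w2, b_h = 0.5, -1.2, 0.3
w_o1, w_o0, b_o = 1.5, 0.8, 0.0

z_h = w1 * x1 + w2 * x2 + b_h       # pre-activation at the hidden node: 2.5
h = max(0.0, z_h)                   # ReLU output: 2.5
a1 = w_o1 * h + b_o                 # pre-softmax value for the Y=1 node: 3.75
a0 = w_o0 * h + b_o                 # pre-softmax value for the Y=0 node: 2.0

# Softmax over the two output nodes
denom = math.exp(a1) + math.exp(a0)
p1, p0 = math.exp(a1) / denom, math.exp(a0) / denom
print(round(p1, 3), round(p0, 3))   # ≈ 0.852 and 0.148
```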