Using the model $$\hat{y} = 3.2 + 1.4X_1 - 0.9X_2 + 0.6X_3$$, compute the RMSE for the following new points:
| $X_1$ | $X_2$ | $X_3$ | $y$ (actual) |
|---|---|---|---|
| 2 | 6 | 0 | 1 |
| 3 | 5 | 2 | 4 |
| 0 | 3 | 0 | 0.3 |
a) (6 pts) Fill in the table below with $\hat{y}$ and $(y - \hat{y})^2$.
b) (6 pts) Compute the RMSE and box your answer.
| $X_1$ | $X_2$ | $X_3$ | $y$ | $\hat{y}$ | $(y - \hat{y})^2$ |
|---|---|---|---|---|---|
| 2 | 6 | 0 | 1 | ? | ? |
| 3 | 5 | 2 | 4 | ? | ? |
| 0 | 3 | 0 | 0.3 | ? | ? |
| | | | | Sum = | ? |
RMSE = $\sqrt{\frac{1}{3} \sum (y - \hat{y})^2}$ = ?
Model: $$\log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = -1.8 + 0.7X_1 - 1.2X_2$$
a) (6 pts) Compute $P(Y=1)$ when $X_1 = 3$, $X_2 = 1$. Use $e \approx 2.718$.
b) (4 pts) Compute the odds $P(Y=1)/(1-P(Y=1))$.
c) (4 pts) By how much do the odds multiply when $X_1$ increases by 1 (holding $X_2$ fixed)?
| Predictor | OLS Coef | Lasso Coef |
|---|---|---|
| $X_1$ | 0.45 | 0.42 |
| $X_2$ | 0.03 | 0.00 |
| $X_3$ | −0.71 | −0.68 |
a) (4 pts) Which variable was shrunk to zero by Lasso?
b) (6 pts) The Lasso objective adds $\lambda \sum | \beta_j |$. With $\lambda = 0.1$, compute the $L_1$ penalty term using the Lasso coefficients.
The following 5 points are given with their coordinates and class labels:
| Point | $X_1$ | $X_2$ | Class |
|---|---|---|---|
| A | 1 | 1 | Red |
| B | 2 | 1 | Blue |
| C | 5 | 5 | Red |
| D | 6 | 5 | Blue |
| E | 3 | 3 | Red |
A new point is observed at: $P = (3, 3)$.
a) (6 pts) Compute the Euclidean distance from $P$ to each of the 5 points. Then, list the 3 nearest neighbors (in order of increasing distance).
b) (6 pts) Using $k=3$ and majority vote, predict the class of point $P$. Box your final answer.
Show your distance calculations below:
| Point | Distance to $P$ |
|---|---|
| A | ? |
| B | ? |
| C | ? |
| D | ? |
| E | ? |
Predicted class of $P$: ?
Assume $K=2$. At a certain iteration, cluster 1 has 25 observations with centroid at (2, 3) and
cluster 2 has 20 observations with centroid at (8, 7).
New point: P = (5, 4)
a) (6 pts) Find the Euclidean distance from the new point to each centroid.
b) (4 pts) Assign the new point to the closer cluster.
c) (4 pts) After assigning P, can you compute the new centroids? If yes, give the coordinates (rounded to two decimals). If not, explain why.
The table below shows the pairwise Euclidean distances between points A, B, and C:
| | A | B | C |
|---|---|---|---|
| A | – | 2 | 5 |
| B | 2 | – | 3 |
| C | 5 | 3 | – |
a) (6 pts) Using single linkage, which two points are merged first?
b) (4 pts) After the first merge, what is the single-linkage distance between the new cluster and the remaining point?
Parent node: 60 Class 0, 40 Class 1 (total 100).
a) (4 pts) Compute the entropy of the parent node. Use natural logarithm (base $e$). Round to three decimals.
b) (8 pts) A split produces:
Left child: 50 Class 0, 10 Class 1
Right child: 10 Class 0, 30 Class 1
Compute the weighted entropy of the children (round to three decimals).
Hyperplane: $w = (1, -1)$, $b = -3$. Decision function $f(x) = w \cdot x + b$.
a) (6 pts) Compute $f(3,3)$.
b) (4 pts) Predict the label (sign of $f$).
We train a neural network with two inputs $x_1=2$, $x_2=-1$ and a binary output ($Y=0$ or $Y=1$). The architecture has one hidden node (ReLU activation) and two output nodes (softmax).
Weights and biases:
| From | To | Weight | Bias (at target) |
|---|---|---|---|
| $x_1$ | Hidden | $w_1 = 0.5$ | $b_h = 0.3$ |
| $x_2$ | Hidden | $w_2 = -1.2$ | |
| Hidden | $Y=1$ node | $w_{o1} = 1.5$ | $b_o = 0.0$ |
| Hidden | $Y=0$ node | $w_{o0} = 0.8$ | |
Draw the neural network below with all weights and biases clearly labeled. Include input nodes, one hidden node (ReLU), and two output nodes (softmax). Show arrows and label each connection with its weight, and each node with its bias (if any).
a) (4 pts) Compute the pre-activation $z_h$ at the hidden node.
b) (3 pts) Compute the hidden node output after ReLU.
c) (4 pts) Compute the pre-softmax values $a_1$ and $a_0$ at the two output nodes.
d) (5 pts) Apply softmax to get $P(Y=1)$ and $P(Y=0)$. Round to three decimals and box both.
a)
| $X_1$ | $X_2$ | $X_3$ | $y$ | $\hat{y}$ | $(y - \hat{y})^2$ |
|---|---|---|---|---|---|
| 2 | 6 | 0 | 1 | 0.6 | 0.16 |
| 3 | 5 | 2 | 4 | 4.1 | 0.01 |
| 0 | 3 | 0 | 0.3 | 0.5 | 0.04 |
| | | | | Sum = | 0.21 |
b) $$\text{RMSE}=\sqrt{\frac{0.21}{3}}=\sqrt{0.07}\approx 0.265$$
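As an optional check (a minimal sketch using only the model and the three rows above), the same RMSE can be reproduced in Python:

```python
import math

# Model: y_hat = 3.2 + 1.4*X1 - 0.9*X2 + 0.6*X3
rows = [(2, 6, 0, 1.0), (3, 5, 2, 4.0), (0, 3, 0, 0.3)]  # (X1, X2, X3, y)

sq_errors = []
for x1, x2, x3, y in rows:
    y_hat = 3.2 + 1.4 * x1 - 0.9 * x2 + 0.6 * x3
    sq_errors.append((y - y_hat) ** 2)

# Squared errors ≈ 0.16, 0.01, 0.04; RMSE = sqrt(0.21 / 3) ≈ 0.265
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print([round(e, 2) for e in sq_errors], round(rmse, 3))
```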
logit = −1.8 + 0.7(3) − 1.2(1) = −0.9
a) $$P(Y=1)=\frac{1}{1+e^{0.9}}\approx 0.289$$
b) $$\text{Odds}=\frac{P}{1-P}\approx\frac{0.289}{0.711}\approx 0.406$$
c) $$e^{0.7}\approx 2.014$$
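For reference, a short Python sketch of the same logistic-regression arithmetic (it uses the exact value of $e$ rather than 2.718, so the last digit may differ slightly from the hand computation):

```python
import math

# Logit model: log-odds = -1.8 + 0.7*X1 - 1.2*X2, evaluated at X1 = 3, X2 = 1
x1, x2 = 3, 1
logit = -1.8 + 0.7 * x1 - 1.2 * x2      # -0.9
p = 1 / (1 + math.exp(-logit))          # P(Y=1) ≈ 0.289
odds = p / (1 - p)                      # ≈ 0.41 (equals e**logit)
odds_multiplier = math.exp(0.7)         # odds multiply by ≈ 2.014 per unit increase in X1
print(round(logit, 3), round(p, 3), round(odds, 3), round(odds_multiplier, 3))
```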
a) $X_2$
b) $$0.1 \times (|0.42| + |0.00| + |-0.68|) = 0.1 \times 1.10 = 0.11$$
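A one-line check of the penalty term (a sketch, using the Lasso column of the table above):

```python
# L1 penalty: lambda * sum of absolute Lasso coefficients
lasso_coefs = [0.42, 0.00, -0.68]   # X1, X2, X3
lam = 0.1
penalty = lam * sum(abs(b) for b in lasso_coefs)
print(round(penalty, 2))            # 0.1 * 1.10 = 0.11
```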
a) Distances: A ≈ 2.828, B ≈ 2.236, C ≈ 2.828, D ≈ 3.606, E = 0. Three nearest (increasing distance): E (0), B (≈ 2.236), then A or C (tie at ≈ 2.828).
b) Red (two of the three nearest neighbors are Red)
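The distances and the 3-NN vote can be verified with a minimal Python sketch (the tie at ≈ 2.828 is broken arbitrarily, which does not change the vote since A and C are both Red):

```python
import math
from collections import Counter

# Labeled training points and the query point P
points = {"A": (1, 1, "Red"), "B": (2, 1, "Blue"), "C": (5, 5, "Red"),
          "D": (6, 5, "Blue"), "E": (3, 3, "Red")}
p, k = (3, 3), 3

# Euclidean distance from P to every labeled point, sorted by distance
dists = sorted((math.dist(p, (x1, x2)), name, label)
               for name, (x1, x2, label) in points.items())
neighbors = dists[:k]                                   # E, B, then A or C
predicted = Counter(lbl for _, _, lbl in neighbors).most_common(1)[0][0]
print([(n, round(d, 3)) for d, n, _ in dists], predicted)   # predicted class: Red
```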
a) $d_1=\sqrt{(5-2)^2+(4-3)^2}=\sqrt{10}\approx3.162$, $d_2=\sqrt{(5-8)^2+(4-7)^2}=\sqrt{18}\approx4.243$
b) Cluster 1 (since $d_1 < d_2$)
c) Yes. Cluster 1 gains P, so its new centroid is $\left(\frac{25(2)+5}{26}, \frac{25(3)+4}{26}\right)\approx(2.12, 3.04)$; cluster 2 is unchanged at $(8, 7)$.
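The same assignment and centroid update, as a minimal Python sketch (it treats the stored centroids as the means of the 25 and 20 points stated above):

```python
import math

# Current centroids, cluster sizes, and the new point
c1, n1 = (2, 3), 25
c2, n2 = (8, 7), 20
p = (5, 4)

d1 = math.dist(p, c1)     # sqrt(10) ≈ 3.162
d2 = math.dist(p, c2)     # sqrt(18) ≈ 4.243

# P joins cluster 1 (the closer centroid); its centroid becomes the mean of 26 points
new_c1 = ((n1 * c1[0] + p[0]) / (n1 + 1), (n1 * c1[1] + p[1]) / (n1 + 1))
print(round(d1, 3), round(d2, 3), tuple(round(v, 2) for v in new_c1))  # (2.12, 3.04)
```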
a) A and B (the smallest pairwise distance, 2)
b) $\min(d(A,C), d(B,C)) = \min(5, 3) = 3$
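As a check, single linkage on this 3-point distance table in Python (a sketch of just the first merge, not a general clustering routine):

```python
# Pairwise distances between A, B, and C
d = {("A", "B"): 2, ("A", "C"): 5, ("B", "C"): 3}

first_merge = min(d, key=d.get)                  # ('A', 'B'), distance 2
# Single-linkage distance from the new cluster {A, B} to C: the smaller of d(A,C), d(B,C)
d_AB_C = min(d[("A", "C")], d[("B", "C")])       # 3
print(first_merge, d_AB_C)
```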
a) $$H = -0.6 \ln 0.6 - 0.4 \ln 0.4 \approx 0.673$$
b) Left: $H_L \approx 0.451$, Right: $H_R \approx 0.562$; weighted: $0.6(0.451) + 0.4(0.562) \approx 0.495$
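A small Python sketch of the entropy calculation (natural log, matching the question):

```python
import math

def entropy(counts):
    """Entropy in nats: -sum(p * ln p) over classes with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

parent = entropy([60, 40])                              # ≈ 0.673
h_left, h_right = entropy([50, 10]), entropy([10, 30])  # ≈ 0.451, ≈ 0.562
weighted = 0.6 * h_left + 0.4 * h_right                 # ≈ 0.495
print(round(parent, 3), round(h_left, 3), round(h_right, 3), round(weighted, 3))
```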
a) $f(3,3) = 1(3) + (-1)(3) - 3 = -3$
b) $-1$ (since $f < 0$)
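For reference, the decision function evaluated in Python (assuming the usual convention that the predicted label is the sign of $f$, with $f \ge 0$ mapped to $+1$):

```python
# Linear decision function f(x) = w . x + b
w, b = (1, -1), -3
x = (3, 3)

f = sum(wi * xi for wi, xi in zip(w, x)) + b   # 1*3 + (-1)*3 - 3 = -3
label = 1 if f >= 0 else -1                    # f < 0, so predicted label is -1
print(f, label)
```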
Neural network diagram (for reference): [diagram not reproduced here]
a) $z_h = 0.5·2 + (-1.2)·(-1) + 0.3 = 2.5$
b) ReLU(2.5) = 2.5
c)
$a_1 = 1.5·2.5 + 0.0 = 3.75$
$a_0 = 0.8·2.5 + 0.0 = 2.0$
d)
$e^{3.75} \approx 42.52$, $e^{2.0} \approx 7.39$
Sum = 42.52 + 7.39 = 49.91
$P(Y=1) = 42.52 / 49.91 \approx 0.852$
$P(Y=0) = 7.39 / 49.91 \approx 0.148$
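The full forward pass can be reproduced with a short Python sketch (it uses the exact exponential, so the intermediate values differ slightly from the rounded 42.52 and 7.39 above):

```python
import math

# Inputs, weights, and biases from the table above
x1, x2 = 2, -1
w1, w2, b_h = 0.5, -1.2, 0.3
w_o1, w_o0, b_o = 1.5, 0.8, 0.0

z_h = w1 * x1 + w2 * x2 + b_h       # pre-activation at the hidden node: 2.5
h = max(0.0, z_h)                   # ReLU output: 2.5
a1 = w_o1 * h + b_o                 # pre-softmax value for the Y=1 node: 3.75
a0 = w_o0 * h + b_o                 # pre-softmax value for the Y=0 node: 2.0

# Softmax over the two output nodes
denom = math.exp(a1) + math.exp(a0)
p1, p0 = math.exp(a1) / denom, math.exp(a0) / denom
print(round(p1, 3), round(p0, 3))   # ≈ 0.852 and 0.148
```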