Support Vector Machine (SVM): Hand‑Calculated Example + Theory
Linear hard‑margin SVM – step‑by‑step + equal axes + full rationale
Step 1 – Dataset
| Point | \(\mathbf{x}_i\) | \(y_i\) |
|---|---|---|
| 1 | (1,1) | +1 |
| 2 | (2,2) | +1 |
| 3 | (0,0) | -1 |
| 4 | (-1,-1) | -1 |
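For readers who want to follow along in code, a minimal NumPy encoding of this dataset might look like the following (the later sketches redefine the same arrays so each one runs on its own):

```python
import numpy as np

# The four training points and their labels from the table above.
X = np.array([[1.0, 1.0],     # point 1, class +1
              [2.0, 2.0],     # point 2, class +1
              [0.0, 0.0],     # point 3, class -1
              [-1.0, -1.0]])  # point 4, class -1
y = np.array([1, 1, -1, -1])
```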
Step 2 – Objective
\[ \text{minimize } \frac{1}{2} \|\mathbf{w}\|^2
\quad \text{subject to} \quad
y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \forall i \]
Step 3 – Hyperplane Guess
\[ x_1 + x_2 - 1 = 0 \quad\Rightarrow\quad
\mathbf{w} = (1,1),\ b = -1 \]
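Plugging each point into the constraint confirms that this guess is feasible:

\[
\begin{aligned}
y_1(\mathbf{w}\cdot\mathbf{x}_1 + b) &= +1\,(1 + 1 - 1) = 1 \\
y_2(\mathbf{w}\cdot\mathbf{x}_2 + b) &= +1\,(2 + 2 - 1) = 3 \\
y_3(\mathbf{w}\cdot\mathbf{x}_3 + b) &= -1\,(0 + 0 - 1) = 1 \\
y_4(\mathbf{w}\cdot\mathbf{x}_4 + b) &= -1\,(-1 - 1 - 1) = 3
\end{aligned}
\]

All four values are at least 1, and only points 1 and 3 meet the bound with equality, which is exactly what Step 4 uses.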
Step 4 – Support Vectors
- \(\mathbf{x}_1 = (1,1),\ y_1 = +1\) → support vector
- \(\mathbf{x}_3 = (0,0),\ y_3 = -1\) → support vector
Step 5 – Dual Solution
\[ \alpha_1 = \alpha_3 = 1 \quad\Rightarrow\quad
\mathbf{w} = (1,1),\ b = -1 \]
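For completeness, the hard-margin dual being solved here is

\[
\max_{\boldsymbol{\alpha} \geq 0}\ \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j
\quad \text{subject to} \quad \sum_i \alpha_i y_i = 0.
\]

With only \(\alpha_1, \alpha_3\) non-zero, the equality constraint forces \(\alpha_1 = \alpha_3 = \alpha\); since \(\mathbf{x}_1 \cdot \mathbf{x}_1 = 2\) and \(\mathbf{x}_3 = (0,0)\), the objective reduces to \(2\alpha - \alpha^2\), maximized at \(\alpha = 1\). Then \(\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i = (1,1)\) and \(b = y_1 - \mathbf{w}\cdot\mathbf{x}_1 = 1 - 2 = -1\).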
Step 6 – Margin
\[ \|\mathbf{w}\| = \sqrt{2},\quad
\text{Margin} = \frac{2}{\sqrt{2}} = \sqrt{2} \]
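As a cross-check, each support vector sits at distance \(1/\sqrt{2}\) from the hyperplane:

\[
\frac{|\mathbf{w}\cdot\mathbf{x}_1 + b|}{\|\mathbf{w}\|} = \frac{|1+1-1|}{\sqrt{2}} = \frac{1}{\sqrt{2}},
\qquad
\frac{|\mathbf{w}\cdot\mathbf{x}_3 + b|}{\|\mathbf{w}\|} = \frac{|0+0-1|}{\sqrt{2}} = \frac{1}{\sqrt{2}},
\]

so the total width between the two margin planes is \(2/\sqrt{2} = \sqrt{2}\).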
Step 7 – Plot (Equal Axes)
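A minimal matplotlib sketch of this plot, assuming the NumPy arrays from Step 1 (variable names are illustrative), might look like:

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[1, 1], [2, 2], [0, 0], [-1, -1]], dtype=float)
y = np.array([1, 1, -1, -1])

fig, ax = plt.subplots()
ax.scatter(X[y == 1, 0], X[y == 1, 1], marker="o", label="class +1")
ax.scatter(X[y == -1, 0], X[y == -1, 1], marker="s", label="class -1")

# Decision boundary x1 + x2 - 1 = 0 and margin planes x1 + x2 = 0, x1 + x2 = 2.
x1 = np.linspace(-2, 3, 100)
ax.plot(x1, 1 - x1, "k-", label="decision boundary")
ax.plot(x1, -x1, "k--", label="margin")
ax.plot(x1, 2 - x1, "k--")

ax.set_aspect("equal")  # equal axes so the margin width is not visually distorted
ax.legend()
plt.show()
```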
Step 8 – Test Point
Point \((1.5, 0)\):
\[ \mathbf{w} \cdot (1.5, 0) + b = 1.5 - 1 = 0.5 > 0 \quad\Rightarrow\quad +1 \]
Final Answer
Hyperplane: \(x_1 + x_2 - 1 = 0\)
\(\mathbf{w} = (1,1),\ b = -1\)
Support vectors: \((1,1)\) and \((0,0)\)
Margin: \(\sqrt{2}\)
Test point \((1.5, 0)\) → class +1
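These results can be sanity-checked numerically, e.g. with scikit-learn's `SVC` using a linear kernel and a large `C` to approximate the hard margin (a verification sketch, not part of the hand calculation):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [0, 0], [-1, -1]], dtype=float)
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # a very large C approximates the hard margin
clf.fit(X, y)

print(clf.coef_)                   # expected close to [[1. 1.]]
print(clf.intercept_)              # expected close to [-1.]
print(clf.support_vectors_)        # expected (0,0) and (1,1), in some order
print(clf.predict([[1.5, 0.0]]))   # expected [1]
```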
Rationale Behind the Hand‑Calculated Example
Why this dataset? All four points lie on the diagonal \(x_2 = x_1\), with the positive class up the diagonal and the negative class at and below the origin → intuitive geometry, simple algebra.
Hard‑margin: Perfect separation allows strict constraints → dual is a 2‑variable QP, solvable by hand.
Support vectors: Only \((1,1)\) and \((0,0)\) lie on margin planes → others have \(\alpha_i = 0\).
Dual: the constraint \(\sum_i \alpha_i y_i = 0\) forces \(\alpha_1 = \alpha_3 = \alpha\); the objective reduces to \(2\alpha - \alpha^2\), maximized at \(\alpha = 1\).
Margin: \(\|\mathbf{w}\| = \sqrt{2} \Rightarrow\) width \(\sqrt{2}\), matches distance between \(x_1 + x_2 = 0\) and \(x_1 + x_2 = 2\).
General Rationale Behind SVM for Classification
Core Idea: Find the maximum‑margin hyperplane that separates classes. Why maximum margin?
- Geometric intuition: The decision boundary farthest from both classes → most robust to noise and small perturbations of the data.
- Statistical learning theory: Larger margin → tighter generalization-error bounds (Vapnik–Chervonenkis theory).
- Optimization: Maximizing the margin = minimizing \(\|\mathbf{w}\|^2\) → convex quadratic program → unique global solution (see the margin formula after this list).
- Reference: https://en.wikipedia.org/wiki/Support_vector_machine
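The link between the margin and \(\|\mathbf{w}\|\) is the standard point-to-hyperplane distance:

\[
d(\mathbf{x}) = \frac{|\mathbf{w}\cdot\mathbf{x} + b|}{\|\mathbf{w}\|}.
\]

With the canonical scaling \(y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \geq 1\), the closest points on each side are at distance \(1/\|\mathbf{w}\|\), so the margin is \(2/\|\mathbf{w}\|\) and maximizing it is the same as minimizing \(\|\mathbf{w}\|^2\).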
Key Components
| Concept | Purpose |
|---|---|
| Support Vectors | Only points on the margin matter; the solution is sparse, and every other point has \(\alpha_i = 0\). |
| Dual Form | Depends only on dot products \(\mathbf{x}_i \cdot \mathbf{x}_j\) → enables the kernel trick for non-linear data. |
| Hard vs Soft Margin | Hard: assumes perfect separation. Soft (C-parameter): allows some margin violations for robustness (see the objective below the table). |
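For reference, the standard soft-margin primal trades margin width against slack variables \(\xi_i\):

\[
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i
\quad \text{subject to} \quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \geq 1 - \xi_i,\ \ \xi_i \geq 0.
\]

A large \(C\) penalizes violations heavily (approaching the hard margin); a small \(C\) tolerates them in exchange for a wider margin.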
Why SVM Works So Well
- Regularization via margin: Controls model complexity without explicit feature selection.
- Kernel flexibility: Maps data to a high-dimensional space where classes become separable (see the sketch after this list).
- Theoretical guarantees: Bounds on test error based on margin and support vector count.
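To illustrate kernel flexibility, here is a short scikit-learn sketch on an XOR-style toy dataset (illustrative data, not from the worked example): a linear kernel cannot classify all four points correctly, while an RBF kernel can.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: no single line separates the two classes.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([1, 1, -1, -1])

linear = SVC(kernel="linear", C=10.0).fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, y)

print(linear.score(X, y))  # below 1.0: no linear boundary fits XOR
print(rbf.score(X, y))     # expected 1.0: the RBF feature map makes the classes separable
```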
In one sentence:
SVM finds the safest possible decision boundary by pushing it as far from the data as possible, using only the most critical points (support vectors) to define it.