\[ \textrm{dist}(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert \]
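As a small illustration of the distance above, the sketch below evaluates it with NumPy. It assumes the norm is the Euclidean (L2) norm, which the formula itself does not specify; the vectors are made-up values.

```python
import numpy as np

# Two example vectors (illustrative values, not from the slides).
x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

# dist(x, y) = ||x - y||, taking the norm to be Euclidean (an assumption).
dist = np.linalg.norm(x - y)
print(dist)
```

Swapping `x` and `y` gives the same value, since a norm-based distance is symmetric.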
A looks like B, but not vice versa
“A like B, B like C, but A not like C at all”


Assign data points to the closest cluster center
Change the cluster center to the average of the assigned points
Repeat until convergence
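The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `init_centers` argument, the toy data, and the choice to keep an empty cluster's old center are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, init_centers, max_iter=100):
    """Plain K-means sketch; init_centers is a (k, d) array of starting centers."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Step 1: assign each point to the closest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each center to the average of its assigned points.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Step 3: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy data: two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers, labels = kmeans(X, init_centers=X[[0, 2]])
print(centers)
print(labels)
```

Starting from one point in each pair, the centers settle at the midpoint of each pair after a single update.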
https://commons.wikimedia.org/wiki/User:Chire
Objective: \(\min_{\mu}\min_{C}\sum_{i=1}^k \sum_{x\in C_i} \lVert x - \mu_i \rVert^2\)
K-means takes an alternating optimization approach. Neither step can increase the objective, so the algorithm is guaranteed to converge, though only to a local optimum.
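A short sketch of why the objective cannot go up, writing \(J(C, \mu) = \sum_{i=1}^k \sum_{x\in C_i} \lVert x - \mu_i \rVert^2\) and using superscript \(t\) for the iteration (notation introduced here for illustration). With the centers fixed, reassigning each point to its closest center can only shrink its own term. With the assignments fixed, the mean minimizes the within-cluster sum of squares:

\[ \mu_i^{t+1} = \frac{1}{\lvert C_i^{t+1} \rvert} \sum_{x \in C_i^{t+1}} x = \arg\min_{\mu_i} \sum_{x \in C_i^{t+1}} \lVert x - \mu_i \rVert^2 \]

Chaining the two steps gives

\[ J(C^{t+1}, \mu^{t+1}) \le J(C^{t+1}, \mu^{t}) \le J(C^{t}, \mu^{t}) \]

Since \(J \ge 0\) and there are only finitely many ways to partition the data, the sequence of objective values must stop changing after finitely many iterations.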

Local optima dependent on how the problem was specified:
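One way to see a dependence on the problem setup is to vary the starting centers. The sketch below is an illustration under assumptions: the four-point dataset and both sets of initial centers are made up, and it passes explicit arrays to the `init` parameter of `sklearn.cluster.KMeans` with `n_init=1` so that only one start is tried.

```python
import numpy as np
from sklearn.cluster import KMeans

# Four corners of a wide rectangle (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Start A: one center per short side (left / right split).
init_a = np.array([[0.0, 0.5], [4.0, 0.5]])
# Start B: one center per long side (bottom / top split).
init_b = np.array([[2.0, 0.0], [2.0, 1.0]])

km_a = KMeans(n_clusters=2, init=init_a, n_init=1).fit(X)
km_b = KMeans(n_clusters=2, init=init_b, n_init=1).fit(X)

# inertia_ is the K-means objective at the solution found.
print(km_a.inertia_, km_b.inertia_)
```

Both starts are fixed points of the two K-means steps, so neither run moves. Computed by hand, the objective is 4 × 0.25 = 1 for the left/right split and 4 × 4 = 16 for the bottom/top split, so the second run is stuck in a worse local optimum.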

How should we define “closest” for clusters with multiple elements?
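Three common answers to this question are the single, complete, and average linkage rules. The sketch below is a minimal NumPy illustration; the helper names and the two toy clusters are hypothetical, and Euclidean distance between points is assumed.

```python
import numpy as np

def pairwise_distances(A, B):
    """Euclidean distance between every point in A and every point in B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_linkage(A, B):
    # Distance between the two closest members.
    return pairwise_distances(A, B).min()

def complete_linkage(A, B):
    # Distance between the two farthest members.
    return pairwise_distances(A, B).max()

def average_linkage(A, B):
    # Mean distance over all cross-cluster pairs.
    return pairwise_distances(A, B).mean()

# Two toy clusters on a line (illustrative values).
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])

print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))
```

For these clusters the cross-cluster distances are 3, 5, 2, and 4, so the three rules give 2, 5, and 3.5 respectively. The choice of rule changes which clusters get merged first.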
Mouse tumor data from Hastie et al.
n_clusters: Number of clusters to form
init: Initialization method
n_init: Number of initializations
max_iter: Maximum steps algorithm can take

sklearn.cluster.AgglomerativeClustering
n_clusters: Number of clusters to return
metric: Distance metric to use
linkage: Linkage metric to use
https://en.wikipedia.org/wiki/Iris_flower_data_set
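A hedged usage sketch of the parameters listed above, applied to the Iris data bundled with scikit-learn. The specific values (three clusters, `k-means++` initialization, ten initializations, average linkage, `random_state=0`) are assumptions chosen for illustration. The `metric` argument is left at its default here because its name has changed across scikit-learn versions.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris

# 150 flowers, 4 measurements each; the species labels are ignored here.
X = load_iris().data

# K-means with the parameters from the list above (values are illustrative).
kmeans = KMeans(
    n_clusters=3,      # number of clusters to form
    init="k-means++",  # initialization method
    n_init=10,         # number of initializations; the best run is kept
    max_iter=300,      # maximum steps per run
    random_state=0,    # fixed seed so the run is repeatable
)
kmeans_labels = kmeans.fit_predict(X)

# Agglomerative clustering; metric is left at its default (Euclidean).
agglo = AgglomerativeClustering(n_clusters=3, linkage="average")
agglo_labels = agglo.fit_predict(X)

print(kmeans.inertia_)
print(kmeans_labels[:10], agglo_labels[:10])
```

Cluster indices are arbitrary, so the two label arrays may number the same group differently even where the groupings agree.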