\[ \textrm{dist}(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert \]
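The distance above can be computed directly. This is a minimal sketch that assumes the norm is the Euclidean (2-)norm, which the formula does not state explicitly; the function name `dist` mirrors the notation and is otherwise illustrative.

```python
import numpy as np

def dist(x, y):
    """Distance between two points, assuming the Euclidean (2-)norm."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

# A 3-4-5 right triangle: the distance from the origin to (3, 4) is 5.
d = dist([0.0, 0.0], [3.0, 4.0])
```

A distance defined this way is symmetric and satisfies the triangle inequality, which is not true of every notion of similarity.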
A looks like B, but not vice versa
“A like B, B like C, but A not like C at all”
Assign data points to the closest cluster center
Change the cluster center to the average of the assigned points
Repeat until convergence
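The three steps can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the initial centers are drawn at random from the data, "closest" means squared Euclidean distance, and convergence means the assignments stop changing. The names `kmeans`, `max_iter`, and `seed` are chosen here for illustration.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initialize the centers with k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to the closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 3: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: move each center to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(X, k=2)
```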
https://commons.wikimedia.org/wiki/User:Chire
Objective: \(\min_{\mu}\min_{C}\sum_{i=1}^k \sum_{x\in C_i} \lVert x - \mu_i \rVert^2\)
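K-means takes an alternating optimization approach: it alternates between reassigning points with the centers held fixed and recomputing the centers with the assignments held fixed. Neither step can increase the objective, and there are only finitely many ways to assign the points, so the algorithm is guaranteed to converge, though possibly only to a local optimum.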
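To see why neither step can increase the objective, write it as \(J(C, \mu)\) and look at the two steps separately. The notation \(J\), the iteration superscript \((t)\), and \(\bar{x}_i\) for the mean of cluster \(C_i\) are introduced here for illustration only.

```latex
% The objective, as a function of the assignments C and the centers \mu.
\[ J(C, \mu) = \sum_{i=1}^k \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \]

% Assignment step (centers fixed): every point moves to its nearest center,
% so no term in the sum can grow.
\[ J(C^{(t+1)}, \mu^{(t)}) \le J(C^{(t)}, \mu^{(t)}) \]

% Update step (assignments fixed): with \bar{x}_i the mean of C_i,
\[ \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
   = \sum_{x \in C_i} \lVert x - \bar{x}_i \rVert^2
   + |C_i| \, \lVert \bar{x}_i - \mu_i \rVert^2 \]
% which is smallest when \mu_i = \bar{x}_i. Chaining the two steps:
\[ J(C^{(t+1)}, \mu^{(t+1)}) \le J(C^{(t+1)}, \mu^{(t)}) \le J(C^{(t)}, \mu^{(t)}) \]
```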
Local optima depend on how the problem was specified:
How should we define “closest” for clusters with multiple elements?
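Three common answers are single linkage (the closest pair of points), complete linkage (the farthest pair), and average linkage (the mean over all pairs). The sketch below computes all three for two small clusters; it assumes Euclidean distance between individual points, and the function name `linkage_distances` is illustrative.

```python
import numpy as np

def linkage_distances(A, B):
    """Single, complete, and average linkage distance between clusters A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "single": float(pairwise.min()),    # closest pair
        "complete": float(pairwise.max()),  # farthest pair
        "average": float(pairwise.mean()),  # mean over all pairs
    }

# Points on a line: pairwise distances are 3, 5, 2, and 4.
result = linkage_distances([[0, 0], [1, 0]], [[3, 0], [5, 0]])
```

The same pair of clusters can look near under one definition and far under another, so the choice changes which clusters get merged first.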
Mouse tumor data from Hastie et al.
sklearn.cluster.KMeans
n_clusters: Number of clusters to form
init: Initialization method
n_init: Number of initializations
max_iter: Maximum number of steps the algorithm can take
sklearn.cluster.AgglomerativeClustering
n_clusters: Number of clusters to return
metric: Distance metric to use
linkage: Linkage criterion to use
https://en.wikipedia.org/wiki/Iris_flower_data_set