Hierarchical Clustering

This post is a lecture summary of  course by University of Washington : https://www.coursera.org/learn/ml-clustering-and-retrieval/home/welcome

 

Why one might want to use hierarchical clustering ?

  • A common problem is clustering is to predefined no of clusters (k)
    • Hierarchical clustering does not need that
  • Deprogram provides effective visualization of clusters at various levels without re-running the algorithm
  • Any distance metric can work, while k-mean required euclidean distance [0][1]
  • We can construct more complex shape compared to k-means/EM

 

Divisive clustering

  • Example algorithm is recursive k-mean
  • We first cluster with k = 2 and then recursive apply k-means on these two clustersdivisive_clustering
  • Various choices we need to make :
    • Which algorithm to use (k-means)
    • How many split at each level (k=2, 3)
    • When to stop? Here are some heuristics:
      • Number of points in a cluster falls below certain threshold (Minimum point in each cluster)
      • Distance to farthest point falls below certain threshold (Minimum cluster radius)
      • Until per-specified no of clusters are reached

 

Agglomerative clustering

  • Example would single linkage algorithm
  • We start with assigning each point to its own cluster
  • Next task is how to merge this cluster
    • How do we define the distance between cluster?
      • Linkage function
        • It gives us two points from each cluster
        • Single linkage given points which has minimum distance across all possible pairs
        • Cluster A has points (a1, a2, a3) and B has (b1, b2, b3)
        • Linkage will give say (a2, b3) if it has the minimum distance across all combination
      • Distance Function
        • Can be euclidean or cosine anything
    • We merge the cluster which has minimum distance
  • Dendrogram provides efficient visualization [4]
    • Data-points of x-axis carefully ordered
    • Height (y-axis) represents distance between two cluster as per specified linkage
    • It gives us the complete visualization of which nodes were merged first followed by which one etc.
  • How to cut dendrogram ? [4]
    • Suppose we are cutting the dendrogram at y=D*
    • There are no pair of cluster which has distance < D* and which is not merged
    • All the pairs of clusters having distance < D* are merged
    • D* is the minimum distance between cluster at this level
    • Distance between clusters identified > D*
  • Various linkage functions [2][3]
    • Complete linkage takes two farthest points
    • Average linkage considers all pairs and takes an average
    • Ward linkage focus on increasing sum of squares
    • These linkage however brings down the shaping flexibility
      • Single linage would allow for complex shapes in general
      • Single linkage however can result in chaining
  • Choice we need to make in agglomerative clustering
    • Which linkage function to use
    • Which distance metric to use
    • Should we cut dendrogram flat or some weird cut

dendrogram

  • Complexity
    • For brute-force it is O(N^2 log N)
    • We can also prune search space by kd-trees or using triangular inequality
    • Best know is O(N^2) [5]

 

Linkage Functions

linkage

References

[0]: https://stats.stackexchange.com/questions/81481/why-does-k-means-clustering-algorithm-use-only-euclidean-distance-metric

[1]: https://pdfs.semanticscholar.org/a630/316f9c98839098747007753a9bb6d05f752e.pdf

[2] : https://www.saedsayad.com/clustering_hierarchical.htm

[3] : https://stats.stackexchange.com/questions/195446/choosing-the-right-linkage-method-for-hierarchical-clustering

[4] : http://www.econ.upf.edu/~michael/stanford/maeb7.pdf

[5] : https://www.cs.ucsb.edu/~veronika/MAE/summary_SLINK_Sibson72.pdf

Leave a comment