K-means Implementation

Professor
蔡崇煒

Members
4103056002 杜 杰
4103056029 黃 
4103056041 陳仲彥
4104056009 洪浩祐

2017/04/18

Iris

Original Data

Our result

Drawback

Animation

Procedure

Randomly choose k observations as the initial centroids.

Assign each observation to its nearest centroid; the observations sharing a centroid form a cluster.

Recompute the mean of each cluster and use it as that cluster's new centroid.
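
The three steps above can be put together as one loop. What follows is only a minimal sketch, not the code used for the results in these slides: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice of Euclidean distance, and the rule that an empty cluster keeps its old centroid are all assumptions made for illustration. Observations are assumed to be equal-length tuples of numbers.

```python
import math
import random
from statistics import fmean


def kmeans(points, k, max_iter=100, seed=None):
    """Cluster a list of equal-length numeric tuples into k groups (sketch)."""
    rng = random.Random(seed)
    # Step 1: pick k observations at random as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each observation to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster.
        # Assumption: an empty cluster keeps its previous centroid.
        updated = [
            tuple(fmean(axis) for axis in zip(*cluster)) if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = updated
    return centroids, clusters
```

If `max_iter` is reached before the centroids stop moving, the returned clusters reflect the assignment made just before the final centroid update.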

Initialization:

Assignment:

 

Update:

Initialization

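import random

random.shuffle(observations)  # observations: list of hashable points, e.g. tuples
centers = dict((c, [c]) for c in observations[:k])  # first k points become centroids
centers[observations[k-1]] += observations[k:]  # park the rest in the last cluster until assignment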

Assignment

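# Start every centroid with an empty cluster, then reassign each point.
new_centers = {c: [] for c in centers}
for j in centers:
    for x in centers[j]:
        # dist(x, c): distance between observation x and centroid c
        best = min(centers, key=lambda c: dist(x, c))
        new_centers[best].append(x)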
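
The assignment snippet calls `dist` without showing it. A minimal sketch of such a helper follows, assuming observations are equal-length numeric tuples and that Euclidean distance is intended; the slides do not state which metric was actually used.

```python
import math


def dist(x, c):
    """Euclidean distance between observation x and centroid c (assumed metric)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


# Picking the nearest centroid for a single observation:
centroids = [(0.0, 0.0), (5.0, 5.0)]
best = min(centroids, key=lambda c: dist((1.0, 2.0), c))
```

Because `min` only compares distances, the squared distance (dropping the square root) would select the same centroid.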

Update

new_centers = {}
for j in centers:
    # mean() must return a hashable point (e.g. a tuple) so it can be a dict key
    new_centers[mean(centers[j])] = centers[j]
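
The update snippet relies on a `mean` function whose result is used as a dictionary key. One possible version is sketched below, assuming each cluster is a non-empty list of equal-length numeric tuples; the actual helper is not shown in the slides.

```python
def mean(cluster):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(cluster)
    # zip(*cluster) groups the values of each coordinate together
    return tuple(sum(axis) / n for axis in zip(*cluster))
```

Returning a tuple rather than a list keeps the result hashable, which the dictionary-based snippet requires.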

Watch again!

\sum_{j=1}^{k} \sum_{x_i \in S_j} \left\| x_i - \mu_j \right\|^{2}

Within-cluster Sum of Squares (WCSS)
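
To make the formula concrete, here is a small sketch that evaluates WCSS for a given clustering. The function name `wcss` and the input layout (a list of clusters, each a non-empty list of equal-length numeric tuples) are assumptions for illustration.

```python
def wcss(clusters):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        n = len(points)
        # mu is the mean of this cluster, computed coordinate by coordinate
        mu = [sum(axis) / n for axis in zip(*points)]
        total += sum((xi - mi) ** 2 for p in points for xi, mi in zip(p, mu))
    return total
```

For example, the cluster {(0, 0), (0, 2)} has mean (0, 1), so each point contributes 1; together with a singleton cluster, which contributes 0, the total is 2.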

Variance

Variance - Iteration chart

Thank you!

By jd615645

105-2 Data Mining & Machine Learning
