UNSUPERVISED LEARNING: CLUSTERING

k-means

This section describes an IoTPy agent that implements the k-means algorithm applied to a sliding window of a stream. The output of the agent is a partition of points in the window into clusters. For a description of the k-means algorithm see https://en.wikipedia.org/wiki/K-means_clustering. This code was written by Rahul Bachal with Mani Chandy.

Clustering points in a sliding window of a stream is a straightforward application of the clustering algorithm to each window. The algorithm uses the centroids of the clusters in each window as the initial estimates of the centroids in the following window. When the step size is small compared to the window size, the centroids of successive windows are likely to be close to each other. So, the algorithm converges more rapidly by starting with the previous window’s centroids rather than by starting with random points.
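The effect of the starting centroids can be seen in a minimal sketch of the standard k-means (Lloyd) iteration in plain Python. This is not the IoTPy implementation; the names kmeans, nearest, max_iters and tol are chosen here for illustration. The function returns the number of iterations it used, so a cold start can be compared with a start from centroids that are already close to the answer.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, init_centroids, max_iters=100, tol=1e-9):
    """Lloyd's iteration started from init_centroids.

    Returns (centroids, labels, number of iterations used).
    """
    centroids = [list(map(float, c)) for c in init_centroids]
    for iteration in range(1, max_iters + 1):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # empty cluster keeps its centroid
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels, iteration


window = [(0, 0), (0, 1), (10, 10), (10, 11)]
# Cold start from two of the points.
centroids, labels, cold_iters = kmeans(window, [(0, 0), (10, 10)])
# Warm start from centroids that are already correct for this window.
_, _, warm_iters = kmeans(window, centroids)
print(cold_iters, warm_iters)
```

On this small window the cold start needs one iteration to move the centroids and a second to confirm that they no longer change, while the warm start only needs the confirming iteration.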

Many machine-learning algorithms operating on streams can use the results from a stream as a starting estimate for an extension of the stream, and thus converge more rapidly than when starting with other estimates.

The IoTPy code is a straightforward application of the map_window agent.

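def kmeans_sliding_windows(
        in_stream, out_stream, window_size, step_size,
        num_clusters):
    # The object carries the centroids from one window to the next.
    kmeans_object = KMeansForSlidingWindows(num_clusters)
    # Apply kmeans_object.func to each sliding window of in_stream
    # and put the result on out_stream.
    map_window(
        kmeans_object.func, in_stream, out_stream,
        window_size, step_size)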

num_clusters is the number of clusters. The object, kmeans_object, keeps track of the centroids of the most recent window into the stream. The function, kmeans_object.func, computes the clusters of the next window by starting with the centroids of the previous window.
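The following is a hedged sketch of how an object of this kind might carry centroids from one window to the next. It is not the IoTPy source: WarmStartKMeans is a hypothetical stand-in for KMeansForSlidingWindows, seeding the first window with its first num_clusters points is an assumption, and map_sliding_windows is a plain-Python stand-in that assumes map_window applies a function to each window of window_size elements and advances by step_size.

```python
import math


class WarmStartKMeans:
    """Hypothetical stand-in for KMeansForSlidingWindows (not IoTPy code)."""

    def __init__(self, num_clusters, max_iters=100):
        self.num_clusters = num_clusters
        self.max_iters = max_iters
        self.centroids = None  # set when the first window arrives

    def _labels(self, window):
        # Label each point with the index of its nearest centroid.
        return [min(range(self.num_clusters),
                    key=lambda k: math.dist(p, self.centroids[k]))
                for p in window]

    def func(self, window):
        """Cluster one window, starting from the previous window's centroids."""
        if self.centroids is None:
            # Assumption: seed with the first num_clusters points.
            self.centroids = [list(map(float, p))
                              for p in window[:self.num_clusters]]
        for _ in range(self.max_iters):
            labels = self._labels(window)
            updated = []
            for k in range(self.num_clusters):
                members = [p for p, label in zip(window, labels) if label == k]
                if members:
                    updated.append(
                        [sum(col) / len(members) for col in zip(*members)])
                else:
                    updated.append(self.centroids[k])
            converged = updated == self.centroids
            self.centroids = updated  # remembered for the next window
            if converged:
                break
        return self._labels(window)


def map_sliding_windows(func, in_list, window_size, step_size):
    """Plain-Python stand-in for the windowing assumed of map_window."""
    return [func(in_list[start:start + window_size])
            for start in range(0, len(in_list) - window_size + 1, step_size)]


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
clusterer = WarmStartKMeans(num_clusters=2)
results = map_sliding_windows(
    clusterer.func, points, window_size=4, step_size=2)
print(results)              # one list of cluster labels per window
print(clusterer.centroids)  # centroids of the most recent window
```

Keeping the centroids as an attribute of the object is what lets a function of a single window behave as a stateful computation over the stream: each call reads the centroids left by the previous call and overwrites them for the next.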