Clustering is a unsupervised algorithm that looks at data points and automatically finds data points that are related or similar to each other. It is very similar to classification problems, like Logistic Regression, in supervised learning. However, clustering (and unsupervised learning as a whole) does not provide the target labels $y$. The algorithm is tasked with finding the relationship between the data instead.

Untitled

Clustering is useful where the data being classified is less intuitive to predict, such as grouping similar news articles, where it is hard to predict what article contents will be printed in the future.

Defining the K-means Algorithm

Initializing K-means