K-Means Clusters Example


Problem Statement

We have collected data on several different car models, measured on several parameters. Use a K-Means analysis to group the vehicles into clusters.

How to perform analysis

Step 1: Open Sigma Magic
  1. Click on the Sigma Magic button on the Excel toolbar.
  2. Click on the New button to create a new project.
Step 2: Add the analysis template
  1. Click on the Tool Wizard to add the analysis template.
  2. Click on Analytics and then on K-Means Clusters.


Step 3: Specify analysis options
A new worksheet will be added to your workbook, and Analysis Setup will open automatically. In the Setup tab, specify the analysis options.


Click on Data to specify the data required for this analysis.



Click the Verify tab to confirm that all inputs are valid. Each valid input is marked with a green checkmark.



Step 4: Generate analysis result
Click OK and then click Compute Outputs to get the final results.




Interpretation of Results

1.  Cluster Formation:
  • The K-Means algorithm identified 3 clusters in the dataset.
  • Cluster sizes are 7, 9, and 16 data points respectively, indicating an uneven distribution.
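Cluster sizes are simply counts of how many points carry each cluster label. The minimal Python sketch below assumes the labels are available as a plain list; the `cluster_sizes` and `size_imbalance` helpers are hypothetical, for illustration only, and not part of Sigma Magic. It reproduces the reported sizes and expresses the unevenness as the ratio of the largest to the smallest cluster.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count the points assigned to each cluster, ordered by cluster label."""
    counts = Counter(labels)
    return [counts[k] for k in sorted(counts)]


def size_imbalance(sizes):
    """Largest cluster size divided by smallest (1.0 means perfectly even)."""
    return max(sizes) / min(sizes)


# Hypothetical label list that matches the reported sizes of 7, 9 and 16
labels = [0] * 7 + [1] * 9 + [2] * 16
sizes = cluster_sizes(labels)
print(sizes)                            # [7, 9, 16]
print(round(size_imbalance(sizes), 2))  # 2.29
```

In concrete terms, the largest cluster is a little more than twice the size of the smallest one.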

2.  Distance Metric & Algorithm:

  • The clustering is based on Euclidean distance (the straight-line distance between points): the smaller the distance, the more similar two points are.
  • The Hartigan-Wong algorithm (Auto-selected) was used for clustering.
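To make the distance metric and the iterative idea concrete, here is a minimal Python sketch of k-means with Euclidean distance. Note that it implements the simpler Lloyd's algorithm (alternate between assigning each point to its nearest centroid and recomputing the centroids), not the Hartigan-Wong algorithm Sigma Magic selected; Hartigan-Wong instead moves individual points between clusters whenever doing so lowers the within-cluster SSQ. The function names, the seeded random initialisation and the sample data are assumptions for illustration.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans_lloyd(points, k, max_iter=100, seed=0):
    """Lloyd's k-means (NOT Hartigan-Wong). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = mean_point(members)
    return labels, centroids


# Hypothetical two-feature data with two obvious groups
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans_lloyd(data, k=2)
print(labels, centroids)
```

The first three points end up sharing one label and the last three the other; which group is called 0 or 1 depends on the random start.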

3.  Cluster Visualization:

  • The cluster plot shows largely well-separated groups, suggesting a reasonable clustering structure.
  • Some regions overlap, indicating that certain data points in different clusters are similar to each other.

4.  Within and Between Cluster Sum of Squares (SSQ):

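  • Within-cluster SSQ (the sum of squared distances from each point to its cluster centroid) values are 11816.35, 46582.69, and 32462.91, for a combined 90861.95.
  • Between-cluster SSQ is 53139.2. Added to the within-cluster total, this gives a total SSQ of 144001.15, so differences between clusters account for roughly 37% of the total variation.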
  • A lower within-cluster SSQ and higher between-cluster SSQ indicate well-separated clusters.
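The SSQ figures are linked by a standard identity: the total SSQ (squared distances from every point to the overall mean) splits exactly into the within-cluster SSQ (squared distances from each point to its own cluster centroid) plus the between-cluster SSQ (each cluster's size times the squared distance from its centroid to the overall mean). The minimal Python sketch below checks this on a tiny made-up dataset and then applies it to the values reported above, assuming Sigma Magic's figures follow this standard decomposition. The helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def ssq_summary(points, labels):
    """Return (within-cluster SSQ per label, between-cluster SSQ, total SSQ)."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within, between = {}, 0.0
    for lab in sorted(set(labels)):
        members = [p for p, lbl in zip(points, labels) if lbl == lab]
        c = centroid(members)
        within[lab] = sum(sq_dist(p, c) for p in members)
        # Cluster size times squared distance of its centroid from the overall mean
        between += len(members) * sq_dist(c, grand)
    return within, between, total


# Tiny made-up dataset: two clusters of two points each
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
within, between, total = ssq_summary(pts, [0, 0, 1, 1])
print(within, between, total)  # {0: 2.0, 1: 2.0} 100.0 104.0

# Assumption: the reported values obey total = within + between
within_reported = [11816.35, 46582.69, 32462.91]
between_reported = 53139.2
total_reported = sum(within_reported) + between_reported
print(round(total_reported, 2))                     # 144001.15
print(round(between_reported / total_reported, 3))  # 0.369
```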

5.  Elbow Method Interpretation:

  • The "Optimal Number of Clusters" plot (Elbow curve) suggests 3 clusters, as the curve shows a noticeable bend at k = 3.
  • This supports the choice of k = 3: adding more clusters is unlikely to improve the model significantly.
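The bend in an elbow curve can also be located numerically. The minimal Python sketch below assumes you already have the total within-cluster SSQ for consecutive values of k, and picks the k where the decrease in SSQ slows down most sharply (the largest second difference). This is one simple heuristic among several, not necessarily the rule Sigma Magic uses, and the SSQ values in the example are made up for illustration; `elbow_k` is a hypothetical helper.

```python
def elbow_k(wss_by_k):
    """Return the k at the sharpest bend of an elbow curve.

    wss_by_k maps consecutive k values to total within-cluster SSQ.
    Needs at least three k values; returns None otherwise.
    """
    ks = sorted(wss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # Drop into k minus drop out of k: large when the curve flattens at k
        bend = (wss_by_k[prev] - wss_by_k[k]) - (wss_by_k[k] - wss_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up elbow curve: steep drops up to k = 3, then nearly flat
wss = {1: 1000.0, 2: 600.0, 3: 300.0, 4: 260.0, 5: 230.0, 6: 210.0}
print(elbow_k(wss))  # 3
```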

6.  Conclusion & Business Insight:

  • The dataset has been effectively segmented into three groups.
  • Depending on the application (e.g., customer segmentation, market research), each cluster could represent distinct categories based on common characteristics.
  • Further interpretation requires understanding the feature variables used in clustering.  