K-Means Clusters Example
Problem Statement
We have collected data on several different car models across several parameters. Use a K-Means analysis to cluster similar vehicles together.
Step 1: Open Sigma Magic
- Click on the Sigma Magic button on the Excel toolbar.
- Click on the New button to create a new project.
Step 2: Add the analysis template
- Click on the Tool Wizard to add the analysis template.
- Click on Analytics and then on K-Means Clusters.
Step 3: Specify analysis options
A new worksheet will be added to your workbook and the Analysis Setup dialog will open automatically. On the Setup tab, specify the analysis options.

Click on Data to specify the data required for this analysis.

Click the Verify tab to confirm that all inputs are okay and shown with a green checkmark.
Step 4: Generate analysis result
Click OK and then click Compute Outputs to get the final results.
Interpretation of Results
1. Cluster Formation:
- The K-Means algorithm identified 3 clusters in the dataset.
- Cluster sizes are 7, 9, and 16 data points respectively, indicating an uneven distribution.
2. Distance Metric & Algorithm:
- The clustering is based on Euclidean distance, which measures similarity between points.
- The Hartigan-Wong algorithm (auto-selected) was used for clustering.
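As a minimal sketch of what "based on Euclidean distance" means, the following Python snippet computes the distance from one observation to several cluster centers and picks the nearest one. The points and centroids are hypothetical two-feature values chosen for illustration, not the car data, and the snippet is not Sigma Magic's implementation.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical centroids and observation (assumed values, not from the car dataset).
centroids = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
point = (3.0, 4.0)

# Distance from the observation to every centroid.
distances = [euclidean(point, c) for c in centroids]

# The observation belongs to the cluster whose centroid is closest.
nearest = distances.index(min(distances))

print(distances[0])  # 5.0
print(nearest)       # 0
```

Because Euclidean distance sums squared differences across all features, variables measured on large scales can dominate the result unless the data are standardized first.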
3. Cluster Visualization:
- The cluster plot shows largely distinct groups, suggesting a reasonable clustering structure.
- Where regions overlap, some data points assigned to different clusters have similar characteristics.
4. Within and Between Cluster Sum of Squares (SSQ):
- Within-cluster SSQ (sum of squared distances from each point to its cluster center) values are 11816.35, 46582.69, and 32462.91.
- Between-cluster SSQ is 53139.2, about 37% of the total sum of squares (144001.15), so a substantial share of the variance is due to differences between clusters.
- A lower within-cluster SSQ and a higher between-cluster SSQ indicate well-separated clusters.
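The SSQ quantities above can be sketched in a few lines of Python. The helper names (`centroid`, `within_ssq`, `between_ssq`) and the toy clusters are assumptions made for illustration. The last lines apply the same arithmetic to the values reported in this example to show how much of the total sum of squares lies between clusters.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_ssq(cluster):
    """Sum of squared distances from each point to its own cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def between_ssq(clusters):
    """Size-weighted squared distances from cluster centroids to the grand centroid."""
    grand = centroid([p for cl in clusters for p in cl])
    return sum(len(cl) * sq_dist(centroid(cl), grand) for cl in clusters)

# Hypothetical toy clusters (not the car data).
clusters = [
    [(1.0, 1.0), (1.0, 3.0)],
    [(9.0, 1.0), (9.0, 3.0)],
]
within = [within_ssq(cl) for cl in clusters]
between = between_ssq(clusters)
all_points = [p for cl in clusters for p in cl]
total = within_ssq(all_points)  # total SSQ around the grand centroid

print(within, between, total)  # [2.0, 2.0] 64.0 68.0

# Same ratio using the SSQ values reported in this example.
reported_within = [11816.35, 46582.69, 32462.91]
reported_between = 53139.2
share = reported_between / (sum(reported_within) + reported_between)
print(round(share, 3))  # 0.369
```

The toy output illustrates the identity total SSQ = within SSQ + between SSQ, which is why the between-cluster share can be read as the fraction of variance explained by the clustering.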
5. Elbow Method Interpretation:
- The "Optimal Number of Clusters" plot (Elbow curve) suggests 3 clusters, as the curve shows a noticeable bend at k = 3.
- This validates the choice of k = 3, meaning further increasing clusters may not significantly improve the model.
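To illustrate how an elbow curve is produced, here is a small self-contained Python sketch that runs k-means for k = 1 to 5 and records the total within-cluster SSQ for each k. It rests on several assumptions: it uses plain Lloyd iterations with a deterministic farthest-first initialization rather than the Hartigan-Wong algorithm reported above, the function names are hypothetical, and the data are a made-up set of three tight groups, not the car data.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen centers."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans_wss(points, k, max_iter=100):
    """Run Lloyd's k-means and return the total within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [mean(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sum(sq_dist(p, centroids[j]) for j, g in enumerate(groups) for p in g)

# Hypothetical data: three tight groups of four points each.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
    (10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0),
    (5.0, 10.0), (6.0, 10.0), (5.0, 11.0), (6.0, 11.0),
]

wss = [kmeans_wss(points, k) for k in range(1, 6)]
for k, value in enumerate(wss, start=1):
    print(k, round(value, 2))
```

On this toy data the total within-cluster SSQ falls sharply up to k = 3 and then flattens, which is the "bend" the elbow plot is read for.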
6. Conclusion & Business Insight:
- The dataset has been effectively segmented into three groups.
- Depending on the application (e.g., customer segmentation, market research), each cluster could represent distinct categories based on common characteristics.
- Further interpretation requires understanding the feature variables used in clustering.