Model Selection Example

Problem Statement

Use multiple models to predict the values of gear based on the other variables. The data for the exercise is shown in the Data tab.

How to perform analysis

Step 1: Open Sigma Magic
  1. Click on the Sigma Magic button on the Excel toolbar.
  2. Click on the New button to create a new project.
Step 2: Add the analysis template
  1. Click on the Tool Wizard to add the analysis template.
  2. Click on Analytics and then Model Selection.

Step 3: Specify analysis options
A new worksheet will be added to your workbook, and Analysis Setup will open automatically. Use the Setup tab to specify the analysis options.


Click on Data to specify the data required for this analysis. 


Click on Train to pick the options for training the models. Training is the step where the data is split into two groups: a train data set and a test data set. The models are fit on the train data set and evaluated on the test data set.
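
For readers who want to see what a train/test split does outside the tool, here is a minimal Python sketch. It is not Sigma Magic's implementation: the function name train_test_split, the 80/20 fraction, and the placeholder rows are assumptions chosen for illustration, and the seed value 23 is only borrowed from the seeds mentioned in the results.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=23):
    """Shuffle rows reproducibly and split them into train and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rng = random.Random(seed)   # local generator, global random state is untouched
    shuffled = list(rows)       # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows standing in for the worksheet data (assumption)
rows = [{"row_id": i} for i in range(10)]
train, test = train_test_split(rows, train_fraction=0.8, seed=23)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so the same rows land in the train and test groups each time the sketch is run.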


Click the Verify tab to ensure that all the inputs are okay, as indicated by a green checkmark.



Step 4: Generate analysis result
Click OK and then click Compute Outputs to get the final results.


Interpretation of Results

  • The Parallel Random Forest model (M1) has been identified as the best-performing model.
  • It achieved an accuracy of 81.37% and a Kappa statistic of 72.71%, which indicates strong classification agreement.
  • Three models were compared:
    • M1 (Parallel Random Forest) → 81.37% Accuracy, 72.71% Kappa
    • M2 (Random Generalized Boosted Model) → 81.12% Accuracy, 54.74% Kappa
    • M3 (Random Classification and Regression Tree) → 80.39% Accuracy, 57.42% Kappa
  • M1 had the highest accuracy and the highest Kappa of the three models, making it the most reliable choice.
  • The model was trained as a classification problem.
  • Variables used: gear (the response being predicted), carb, vs, am, disp, qsec.
  • Resampling methods were used to improve model robustness.
  • Statistical differences between models were analyzed using hypothesis testing.
  • The lowest variation and the best statistical fit were found in the Parallel Random Forest model.
  • The faults section suggests some model assumptions were tested.
  • 84 bootstrap resamples were used to validate model stability.
  • The resampling process was randomly initialized with different seeds (e.g., 23, 29).
  • Hyperparameters were tuned to optimize performance.
  • Further improvements could involve fine-tuning the number of trees, depth, and feature selection for better accuracy.
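
The accuracy, Kappa, and bootstrap figures above can be illustrated with a short Python sketch. It uses the standard definition of Cohen's Kappa, (observed agreement − chance agreement) / (1 − chance agreement), and a plain bootstrap draw with replacement. Whether Sigma Magic computes its outputs exactly this way is an assumption, and the function names and the small label lists are hypothetical examples, not the exercise data.

```python
import random
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted classes, corrected for chance."""
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: probability both pick the same class independently
    p_expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if p_expected == 1.0:
        return 0.0  # only one class present, Kappa is undefined; report 0
    return (p_observed - p_expected) / (1.0 - p_expected)


def bootstrap_sample(n_rows, seed):
    """Draw n_rows row indices with replacement; unused rows are out-of-bag."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


# Hypothetical gear labels, for illustration only
actual = [3, 3, 3, 3, 4, 4, 4, 5, 5, 5]
predicted = [3, 3, 3, 4, 4, 4, 4, 5, 5, 3]
print(f"accuracy = {accuracy(actual, predicted):.2%}")
print(f"kappa    = {cohen_kappa(actual, predicted):.2%}")

# Two resamples with different seeds, echoing the seeds listed in the results
for seed in (23, 29):
    in_bag, out_of_bag = bootstrap_sample(len(actual), seed)
    print(seed, len(in_bag), len(out_of_bag))
```

In this toy example 8 of 10 predictions match (80% accuracy), while Kappa is lower because some of that agreement would occur by chance. The same effect explains why models with similar accuracy in the comparison can have quite different Kappa values.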