Pre-Process Data Example

Problem Statement

While collecting data on Length, Width, and Height, one value was not recorded. Pre-process this data to address the missing value using each of the following approaches:

a) Delete the record that contains the missing value so that the data set is complete

b) Use the central value to estimate the missing value
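Outside of Sigma Magic, the two options can be sketched in plain Python. This is a minimal illustration, not the tool's implementation: the measurements are hypothetical, `None` marks the missing value, and the median is assumed as the "central value" (the mean would work the same way).

```python
from statistics import median

# Hypothetical measurements; None marks the one missing value.
records = [
    {"Length": 20.0, "Width": 5.0, "Height": 10.0},
    {"Length": 21.0, "Width": None, "Height": 10.5},
    {"Length": 19.5, "Width": 5.2, "Height": 10.0},
    {"Length": 20.5, "Width": 4.8, "Height": 11.0},
]

def drop_incomplete(rows):
    """Option (a): keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_median(rows):
    """Option (b): replace each missing value with its column's median."""
    columns = list(rows[0].keys())
    # Median of the observed (non-missing) values in each column.
    centers = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return [
        {col: (r[col] if r[col] is not None else centers[col]) for col in columns}
        for r in rows
    ]

complete = drop_incomplete(records)   # one record fewer, no gaps
filled = impute_median(records)       # same record count, gap filled
print(len(complete), filled[1]["Width"])
```

Deletion keeps only observed values but shrinks the data set; imputation keeps every record but introduces an estimated value.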

Step 1: Open Sigma Magic

Step 2: Add the analysis template

Step 3: Specify analysis options

A new worksheet will be added to your workbook, and Analysis Setup will open automatically. In the Setup tab, specify the analysis options.

Click on Data to specify the data required for this analysis.

Click the Verify tab to confirm that all inputs are valid; each valid input is marked with a green checkmark.

Step 4: Generate analysis result

Click OK and then click Compute Outputs to get the final results.

1. Presence of Missing or Invalid Data (NA Columns):

Some columns labeled as NA might indicate missing or placeholder values.
Certain rows contain irregular values like -1.5, which could be an error or outlier.
These missing or incorrect values may need imputation or removal.
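A check like this can be sketched in Python to list the cells that need attention. The sample rows are hypothetical, and the rule that a physical dimension must be positive is an assumption used here to catch values such as -1.5.

```python
import math

def is_missing(value):
    """Treat None, blank or 'NA' text, and NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().upper() in {"", "NA", "N/A"}
    return isinstance(value, float) and math.isnan(value)

def find_problems(rows, must_be_positive):
    """Return (row_index, column, reason) for each suspicious cell."""
    problems = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if is_missing(value):
                problems.append((i, col, "missing"))
            elif (
                col in must_be_positive
                and isinstance(value, (int, float))
                and value <= 0
            ):
                # Assumed rule: a physical dimension cannot be zero or negative.
                problems.append((i, col, "non-positive"))
    return problems

# Hypothetical rows for illustration.
rows = [
    {"Length": 20.0, "Width": 5.0, "Height": 10.0},
    {"Length": -1.5, "Width": "NA", "Height": 10.5},
]
problems = find_problems(rows, must_be_positive={"Length", "Width", "Height"})
for row_index, column, reason in problems:
    print(row_index, column, reason)
```

Each flagged cell can then be reviewed and either imputed or removed.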

2. Repetitive Data Patterns:

Some rows have identical values in the NA columns, such as 3707553304 and 7864787372621, which appear multiple times.
This suggests either redundancy in data collection or duplication in entries.
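Repeated values and fully duplicated rows can be counted with a short Python sketch. The column below reuses the two values quoted above; the remaining entries and the example rows are hypothetical.

```python
from collections import Counter

def repeated_values(values):
    """Return {value: count} for values that occur more than once."""
    counts = Counter(values)
    return {v: n for v, n in counts.items() if n > 1}

def duplicate_rows(rows):
    """Return indices of rows that exactly repeat an earlier row."""
    seen = set()
    dupes = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # order-independent, hashable row key
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes

# Hypothetical column containing the repeated values mentioned above.
column = [3707553304, 7864787372621, 3707553304, 12, 7864787372621, 3707553304]
repeats = repeated_values(column)

# Hypothetical rows; the third row repeats the first.
rows = [
    {"Length": 20.0, "Height": 10.0},
    {"Length": 21.0, "Height": 10.5},
    {"Length": 20.0, "Height": 10.0},
]
dupes = duplicate_rows(rows)
print(repeats, dupes)
```

A repeated value in a single column is not necessarily an error, whereas an exact duplicate of a whole row is more likely to be a duplicated entry.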

3. Possible Outliers in Height Column:

Most values in the Height column are 10, but some are 10.5 or 11.
This variation might be acceptable, but further analysis is needed to determine if it is significant or an anomaly.
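One simple way to carry out that further analysis is a z-score screen, sketched below in Python. The heights are hypothetical values that follow the pattern described above, and the threshold is a judgment call rather than a fixed rule (2 and 3 are both common choices).

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for values more than `threshold` sample
    standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged

# Hypothetical heights: mostly 10, with one 10.5 and one 11.
heights = [10, 10, 10, 10, 10.5, 10, 10, 11, 10, 10]
loose = zscore_outliers(heights, threshold=2.0)
strict = zscore_outliers(heights, threshold=3.0)
print(loose, strict)
```

On this sample, the value 11 is flagged at a threshold of 2 but not at 3, which shows how the conclusion depends on the threshold chosen. With so few distinct values, domain knowledge about the acceptable height tolerance is usually a better guide than the statistic alone.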

4. Format and Structure Issues:

The dataset is structured in a tabular format, but some values appear inconsistent.
Checking data types (numeric or categorical) is necessary to ensure proper analysis.
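A rough type check can be sketched in Python by testing whether every non-missing entry in a column parses as a number. The function name and the set of missing-value markers are assumptions made for this illustration.

```python
def infer_column_type(values):
    """Return 'numeric' if every non-missing value parses as a float,
    'categorical' if any does not, and 'empty' if nothing is present."""
    present = [v for v in values if v not in (None, "", "NA")]
    if not present:
        return "empty"
    for v in present:
        try:
            float(v)
        except (TypeError, ValueError):
            return "categorical"
    return "numeric"

# Hypothetical columns for illustration.
height_type = infer_column_type(["10", "10.5", "NA", 11])
color_type = infer_column_type(["red", "blue", "red"])
blank_type = infer_column_type([None, "NA", ""])
print(height_type, color_type, blank_type)
```

Numbers stored as text (such as "10.5") are still classified as numeric, which helps catch columns that only look categorical because of how they were imported.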

5. Potential Data Cleaning Required:

Data validation is needed to confirm whether NA columns contain useful information or should be removed.
If NA represents missing values, imputation (mean/median/mode) or deletion may be required.

6. Need for Normalization and Transformation:

If numerical columns differ greatly in scale (for example, the NA columns contain very large numbers while Height stays near 10), scaling techniques such as Min-Max scaling or Standardization may be necessary.
Encoding categorical variables (if any) should also be considered before applying machine learning models.
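The transformations named above can be sketched in a few lines of Python. This is an illustrative version using only the standard library; the function names and sample values are hypothetical, and standardization here uses the population standard deviation.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """Encode each category as a 0/1 vector, one position per category
    (categories taken in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical inputs for illustration.
scaled = min_max_scale([10, 10.5, 11])
z = standardize([10, 10.5, 11])
encoded = one_hot(["a", "b", "a"])
print(scaled, z, encoded)
```

Min-Max scaling preserves the shape of the distribution within fixed bounds but is sensitive to extreme values, while standardization is the more common choice when the bounds are not meaningful.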
