Tutorial | Numel Ai

Tutorials

What are Machine Learning Models?

Machine learning models are algorithms that learn from data to make predictions or find patterns. They analyze past data and apply what they learn to new data to classify objects, predict outcomes, or group similar items.

This tutorial covers three main types of machine learning models:

Classification – Predicts categories or labels (e.g., spam vs. not spam).
Regression – Predicts continuous numerical values (e.g., house prices).
Clustering – Groups similar data points without predefined labels (e.g., customer segmentation).

These models are widely used in applications like fraud detection, medical diagnosis, and recommendation systems. 🚀

Types of Machine Learning Models

Model	Definition	When to Use	Real-World Examples	Popular Algorithms
Classification	Classification models predict categorical outcomes by assigning data to predefined labels or classes.	When the outcome is discrete or belongs to specific categories. When you need to determine "which group" a data point falls into.	Spam Detection (Spam vs. Not Spam); Credit Approval (Approve/Reject); Disease Diagnosis (Healthy vs. Diseased); Sentiment Analysis (Positive, Neutral, Negative)	Logistic Regression; Decision Trees & Random Forest; Naive Bayes; Support Vector Machines (SVM); k-Nearest Neighbors (kNN); Neural Networks (for multi-class classification)
Regression	Regression models predict continuous numerical outputs: the target variable can take any value within a range.	When the outcome is numerical or continuous. When you need to predict a specific value.	House Price Prediction (based on size, location, etc.); Stock Market Forecasting; Estimating Insurance Premiums; Predicting Weather Metrics	Linear Regression; Decision Trees & Random Forest (for regression); Support Vector Regression (SVR); Ridge and Lasso Regression; Gradient Boosting (XGBoost, LightGBM); Neural Networks (for time-series or complex regression tasks)
Clustering	Clustering is an unsupervised learning technique that groups similar data points together. Unlike classification, clustering does not require predefined labels; the model discovers patterns in the data on its own.	When you don’t have labeled data. When you need to find hidden patterns or groupings.	Customer Segmentation (based on purchasing behavior); Document Clustering (grouping similar documents); Image Segmentation (grouping pixels with similar colors/textures)	K-Means Clustering; Hierarchical Clustering; DBSCAN (Density-Based Clustering); Gaussian Mixture Models (GMM)

Subcategories of Machine Learning Models

Model	What it Does	Example	What You Tweak	Key Hyperparameters	Input Data	Output
Regression Models (Predict Continuous Values)
Simple Linear Regression	Fits a straight line to predict one value from another.	Predicting sales based on ad spend.	Nothing; the slope and intercept are learned from the data.	None	Numeric (1 independent, 1 dependent variable)	Numeric (predicted value)
Multiple Linear Regression	Uses multiple inputs to predict a single value.	Car price prediction (based on age, mileage, etc.).	Choosing the right inputs.	None	Numeric (multiple independent variables)	Numeric (predicted value)
Decision Tree Regression	Uses decision rules to predict numbers.	House price prediction based on size.	Tree depth, split criteria.	max_depth, min_samples_split, criterion	Numeric or categorical (labeled data)	Numeric (predicted value)
Random Forest Regression	Uses multiple decision trees for better accuracy.	Predicting monthly electricity usage.	Number of trees, tree depth.	n_estimators, max_depth, criterion	Numeric or categorical (labeled data)	Numeric (predicted value)
KNN Regression	Averages the values of the closest data points.	Predicting house prices based on nearby houses.	Number of neighbors.	n_neighbors, weights, metric	Numeric or categorical (labeled data)	Numeric (predicted value)
SVM Regression	Predicts numbers within a margin (similar to classification).	Predicting rental prices with a margin for error.	Same as SVM classification with margin settings.	C, kernel, gamma, epsilon	Numeric (labeled data with target variable)	Numeric (predicted value within a margin)
Classification Models (Predict Categorical Outputs)
Logistic Regression	Predicts the probability of a binary (yes/no) outcome.	Will a loan be approved?	Regularization type and strength (the feature weights themselves are learned).	penalty, C, solver	Numeric (binary labeled data)	Categorical (0 or 1)
Decision Tree Classification	Uses decision rules to classify data.	Spam vs. Not Spam	Tree depth, splitting rules.	max_depth, min_samples_split, criterion	Numeric or categorical (labeled data)	Categorical (predicted class)
Random Forest Classification	Uses multiple trees to improve classification accuracy.	Diagnosing diseases (e.g., flu or not flu).	Number of trees, tree depth.	n_estimators, max_depth, criterion	Numeric or categorical (labeled data)	Categorical (predicted class)
KNN Classification	Assigns a category based on nearest neighbors.	Identifying fruit type based on attributes.	Number of neighbors.	n_neighbors, weights, metric	Numeric or categorical (labeled data)	Categorical (predicted class)
SVM Classification	Finds the best boundary to separate categories.	Sorting emails into spam or not spam.	Shape of the boundary, margin.	C, kernel, gamma	Numeric or categorical (labeled data)	Categorical (predicted class)
Clustering Models (Unsupervised Learning)
K-Means Clustering	Groups numerical data into clusters based on similarity.	Customer segmentation by spending habits.	Number of clusters.	n_clusters, init, max_iter, tol	Numeric (unlabeled data)	Categorical (cluster assignments)
K-Modes Clustering	Groups categorical data into clusters based on similarity.	Customer segmentation based on categorical traits like gender, job.	Number of clusters, initialization method	n_clusters, init, n_init	Categorical	Categorical (cluster assignments)
K-Prototypes Clustering	Groups mixed data (numerical + categorical) into clusters based on similarity.	Customer segmentation by age, salary, and job type.	Number of clusters, initialization method	n_clusters, init, random_state	Mixed (numerical + categorical data)	Categorical (cluster assignments)
Text Processing
Text Classification (Natural Language Processing or NLP)	Groups text into categories.	Grouping product reviews as positive, negative, or neutral.	Preprocessing, vectorization, model type	vectorizer, model_type, max_features	Raw or preprocessed text	Predicted class labels (e.g., "positive")
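To make the K-Means row of the table more concrete, here is a minimal pure-Python sketch of the algorithm (not the platform's implementation). It shows what `n_clusters`, `max_iter`, and `tol` control. As a simplifying assumption, the first `n_clusters` points serve as the initial centroids; libraries normally use smarter initialization (the `init` hyperparameter), such as k-means++.

```python
import math


def k_means(points, n_clusters, max_iter=300, tol=1e-4):
    """Cluster numeric points (lists of floats) with plain k-means.

    Assumption: the first `n_clusters` points serve as initial centroids,
    which keeps this sketch deterministic.
    """
    centroids = [list(p) for p in points[:n_clusters]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(n_clusters):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        # Stop early once no centroid moves farther than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


# Two obvious groups of customers, e.g. (visits per month, spend in $100s).
points = [[1, 1], [1.5, 2], [8, 8], [9, 8.5], [1, 0.5], [8.5, 9]]
labels, centroids = k_means(points, n_clusters=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Raising `n_clusters` splits the data into more groups; `max_iter` and `tol` only decide when the algorithm stops refining the centroids.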

Choosing Between Classification, Regression, and Clustering

To determine the right model, consider the following:

What type of target variable do you have?
- Categorical (labels): Use classification (e.g., spam vs. not spam).
- Continuous (numerical values): Use regression (e.g., predicting house prices).
- No predefined target variable: Use clustering to discover patterns and create labels.
What question are you trying to answer?
- Predicting a category? → Classification
- Predicting a numerical value? → Regression
- Grouping similar data points? → Clustering
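The two questions above can be turned into a rough rule of thumb. The sketch below is a hypothetical helper, not a platform feature: it assumes `None` means there is no target column, and it treats text labels, or a small number of distinct whole numbers, as categories. It is only a heuristic, and the question you are trying to answer should have the final say (a count of units sold is numeric even when it has few distinct values).

```python
def suggest_task(target_values=None, max_classes=10):
    """Suggest classification, regression, or clustering for a target column.

    Heuristic assumptions: no target -> clustering; text/boolean labels or
    at most `max_classes` distinct integers -> classification; else regression.
    """
    if target_values is None:
        return "clustering"  # nothing to predict, so look for groups instead
    if all(isinstance(v, (str, bool)) for v in target_values):
        return "classification"
    if all(isinstance(v, int) for v in target_values) and len(set(target_values)) <= max_classes:
        return "classification"  # e.g. 0/1 labels stored as numbers
    return "regression"


print(suggest_task(["spam", "not spam", "spam"]))    # classification
print(suggest_task([250000.0, 310500.0, 189900.0]))  # regression
print(suggest_task())                                # clustering
```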

How to use Data Prep

Step 1: Upload a Data File

Click the "Upload CSV or Excel file" input field.
Select a CSV or Excel file from your device.

Click the "Upload" button to send the file for processing.

Step 2: View Uploaded Data

After uploading, you will see:

Data Information: Metadata about the dataset (e.g., column names, data types).
Data Preview - Head: Displays the first few rows of the dataset.
Data Preview - Tail: Displays the last few rows of the dataset.
Null Values: A list of columns with null values and their counts.
Unique Value Counts: A list of columns and the number of unique values they contain.
Duplicate Rows: The count of duplicate rows in the dataset.
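The summaries listed above can be sketched with Python's standard library. This is an illustrative example, not the platform's code; it assumes the file has already been read with `csv.DictReader` (so every value is a string) and treats an empty cell as a null value.

```python
import csv
import io
from collections import Counter


def profile(rows):
    """Summarise rows from csv.DictReader; an empty string counts as null."""
    columns = list(rows[0].keys()) if rows else []
    return {
        "columns": columns,
        "head": rows[:5],   # first few rows
        "tail": rows[-5:],  # last few rows
        "null_counts": {c: sum(1 for r in rows if r[c] == "") for c in columns},
        # Nulls are excluded from the unique counts.
        "unique_counts": {c: len({r[c] for r in rows if r[c] != ""}) for c in columns},
        # Every repeat of an identical row after its first appearance is a duplicate.
        "duplicate_rows": sum(
            n - 1 for n in Counter(tuple(r.values()) for r in rows).values()
        ),
    }


sample = io.StringIO("name,age\nAna,31\nBen,\nAna,31\n")
info = profile(list(csv.DictReader(sample)))
print(info["null_counts"])     # {'name': 0, 'age': 1}
print(info["unique_counts"])   # {'name': 2, 'age': 1}
print(info["duplicate_rows"])  # 1
```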

Step 3: Perform Operations on the Data

Add Column: Specify the column name and either a static value or a derived expression (e.g., col1 + col2).
Handle Null Values: Choose a strategy (drop rows or fill null values) and provide a fill value if needed.
Change Data Type: Specify the column name and the new data type (e.g., int, float, or str).
Sort Column: Specify the column and sort order (ascending or descending).
Find and Replace: Specify the column, the value to find, and the value to replace it with.
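Each operation above amounts to a simple transformation of the table. As a hedged sketch, with hypothetical helper names, rows held as plain Python dicts, and `None` marking a missing value, they might look like this. A Python function stands in for the platform's derived expression, and Find and Replace is assumed to match whole cell values rather than substrings.

```python
def add_column(rows, name, func):
    """Add Column: compute a new value for each row, e.g. col1 + col2."""
    return [{**r, name: func(r)} for r in rows]


def fill_nulls(rows, column, value):
    """Handle Null Values (fill): replace None in one column with `value`."""
    return [{**r, column: value if r[column] is None else r[column]} for r in rows]


def drop_null_rows(rows, column):
    """Handle Null Values (drop): keep only rows where `column` has a value."""
    return [r for r in rows if r[column] is not None]


def change_type(rows, column, new_type):
    """Change Data Type: convert with a callable such as int, float, or str."""
    return [{**r, column: new_type(r[column])} for r in rows]


def sort_by(rows, column, descending=False):
    """Sort Column: order rows by one column."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


def find_replace(rows, column, find, replace):
    """Find and Replace: swap whole cell values equal to `find`."""
    return [{**r, column: replace if r[column] == find else r[column]} for r in rows]


rows = [
    {"col1": 5, "col2": None, "city": "NYC"},
    {"col1": 1, "col2": 2, "city": "LA"},
]
rows = fill_nulls(rows, "col2", 0)
rows = add_column(rows, "total", lambda r: r["col1"] + r["col2"])
rows = find_replace(rows, "city", "NYC", "New York")
rows = change_type(rows, "total", float)
rows = sort_by(rows, "total")
print([r["city"] for r in rows])   # ['LA', 'New York']
print([r["total"] for r in rows])  # [3.0, 5.0]
```

Each helper returns a new list instead of modifying the input, so the original data stays available if an operation needs to be undone.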

Concatenate: Combine the files vertically by stacking the rows of one file below the other.
Merge: Merge the files based on a specific column. You’ll need to:

Step 4: Export Processed Data

If you've processed the data and want to download it:
In the Export Data section, choose whether to export the data (Yes/No).
Select the export format (CSV or Excel).
Click the Download button to save the processed file to your computer.
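For reference, writing a processed table to CSV takes only a few lines with Python's built-in `csv` module. This sketch assumes the data is held as a list of dicts (one per row); Excel output needs a third-party library and is not shown.

```python
import csv
import io


def write_csv(rows, stream):
    """Write a list of dicts as CSV, using the first row's keys as the header."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; for a real file, pass
# open("processed.csv", "w", newline="", encoding="utf-8") as the stream.
buffer = io.StringIO()
write_csv([{"name": "Ana", "age": 31}, {"name": "Ben", "age": 27}], buffer)
print(buffer.getvalue())
```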

How to use Data Explore

Step 1: Upload a Data File

Click the "Upload CSV or Excel file" input field.
Select a CSV or Excel file from your device.
Click the "Upload" button to send the file for processing.

Step 2: View Data Analysis Results

After uploading, the page will refresh and display various insights about your data:

Data Information:

Number of rows and columns.
Dataset structure (e.g., data types and null values).

Dataset Preview:

A preview of the first 5 and last 5 rows of the dataset.

Summary Statistics:

Statistical summaries like mean, median, etc., for numerical columns.

Null Values:

Columns with missing values and their respective counts.

Unique Values:

Count of unique values in each column.

Duplicates:

Number of duplicate rows in the dataset.

Skewness and Kurtosis:

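Measures of the shape of each numerical column's distribution: skewness describes asymmetry (a long tail on one side), and kurtosis describes how heavy the tails are compared with a normal distribution.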
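The statistics above can be computed directly from their definitions. The sketch below uses the simple moment-based (population) formulas for skewness and excess kurtosis; tools such as pandas apply small-sample corrections, so the platform's numbers may differ slightly. It assumes a numeric column with at least two distinct values.

```python
import statistics


def describe(values):
    """Summary statistics plus skewness and excess kurtosis for one column."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments (population versions, divided by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "count": n,
        "mean": mean,
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,    # 0 = symmetric, > 0 = long right tail
        "kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution; > 0 = heavier tails
    }


print(describe([1, 2, 3, 4, 5])["skewness"])    # 0.0
print(describe([1, 1, 1, 10])["skewness"] > 0)  # True
```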

Step 3: Generate Graphs

Select Graph Type:
Options include scatter plot, line graph, bar graph, pair plot, box plot, and histogram.

Choose X and Y Columns:
Use the dropdowns to select the column(s) for the X and (optional) Y axes.
Y-axis is optional for certain graph types like histograms.

Generate the Graph:
Click "Generate Graph" to produce the visualization.

If successful, the graph will appear below this section.

Build an AI Model

Choose the AI model you would like to build. You can choose from the various Classification, Regression, and Clustering models available on the platform.
This example uses the Simple Linear Regression model to explain the platform's UI and functionality. All models follow the same steps, so you will have no trouble applying them to other models; the one exception is hyperparameter tuning, which differs for each AI model. See the AI Models Info page for more information on hyperparameters. If you are not familiar with hyperparameters, don't worry: the platform is designed to work even if you don't enter any, and you can build your AI model using the default values.

Simple Linear Regression

Step 1: Upload a Data File

Select a file (Excel or CSV format) using the "Select File" input.
Enter the number of rows to display in the "Number of Rows to Display" field (default is 5).
Click the "Upload" button.

This will send the file to the server for processing.
After uploading, you will see the data's information (e.g., structure, column names) and a sample preview.

Step 2: Train the Model

Input Independent Variable Columns:
Specify which columns from your dataset will be used as input features.
Example: Enter 0,1,2 for the 1st, 2nd, and 3rd columns.

Important Note: Column numbering always starts at 0. Refer to the column numbers in the Data Info section for clarity.

Output Variable Column (Y): Specify the column representing the target variable. Example: Enter 3 for the 4th column.
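Zero-based column numbers work like Python list indexes. A minimal sketch, assuming each row is a list of values (for example from `csv.reader` with the header row removed); `parse_indices` and `select_columns` are hypothetical helpers, not platform functions:

```python
def parse_indices(text):
    """Turn a form entry such as "0,1,2" into [0, 1, 2]."""
    return [int(part) for part in text.split(",")]


def select_columns(rows, x_columns, y_column):
    """Split rows into input features (X) and target values (y) by column number."""
    X = [[row[i] for i in x_columns] for row in rows]
    y = [row[y_column] for row in rows]
    return X, y


# Columns: 0 = size, 1 = bedrooms, 2 = age, 3 = price (the target).
rows = [[1200, 3, 10, 250000], [800, 2, 25, 160000]]
X, y = select_columns(rows, parse_indices("0,1,2"), 3)
print(X)  # [[1200, 3, 10], [800, 2, 25]]
print(y)  # [250000, 160000]
```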

Scaling and Encoding (Optional): Choose a scaling or encoding method from the dropdown.
Options include "Standard Scaler," "MinMax Scaler," or encoding techniques such as "OneHot Encoder."
If encoding is selected, specify the column to be encoded in the "Enter Column for Encoding" field.
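What the scaling and encoding options do can be shown with their textbook formulas. This is a sketch rather than the platform's implementation: standard scaling here uses the population standard deviation (as scikit-learn's `StandardScaler` does), the one-hot columns follow sorted category order, and the scaled column is assumed not to be constant (otherwise the division fails).

```python
import statistics


def standard_scale(values):
    """Standard scaling: (x - mean) / std, giving mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / std for x in values]


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def one_hot(values):
    """One-hot encoding: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


print(min_max_scale([10, 20, 30]))      # [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```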

Test Size: Define the proportion of data to be used for testing (e.g., 0.25 for 25%).
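Conceptually, the test size decides how many rows are held back for evaluation. A hedged sketch: rows are shuffled with a fixed seed (the `seed` parameter is an assumption for reproducibility, not a documented platform field), and the number of test rows is rounded up, which is how scikit-learn's `train_test_split` handles a fractional `test_size`.

```python
import math
import random


def split_train_test(X, y, test_size=0.25, seed=42):
    """Shuffle row positions, then hold out a `test_size` fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed: same split on every run
    n_test = math.ceil(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i] for i in range(8)]
y = [10 * i for i in range(8)]
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.25)
print(len(X_train), len(X_test))  # 6 2
```

Shuffling before splitting matters when the file is sorted (for example by date or price); otherwise the test set would not resemble the training set.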

n_jobs (Optional): Specify the number of parallel jobs to use during model training (default is 0 for no parallelism).

Evaluation Method:
Select how you want the model to be evaluated:
Options include metrics such as RMSE and R-squared, as well as the fitted model's Coefficients and Intercept.
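For Simple Linear Regression, the coefficient (slope) and intercept have closed-form least-squares solutions, and RMSE and R-squared are short formulas. The sketch below shows these standard definitions and is not taken from the platform's code; it fits on training data and evaluates on held-out test data.

```python
import math


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + coefficient * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    coefficient = sxy / sxx
    intercept = mean_y - coefficient * mean_x
    return coefficient, intercept


def evaluate(y_true, y_pred):
    """RMSE (typical error size) and R-squared (share of variance explained)."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return {"rmse": math.sqrt(sse / n), "r2": 1 - sse / sst}


# Training data: sales rise by 2 units per unit of ad spend.
coef, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(coef, intercept)  # 2.0 1.0

# Held-out test data.
x_test, y_test = [5, 6], [11, 14]
predictions = [intercept + coef * x for x in x_test]
print(evaluate(y_test, predictions))
```

An R-squared close to 1 and a small RMSE (in the same units as the target) indicate a good fit on the test data.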

Click the "Train Model" button.

This triggers the server to train the model using the specified parameters.
If errors occur, they will be displayed in the "Model Evaluation" section.

Step 3: Make Predictions

Ensure the model is trained successfully.
If successful, the "Predict with Real Data" section will become visible.
This means the model has been trained and is ready for use with new (unseen) data.

Upload a file containing the new data for which you want the trained model to predict the output:

Select a file (Excel or CSV format) containing the data for which you want predictions.
Enter the input variable columns for prediction (e.g., 1,2,3).

Note: If you selected an encoding during training, it is applied automatically to the same input columns that were encoded in the training step above.
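Reusing the training encoding matters because new data must be encoded with the categories learned during training; otherwise the one-hot columns would not line up with what the model expects. A minimal sketch with a hypothetical `SimpleOneHotEncoder` class that learns categories in `fit` and reuses them in `transform`. It assumes a category never seen during training becomes an all-zero row; scikit-learn's `OneHotEncoder` instead raises an error by default in that case.

```python
class SimpleOneHotEncoder:
    """Hypothetical encoder: categories are learned once, then reused."""

    def fit(self, values):
        # Remember the categories seen in the training data.
        self.categories = sorted(set(values))
        return self

    def transform(self, values):
        # Same columns as training; an unseen category becomes all zeros.
        return [[1 if v == c else 0 for c in self.categories] for v in values]


encoder = SimpleOneHotEncoder().fit(["red", "blue", "red"])  # training column
print(encoder.categories)                    # ['blue', 'red']
print(encoder.transform(["blue", "green"]))  # [[1, 0], [0, 0]]
```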

Click the "Make Prediction" button.

Predictions are generated, and the first 5 rows are displayed.
To see all predictions, export them (Step 4).

Step 4: Export Predictions

Choose whether to export predictions:
Select "Yes" from the "Export Choice" dropdown if you want to save the predictions.
Select Export Format:
Choose "CSV" or "Excel" as the file format.
Click the "Download" button.

This will download the prediction results in the chosen format.