Tutorials
What are Machine Learning Models?
Machine learning models are algorithms that learn from data to make predictions or find patterns. They analyze past information and apply it to new data to classify objects, predict outcomes, or group similar items.
There are three main types of machine learning models:
- Classification – Predicts categories or labels (e.g., spam vs. not spam).
- Regression – Predicts continuous numerical values (e.g., house prices).
- Clustering – Groups similar data points without predefined labels (e.g., customer segmentation).
These models are widely used in applications like fraud detection, medical diagnosis, and recommendation systems. 🚀
Types of Machine Learning Models
Model | Definition | When to Use | Real-World Examples | Popular Algorithms |
---|---|---|---|---|
Classification | Classification models predict categorical outcomes by assigning data to predefined labels or classes. | | | |
Regression | Regression models predict continuous numerical outputs. The target variable can take any value within a continuous range. | | | |
Clustering | Clustering is an unsupervised learning technique that groups similar data points together. Unlike classification, clustering does not require predefined labels; the model discovers patterns in the data. | | | |
Subcategories of Machine Learning Models
Model | What it Does | Example | What You Tweak | Key Hyperparameters | Input Data | Output |
---|---|---|---|---|---|---|
Regression Models (Predict Continuous Values) | ||||||
Simple Linear Regression | Draws a straight line to predict one value based on another. | Predicting sales based on ad spend. | The slope of the line. | None | Numeric (1 independent, 1 dependent variable) | Numeric (predicted value) |
Multiple Linear Regression | Uses multiple inputs to predict a single value. | Car price prediction (based on age, mileage, etc.). | Choosing the right inputs. | None | Numeric (multiple independent variables) | Numeric (predicted value) |
Decision Tree Regression | Uses decision rules to predict numbers. | House price prediction based on size. | Tree depth, split criteria. | max_depth, min_samples_split, criterion | Numeric or categorical (labeled data) | Numeric (predicted value) |
Random Forest Regression | Uses multiple decision trees for better accuracy. | Predicting monthly electricity usage. | Number of trees, tree depth. | n_estimators, max_depth, criterion | Numeric or categorical (labeled data) | Numeric (predicted value) |
KNN Regression | Averages the values of the closest data points. | Predicting house prices based on nearby houses. | Number of neighbors. | n_neighbors, weights, metric | Numeric or categorical (labeled data) | Numeric (predicted value) |
SVM Regression | Predicts numbers within a margin (similar to classification). | Predicting rental prices with a margin for error. | Same as SVM classification with margin settings. | C, kernel, gamma, epsilon | Numeric (labeled data with target variable) | Numeric (predicted value within a margin) |
Classification Models (Predict Categorical Outputs) | ||||||
Logistic Regression | Predicts probability for binary classification (yes/no). | Will a loan be approved? | Feature weights. | penalty, C, solver | Numeric (binary labeled data) | Categorical (0 or 1) |
Decision Tree Classification | Uses decision rules to classify data. | Spam vs. Not Spam | Tree depth, splitting rules. | max_depth, min_samples_split, criterion | Numeric or categorical (labeled data) | Categorical (predicted class) |
Random Forest Classification | Uses multiple trees to improve classification accuracy. | Diagnosing diseases (e.g., flu or not flu). | Number of trees, tree depth. | n_estimators, max_depth, criterion | Numeric or categorical (labeled data) | Categorical (predicted class) |
KNN Classification | Assigns a category based on nearest neighbors. | Identifying fruit type based on attributes. | Number of neighbors. | n_neighbors, weights, metric | Numeric or categorical (labeled data) | Categorical (predicted class) |
SVM Classification | Finds the best boundary to separate categories. | Sorting emails into spam or not spam. | Shape of the boundary, margin. | C, kernel, gamma | Numeric or categorical (labeled data) | Categorical (predicted class) |
Clustering Models (Unsupervised Learning) | ||||||
K-Means Clustering | Groups numerical data into clusters based on similarity. | Customer segmentation by spending habits. | Number of clusters. | n_clusters, init, max_iter, tol | Numeric (unlabeled data) | Categorical (cluster assignments) |
K-Modes Clustering | Groups categorical data into clusters based on similarity. | Customer segmentation based on categorical traits like gender, job. | Number of clusters, initialization method | n_clusters, init, n_init | Categorical | Categorical (cluster assignments) |
K-Prototypes Clustering | Groups mixed data (numerical + categorical) into clusters based on similarity. | Customer segmentation by age, salary, and job type. | Number of clusters, initialization method | n_clusters, init, random_state | Mixed (numerical + categorical data) | Categorical (cluster assignments) |
Text Processing | ||||||
Text Classification (Natural Language Processing or NLP) | Groups text into categories. | Grouping product reviews as positive, negative, or neutral. | Preprocessing, vectorization, model type | vectorizer, model_type, max_features | Raw or preprocessed text | Predicted class labels (e.g., "positive") |
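To make one row of the table concrete, here is a minimal sketch of KNN Regression ("averages the values of the closest data points") in plain Python. The function name `knn_regress` and the house data are made up for illustration; the `n_neighbors` argument mirrors the hyperparameter listed in the table. This is not the platform's implementation.

```python
import math

def knn_regress(train_X, train_y, query, n_neighbors=3):
    """Predict a value as the mean target of the n_neighbors closest points."""
    # Pair each training point's Euclidean distance to the query with its target.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Keep the n_neighbors smallest distances.
    nearest = sorted(distances)[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical data: (size in square metres, rooms) -> price in thousands.
houses = [(50, 2), (60, 2), (80, 3), (100, 4), (120, 5)]
prices = [150.0, 180.0, 240.0, 310.0, 400.0]

# The two closest houses to (70, 3) are (80, 3) and (60, 2).
prediction = knn_regress(houses, prices, query=(70, 3), n_neighbors=2)
print(prediction)  # 210.0
```

A larger `n_neighbors` smooths the prediction by averaging over more houses; a smaller value follows the nearest points more closely.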
Choosing Between Classification, Regression, and Clustering
To determine the right model, consider the following:
- What type of target variable do you have?
  - Categorical (labels): Use classification (e.g., spam vs. not spam).
  - Continuous (numerical values): Use regression (e.g., predicting house prices).
  - No predefined target variable: Use clustering to discover patterns and create labels.
- What question are you trying to answer?
  - Predicting a category? → Classification
  - Predicting a numerical value? → Regression
  - Grouping similar data points? → Clustering
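The checklist above can be written as a small helper. This is only a rough heuristic, and the function name `suggest_model_family` is hypothetical. It assumes that float targets are continuous and that everything else (strings, integers) is a label. Numeric codes such as 1 to 5 ratings can reasonably go either way, so treat the result as a starting point.

```python
def suggest_model_family(target_values):
    """Suggest a model family from the target column (a rough heuristic)."""
    # No target column: there is nothing to predict, so group the data instead.
    if not target_values:
        return "clustering"
    # Assumption: float targets are continuous numerical values.
    if all(isinstance(v, float) for v in target_values):
        return "regression"
    # Strings, integers, and other values are treated as category labels.
    return "classification"

print(suggest_model_family(["spam", "not spam", "spam"]))
print(suggest_model_family([250000.0, 310500.5, 189900.0]))
print(suggest_model_family(None))
```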
How to use Data Prep
Step 1: Upload a Data File
Click the "Upload CSV or Excel file" input field.
Select a CSV or Excel file from your device.
Click the "Upload" button to send the file for processing.
Step 2: View Uploaded Data
After uploading, you will see:
- Data Information: Metadata about the dataset (e.g., column names, data types).
- Data Preview - Head: Displays the first few rows of the dataset.
- Data Preview - Tail: Displays the last few rows of the dataset.
- Null Values: A list of columns with null values and their counts.
- Unique Value Counts: A list of columns and the number of unique values they contain.
- Duplicate Rows: The count of duplicate rows in the dataset.
Step 3: Perform Operations on the Data
Use the dropdown menu to choose an operation (e.g., remove duplicates, change data type, handle null values, etc.).
Depending on your selection, additional input fields will appear. Fill in the required details:
- Add Column: Specify the column name and either a static value or a derived expression (e.g., col1 + col2).
- Handle Null Values: Choose a strategy (drop rows or fill null values) and provide a fill value if needed.
- Change Data Type: Specify the column name and the new data type (e.g., int, float, or str).
- Sort Column: Specify the column and sort order (ascending or descending).
- Find and Replace: Specify the column, the value to find, and the value to replace it with.
Click the Apply Operation button to execute the selected operation.
To Concatenate or Merge Files:
Use the Concatenate or Merge Files section if you want to combine additional data files with the initially uploaded data file.
Upload the files using the Choose File button. You may choose multiple files.
Choose an operation:
- Concatenate: Combine the files vertically.
- Merge: Merge the files based on a specific column. You'll need to provide the column name to merge on and select a merge type (inner, outer, left, or right).
Click the Upload and Process button to process the files.
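If the four merge types are unfamiliar, the sketch below shows what each one keeps, using plain Python lists of row dictionaries. The `concatenate` and `merge` helpers and the sample tables are illustrative assumptions, not the platform's code. The sketch also assumes both tables are non-empty and share no column names other than the merge column.

```python
def concatenate(first, second):
    """Stack two tables (lists of row dicts) vertically."""
    return first + second

def merge(left, right, on, how="inner"):
    """Join two tables on column `on`; `how` is inner, outer, left, or right."""
    if how not in {"inner", "outer", "left", "right"}:
        raise ValueError(f"unknown merge type: {how}")
    # Non-key column names, used to fill missing sides with None.
    left_cols = [c for c in left[0] if c != on]
    right_cols = [c for c in right[0] if c != on]
    merged, matched_right = [], set()
    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[on] == l_row[on]]
        matched_right.update(matches)
        for i in matches:
            merged.append({**l_row, **right[i]})
        # left/outer keep left rows that have no partner on the right.
        if not matches and how in {"left", "outer"}:
            merged.append({**l_row, **dict.fromkeys(right_cols)})
    # right/outer keep right rows that have no partner on the left.
    if how in {"right", "outer"}:
        for i, r_row in enumerate(right):
            if i not in matched_right:
                merged.append({on: r_row[on], **dict.fromkeys(left_cols), **r_row})
    return merged

# Hypothetical tables sharing the "id" column.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 2, "total": 30}, {"id": 3, "total": 45}]

for how in ("inner", "left", "right", "outer"):
    print(how, merge(customers, orders, on="id", how=how))
```

In short: inner keeps only ids found in both files, left keeps every row of the first file, right keeps every row of the second, and outer keeps everything, leaving missing values empty.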
Step 4: Export Processed Data
If you've processed the data and want to download it:
In the Export Data section, choose whether to export the data (Yes/No).
Select the export format (CSV or Excel).
Click the Download button to save the processed file to your computer.
How to use Data Explore
Step 1: Upload a Data File
Click the "Upload CSV or Excel file" input field.
Select a CSV or Excel file from your device.
Click the "Upload" button to send the file for processing.
Step 2: View Data Analysis Results
After submitting, the page will refresh and display various insights about your data:
Data Information:
- Number of rows and columns.
- Dataset structure (e.g., data types, null values, etc.)
Dataset Preview:
- A preview of the first 5 and last 5 rows of the dataset.
Summary Statistics:
- Statistical summaries like mean, median, etc., for numerical columns.
Null Values:
- Columns with missing values and their respective counts.
Unique Values:
- Count of unique values in each column.
Duplicates:
- Number of duplicate rows in the dataset.
Skewness and Kurtosis:
- Measures of data distribution for numerical columns.
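For readers who want to know what these numbers mean, here is a minimal sketch of how null counts, unique counts, duplicate rows, skewness, and kurtosis can be computed in plain Python. The helper names and sample rows are made up. The skewness and kurtosis formulas are the simple population versions, so the platform may report slightly different values if it uses a bias-corrected estimator.

```python
from collections import Counter
from statistics import fmean

def null_counts(rows, columns):
    """Number of missing (None) values per column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}

def unique_counts(rows, columns):
    """Number of distinct non-missing values per column."""
    return {c: len({r[c] for r in rows if r[c] is not None}) for c in columns}

def duplicate_row_count(rows):
    """Rows that repeat an earlier row (each extra copy counts once)."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(n - 1 for n in counts.values())

def skewness(values):
    """Population skewness: positive means a longer right tail."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5  # fails for a constant column (m2 == 0)

def excess_kurtosis(values):
    """Population excess kurtosis: 0 for a normal distribution."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m4 = fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0

# Hypothetical dataset with one missing value and one duplicate row.
rows = [
    {"city": "Oslo", "temp": 4.0},
    {"city": "Rome", "temp": 18.0},
    {"city": "Oslo", "temp": 4.0},
    {"city": None, "temp": 10.0},
]
columns = ["city", "temp"]

print(null_counts(rows, columns))
print(unique_counts(rows, columns))
print(duplicate_row_count(rows))
print(skewness([1.0, 1.0, 1.0, 10.0]))
```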
Step 3: Generate Graphs
Select Graph Type:
Options include scatter plot, line graph, bar graph, pair plot, box plot, and histogram.
Choose X and Y Columns:
Use the dropdowns to select the column(s) for the X and (optional) Y axes.
Y-axis is optional for certain graph types like histograms.
Generate the Graph:
Click "Generate Graph" to produce the visualization.
If successful, the graph will appear below this section.
Build an AI Model
Choose the AI model that you'd like to build. You may choose from the various Classification, Regression, and Clustering models available on the platform.
In this example, we use the Simple Linear Regression model to explain the UI and functionality of the platform. All models follow a similar design, so you should have no trouble following the steps. The one exception is hyperparameter tuning, which differs for each AI model. Please see the AI Models Info page for more information on hyperparameters. If you are not familiar with them, do not worry: the platform is designed to function even if you don't key in any hyperparameters, and you may use the defaults as they are to build your AI model.
Simple Linear Regression
Step 1: Upload a Data File
Select a file (Excel or CSV format) using the "Select File" input.
Enter the number of rows to display in the "Number of Rows to Display" field (default is 5).
Click the "Upload" button.
This will send the file to the server for processing.
After uploading, you will see the data's information (e.g., structure, column names) and a sample preview.
Step 2: Train the Model
Input Independent Variable Columns: Specify which columns from your dataset will be used as input features.
Example: Enter 0,1,2 for the 1st, 2nd, and 3rd columns.
Important Note: Column numbers always start at 0. Refer to the column numbers in the Data Info section for clarity.
Output Variable Column (Y): Specify the column representing the target variable. Example: Enter 3 for the 4th column.
Scaling and Encoding (Optional): Choose a scaling or encoding method from the dropdown.
Options include "Standard Scalar," "MinMax Scalar," or encoding techniques like "OneHot Encoder."
If encoding is selected, specify the column to be encoded in the "Enter Column for Encoding" field.
Test Size: Define the proportion of data to be used for testing (e.g., 0.25 for 25%).
n_jobs (Optional): Specify the number of parallel jobs to use during model training (default is 0 for no parallelism).
Evaluation Method:
Select how you want the model to be evaluated:
Options include RMSE, R-squared, Coefficients, Intercept, etc.
Click the "Train Model" button.
This triggers the server to train the model using the specified parameters.
If errors occur, they will be displayed in the "Model Evaluation" section.
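To show roughly what happens when you click "Train Model", the sketch below walks through the same choices in plain Python: 0-based column numbers, a test size of 0.25, a least-squares fit, and RMSE and R-squared as evaluation methods. All function names and the dataset are made up for illustration; the platform's server-side implementation is not described in this tutorial and may differ.

```python
import math
import random

def train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle the rows and hold out a test_size fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Share of the target's variance explained by the model."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical dataset: column 0 = ad spend, column 1 = sales (exactly 3x + 2).
data = [[float(x), 3.0 * x + 2.0] for x in range(1, 21)]
x_col, y_col = 0, 1  # column numbers start at 0

train, test = train_test_split(data, test_size=0.25)
slope, intercept = fit_simple_linear([r[x_col] for r in train],
                                     [r[y_col] for r in train])
actual = [r[y_col] for r in test]
predicted = [intercept + slope * r[x_col] for r in test]

print("Coefficient:", slope, "Intercept:", intercept)
print("RMSE:", rmse(actual, predicted), "R-squared:", r_squared(actual, predicted))
```

Because the made-up data lies exactly on a line, the fit recovers a slope close to 3 and an intercept close to 2, with RMSE near 0 and R-squared near 1. Real data will give less perfect scores.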
Step 3: Make Predictions
Ensure the model is trained successfully.
If successful, the "Predict with Real Data" section will become visible.
This means that the model has been trained and is ready for use with new (unseen) data.
Upload a file containing the new data for which you want the trained model to predict the output:
Select a file (Excel or CSV format) containing the data for which you want predictions.
Enter the input variable columns for prediction (e.g., 1,2,3).
Note: If you selected an encoding during training, it will be applied automatically to the same input columns that were encoded in the training step above.
Click the "Make Prediction" button.
Predictions will be generated and displayed for the top 5 rows.
To see the full data, proceed to export the predictions.
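The note about encoding being re-applied at prediction time can be pictured with the sketch below: the categories (or the min and max for scaling) are learned once from the training data and then reused unchanged on the new file. The helper names are hypothetical. How the platform treats a category it never saw during training is not stated in this tutorial; the all-zero row used here is an assumption.

```python
def fit_one_hot(values):
    """Learn the category order from the training column."""
    return sorted(set(values))

def apply_one_hot(values, categories):
    """Encode values as 0/1 indicator rows using the learned categories."""
    # Assumption: a category not seen in training becomes an all-zero row.
    return [[1 if v == c else 0 for c in categories] for v in values]

def fit_min_max(values):
    """Learn the minimum and maximum from the training column."""
    return min(values), max(values)

def apply_min_max(values, bounds):
    """Rescale values with the bounds learned during training."""
    low, high = bounds
    return [(v - low) / (high - low) for v in values]

# Learned once, at training time (hypothetical columns).
categories = fit_one_hot(["nurse", "clerk", "nurse", "pilot"])
bounds = fit_min_max([20, 30, 40])

# Reused as-is on the new data at prediction time.
encoded_new = apply_one_hot(["pilot", "clerk", "chef"], categories)
scaled_new = apply_min_max([25, 50], bounds)

print(categories)   # ['clerk', 'nurse', 'pilot']
print(encoded_new)  # [[0, 0, 1], [1, 0, 0], [0, 0, 0]]
print(scaled_new)   # [0.25, 1.5]
```

Note that a new value outside the training range scales to a number above 1 (or below 0). This is expected when the bounds come from the training data only.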
Step 4: Export Predictions
Choose whether to export predictions:
Select "Yes" from the "Export Choice" dropdown if you want to save the predictions.
Select Export Format:
Choose "CSV" or "Excel" as the file format.
Click the "Download" button.
This will download the prediction results in the chosen format.