Motivation and Machine Learning (Lesson 3) Part 2


Feature Selection:

Feature selection helps you answer the question: "What are the features that are most useful for a given model?" One reason we need it is that the number of features in your original dataset may be very high.


  • Eliminates irrelevant, redundant and highly correlated features.
  • Reduces dimensionality for increased performance, as many ML models do not cope well with data that has a very large number of dimensions (many features).
  • We can improve the situation of having too many features through dimensionality reduction.

Commonly used dimensionality reduction techniques are:

PCA (Principal Component Analysis)

t-SNE (t-Distributed Stochastic Neighbor Embedding)

Feature embedding

Azure ML prebuilt modules:

Filter-based feature selection: identify columns in the input dataset that have the greatest predictive power

Permutation feature importance: determine the best features to use by computing the feature importance scores
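To make the filter-based idea concrete, here is a minimal sketch in plain Python. It is not the Azure ML module itself, just an illustration of the principle: score each column against the target (here with the absolute Pearson correlation, which is my choice of metric) and rank the columns. The column names and numbers are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return feature names sorted by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy dataset: three candidate features and a price target.
columns = {
    "size_sqm": [50, 60, 80, 100, 120],
    "door_colour_code": [3, 1, 2, 3, 1],
    "constant": [1, 1, 1, 1, 1],
}
price = [150, 180, 240, 310, 360]

ranking = rank_features(columns, price)
print(ranking)
```

You would then keep only the top few columns from the ranking. Correlation only captures linear relationships, so treat this as a first pass rather than a final answer.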


Data Drift

Data drift is a change in the input data for a model. Over time, data drift causes degradation in the model's performance, as the input data drifts farther and farther from the data on which the model was trained. Monitoring and identifying data drift helps you maintain model performance; if you do not, accuracy is likely to decrease over time.


Causes of data drift include:

Changes in the upstream process: the data input source changed, the measurement unit changed, the equipment calibration for data gathering changed, etc.

Changes in the nature of the data: changes in customer behaviour, changes in seasonality, etc.

Monitoring for Data Drift

Azure Machine Learning allows you to set up dataset monitors that can alert you about data drift and even take automatic actions to correct data drift.

The process of monitoring for data drift involves:

Specifying a baseline dataset – usually the training dataset

Specifying a target dataset – usually the input data for the model

Comparing these two datasets over time, to monitor for differences

Here are different types of comparisons you might want to make when monitoring for data drift:

Comparing input data vs. training data: This is a proxy for model accuracy; that is, an increased difference between the input vs. training data is likely to result in a decrease in model accuracy.

Comparing different samples of time series data: In this case, you are checking for a difference between one time period and another. For example, a model trained on data collected during one season may perform differently when given data from another time of year. Detecting this seasonal drift in the data will alert you to potential issues with your model's accuracy.
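The baseline vs. target comparison can be sketched in a few lines of plain Python. To be clear, this is not how Azure ML dataset monitors work internally and it does not use the Azure SDK; the drift score (shift of the mean in units of baseline standard deviation), the threshold of 2.0 and the temperature readings are all assumptions I made for illustration.

```python
import statistics

def drift_score(baseline, target):
    """Shift of the target mean, measured in baseline standard deviations."""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    if base_std == 0:
        raise ValueError("baseline has no variation, score is undefined")
    return abs(statistics.mean(target) - base_mean) / base_std

def has_drifted(baseline, target, threshold=2.0):
    # The threshold is an arbitrary illustrative cut-off, not a standard value.
    return drift_score(baseline, target) > threshold

# Hypothetical sensor readings for one feature.
baseline_temps = [20.1, 19.8, 20.5, 20.0, 19.9, 20.3]  # training data
same_season = [20.2, 19.9, 20.1, 20.4]                 # recent input, same season
other_season = [27.5, 28.1, 26.9, 27.8]                # recent input, another season

print(has_drifted(baseline_temps, same_season))
print(has_drifted(baseline_temps, other_season))
```

In practice you would run a check like this per feature and on a schedule, which is the part a dataset monitor automates for you.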


Model Training Basics 

In Model Training, the goal is to be able to give the model a set of input features, X, and have it predict the value of some output feature, y. First establish the type of problem. Is it a classification or regression problem? Decide whether you need to scale or encode the data. Then identify input features needed or create new ones through feature engineering.

Model training is the iterative process that involves selecting the hyperparameters, training the model and then evaluating the model performance. Once you've trained your model, you can now run it on the test dataset to see how it performs.


Given a regression problem modeled as y = bx + c

where y is the expected output and x is the input, the values b and c represent the slope and intercept respectively. In machine learning these are the parameters, which are learned from the data during model training. Examples: weights and biases.


In contrast to parameters, hyperparameters are not values that are learned from the data during training. Rather, they are values that we set before the training. Examples: learning rate, batch size, number of clusters, number of layers in a deep network.

Because we do not know the best values for these before training, we usually start with a best guess, run the training, adjust the hyperparameters and retrain until we arrive at good hyperparameter values.
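Here is a minimal sketch of the difference, using the y = bx + c model from above. The slope b and intercept c are parameters learned by gradient descent, while learning_rate and epochs are hyperparameters that we pick before training. The function name, the default values and the toy data are my own choices for illustration.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = b*x + c by gradient descent on the mean squared error."""
    b, c = 0.0, 0.0  # parameters: start from a guess and learn from the data
    n = len(xs)
    for _ in range(epochs):
        errors = [(b * x + c) - y for x, y in zip(xs, ys)]
        grad_b = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_c = 2 * sum(errors) / n
        b -= learning_rate * grad_b
        c -= learning_rate * grad_c
    return b, c

# Toy data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

b, c = train_linear(xs, ys)
print(round(b, 3), round(c, 3))
```

If you set learning_rate too high for this data the updates overshoot and the values blow up, which is exactly why hyperparameters need tuning.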

Splitting Data:

Data for ML is typically split into three parts:

  • Training data
  • Validation data
  • Test data

Training data is used to learn the values for the parameters. The model's performance is checked on the validation data, and we adjust the hyperparameters until the model performs well on the validation data. Finally, we do a final check of our model's performance with the test data, which the model has never seen.
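A simple way to picture the three-way split is the sketch below. The 60/20/20 proportions, the seed and the function name are assumptions for illustration; it does a plain random split and does not try to preserve class balance.

```python
import random

def split_data(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows and cut them into train, validation and test parts."""
    shuffled = list(rows)                      # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for repeatable splits
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever is left over
    return train, validation, test

rows = list(range(100))  # stand-in for 100 data records
train, validation, test = split_data(rows)
print(len(train), len(validation), len(test))
```

Every record lands in exactly one of the three parts, so the test data really is unseen during training and tuning.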


The Process For Model Training on Azure:

Collect Data - Prepare Data - Train Model - Evaluate Model - Deploy Model - Retrain Model

Terms to know:

Workspace: This is the very first thing you need to create. It is the centralized place for working with all the components of the machine learning process.

Experiment: This is just a container that helps you group various artifacts that are related to your machine learning processes

Run: This is one of the artifacts in an experiment. It is a process that is submitted to and executed on one of the compute resources. Examples: training or validating a model, running feature engineering code.

Model registry: This is a service that provides snapshots and versioning for your trained models.

Compute Instances: This refers to a cloud-based workstation that gives you access to various development environments, such as Jupyter Notebooks.


Training Classifiers 

A classification problem is one where the expected outputs are categorical or discrete. There are three main types:

  1. Binary Classification: The output takes only one of two class values, for example 0 and 1, or true and false. Spam email detection and fraud detection are common applications.
  2. Multi-Class Single-Label Classification: Unlike binary classification, the output can take one of three or more classes. Examples: recognizing handwritten numbers from 1 to 10 or recognizing days of the week.
  3. Multi-Class Multi-Label Classification: Here we also have multiple categories, but unlike the previous type, whose output must belong to a single class, the output can belong to more than one class at the same time.

Training Regressors

A regression problem occurs when the output needed is a continuous numerical value. For example: mean time between failures for equipment, time until a volcano erupts, amount of rainfall, stock market prices, etc.

Evaluating Model Performance

It is not good to simply train a model on all your data and then assume that the model will subsequently perform well on future data. This is why it is important to split off a portion of the data and reserve it for evaluating the model's performance.

"When splitting the available data, it is important to preserve the statistical properties of that data. This means that the data in the training, validation, and test datasets need to have similar statistical properties as the original data to prevent bias in the trained model."

Confusion Matrices

What are they? Why should we care?

Well, for a binary classifier, a confusion matrix is a table of two rows and two columns made up of 4 cells representing True Positives, True Negatives, False Positives and False Negatives. A confusion matrix gets its name from the fact that it is easy to see at a glance whether the model is getting confused and misclassifying the data or whether it performs okay. It is an evaluation technique.

You will often see the confusion matrix represented in a more general, abstract form that uses the terms positive and negative. 

So the key to understanding it is this:

The first word (True or False) tells you the correctness of the model's prediction, while the second word (Positive or Negative) tells you the value that was predicted by your model. So for example, if you are solving a classification problem of "is an email spam?", Yes and No would be Positive and Negative respectively. A True Positive will mean this: the model predicted Positive and Yes (True), the model is correct, the email is actually spam. A False Positive will mean this: the model predicted Positive but No (False), this is wrong, the email is not spam. A True Negative will mean: the model predicted Negative and Yes (True), the model is correct, the email is not spam. I want you to pause a while and figure out what False Negative means.

And yep, you got it right. False Negative means your model has predicted Negative, but No (False), this is wrong, so the email is actually spam.
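If it helps, here is the same naming rule as a small Python sketch for the spam example. The function name and the six example emails are made up. Notice how the code builds each cell name exactly as described: first letter from correctness, second letter from what the model predicted.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count TP, FP, TN and FN for a binary classification problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        first = "T" if a == p else "F"          # was the prediction correct?
        second = "P" if p == positive else "N"  # what did the model predict?
        counts[first + second] += 1
    return counts

# Hypothetical labels for six emails ("ham" meaning not spam).
actual    = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

counts = confusion_counts(actual, predicted)
print(counts)
```
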
Evaluation Metrics for Classification

Accuracy: This is the proportion of correct predictions. It is the sum of the true positives and true negatives divided by the total number of cases.
Precision: This is the proportion of predicted positive cases that were actually positive. It is the true positives divided by the sum of the true positives and false positives.
Recall: This is the proportion of actual positive cases that were correctly identified. It is the true positives divided by the sum of the true positives and false negatives.
F1-score: This measures the balance between precision and recall. It is the harmonic mean of the two.

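The formulas are shown here (TP, TN, FP and FN are the four cells of the confusion matrix):

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)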
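As a quick sanity check, the four metrics can be computed directly from the confusion matrix counts. This is a minimal sketch; the function name and the example counts are made up, and returning 0.0 when a denominator is zero is my own convention.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for 100 emails.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```
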

Model Evaluation Charts

Model Evaluation Charts are an easy way to get a quick evaluation of a model's performance. One of the most important types of chart used for classification is the ROC Chart. It is a graph showing the rate of true positives against the rate of false positives. AUC (Area Under the Curve) is the area under the ROC curve. It falls between 0 and 1: a value of 1 means that 100% of cases were correctly classified, while 0.5 is what random guessing would give, so a useful model normally scores somewhere between 0.5 and 1.
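One handy way to think about AUC: it equals the probability that a randomly chosen positive case gets a higher score from the model than a randomly chosen negative case. The sketch below computes it that way by comparing every positive/negative pair. The function name and the scores are made up, and for real work you would use a library routine instead of this quadratic loop.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0]
perfect = roc_auc(labels, [0.9, 0.8, 0.3, 0.1])   # positives always score higher
mixed = roc_auc(labels, [0.9, 0.3, 0.8, 0.1])     # one positive ranked too low
inverted = roc_auc(labels, [0.1, 0.3, 0.8, 0.9])  # model has it backwards
print(perfect, mixed, inverted)
```
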

There is also the Gain and Lift Chart. This deals with rank ordering the prediction probabilities and measures how much better your model performs compared with random guessing.


Strength in Numbers

A single individual model could produce wrong results no matter how well trained it is. This leads to the question: "What if we train multiple instances of ML models and then somehow capture their collective wisdom, so that we can alleviate the limitations of using just individual models?"

So there are two schools of thought:
1. The first is more theoretical: we create algorithms that, within their own inner workings, train multiple model instances. This is known as ensemble learning.
2. In the second, we try to automate the training of various individual ML algorithms as much as possible. This approach is known as automated machine learning.

Automated ML, instead of relying on the inner workings of models, aims to scale up the process of training models. We then combine the results.

Ensemble Learning:
Ensemble learning combines multiple machine learning models to produce one predictive model. There are three main types of ensemble algorithms:

Bagging or bootstrap aggregation

Helps reduce overfitting for models that tend to have high variance (such as decision trees)
Uses random subsampling of the training data to produce a bag of trained models.
The resulting trained models are homogeneous
The final prediction is an average prediction from individual models
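Here is a rough sketch of bagging in plain Python. To keep it short I use a simple least-squares line as the base model rather than a decision tree, so this only shows the mechanics: draw bootstrap samples, train one model per sample, then average the predictions. All names, the number of models and the toy data are assumptions for illustration.

```python
import random

def fit_line(points):
    """Least-squares fit of y = b*x + c, returned as (b, c)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y  # degenerate sample: fall back to a constant model
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return b, mean_y - b * mean_x

def bagged_models(points, n_models=25, seed=0):
    """Train one model per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_line(sample))
    return models

def bagged_predict(models, x):
    """Average the predictions of all models in the bag."""
    return sum(b * x + c for b, c in models) / len(models)

# Toy data: y = 3x + 2 with a small alternating disturbance.
points = [(x, 3 * x + 2 + (1 if x % 2 else -1)) for x in range(20)]

models = bagged_models(points)
prediction = bagged_predict(models, 10)
print(round(prediction, 2))
```

Each model sees a slightly different sample, so their individual quirks tend to cancel out when you average them.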


Boosting

Helps reduce bias for models.
In contrast to bagging, boosting uses the same input data to train multiple models using different hyperparameters.
Boosting trains models in sequence by training weak learners one by one, with each new learner correcting errors from previous learners.
The final predictions are a weighted average from the individual models.
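The "each new learner corrects the errors of the previous ones" idea can be sketched as follows. Note that this is a simplified gradient-boosting-style variant for squared error that I picked for illustration, not a specific library's algorithm: each weak learner is a one-split decision stump fitted to the current residuals, and its output is added with a fixed weight (learning_rate). All names and values are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Train weak learners in sequence, each on the errors left by the others."""
    base = sum(ys) / len(ys)           # start from a constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate

def boosted_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(predictions, ys):
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Toy data: y = x squared, which a single stump cannot fit well.
xs = list(range(10))
ys = [x * x for x in xs]

model = boost(xs, ys)
baseline_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
boosted_mse = mse([boosted_predict(model, x) for x in xs], ys)
print(round(baseline_mse, 2), round(boosted_mse, 2))
```

A single stump is a very weak model on its own, but a sequence of them, each mopping up what the earlier ones missed, fits the training data far better than the constant starting point.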


Stacking

Trains a large number of completely different (heterogeneous) models
Combines the outputs of the individual models into a meta-model that yields more accurate predictions

Automated ML

Automated machine learning, as the name suggests, automates many of the iterative, time-consuming tasks involved in model development (such as selecting the best features, scaling features optimally, choosing the best algorithms, and tuning hyperparameters). Automated ML allows data scientists, analysts, and developers to build models with greater scale, efficiency, and productivity, all while sustaining model quality.


This was a lot of typing. Let me take a break. Please feel free to add your comments if you have any. Till Next time!

