### Final Summary Machine Learning with Azure via Udacity - Phase 1 (Motivation and Machine Learning)

**Lesson 4 Revision**

**Supervised Learning Classification**

Recall that in classification the outputs are categorical or discrete.

**Types of Classification Problems:**

1. Classification on tabular data: where the data is in the form of rows and columns

2. Classification on image and sound data: where the training data consists of images or audio

3. Classification on text data: where the training data consists of texts whose categories are known

**Categories of algorithms are:**

1. Two class classification: where the prediction has to be in two categories

2. Multi class classification: where the prediction has to fall into one of more than two categories

There are 2 key parameters for optimizing Multi class logistic regression, they are:

1. Optimization tolerance

2. Regularization weight

**Optimization tolerance** controls when to stop iterating. If the improvement between iterations drops below a specified threshold, the algorithm stops and returns the current model.

**Regularization weight:** recall that regularization is a method for preventing overfitting by penalizing models with extreme coefficient values. The regularization weight controls how much to penalize the model at each iteration.
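To make the two parameters concrete, here is a minimal sketch in plain Python. It is a binary (two class) logistic regression trained by gradient descent, not the multi class version and not Azure's implementation. The names `train_logistic`, `reg_weight`, `tolerance` and `lr` are my own assumptions for illustration.

```python
import math

def train_logistic(X, y, reg_weight=0.01, tolerance=1e-6, lr=0.5, max_iter=10_000):
    """Binary logistic regression by gradient descent (illustrative only)."""
    w = [0.0] * len(X[0])
    b = 0.0
    prev_loss = float("inf")
    eps = 1e-12  # avoids log(0)
    for iteration in range(1, max_iter + 1):
        # Predicted probabilities for the current coefficients.
        p = [1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
             for xi in X]
        # Log loss plus an L2 penalty scaled by the regularization weight.
        loss = -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
                    for yi, pi in zip(y, p)) / len(y)
        loss += reg_weight * sum(wj * wj for wj in w)
        # Optimization tolerance: stop once the improvement is too small.
        if prev_loss - loss < tolerance:
            break
        prev_loss = loss
        # Gradient step; the penalty pulls each coefficient toward zero.
        grads = [sum((pi - yi) * xi[j] for pi, yi, xi in zip(p, y, X)) / len(y)
                 for j in range(len(w))]
        grad_b = sum(pi - yi for pi, yi in zip(p, y)) / len(y)
        w = [wj - lr * (gj + 2 * reg_weight * wj) for wj, gj in zip(w, grads)]
        b -= lr * grad_b
    return w, b, iteration

X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w_strong, _, _ = train_logistic(X, y, reg_weight=1.0)    # heavy penalty
w_weak, _, _ = train_logistic(X, y, reg_weight=0.001)    # light penalty
```

With the heavier penalty the learned coefficient stays smaller, which is the "penalizing extreme coefficient values" idea from above.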

Multi Class Neural Network consists of Input layer, hidden layer(s) and output layer. The relationship between the input and output is learned from training on the input data.

There are 3 key parameters for optimizing this one:

1. Number of hidden nodes

2. Learning rate: controls size of each step taken at each iteration before correction

3. Number of learning iterations: controls the maximum number of times the algorithm should process the training cases
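A real neural network has many weights, but the effect of the learning rate and the number of iterations can be seen on a single weight. The sketch below is only an illustration under that simplification: it minimises the toy function f(w) = (w - 3)^2, and the function name `gradient_descent` is made up for this example.

```python
def gradient_descent(learning_rate, num_iterations, start=0.0):
    """Minimise f(w) = (w - 3)**2, whose minimum is at w = 3."""
    w = start
    for _ in range(num_iterations):
        gradient = 2 * (w - 3)          # derivative of f at the current w
        w -= learning_rate * gradient   # the learning rate scales the step
    return w

enough_steps = gradient_descent(learning_rate=0.1, num_iterations=50)
too_few_steps = gradient_descent(learning_rate=0.1, num_iterations=5)
rate_too_large = gradient_descent(learning_rate=1.1, num_iterations=50)
```

With a rate of 0.1, fifty iterations land very close to 3, while five iterations stop well short of it. With a rate of 1.1, each step overshoots the minimum by more than the previous error, so the value moves further away instead of converging.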

**Multi Class Decision Forest:** it's an ensemble of decision trees.

There are 5 parameters to optimize this:

1. Resampling method: method used to create the individual trees

2. Number of decision trees: specifies the maximum number of decision trees that can be created

3. Maximum depth: the number to limit the maximum depth of any decision tree

4. Number of random splits per node: number of splits to use when building each node of the tree

5. Min number of samples per leaf node: controls the minimum number of cases required to create any terminal (leaf) node in a tree.
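Here is a rough sketch of the ensemble idea, assuming bagging (sampling with replacement) as the resampling method and trees of depth 1 on one-dimensional data. It covers only three of the parameters above (resampling, number of trees, minimum samples per leaf). All function names are hypothetical and this is not how Azure builds its forests.

```python
import random
from collections import Counter

def fit_stump(rows, min_samples_leaf=1):
    """Depth-1 tree on 1-D data: pick the threshold with the fewest errors."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None  # (errors, threshold, left_label, right_label)
    for threshold in sorted({x for x, _ in rows}):
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        # Skip splits that would create a leaf with too few samples.
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        return lambda x: majority
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def fit_forest(rows, num_trees=25, min_samples_leaf=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(num_trees):
        sample = [rng.choice(rows) for _ in rows]  # bagging: sample with replacement
        trees.append(fit_stump(sample, min_samples_leaf))
    return trees

def predict(trees, x):
    """Each tree votes; the most common class wins."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

rows = [(1, "a"), (2, "a"), (3, "a"), (4, "a"),
        (10, "b"), (11, "b"), (12, "b"), (13, "b")]
forest = fit_forest(rows, num_trees=25)
```

Calling `predict(forest, 0)` or `predict(forest, 20)` then returns the majority vote of the 25 small trees. `min_samples_leaf` is assumed to be at least 1.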

**Supervised Learning Regression**

Recall that in a regression problem, the output is numerical or continuous.

Common Types of Regression problems are:

1. Regression on Tabular data

2. Regression on image or sound data

3. Regression on text Data

Images and sounds are transformed into a numerical vector which can be accepted by the Algorithms.

Examples: housing prices, forecasting, etc

Common regression models are: linear regression, decision forest regression, neural network regression.

**Automating training of regressors:**

Challenges involved in successfully training an ML model manually include: selecting the right features, choosing the right algorithm for the task, hyperparameter tuning, and selecting the right evaluation metrics.

**Automated ML** helps you take all those challenges away. It provides the automated exploration of the combinations needed to successfully produce a trained model. It does this by testing multiple algorithms in parallel and returning the best performing ones.

Data scientists and developers can now scale efficiently and achieve higher productivity by focusing on more important ML tasks.
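The core loop of "try several algorithms and keep the best one" can be sketched in a few lines. This is only a toy illustration with two hand-written regressors evaluated one after the other (a real Automated ML service explores far more combinations and runs them in parallel). The names `auto_select`, `mean_model` and `linear_model` are invented for this example.

```python
from statistics import mean

def mean_model(xs, ys):
    """Baseline: always predict the average training target."""
    m = mean(ys)
    return lambda x: m

def linear_model(xs, ys):
    """Simple least-squares line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error, used here as the evaluation metric."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def auto_select(candidates, train, valid):
    """Fit every candidate on the training data, score it on validation data."""
    results = {name: mse(fit(*train), *valid) for name, fit in candidates.items()}
    best = min(results, key=results.get)
    return best, results

train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])
best, results = auto_select({"mean": mean_model, "linear": linear_model}, train, valid)
```

On this data the line fits the validation points much better than the constant baseline, so `best` comes out as `"linear"`.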

**Semi-Supervised learning**

When fully labelled data can't be obtained or is too expensive, we can often still get partially labelled data. That's where Semi-Supervised learning is useful.

It combines traditional supervised learning with unsupervised approaches. There are three approaches it uses to take advantage of the labelled part of the data:

1. **Self training**: it trains a model on the labelled part of the data, then uses that model to predict labels for the unlabelled part.

2. **Multi view training:** it means you train multiple models on different views of the data. The views can differ in feature selection and in the parts of the training data used.

3. **Self-ensemble training:** it trains a single base model with different hyperparameter settings.
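Self training is the easiest of the three to sketch. The example below assumes a very simple base model (nearest class centroid on one-dimensional data) and a made-up confidence rule (`margin`): an unlabelled point only receives a pseudo-label when its nearest centroid is clearly closer than the next one. Everything here is illustrative, not a prescribed method.

```python
def centroid_classifier(labelled):
    """Return the mean feature value of each class (1-D nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in labelled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def self_train(labelled, unlabelled, rounds=3, margin=2.0):
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        centroids = centroid_classifier(labelled)   # train on labelled data
        confident, remaining = [], []
        for x in unlabelled:
            distances = sorted(abs(x - c) for c in centroids.values())
            # Confident when the nearest centroid is clearly closer than the next.
            if len(distances) < 2 or distances[1] - distances[0] >= margin:
                confident.append((x, predict(centroids, x)))
            else:
                remaining.append(x)
        if not confident:
            break
        labelled += confident       # pseudo-labels join the training set
        unlabelled = remaining
    return centroid_classifier(labelled), unlabelled

labelled = [(1.0, "low"), (9.0, "high")]
unlabelled = [0.5, 2.0, 8.0, 10.0, 5.2]
model, leftover = self_train(labelled, unlabelled)
```

Here the four points near 1 and 9 get pseudo-labels in the first round, while 5.2 sits between the two classes and is left unlabelled.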

**Clustering-Unsupervised Learning**

Clustering is organizing entities from the input data into a finite number of subsets or clusters. Its goal is to maximize the intra-cluster similarity while keeping the clusters as different from each other as possible.

**Applications of clustering:**

- personalization

- fraud detection

- medical imaging

- city planning

Clustering algorithms can be broadly classified into 4 types:

1. Centroid based clustering: organizing data into clusters based on distance of members from the centroid of the cluster.

2. Density based clustering: here the algorithm clusters members that are closely packed together. Has the advantage of learning arbitrary shapes of data.

3. Distribution based clustering: assumes that data has an inherent distribution type such as normal distribution, it then clusters based on the probability distribution.

4. Hierarchical Clustering: here the algorithm builds a tree of clusters based on hierarchy. It is best suited for hierarchical data such as taxonomies.

K-Means is a centroid based algorithm.

These are the steps taken by K means algorithm to handle data in order:

1. Initialize centroids: Randomly initializes the K cluster centroids. Each centroid conceptually represents a cluster.

2. Cluster assignment: it assigns each member (data point) to a cluster. Assignment is based on the Euclidean distance of members from the centroids, so each member is assigned to its closest centroid.

3. Move centroid: K-Means computes the new cluster centroids from the current cluster membership, so centroid locations may move to better positions.

4. Check for convergence: K-Means checks for convergence using criteria such as "by how much did a centroid location change as a result of new membership", or other criteria such as a fixed number of iterations.

If the convergence criteria are not met, it keeps iterating. Once they are met, it stops.
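The four steps map almost line by line onto code. Below is a minimal sketch in plain Python (3.8+ for `math.dist`). One assumption: the initial centroids are passed in explicitly so the result is repeatable, whereas a real run would usually pick them at random, for example with `random.sample(points, k)`.

```python
import math

def k_means(points, centroids, max_iterations=100, tolerance=1e-9):
    """Plain K-Means on tuples of numbers; `centroids` are the initial guesses."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iterations):
        # Cluster assignment: each point goes to its nearest centroid (Euclidean).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move centroids: mean of the members (an empty cluster keeps its centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        # Convergence check: how far did the centroids move?
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
```

On this data the first pass already separates the two groups, and the second pass finds that the centroids no longer move, so the loop stops.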

Key parameters include:

1. Number of centroids

2. Initialization approach

3. Distance metric

4. Normalize features

5. Assign label mode

6. Number of iterations

Feature learning: transforms sets of inputs into other inputs that are potentially more useful in solving a given problem

Anomaly Detection: identifies two major groups of entities, normal and abnormal, in a given dataset.

************

**Lesson 5 Revision**

Classical ML vs DL

All deep learning algorithms are ML algorithms, but ML algorithms are not necessarily deep learning algorithms.

ML is a subset of AI that focuses on creating programs that are capable of learning without explicit instruction.

DL is a specialized class of ML algorithms that are based on artificial neural networks.

Characteristics of DL:

1. Highly effective in learning multidimensional, complex and non-linear functions

2. Handles massive amounts of data.

3. Excels with raw unstructured data

4. Learning is computationally expensive as it needs specialized hardware for processing

5. It can learn time related patterns

6. It can be on-par with human capabilities

Deep learning can be applied in: language translation, image recognition, speech recognition, forecasting, autonomous vehicles etc

**Approaches to ML**

Supervised learning: learns from data that contains both the inputs and expected outputs.

Unsupervised Learning: learns from data that contains only inputs. It then finds the hidden structure in the data

Reinforcement Learning: learns how an agent should take actions in a given environment to maximize a reward function.

A Markov Decision Process (MDP) is a framework that can be used to model many reinforcement learning problems. Reinforcement learning methods are useful here because they do not assume knowledge of an exact mathematical model of the MDP.

Recommendation systems have 2 approaches:

1. Content based filtering: this makes use of properties (features) of both the users and the items.

2. Collaborative filtering: this uses only identifiers for users and items and does not take into account the properties.
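To show what "only identifiers" means in practice, here is a small user-based collaborative filtering sketch. It assumes a dictionary of ratings keyed by user id and item id, and uses cosine similarity between users as the weighting. The data and the function names (`cosine`, `recommend`) are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=1):
    """Score unseen items by the ratings of similar users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 5, "B": 5, "C": 5},
    "u3": {"D": 5, "C": 1},
}
suggestion = recommend(ratings, "u1")
```

Nothing in the code knows what items A to D actually are. User u1 gets item C because u2, who rated A and B similarly, also liked C.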

Text Classification

Text embedding is the process of translating text into some kind of numerical representation so that a model can make sense of it.

TF-IDF means Term Frequency - Inverse Document Frequency.
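As a quick worked example, the sketch below computes one common unsmoothed variant: term frequency is the share of a document's words taken up by the term, and inverse document frequency is log(N / number of documents containing the term). Libraries often use smoothed variants, so the exact numbers will differ from theirs.

```python
import math

def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = {}
    for tokens in tokenised:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    scores = []
    for tokens in tokenised:
        doc_scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / df[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
```

In the first document, "the" scores 0 because it appears in every document, "cat" scores a little, and "sat" scores highest because it appears in that document only.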

Anomaly detection, once again, is an ML technique concerned with finding data points in datasets that deviate significantly from the norm.
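One of the simplest ways to express "deviates significantly from the norm" is a z-score rule: flag any value that lies more than some number of standard deviations from the mean. The threshold of 2.0 and the sample readings below are assumptions chosen for the example, and real anomaly detectors are usually more sophisticated.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:          # all values identical: nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(readings)   # [50]
```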

**********

**Lesson 6 Revision**

**Managed Services for ML**

Compute Resources: These are clusters of computers running on the cloud which provide you with raw computing power needed for your workloads.

Managed services abstract away these problems:

1. Lengthy installation and setup processes

2. Expertise required to configure hardware

3. Fair amount of troubleshooting

They make for seamless setup and easy configuration of any needed hardware.

Examples of compute resources are: training clusters, inferencing clusters, compute instances, attached compute and local compute.

A compute target is a designated resource or environment where you run training scripts or host your service deployment.

There are two variations of compute targets: training compute targets and inferencing compute targets.

Managed Notebook Environments:

The most popular notebooks are: Jupyter, Databricks notebooks, R Markdown and Apache Zeppelin.

For basic modelling, these steps are followed: you create an experiment, then create a run, the run produces a model, and you use the model registry to keep track of all your models.

**Advanced Modelling:**

The following steps are involved in deploying a trained model:

1. Get model file (any format)

2. Get a scoring script (.py format)

3. Create a real time scoring web service

4. Repeat the process each time you re-train the model
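For step 2, my understanding is that an Azure ML scoring (entry) script defines two functions: `init()`, called once to load the model, and `run()`, called for each request. The sketch below follows that shape using only the standard library. The `model.json` file name, its format (linear weights plus a bias) and the built-in fallback model are hypothetical, chosen so the script can be tried locally.

```python
import json
import os

model = None

def init():
    """Load the model once when the service starts."""
    global model
    # AZUREML_MODEL_DIR is set by the service; "model.json" is an assumed file name.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    path = os.path.join(model_dir, "model.json")
    if os.path.exists(path):
        with open(path) as f:
            model = json.load(f)
    else:
        # Stand-in model so the sketch runs without a real model file.
        model = {"weights": [0.5, -0.25], "bias": 1.0}

def run(raw_data):
    """Score one request: JSON in, JSON out."""
    rows = json.loads(raw_data)["data"]
    predictions = [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]
    return json.dumps({"predictions": predictions})
```

Locally you can call `init()` and then `run('{"data": [[2, 4]]}')` to see the JSON response the web service would return for that input.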

Programmatically accessing managed Services:

With the Azure ML SDK for Python, you can start training on a local machine and scale out to use Azure ML compute resources. You can train better performing, highly accurate models.

**************

**Lesson 7: Explainable AI**

**Modern AI: Challenges and Principles**

Why worry about responsible AI?

1. Increasing inequality

2. Weaponization

3. Unintentional Bias

4. Adversarial Attacks

5. Killer Drones

6. Deep Fakes

7. Data Poisoning

8. Hype (unrealistic expectations)

Approaches to solving these problems:

1. **Model Explainability:** concerns how well your model's behaviour can be explained or interpreted

2. **Fairness:** answers the questions, "who is neglected?", "who is misrepresented?"

**Microsoft AI principles:**

There are 6 principles that can be easily remembered with the mnemonic PARFIT which means:

1. **Privacy and Security (P):** AI systems should be secure and respect existing privacy laws

2. **Accountability (A):** Those who design AI systems must be accountable for how their systems operate and must periodically check that these systems are operating effectively.

3. **Reliability and Safety (R):** Customers need to trust that an AI solution will perform reliably and safely within a clear set of parameters.

4. **Fairness (F):** AI systems should treat all people fairly.

5. **Inclusiveness (I):** AI systems should engage, empower and accurately represent people and all scenarios. They should use inclusive design practices to eliminate unintentional bias.

6. **Transparency (T):** Transparency means that people need to understand how AI decisions which affect their lives are made.

It should be noted that Accountability and Transparency are the fundamental principles which ensure the effectiveness of all the other principles. They are more like the foundation holding up the other principles.

**Opacity** of a function refers to the degree to which the inner workings of the function cannot be seen, understood and explained. Opacity itself as a word means the quality of lacking transparency. So the more opaque a model is, the less you can explain it.

And that's it folks.. The final summary of my Machine learning with Microsoft Azure Phase 1 challenge. It's been awesome making amazing friends and getting the opportunity to serve by virtue of this Scholarship.

See you all later..

*****************
