Showing posts from September, 2020

Final Summary Machine Learning with Azure via Udacity - Phase 1 (Motivation and Machine Learning)

Lesson 4 Revision: Supervised Learning (Classification)

Recall that in classification the outputs are categorical or discrete.

Types of classification problems:
1. Classification on tabular data: the data is in the form of rows and columns.
2. Classification on image and sound data: the training data consists of images or audio.
3. Classification on text data: the training data consists of texts whose categories are known.

Categories of algorithms:
1. Two-class classification: the prediction must fall into one of two categories.
2. Multi-class classification: the prediction can fall into more than two categories.

There are two key hyperparameters for tuning multi-class logistic regression:
1. Optimization tolerance
2. Regularization weight

Optimization tolerance controls when to stop iterating: if the improvement between iterations falls below the specified threshold, the algorithm stops and returns the current model. Regularization
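To make the two hyperparameters concrete, here is a minimal pure-Python sketch of two-class logistic regression trained by gradient descent. The names `reg_weight` and `tol` mirror the Azure ML settings (regularization weight and optimization tolerance), but the implementation, learning rate, and toy data are my own illustrative assumptions, not Azure's actual algorithm.

```python
import math

def train_logistic(xs, ys, reg_weight=0.1, lr=0.5, tol=1e-6, max_iter=10_000):
    """Gradient descent for two-class logistic regression on 1-D inputs.

    reg_weight: L2 regularization weight (penalizes large coefficients).
    tol: optimization tolerance; training stops once the loss improves
         by less than tol between iterations.
    """
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    for _ in range(max_iter):
        # Forward pass: predicted probabilities and regularized log loss.
        ps = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(ps, ys)) / len(xs)
        loss += reg_weight * w * w
        # Optimization tolerance: stop when the improvement is negligible.
        if prev_loss - loss < tol:
            break
        prev_loss = loss
        # Gradient step on w and b.
        gw = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / len(xs) + 2 * reg_weight * w
        gb = sum(p - y for p, y in zip(ps, ys)) / len(xs)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy data: the label is 1 whenever x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)

def predict(x):
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

print(predict(-1.5), predict(1.5))
```

A larger `tol` trades accuracy for fewer iterations; a larger `reg_weight` shrinks `w`, which helps against overfitting on noisier data than this toy example.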

Motivation and Machine Learning (Lesson 3) Part 2

Feature Selection

Feature selection helps you answer the question: "Which features are most useful for a given model?" One reason we need it is that the number of features in the original dataset may be very high.

Benefits:
- Eliminates irrelevant, redundant, and highly correlated features.
- Reduces dimensionality for better performance, since many ML models do not cope well with very high-dimensional data (many features).

We can mitigate having too many features through dimensionality reduction. Commonly used techniques are:
- PCA (Principal Component Analysis)
- t-SNE (t-Distributed Stochastic Neighbor Embedding)
- Feature embedding

Azure ML prebuilt modules:
- Filter-based feature selection: identifies the columns in the input dataset that have the greatest predictive power.
- Permutation feature importance: determines the best features to use by computing feature importance scores.

*********************

Data Drift

Data drift is change in the input d
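To illustrate the idea behind filter-based feature selection, the sketch below scores each feature column by the absolute value of its Pearson correlation with the target and keeps the top k. This is a simplified stdlib-only illustration of one common filter criterion; Azure ML's actual module supports several scoring metrics, and the function names and toy data here are assumptions for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_select(columns, target, k=2):
    """Rank feature columns by |correlation with target|, keep the top k."""
    scored = sorted(columns.items(),
                    key=lambda kv: abs(pearson(kv[1], target)),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy dataset: "signal" and "inverse" track the target; "noise" does not.
columns = {
    "signal":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "inverse": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noise":   [3.1, 0.2, 4.7, 1.9, 2.8],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]
print(filter_select(columns, target, k=2))
```

Note that a strongly negatively correlated feature ("inverse") is just as predictive as a positively correlated one, which is why the ranking uses the absolute correlation.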