Motivation and Machine Learning - Part 3
ML is a process for learning functions, and models are specific representations of those functions learned from training data.
y = f(x) + e
Where y is the output, x is the input, f is the function and e is the irreducible error. ML algorithms learn an approximation of the target function f that describes the mapping from input to output.
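As a minimal sketch of y = f(x) + e, the snippet below draws noisy observations from a known function. The straight-line f, the Gaussian noise and the names true_f and observe are my own assumptions for illustration; in a real problem f is unknown and only the noisy samples are available.

```python
import random

def true_f(x):
    # Hypothetical target function; in practice this is unknown.
    return 2.0 * x + 1.0

def observe(x, noise_sd, rng):
    # y = f(x) + e, with e drawn from a zero-mean Gaussian (an assumption).
    return true_f(x) + rng.gauss(0.0, noise_sd)

rng = random.Random(0)
xs = [i / 10 for i in range(10)]
ys = [observe(x, noise_sd=0.3, rng=rng) for x in xs]
```

No algorithm can remove the noise term e, which is why it is called irreducible.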
2.26 - Parametric and Non-parametric Algorithms
Based on the size and structure of the function they try to learn, ML algorithms can be classified into parametric and non-parametric.
Parametric: these algorithms map the input to a known functional form. They start by assuming a form, then learn its coefficients from the training data.
Non-parametric: these algorithms do not make assumptions regarding the mapping between input and output data, so they are free to learn any functional form from the data.
Benefits of parametric algorithms: simpler to understand, faster to train, easier to interpret and need less training data.
Disadvantages: highly constrained to the assumed form, limited complexity and a poor fit when the real mapping does not match that form.
Benefits of non-parametric algorithms: high flexibility, power and high performance.
Disadvantages: need more training data, slower to train, tend to overfit and are harder to explain.
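To make the contrast concrete, here is a small sketch that puts one algorithm of each kind side by side. The choice of a least-squares line (parametric) and k-nearest neighbours (non-parametric), plus the names fit_line and knn_predict, are my own picks for illustration and not something the lesson prescribes.

```python
def fit_line(xs, ys):
    # Parametric: assume y = slope * x + intercept, then learn the two coefficients.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def knn_predict(xs, ys, query, k=3):
    # Non-parametric: no assumed form, just average the k closest training outputs.
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / k

# Curved toy data (y = x squared), which a straight line cannot follow.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

slope, intercept = fit_line(xs, ys)
line_at_zero = slope * 0 + intercept          # constrained by the assumed form
knn_at_zero = knn_predict(xs, ys, 0, k=1)     # follows the data directly
```

On this toy data the line is only two numbers and easy to explain, but it misses the curve; the neighbour-based model follows the data but has to keep every training point around.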
2.27 - Classical ML vs Deep Learning (DL)
The difference is: DL represents a category of ML algorithms based on neural networks. All DL algorithms are ML algorithms, but not all ML algorithms are DL algorithms. You can just say DL is a subset of ML; classical ML refers to the ML algorithms that are not based on deep neural networks.
DL algorithms have the capacity to learn arbitrarily complex patterns from the data. They tend to be more accurate on large datasets and scale better as data grows. On the other hand, they are difficult to explain, lack transparency and require a significant amount of compute.
Classical ML works well on small datasets, can run on low-end machines, is easier to interpret and doesn't require a large amount of compute. On the other hand, it doesn't do as well on large datasets and is not capable of learning complex functions the way DL is.
The output of classical ML is usually numerical, such as a score or a class. Output from DL can have multiple formats, such as text, a score or sound.
2.28 - Approaches to ML
The following are common approaches to ML:
Supervised Learning: learning happens from data that contains both inputs and expected outputs.
Under supervised we have Classification (outputs are categorical), Regression (outputs are numerical), Similarity Learning (learns from examples using a similarity function), Feature Learning (features are learnt using labelled data) and Anomaly Detection (learns from data labelled as normal or abnormal).
Unsupervised Learning: the algorithm learns from data that contains only inputs and finds hidden structures in the data.
Under unsupervised we have Clustering (which finds inherent groups or clusters in the data), Feature Learning (learns more useful features based on an unlabelled dataset) and Anomaly Detection (learns from unlabelled data to detect abnormal patterns in data while assuming most entities are normal).
Reinforcement Learning: learns how an agent should take actions in a given environment to maximize a reward function. In most cases, the environment in which the agent acts is modeled as a Markov decision process (MDP). Reinforcement learning algorithms do not assume knowledge of an exact mathematical model of that process.
Supervised and Unsupervised Learning are passive approaches, where learning is performed without any actions that could influence what data is observed in the future. Reinforcement Learning is an active approach, where the actions of the agent influence the environment and the data that will be observed in the future.
Reinforcement Learning is great for scenarios like robotics, autonomous driving, games and control theory.
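Here is a rough sketch of the agent, environment and reward idea. The lesson does not name a specific algorithm, so I am using tabular Q-learning on a made-up five-state corridor; the environment, the constants and the names step and q_learning are all my own assumptions. Notice the agent never sees the rules inside step, it only sees the states and rewards that come back.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    # Deterministic toy environment: reward 1 only on reaching the goal.
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learning(episodes, rng):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore with random actions
            next_state, reward = step(state, action)  # acting changes what is observed next
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning(episodes=200, rng=random.Random(0))
# Greedy policy: in each non-terminal state pick the action with the larger value.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy should prefer moving right in every state, since that is the only way to collect the reward.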
2.29 - The Trade-offs
The two most common trade-offs are:
1. Bias and Variance
Bias: this measures how far off a model's predictions are from the true output. It comes from erroneous assumptions made in the ML process to simplify the model and make the target function easier to learn.
Put simply, it's the error from erroneous assumptions in the learning algorithm. Bias points to the accuracy of our predictions: high bias means the predictions will be inaccurate.
Bias can also be looked at as the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data.
Because they assume a form, parametric algorithms are prone to high bias!
Variance: this measures how much the learned estimate of the target function will change if different training data is used. In other words, variance refers to an algorithm's sensitivity to fluctuations in a given training dataset.
Variance can be seen as the algorithm's tendency to learn random things irrespective of the real signal by fitting highly flexible models that follow the error or noise in data too closely.
This concept might still be confusing, so remember this: if you are likely to judge someone you've not met based on the few things you know about them (parameters) like tribe, country, skin color, gender, religion etc., you are highly biased. This means you do not take into consideration all the information about the person, but make assumptions about them based on a preconceived form or structure.
If you are very likely to change your opinion of someone because of something new you discover about them, you have high variance. For example, if you see someone you knew as a doctor wearing a soldier's uniform and you quickly conclude they have joined the army, you have high variance. It could be they were just wearing the uniform for acting purposes or for just that day.
In ML, there's usually a battle to balance out variance and bias. Parametric and linear algorithms have high bias and low variance (because they assume a form). Non-parametric and non-linear algorithms have low bias and high variance (because they don't assume a form, but change quickly as data changes).
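Prediction error is the sum of irreducible error, variance error and bias error (strictly, the bias term enters as squared bias). Irreducible error is independent of the algorithm we use. It comes from noise in the data itself, for example from how the data was collected.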
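The bias and variance terms can be estimated with a small simulation: train the same kind of model on many freshly sampled training sets and look at how its predictions behave. Everything below is a sketch under my own assumptions: the sine target, the noise level, and the two deliberately extreme models (one that always predicts the training mean, one that just memorizes the training outputs).

```python
import math
import random

def true_f(x):
    return math.sin(2 * math.pi * x)   # assumed target function

XS = [i / 20 for i in range(20)]       # fixed inputs
NOISE_SD = 0.3                         # source of the irreducible error

def sample_targets(rng):
    return [true_f(x) + rng.gauss(0.0, NOISE_SD) for x in XS]

def constant_model(ys):
    # Rigid model: always predicts the mean of the training outputs.
    mean_y = sum(ys) / len(ys)
    return [mean_y for _ in XS]

def memorizing_model(ys):
    # Flexible model: returns the stored training output for each input.
    return list(ys)

def bias2_and_variance(model, n_datasets, rng):
    preds = [model(sample_targets(rng)) for _ in range(n_datasets)]
    bias2 = variance = 0.0
    for j, x in enumerate(XS):
        column = [p[j] for p in preds]
        mean_pred = sum(column) / n_datasets
        bias2 += (mean_pred - true_f(x)) ** 2
        variance += sum((v - mean_pred) ** 2 for v in column) / n_datasets
    return bias2 / len(XS), variance / len(XS)

rng = random.Random(0)
rigid = bias2_and_variance(constant_model, 200, rng)        # (bias squared, variance)
flexible = bias2_and_variance(memorizing_model, 200, rng)   # (bias squared, variance)
```

The rigid model should come out with the larger bias term and the flexible model with the larger variance term, which is the trade-off described above.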
2. Over-fitting and Under-fitting
Over-fitting: when the model fits the training data very well, but fails to generalize to new data. This is high variance and low bias.
Under-fitting: when the model neither fits the training data nor generalizes to new data. This shows high bias (incorrect predictions) and low variance.
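One way to spot both problems is to compare error on the training data with error on fresh data. The sketch below does that for three toy models on noisy straight-line data. The data generator, the three models and all the names are assumptions I made up for illustration.

```python
import random

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def fit_mean(xs, ys):
    # Too simple: ignores x completely (under-fits).
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    # Matches the true form of the data used here.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    # Too flexible: stores every training output, noise included (over-fits).
    table = dict(zip(xs, ys))
    return lambda x: table[x]

rng = random.Random(0)
xs = [i / 400 for i in range(400)]

def noisy_outputs():
    return [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]

train_ys, test_ys = noisy_outputs(), noisy_outputs()   # same inputs, fresh noise

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(xs, train_ys)
    results[name] = (mse(model, xs, train_ys), mse(model, xs, test_ys))  # (train, test)
```

The memorizer gets zero training error but does worse than the line on the fresh data (over-fitting), while the mean model has a high error on both (under-fitting).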
Methods to prevent Over-fitting and Under-fitting:
k-fold cross validation: it splits the training set into k subsets and trains the model k times, each time holding out a different subset for validation and training on the rest. Helpful when your dataset is small.
Simplify the model: for a neural network, for example, you can use fewer layers or neurons.
Early stopping: if performance on the validation data is no longer improving after a number of iterations, stop training.
Reduce dimensionality: it transforms the training data into fewer dimensions to decrease model complexity.
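Two of the methods above are easy to sketch in a few lines: the splitting step of k-fold cross validation and the stopping rule of early stopping. The function names, the patience parameter and the example losses are my own assumptions, and real libraries usually shuffle the data before splitting, which I skip here to keep it short.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch      # no improvement for `patience` epochs
    return len(val_losses) - 1, best_epoch

folds = list(k_fold_splits(n_samples=6, k=3))
stop_epoch, best_epoch = early_stopping([1.0, 0.8, 0.7, 0.75, 0.72, 0.71], patience=2)
```

Each sample lands in the validation set exactly once across the folds, and with early stopping you would normally keep the model from best_epoch and not the one from stop_epoch.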
Phew! That wraps up my summary and revision for lesson 2.