Motivation and Machine Learning (Lesson 3) Part 1
One of machine learning's most important processes is model training. This is the process through which we transform data into trained ML models, hence its importance.
Before we train our model, it is important that we master data handling, data preparation and data management, because proper data is a key ingredient of successful ML models.
Issues like high bias, misclassification and poor performance are often rooted in problems with the data itself, so it's crucial to feed accurate, clean, high-quality data into our machine learning models for training.
Model training is a core process in machine learning that allows us to build, train and check the quality of ML models.
Data wrangling is the process of cleaning, restructuring and enriching data to transform it into a format that is much more suitable for training machine learning algorithms.
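As a small illustration, here is a minimal wrangling sketch in pandas; the column names, values and fill rules are all hypothetical:

```python
import pandas as pd

# Hypothetical raw data: inconsistent casing, a missing value,
# and a numeric column stored as strings
raw = pd.DataFrame({
    "city": ["Lagos", "lagos", "Abuja", None],
    "temp_c": ["31", "29", "27", "30"],
})

# Clean: drop rows with missing values and normalize casing
df = raw.dropna().copy()
df["city"] = df["city"].str.title()

# Restructure: cast the numeric column to its proper type
df["temp_c"] = df["temp_c"].astype(int)

# Enrich: derive a new column from an existing one
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(df)
```

Each step maps onto the definition above: cleaning, restructuring, then enriching.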
Managing data for machine learning work on Azure requires us to understand two important concepts: datastores and datasets.

A datastore lets you connect securely to the storage that holds your data. It stores and hides away the connection information needed to do that, acting as a layer of abstraction that isolates you from the various supported storage services in Azure.

A dataset gives you access to specific data in your datastore. It points to the specific sets of files that contain the training, validation or test data we use in ML processing.
Datastores have a feature known as compute location independence: a datastore can be accessed simultaneously by various compute instances and even shared between them.
Datasets can be created from local files, Azure Open Datasets, public URLs and so on. But it's important to note that, no matter how you create a dataset, the underlying data will be stored in a datastore (usually the default one).
It's important to note that datasets are merely references that point to data in your datastore, not copies of the data itself.
Data versioning helps us snapshot the state of our data and tells us which version of a given dataset was used to train a model.
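The datastore, dataset and versioning ideas come together in a short sketch using the Azure ML Python SDK (v1). This assumes an already-configured workspace; the datastore name, file path and dataset name are placeholders, not real resources:

```python
from azureml.core import Workspace, Datastore, Dataset

# Assumes a config.json for an existing workspace is present locally
ws = Workspace.from_config()

# Datastore: an abstraction over the underlying Azure storage
# ("mydatastore" is a placeholder name)
datastore = Datastore.get(ws, "mydatastore")

# Dataset: a reference to files in the datastore, not a copy of them
dataset = Dataset.Tabular.from_delimited_files(
    path=(datastore, "training-data/*.csv")
)

# Registering with create_new_version=True is what gives us versioning:
# each registration of the same name produces a new, trackable version
dataset = dataset.register(
    workspace=ws, name="training-data", create_new_version=True
)
```

Because the dataset is only a reference, registering a new version is cheap; no data is duplicated.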
Features can also be referred to as columns, properties, fields or even variables in a table. Rows can be referred to as cases, instances, observations or records.
Feature engineering is a key part of data preparation. It lets you create new features based on the values of existing ones, which can increase the predictive power of machine learning algorithms and help your models perform better.
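For instance, a new ratio feature derived from two existing columns; the loan-style data here is hypothetical:

```python
import pandas as pd

# Hypothetical data with two raw features
df = pd.DataFrame({
    "monthly_income": [4000, 2500, 6000],
    "monthly_debt": [1000, 1500, 1200],
})

# Engineer a new feature from existing ones: the debt-to-income
# ratio is often more informative to a model than either raw column
df["debt_to_income"] = df["monthly_debt"] / df["monthly_income"]

print(df["debt_to_income"].tolist())
```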
Dimensionality reduction is a form of feature engineering that reduces the number of features, adapting the shape and structure of the data into a form that a machine learning algorithm can accommodate.
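One common technique is principal component analysis (PCA). A minimal numpy sketch, using randomly generated data purely for illustration, that projects 3 features down to 2:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # 100 samples, 3 features

# Center the data, then use SVD to find the principal directions
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the top-2 components: 3 features reduced to 2,
# keeping the directions of greatest variance
X_reduced = X_centered @ Vt[:2].T
print(X_reduced.shape)
```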
These are the concepts I've learnt so far. I'll release part 2 soon.