Motivation and Machine Learning (Lesson 3) Part 1

Model training is one of the most important processes in Machine Learning. It is how we transform data into trained ML models.

Before we train a model, it is important to master data handling, data preparation and data management, because proper data is a key ingredient of successful ML models.

Issues such as high bias, classification errors and poor performance are often caused by problems in the data itself. So it's crucial to feed accurate, clean, high-quality data into our machine learning models during training.

Model training is a core process in Machine Learning that allows us to build, train and check the quality of ML models.
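
To make the "build, train and check" cycle concrete, here is a minimal sketch in plain Python. It is not how Azure trains models; it assumes a one-feature linear model fitted with least squares, and the numbers are made up for illustration. The model is trained on one set of data and its quality is checked on data it has not seen.

```python
def fit_line(xs, ys):
    """Train: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / spread
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(model, xs, ys):
    """Check quality: average absolute gap between predictions and targets."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Made-up data that follows y = 2x + 1 exactly.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
test_x, test_y = [5.0, 6.0], [11.0, 13.0]

model = fit_line(train_x, train_y)                    # train on the training data
error = mean_absolute_error(model, test_x, test_y)    # evaluate on held-out data
print(model, error)
```

If the training data were noisy or wrong, the fitted line and the error on the test data would suffer, which is why data quality matters so much.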

Data wrangling is the process through which we clean, restructure and enrich data to transform it into a format that is more suitable for training Machine Learning algorithms.
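
As a rough illustration of those three steps, the sketch below wrangles a few made-up records in plain Python. The field names, the sample rows and the CITY_TO_REGION lookup are all assumptions invented for this example; in practice a library such as pandas would usually do this work.

```python
# Hypothetical raw records, as they might arrive from a CSV export.
raw_rows = [
    {"city": " Lagos ", "price": "120000", "area_sqm": "80"},
    {"city": "Abuja", "price": "", "area_sqm": "95"},   # missing price
    {"city": "lagos", "price": "90000", "area_sqm": "60"},
]

# Hypothetical lookup table used to enrich each record.
CITY_TO_REGION = {"lagos": "South West", "abuja": "North Central"}


def wrangle(rows):
    """Clean, restructure and enrich raw string records."""
    result = []
    for row in rows:
        # Clean: drop rows that have any empty value.
        if not all(value.strip() for value in row.values()):
            continue
        # Clean: normalise the city name (trim spaces, lower-case).
        city = row["city"].strip().lower()
        # Restructure: convert numeric strings into real numbers.
        record = {
            "city": city,
            "price": float(row["price"]),
            "area_sqm": float(row["area_sqm"]),
        }
        # Enrich: add information from another source.
        record["region"] = CITY_TO_REGION[city]
        result.append(record)
    return result


training_rows = wrangle(raw_rows)
print(training_rows)
```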


Managing data for machine learning work on Azure requires us to understand two important concepts:

1. Datastore:

A datastore helps you connect securely to the storage that keeps your data. It stores and hides away the connection information needed to do that. It works as a layer of abstraction that isolates you from the various data storage services supported in Azure.

2. Dataset: 

A dataset gives you access to specific data in your datastore. It points to the specific set of files that contain the training, validation or test data we use in ML processing.

Datastores have a feature known as compute location independence. This means a datastore can be accessed by several compute instances at the same time, and can even be shared.

Datasets can be created from local files, Azure Open Datasets, public URLs and so on. But it's important to note that, no matter how you create a dataset, its data will be stored in a datastore (usually the default one).

It's important to note that datasets are merely references that point to data in your datastore, not copies of the data itself. 
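
The relationship between the two concepts can be sketched with a toy model. The Datastore and Dataset classes below are hypothetical stand-ins written for this post, not the Azure ML SDK. They only show the idea: the datastore keeps the connection details and the files, while the dataset is a named reference to some of those files.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Toy stand-in for a datastore: keeps connection details and files."""
    name: str
    connection_secret: str = field(repr=False)  # hidden when the object is printed
    files: dict = field(default_factory=dict)   # path -> file contents

    def read(self, path):
        return self.files[path]


@dataclass
class Dataset:
    """Toy stand-in for a dataset: a named reference to paths in a datastore."""
    name: str
    datastore: Datastore
    paths: tuple

    def load(self):
        # Data is fetched from the datastore on demand; nothing was copied.
        return [self.datastore.read(path) for path in self.paths]


store = Datastore(
    "default",
    "s3cr3t-key",
    {"train.csv": "a,b\n1,2\n", "test.csv": "a,b\n3,4\n"},
)
train = Dataset("train-data", store, ("train.csv",))

# Updating the file in the datastore changes what the dataset returns,
# because the dataset only points at the data.
store.files["train.csv"] = "a,b\n1,2\n5,6\n"
print(train.load())
print(store)  # the connection secret does not appear in the output
```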

Data versioning lets us bookmark the state of our data, so we know which version of a given dataset was used to train a model.
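
Here is a small sketch of the versioning idea, again as a toy model and not the Azure API. The DatasetRegistry class, the dataset name and the file paths are assumptions made up for illustration. Registering under the same name creates a new version, and a model can record exactly which version it was trained on.

```python
class DatasetRegistry:
    """Toy registry: each name maps to a growing list of versions."""

    def __init__(self):
        self._versions = {}  # name -> list of path tuples (index 0 is version 1)

    def register(self, name, paths):
        """Store a new version under `name` and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(tuple(paths))
        return len(versions)

    def get(self, name, version=None):
        """Return (version, paths); the latest version if none is given."""
        versions = self._versions[name]
        if version is None:
            version = len(versions)
        return version, versions[version - 1]


registry = DatasetRegistry()
v1 = registry.register("sales", ["sales/2023.csv"])
v2 = registry.register("sales", ["sales/2023.csv", "sales/2024.csv"])

# Record which version of the data a (hypothetical) model was trained on.
model_metadata = {"model": "forecast", "dataset": "sales", "dataset_version": v1}

print(registry.get("sales"))                                     # latest version
print(registry.get("sales", model_metadata["dataset_version"]))  # version used for training
```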


Introducing features:

Features can also be referred to as columns, properties, fields or even variables in a table. Rows can be referred to as cases, instances, observations or records.

Feature engineering is a key part of data preparation. It lets you create new features derived from the values of existing ones. This can increase the power of machine learning algorithms, and the new features can help your models perform better.
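
For example, two new features can be derived from existing ones as in the sketch below. The column names and the sample listing are made up for illustration.

```python
from datetime import date


def add_features(row):
    """Return a copy of `row` with two features derived from existing ones."""
    new_row = dict(row)
    # A ratio of two existing features.
    new_row["price_per_sqm"] = row["price"] / row["area_sqm"]
    # A flag extracted from a date (weekday() is 5 for Saturday, 6 for Sunday).
    new_row["sold_on_weekend"] = row["sale_date"].weekday() >= 5
    return new_row


listing = {"price": 120000.0, "area_sqm": 80.0, "sale_date": date(2024, 3, 2)}
engineered = add_features(listing)
print(engineered["price_per_sqm"], engineered["sold_on_weekend"])
```

The original columns are kept, so a model can use both the raw values and the derived ones.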

Dimensionality reduction is a form of feature engineering that adapts the shape and structure of the data, typically by reducing the number of features, so that a Machine Learning algorithm can handle it.
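
One of the simplest ways to reduce dimensionality is to drop features that never change, since they carry no information for a model. The sketch below uses that variance-threshold approach on made-up rows. It is only one possible technique, chosen here because it fits in a few lines; methods such as PCA are more common in practice.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only the features whose variance across rows exceeds `threshold`."""
    feature_names = list(rows[0].keys())
    kept = [
        name
        for name in feature_names
        if pvariance([row[name] for row in rows]) > threshold
    ]
    return [{name: row[name] for name in kept} for row in rows]


rows = [
    {"area_sqm": 80.0, "rooms": 3, "floors": 1},
    {"area_sqm": 60.0, "rooms": 2, "floors": 1},
    {"area_sqm": 95.0, "rooms": 4, "floors": 1},
]

reduced = drop_low_variance(rows)  # "floors" is constant, so it is removed
print(reduced)
```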

These are the concepts I've learnt so far. I will release Part 2 soon.

