0 on Machine Learning
A beginner's journey into learning Machine Learning.
Dear Human Readers,
First, let me clear up a few things. In this 0 on Machine Learning (0ML) series, I'm just going to write about the things I learn along my journey. So, this is not going to be an expert guide or tutorial. I may understand a concept one way in the beginning and later find out that it means something different.
About Me
Now talking about me, I'm Sarankumar. I've been a self-taught developer for the past half-decade. So far, I have worked on Android, Flutter, and Node.js. My way of learning is to get my hands dirty and go with the flow. This may or may not suit everyone. I have zero knowledge about ML (Yeah, I had AI in my CS degree and I'm from India dot(.)).
With my previous experience of learning things from scratch, I've learned one thing: I need to admit that I don't know ML, and be open-minded. When I cross 4-5% of this journey I'm definitely gonna say "Is ML that simple?", when I cross 20%, "Oh God, does ML have this many concepts?", and when I reach 40-50% I'll be ready to take on any problem and solve it anyway. And yeah, nobody is gonna reach 100%, because the field expands every day and we need to keep catching up with the new things.
Let's Begin
The first thing I'm gonna do is explore the concepts in Machine Learning.
On YouTube, I found a great video by Daniel Bourke - Machine Learning Roadmap.
Yeah, you're right, it's 2.4 hours long and I watched it completely. I didn't binge-watch it, though. It took me a week of free time to get through it.
Here are a few takeaways from that video.
Data is everything. We are going to play a lot with data and numbers in this journey.
We are not going to write instructions for computers to do things. Instead, we are going to train them to find patterns in the data that we are given.
Don't complicate everything to add some fancy ML word to your landing page. If you can solve your problem without ML then don't use ML.
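To make the second takeaway a bit more concrete, here is a minimal sketch of the difference between writing the instruction ourselves and letting the computer derive it from data. The temperatures, the function names, and the "midpoint of the two averages" way of training are all made up for illustration; real ML algorithms are far more capable than this.

```python
# Hypothetical toy data: (temperature in Celsius, did people call the day "hot"?)
examples = [(18, False), (22, False), (25, False), (31, True), (34, True), (38, True)]


def is_hot_by_rule(temp_c):
    """Traditional programming: a human picks the threshold."""
    return temp_c >= 30


def learn_threshold(data):
    """'Training': derive the threshold from labeled examples instead.

    Uses the midpoint between the average 'not hot' and average 'hot' temperature.
    """
    hot = [t for t, label in data if label]
    not_hot = [t for t, label in data if not label]
    return (sum(hot) / len(hot) + sum(not_hot) / len(not_hot)) / 2


threshold = learn_threshold(examples)


def is_hot_learned(temp_c):
    """Same kind of answer as the rule, but the cut-off came from the data."""
    return temp_c >= threshold
```

If the examples change, the learned threshold changes with them, while the hand-written rule stays at 30 until someone edits the code.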
Problem Domains
What kind of problems can we solve with ML?
Generally, machine learning problems are divided into two categories: 1. Supervised Learning, 2. Unsupervised Learning. So far, what I understood is that in supervised learning every example in the data comes with a label (the known answer we want the model to predict), while in unsupervised learning the data has no labels. By the way, a feature here means a column of data, i.e. in general programming, a variable of our data class or object. If we have a User data class, then its age variable is a feature. Apart from these two categories, below are the more detailed topics or problem domains.
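Since features were compared to variables of a data class, here is a small sketch of that idea in Python. The User fields (monthly_visits, will_subscribe) and the values are hypothetical, added only to show which columns would be features and which one would be the label.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Features: the input columns a model learns from.
    age: int
    monthly_visits: int
    # Label: the answer we want predicted (present only in supervised data).
    will_subscribe: bool


users = [User(25, 12, True), User(41, 2, False), User(33, 8, True)]

# Supervised learning: each row has features X and a known label y.
X = [[u.age, u.monthly_visits] for u in users]
y = [u.will_subscribe for u in users]

# Unsupervised learning: only X is available; there is no y column.
```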
Classification - Problems related to classifying things, i.e. does the given input fall under a category or not? For example, does someone have heart disease or not, based on their medical records? Is this a picture of a cat or not? Most likely the final output is a yes or no, along with a probability.
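As a rough illustration of "yes or no with a percentage", here is a sketch using k-nearest neighbours, which is just one of many classification algorithms and is picked here only because it fits in a few lines. The records are made up and are not medical data.

```python
import math

# Hypothetical, made-up records: (age, resting heart rate) -> has heart disease?
records = [
    ((35, 62), False),
    ((42, 70), False),
    ((48, 66), False),
    ((58, 88), True),
    ((63, 92), True),
    ((67, 85), True),
]


def predict_proba(point, data, k=3):
    """Return the share of the k nearest labeled neighbours that are positive."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def classify(point, data, k=3):
    """Yes/no answer plus the probability behind it."""
    p = predict_proba(point, data, k)
    return p >= 0.5, p


print(classify((60, 90), records))  # close to the positive examples
print(classify((40, 65), records))  # close to the negative examples
```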
Regression - As I understood it, regression finds the relationship between variables in order to predict a numeric output. Most likely it is used with historical data to predict what will happen in the next time period. For example, a food like biryani may have been priced differently in a city over time for a lot of reasons (like the year, season, meat price, and demand). Regression finds the relation between those variables (reasons) and the price, and uses it to predict the output.
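Here is a minimal sketch of the simplest case, a straight line fitted with ordinary least squares on a single variable. The prices are invented, and a real pricing model would use several variables, as described above.

```python
# Hypothetical history: (meat price per kg, biryani price per plate)
history = [(400, 120), (450, 130), (500, 140), (550, 150), (600, 160)]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(history)

# Predict the plate price for a meat price that is not in the history.
predicted = slope * 650 + intercept
```

With this made-up data the fitted line is roughly price = 0.2 * meat_price + 40, so the prediction for a meat price of 650 comes out at about 170.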
Clustering - Basically, clustering is grouping unlabeled data. Yes, it comes under unsupervised learning. What clustering does is group the data based on how similar the items are to each other. Say you go to a mall, where you will find all kinds of items. Clustering can easily find the similarity between products from Levi's, Louis Philippe, and Peter England, and also between those from LG, Samsung, Apple, and Sony. It will group them into something like Fashion and Electronics, and each group is a cluster. Inside a cluster, there may be sub-clusters like Mobile, TV, and Laptop. It can be used in a recommendation engine to recommend what people with similar interests or locations watch.
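A tiny k-means sketch shows the idea of grouping by similarity without any labels. k-means is one common clustering algorithm, chosen here as an assumption; the product numbers are made up, and the starting centroids are picked deterministically only to keep the example repeatable.

```python
import math

# Hypothetical products described by two numeric features:
# (price in thousands of rupees, weight in kg)
products = [(2, 0.3), (3, 0.4), (2.5, 0.5), (60, 8.0), (75, 9.5), (68, 7.0)]


def k_means(points, k, iterations=10):
    """Tiny k-means: returns a cluster index for every point."""
    # Deterministic start for the sketch: spread initial centroids across the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment


labels = k_means(products, k=2)
```

Note that the algorithm only returns group numbers. Names like Fashion or Electronics are something a human attaches afterwards by looking at what ended up in each group.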
Dimensionality Reduction - Actually, I am really confused about this. Is this a model? Or is it a technique? So far, what I learned is that the purpose of DR is to reduce the feature (a.k.a. input) set. But why? Machines can process 100 different inputs and their relations and predict an output. But how do we present that result so humans can understand it? If we want those results in a 2D image, then the most we can show is 2 features, one per axis. That's it. I definitely need to look into this a lot more in the near future.
We may also use deep learning for problems like Natural Language Processing, e.g. sentiment analysis of a sentence.
Another thing, as far as I understand it so far: supervised learning (like Classification and Regression) is mainly used when we want a prediction for a single item. Unsupervised learning, on the other hand, is used when we want to find structure across a group of items (like using clustering for a recommendation engine).
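One way to get a feel for what "reducing the feature set" means is a sketch of Principal Component Analysis (PCA), a common dimensionality reduction technique, squeezing 2 features into 1. The data is made up, and the closed-form angle used below only works for the 2-feature case; libraries handle any number of features.

```python
import math

# Hypothetical data: two strongly related features per item (height cm, weight kg).
points = [(150, 50), (160, 58), (170, 66), (180, 74), (190, 82)]


def pca_2d_to_1d(data):
    """Project 2-feature rows onto their direction of greatest variance."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    centered = [(x - mean_x, y - mean_y) for x, y in data]
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # One number per row instead of two: the position along that axis.
    return [x * ux + y * uy for x, y in centered]


reduced = pca_2d_to_1d(points)
```

Each item is now described by a single number, and because the two original features move together, almost nothing is lost. The same idea is what lets many features be squeezed down to 2 so they fit on a 2D plot.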
The Cycle of a Machine Learning Project
We are gonna look at 6 important steps in an ML project.
Data Collection
Data Preparation
Train a Model
Analyze / Evaluate
Serve / Deploy Model
Retrain Model
These steps are pretty easy to understand just from their names. First, we need to collect data. As a rough bare minimum we would need around 50-100 items. Some problems would need millions, billions, or even trillions of data points to get the best result. As I said previously, data is the key.
Then we need to prepare that data. What do we even need to prepare? Some records might be incomplete. We either remove those or fill the gaps with a mean (average) value, so that the end result is more accurate. Filling the gaps this way is called feature imputation.
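The two options for incomplete data can be sketched like this. The column of ages is made up, and None is assumed to be how a missing value is marked.

```python
from statistics import mean

# Hypothetical column with gaps; None marks a missing value.
ages = [25, None, 31, 40, None, 28]


def impute_mean(column):
    """Replace missing entries with the mean of the known entries."""
    known = [v for v in column if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in column]


def drop_missing(column):
    """Alternative: simply remove incomplete entries."""
    return [v for v in column if v is not None]


filled = impute_mean(ages)
trimmed = drop_missing(ages)
```

Imputation keeps every row, which matters when data is scarce, while dropping keeps only values that were actually observed.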
Then, in Model Training, we first need to choose an algorithm based on the problem and the data that we have. There is no one-stop algorithm that solves all problems in a problem domain. Even within a single problem domain, we need to use different algorithms based on the data we have and the solution we need.
Analyze / Evaluate - This includes how accurate the predictions are, how long it takes to predict the output, and the cost needed to train. Based on that, we improve the weak areas. In a self-driving car, for example, taking 10 seconds to identify an obstacle is the worst thing that could happen.
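A minimal sketch of measuring two of these things, accuracy and prediction time, could look like the following. The toy_model function is a hypothetical stand-in for a trained model, and the test set is invented.

```python
import time


def evaluate(predict, samples):
    """Return (accuracy, average seconds per prediction) for a predict function."""
    correct = 0
    start = time.perf_counter()
    for features, expected in samples:
        if predict(features) == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed / len(samples)


# Hypothetical stand-in for a trained model: flags an obstacle closer than 5 m.
def toy_model(distance_m):
    return distance_m < 5


# (distance in metres, was it really an obstacle?)
test_set = [(1.0, True), (3.5, True), (4.9, True), (6.0, False), (12.0, False), (4.0, False)]
accuracy, seconds_per_prediction = evaluate(toy_model, test_set)
```

Here the stand-in model gets 5 of the 6 samples right. The timing number depends on the machine, so it is only useful for comparing models run under the same conditions.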
Deploy Model - Here we deploy our trained model to production. This is where terms like MLOps (DevOps for Machine Learning) and all the cloud tools come in. Since I'm already familiar with DevOps, I'm not gonna cover much of this right now. We can dive into the ML-specific products later in our journey.
Re-train - Yes, it's not over yet. As usual, to keep working for a long time and stay accurate, everything needs to be updated. Collect more data and results, and re-train the model.
That's it for this week. From next week onwards I'm gonna look into Python. Yeah, it's everyone's recommended language to get started with ML. Obviously, I haven't worked much with Python.
You can find me at:
LinkedIn - https://linkedin.com/in/sarankumar-ns
GitHub - https://github.com/sarankumar-ns
Who we are
RedLeaf Softs Pvt. Ltd. is an IT company based in Thoothukudi. We provide mobile application development services.