Supervised vs Unsupervised Learning
Machine Learning is the art and science of training machines with data rather than explicitly programming them. Machine Learning algorithms can be broadly classified into Supervised and Unsupervised Learning algorithms.
In Supervised Learning, the model is presented with inputs and their desired outputs. The model learns from this data and is then given new data on which to make predictions, that is, to either classify the inputs or predict numeric values.
In contrast, Unsupervised models work on their own to analyze the data provided to them and find underlying patterns and relationships among the input variables, without any labeled outputs to learn from.
The main difference is that a Supervised model learns by iteratively making predictions and adjusting them towards the known correct answers. This generally makes its results easier to verify, and often more reliable, than those of an Unsupervised model, which has no information about what the output values should be.
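The idea of predicting, comparing with the correct answer and adjusting can be illustrated with a perceptron, one of the simplest supervised learners. This is only a minimal sketch: the function names, the toy dataset (logical AND) and the learning rate are assumptions chosen for illustration and do not come from any particular library.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Learn weights of a linear classifier from labeled examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            # Feedback step: the known label tells the model how wrong it was.
            error = target - prediction
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def perceptron_predict(weights, bias, x):
    """Classify one input with the learned weights."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Hypothetical labeled data: inputs and the desired output (logical AND).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [perceptron_predict(weights, bias, x) for x in samples]
```

An unsupervised algorithm has no `error` term of this kind, since there is no target to compare against.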
Explain how both types of algorithms work.
In Supervised Learning, the model learns from labeled examples. For example, given a dataset of fruits, the model takes in features such as the shape, size and taste of each fruit along with its name (the label), and learns what characterizes each type. Once trained, the model is presented with a new fruit (test data) and predicts its name based on what it learned from the training data.
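The fruit example can be sketched with a one-nearest-neighbour classifier, which predicts the label of the most similar training example. The features (length and width in centimetres), the measurements and the function name are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical labeled training data: (length_cm, width_cm) -> fruit name.
training_data = [
    ((7.0, 7.0), "apple"),
    ((8.0, 7.5), "apple"),
    ((18.0, 3.5), "banana"),
    ((20.0, 4.0), "banana"),
]


def predict_fruit(features, examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# A new fruit (test data) that the model has not seen before.
new_fruit = (19.0, 3.8)
predicted_name = predict_fruit(new_fruit, training_data)
```

The labels in `training_data` are what make this supervised: without them the model could only say which fruits look alike, not what they are called.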
Unlike Supervised Learning, Unsupervised Learning exhibits self-organization. The algorithm is provided with unlabeled and unclassified data and is expected to find hidden features from the data. It involves the grouping of data based on similarities, patterns, or differences without any guidance.
For example, the model is fed unlabeled images of cats and dogs. The model divides the data into two clusters without knowing their labels. When it is then presented with a new image of a cat (test data), it places the image in the cluster that contains the other cats based on similarity, even though it has no name for that cluster.
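A minimal k-means sketch shows how such grouping can happen without labels. It assumes each animal has already been reduced to two numeric features (weight in kilograms and height in centimetres); these measurements, the naive initialization and the function names are assumptions for illustration only.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly refining the centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    return centroids, clusters


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical unlabeled data: (weight_kg, height_cm) of cats and dogs, mixed.
animals = [(4.0, 25.0), (20.0, 55.0), (5.0, 23.0), (25.0, 60.0), (3.5, 24.0), (30.0, 58.0)]

centroids, clusters = k_means(animals, k=2)
new_animal_cluster = nearest_centroid((4.2, 24.0), centroids)
```

The result is only a cluster index such as 0 or 1. Naming a cluster "cats" still requires a human or some labeled data.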
Explain Supervised and Unsupervised Learning with real-life examples.
We can take a real-life example of a baby and her family dog. She recognizes her family dog. When she meets a new dog, she identifies it as a dog because it shares features (four legs, two ears, two eyes) with her own dog. This is Unsupervised Learning. Had it been Supervised Learning, a family member would have told her that it is a dog.
Supervised Learning use cases include:
- Classifying whether an email is Spam or not
- Predicting house prices
- Predicting weather conditions
- Predicting whether a Customer is happy with the service or not
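Unsupervised Learning use cases include:
- Segmenting customers into groups (market segmentation)
- Finding items that are frequently bought together (market basket analysis)
- Detecting anomalies
- Powering recommendation engines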
What types of problems are solved using Supervised and Unsupervised Learning Algorithms?
Supervised Learning can be broadly classified into Classification and Regression problems.
- Classification problems use algorithms to assign data to categories, such as true/false or specific classes like apples and oranges. Classifying an email as Spam or not is an example. Support Vector Machine and Decision Tree are some classification algorithms.
- Regression problems use algorithms to predict a numerical value by modeling the relation between input and output variables. Weather forecasting, such as predicting temperature, is a regression example. Linear Regression is a regression algorithm.
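To make the regression case concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares on one input variable. The house sizes and prices are made-up numbers, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: house size in square metres -> price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict a numeric value (a price) for an unseen house size."""
    return slope * size + intercept


estimated_price = predict_price(90.0)
```

Unlike a classifier, the output here is a continuous number rather than one of a fixed set of categories.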
Unsupervised Learning can be categorized into Clustering, Association and Dimensionality Reduction.
- Clustering involves grouping the data into clusters based on similarities or differences. Market Segmentation is a clustering example. K-Means Clustering is one of the clustering algorithms.
- Association determines the sets of items that occur together in the dataset. A real-life example is Market Basket Analysis. Apriori, Eclat and FP-Growth are some association algorithms.
- Dimensionality Reduction refers to reducing the number of dimensions (features) of a dataset so that it can be processed more easily, for instance by supervised algorithms. The two main approaches are feature selection and feature extraction.
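The association idea can be illustrated by counting how often pairs of items appear in the same basket. Note that this is a brute-force count and not an implementation of Apriori, Eclat or FP-Growth, which prune the search for larger datasets. The baskets, the support threshold and the function name are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose share of baskets is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical shopping baskets with no labels attached.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]

pairs = frequent_pairs(baskets, min_support=0.5)
```

With these baskets only bread and milk occur together in at least half of the transactions, which is the kind of pattern Market Basket Analysis looks for.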
What are the Advantages and Disadvantages of using Supervised Learning over Unsupervised Learning?
The results produced by Supervised models tend to be more accurate and reliable than those produced by Unsupervised models since the training data is labeled and its classes are known.
Supervised Learning lets us be specific about the data and the desired output. This is not possible with Unsupervised Learning, where it is the model's job to group the data and find hidden patterns.
The results of a supervised model can be verified since there is prior knowledge of the classes involved. In contrast, unsupervised output cannot be verified in the same way.
A Supervised Learning model can be more demanding than an Unsupervised model: it may require a lot of computation time to train, and labeling the input/output variables requires expertise. This is why unsupervised techniques are sometimes preferred when labels are scarce or costly.
Supervised algorithms can struggle with tasks where labels are hard to define, which Unsupervised algorithms may handle better. In addition, if the test data differs significantly from the training data, a supervised algorithm may yield incorrect results.
How is Supervised Learning different from Unsupervised Learning?
Some key differences are as follows:
- Goals: The goal of Supervised Learning is to train the model with labeled data so that it predicts correct output when given test data whereas the goal of Unsupervised Learning is to process large chunks of data to find out interesting insights, patterns, and correlations present in the data.
- Output Feedback: Supervised Learning has a direct feedback mechanism as the machine is trained with labeled data whereas there's no feedback mechanism involved in the case of Unsupervised Learning as the model is unaware of output.
- Complexity: Unsupervised models are often considered more computationally complex since they typically need large amounts of data to produce useful outputs or insights.
- Applications: Supervised algorithms are great for Spam Detection, Weather Forecasting, Price Predictions, etc. On the other hand, Unsupervised algorithms work well for Anomaly Detection, Recommendation Engines and Medical Imaging.
How to decide whether to use Supervised or Unsupervised Learning for a particular task?
To decide on which Machine Learning model to choose, one needs to answer the following questions:
- Input Data: Is the data labeled or unlabeled? Supervised algorithms need labeled training data, whereas Unsupervised algorithms work with unlabeled data.
- Goals: Is the problem to be solved well-defined? Supervised algorithms work only when the problem is well-defined, whereas Unsupervised algorithms can find hidden patterns and insights in the data and may suggest new problems to solve.
- Algorithms: Can the available algorithms handle the dimensionality of your data? Can they support the amount of data you have?
In summary, Supervised Learning tends to give accurate results but becomes costly on large datasets because the training examples must be labeled. In contrast, Unsupervised algorithms can handle large unlabeled datasets but may yield less accurate results.
What to do when the data has both labeled and unlabeled data items?
When the data has both labeled and unlabeled data items, Semi-Supervised Learning can be preferred. Semi-Supervised Learning integrates both Supervised and Unsupervised approaches. It was introduced to counter the high cost of acquiring labeled data. The model is therefore trained on a large amount of unlabeled data combined with a small amount of labeled data.
Semi-Supervised Learning uses an Unsupervised Learning approach to cluster similar data and then uses the existing labeled data to assign labels to the unlabeled data. These labels are called Pseudo Labels. The model is then trained on the combination of pseudo-labeled and labeled data, which can improve the model's accuracy.
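The pseudo-labeling loop can be sketched as follows. For brevity this sketch skips the explicit clustering step and uses a nearest-centroid classifier, trained on the small labeled set, to assign pseudo labels in a single pass. The data points, the labels "a" and "b" and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(examples):
    """Compute one mean point per label from (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def predict_label(centroids, features):
    """Return the label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical data: a small labeled set and a larger unlabeled set.
labeled = [((1.0, 1.0), "a"), ((9.0, 9.0), "b")]
unlabeled = [(1.5, 0.5), (2.0, 1.5), (8.0, 9.5), (8.5, 8.0)]

# Step 1: train on the labeled data only.
model = fit_centroids(labeled)

# Step 2: let the model label the unlabeled data (pseudo labels).
pseudo_labeled = [(features, predict_label(model, features)) for features in unlabeled]

# Step 3: retrain on labeled and pseudo-labeled data combined.
model = fit_centroids(labeled + pseudo_labeled)
```

In practice only confident predictions are usually kept as pseudo labels, since wrong pseudo labels can reinforce the model's own mistakes.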
Semi-Supervised Learning can be ideal for the medical field, where a doctor's efforts combined with those of the machine can lead to better accuracy. For example, a radiologist can manually label some CT scans for tumors, which can improve the machine's accuracy in identifying patients who need medical attention.
What is self-supervised learning and how is it related to supervised and unsupervised learning?
Self-Supervised Learning is typically based on artificial neural networks. It obtains supervisory signals from the data itself by predicting a hidden part of the input from the visible part. First, it derives pseudo labels in this way, and then it uses supervised or unsupervised learning for the downstream task.
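The way labels can be derived from the data itself is shown in the sketch below, which hides one word of a sentence at a time and keeps the hidden word as the target. It only builds the training pairs and does not train a neural network. The mask token, the sentence and the function name are assumptions for illustration.

```python
def make_masked_examples(tokens, mask_token="[MASK]"):
    """Build (masked input, hidden target) pairs from unlabeled tokens."""
    examples = []
    for position, target in enumerate(tokens):
        masked = tokens[:position] + [mask_token] + tokens[position + 1:]
        examples.append((masked, target))
    return examples


# Hypothetical unlabeled text: no human annotation is involved.
sentence = "the cat sat on the mat".split()
examples = make_masked_examples(sentence)

masked_input, target = examples[1]
# masked_input is ['the', '[MASK]', 'sat', 'on', 'the', 'mat'] and target is 'cat'
```

A model trained on such pairs receives a supervised-style signal even though nobody labeled the text.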
Self-Supervised Learning is considered an autonomous form of Supervised Learning in which there is no need for labeled inputs/outputs. It is like an extension of Unsupervised Learning but does not involve grouping or clustering. Unlike Semi-Supervised Learning, which depends on some manually labeled data, it does not require any prior labeled data.
Self-Supervised Learning automates the process of labeling the data, minimizing the cost of obtaining labeled data for Supervised Learning. Its applications include speech recognition, voice recognition, face detection, 3-D rotations, etc.
Real Life examples include:
- Wav2vec, a Facebook algorithm used to perform speech recognition.
- BERT, Bidirectional Encoder Representations from Transformers, a Google algorithm used to understand the context of search queries better.