Supervised vs Unsupervised Learning

Machine Learning is the art and science of training machines with data rather than explicitly programming them. Machine Learning algorithms can be broadly classified into Supervised and Unsupervised Learning algorithms.

In Supervised Learning, the model is presented with inputs and their desired outputs. The model learns from this data and is then given new data on which to make predictions, that is, either classify it or predict numeric values.

In contrast, Unsupervised models work on their own to analyze the data provided to them and find underlying patterns and relationships among the input variables, without any labeled outputs to learn from.

The main difference is that a Supervised model learns by iteratively making predictions and adjusting towards the known correct answers. This generally makes its predictions easier to verify than those of an Unsupervised model, which has no knowledge of what the output values should be.

Discussion

  • Explain how both types of algorithms work.
    Supervised and Unsupervised Learning Example. Source: ResearchGate 2021

    Supervised Learning involves a training set that has labeled data and the goal is to find a mapping function that maps the inputs to desired outputs.

    For example, the model is given a labeled dataset of fruits. For each fruit it is given features such as shape, size, and taste along with the fruit's name, and it learns how these features relate to each name. Once trained, the model is presented with a new fruit (test data) and predicts its name based on what it learned from the training data.
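
    The fruit example can be sketched as a nearest-neighbor classifier. This is only an illustration: the features (weight and length), the numbers, and the function name `predict` are assumptions, not part of any particular library.

```python
import math

# Hypothetical labeled training set: (weight in grams, length in cm) -> fruit name.
TRAINING_SET = [
    ((150.0, 7.0), "apple"),
    ((170.0, 8.0), "apple"),
    ((120.0, 18.0), "banana"),
    ((130.0, 20.0), "banana"),
]


def predict(features, training_set=TRAINING_SET):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_set, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# An unseen fruit (test data) gets the name of its most similar labeled example.
new_fruit = (125.0, 19.0)
predicted_name = predict(new_fruit)
print(predicted_name)
```

    The labels do the teaching here: without the fruit names in the training set, the function would have nothing to return.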

    Unlike Supervised Learning, Unsupervised Learning exhibits self-organization. The algorithm is provided with unlabeled and unclassified data and is expected to find hidden features from the data. It involves the grouping of data based on similarities, patterns, or differences without any guidance.

    For example, the model is fed unlabeled images of cats and dogs. It divides the data into two clusters based on similarity, without knowing what the clusters represent. When it is then presented with a new cat image (test data), it places the image in the cluster of similar images, which here is the one containing the cats. Naming that cluster "cat" is left to a human.
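
    A minimal sketch of such clustering, using a plain k-means loop, is given below. The animal measurements are made up, and using the first k points as initial centroids is an assumption made for reproducibility. Real implementations usually initialize randomly.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=10):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Hypothetical unlabeled data: (weight in kg, height in cm). No animal names are given.
points = [(4.0, 25.0), (30.0, 60.0), (5.0, 23.0), (25.0, 55.0), (3.5, 24.0), (35.0, 65.0)]
centroids = k_means(points, k=2)

# A new animal is placed in a cluster by index; the algorithm never learns a name for it.
new_animal_cluster = nearest((4.5, 26.0), centroids)
print(new_animal_cluster)
```

    The output is only a cluster index. Deciding that one cluster means "cat" and the other "dog" requires outside knowledge.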

  • Explain Supervised and Unsupervised Learning with real-life examples.

    We can take a real-life example of a baby and her family dog. She recognizes her family dog. When she meets a new dog, she identifies it as a dog because it shares features (four legs, two ears, two eyes) with her own dog, even though nobody has told her what it is. This is akin to Unsupervised Learning. Had it been Supervised Learning, a family member would have told her that it's a dog.

    Supervised Learning use cases include:

    • Classifying whether an email is Spam or not
    • Predicting house prices
    • Predicting weather conditions
    • Predicting whether a Customer is happy with the service or not

    Unsupervised Learning use cases include:

    • Finding correlations in customer data
    • Grouping people based on their interests
    • Grouping customers by their purchase behavior
    • Reducing the complexity of a problem
  • What types of problems are solved using Supervised and Unsupervised Learning Algorithms?
    Types of Supervised and Unsupervised learning problems. Source: Vas 2022

    Supervised Learning can be broadly classified into Classification and Regression problems.

    • Classification problems use algorithms to assign data to categories, such as true/false or specific classes like apples and oranges. Classifying an email as Spam or not is an example. Support Vector Machine and Decision Tree are some classification algorithms.
    • Regression problems use algorithms to predict a numerical value by modeling the relation between input and output variables. Weather forecasting, such as predicting temperature, is a regression example. Linear Regression is a regression algorithm.
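
    To illustrate regression, here is a minimal least-squares fit of a straight line. The house areas and prices are invented for the example, and the names `fit_line` and `predict_price` are hypothetical. A classifier would return a category instead of a number.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: house area in square metres -> price in thousands.
AREAS = [50.0, 80.0, 100.0, 120.0]
PRICES = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(AREAS, PRICES)


def predict_price(area):
    """Predict a numeric value (not a category) for an unseen input."""
    return slope * area + intercept


print(predict_price(90.0))
```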

    Unsupervised Learning can be categorized into Clustering, Association and Dimensionality Reduction.

    • Clustering involves grouping the data into clusters based on similarities or differences. Market Segmentation is a clustering example. K-Means Clustering is one of the clustering algorithms.
    • Association determines the sets of items that occur together in the dataset. A real-life example is Market Basket Analysis. Apriori, Eclat, and FP-Growth are some association algorithms.
    • Dimensionality Reduction refers to reducing the number of features (dimensions) in a dataset so that it can be processed more easily, for instance by supervised algorithms. The two main approaches are feature selection and feature extraction.
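
    As a rough illustration of Association, the sketch below computes support and confidence, the two quantities that algorithms such as Apriori build on. It is not an implementation of Apriori itself, and the baskets are made-up data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; no labels are attached to them.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    with_antecedent = [basket for basket in baskets if antecedent in basket]
    if not with_antecedent:
        return 0.0
    return sum(consequent in basket for basket in with_antecedent) / len(with_antecedent)


support = pair_support(BASKETS)
print(support[("bread", "butter")])
print(confidence(BASKETS, "bread", "butter"))
```

    A rule such as "bread implies butter" is usually kept only if both its support and its confidence exceed thresholds chosen by the analyst.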
  • What are the Advantages and Disadvantages of using Supervised Learning over Unsupervised Learning?

    The results produced by Supervised models tend to be more accurate and reliable than those produced by Unsupervised models, since the training data for Supervised models is labeled with known outputs.

    Supervised Learning lets us be specific about the classes and outputs we want. This is not possible with Unsupervised Learning, where it is the model's job to group the data and find hidden patterns.

    The results of a supervised model can be verified since there is prior knowledge about the classes involved. In contrast, unsupervised output cannot be verified as easily.

    A Supervised Learning model can be more demanding than an Unsupervised one: training might require a lot of computation time, and labeling the input/output variables requires expertise. This is why unsupervised techniques are sometimes preferred when labels are scarce.

    Supervised algorithms are less suited to open-ended tasks, such as exploring data with no predefined classes, which Unsupervised algorithms handle better. Also, if the test data differs from the training data, a supervised algorithm may yield incorrect results.

  • How is Supervised Learning different from Unsupervised Learning?

    Some key differences are as follows:

    • Goals: The goal of Supervised Learning is to train the model with labeled data so that it predicts correct output when given test data whereas the goal of Unsupervised Learning is to process large chunks of data to find out interesting insights, patterns, and correlations present in the data.
    • Output Feedback: Supervised Learning has a direct feedback mechanism as the machine is trained with labeled data whereas there's no feedback mechanism involved in the case of Unsupervised Learning as the model is unaware of output.
    • Complexity: Unsupervised models can be computationally more complex than Supervised models, as they often need large amounts of data to produce useful outputs or insights.
    • Applications: Supervised algorithms are great for Spam Detection, Weather Forecasting, Price Predictions, etc. On the other hand, Unsupervised algorithms work well for Anomaly Detection, Recommendation Engines, and Medical Imaging, etc.
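
    The direct feedback mechanism of Supervised Learning can be sketched with a perceptron that learns the logical OR function. The learning rate, epoch count, and function names are illustrative assumptions. The key line is the one where the error (label minus prediction) drives the weight update.

```python
# Labeled examples for logical OR: ((input1, input2), desired output).
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, epochs=10, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # Feedback: compare the prediction with the known correct answer.
            error = target - predict(weights, bias, inputs)
            # Adjust the parameters only when the prediction was wrong.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train_perceptron(EXAMPLES)
print([predict(weights, bias, inputs) for inputs, _ in EXAMPLES])
```

    An unsupervised algorithm has no `target` to compare against, so no such error signal exists.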
  • How to decide whether to use Supervised or Unsupervised Learning for a particular task?

    To decide on which Machine Learning model to choose, one needs to answer the following questions:

    • Input Data: Is the data labeled or unlabeled? Supervised algorithms need labeled training data, whereas Unsupervised algorithms work with unlabeled data.
    • Goals: Is the problem to be solved well-defined? Supervised algorithms can work only if the problem is well-defined, but Unsupervised algorithms can find hidden patterns and insights in the data and thereby suggest new problems.
    • Algorithms: Can the algorithm handle the dimensionality (number of features) of your data? Will the algorithm support the amount of data you have?

    So, Supervised Learning tends to give more accurate results but becomes costly on large data because every training example must be labeled. In contrast, Unsupervised algorithms can handle large unlabeled data but can yield less accurate results.

  • What to do when the data has both labeled and unlabeled data items?
    Comparison of different types of ML algorithms. Source: Baheti 2021

    When the data has both labeled and unlabeled data items, Semi-Supervised Learning can be preferred. Semi-Supervised Learning integrates both Supervised and Unsupervised approaches. It was introduced to counter the high cost of acquiring labeled data. So, the model is trained on a large chunk of unlabeled data combined with a small chunk of labeled data.

    Semi-Supervised Learning can use an Unsupervised approach to cluster similar data, and then use a model trained on the labeled data to assign labels to the unlabeled data in batches. These predicted labels are called Pseudo Labels. The model is then trained on the combination of pseudo-labeled and labeled data, which can improve the model's accuracy.

    Semi-Supervised Learning can be ideal for the medical field, wherein the doctor's efforts combined with those of the machine can lead to better accuracy. For example, a Radiologist can manually label some CT scans for tumors, which can increase the machine's accuracy in identifying the patients who need medical attention.
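
    One common variant of pseudo labeling can be sketched as follows. The sketch assumes a simple nearest-centroid model and a distance-margin confidence rule. Both are illustrative choices, as are the data points and the labels "A" and "B".

```python
import math


def centroids_by_label(examples):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict_with_margin(model, features):
    """Return (label, margin); margin is the gap between the two closest centroids."""
    ranked = sorted((math.dist(features, centroid), label) for label, centroid in model.items())
    (best_distance, best_label), (second_distance, _) = ranked[0], ranked[1]
    return best_label, second_distance - best_distance


def pseudo_label(labeled, unlabeled, min_margin=1.0):
    """Train on labeled data, pseudo-label confident unlabeled points, then retrain."""
    model = centroids_by_label(labeled)
    pseudo = []
    for features in unlabeled:
        label, margin = predict_with_margin(model, features)
        if margin >= min_margin:  # keep only confident predictions
            pseudo.append((features, label))
    return centroids_by_label(labeled + pseudo), pseudo


# Hypothetical data: a few labeled points and more unlabeled ones.
LABELED = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
UNLABELED = [(2.0, 1.0), (1.0, 3.0), (8.0, 10.0), (5.0, 5.0)]

model, pseudo = pseudo_label(LABELED, UNLABELED)
print(pseudo)
```

    The point (5.0, 5.0) lies equally far from both centroids, so it receives no pseudo label and is left out of retraining.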

  • What is self-supervised learning and how is it related to supervised and unsupervised learning?

    Self-Supervised Learning is typically based on artificial neural networks. It obtains supervisory signals from the data itself, for example by predicting a hidden part of the input from the visible part. The model first generates pseudo labels in this way, and what it learns can then be used for supervised or unsupervised tasks.

    Self-Supervised Learning is considered an autonomous form of Supervised Learning where there is no need for manually labeled inputs/outputs. It is like an extension of Unsupervised Learning but does not involve grouping or clustering. Unlike Semi-Supervised Learning, it does not depend on manually labeled data and does not require any prior labeled data.

    Self-Supervised Learning automates the process of labeling the data, minimizing the cost of obtaining labeled data for Supervised Learning. Self-Supervised Learning applications include speech recognition, voice recognition, face detection, and 3D rotations.
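
    The idea of predicting a hidden part of the input can be sketched by turning unlabeled text into training pairs. The function name and the `[MASK]` placeholder are assumptions for illustration. Masking every word in turn is a simplification of how masked-word models are actually trained.

```python
def masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked input, hidden word) training pairs."""
    words = sentence.split()
    pairs = []
    for index, hidden_word in enumerate(words):
        # Hide one word; the hidden word itself becomes the label.
        masked = words[:index] + [mask_token] + words[index + 1:]
        pairs.append((" ".join(masked), hidden_word))
    return pairs


# No human labeling is involved: the labels come from the data itself.
pairs = masked_examples("the cat sat")
for masked_input, target in pairs:
    print(masked_input, "->", target)
```

    A model trained on such pairs is trained in a supervised manner, yet nobody had to label the data.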

    Real-life examples include:

    • Wav2vec, a Facebook algorithm used to perform speech recognition.
    • BERT, Bidirectional Encoder Representations from Transformers, a Google algorithm used to understand the context of search queries better.

Milestones

1943

First mathematical model of neural networks presented in the scientific paper "A logical calculus of the ideas immanent in nervous activity" by Walter Pitts and Warren McCulloch.

1949

The book "The Organization of Behavior" by Donald Hebb is published. It presents theories on the relationship between neural networks and brain activity.

1950

Alan Turing creates the “Turing Test” to determine if a computer has human-like intelligence. To pass this test, a computer should be able to make a human believe that it is another human.

1952

Arthur Samuel of IBM writes one of the first computer programs to play checkers. The program improves its play with experience. Samuel's work makes use of minimax search and an early form of alpha-beta pruning.

1957

Frank Rosenblatt combines Donald Hebb's model of brain cell interaction with Arthur Samuel's machine learning efforts to create the perceptron. It is designed to simulate the thought processes of the human brain. Today's neural networks trace their origins to it.

1967

Thomas Cover and Peter Hart publish "Nearest Neighbor Pattern Classification", which establishes the nearest neighbor rule. The K-Nearest Neighbor algorithm (Supervised Learning) is used for classification and regression in machine learning. A nearest-neighbor heuristic has also been applied to the traveling salesman problem (start from a random city and visit all cities covering the shortest possible distance).

1990

The 1990s see Machine Learning shift from a knowledge-driven approach to a data-driven approach. Researchers create programs that can analyze large amounts of data and draw conclusions from the results, in the spirit of Unsupervised Learning.

1995

Tin Kam Ho introduces the Random Forest algorithm in a paper. The algorithm builds multiple decision trees and merges their decisions into a "forest". Relying on many different decision trees significantly improves accuracy and decision-making compared to a single tree.

2009

The book “Introduction to Semi-Supervised Learning” by Xiaojin Zhu and Andrew Goldberg is published.

References

  1. Arora, Vishal. 2019. "Supervised, Unsupervised and Semi-supervised ML." amazonaws. Accessed 2022-01-07.
  2. Arora, Surbhi. 2020. "Supervised vs Unsupervised vs Reinforcement." Aitude. Accessed 2022-01-07.
  3. Baheti, Pragati. 2021. "Supervised vs. Unsupervised Learning: What’s the Difference?" v7labs. Accessed 2022-01-09.
  4. Bhattacharyya, Jayita. 2020. "Pseudo Labelling – A Guide To Semi-Supervised Learning." AIM. Accessed 2022-01-09.
  5. Brownlee, Jason. 2016. "Supervised and Unsupervised Machine Learning Algorithms." Machine Learning Mastery. Accessed 2022-01-07.
  6. Brownlee, Jason. 2021. "What Is Semi-Supervised Learning." Machine Learning Mastery. Accessed 2022-01-11.
  7. Delua, Julianna. 2021. "Supervised vs. Unsupervised Learning: What's the Difference?" IBM. Accessed 2022-01-07.
  8. Foote, Keith. 2021. "A brief history of machine learning." Dataversity. Accessed 2022-01-07.
  9. Gladchuk, Veronika. 2020. "History of machine learning." Label Your Data. Accessed 2022-01-07.
  10. Goled, Shraddha. 2021. "Self-Supervised Learning Vs Semi-Supervised Learning: How They Differ." AIM. Accessed 2022-01-11.
  11. Kaur, Simran. 2021. "Supervised vs Unsupervised Learning." Hackrio. Accessed 2022-01-07.
  12. Kot, Justyna. 2020. "A brief history of machine learning." Concise Software. Accessed 2022-01-07.
  13. Marr, Bernard. 2016. "A Short History of Machine Learning." Forbes. Accessed 2022-01-07.
  14. Medium. 2017. "History of machine Learning." Medium. Accessed 2022-01-07.
  15. Prakash, Arun. 2020. "Semi-Supervised Learning with Pseudo labeling." Francium Tech. Accessed 2022-01-11.
  16. ResearchGate. 2021. "Supervised vs Unsupervised Learning." ResearchGate. Accessed 2022-01-09.
  17. Rogers, Sierra. 2021. "Supervised vs Unsupervised Learning." Capterra. Accessed 2022-01-07.
  18. Sai, Madhu. 2014. "Supervised and Unsupervised learning." Dataaspirant. Accessed 2022-01-07.
  19. Vas. 2022. "Machine Learning for Everyone?" vas3k. Accessed 2022-01-09.
  20. Wikipedia. 2021. "Self-supervised learning." Wikipedia. Accessed 2022-01-11.
  21. Wikipedia. 2022. "Machine Learning." Wikipedia. Accessed 2022-01-07.
  22. Yagcioglu, Semih. 2020. "Classical Examples of Supervised vs. Unsupervised Learning in Machine Learning." Springboard. Accessed 2022-01-07.

Further Reading

  1. Soni, Devin. 2018. "Supervised vs. Unsupervised Learning." towardsdatascience. Accessed 2022-01-06.
  2. Salian, Isha. 2018. "What’s the Difference Between Supervised, Unsupervised, Semi-Supervised and Reinforcement Learning?" nvidia. Accessed 2022-01-06.
  3. Airon, Palak. 2020. "The A – Z of Supervised Learning, Use Cases, and Disadvantages." opendatascience. Accessed 2022-01-07.
  4. Pratt, Mary. 2020. "Unsupervised Learning." Techtarget. Accessed 2022-01-07.
  5. Castle, Nikki. 2018. "What is Semi-Supervised Learning?" Oracle. Accessed 2022-01-07.

Cite As

Devopedia. 2022. "Supervised vs Unsupervised Learning." Version 33, January 12. Accessed 2022-01-18. https://devopedia.org/supervised-vs-unsupervised-learning

Contributed by 2 authors

Last updated on 2022-01-12 15:24:22
  • Semi-Supervised Learning
  • Self-Supervised Learning
  • Zero-Shot Learning
  • Transfer Learning
  • Generative Adversarial Network
  • Reinforcement Learning