
Machine Learning

Summary

image
Machine Learning with Data. Source: Desjardins-Proulx 2013.

Machine Learning gives a machine or algorithm the capability to learn the permutations and combinations of a given circumstance and react appropriately. There is uncertainty in the circumstance and hence in the reaction. This is the unknown that the machine has to learn. Machines learn from vast amounts of historical circumstances and reactions provided to them in machine-readable format. We simply call this data.

Machines aim to maximize the desired outcome. Statistical modelling is highly contextual and comes with assumptions. For instance, the math behind linear regression is very different from the math behind logistic regression. Machine learning generalizes both under supervised learning and optimizes for minimum error. Most machine learning methods use numerical mathematics to iteratively solve for unknown parameters.
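
As a minimal sketch of what iteratively solving for unknown parameters can look like, the Python snippet below fits a straight line by gradient descent on mean squared error. The function name `fit_line`, the learning rate, the step count and the toy data are assumptions chosen for illustration and do not come from any particular library.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0              # unknown parameters, starting from a guess
    n = len(xs)
    for _ in range(steps):
        # Error of the current model on every data point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move both parameters a small step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Each pass nudges `w` and `b` in the direction that reduces the error, so the estimates move towards the slope and intercept that generated the data.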

Machine learning, due to its holistic approach, can solve a broad set of problems such as image identification, text generation and speech recognition.

Milestones

1950

Alan Turing creates the Turing Test, in which a computer must attempt to pass itself off as a human to other humans. In June 2014, a chatbot named Eugene Goostman is claimed to pass this test by convincing 33% of the judges. A more difficult variant, used for the Loebner Prize, requires that more than 50% of the judges be convinced after a 25-minute conversation. As of March 2018, no program has won the prize.

1952

Arthur Samuel writes the first learning program. Applied to the game of checkers, the program is able to learn from mistakes and improve its gameplay with each new game. By the mid-1970s, the program beats humans at checkers. Board games are useful in developing ML because they are both understandable and complex.

1957

Inspired by the way the human brain is composed of interconnected neurons, Frank Rosenblatt designs the first artificial neural network, called the perceptron. The idea is to solve complex problems through a series of simple decisions. Rosenblatt applies it to image recognition.

1967

The Nearest Neighbour algorithm is created and applied to map routing. This starts the field of pattern recognition.

1979
image
The Stanford Cart. Source: Sheth 2017.

Researchers at Stanford University build a robot, now known as the Stanford Cart, that is able to navigate obstacles in a room on its own.

1981

Gerald DeJong invents Explanation Based Learning. The computer uses data to train itself and create a rule to achieve a given goal. It discards information irrelevant to the problem. This is a type of supervised learning. In general, the 1980s is the decade of expert systems that are based on rules.

1990

The 1990s is the decade when the approach to ML shifts from being knowledge-driven to data-driven. This shift is supported through the next two decades by greater availability of data, cloud computing and big data technologies.

2006

Geoffrey Hinton popularizes the term Deep Learning (DL) to describe new architectures of neural networks. This approach is applied to image recognition.

2012

The Google Brain project uses DL to detect visual patterns. A Google X project applies Google Brain to YouTube videos to identify frames that contain cats. A team led by Geoffrey Hinton wins ImageNet's computer vision contest by a large margin. This popularizes DL. In the coming years, DL becomes an important technique to create models with much better accuracy. This is the decade when DL becomes feasible.

2016

AlphaGo, developed by Google's DeepMind, uses ML to beat professional player Lee Sedol at the challenging board game Go.

2017

Google Brain chief Jeff Dean states that DL starts to work with at least 100,000 data points. This underscores the importance of data availability for DL.

Discussion

  • How do machines learn?
    image
    How Machines Learn. Source: Jain 2015.

    Traditionally, intelligence was introduced into a system explicitly using rules. Rules took the form of "if this happens while in this state, do that". These rules are derived from a knowledge base that's particular to that domain or application. However, such a rule-based system has limitations. To characterize the system completely, there could potentially be hundreds of rules. Moreover, rules come with exceptions that need to be considered as well. This is clearly not manageable for complex systems.

    Machine Learning takes a different approach. Instead of working on pre-defined rules, machines look at large amounts of data. For each data point, they take note of the associated response. They do this for a sufficient amount of data and thereby implicitly learn the rules. These implicit rules can be described in terms of features and outcomes.

    For machines to learn properly, relevant and wide-ranging data should be made available. Ideally, the data should cover as many scenarios as possible. Data is typically split into a training dataset and a testing dataset. Machines learn from the former. The latter is used exclusively to validate the model. The learning process is not linear. It's self-correcting and iterative.
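
    To make the contrast with hand-written rules concrete, here is a small sketch in which the rule "predict rain if humidity is at least some cut-off" is not coded by hand. Instead, the cut-off is learned from labelled examples and then checked on separate test data. The function `learn_threshold` and the humidity readings are hypothetical and exist only for illustration.

```python
def learn_threshold(samples):
    """samples: list of (humidity, rained) pairs.
    Returns the cut-off for which the rule 'rain if humidity >= cut-off'
    agrees with the most training examples."""
    best_cut, best_correct = None, -1
    for cut in sorted({h for h, _ in samples}):
        correct = sum((h >= cut) == rained for h, rained in samples)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

# Hypothetical readings: (humidity in %, did it rain?)
training = [(30, False), (45, False), (50, False),
            (65, True), (70, True), (85, True)]
testing = [(40, False), (60, False), (75, True), (90, True)]

cut = learn_threshold(training)          # rule learned from training data only
accuracy = sum((h >= cut) == rained for h, rained in testing) / len(testing)
print(cut, accuracy)
```

    The learned cut-off plays the role of an implicit rule: nobody wrote it down, yet it can be read off the model after training.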

  • What are the different Machine Learning types?
    image
    Types of Machine Learning. Source: Raschka 2017.

    Learning takes place based on what worked in historical events (asynchronous learning) and on what is accepted in contemporary events (synchronous learning).

    When a machine is trained using historical data, the learning can be classified as Supervised or Unsupervised:

    • Supervised Learning uses a self-correcting feedback loop in which the expected outcome is labelled. For instance, temperature, moisture and humidity (called features) can be used to predict the chance of rain in the next 24 hours. Historical data that include temperature, moisture and humidity are recorded and labelled as 'Rain' or 'Not Rain' depending on whether or not it rained in the following 24 hours. This is called a Classification problem. The system can also be designed to learn the amount of rain. This is called a Regression problem.
    • Unsupervised Learning works without labels and groups similar items using association measures. For instance, airline customers can be segmented based on the class they fly, food preferences, frequency of flights, etc.

    AI systems use synchronous learning to reward right decisions and penalise wrong ones, so as to prevent future mishaps. This is Reinforcement Learning, which is applied during the decision process itself, with feedback arriving as the result of a series of actions.
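
    The rain example for supervised learning can be sketched with a k-nearest-neighbours model, used here only because it is short. The same labelled history answers a classification question (Rain or Not Rain) and a regression question (how much rain). The readings, the rainfall amounts and the helper names are made up for illustration, and the features are left unscaled to keep the sketch small.

```python
import math
from collections import Counter

def nearest(train, query, k):
    """Return the targets of the k training rows closest to query."""
    ranked = sorted(train, key=lambda row: math.dist(row[0], query))
    return [target for _, target in ranked[:k]]

def classify(train, query, k=3):
    # Classification: majority vote over discrete labels
    return Counter(nearest(train, query, k)).most_common(1)[0][0]

def regress(train, query, k=3):
    # Regression: average of continuous values
    values = nearest(train, query, k)
    return sum(values) / len(values)

# Hypothetical history: (temperature in C, humidity in %)
features = [(30, 40), (28, 45), (32, 35), (22, 85), (20, 90), (24, 80)]
labels = ["Not Rain", "Not Rain", "Not Rain", "Rain", "Rain", "Rain"]
rain_mm = [0.0, 0.0, 0.0, 12.0, 20.0, 10.0]

today = (23, 82)
label = classify(list(zip(features, labels)), today)
amount = regress(list(zip(features, rain_mm)), today)
print(label, amount)
```

    Only the target column changes between the two calls, which is the practical difference between a classification and a regression problem.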

  • What is feature engineering and why is it important?
    image
    The place of feature engineering in the machine learning workflow. Source: Casari and Zheng 2018, Fig 1-2.

    A dataset will typically contain one or more variables or features. Some of these may influence the outcome. For example, temperature and humidity may be features that influence the chance of rain in the next 6 hours. The data may also contain the time of day or day of the week, but these are features that probably don't influence the chance of rain. The job of an ML engineer is therefore to identify the right features for the problem. The selected features together determine the outcome of the model. The accuracy of the ML model depends directly on the features the ML engineer has chosen.

    Feature engineering is the first task an ML engineer has to do once data has been cleaned and transformed. It means arriving at relevant variables that relate to solving the problem at hand. Feature engineering is best done with domain experts who understand what each variable means, how to interpret it and how it relates to other variables.
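
    One simple way to screen candidate features, shown here purely as an illustrative heuristic and not as a substitute for domain knowledge, is to measure how strongly each one correlates with the outcome. The sketch below computes the Pearson correlation by hand. The data and the 0.5 cut-off are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observations, one column per candidate feature
candidates = {
    "temperature": [30, 28, 32, 22, 20, 24],
    "humidity": [40, 45, 35, 85, 90, 80],
    "day_of_week": [1, 4, 7, 2, 6, 4],
}
rain_mm = [0, 0, 0, 12, 20, 10]   # outcome to be predicted

scores = {name: pearson(col, rain_mm) for name, col in candidates.items()}
# Keep features whose absolute correlation clears a hypothetical cut-off
selected = sorted(name for name, r in scores.items() if abs(r) >= 0.5)
print(scores, selected)
```

    Correlation only captures linear, one-at-a-time relationships, so a feature dropped by this screen may still matter in combination with others.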

  • How does ML add value to Big Data?

    The biggest strength of ML lies in heterogeneous datasets that capture diverse scenarios. This enables efficient and holistic learning. This is the very reason why a huge dataset is a blessing for ML. Today's data coming from mobile and web sources includes video, audio, images and text. Problems that rely on such data can be modelled better with the help of ML.

  • What kind of problems can be solved with ML?

    Broadly, the following problems are solved with ML:

    • Regression: This is the task of predicting a continuous quantity, such as an amount or a size. For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.
    • Classification: This is the task of predicting a discrete class label. For example, an email can be classified as belonging to one of two classes, 'spam' and 'not spam'. In image classification problems, there could be thousands of classes (cat, dog, fish, car, etc.).

  • What's the approach to solving ML Problems?
    image
    Machine Learning data test split process. Source: Bhatia 2017.

    Once you have defined the problem and outlined the features, you need to split the data in a way that's easy to test. A common choice is a 70:30 split: the machine learns from 70% of the data (training) and its learning is tested on the remaining 30% (testing). A model is built from the training data. This model is then validated with the testing dataset and evaluated against other models to find the best one.

    The idea of splitting data into training and test datasets can be traced to the Common Task Framework.

    It's important to have an acceptable accuracy percentage (say, 60%+) across both training and testing datasets. If the accuracy isn't high enough or isn't consistent across the two datasets, then the ML process should be repeated with different or modified features.
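
    The split-and-validate procedure can be sketched as follows. The 70:30 ratio and the 60% bar come from the text above. The synthetic data, the toy threshold model, the fixed seed and the `MAX_GAP` tolerance for "consistent" are assumptions made for this example.

```python
import random

def split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def train_threshold(rows):
    """Toy model: cut-off halfway between the mean x of each class."""
    pos = [x for x, y in rows if y]
    neg = [x for x, y in rows if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    return sum((x >= threshold) == y for x, y in rows) / len(rows)

data = [(x, x >= 50) for x in range(100)]   # synthetic, cleanly separable
train, test = split(data)
threshold = train_threshold(train)          # learn from training rows only
train_acc = accuracy(threshold, train)
test_acc = accuracy(threshold, test)

MIN_ACCURACY = 0.60   # the "say, 60%+" bar from the text
MAX_GAP = 0.10        # hypothetical tolerance for train/test consistency
acceptable = (min(train_acc, test_acc) >= MIN_ACCURACY
              and abs(train_acc - test_acc) <= MAX_GAP)
print(len(train), len(test), train_acc, test_acc, acceptable)
```

    If `acceptable` came out false, the next step in the process described above would be to revisit the features and train again.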

  • What is overfitting in the context of ML?
    image
    Overfit Underfit. Source: Bhande 2018.

    Often we read too much into the past. We're then surprised to see that history didn't repeat itself. This could happen for two reasons:

    • Response that is specific to one particular circumstance
    • Too little data

    When this happens in ML, we call it overfitting. The possibility of overfitting exists because the criterion used for selecting the model may not be the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability must be determined by its ability to perform well on unseen data. We can state that overfitting occurs when a model has memorized training data rather than learned to generalize from a trend.
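
    A small experiment can illustrate memorizing versus generalizing. In the sketch below, labels follow a simple trend but 20% of them are flipped at random. One model memorizes every training point (1-nearest-neighbour) while the other learns a single cut-off. The data generator, noise level and seed are assumptions made for this example.

```python
import random

rng = random.Random(1)

def make_rows(n, noise=0.2):
    """Synthetic data: label is x >= 50, flipped with probability noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 100)
        label = x >= 50
        if rng.random() < noise:
            label = not label
        rows.append((x, label))
    return rows

train, test = make_rows(60), make_rows(200)

def memorizer(x):
    # 1-nearest-neighbour: repeat the label of the closest training point,
    # so every noisy training label is reproduced exactly.
    return min(train, key=lambda row: abs(row[0] - x))[1]

pos = [x for x, y in train if y]
neg = [x for x, y in train if not y]
cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def simple_rule(x):
    # One learned cut-off: too coarse to memorize individual noisy points.
    return x >= cut

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

for name, model in (("memorizer", memorizer), ("simple rule", simple_rule)):
    print(name, accuracy(model, train), accuracy(model, test))
```

    The memorizer scores 100% on the training rows by construction. On unseen rows it typically does worse, because it has also memorized the flipped labels. The exact figures depend on the seed.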

  • Could you compare or contrast Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)?
    image
    AI, ML and DL. Source: Pinterest 2018.

    This is better explained through an example. ML is about learning a task. For instance, a self-driving car learns many tasks: to brake or not to brake, speed up or slow down, turn the steering wheel, use the indicators, etc. While ML learns all these tasks separately, AI executes them in a coordinated manner, rewards good decisions and penalises wrong ones. Thus, AI coordinates across ML tasks and also applies a feedback loop to do better in the future. AI also accounts for contextual information that may not be part of ML.

    DL is a special case of ML. While traditional ML learns in a single stage, DL does so in multiple stages. When problems are complex, DL often does better than traditional ML. For example, recognizing a human may involve identifying basic features (eyes, ears, hands, legs, etc.) at stage 1; identifying higher order features (face, upper body, lower body, etc.) at stage 2; and finally calling it out as 'Human' at stage 3.
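
    The benefit of multiple stages can be seen even in a tiny network. A single perceptron-style unit cannot compute XOR, but two stages can: the first stage extracts two simple features and the second combines them. In the sketch below the weights are set by hand rather than learned, which is a simplification to keep the example short.

```python
def step(z):
    """Threshold activation: fire (1) only when the input is positive."""
    return 1 if z > 0 else 0

def unit(weights, bias, inputs):
    """One perceptron-style unit: weighted sum followed by a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def two_stage_xor(a, b):
    # Stage 1: two simple intermediate features
    either = unit([1, 1], -0.5, [a, b])    # a OR b
    both = unit([1, 1], -1.5, [a, b])      # a AND b
    # Stage 2: fires when "either" is on and "both" is off
    return unit([1, -1], -0.5, [either, both])

outputs = [two_stage_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

    In a real deep network the intermediate features are not chosen by a person. They are learned from data, stage by stage.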

  • How to improve the accuracy of ML models?
    image
    Comparing Bagging and Boosting methods. Source: Aporras 2016.

    • Ensemble methods are techniques that create multiple models and then combine them to produce better results. For example, a candidate goes through multiple rounds of job interviews. Although a single interviewer might not be able to test the candidate for each required skill and trait, the combined feedback of multiple interviewers usually leads to a better assessment of the candidate.
    • Bagging (Bootstrap Aggregating) is an ensemble method. First, we create random samples of the training dataset (subsets of the training dataset). We build a classifier for each sample. Finally, results of these multiple classifiers are combined using averaging or majority voting. Bagging helps to reduce the variance error.
    • Boosting is an iterative ensemble method. The first learner classifies the original dataset with equal weight given to each observation. If the first learner predicts some classes incorrectly, the wrongly classified observations are given higher weight for the next learner. The process continues to add learners until a limit is reached on the number of models or on accuracy. Boosting has shown better predictive accuracy than bagging, but it also tends to overfit the training data.
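
    Both ideas can be sketched with one-split classifiers ("stumps") on one-dimensional data. Bagging trains each stump on a bootstrap sample and takes a majority vote. For boosting, the reweighting below follows the well-known AdaBoost scheme, which is one specific choice and is not named in the text above. The datasets, model counts and seed are assumptions.

```python
import math
import random

def stump_predict(cut, sign, x):
    return sign if x >= cut else -sign

def fit_stump(rows, weights):
    """Return (weighted error, cut, sign) of the best single split.
    rows: list of (x, y) with y in {-1, +1}."""
    best = None
    for cut in sorted({x for x, _ in rows}):
        for sign in (1, -1):
            err = sum(w for (x, y), w in zip(rows, weights)
                      if stump_predict(cut, sign, x) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    return best

def bagging(rows, n_models=15, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]      # bootstrap sample
        _, cut, sign = fit_stump(sample, [1.0] * len(sample))
        stumps.append((cut, sign))
    # Majority vote (an odd number of models avoids ties)
    return lambda x: 1 if sum(stump_predict(c, s, x) for c, s in stumps) > 0 else -1

def boosting(rows, n_rounds=3):
    weights = [1.0 / len(rows)] * len(rows)            # equal weights at start
    ensemble = []
    for _ in range(n_rounds):
        err, cut, sign = fit_stump(rows, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)          # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)        # vote of this learner
        ensemble.append((alpha, cut, sign))
        # Raise the weight of misclassified rows, lower the rest, renormalize
        weights = [w * math.exp(-alpha * y * stump_predict(cut, sign, x))
                   for (x, y), w in zip(rows, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(c, s, x) for a, c, s in ensemble) > 0 else -1

# Bagging on a cleanly separable toy set
separable = [(x, -1 if x <= 5 else 1) for x in range(1, 11)]
bagged = bagging(separable)

# Boosting on a three-band pattern that no single stump can fit
bands = [(x, -1 if 4 <= x <= 6 else 1) for x in range(1, 10)]
single_err, _, _ = fit_stump(bands, [1.0 / len(bands)] * len(bands))
boosted = boosting(bands)
print(bagged(1), bagged(10), single_err, [boosted(x) for x, _ in bands])
```

    On the three-band data the best single stump misclassifies a third of the rows, while three boosted stumps together classify every training row correctly.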

  • In what scenarios is ML not applicable, or where has it failed?
    image
    ImageNet evolution timeline. Source: Guo et al. 2016.

    ML is applied in diverse fields where plenty of data is available. There are scenarios where ML faces challenges, but continued effort keeps improving its accuracy and acceptability. The accompanying figure shows how algorithms applied to ImageNet have evolved towards better accuracy.

    Failure of ML can be attributed to incorrect problem formulation, wrong choice of features or inappropriate algorithms.

References

  1. Aporras. 2016. "Difference between Bagging and Boosting?" QuantDare, April 20. Accessed 2018-04-15.
  2. Bhande, Ahup. 2018. "What is underfitting and overfitting in machine learning and how to deal with it." Medium, March 12. Accessed 2018-04-13.
  3. Bhatia, Ankur. 2017. "Improve Threat Classification Accuracy With Supervised Machine Learning." Security Intelligence, January 6. Accessed 2018-04-08.
  4. Brownlee, Jason. 2013. "How to Define Your Machine Learning Problem." Machine Learning Mastery, December 23. Accessed 2018-04-08.
  5. Brownlee, Jason. 2017. "Difference Between Classification and Regression in Machine Learning." Machine Learning Mastery, December 11. Accessed 2018-04-08.
  6. Build With Google Cloud. 2018. "A history of machine learning." Accessed 2018-04-08.
  7. Casari, Amanda and Zheng, Alice. 2018. "Feature Engineering for Machine Learning." O'Reilly Media, Inc., March. Accessed 2018-04-13.
  8. Desjardins-Proulx, Philippe. 2013. "Machine learning and deep transfer learning." July 5. Accessed 2018-04-13.
  9. Donoho, David. 2015. "50 years of Data Science." Based on a presentation at the Tukey Centennial workshop, Princeton NJ, Version 1.00, September 18. Accessed 2018-04-11.
  10. Frank, Blair Hanley. 2017. "Google Brain chief: Deep learning takes at least 100,000 examples." VentureBeat, October 23. Accessed 2018-04-08.
  11. Gonzalez, Victor. 2018. "A Brief History of Machine Learning." Synergic Partners, March. Accessed 2018-04-08.
  12. Guo, Yanming, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. 2016. "Deep learning for visual understanding: A review." Neurocomputing, vol. 187, pp. 27-48. Accessed 2018-04-13.
  13. Hern, Alex. 2014. "What is the Turing test? And are we all doomed now?" The Guardian, June 9. Accessed 2018-04-07.
  14. Jain, Kunal. 2015. "Machine Learning basics for a newbie." Analytics Vidhya, June 11. Accessed 2018-04-08.
  15. Kaushik, Saurav. 2017. "How to build Ensemble Models in machine learning?" Analytics Vidhya, February 15. Accessed 2018-04-08.
  16. Marr, Bernard. 2016. "A Short History of Machine Learning -- Every Manager Should Read." Forbes, February 19. Accessed 2018-04-08.
  17. Pinterest. 2018. "AI, ML, DL." Accessed 2018-04-14.
  18. Project ARM. 2017. "Image Recognition: a short history and all you need to know about it." Project ARM, January 18. Accessed 2018-04-13.
  19. Raschka, Sebastian. 2017. "3 different types of machine learning." KDnuggets, November. Accessed 2018-04-13.
  20. Ray, Sunil. 2015. "5 Easy questions on Ensemble Modeling everyone should know." Analytics Vidhya, September 30. Accessed 2018-04-08.
  21. Reese, Hope. 2016. "Top 10 AI failures of 2016." TechRepublic, December 2. Accessed 2018-04-13.
  22. Sheth, Aneri. 2017. "History of Machine Learning." Bloombench, August 25. Accessed 2018-04-08.
  23. Tiwari, Satyam Prasad. 2017. "Evolution and History Of Machine Learning." BitsDroid, November 11. Accessed 2018-04-08.
  24. Wikipedia. 2018. "Timeline of machine learning." April 2. Accessed 2018-04-08.

Further Reading

  1. A Machine Learning Visual Tutorial with Examples
  2. Machine Learning for Humans
  3. Top 10 Machine Learning Algorithms for Beginners
  4. 8 Fun Machine Learning Projects for Beginners

Top Contributors

Last update: 2018-04-15 12:44:49 by arvindpdmn
Creation: 2018-04-08 06:00:11 by arvindpdmn

Cite As

Devopedia. 2018. "Machine Learning." Version 12, April 15. Accessed 2018-10-18. https://devopedia.org/machine-learning