• Machine Learning with Data. Source: Desjardins-Proulx 2013.
• The Stanford Cart. Source: Sheth 2017.
• How Machines Learn. Source: Jain 2015.
• Types of Machine Learning. Source: Raschka 2017.
• The place of feature engineering in the machine learning workflow. Source: Casari and Zheng 2018, Fig 1-2.
• Machine Learning data test split process. Source: Bhatia 2017.
• Overfit Underfit. Source: Bhande 2018.
• AI, ML and DL. Source: Pinterest 2018.
• Comparing Bagging and Boosting methods. Source: Aporras 2016.
• ImageNet evolution timeline. Source: Guo et al. 2016.

# Machine Learning

## Summary

Machine Learning (ML) gives a machine or algorithm the capability to learn the permutations and combinations of a given circumstance and react appropriately. There is uncertainty in the circumstance and hence in the reaction. This is the unknown that the machine has to learn. Machines learn from vast amounts of historical circumstances and reactions provided to them in machine-readable format. We simply call this data.

Machines aim to maximize the desired outcome. Statistical modelling is highly contextual and comes with assumptions. For instance, the math behind linear regression is very different from the math behind logistic regression. Machine learning generalizes them under supervised learning and optimizes for minimum error. Typically, machines use numerical methods to iteratively solve for unknown parameters.
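To make the idea of iteratively solving for unknown parameters concrete, here is a minimal sketch of gradient descent fitting a straight line by minimizing mean squared error. The function name `fit_line`, the learning rate, the step count and the toy data are all assumptions chosen for illustration, not part of any particular library.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # unknown parameters, starting from an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative values only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # values close to 2 and 1
```

Each pass nudges the parameters in the direction that lowers the error, which is the self-correcting loop that most supervised learning algorithms share.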

Machine learning, due to its holistic approach, can solve a broad set of problems such as image identification, text generation, speech recognition, etc.

## Milestones

1950

Alan Turing creates the Turing Test in which a computer must attempt to pass itself off as a human to other humans. In June 2014, a chatbot named Eugene passes this test by convincing 33% of the judges. A more difficult variant called Loebner Prize requires that more than 50% of the judges be convinced after a 25-min conversation. As of March 2018, no robot has won the prize.

1952

Arthur Samuel writes the first learning program. Applied to the game of checkers, the program is able to learn from mistakes and improve its gameplay with each new game. By the mid-1970s, the program beats humans at checkers. Board games are useful in developing ML because they are understandable yet complex.

1957

Inspired by the way the human brain is composed of interconnected neurons, Frank Rosenblatt designs the first artificial neural network, called the perceptron. The idea is to solve complex problems through a series of simple decisions. Rosenblatt applies it to image recognition.

1967

The Nearest Neighbour algorithm is created and applied to map routing. This starts the field of pattern recognition.

1979

Invented by researchers at Stanford University, a robot now named the Stanford Cart is able to navigate obstacles in a room on its own.

1981

Gerald DeJong invents Explanation Based Learning. A computer uses data to train itself and create a rule to achieve a given goal. It discards information irrelevant to the problem. This is a type of supervised learning. In general, the 1980s is the decade of expert systems that are based on rules.

1990

The 1990s is the decade when the approach to ML shifts from being knowledge-driven to data-driven. This shift is supported through the next two decades by greater availability of data, cloud computing and big data technologies.

2006

Geoffrey Hinton coins the term Deep Learning (DL) to describe new architectures of neural networks. This approach is applied to image recognition.

2012

The Google Brain project uses DL to detect visual patterns. A Google X project applies Google Brain to YouTube videos to identify frames that contain cats. Geoffrey Hinton leads a team that wins ImageNet's computer vision contest by a large margin. This popularizes DL. In the coming years, DL becomes an important technique to create models with much better accuracy. This is the decade when DL becomes feasible.

2016

AlphaGo, developed by Google's DeepMind, uses ML to beat professional player Lee Sedol in a challenging board game called Go.

2017

Google Brain chief Jeff Dean states that DL starts to work with at least 100,000 data points. This underscores the importance of data availability for DL.

## Discussion

• How do machines learn?

Traditionally, intelligence was introduced into a system explicitly using rules. Rules took the form of "if this happens while in this state, do that". These rules are derived from a knowledge base that's particular to that domain or application. However, such a rule-based system has limitations. To characterize the system completely, there could potentially be hundreds of rules. Moreover, rules come with exceptions that need to be considered as well. This is clearly not manageable for complex systems.

Machine Learning takes a different approach. Instead of working on pre-defined rules, machines look at large amounts of data. For each data point, they take note of the associated response. They do this for a sufficient amount of data and thereby implicitly learn the rules. These implicit rules can be described in terms of features and outcomes.

For machines to learn properly, relevant and wide-ranging data should be made available. Data should cover as many scenarios as possible. Data is typically split into a training dataset and a testing dataset. Machines learn from the former. The latter is used exclusively to validate the model. The learning process is not linear. It's self-correcting and iterative.

• What are the different Machine Learning types?

Learning takes place based on what worked through historical events (asynchronous learning) and on what is accepted in contemporary events (synchronous learning).

When a machine is trained using historical data, the learning can be classified as Supervised or Unsupervised learning:

• Supervised Learning uses a self-correcting feedback loop. The expected outcome is labelled. For instance, temperature, moisture and humidity (called features) can be used to predict the chance of rain in the next 24 hours. Historic data that include temperature, moisture and humidity are recorded and labelled as 'Rain' or 'Not Rain' depending on whether or not it rained in the following 24 hours. This is called a Classification problem. The system can also be designed to learn the amount of rain. This is called a Regression problem.
• Unsupervised Learning works without labels. It groups data points using measures of similarity or association. For instance, airline customers can be grouped based on the class they fly, food preferences, frequency of flights, etc.
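The two kinds of learning described above can be sketched in a few lines. The supervised part uses a k-nearest-neighbours vote on labelled weather readings; the unsupervised part uses a basic k-means loop to group unlabelled airline customers. Both algorithms are swapped in here purely for illustration, and the function names, feature choices and data values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Supervised: label a query by majority vote of its k nearest labelled rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, k=2, iterations=10):
    """Unsupervised: group unlabelled points into k clusters by proximity."""
    centroids = list(points[:k])  # deterministic start: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            closest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[closest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters


# Hypothetical labelled readings: (temperature in degrees C, humidity in %)
weather = [
    ((30.0, 40.0), "Not Rain"), ((32.0, 35.0), "Not Rain"), ((28.0, 45.0), "Not Rain"),
    ((22.0, 85.0), "Rain"), ((20.0, 90.0), "Rain"), ((24.0, 80.0), "Rain"),
]
print(knn_predict(weather, (21.0, 88.0)))  # Rain

# Hypothetical unlabelled customers: (flights per year, average fare)
customers = [(2, 150), (40, 900), (3, 180), (38, 950), (1, 120), (45, 1000)]
print(kmeans(customers))
```

Note that the classifier needs the 'Rain' and 'Not Rain' labels to learn from, whereas the clustering step discovers the occasional and frequent flyer groups without being told they exist.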

AI systems use synchronous learning to reward right decisions and penalise wrong ones, and thereby prevent future mishaps. This is Reinforcement Learning. It is applied during the decision process itself, with feedback arriving as the result of a series of actions.

• What is feature engineering and why is it important?

A dataset will typically contain one or more variables or features. Some of these may influence the outcome. For example, temperature and humidity may be features that influence the chance of rain in the next 6 hours. The data may also contain the time of day or day of the week but these are features that probably don't influence the chance of rain. The job of an ML engineer is therefore to identify the right features for the problem. The selected features together determine the model's outcome. The accuracy of the ML model directly depends on the features the ML engineer has chosen.

Feature engineering is the first task an ML engineer has to do once data is cleaned and transformed. It is the process of arriving at relevant variables that relate to solving the problem at hand. Feature engineering is done by domain experts who understand what each variable means, how to interpret it and how it relates to other variables.
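One simple, hedged way to get a first impression of which features matter is to rank them by how strongly they correlate with the outcome. The sketch below computes the Pearson correlation by hand; the observations are made up, and correlation is only one heuristic among many, so it supplements rather than replaces domain knowledge.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations; rained is 1 if it rained afterwards, else 0
features = {
    "humidity": [85, 90, 40, 35, 80, 45, 88, 38],
    "temperature": [22, 20, 30, 32, 24, 28, 21, 31],
    "day_of_week": [1, 2, 3, 4, 5, 6, 7, 1],
}
rained = [1, 1, 0, 0, 1, 0, 1, 0]

# Rank features by the strength (absolute value) of their correlation with the outcome
ranking = sorted(features, key=lambda name: abs(pearson(features[name], rained)), reverse=True)
for name in ranking:
    print(name, round(pearson(features[name], rained), 2))
```

In this toy data, humidity and temperature correlate strongly with rain while the day of the week barely does, which matches the intuition in the text.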

• How does ML add value to Big Data?

The biggest strength of ML lies in heterogeneous datasets that capture diverse scenarios. This enables efficient and holistic learning. This is the very reason why a huge dataset is a blessing for ML. Today's data coming from mobile and web sources includes video, audio, image and text. Problems that rely on such data can be modelled better with the help of ML.

• What kind of problems can be solved with ML?

Broadly, the following problems are solved with ML:

• Regression: This is the task of predicting a continuous quantity. Here, predictions are often made for quantities, such as amounts and sizes. For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.
• Classification: This is the task of predicting a discrete class label. For example, (1) an email can be classified as belonging to one of two classes, 'spam' and 'not spam'; (2) an image can be classified into one of possibly thousands of classes (cat, dog, fish, car, etc.).
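Because the two problem types predict different kinds of values, they are usually scored differently. The sketch below assumes mean squared error for regression and plain accuracy for classification; these are common choices rather than the only ones, and all numbers are invented.

```python
def mean_squared_error(actual, predicted):
    """Average squared gap between continuous predictions and true values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Fraction of discrete labels predicted exactly right."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Regression: house prices in dollars (made-up values)
actual_prices = [150_000, 180_000, 120_000]
predicted_prices = [155_000, 170_000, 125_000]
print(mean_squared_error(actual_prices, predicted_prices))  # 50000000.0

# Classification: email labels (made-up values)
actual_labels = ["spam", "not spam", "spam", "not spam"]
predicted_labels = ["spam", "not spam", "not spam", "not spam"]
print(accuracy(actual_labels, predicted_labels))  # 0.75
```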

• What's the approach to solving ML problems?

In a typical ML pipeline we would classify the problem, gather data, process data, model the problem, execute the models, validate results, and deploy the solution.

Once you have defined the problem and outlined the features, you need to split the data in a way that's easy to test. A common choice is a 70 (train) : 30 (test) ratio: the machine learns from 70% of the data and its learning is tested on the remaining 30%. A model is fit on the training data. This model is then validated with the testing dataset and compared against other candidate models to find the best one.
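The 70:30 split can be sketched with the standard library alone. The helper name `train_test_split`, the fixed seed and the stand-in rows are assumptions for illustration; shuffling before splitting is a common precaution so that any ordering in the raw data does not leak into one side of the split.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for 100 data points
train, test = train_test_split(rows)
print(len(train), len(test))  # 70 30
```

Every row lands in exactly one of the two sets, so the test set stays unseen during training.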

The idea of splitting data into training and test datasets can be traced to the Common Task Framework.

It's important to have an acceptable accuracy percentage (say, 60%+) across both training and testing datasets. If the accuracy rate isn't high enough or not consistent across the two datasets, then the ML process should be repeated with different or modified features.

• What is overfitting in the context of ML?

Often we read too much into the past and are then surprised when history doesn't repeat itself. This could happen for two reasons:

• Response that is specific to one particular circumstance
• Too little data

When this happens in ML, we call it overfitting. The possibility of overfitting exists because the criterion used for selecting the model may not be the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability must be determined by its ability to perform well on unseen data. We can state that overfitting occurs when a model has memorized training data rather than learned to generalize from a trend.
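The difference between memorizing and generalizing can be shown with two toy models. One simply looks up training rows, the other applies a single threshold; both the data and the threshold of 60 are hypothetical and chosen only to make the contrast visible.

```python
def accuracy_of(model, rows):
    """Fraction of rows for which the model returns the correct label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical data: humidity (%) paired with what happened afterwards
train = [(85, "Rain"), (90, "Rain"), (40, "Not Rain"),
         (35, "Not Rain"), (80, "Rain"), (45, "Not Rain")]
test = [(88, "Rain"), (38, "Not Rain"), (82, "Rain"), (42, "Not Rain")]

# Model A memorizes the training rows; unseen inputs get an arbitrary default
lookup = dict(train)


def memorizer(humidity):
    return lookup.get(humidity, "Not Rain")


# Model B captures the trend with one threshold
def threshold_rule(humidity):
    return "Rain" if humidity >= 60 else "Not Rain"


print(accuracy_of(memorizer, train), accuracy_of(memorizer, test))            # 1.0 0.5
print(accuracy_of(threshold_rule, train), accuracy_of(threshold_rule, test))  # 1.0 1.0
```

Both models look perfect on the training data, but only the one that learned the trend holds up on unseen data. This gap between training and testing performance is the usual symptom of overfitting.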

• Could you compare or contrast Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)?

This is better explained through an example. ML is about learning a task. For instance, a self-driving car learns many tasks: to brake or not to brake, speed up or slow down, turn the steering wheel, indicator functions, etc. While ML learns all these tasks separately, AI executes them in a coordinated manner, rewards good decisions, and penalises wrong decisions. Thus, AI coordinates across ML tasks and also applies a feedback loop to do better in the future. AI also accounts for contextual information that may not be part of ML.

DL is a special case of ML. While ML learns in a single stage, DL does so in multiple stages. When problems are complex, DL does better than ML. For example, recognizing a human may involve identifying basic features (eyes, ears, hands, legs, etc.) at stage 1; identifying higher order features (face, upper body, lower body, etc.) at stage 2; and finally calling it out as 'Human' at stage 3.
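The notion of stages can be illustrated with a tiny hand-wired network, assuming step activations and weights set by hand rather than learned. Stage 1 detects two simple features (OR and AND of the inputs) and stage 2 combines them into XOR, something no single stage of this kind can compute on its own. This is only an analogy for the layered feature building described above, not a trained deep network.

```python
def step(value):
    """Threshold activation: fire (1) when the weighted input is positive."""
    return 1 if value > 0 else 0


def xor_network(x1, x2):
    # Stage 1: two simple feature detectors
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are 1
    # Stage 2: combine the simple features into a higher-order one
    return step(h_or - h_and - 0.5)  # "one or the other, but not both"


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```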

• How to improve the accuracy of ML models?

• Ensemble methods are techniques that create multiple models and then combine them to produce better results. For example, a candidate goes through multiple rounds of job interviews. Although a single interviewer might not be able to test the candidate for each required skill and trait, the combined feedback of multiple interviewers usually helps in better assessment of the candidate.
• Bagging (Bootstrap Aggregating) is an ensemble method. First, we create random samples of the training dataset (subsets of the training dataset). We build a classifier for each sample. Finally, results of these multiple classifiers are combined using averaging or majority voting. Bagging helps to reduce the variance error.
• Boosting: This is also an ensemble method. The first predictor starts by classifying the original dataset with equal weight given to each observation. If the first learner predicts classes incorrectly, higher weight is given to the wrongly classified observations for the successive learner. Being an iterative process, it continues to add learners until a limit is reached in the number of models or in accuracy. Boosting has often shown better predictive accuracy than bagging, but it tends to overfit the training data.
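The bagging steps above (bootstrap samples, one classifier per sample, majority vote) can be sketched as follows. The base classifier here is a one-feature threshold rule, often called a decision stump; that choice, the function names, the number of models and the data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(sample):
    """Pick the threshold that misclassifies the fewest rows of the sample."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        # The stump predicts True when the feature is at or above the threshold
        errors = sum((x >= threshold) != label for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample of the training rows."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(rows) rows with replacement
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the stumps by majority vote."""
    votes = Counter(x >= threshold for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical rows: (humidity in %, whether it rained)
rows = [(35, False), (38, False), (40, False), (45, False),
        (80, True), (85, True), (88, True), (90, True)]
models = bagging_fit(rows)
print(bagging_predict(models, 95), bagging_predict(models, 30))  # True False
```

Individual stumps differ because each sees a different resample of the data; voting across them smooths out those differences, which is the variance reduction mentioned above.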

• In what scenarios is ML not applicable or has it failed?

ML is applied in diverse fields where plenty of data is available. There are scenarios where ML has challenges, but constant endeavor ensures improved accuracy and increased acceptability. In the accompanying figure we can see how ImageNet ML algorithms have evolved for better accuracy.

Failure of ML can be attributed to incorrect problem formulation, wrong choice of features or inappropriate algorithms.

ML algorithms can become biased for various reasons. For example, an algorithm that sees only men writing code and only women in the kitchen in its training data will naturally become biased. In a real-world case of bias, Google Allo once responded with a turban emoji when shown a gun emoji. Google Translate showed gender bias in Turkish-English translations. Amazon's AI-based recruiting tool was found to favour male candidates.

Other failures of AI/ML happened in 2018. Uber's self-driving car killed a pedestrian. IBM's Watson AI Health has failed to impress doctors.

## Top Contributors

Last update: 2019-01-21 16:10:40 by gurumoorthyP
Creation: 2018-04-08 06:00:11 by arvindpdmn


## Cite As

Devopedia. 2019. "Machine Learning." Version 13, January 21. Accessed 2019-04-26. https://devopedia.org/machine-learning