# Machine Learning

Machine Learning is a data-oriented technique that enables computers to learn from experience. Human experience comes from our interaction with the environment. For computers, experience is indirect. It's based on data collected from the world, data about the world.

What we call data is really a vast record of historical circumstances and reactions in a machine-readable format. The machine (computer) learns the permutations and combinations of a given circumstance and reacts appropriately. There is uncertainty in the circumstance and hence in the reaction. This is the unknown that the machine has to learn. Machines aim to maximize the desired outcome.

Machine learning, due to its holistic approach, can solve a broad set of problems such as object detection, voice recognition, computational biology, price forecasting, and more.

## Discussion

• How do machines learn?

Traditionally, intelligence was introduced into a system explicitly using rules. Rules took the form of "if this happens while in this state, do that". These rules are derived from a knowledge base that's particular to that domain or application. However, such a rule-based system has limitations. To characterize the system completely, there could be potentially hundreds of rules. Moreover, rules come with exceptions that need to be considered as well. This is clearly not manageable for complex systems.

Machine Learning takes a different approach. Instead of working on pre-defined rules, machines look at large amounts of data. For each data point, they take note of the associated response. They do this for a sufficient amount of data and thereby implicitly learn the rules. These implicit rules can be described in terms of features and outcomes.
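
As a rough illustration of learning a rule from data rather than hand-coding it, the sketch below picks a humidity threshold for predicting rain from a handful of labelled observations. The data values and the function name `learn_threshold` are hypothetical, chosen only for this example. Real systems use many features and far more data.

```python
# Hypothetical toy data: (humidity in %, did it rain afterwards?)
observations = [
    (30, False), (45, False), (55, False), (62, True),
    (70, True), (85, True), (58, True), (50, False),
]

def learn_threshold(data):
    """Return the humidity threshold that misclassifies the fewest examples."""
    best_threshold, best_errors = None, len(data) + 1
    # Try every observed humidity value as a candidate rule "rain if humidity >= t"
    for t in sorted({h for h, _ in data}):
        errors = sum((h >= t) != rained for h, rained in data)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors

threshold, errors = learn_threshold(observations)
print(f"Learned rule: rain if humidity >= {threshold} ({errors} training errors)")
```

Nobody wrote the rule down in advance. It was derived from the features (humidity) and the outcomes (rain or no rain) in the data.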

For machines to learn properly, relevant and wide-ranging data should be made available. Ideally, the data should cover all scenarios the machine is expected to handle. Data is typically split into a training dataset and a testing dataset. Machines learn from the former. The latter is used exclusively to validate the model. The learning process is not linear. It's self-correcting and iterative.

• What are the different Machine Learning types?

Learning takes place based on what worked through historical events (asynchronous learning) and on what is accepted in contemporary events (synchronous learning).

When a machine is trained using historical data, the learning can be classified as Supervised or Unsupervised:

• Supervised Learning uses a self-correcting feedback loop. The expected outcome is labelled. For instance, Temperature, Moisture and Humidity (called features) can be used to predict the chance of rain in the next 24 hours. Historic data that include Temperature, Moisture and Humidity are recorded and labelled as 'Rain' or 'Not Rain' depending on whether or not it rained in the following 24 hours. This is called a Classification problem. The system can also be designed to learn the amount of rain. This is called a Regression problem.
• Unsupervised Learning becomes useful when labelled datasets are not available. Without explicit instructions, the model attempts to find structure in the data. Clustering, anomaly detection, association and autoencoders are different ways to organize data.
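
The rain example can be sketched with a k-nearest-neighbours model, which is one simple supervised technique among many. The `history` table, the `today` reading and the choice of k = 3 are assumptions made for illustration. The same labelled data supports both classification (rain or not) and regression (how much rain).

```python
import math
from collections import Counter

# Hypothetical labelled history:
# (temperature in C, moisture in %, humidity in %), label, rainfall in mm
history = [
    ((30.0, 20.0, 35.0), "Not Rain", 0.0),
    ((28.0, 25.0, 40.0), "Not Rain", 0.0),
    ((22.0, 60.0, 80.0), "Rain", 12.0),
    ((20.0, 65.0, 85.0), "Rain", 18.0),
    ((24.0, 55.0, 75.0), "Rain", 6.0),
    ((32.0, 15.0, 30.0), "Not Rain", 0.0),
]

def nearest(features, k=3):
    """Return the k historical rows whose features are closest to the query."""
    return sorted(history, key=lambda row: math.dist(row[0], features))[:k]

def classify(features, k=3):
    """Classification: majority label among the k nearest rows."""
    votes = Counter(label for _, label, _ in nearest(features, k))
    return votes.most_common(1)[0][0]

def predict_amount(features, k=3):
    """Regression: mean rainfall among the k nearest rows."""
    return sum(mm for _, _, mm in nearest(features, k)) / k

today = (21.0, 62.0, 82.0)  # hypothetical reading
print(classify(today), predict_amount(today))
```

In practice the features would be scaled to comparable ranges before computing distances. That step is left out here to keep the sketch short.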

AI systems use synchronous learning to reward right decisions and penalise wrong ones, and thereby prevent future mishaps. This is Reinforcement Learning. It's applied concurrently with the decision process, with feedback arriving as the result of a series of actions.

• How does ML differ from regression modelling?

In terms of purpose, ML is used for prediction whereas regression modelling is used to infer relationships between variables. But this oversimplifies the truth. Regression can also be used for predictions. In general, statistical models are easier to interpret. They use all of the data towards building the model, whereas ML partitions the dataset into training and test sets.

Linear regression comes from statistics and may be considered "too simple" to be treated as an ML approach. However, ridge and lasso regression are derived from linear regression, and these are commonly used by ML researchers. It's therefore sensible to include regression in an ML toolbox. In fact, understanding data should be the main goal and both statistical and ML models should be seen as tools.

Statistical modelling is highly contextual and comes with assumptions. For instance, the math behind linear regression is very different from the math behind logistic regression. Machine learning generalizes them under supervised learning and optimizes for minimum error. ML algorithms typically use numerical techniques to iteratively solve for unknown parameters.
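
To make the idea of iteratively solving for unknown parameters concrete, here is a minimal sketch of gradient descent fitting a straight line. An optional L2 penalty on the slope turns it into a simple form of ridge regression. The data, learning rate and step count are assumed values picked so that the example converges. This is not the only way to fit such models, since linear regression also has a closed-form solution.

```python
def fit_linear(xs, ys, l2=0.0, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error.

    l2 is an optional ridge penalty applied to the slope w only.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) + l2 * w^2
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n + 2 * l2 * w
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear(xs, ys)
w_ridge, b_ridge = fit_linear(xs, ys, l2=1.0)
print(f"plain: w={w:.3f}, b={b:.3f}; ridge: w={w_ridge:.3f}, b={b_ridge:.3f}")
```

The unpenalized fit recovers the slope and intercept used to generate the data. The ridge penalty shrinks the slope towards zero, which is the kind of regularization that makes these variants popular in ML.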

• What is feature engineering and why is it important?

Feature engineering comes after data is cleaned and transformed. Feature engineering is the process of arriving at relevant variables that help solve the problem at hand. It's done by domain experts who understand what each variable means, how to interpret it and how it relates to other variables.

A dataset will typically contain one or more variables or features. Some of these may influence the outcome. For example, temperature and humidity may be features that influence the chance of rain in the next six hours. The data may also contain the time of day or day of the week but these are features that probably don't influence the chance of rain.

The job of an ML engineer is therefore to identify the right features for the problem. The selected features together determine the outcome of the model. The accuracy of the ML model depends directly on the features the ML engineer has chosen. Good features make modelling a simpler task. Bad features would result in a complicated model.
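
One simple way to screen candidate features, shown here only as a sketch, is to check how strongly each one correlates with the outcome. The `records` values, the 0.5 cut-off and the helper `pearson` are assumptions for this example. Correlation screening is a crude heuristic that misses non-linear effects and is no substitute for domain knowledge.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical candidate features, one value per observation
records = {
    "humidity": [35, 40, 80, 85, 75, 30, 90, 45],
    "temperature": [30, 28, 22, 20, 24, 32, 19, 27],
    "day_of_week": [3, 6, 1, 7, 4, 2, 5, 5],
}
rained = [0, 0, 1, 1, 1, 0, 1, 0]  # hypothetical outcome: 1 = rain, 0 = no rain

# Score each feature by the strength of its linear relationship with the outcome
scores = {name: abs(pearson(values, rained)) for name, values in records.items()}
selected = [name for name, score in scores.items() if score >= 0.5]
print(selected)
```

With this toy data, humidity and temperature pass the screen while day of the week does not, which matches the intuition in the rain example.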

• How does ML add value to Big Data?

A Forrester report from 2016 showed that 60-73% of all big data within enterprises is not used for analytics. In 2018, a report by DataRPM on the Industrial Internet-of-Things (IIoT) showed that 85% of sensor data collected from trillions of data points goes unused. Manual analysis at this scale is impractical. The use of ML can unlock the value in this data.

Today's data coming from mobile and web includes video, audio, image and text. In addition, lots of data come from sensors in automotive and healthcare verticals. Historic data in financial services or retail help ML algorithms discover patterns. Problems that rely on such data can be modelled better with the help of ML.

In fact, ML benefits from large volumes and a wide variety of data. In an ideal case, ML has access to lots of data collected in diverse scenarios. This enables efficient and holistic learning.

• What kind of problems can be solved with ML?

Broadly, the following problems are solved with ML:

• Regression: This is the task of predicting a continuous quantity. Here, predictions are often made for quantities, such as amounts and sizes. For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.
• Classification: This is the task of predicting a discrete class label. For example, an email can be classified as belonging to one of two classes, 'spam' and 'not spam'. In image classification problems, there could be thousands of classes (cat, dog, fish, car, etc.).

• What's the approach to solving ML Problems?

In a typical ML pipeline we would classify the problem, gather data, process data, model the problem, execute the models, validate results, and deploy the solution.

Once you have defined the problem and outlined the features, you then need to split the data in a way that's easy to test. A common choice is a 70:30 ratio, where the machine learns from 70% of the data (training) and its learning is tested on the remaining 30% (testing). A model is built from the training data. This model is then validated with the testing dataset and compared against other candidate models to find the best one.
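
The split-and-compare step can be sketched as follows. The dataset, the two toy models (`fit_majority` and `fit_threshold`) and the fixed random seed are all assumptions for illustration. A real project would typically rely on an ML library for these steps.

```python
import random

# Hypothetical dataset: (humidity in %, rained?), with rain whenever humidity >= 60
rows = [(h, h >= 60) for h in range(100)]

def train_test_split(data, train_fraction=0.7, seed=42):
    """Shuffle a copy of the data and cut it into training and testing parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_majority(train):
    """Baseline model: always predict the most common training label."""
    majority = sum(label for _, label in train) * 2 >= len(train)
    return lambda h: majority

def fit_threshold(train):
    """Model: 'rain if humidity >= t', with t chosen to minimize training errors."""
    candidates = sorted({h for h, _ in train})
    best = min(candidates, key=lambda t: sum((h >= t) != label for h, label in train))
    return lambda h: h >= best

def accuracy(model, data):
    return sum(model(h) == label for h, label in data) / len(data)

train, test = train_test_split(rows)
results = {}
for name, fit in [("majority", fit_majority), ("threshold", fit_threshold)]:
    model = fit(train)  # learn from the 70% only
    results[name] = (accuracy(model, train), accuracy(model, test))

# Pick the model that does best on data it has not seen
best_name = max(results, key=lambda name: results[name][1])
print(results, best_name)
```

The testing rows are never shown to `fit`, so the second number in each result is an honest estimate of how the model behaves on new data.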

The idea of splitting data into training and test datasets can be traced to the Common Task Framework.

It's important to have an acceptable accuracy percentage (say, 60%+) across both training and testing datasets. If the accuracy rate isn't high enough or not consistent across the two datasets, then the ML process should be repeated with different or modified features.

• What is overfitting in the context of ML?

Often we read too much into the past. We're surprised to see that history didn't repeat itself. This could happen either because the response is specific to one particular circumstance or because there's too little data. When this happens in ML, we call it overfitting.

The possibility of overfitting exists because the criterion used for selecting the model may not be the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability must be determined by its ability to perform well on unseen data. We can state that overfitting occurs when a model has memorized training data rather than learned to generalize from a trend.
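
A small numeric sketch can show the gap between training performance and performance on unseen data. Both datasets and the two models below are hypothetical. The "memorizer" simply returns the value of the closest training point, while the line captures the overall trend.

```python
# Hypothetical noisy training data scattered around the trend y = 2x
train = [(0, 3), (2, 1), (4, 11), (6, 9), (8, 19)]
# Hypothetical unseen data lying on the trend
test = [(0.5, 1), (2.5, 5), (4.5, 9), (6.5, 13)]

def fit_memorizer(data):
    """Predict the y of the closest training x: reproduces the training data exactly."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def fit_line(data):
    """Ordinary least squares for a straight line y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

memorizer, line = fit_memorizer(train), fit_line(train)
for name, model in [("memorizer", memorizer), ("line", line)]:
    print(f"{name}: train MSE {mse(model, train):.2f}, test MSE {mse(model, test):.2f}")
```

The memorizer has zero error on the training data but a much larger error on the unseen points than the line does. Selecting a model on training performance alone would pick the wrong one here.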

• Could you compare or contrast Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)?

This is better explained through an example. ML is about learning a task. For instance, a self-driving car learns many tasks: to brake or not to brake, speed up or slow down, turn the steering wheel, indicator functions, etc. While ML learns all these tasks separately, AI executes them in a coordinated manner, rewards good decisions, and penalises wrong decisions. AI also accounts for contextual information that may not be part of ML. Thus, ML is an approach to realizing AI.

DL is a special case of ML and it's inspired by how neurons in the human brain process information. While ML learns once, DL does it in multiple stages. ML expects features to be provided at the input. DL discovers features on its own. When problems are complex, DL does better than ML. For example, recognizing a human may involve identifying basic features (eyes, ears, hands, legs, etc.) in stage 1; identifying higher order features (face, upper body, lower body, etc.) in stage 2; and finally calling it out as 'Human' in stage 3.

• How can we improve the accuracy of ML models?

• Ensemble methods are techniques that create multiple models and then combine them to produce better results. As an analogy, a candidate goes through multiple rounds of job interviews. Although a single interviewer might not be able to test the candidate for each required skill and trait, the combined feedback of multiple interviewers usually gives a better assessment of the candidate.
• Bagging (Bootstrap Aggregating) is an ensemble method. First, we create random samples of the training dataset (subsets of the training dataset, typically drawn with replacement). We build a classifier for each sample. Finally, the results of these multiple classifiers are combined using averaging or majority voting. Bagging helps to reduce the variance error.
• Boosting is another ensemble method. The first predictor starts by classifying the original dataset with equal weights given to each observation. If the first learner predicts some classes incorrectly, higher weights are given to the wrongly classified observations for the next learner. Being an iterative process, it continues to add learners until a limit is reached in the number of models or in accuracy. Boosting has shown better predictive accuracy than bagging, but it also tends to overfit the training data.
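
The bagging steps described above can be sketched in a few lines. The training data, the one-rule base classifier `fit_stump`, the number of models and the seed are all assumptions chosen for illustration. Any classifier could serve as the base learner.

```python
import random
from collections import Counter

# Hypothetical training data: (humidity in %, rained?), with rain whenever humidity >= 60
training = [(h, h >= 60) for h in range(0, 100, 2)]

def fit_stump(sample):
    """Base classifier: the humidity threshold with the fewest errors on the sample."""
    candidates = sorted({h for h, _ in sample})
    threshold = min(candidates, key=lambda t: sum((h >= t) != label for h, label in sample))
    return lambda h: h >= threshold

def fit_bagging(data, n_models=25, seed=0):
    """Train one base classifier per bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement
        sample = rng.choices(data, k=len(data))
        models.append(fit_stump(sample))
    return models

def bagged_predict(models, humidity):
    """Combine the individual predictions by majority vote."""
    votes = Counter(model(humidity) for model in models)
    return votes.most_common(1)[0][0]

models = fit_bagging(training)
print(bagged_predict(models, 95), bagged_predict(models, 10))
```

Each base classifier sees a slightly different sample and so learns a slightly different threshold. Voting averages out these differences, which is where the reduction in variance comes from.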

• In what scenarios is ML not applicable or has it failed?

ML is applied in diverse fields where plenty of data is available. There are scenarios where ML faces challenges, but continued effort has brought improved accuracy and increased acceptability. In the accompanying figure we can see how ML algorithms applied to ImageNet have evolved towards better accuracy.

Failure of ML can be attributed to incorrect problem formulation, wrong choice of features or inappropriate algorithms.

ML algorithms can become biased for various reasons. For example, an algorithm that sees only men writing code and only women in the kitchen in its training data will naturally become biased. In a real-world case of bias, Google Allo once responded with a turban emoji when shown a gun emoji. Google Translate showed gender bias in Turkish-English translations. Amazon's AI-based recruiting tool was found to favour male candidates.

Other failures of AI/ML happened in 2018. Uber's self-driving car killed a pedestrian. IBM's Watson Health failed to impress doctors.

## Milestones

1950

Alan Turing creates the Turing Test in which a computer must attempt to pass itself off as a human to other humans. In June 2014, a chatbot named Eugene Goostman is reported to pass this test by convincing 33% of the judges. A more difficult variant called the Loebner Prize requires that more than 50% of the judges be convinced after a 25-min conversation. As of March 2018, no robot has won the prize.

1952

Arthur Samuel writes the first learning program. Applied to the game of checkers, the program is able to learn from mistakes and improve its gameplay with each new game. By the mid-1970s, the program beats humans at checkers. Board games are useful in developing ML because they are understandable yet complex.

1957

Inspired by how the human brain is composed of interconnected neurons, Frank Rosenblatt designs the first artificial neural network, called the perceptron. The idea is to solve complex problems through a series of simple decisions. Rosenblatt applies it to image recognition.

1967

The Nearest Neighbour algorithm is created and applied to map routing. This starts the field of pattern recognition.

1979

Built by researchers at Stanford University, a robot now known as the Stanford Cart is able to navigate obstacles in a room on its own.

1981

Gerald DeJong invents Explanation Based Learning. The computer uses data to train itself and creates a rule to achieve a given goal. It discards information irrelevant to the problem. This is a type of supervised learning. In general, the 1980s is the decade of expert systems that are based on rules.

1990

The 1990s is the decade when the approach to ML shifts from being knowledge driven to data driven. This is supported through the next two decades with greater availability of data, cloud computing and big data technologies.

2006

Geoffrey Hinton coins the term Deep Learning (DL) to describe new architectures of neural networks. This approach is applied to image recognition.

2012

The Google Brain project uses DL to detect visual patterns. The Google X project applies Google Brain to YouTube videos to identify frames that contain cats. Geoffrey Hinton leads a team that wins ImageNet's computer vision contest by a large margin. This popularizes DL. In the coming years, DL becomes an important technique to create models with much better accuracy. This is the decade when DL becomes feasible.

2016

Google DeepMind's AlphaGo uses ML to beat professional player Lee Sedol in a challenging board game called Go.

2017

Google Brain chief Jeff Dean states that DL starts to work with at least 100,000 data points. This underscores the importance of data availability for DL.


## Cite As

Devopedia. 2020. "Machine Learning." Version 18, December 31. Accessed 2022-09-22. https://devopedia.org/machine-learning
Contributed by
5 authors

Last updated on
2020-12-31 07:15:06