Machine Learning is a data-oriented technique that enables computers to learn from experience. Human experience comes from our interaction with the environment. For computers, experience is indirect. It's based on data collected from the world, data about the world.
What we call data is really vast amounts of historical circumstances and reactions in a machine-readable format. The machine (computer) learns which reaction suits a given combination of circumstances and reacts appropriately. There is uncertainty in the circumstances and hence in the reaction. This is the unknown that the machine has to learn. Machines aim to maximize the desired outcome.
Machine learning, due to its holistic approach, can solve a broad set of problems such as object detection, voice recognition, computational biology, price forecasting, and more.
How do machines learn?
Traditionally, intelligence was introduced into a system explicitly using rules. Rules took the form of "if this happens while in this state, do that". These rules are derived from a knowledge base that's particular to that domain or application. However, such a rule-based system has limitations. To characterize the system completely, there could be potentially hundreds of rules. Moreover, rules come with exceptions that need to be considered as well. This is clearly not manageable for complex systems.
Machine Learning takes a different approach. Instead of working on pre-defined rules, machines look at large amounts of data. For each data point, they take note of the associated response. They do this for a sufficient amount of data and thereby implicitly learn the rules. These implicit rules can be described in terms of features and outcomes.
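The contrast between an explicit rule and a rule learned from data can be sketched in a few lines of Python. This is only an illustration: the humidity readings, the expert's cut-off of 70, and the function names are all invented here and do not come from any real system.

```python
# Hypothetical readings: (humidity %, did it rain?). All values are made up.
DATA = [(35, False), (48, False), (55, False), (62, True), (71, True), (90, True)]

def rule_based(humidity):
    """Explicit rule written by hand: 'if humidity is above 70, predict rain'."""
    return humidity > 70

def learn_threshold(data):
    """Learn the rule implicitly: pick the cut-off that misclassifies the fewest points."""
    best_cut, best_errors = None, len(data) + 1
    for cut in sorted(h for h, _ in data):
        # a point is misclassified when the prediction (h >= cut) differs from its label
        errors = sum((h >= cut) != rained for h, rained in data)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut

def accuracy(predict, data):
    """Fraction of data points for which the prediction matches the label."""
    return sum(predict(h) == rained for h, rained in data) / len(data)

learned_cut = learn_threshold(DATA)

def learned(humidity):
    """Rule derived from the data rather than written by hand."""
    return humidity >= learned_cut

print("hand-written rule:", accuracy(rule_based, DATA))
print("learned rule:     ", accuracy(learned, DATA), "with cut-off", learned_cut)
```

On this toy data the hand-written rule misses one rainy day, whereas the learned cut-off adapts to what the data actually shows. A real system would learn from many features and far more data.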
For machines to learn properly, relevant and wide-ranging data should be made available. Ideally, the data should cover as many scenarios as possible. Data is typically split into a training dataset and a testing dataset. Machines learn from the former set. The latter is used exclusively to validate the model. The learning process is not linear. It's self-correcting and iterative.
What are the different Machine Learning types?
Learning takes place based on what worked through historical events (asynchronous learning) and on what is accepted in contemporary events (synchronous learning).
When a machine is trained using historical data, the learning can be classified as Supervised or Unsupervised:
- Supervised Learning uses a self-correcting feedback loop. The expected outcome is labelled. For instance, Temperature, Moisture and Humidity (called features) can be used to predict the chance of rain in the next 24 hours. Historic data that include Temperature, Moisture and Humidity are recorded and labelled as 'Rain' or 'Not Rain' depending on whether or not it rained in the following 24 hours. This is called a Classification problem. The system can also be designed to learn the amount of rain. This is called a Regression problem.
- Unsupervised Learning becomes useful when labelled datasets are not available. Without explicit instructions, the model attempts to find structure in the data. Clustering, anomaly detection, association and autoencoders are different ways to organize data.
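The difference between the two can be sketched with toy weather readings. In this minimal sketch the data values, the nearest-centroid classifier, and the naive k-means initialisation are assumptions chosen for brevity, not a recommended setup.

```python
# Toy readings: (temperature in C, humidity in %). All values are invented.
X = [(30, 40), (32, 35), (31, 45), (22, 85), (20, 90), (21, 80)]
# Labels are available only in the supervised case.
y = ["Not Rain", "Not Rain", "Not Rain", "Rain", "Rain", "Rain"]

def mean_point(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Supervised: labels are given, so we learn one centroid per label.
def fit_nearest_centroid(X, y):
    return {label: mean_point([x for x, l in zip(X, y) if l == label])
            for label in set(y)}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

# Unsupervised: no labels, so k-means has to discover the groups itself.
def kmeans(X, k, iterations=10):
    centroids = list(X[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda i: sq_dist(centroids[i], x))
            clusters[nearest].append(x)
        # keep the old centroid if a cluster ends up empty
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

centroids = fit_nearest_centroid(X, y)
print(predict(centroids, (21, 88)))  # classified using the learned labels
print(kmeans(X, 2))                  # two groups found without any labels
```

The supervised model can name its prediction ('Rain') because it saw labels during training. The clustering only reports that two groups exist, and interpreting them is left to the analyst.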
AI systems use synchronous learning to reward right decisions, penalise wrong ones, and prevent future mishaps. Reinforcement Learning is applied concurrently within the decision process, as a result of a series of actions.
How does ML differ from regression modelling?
By purpose, ML is for predictions whereas regression is to infer relationships between variables. But this is oversimplifying the truth. Regression can also be used for predictions. In general, statistical models are easier to interpret. They use all of the data towards building the model, whereas ML partitions the dataset into training and test sets.
Linear regression comes from statistics and may be considered "too simple" to be treated as an ML approach. However, ridge and lasso regression are derived from linear regression, and these are commonly used by ML researchers. It's therefore sensible to include regression in an ML toolbox. In fact, understanding data should be the main goal and both statistical and ML models should be seen as tools.
Statistical modelling is highly contextual and comes with assumptions. For instance, the math behind linear regression is very different from the math behind logistic regression. Machine learning generalizes them under supervised learning and optimizes for minimum error. All machines use numerical math techniques to iteratively solve for unknown parameters.
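As an illustration of solving iteratively for unknown parameters, the sketch below fits a straight line by gradient descent on the mean squared error. The data, learning rate and number of epochs are assumptions picked so that this small example converges. They are not general recommendations.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(epochs):
        # prediction error for every data point under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # gradients of the mean squared error with respect to w and b
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # step against the gradient to reduce the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1, without noise
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

For plain linear regression a closed-form solution also exists. The iterative form is shown because the same loop carries over to models where no closed form is available.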
What is feature engineering and why is it important?
Feature engineering comes after data is cleaned and transformed. Feature engineering is arriving at relevant variables that relate to solving the problem at hand. Feature engineering is done by domain experts who understand what each variable means, how to interpret it and how it relates to other variables.
A dataset will typically contain one or more variables or features. Some of these may influence the outcome. For example, temperature and humidity may be features that influence the chance of rain in the next six hours. The data may also contain the time of day or day of the week but these are features that probably don't influence the chance of rain.
The job of an ML engineer is therefore to identify the right features for the problem. The selected features together determine the outcome of the model. The accuracy of the ML model directly depends on the features the ML engineer has chosen. Good features make modelling a simpler task. Bad features would result in a complicated model.
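One simple heuristic for checking whether a feature relates to the outcome is its correlation with that outcome. The sketch below ranks features by absolute Pearson correlation. The observations and the names `records` and `rank_features` are invented for illustration, and correlation is only one of several possible checks. It does not replace domain expertise.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented observations; 'rain_mm' is the outcome we want to predict.
records = {
    "humidity":    [40, 55, 60, 75, 85, 95],
    "temperature": [33, 30, 29, 25, 23, 20],
    "day_of_week": [3, 6, 1, 4, 2, 5],
    "rain_mm":     [0, 1, 2, 6, 9, 14],
}

def rank_features(records, target):
    """Order features from most to least correlated with the target."""
    scores = {name: abs(pearson(values, records[target]))
              for name, values in records.items() if name != target}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_features(records, "rain_mm")
print(ranking)
```

In this made-up dataset, humidity and temperature correlate strongly with rainfall while the day of the week ranks last, which matches the intuition described above.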
How does ML add value to Big Data?
A Forrester report from 2016 showed that 60-73% of all big data within enterprises are not used for analytics. In 2018, a report by DataRPM on the Industrial Internet-of-Things (IIoT) showed that 85% of sensor data collected from trillions of data points are unused. Manual analysis is impossible. The use of ML can unlock the value in this data.
Today's data coming from mobile and the web includes video, audio, image and text. In addition, lots of data come from sensors in automotive and healthcare verticals. Historic data in financial services or retail help ML algorithms discover patterns. Problems that rely on such data can be modelled better with the help of ML.
In fact, ML likes large volumes and variety of data. In an ideal case, ML has access to lots of data collected in diverse scenarios. This enables efficient and holistic learning.
What kind of problems can be solved with ML?
Broadly, the following problems are solved with ML:
- Regression: This is the task of predicting a continuous quantity. Here, predictions are often made for quantities, such as amounts and sizes. For example, a house may be predicted to sell for a specific dollar value, perhaps in the range of $100,000 to $200,000.
- Classification: This is the task of predicting a discrete class label. For example, an email can be classified as belonging to one of two classes, 'spam' and 'not spam'. In image classification problems, there could be thousands of classes (cat, dog, fish, car, etc.).
What's the approach to solving ML Problems?
In a typical ML pipeline we would classify the problem, gather data, process data, model the problem, execute the models, validate results, and deploy the solution.
Once you have defined the problem and outlined the features, you then need to split the data in a way that's easy to test. A typical split is 70 (train) : 30 (test). The machine learns from 70% of the data and its learning is tested on the remaining 30%. A model is fitted to the training data. This model is then validated with the testing dataset and compared against other candidate models to find the best one.
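A 70:30 split can be sketched as follows. The function name `train_test_split`, the fixed seed, and the stand-in data are assumptions made for this example. Shuffling before the cut is a common precaution so that any ordering in the source data does not leak into the split.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 labelled records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Every record lands in exactly one of the two sets, so the model is never tested on data it has already seen during training.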
The idea of splitting data into training and test datasets can be traced to the Common Task Framework.
It's important to have an acceptable accuracy percentage (say, 60%+) across both training and testing datasets. If the accuracy rate isn't high enough or not consistent across the two datasets, then the ML process should be repeated with different or modified features.
What is overfitting in the context of ML?
Often we read too much into the past. We're surprised to see that history didn't repeat itself. This could happen either because a response is specific to one particular circumstance or because there's too little data. When this happens in ML, we call it overfitting.
The possibility of overfitting exists because the criterion used for selecting the model may not be the same as the criterion used to judge the suitability of a model. For example, a model might be selected by maximizing its performance on some set of training data, and yet its suitability must be determined by its ability to perform well on unseen data. We can state that overfitting occurs when a model has memorized training data rather than learned to generalize from a trend.
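The gap between memorizing and generalizing can be shown on a toy problem. In the sketch below, a 1-nearest-neighbour "memorizer" is compared with a single learned cut-off. The data is invented, and two training labels are deliberately noisy to mimic circumstances that will not repeat.

```python
# Invented data: (humidity %, rained?). Two training labels are deliberately noisy.
TRAIN = [(40, False), (50, False), (58, True), (60, False), (65, True),
         (70, True), (72, False), (80, True), (90, True)]
TEST = [(44, False), (57, False), (62, False), (66, True), (73, True), (84, True)]

def memorizer(train):
    """1-nearest-neighbour: answer with the label of the closest training point."""
    def predict(h):
        return min(train, key=lambda row: abs(row[0] - h))[1]
    return predict

def threshold_model(train):
    """Much simpler model: one cut-off chosen to minimise training errors."""
    def errors(cut):
        return sum((h >= cut) != rained for h, rained in train)
    best_cut = min((h for h, _ in train), key=errors)
    return lambda h: h >= best_cut

def accuracy(predict, rows):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(h) == rained for h, rained in rows) / len(rows)

for name, model in [("memorizer", memorizer(TRAIN)),
                    ("threshold", threshold_model(TRAIN))]:
    print(name, round(accuracy(model, TRAIN), 2), round(accuracy(model, TEST), 2))
```

On this data the memorizer scores 100% on the training set but only about 67% on the unseen rows, because it has also memorized the noise. The simpler threshold scores about 78% on training and about 83% on the unseen rows.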
Could you compare or contrast Machine Learning (ML), Deep Learning (DL) and Artificial Intelligence (AI)?
This is better explained through an example. ML is about learning a task. For instance, a self-driving car learns many tasks: to brake or not to brake, speed up or slow down, turn the steering wheel, indicator functions, etc. While ML learns all these tasks separately, AI executes them in a coordinated manner, rewards good decisions, and penalises wrong decisions. AI also accounts for contextual information that may not be part of ML. Thus, ML is an approach to realizing AI.
DL is a special case of ML and it's inspired by how neurons in the human brain process information. While ML learns once, DL does it in multiple stages. ML expects features to be provided at the input. DL discovers features on its own. When problems are complex, DL does better than ML. For example, recognizing a human may involve identifying basic features (eyes, ears, hands, legs, etc.) in stage 1, identifying higher-order features (face, upper body, lower body, etc.) in stage 2, and finally calling it out as 'Human' in stage 3.
How to improve the accuracy of ML models?
- Ensemble methods are techniques that create multiple models and then combine them to produce better results. For example, a candidate goes through multiple rounds of job interviews. Although a single interviewer might not be able to test the candidate for each required skill and trait, the combined feedback of multiple interviewers usually helps in better assessment of the candidate.
- Bagging (Bootstrap Aggregating) is an ensemble method. First, we create random samples of the training dataset (subsets of the training dataset). We build a classifier for each sample. Finally, results of these multiple classifiers are combined using averaging or majority voting. Bagging helps to reduce the variance error.
- Boosting: The first predictor starts by classifying the original dataset with equal weights given to each observation. If the first learner predicts some classes incorrectly, the wrongly classified observations are given higher weight for the next learner. Being an iterative process, boosting continues to add learners until a limit is reached in the number of models or in accuracy. Boosting has shown better predictive accuracy than bagging, but tends to overfit the training data.
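The bagging steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the weak learner is a one-feature decision stump, the samples are drawn with replacement (the usual meaning of bootstrap), and the data, the seed and the choice of 25 models are all invented for the example.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Weak learner: a single humidity cut-off with the fewest errors on the sample."""
    def errors(cut):
        return sum((h >= cut) != label for h, label in sample)
    cut = min((h for h, _ in sample), key=errors)
    return lambda h: h >= cut

def bagging(train, n_models=25, seed=0):
    """Train one stump per bootstrap sample and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # bootstrap sample: same size as the training set, drawn with replacement
        sample = [rng.choice(train) for _ in train]
        models.append(fit_stump(sample))

    def predict(h):
        votes = Counter(model(h) for model in models)
        return votes.most_common(1)[0][0]  # an odd n_models avoids ties
    return predict

# Invented data: (humidity %, rained?)
TRAIN = [(30, False), (35, False), (40, False), (45, False), (50, False),
         (70, True), (75, True), (80, True), (85, True), (90, True)]
ensemble = bagging(TRAIN)
print(ensemble(95), ensemble(20))
```

Each stump sees a slightly different sample and so learns a slightly different cut-off. Voting across them smooths out these differences, which is how bagging reduces variance.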
In what scenarios is ML not applicable or has it failed?
ML is applied in diverse fields where plenty of data is available. There are scenarios where ML has challenges, but constant endeavor ensures improved accuracy and increased acceptability. In the accompanying figure we can see how ImageNet ML algorithms have evolved for better accuracy.
Failure of ML can be attributed to incorrect problem formulation, wrong choice of features or inappropriate algorithms.
ML algorithms can become biased for various reasons. For example, an algorithm that sees only men writing code and only women in the kitchen in its training data will naturally become biased. In a real-world case of bias, Google Allo once responded with a turban emoji when shown a gun emoji. Google Translate showed gender bias in Turkish-English translations. Amazon's AI-based recruiting tool was found to favour male candidates.
Other failures of AI/ML happened in 2018. Uber's self-driving car killed a pedestrian. IBM's Watson AI Health has failed to impress doctors.
Alan Turing creates the Turing Test in which a computer must attempt to pass itself off as a human to other humans. In June 2014, a robot named Eugene passes this test by convincing 33% of the judges. A more difficult variant called the Loebner Prize requires that more than 50% of the judges be convinced after a 25-min conversation. As of March 2018, no robot has won the prize.
Arthur Samuel writes the first learning program. Applied to the game of checkers, the program is able to learn from mistakes and improve its gameplay with each new game. By the mid-1970s, the program beats humans at checkers. Board games are useful in developing ML because they are understandable yet complex.
The Google Brain project uses DL to detect visual patterns. A Google X project applies Google Brain to YouTube videos to identify frames that contain cats. Geoffrey Hinton leads a team and wins ImageNet's computer vision contest by a large margin. This popularizes DL. In the coming years, DL becomes an important technique to create models with much better accuracy. This is the decade when DL becomes feasible.
- Aporras. 2016. "Difference between Bagging and Boosting?" QuantDare, April 20. Accessed 2018-04-15.
- Barrett, Jeff. 2018. "Up to 73 Percent of Company Data Goes Unused for Analytics. Here's How to Put It to Work." Inc.com, April 12. Accessed 2020-08-02.
- Bhande, Ahup. 2018. "What is underfitting and overfitting in machine learning and how to deal with it." Medium, March 12. Accessed 2018-04-13.
- Bhatia, Ankur. 2017. "Improve Threat Classification Accuracy With Supervised Machine Learning." Security Intelligence, January 6. Accessed 2018-04-08.
- Brownlee, Jason. 2013. "How to Define Your Machine Learning Problem." Machine Learning Mastery, December 23. Accessed 2018-04-08.
- Brownlee, Jason. 2016. "Overfitting and Underfitting With Machine Learning Algorithms." Machine Learning Mastery, March 21. Accessed 2020-08-02.
- Brownlee, Jason. 2017. "Difference Between Classification and Regression in Machine Learning." Machine Learning Mastery, December 11. Accessed 2018-04-08.
- Brownlee, Jason. 2019. "A Gentle Introduction to Uncertainty in Machine Learning." Machine Learning Mastery, September 13. Accessed 2020-08-02.
- Build With Google Cloud. 2018. "A history of machine learning." Accessed 2018-04-08.
- Casari, Amanda and Zheng, Alice. 2018. "Feature Engineering for Machine Learning." O'Reilly Media, Inc., April. Accessed 2020-08-02.
- Copeland, Michael. 2016. "What’s the Difference Between Artificial Intelligence, Machine Learning and Deep Learning?" Blog, NVIDIA, June 29. Accessed 2020-08-02.
- Desjardins-Proulx, Philippe. 2013. "Machine learning and deep transfer learning." July 5. Accessed 2018-04-13.
- Despois, Julien. 2018. "Memorizing is not learning! — 6 tricks to prevent overfitting in machine learning." Hackernoon, March 20. Accessed 2020-08-02.
- Donoho, David. 2015. "50 years of Data Science." Based on a presentation at the Tukey Centennial workshop, Princeton NJ, Version 1.00, September 18. Accessed 2018-04-11.
- Ford, Glen. 2018. "4 human-caused biases we need to fix for machine learning." The Next Web, October 27. Accessed 2019-01-21.
- Frank, Blair Hanley. 2017. "Google Brain chief: Deep learning takes at least 100,000 examples." VentureBeat, October 23. Accessed 2018-04-08.
- Garbade, Michael J. 2018. "Clearing the Confusion: AI vs Machine Learning vs Deep Learning Differences." Towards Data Science, on Medium, September 15. Accessed 2020-08-02.
- Gonzalez, Victor. 2018. "A Brief History of Machine Learning." Synergic Partners, March. Accessed 2018-04-08.
- Google Developers. 2020. "Training and Test Sets: Splitting Data." Machine Learning Crash Course, Google Developers. February 10. Accessed 2020-08-02.
- Gualtieri, Mike. 2016. "Hadoop Is Data’s Darling For A Reason." Blog, Forrester, January 21. Accessed 2020-08-02.
- Guo, Yanming, Yu Liu, Ard Oerlemans, Songyang Lao, Song Wu, and Michael S. Lew. 2016. "Deep learning for visual understanding: A review." Neurocomputing, vol. 187, pp. 27-48. Accessed 2018-04-13.
- Hale, Kerri. 2018. "Machine Learning and Big Data — Real-World Applications." Towards Data Science, on Medium, March 27. Accessed 2020-08-02.
- Hern, Alex. 2014. "What is the Turing test? And are we all doomed now?" The Guardian, June 9. Accessed 2018-04-07.
- Jain, Kunal. 2015. "Machine Learning basics for a newbie." Analytics Vidhya, June 11. Accessed 2018-04-08.
- Kaushik, Saurav. 2017. "How to build Ensemble Models in machine learning?" Analytics Vidhya, February 15. Accessed 2018-04-08.
- Krauth, Oliva. 2018. "Artificial ignorance: The 10 biggest AI failures of 2017." TechRepublic, January 04. Accessed 2019-01-21.
- MC.AI. 2020. "Machine Learning vs Rules Based Approach to Building Decisioning Software." MC.AI, June 5. Accessed 2020-08-02.
- Marr, Bernard. 2016. "A Short History of Machine Learning -- Every Manager Should Read." Forbes, February 19. Accessed 2018-04-08.
- MathWorks. 2020. "What Is Machine Learning? 3 things you need to know." MathWorks. Accessed 2020-08-02.
- Mayo, Matthew. 2017. "Is Regression Analysis Really Machine Learning?" KDnuggets, June. Accessed 2020-08-02.
- Microsoft Docs. 2020. "Weather forecast using the sensor data from your IoT hub in Azure Machine Learning." IoT Hub, Azure, Microsoft, February 10. Accessed 2020-08-02.
- Peng, Tony. 2018. "2018 in Review: 10 AI Failures." Synced, December 10. Accessed 2019-01-21.
- Pinterest. 2018. "AI, ML, DL." Accessed 2018-04-14.
- Project ARM. 2017. "Image Recognition: a short history and all you need to know about it." Project ARM, January 18. Accessed 2018-04-13.
- Raschka, Sebastian. 2017. "3 different types of machine learning." KDnuggets, November. Accessed 2018-04-13.
- Ray, Sunil. 2015. "5 Easy questions on Ensemble Modeling everyone should know." Analytics Vidhya, September 30. Accessed 2018-04-08.
- Reese, Hope. 2016. "Top 10 AI failures of 2016." TechRepublic, December 2. Accessed 2018-04-13.
- Salian, Isha. 2018. "SuperVize Me: What’s the Difference Between Supervised, Unsupervised, Semi-Supervised and Reinforcement Learning?" Blog, NVIDIA, August 2. Accessed 2020-08-02.
- Sapp, Carlton E. 2017. "Preparing and Architecting for Machine Learning." Technical Professional Advices, Gartner, January 17. Accessed 2019-01-21.
- Saxena, Deepanker. 2018. "Machine Learning vs Rules Based Systems." Blog, Socure, August 6. Accessed 2020-08-02.
- Sheth, Aneri. 2017. "History of Machine Learning." Bloombench, August 25. Accessed 2018-04-08.
- Stewart, Matthew. 2019. "The Actual Difference Between Statistics and Machine Learning." Towards Data Science, on Medium, March 25. Accessed 2020-08-02.
- Tiwari, Satyam Prasad. 2017. "Evolution and History Of Machine Learning." BitsDroid, November 11. Accessed 2018-04-08.
- Wikipedia. 2018. "Timeline of machine learning." April 2. Accessed 2018-04-08.
- Data Science
- Regression Modelling
- Artificial Neural Network
- Deep Learning
- Statistical Classification