# Machine Learning Model

In traditional programming, a function or program reads a set of input data, processes it and outputs the results. Machine Learning (ML) takes a different approach: lots of input data and their corresponding outputs are given. ML employs an algorithm to learn from this dataset and outputs a "function". This function or program is what we call an ML Model.

Essentially, the model encapsulates a relationship or pattern that maps the input to the output. The model learns this automatically without being explicitly programmed with fixed rules or patterns. The model can then be given unseen data for which it predicts the output.

ML models come in different shapes and formats. Model metadata and evaluation metrics can help compare different models.

## Discussion

• Could you explain ML models with some examples?

Consider a function that reads a Celsius value and outputs the corresponding Fahrenheit value. It implements a simple mathematical formula. In ML, once the model is trained on a dataset of Celsius-Fahrenheit pairs, the formula is implicit in the model. The model can read new Celsius values and give correct Fahrenheit values.
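As a minimal sketch of this example in plain Python, the formula can be recovered from data alone. Ordinary least squares stands in here for the training algorithm, and the variable names are our own:

```python
# Traditional programming: the formula is explicit.
def c_to_f(c):
    return 1.8 * c + 32

# ML view: learn w and b from (input, output) pairs alone.
celsius = [-40, -10, 0, 10, 25, 100]
fahrenheit = [c_to_f(c) for c in celsius]  # stands in for measured data

n = len(celsius)
mean_c = sum(celsius) / n
mean_f = sum(fahrenheit) / n

# Ordinary least squares for a single feature: w = cov(c, f) / var(c)
w = sum((c - mean_c) * (f - mean_f) for c, f in zip(celsius, fahrenheit)) \
    / sum((c - mean_c) ** 2 for c in celsius)
b = mean_f - w * mean_c

print(w, b)  # the learned parameters recover 1.8 and 32
```

With only six (input, output) pairs, the fitted parameters match 1.8 and 32 to floating-point precision. The conversion formula was never written down explicitly; it is implicit in `w` and `b`.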

Let's say we're trying to estimate house prices based on attributes. It may be that houses with more than two bedrooms fall into a higher price bracket, or that areas of 8500 sq.ft. and 11500 sq.ft. are important thresholds at which prices tend to jump. Rather than encode these rules into a function, we can build an ML model to learn them implicitly.

In another dataset, there are three species of irises. Each iris sample has four attributes: sepal length/width, petal length/width. An ML model can be trained to recognize three distinct clusters based on these attributes. All flowers belonging to a cluster are of the same species.

In all these examples, ML saves us the trouble of writing functions to predict the output. Instead, we train an ML model to implicitly learn the function.
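The clustering idea behind the iris example can be made tangible with a from-scratch k-means sketch in plain Python. The 2D points and initial centroids below are made up for illustration; a real iris model would use all four attributes and a library such as Scikit-Learn:

```python
def kmeans(points, centroids, iters=10):
    """Minimal k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two well-separated toy "species", each measured on two attributes
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),   # species A
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]   # species B
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
```

After a few iterations, each cluster contains exactly the points of one "species", without the species labels ever being given to the algorithm.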

• What are the essentials that help an ML model learn?

There are many types (aka shapes/structures/architectures) of ML models. Typically, this structure is not selected automatically. The data scientist pre-selects the structure. Given data, the model learns within the confines of the chosen structure. We may say that the model is fine-tuning the parameters of its structure as it sees more and more data.

The model learns in iterations. Initially, it will make poor predictions, that is, predicted outputs deviate from actual outputs. As it sees more data, it gets better. Prediction error is quantified by a cost/loss function. Every model needs such a function to know how well it's learning and when to stop learning.

The next essential aspect of model training is the optimizer. It tells the model how to adjust its parameters with each iteration. Essentially, the optimizer attempts to minimize the loss function.
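These three essentials (a pre-selected structure, a loss function, and an optimizer) can be sketched in a few lines of plain Python. The structure here is an illustrative line y = w*x + b, the loss is mean squared error, the optimizer is plain gradient descent, and the learning rate and iteration count are arbitrary choices:

```python
# Structure chosen by the data scientist: a line y = w*x + b
xs = [-40.0, -10.0, 0.0, 10.0, 40.0]
ys = [1.8 * x + 32 for x in xs]           # actual outputs to match
w, b = 0.0, 0.0                           # parameters, initially poor
lr = 0.001                                # optimizer's learning rate

def mse(w, b):
    """Loss function: quantifies how far predictions are from actual outputs."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for step in range(20000):                 # learning happens in iterations
    # Gradients of the loss with respect to each parameter
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Optimizer step: adjust parameters to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 3), round(b, 3), mse(w, b))  # parameters approach 1.8 and 32
```

Early iterations have a large loss; as the optimizer repeatedly steps against the gradient, the loss shrinks towards zero and the parameters settle on the underlying pattern.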

If results are poor, the data scientist may modify or even select a different structure. She may pre-process the input differently or focus on certain aspects of the input, called features. These decisions could be based on experience or analysis of wrong predictions.

• What possible structures, loss functions and optimizers are available to train an ML model?

Classical ML offers many possible model structures. For example, Scikit-Learn has model structures for regression, classification and clustering problems. Some of these include linear regression, logistic regression, Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), nearest neighbour, Gaussian process, Naive Bayes, decision tree, ensemble methods, k-Means, and more.

For building neural networks, many architectures are possible: Feed-Forward Neural Network (FFNN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), Long Short Term Memory (LSTM), Autoencoder, Attention Network, and many more. In code, these can be built using building blocks such as convolution, pooling, padding, normalization, dropout, linear transforms, non-linear activations, and more.
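To make these building blocks concrete, here is a plain-Python sketch of one forward pass through a tiny feed-forward network: a linear transform, a non-linear activation (ReLU), then a final linear layer. The weights are made-up numbers for illustration, not trained values:

```python
def linear(x, weights, bias):
    """Linear transform: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Non-linear activation applied element-wise."""
    return [max(0.0, v) for v in x]

# Tiny FFNN: 2 inputs -> 3 hidden units -> 1 output (illustrative weights)
W1 = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]
b1 = [0.0, 0.1, 0.2]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.05]

def forward(x):
    return linear(relu(linear(x, W1, b1)), W2, b2)

print(forward([1.0, 2.0]))  # a single output value
```

Deep learning frameworks provide these same blocks (linear layers, activations, and the rest) as composable modules, along with the backpropagation machinery to train their weights.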

TensorFlow supports many loss functions: BinaryCrossentropy, CategoricalCrossentropy, CosineSimilarity, KLDivergence, MeanAbsoluteError, MeanSquaredError, Poisson, SquaredHinge, and more. Among the optimizers are Adadelta, Adagrad, Adam, Adamax, Ftrl, Nadam, RMSprop, and SGD.
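As an illustration of what two of these losses actually compute, here are their textbook formulas in plain Python (this is the underlying math, not the TensorFlow API):

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; suits regression problems."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred):
    """Penalizes confident wrong probabilities; suits binary classification."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

mse = mean_squared_error([3.0, 5.0], [2.5, 5.5])   # 0.25
bce = binary_crossentropy([1, 0], [0.9, 0.1])      # about 0.105
```

The optimizer never sees the data directly; it only sees the loss value and its gradients, which is why the choice of loss shapes what the model learns.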

• What exactly is saved in an ML model?

ML frameworks typically support different ways to save the model:

• Only Weights: Weights or parameters represent the model's current state. During training, we may wish to save checkpoints. A checkpoint is a snapshot of the model's current state. A checkpoint includes model weights, optimizer state, current epoch and training loss. For inference, we can create a fresh model and load the weights of a fully trained model.
• Only Architecture: Specifies the model's structure. If it's a neural network, this would be details of each layer and how they're connected. Data scientists can share model architecture this way, with each one training the model to suit their needs.
• Complete Model: This includes model architecture, the weights, optimizer state, and a set of losses and metrics. In PyTorch, this is less flexible since serialized data is bound to specific classes and directory structure.

In Keras, when saving only weights or the complete model, *.tf and *.h5 file formats are applicable. YAML or JSON can be used to save only the architecture.
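A framework-agnostic sketch of these three options, using a made-up model represented as plain Python data: JSON for the architecture (as Keras allows) and pickle for the weights and the complete object. The model structure here is hypothetical:

```python
import json
import pickle

# Hypothetical model: architecture (structure) plus learned weights
model = {
    "architecture": {"type": "linear", "inputs": 1, "outputs": 1},
    "weights": {"w": 1.8, "b": 32.0},
}

# Only architecture: shareable structure, no learned state
arch_json = json.dumps(model["architecture"])

# Only weights: to be loaded into a freshly built model for inference
weights_blob = pickle.dumps(model["weights"])

# Complete model: everything needed to resume training or deploy
model_blob = pickle.dumps(model)
restored = pickle.loads(model_blob)
```

Real frameworks add optimizer state, losses and metrics to the "complete model" case, but the same three-way split applies.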

• Which are the formats in which ML models are saved?

Open Neural Network Exchange (ONNX) is an open format that enables interoperability. A model in ONNX can be used with various frameworks, tools, runtimes and compilers. ONNX also makes it easier to access hardware optimizations.

A number of ML frameworks are out there, each saving models in its own format. TensorFlow saves models as protocol buffer files with *.pb extension. PyTorch saves models with *.pt extension. Keras saves in HDF5 format with *.h5 extension. An older XML-based format supported by Scikit-Learn is Predictive Model Markup Language (PMML). SparkML uses MLeap format and files are packaged into a *.zip file. Apple's Core ML framework uses *.mlmodel file format.

In Python, Scikit-Learn adopts pickled Python objects with *.pkl extension. Joblib with *.joblib extension is an alternative that's faster than Pickle for large NumPy arrays. If XGBoost is used, then a model can be saved in *.bst, *.joblib or *.pkl formats.

With some formats, it's possible to save not just models but also pipelines composed of multiple models. Scikit-Learn is an example that can export pipelines in Joblib, Pickle, or PMML formats.

• What metadata could be useful along with an ML model?

Data scientists conduct multiple experiments to arrive at a suitable model. Without metadata and proper management of such metadata, it becomes difficult to reproduce the results and deploy the model into production. ML metadata also enables us to do auditing, compare models, understand provenance of artefacts, identify reusable steps for model building, and warn if data distribution in production deviates from training.

To facilitate these, metadata should include model type, types of features, pre-processing steps, hyperparameters, metrics, performance of training/test/validation steps, number of iterations, if early stopping was enabled, training time, and more.
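Such a record is often just structured text stored next to the serialized model. A sketch in plain Python, where the field names follow the list above and the values are made up:

```python
import json

# Illustrative metadata record to store alongside the model artefact
metadata = {
    "model_type": "RandomForestClassifier",
    "features": {"sepal_length": "float", "petal_width": "float"},
    "preprocessing": ["standard_scaling"],
    "hyperparameters": {"n_estimators": 100, "max_depth": 4},
    "metrics": {"train_accuracy": 0.98, "validation_accuracy": 0.95},
    "iterations": 100,
    "early_stopping": False,
    "training_time_seconds": 12.4,
}

# Store as, say, model_metadata.json next to the serialized model
blob = json.dumps(metadata, indent=2)
```

Dedicated metadata stores go further, linking each record to data versions, code versions and the resulting artefacts.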

A saved model (also called an exported or serialized model) will need to be deserialized when doing predictions. Often, the versions of packages or even the runtime will need to be the same as those during serialization. Some recommend saving a reference to an immutable version of training data, the version of source code that trained the model, versions of libraries and their dependencies, and the cross-validation score. For reproducible results across platform architectures, it's a good idea to deploy models within containers, such as Docker.

• Which are some useful tools when working with ML models?

There are tools to visualize an ML model. Examples include Netron and VisualDL. These display the model's computational graph. We can see data samples, histograms of tensors, precision-recall curves, ROC curves, and more. These can help us optimize the model better.

Since ONNX format aids interoperability, there are converters that can convert from other formats to ONNX. One such tool is ONNXMLTools that supports many formats. It's also a wrapper for other converters such as keras2onnx, tf2onnx and skl2onnx. ONNX GitHub code repository lists many more converters. Many formats can be converted to Apple Core ML's format using Core ML Tools. For Android, tf.lite.TFLiteConverter converts a Keras model to TFLite.

Sometimes converters are not required. For example, PyTorch can natively export to ONNX.

ONNX models themselves can be simplified and there are optimizers to do this. ONNX Optimizer is one tool. ONNX Simplifier is another, built using ONNX Optimizer. It basically looks at the whole graph and replaces redundant operators with their constant outputs. There's a ready-to-use online version of ONNX Simplifier.

## Milestones

1952

At IBM, Arthur Samuel writes the first learning program. Applied to the game of checkers, the program is able to learn from mistakes and improve its gameplay with each new game. In 1959, Samuel popularizes the term Machine Learning in a paper titled Some Studies in Machine Learning Using the Game of Checkers.

1986

Rumelhart et al. publish the method of backpropagation and show how it can be used to optimize the weights of neurons in artificial neural networks. This kindles renewed interest in neural networks. Although backpropagation was invented in the 1960s and developed by Paul Werbos in 1974, it was ignored back then due to the general lack of interest in AI.

1990

In this decade, ML shifts from a knowledge-driven to a data-driven approach. With the increasing use of statistics and neural networks, ML tackles practical problems rather than lofty goals of AI. Also during the 1990s, Support Vector Machine (SVM) emerges as an important ML technique.

2006

Hinton et al. publish a paper showing how a network of many layers can be trained by smartly initializing the weights. This paper is later seen as the start of Deep Learning movement, which is characterized by many layers, lots of training data, parallelized hardware and scalable algorithms. Subsequently, many DL frameworks are released, particularly in 2015.

Jun
2016

Vartak et al. propose ModelDB, a system for ML model management. Data scientists can use this to compare, explore or analyze models and pipelines. The system also manages metadata, quality metrics, and even training and test data. In general, from the mid-2010s we see interest in ML model management and platforms. Examples include Data Version Control (DVC) (2017), Kubeflow (2018), ArangoML Pipeline (2019), and TensorFlow Extended (TFX) (2019 public release).

Sep
2017

Microsoft and Facebook come together to announce Open Neural Network Exchange (ONNX). This is proposed as a common format for ML models. With ONNX, we obtain framework interoperability (developers can move their models across frameworks) and shared optimizations (hardware vendors and others can target ONNX for optimizations).

Jul
2019

While there are tools to convert from other formats to ONNX, one ML expert notes some limitations. For example, ATen operators in PyTorch are not supported in ONNX since they're not standardized there. It's still possible to export such models by updating PyTorch source code, though only advanced users are likely to do this.

Mar
2020

In an image classification task, a performance comparison of ONNX format with PyTorch format shows that ONNX is faster during inference. Improvements are higher at lower batch sizes. On another task, ONNX shows as much as 600% improvement over Scikit-Learn. Further improvements could be obtained by tuning ONNX for specific hardware.

## Sample Code

• # Source: https://cloud.google.com/ai-platform/prediction/docs/exporting-for-prediction
# Accessed 2021-01-01

# ---------------- Save model ----------------
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import joblib  # sklearn.externals.joblib is deprecated in newer Scikit-Learn versions

# Load the dataset to train on
iris = datasets.load_iris()

classifier = RandomForestClassifier()
classifier.fit(iris.data, iris.target)

joblib.dump(classifier, 'model.joblib')

# -------------- Save pipeline ---------------
from sklearn.feature_selection import chi2
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('feature_selection', SelectKBest(chi2, k=2)),
    ('classification', RandomForestClassifier())
])
pipeline.fit(iris.data, iris.target)

joblib.dump(pipeline, 'model.joblib')


## Cite As

Devopedia. 2021. "Machine Learning Model." Version 4, January 1. Accessed 2024-06-25. https://devopedia.org/machine-learning-model