# Types of Regression

Regression is widely used for prediction or forecasting: given one or more independent variables, we try to predict another variable. For example, given advertising expense, we can predict sales. Given a mother's smoking status and the gestation period, we can predict the baby's birth weight.

There are many types of regression models, with one source mentioning as many as 35 different models. An analyst or statistician must select a model that makes sense for the problem. Models differ based on the number of independent variables, the type of the dependent variable, and how the two are related to each other.

Regression comes from statistics. It's one of many techniques used in machine learning.

## Discussion

• Could you introduce regression?

Suppose there's a dependent or response variable $$Y_i$$ and independent variables or predictors $$X_i$$. The essence of regression is to estimate the function $$f(X_i,\beta)$$ that's a model of how the dependent variable is related to the predictors. Adding an error term or residual $$\epsilon_i$$, we get $$Y_i = f(X_i,\beta) + \epsilon_i$$, for scalar $$Y_i$$ and vector $$X_i$$.

The error term is not observed directly. Its estimate, the residual, is the difference between the observed value $$Y_i$$ and what the model predicts. With the goal of minimizing the residuals, regression estimates the model parameters or coefficients $$\beta$$ from data. There are many ways to do this, and the process is called estimation.
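As a minimal sketch of estimation, the snippet below fits a straight line by least squares using NumPy. The data is synthetic and the true coefficients (2 and 3) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): Y = 2 + 3*X + noise
n = 200
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x])

# Least squares estimate of beta: minimises the sum of squared residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals: observed values minus fitted values
residuals = y - X @ beta_hat

print("estimated beta:", beta_hat)
print("mean residual:", residuals.mean())
```

With an intercept in the model, the least squares residuals average to zero up to floating-point error, which is why the mean residual is printed.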

Regression modelling also makes important assumptions. The sampled data should represent the population. There are no measurement errors in the predictor values. Residuals have zero mean (when conditioned on $$X_i$$) and constant variance. Residuals are also uncorrelated with one another. More assumptions are used depending on the model type and estimation technique.

Regression uncovers useful relationships, that is, how predictors are correlated with the response variable. Regression makes no claim that predictors influence or cause the outcome. Correlation should not be confused with causation.

• How do you classify the different types of regression?

Regression techniques can be classified in many ways:

• Number of Predictors: We can distinguish between Simple Regression (one predictor) and Multiple Regression (many predictors). The terms Univariate Regression and Multivariate Regression are also seen, though they more commonly refer to the number of response variables.
• Outcome-Predictors Relationship: When this is linear, we can apply Linear Regression or its many variants. If the relationship is non-linear, we can apply Polynomial Regression or Spline Regression. More generally, when the form of the relationship is known or assumed it's Parametric Regression, otherwise it's Non-parametric Regression.
• Predictor Selection: With multiple predictors, sometimes not all of them are important. Best Subsets Regression or Stepwise Regression can find the right subset of predictors. We could penalize too many predictors in the model using Ridge Regression, Lasso Regression or Elastic Net Regression.
• Correlated Predictors: If predictors are correlated, one approach is to transform them into fewer predictors by a linear combination of the original predictors. Principal Component Regression (PCR) and Partial Least Squares (PLS) Regression are two approaches to do this.
• Outcome Type: When predicting categorical data, we can apply Logistic Regression. When outcome is a count variable, we can apply Poisson Regression or Negative Binomial Regression. In fact, a suitable method of regression can be inferred from the distribution of the dependent variable.

• What are the types of linear regression models?

Simple Regression involves only one predictor. For example, $$Y_i = \beta_0 + \beta_{1}X_{1i} + \epsilon_i$$.

If we generalize to many predictors, the term Multiple Linear Regression is used. Consider a linear model with two predictors, $$Y_i = \beta_0 + \beta_{1}X_{1i} + \beta_{2}X^2_{2i} + \epsilon_i$$. Although there's a square term, the model is still linear in terms of the parameters.
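To illustrate why a squared predictor still gives a linear model, here's a small sketch with NumPy. The squared term simply becomes another column of the design matrix. The data and the true coefficients (1, 2 and -0.5) are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)

# Assumed true coefficients for illustration: beta = (1, 2, -0.5)
y = 1.0 + 2.0 * x1 - 0.5 * x2**2 + rng.normal(0, 0.1, size=n)

# The squared term is just another column; the model is linear in beta
X = np.column_stack([np.ones(n), x1, x2**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
```

The same least squares solver used for a straight line recovers the coefficients, since nothing about the estimation depends on how the columns were computed.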

To represent many Multiple Linear Regression models in a compact form we can use the General Linear Model. This generalization allows us to work with several dependent variables that depend on the same independent variables. It also incorporates different statistical models including ANOVA, ANCOVA, OLS, t-test and F-test.

The General Linear Model makes the assumption that $$Y_i \sim N(X^T_i\beta,\sigma^2)$$, that is, the response variable is normally distributed with a mean that's a linear combination of predictors. A larger class of models, called the Generalized Linear Model (GLM), allows $$Y_i$$ to follow any distribution in the exponential family. The General Linear Model is a special case of the GLM.

If the response is also affected by random effects, such as variation between groups or repeated measurements on the same subject, the Generalized Linear Mixed Model (GLMM) can be used.

• Could you compare linear and logistic regression?

Since logistic regression deals with categorical outcomes, it predicts the probability of an outcome rather than a continuous value. Predictions should therefore be restricted to the range 0-1. This is done by transforming the linear regression equation to the logit scale. This is the natural log of the odds of being in one category versus the other categories.

For this reason, logistic regression may be seen as a particular case of GLM. Logit is used as the link function that relates predictors to the outcome.
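To make the logit transformation concrete, the sketch below defines the logit and its inverse using only the standard library. The coefficients and the test-score predictor are hypothetical, chosen only so that the numbers are easy to follow.

```python
import math


def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Map a value on the logit scale back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients: linear on the logit scale
beta0, beta1 = -4.0, 0.08


def pass_probability(score: float) -> float:
    """Probability of passing for a given (hypothetical) score."""
    return inverse_logit(beta0 + beta1 * score)


for score in (0, 25, 50, 75, 100):
    print(score, round(pass_probability(score), 3))
```

The linear predictor can take any real value, but the inverse logit always returns a value strictly between 0 and 1.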

Logistic regression shares with linear regression many of the assumptions: independence of errors, linearity (but in the logit scale), absence of multicollinearity among predictors, and lack of influential outliers.

There are three types of logistic regressions:

• Binary: Only two outcomes. Example: predict whether a student passes a test. When all predictors are categorical, we call them logit models.
• Nominal: More than two outcomes. Also called Multinomial Logistic Regression. Example: predict the colour of an iPhone model a customer is likely to buy.
• Ordinal: More than two ordered outcomes. Example: predicting a medical condition (good, stable, serious, critical).

• Could you explain parametric versus non-parametric regression?

Linear models and even non-linear models are parametric models since we know (or make an educated guess about) how the outcome relates to the predictors. Once the model is fixed, the task is to estimate the parameters $$\beta$$ of the model. If we have problems in this estimation, we can revise the model and try again.

Non-parametric regression is more suitable when we have no idea how the outcome relates to the predictors. Usually when the relationship is non-linear, we can adopt non-parametric regression. For example, one study attempting to predict the logarithm of wage from age found that non-parametric regression approaches outperformed simple linear and polynomial regression methods.

Parametric models have a finite set of parameters that try to capture everything about the observed data. Model complexity is bounded even with unbounded data. Non-parametric models are more flexible because the model gets better as more data is observed. We can view them as having infinitely many parameters, or as functions that we attempt to estimate. An artificial neural network with infinitely many hidden units is equivalent to non-parametric regression.
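As one possible illustration of a non-parametric approach, the sketch below uses Nadaraya-Watson kernel regression, which predicts with a locally weighted average of the observed outcomes. This particular estimator, the Gaussian kernel, the bandwidth of 0.3 and the sine-shaped data are all choices made for this example, not something prescribed by the text above.

```python
import numpy as np


def kernel_regression(x_train, y_train, x_query, bandwidth=0.3):
    """Nadaraya-Watson estimate: a locally weighted average of y_train."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Gaussian kernel weights between every query point and training point
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights @ y_train) / weights.sum(axis=1)


# Made-up data with an unknown-looking non-linear shape
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, size=400))
y = np.sin(x) + rng.normal(0, 0.2, size=400)

grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)
fitted = kernel_regression(x, y, grid)

# Straight-line (parametric) fit for comparison
slope, intercept = np.polyfit(x, y, deg=1)
linear = intercept + slope * grid

kernel_err = np.mean((fitted - np.sin(grid)) ** 2)
linear_err = np.mean((linear - np.sin(grid)) ** 2)
print("kernel MSE:", kernel_err, "linear MSE:", linear_err)
```

No functional form is assumed for the curve. The only tuning choice is the bandwidth, which controls how local the averaging is.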

• What are some specialized regression models?

We note a few of these with brief descriptions:

• Robust Regression: This is better suited than linear regression in handling outliers or influential observations. Observations are weighted.
• Huber Regression: To handle outliers better, this optimizes a combination of squared error and absolute error.
• Quantile Regression: Linear regression predicts the mean of the dependent variable. Quantile regression can predict the median. More generally, it predicts any chosen quantile. For example, predicting the 25th percentile (0.25 quantile) of a house price means that there's a 25% chance that the actual price is below the predicted value.
• Functional Sequence Regression: Sometimes predictors affect the outcome in a time-dependent manner. This model includes the time component. For example, onion weight depends on environmental factors at various stages of the onion's growth.
• Regression Tree: Use a decision tree to split the predictor space at internal nodes. Terminal nodes or leaves represent predictions, which are the mean of data points in each partitioned region.
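Two of the models above differ from ordinary least squares mainly in the loss they minimize. The sketch below writes out the Huber loss and the quantile (pinball) loss with NumPy, then checks numerically that the constant minimizing the pinball loss at level 0.25 is close to the sample's 0.25 quantile. The house prices are simulated and the threshold `delta=1.0` is an arbitrary choice for illustration.

```python
import numpy as np


def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))


def pinball_loss(y, prediction, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    diff = y - prediction
    return np.mean(np.maximum(q * diff, (q - 1) * diff))


# Simulated, skewed "house prices" (made-up data)
rng = np.random.default_rng(3)
prices = rng.lognormal(mean=12.0, sigma=0.5, size=2000)

# Grid search over constant predictions for the 0.25 quantile
candidates = np.linspace(prices.min(), prices.max(), 4001)
losses = [pinball_loss(prices, c, q=0.25) for c in candidates]
best = candidates[int(np.argmin(losses))]

print("pinball minimiser:", best)
print("sample 0.25 quantile:", np.quantile(prices, 0.25))
```

In a full quantile regression the constant is replaced by a linear function of the predictors, but the loss being minimized is the same.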

• Could you share examples to illustrate a few regression methods?

In a production plant, there's a linear correlation between water consumption and amount of production. Simple regression suffices in this case, giving the fit as Water = 2273 + 0.0799 Production. Thus, even without any production, 2273 units of water are consumed. Every unit of production increases water consumption by 0.0799 units. Both predictor and outcome are continuous variables.

As an example of multiple linear regression, let's predict the birth weight of a baby (continuous variable) based on two predictors: whether the mother is a smoker or non-smoker (categorical variable) and the gestation period (continuous variable). We represent non-smokers as 0 and smokers as 1. The regression equation is Wgt = -2390 + 143.10 Gest - 244.5 Smoke. If we plot this, we'll actually see two parallel lines, one for smokers and one for non-smokers.
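The two fitted equations quoted above can be evaluated directly. The sketch below only plugs numbers into them. The function names and the input values (such as a gestation of 40) are chosen for illustration, and units are left as in the equations, which don't state them.

```python
def water_consumption(production: float) -> float:
    """Fitted simple regression: Water = 2273 + 0.0799 Production."""
    return 2273 + 0.0799 * production


def birth_weight(gestation: float, smoker: bool) -> float:
    """Fitted multiple regression: Wgt = -2390 + 143.10 Gest - 244.5 Smoke."""
    smoke = 1 if smoker else 0
    return -2390 + 143.10 * gestation - 244.5 * smoke


# Baseline water use with no production is the intercept
print(water_consumption(0))

# Same gestation, different smoking status: the gap is always 244.5
for gest in (36, 38, 40):
    print(gest, birth_weight(gest, smoker=False), birth_weight(gest, smoker=True))
```

Because the smoking indicator only shifts the intercept, the difference between the two predictions is the same at every gestation value, which is why the plot shows two parallel lines.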

One study looked at the number of cigarettes college students smoked per day. They predicted this count from gender, birth order, education level, social/psychological factors, etc. The study used Poisson regression, negative binomial regression, and many others.

• With so many types of regression models, how do I select a suitable one?

To apply linear regression, the main assumptions must be met: linearity, independence, constant variance and normality. Linearity can be checked via graphical analysis: a plot of residuals versus predicted values can reveal non-linearity. A goodness-of-fit test can also be used. Non-linear relations can be made linear using transformations of the predictors and/or the outcome. These could be log, square root or power transformations. We can also add transformations of current predictors to the model, or try semi-parametric or non-parametric models.
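The following sketch shows one way a transformation can linearize a relationship. It simulates exponential growth (an assumed data-generating process), fits a straight line to the raw and to the log-transformed outcome, and measures leftover curvature as the correlation between the residuals and the centred, squared predictor. That curvature measure is an ad hoc numeric stand-in for eyeballing a residual plot, not a standard test.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(1, 10, 200)

# Assumed process: exponential growth with multiplicative noise
y = 5.0 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.05, size=x.size))


def fit_line(x, y):
    """Straight-line least squares fit; returns slope, intercept, residuals."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    return slope, intercept, residuals


def curvature_signal(x, residuals):
    """Correlation of residuals with centred x squared; near 0 if no curvature remains."""
    centred_sq = (x - x.mean()) ** 2
    return np.corrcoef(centred_sq, residuals)[0, 1]


_, _, raw_resid = fit_line(x, y)
slope_log, intercept_log, log_resid = fit_line(x, np.log(y))

print("curvature, raw outcome:", curvature_signal(x, raw_resid))
print("curvature, log outcome:", curvature_signal(x, log_resid))
print("slope on log scale:", slope_log)
```

On the log scale the slope estimates the growth rate, and the residuals no longer show the U-shaped pattern seen with the raw outcome.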

In practice, linear regression is sensitive to outliers and cross-correlations. Piecewise linear regression, particularly for time series data, can be a better approach. Non-parametric regression can be used when there's an unknown non-linear relationship. Support Vector Regression (SVR) is an example of non-parametric regression.

When overfitting is a problem, use cross validation to evaluate models. Ridge, lasso and elastic net models can help tackle overfitting. They can also handle multicollinearity. Quantile regression is suited to handle outliers.
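As a rough sketch of how a penalty and cross validation work together, the code below implements closed-form ridge regression and a simple k-fold cross validation with NumPy. The data, with two nearly collinear predictors, and the list of penalty values are made up for illustration. The helper names `ridge_fit` and `cv_mse` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge estimate on centred data; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


def cv_mse(X, y, alpha, k=5, seed=0):
    """Mean squared prediction error estimated by k-fold cross validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)  # all indices not in the held-out fold
        intercept, coef = ridge_fit(X[train], y[train], alpha)
        pred = intercept + X[fold] @ coef
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


# Made-up data: x2 is almost a copy of x1 (multicollinearity)
rng = np.random.default_rng(5)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(scale=1.0, size=n)

for alpha in (0.0, 0.1, 1.0, 10.0):
    _, coef = ridge_fit(X, y, alpha)
    print(alpha, coef.round(2), round(cv_mse(X, y, alpha), 3))
```

With `alpha=0` this reduces to ordinary least squares, whose coefficients can be erratic when predictors are nearly collinear. Larger penalties shrink the coefficients, and the cross-validated error gives a basis for choosing among penalty values.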

For predicting counts, use negative binomial regression if the variance is larger than the mean (overdispersion). Poisson regression assumes that the variance equals the mean.
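A quick, rough way to compare variance and mean is the ratio of the two, sketched below with NumPy on simulated counts. This looks only at the raw counts, whereas the model assumption concerns the variance given the predictors, so treat it as a first check rather than a formal test. The helper name `dispersion_ratio` is hypothetical.

```python
import numpy as np


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson-like counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(6)

# Simulated counts with mean 4 and variance 4
poisson_counts = rng.poisson(lam=4.0, size=5000)

# Simulated counts with mean 4 but variance 12 (overdispersed)
overdispersed_counts = rng.negative_binomial(n=2, p=1 / 3, size=5000)

print("Poisson ratio:", dispersion_ratio(poisson_counts))
print("Negative binomial ratio:", dispersion_ratio(overdispersed_counts))
```

A ratio well above 1 suggests that a Poisson model would understate the variability, which is when negative binomial regression is worth considering.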

• What are some tips to analyze model statistics?

Well-known model performance metrics include R-squared ($$R^2$$), Root Mean Squared Error (RMSE), Residual Standard Error (RSE) and Mean Absolute Error (MAE). We also have metrics that penalize additional predictors: Adjusted $$R^2$$, Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC) and Mallows' $$C_p$$. The higher the $$R^2$$ or Adjusted $$R^2$$, the better the model. For all the other metrics, a lower value implies a better model.
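Several of these metrics are simple to compute by hand. The sketch below does so with NumPy for a tiny made-up set of observed and predicted values. The function name `regression_metrics` is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_predictors):
    """Return R-squared, adjusted R-squared, RMSE and MAE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    residuals = y_true - y_pred
    ss_res = np.sum(residuals**2)                     # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalises the number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "rmse": np.sqrt(ss_res / n),
        "mae": np.mean(np.abs(residuals)),
    }


# Made-up observed and predicted values
metrics = regression_metrics(
    [3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.5], n_predictors=1
)
print(metrics)
```

Adjusted $$R^2$$ is never larger than $$R^2$$ when there is at least one predictor, which is how it discourages adding predictors that explain little.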

A high t-statistic (in absolute value) implies the coefficient is probably non-zero. A low p-value on the t-statistic gives confidence in the estimate. Insignificant individual coefficients together with a low p-value for the model as a whole can imply multicollinearity. While the t-test is applied to individual coefficients, the F-test is applied to the overall model.
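To show where these statistics come from, the sketch below computes coefficient standard errors, t-statistics and the overall F-statistic for an ordinary least squares fit using NumPy. The data is simulated with two nearly identical predictors. The function name `ols_summary` is hypothetical, and p-values are omitted since they need a statistics library.

```python
import numpy as np


def ols_summary(X, y):
    """OLS coefficients, standard errors, t-statistics and overall F-statistic.

    X holds the predictors only; an intercept column is added here.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    gram = design.T @ design
    beta = np.linalg.solve(gram, design.T @ y)
    residuals = y - design @ beta
    dof = n - p - 1
    ss_res = residuals @ residuals
    sigma2 = ss_res / dof                          # residual variance estimate
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(gram)))
    t_stats = beta / std_err
    ss_tot = np.sum((y - y.mean()) ** 2)
    f_stat = ((ss_tot - ss_res) / p) / (ss_res / dof)
    return beta, std_err, t_stats, f_stat


# Simulated data: x2 is nearly a copy of x1
rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 1.0 + x1 + x2 + rng.normal(size=n)

beta, std_err, t_stats, f_stat = ols_summary(np.column_stack([x1, x2]), y)
print("coefficients:", beta.round(2))
print("standard errors:", std_err.round(2))
print("t-statistics:", t_stats.round(2))
print("F-statistic:", round(float(f_stat), 1))
```

With predictors this strongly correlated, the standard errors of the two slopes are large, so their t-statistics may look unimpressive even though the F-statistic for the whole model is large.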

Two models can be compared graphically. For example, the coefficients and their confidence intervals can be plotted and compared visually.

• What software packages support regression?

In R, functions lm(), summary(), residuals() and predict(), available in a base R installation, enable linear regression. For GLM, we can use the glm() function. Use the quantreg package for quantile regression; glmnet for ridge, lasso and elastic net regression; pls for principal component regression; plsdepot for PLS regression; e1071 for Support Vector Regression (SVR); ordinal for ordinal regression; MASS for negative binomial regression; survival for Cox regression. Other useful packages are stats, car, caret, sgd, BLR, lars, and nlme.

In Python, scikit-learn provides a number of modules and functions for regression. Use module sklearn.linear_model for linear models including logistic, Poisson, gamma, Huber, ridge, lasso, and elastic net regression; sklearn.svm for SVR; sklearn.neighbors for k-nearest neighbours regression; sklearn.isotonic for isotonic regression; sklearn.metrics for regression metrics; sklearn.ensemble for ensemble methods for regression.
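As a brief sketch of the scikit-learn workflow, the code below fits four estimators from sklearn.linear_model on made-up data and scores them on a held-out split. It assumes scikit-learn and NumPy are installed. The penalty values are arbitrary and would normally be tuned.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 5))

# Made-up data: only the first two of five predictors matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (r2_score(y_test, pred), mean_squared_error(y_test, pred))
    print(name, model.coef_.round(2), scores[name])
```

All four estimators share the same `fit` and `predict` interface, so swapping one regression type for another usually means changing only the line that constructs the model.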

## Milestones

1795

Regression starts with Carl Friedrich Gauss with the method of least squares. He doesn't publish the method until much later in 1809. In 1805, Adrien-Marie Legendre invents the same approach independently. Legendre uses it to predict the orbits of comets.

1877

Francis Galton plots in 1877 what may be called the first regression line. It concerns the size of sweet-pea seeds and relates the size of daughter seeds to that of mother seeds. The analysis comes about in the course of investigating Darwin's mechanism for heredity. Through these experiments, Galton also introduces the concept of "reversion to the mean", later called regression to the mean.

1915

R.A. Fisher gives the exact sampling distribution of the coefficient of correlation, thus marking the beginning of multivariate analysis. Fisher then transforms it to a simpler form via the z-transformation. In the early 1920s, he introduces the F distribution and the maximum likelihood method of estimation.

1957

Hotelling proposes Principal Component Regression (PCR) in an attempt to reduce the number of explanatory variables (predictors) in the regression model. PCR itself is based on Principal Component Analysis (PCA) that was invented independently by Pearson (1901) and Hotelling (1930s).

1960

Although the logistic function was invented by Verhulst in the 1830s, it's only in the 1960s that it's applied to regression analysis. D.R. Cox is among the early researchers to do this. Many researchers, including Cox, independently develop Multinomial Logistic Regression through the 1960s.

1970

Hoerl and Kennard note that least squares estimation is unbiased and that this can give poor results if there's multicollinearity among the predictors. To improve the estimation they propose a biased estimation approach that they call Ridge Regression. Ridge regression uses standardized variables, that is, the outcome and the predictors have their means subtracted and are then divided by their standard deviations. By introducing some bias, the variance of the estimator is controlled.

1972

D.R. Cox applies regression to life-table analysis. Among the sampled individuals, he observes either the time to "failure" or that the individual is removed from the study (called censoring). Moreover, the distribution of survival times is often skewed. For these reasons, linear regression is not suitable. Cox instead uses a hazard function that incorporates age-specific failure rate. In later years, this approach is simply called Cox Regression.

1972

Nelder and Wedderburn introduce the Generalized Linear Model (GLM). As examples, they relate GLM to normal, binomial (probit analysis), Poisson (contingency tables), and gamma (variance components) distributions. However, it's only in the 1980s that GLM becomes popular due to the work of McCullagh and Nelder.

1978

Koenker and Bassett introduce Quantile Regression. This uses weighted least absolute error rather than least squares error common in linear regression.

1981

Huber proposes an estimator that's quadratic in small values and grows linearly for large values. It's later named Huber Regression.

1996

Tibshirani proposes Lasso Regression that uses the least squares estimator but constrains the sum of the absolute values of the coefficients to a maximum. This forces some coefficients to zero or low values, leading to more interpretable models. This is useful when we start with too many predictors.

2002

De'ath proposes the Multivariate Regression Tree (MRT). The history of regression trees goes back to the 1960s. With the release of CART (Classification and Regression Tree) software in 1984, they become better known. However, CART is limited to a single response variable. MRT extends CART to multivariate response data.

2005

Zou and Hastie propose Elastic Net Regression. This combines elements of both ridge regression and lasso regression.


## Cite As

Devopedia. 2021. "Types of Regression." Version 12, January 22. Accessed 2022-09-22. https://devopedia.org/types-of-regression

Contributed by 2 authors. Last updated on 2021-01-22 05:29:06.