Types of Regression
 Summary

Regression is widely used for prediction or forecasting: given one or more independent variables, we try to predict another variable. For example, given advertising expense, we can predict sales.^{} Given a mother's smoking status and the gestation period, we can predict the baby's birth weight.^{}
There are many types of regression models,^{} one source mentioning as many as 35 different models.^{} An analyst or statistician must select a model that makes sense for the problem. Models differ based on the number of independent variables, the type of the dependent variable, and how these two are related to each other.
Regression comes from statistics. It's one of many techniques used in machine learning.^{}
Discussion
Could you introduce regression? Suppose there's a dependent or response variable \(Y_i\) and independent variables or predictors \(X_i\). The essence of regression is to estimate the function \(f(X_i,\beta)\) that's a model of how the dependent variable is related to the predictors. Adding an error term or residual \(\epsilon_i\), we get \(Y_i = f(X_i,\beta) + \epsilon_i\), for scalar \(Y_i\) and vector \(X_i\).^{}
The residual is not observed in the data. It's the difference between the observed value \(Y_i\) and what the model predicts. With the goal of minimizing the residuals, regression estimates the model parameters or coefficients \(\beta\) from data. There are many ways to do this, and the term estimation is used for this process.^{}
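As a minimal sketch of this estimation process, the following fits a simple linear model by ordinary least squares on hypothetical advertising-versus-sales data (all numbers here are made up for illustration):

```python
import numpy as np

# Hypothetical data: advertising expense (predictor) and sales (response).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=50)
y = 3.0 + 0.5 * X + rng.normal(0, 2.0, size=50)   # true beta0=3.0, beta1=0.5, plus noise

# Estimate beta by ordinary least squares, i.e. minimize the sum of squared residuals.
A = np.column_stack([np.ones_like(X), X])          # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - A @ beta
print(beta)               # estimates close to (3.0, 0.5)
print(residuals.mean())   # with an intercept, OLS residuals average to ~0
```

Note how the zero-mean residual assumption mentioned above shows up in the fitted result: with an intercept in the model, the residuals sum to zero by construction.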
Regression modelling also makes important assumptions. The sampled data should represent the population. There are no measurement errors in the predictor values. Residuals have zero mean (when conditioned on \(X_i\)) and constant variance. Residuals are also uncorrelated with one another. More assumptions are used depending on the model type and estimation technique.^{}
Regression uncovers useful relationships, that is, how predictors are correlated to the response variable. Regression makes no claim that predictors influence or cause the outcome.^{} Correlation should not be confused with causality.^{}
How do you classify the different types of regression? Regression techniques can be classified in many ways:
 Number of Predictors: We can distinguish between Univariate Regression and Multivariate Regression.^{}
 Outcome-Predictor Relationship: When this is linear, we can apply Linear Regression or its many variants. If the relationship is nonlinear, we can apply Polynomial Regression or Spline Regression.^{} More generally, when the form of the relationship is known it's Parametric Regression, otherwise it's Nonparametric Regression.^{}
 Predictor Selection: With multiple predictors, sometimes not all of them are important. Best Subsets Regression or Stepwise Regression can find the right subset of predictors.^{} We could penalize too many predictors in the model using Ridge Regression, Lasso Regression or Elastic Net Regression.^{}
 Correlated Predictors: If predictors are correlated, one approach is to transform them into fewer predictors by a linear combination of the original predictors. Principal Component Regression (PCR) and Partial Least Squares (PLS) Regression are two approaches to do this.^{}
 Outcome Type: When predicting categorical data, we can apply Logistic Regression.^{} When outcome is a count variable, we can apply Poisson Regression or Negative Binomial Regression. In fact, a suitable method of regression can be inferred from the distribution of the dependent variable.^{}
What are the types of linear regression models? Simple Regression involves only one predictor. For example, \(Y_i = \beta_0 + \beta_{1}X_{1i} + \epsilon_i\).^{}
If we generalize to many predictors, the term Multiple Linear Regression is used. Consider a bivariate linear model \(Y_i = \beta_0 + \beta_{1}X_{1i} + \beta_{2}X^2_{2i} + \epsilon_i\). Although there's a square term, the model is still linear in terms of the parameters.^{}
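The point that such a model stays linear in the parameters can be verified numerically: the squared predictor is just another column of the design matrix, so ordinary least squares still applies. A sketch with hypothetical data:

```python
import numpy as np

# Model: Y = b0 + b1*X1 + b2*X2^2 + e. Despite the square term, it's linear in
# the parameters, so we fit it with ordinary least squares by treating X2^2 as
# just another column of the design matrix.
rng = np.random.default_rng(1)
X1 = rng.uniform(-5, 5, size=200)
X2 = rng.uniform(-5, 5, size=200)
y = 1.0 + 2.0 * X1 + 0.5 * X2**2 + rng.normal(0, 0.1, size=200)

A = np.column_stack([np.ones_like(X1), X1, X2**2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)   # close to the true values (1.0, 2.0, 0.5)
```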
To represent many Multiple Linear Regression models in a compact form we can use the General Linear Model. This generalization allows us to work with many dependent variables that depend on the same independent variables. It also incorporates different statistical models including ANOVA, ANCOVA, OLS, the t-test and the F-test.^{}
The General Linear Model makes the assumption that \(Y_i \sim N(X^T_i\beta,\sigma^2)\), that is, the response variable is normally distributed with a mean that's a linear combination of the predictors. A larger class of models called the Generalized Linear Model (GLM) allows \(Y_i\) to follow any distribution in the exponential family of distributions. The General Linear Model is a specialization of the GLM.^{}
If the response is also affected by random effects, such as clustered observations or repeated measurements on the same individuals, the Generalized Linear Mixed Model (GLMM) can be used.^{}
Could you compare linear and logistic regression? Since logistic regression deals with categorical outcomes, it predicts the probability of an outcome rather than a continuous value. Predictions should therefore be restricted to the range 0-1. This is done by transforming the linear regression equation to the logit scale: the natural log of the odds of being in one category versus the other categories.^{}
For this reason, logistic regression may be seen as a particular case of GLM. Logit is used as the link function that relates predictors to the outcome.^{}
Logistic regression shares with linear regression many of the assumptions: independence of errors, linearity (but in the logit scale), absence of multicollinearity among predictors, and lack of influential outliers.^{}
There are three types of logistic regression:^{}
 Binary: Only two outcomes. Example: predict whether a student passes a test. When all predictors are categorical, we call them logit models.^{}
 Nominal: More than two outcomes. Also called Multinomial Logistic Regression. Example: predict the colour of an iPhone model a customer is likely to buy.
 Ordinal: More than two ordered outcomes. Example: predicting a medical condition (good, stable, serious, critical).
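As a minimal sketch of the binary case, the following fits a logistic regression on hypothetical pass/fail data using scikit-learn (the study-hours setup and all coefficients are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical pass/fail data: the probability of passing rises with hours studied.
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=500).reshape(-1, 1)
p_true = 1 / (1 + np.exp(-(hours.ravel() - 5.0)))   # true model on the logit scale
passed = rng.binomial(1, p_true)

model = LogisticRegression().fit(hours, passed)

# Predicted values are probabilities, guaranteed to lie in the range 0-1.
proba = model.predict_proba([[8.0]])[0, 1]
print(proba)   # high probability of passing after 8 hours of study
```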
Could you explain parametric versus nonparametric regression? Linear models and even nonlinear models are parametric models since we know (or make an educated guess about) how the outcome relates to predictors. Once the model is fixed, the task is to estimate the parameters \(\beta\) of the model. If we have problems in this estimation, we can revise the model and try again.^{}
Nonparametric regression is more suitable when we have no idea how the outcome relates to the predictors. Usually when the relationship is nonlinear, we can adopt nonparametric regression. For example, one study attempting to predict the logarithm of wage from age found that nonparametric regression approaches outperformed simple linear and polynomial regression methods.^{}
Parametric models have a finite set of parameters that try to capture everything about the observed data. Model complexity is bounded even with unbounded data. Nonparametric models are more flexible because the model gets better as more data is observed. We can view them as having infinitely many parameters or functions that we attempt to estimate. An artificial neural network with infinitely many hidden units is equivalent to nonparametric regression.^{}
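The contrast can be illustrated with a toy experiment on synthetic data: a misspecified parametric model (a straight line) against a nonparametric one (k-nearest neighbours) when the true relationship is nonlinear. The sine-curve setup is an assumption made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Nonlinear ground truth: y = sin(x). A straight line (parametric, misspecified)
# versus k-nearest neighbours (nonparametric, shape learned from data).
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=300).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(0, 0.1, size=300)

linear = LinearRegression().fit(x, y)
knn = KNeighborsRegressor(n_neighbors=10).fit(x, y)

x_test = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y_true = np.sin(x_test).ravel()
mse_linear = np.mean((linear.predict(x_test) - y_true) ** 2)
mse_knn = np.mean((knn.predict(x_test) - y_true) ** 2)
print(mse_linear, mse_knn)   # the nonparametric fit tracks the curve far better
```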
What are some specialized regression models? We note a few of these with brief descriptions:
 Robust Regression: This is better suited than linear regression in handling outliers or influential observations. Observations are weighted.^{}
 Huber Regression: To handle outliers better, this optimizes a combination of squared error and absolute error.^{} ^{}
 Quantile Regression: Linear regression predicts the mean of the dependent variable. Quantile regression predicts the median or, more generally, any quantile. For example, predicting the 0.25 quantile (25th percentile) of a house price means that there's a 25% chance that the actual price is below the predicted value.^{}
 Functional Sequence Regression: Sometimes predictors affect the outcome in a time-dependent manner. This model includes the time component. For example, onion weight depends on environmental factors at various stages of the onion's growth.^{}
 Regression Tree: Use a decision tree to split the predictor space at internal nodes. Terminal nodes or leaves represent predictions, which are the mean of data points in each partitioned region.^{}
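The value of robust methods such as Huber regression shows up when data contains gross outliers. A sketch on synthetic linear data (the injected outliers and all coefficients are hypothetical):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Huber loss is quadratic for small errors and linear for large ones,
# so outliers pull the fit far less than under ordinary least squares.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(0, 0.5, size=100)
y[:5] += 50.0   # inject a few gross outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)
print(ols.coef_[0], huber.coef_[0])   # Huber slope stays near the true value 2.0
```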
Could you share examples to illustrate a few regression methods? In a production plant, there's a linear correlation between water consumption and amount of production. Simple regression suffices in this case, giving the fit as Water = 2273 + 0.0799 Production. Thus, even without any production, 2273 units of water are consumed. Every unit of production increases water consumption by 0.0799 units. Both predictor and outcome are continuous variables.^{}
As an example of multiple linear regression, let's predict the birth weight of a baby (continuous variable) based on two predictors: mother's smoking status (categorical variable, nonsmokers coded 0 and smokers coded 1) and gestation period (continuous variable). The regression equation is Wgt = -2390 + 143.10 Gest - 244.5 Smoke. If we plot this, we'll actually see two parallel lines, one for smokers and one for nonsmokers.^{}
One study looked at the number of cigarettes college students smoked per day. They predicted this count from gender, birth order, education level, social/psychological factors, etc. The study used Poisson regression, negative binomial regression, and many others.^{}
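The fitted equations quoted above can be wrapped as simple prediction functions (coefficients as given in the text, with the negative intercept and negative smoking coefficient of the birth-weight model; the inputs below are hypothetical):

```python
# Prediction functions built from the fitted regression equations quoted above.

def predict_water(production):
    """Water consumption for a given production level (simple regression)."""
    return 2273 + 0.0799 * production

def predict_weight(gestation, smoke):
    """Birth weight from gestation period and smoking status (0=nonsmoker, 1=smoker)."""
    return -2390 + 143.10 * gestation - 244.5 * smoke

print(predict_water(0))                                # baseline: 2273 units of water
print(predict_weight(40, 0) - predict_weight(40, 1))   # smoker line sits 244.5 lower
```

The constant gap of 244.5 between the two predictions, at any gestation period, is exactly why the plot shows two parallel lines.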
With so many types of regression models, how do I select a suitable one? To apply linear regression, the main assumptions must be met: linearity, independence, constant variance and normality. Linearity can be checked via graphical analysis: a plot of residuals versus predicted values can show nonlinearity. A goodness-of-fit test can also be used. Nonlinear relations can be made linear using transformations of the predictors and/or the outcome, such as log, square root or power transformations. Try adding transformations of the current predictors, or try semiparametric or nonparametric models.^{}
In practice, linear regression is sensitive to outliers and cross-correlations. Piecewise linear regression, particularly for time series data, is a better approach.^{} Nonparametric regression can be used when there's an unknown nonlinear relationship.^{} Support Vector Regression (SVR) is an example of nonparametric regression.^{}
When overfitting is a problem, use cross validation to evaluate models. Ridge, lasso and elastic net models can help tackle overfitting.^{} ^{} They can also handle multicollinearity.^{} Quantile regression is suited to handle outliers.^{}
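A sketch of this advice on synthetic data: many highly correlated predictors and few observations, with cross-validation comparing plain least squares against a ridge penalty (the data-generating setup is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical setup prone to overfitting: 30 highly correlated predictors, 40 rows.
rng = np.random.default_rng(0)
n, p = 40, 30
signal = rng.normal(size=(n, 1))
X = signal + 0.1 * rng.normal(size=(n, p))   # every column is a noisy copy of the signal
y = signal.ravel() + rng.normal(0, 0.5, size=n)

# Cross-validation compares how well each model generalizes to held-out folds.
ols_score = cross_val_score(LinearRegression(), X, y, cv=5).mean()
ridge_score = cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()
print(ols_score, ridge_score)   # ridge generalizes better on this data
```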
For predicting counts, use negative binomial regression if variance is larger than the mean. Poisson regression can be used only if variance equals the mean.^{} ^{}
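A minimal count-regression sketch, assuming a made-up single-predictor setup where the variance-equals-mean condition holds, so Poisson regression is appropriate (scikit-learn's PoissonRegressor):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical count data: event counts grow with a single predictor x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=1000).reshape(-1, 1)
counts = rng.poisson(np.exp(0.3 + 1.0 * x.ravel()))   # true log-mean: 0.3 + 1.0*x

# Poisson regression assumes variance equals the mean (conditioned on x).
lo = counts[x.ravel() < 0.5]   # a slice of observations with similar x values
print(lo.mean(), lo.var())     # roughly equal here, so Poisson is plausible

model = PoissonRegressor(alpha=1e-3).fit(x, counts)
print(model.intercept_, model.coef_)   # near the true values 0.3 and 1.0
```

If the variance in such a slice were clearly larger than the mean (overdispersion), negative binomial regression would be the better choice, as noted above.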
What are some tips to analyze model statistics? Well-known model performance metrics include R-squared (R2), Root Mean Squared Error (RMSE), Residual Standard Error (RSE) and Mean Absolute Error (MAE). We also have metrics that penalize additional predictors: Adjusted R2, Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Mallows' Cp. The higher the R2 or Adjusted R2, the better the model. For all other metrics, a lower value implies a better model.^{}
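Several of these metrics are one-liners with scikit-learn; a sketch on hypothetical true and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical observed values and model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

r2 = r2_score(y_true, y_pred)                       # higher is better
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # lower is better
mae = mean_absolute_error(y_true, y_pred)           # lower is better
print(r2, rmse, mae)
```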
A high t-statistic implies the coefficient is probably nonzero. A low p-value on the t-statistic gives confidence in the estimate. Low coefficients and a low p-value for the model as a whole can imply multicollinearity.^{} While the t-test is applied to individual coefficients, the F-test is applied to the overall model.^{}
Two models can be compared graphically. For example, the coefficients and their confidence intervals can be plotted and compared visually.^{}
What software packages support regression? In R, the functions lm(), summary(), residuals() and predict() in the base package enable linear regression.^{} For GLM, we can use the glm() function. Use the quantreg package for quantile regression; glmnet for ridge, lasso and elastic net regression; pls for principal component regression; plsdepot for PLS regression; e1071 for Support Vector Regression (SVR); ordinal for ordinal regression; MASS for negative binomial regression; survival for Cox regression.^{} Other useful packages are stats, car, caret, sgd, BLR, Lars,^{} and nlme.^{}
In Python, scikit-learn provides a number of modules and functions for regression. Use the module sklearn.linear_model for linear regression including logistic, poisson, gamma, huber, ridge, lasso, and elastic net; sklearn.svm for SVR; sklearn.neighbors for k-nearest neighbours regression; sklearn.isotonic for isotonic regression; sklearn.metrics for regression metrics; and sklearn.ensemble for ensemble methods for regression.^{}
Milestones
Regression starts with Carl Friedrich Gauss and the method of least squares, though he doesn't publish the method until much later, in 1809. In 1805, Adrien-Marie Legendre invents the same approach independently. Legendre uses it to predict the orbits of comets.^{}
Francis Galton plots in 1877 what may be called the first regression line. It concerns the size of sweet pea seeds, correlating the size of daughter seeds against that of mother seeds. Such an analysis came about in the course of investigating Darwin's mechanism for heredity. By these experiments, Galton also introduces the concept of "reversion to the mean", later called regression to the mean.^{}
R.A. Fisher gives the exact sampling distribution of the coefficient of correlation, thus marking the beginning of multivariate analysis. Fisher then simplifies it to a form via the z-transformation. In the early 1920s, he introduces the F distribution and the maximum likelihood method of estimation.^{}
Hotelling proposes Principal Component Regression (PCR) in an attempt to reduce the number of explanatory variables (predictors) in the regression model. PCR itself is based on Principal Component Analysis (PCA) that was invented independently by Pearson (1901) and Hotelling (1930s).^{}
Although the logistic function was invented by Verhulst in the 1830s, it's only in the 1960s that it's applied to regression analysis. D.R. Cox is among the early researchers to do this. Many researchers, including Cox, independently develop Multinomial Logistic Regression through the 1960s.^{}
Hoerl and Kennard note that although least squares estimation is unbiased, it can give poor results if there's multicollinearity among the predictors. To improve the estimation they propose a biased estimation approach that they call Ridge Regression.^{} Ridge regression uses standardized variables, that is, outcome and predictors are subtracted by the mean and divided by the standard deviation.^{} By introducing some bias, the variance of the least squares estimator is controlled.^{}
D.R. Cox applies regression to life-table analysis. Among the sampled individuals, he observes either the time to "failure" or that the individual is removed from the study (called censoring). Moreover, the distribution of survival times is often skewed. For these reasons, linear regression is not suitable. Cox instead uses a hazard function that incorporates the age-specific failure rate.^{} In later years, this approach is simply called Cox Regression.^{}
Nelder and Wedderburn introduce the Generalized Linear Model (GLM). As examples, they relate GLM to normal, binomial (probit analysis), poisson (contingency tables), and gamma (variance components) distributions.^{} However, it's only in the 1980s that GLM becomes popular due to the work of McCullagh and Nelder.^{}
Koenker and Bassett introduce Quantile Regression.^{} This uses weighted least absolute error rather than least squares error common in linear regression.^{}
Huber proposes an estimator that's quadratic in small values and grows linearly for large values. It's later named Huber Regression.^{}
Tibshirani proposes Lasso Regression that uses the least squares estimator but constrains the sum of absolute value of coefficients to a maximum. This forces some coefficients to zero or low values, leading to more interpretable models. This is useful when we start with too many predictors.^{}
De'ath proposes the Multivariate Regression Tree (MRT). The history of regression trees goes back to the 1960s. With the release of the CART (Classification and Regression Tree) software in 1984, they become better known.^{} However, CART is limited to a single response variable. MRT extends CART to multivariate response data.^{}
Zou and Hastie propose Elastic Net Regression. This combines elements of both ridge regression and lasso regression.^{}
References
 Analytics University. 2017. "35 Types of Regression Models used in Data Science." Analytics University, on YouTube, September 19. Accessed 2020-11-11.
 Artigue, Heidi and Gary Smith. 2019. "The principal problem with principal components regression." Cogent Mathematics & Statistics, vol. 6, no. 1. Accessed 2020-11-15.
 Bartocha, Kamil. 2014. "Linear Regression vs Logistic Regression vs Poisson Regression." Marketing Distillery, via SlideShare, November 23. Accessed 2020-11-11.
 Bhalla, Deepanshu. 2018. "15 Types of Regression in Data Science." Listen Data, March. Accessed 2020-11-11.
 Bock, Tim. 2020. "What is Linear Regression?" Blog, Displayr. Accessed 2020-11-11.
 Bolker, Ben. 2018. "Generalized linear mixed models." Accessed 2020-11-12.
 Brannick, Michael T. 2020. "Logistic Regression." College of Arts & Sciences, Univ. of South Florida. Accessed 2020-11-14.
 Cho, Wanhyun, Myung Hwan Na, Yuha Park, Deok Hyeon Kim, and Yongbeen Cho. 2020. "Prediction of Weights during Growth Stages of Onion Using Agricultural Data Analysis Method." Applied Sciences, MDPI, 10(6), 2094, March 19. Accessed 2020-11-12.
 Ciaburro, Giuseppe. 2018. "R packages for regression." In: Regression Analysis with R, Packt Publishing Limited, January. Accessed 2020-11-11.
 Cox, D.R. 1972. "Regression Models and Life-Tables." Journal of the Royal Statistical Society, Series B (Methodological), vol. 34, no. 2, pp. 187-220. Accessed 2020-11-15.
 Cramer, J.S. 2002. "The Origins of Logistic Regression." Tinbergen Institute Discussion Paper, TI 2002-119/4, November. Accessed 2020-11-15.
 De'ath, Glenn. 2002. "Multivariate Regression Trees: a new technique for modeling species–environment relationships." Ecology, Ecological Society of America, vol. 83, no. 4, pp. 1105-1117. Accessed 2020-11-16.
 Dye, Steven. 2020. "Quantile Regression." Towards Data Science, February 13. Accessed 2020-11-12.
 Explorium. 2019. "The Complete Guide to Decision Trees." Blog, Explorium, December 10. Accessed 2020-11-16.
 Gardner, William, Edward Patrick Mulvey, and Esther C. Shaw. 1995. "Regression Analyses of Counts and Rates: Poisson, Overdispersed Poisson, and Negative Binomial Models." Psychological Bulletin, vol. 118, no. 3, pp. 392-404. Accessed 2020-11-11.
 Ghahramani, Zoubin. 2015. "Parametric vs Nonparametric Models." Part II of Bayesian Inference, The Machine Learning Summer School, Max Planck Institute for Intelligent Systems, Tübingen, Germany, July 13-24. Accessed 2020-11-11.
 Gillham, Nicholas W. 2009. "Cousins: Charles Darwin, Sir Francis Galton and the birth of eugenics." Royal Statistical Society, vol. 6, no. 3, pp. 132-135, September. Accessed 2020-11-15.
 Grace-Martin, Karen. 2008. "Regression Models for Count Data." The Analysis Factor, October 24. Updated 2018-05-02. Accessed 2020-11-11.
 Grace-Martin, Karen. 2009. "Multiple Regression Model: Univariate or Multivariate GLM?" The Analysis Factor, April 20. Updated 2018-05-02. Accessed 2020-11-11.
 Granville, Vincent. 2014. "10 types of regressions. Which one to use?" Blog, Data Science Central, July 21. Accessed 2020-11-11.
 Hoerl, Arthur E. and Robert W. Kennard. 1970. "Ridge Regression: Biased Estimation for Nonorthogonal Problems." Technometrics, vol. 12, no. 1, pp. 55-67, February. Accessed 2020-11-15.
 jvriesem. 2017. "When should linear regression be called "machine learning"?" Cross Validated, StackExchange, March 20. Accessed 2020-11-11.
 Kabacoff, Robert I. 2020. "Multiple (Linear) Regression." Quick-R, Datacamp. Accessed 2020-11-11.
 Kassambara, Alboukadel. 2018. "Penalized Regression Essentials: Ridge, Lasso & Elastic Net." STHDA, March 11. Accessed 2020-11-12.
 Kassambara, Alboukadel. 2018b. "Regression Model Accuracy Metrics: R-square, AIC, BIC, Cp and more." STHDA, March 11. Accessed 2020-11-14.
 Khurram, Tauqeer. 2020. "Different Types of Regression Analysis to Know." Tech Funnel, March 18. Accessed 2020-11-11.
 Koenker, Roger, and Gilbert Bassett. 1978. "Regression Quantiles." Econometrica, vol. 46, no. 1, pp. 33-50, January. Accessed 2020-11-15.
 Koenker, Roger and Kevin F. Hallock. 2001. "Quantile Regression." Journal of Economic Perspectives, vol. 15, no. 4, pp. 143-156. Accessed 2020-11-11.
 Kopf, Dan. 2015. "The Discovery of Statistical Regression." Priceonomics, November 6. Accessed 2020-11-14.
 Legendre, Pierre, and Louis Legendre. 2012. "Multivariate regression trees (MRT)." Sec. 8.11 in: Developments in Environmental Modelling, Elsevier, vol. 24, pp. 337-424. doi: 10.1016/B978-0-444-53868-0.50008-3. Accessed 2020-11-11.
 Liu, Ching-Ti, Jacqueline Milton, and Avery McIntosh. 2016. "Simple Linear Regression." Boston University School of Public Health, January 6. Accessed 2020-11-14.
 Long, Jacob. 2020. "Tools for summarizing and visualizing regression models." Vignettes, jtools, on CRAN, June 22. Accessed 2020-11-11.
 Mahmoud, Hamdy F. F. 2014. "Parametric versus Semi/nonparametric Regression Models." Laboratory for Interdisciplinary Statistical Analysis, Univ. of Colorado Boulder, July 23. Accessed 2020-11-11.
 Mahmoud, Hamdy F. F. 2014b. "Parametric versus Semi/nonparametric Regression Models." Laboratory for Interdisciplinary Statistical Analysis, Univ. of Colorado Boulder, July 23. Accessed 2020-11-12.
 Marquardt, Donald W. and Ron Snee. 1975. "Ridge Regression in Practice." The American Statistician, vol. 29, no. 1, February. Accessed 2020-11-11.
 NCSS. 2020a. "Chapter 565: Cox Regression." NCSS Statistical Software. Accessed 2020-11-15.
 NCSS. 2020b. "Chapter 335: Ridge Regression." NCSS Statistical Software. Accessed 2020-11-15.
 Nelder, J. A. and R. W. M. Wedderburn. 1972. "Generalized Linear Models." Journal of the Royal Statistical Society, Series A (General), vol. 135, no. 3, pp. 370-384. Accessed 2020-11-15.
 Owen, Art B. 2006. "A robust hybrid of lasso and ridge regression." Stanford University, October. Accessed 2020-11-11.
 PennState. 2020a. "Introduction to Generalized Linear Models." Sec. 6.1 in: STAT 504 Analysis of Discrete Data, The Pennsylvania State University. Accessed 2020-11-11.
 PennState. 2020b. "Example on Birth Weight and Smoking." Sec. 8.1 in: STAT 501 Regression Methods, The Pennsylvania State University. Accessed 2020-11-11.
 PennState. 2020c. "Logistic Regression." Sec. 15.1 in: STAT 501 Regression Methods, The Pennsylvania State University. Accessed 2020-11-14.
 Philosophy Terms. 2016. "Causality." Philosophy Terms, October 10. Updated 2018-10-25. Accessed 2020-11-12.
 Princeton University. 2020a. "Interpreting Regression Output." Data and Statistical Services, Princeton University Library, Princeton University. Accessed 2020-11-14.
 Rao, C. Radhakrishna. 1983. "Multivariate Analysis: Some Reminiscences on Its Origin and Development." Sankhyā: The Indian Journal of Statistics, Series B (1960-2002), vol. 45, no. 2, pp. 284-99. Accessed 2020-11-15.
 Sagar, Chaitanya. 2017. "Building Regression Models in R using Support Vector Regression." KDnuggets, March. Accessed 2020-11-14.
 scikit-learn. 2020a. "API Reference." v0.23.2, scikit-learn, August. Accessed 2020-11-11.
 scikit-learn. 2020b. "sklearn.linear_model.HuberRegressor." v0.23.2, scikit-learn, August. Accessed 2020-11-11.
 Sharareh, Parami, Tapak Leili, Moghimbeigi Abbas, Poorolajal Jalal, and Ghaleiha Ali. 2020. "Determining correlates of the average number of cigarette smoking among college students using count regression models." Scientific Reports, 10, Article number: 8874, June 1. Accessed 2020-11-11.
 Steorts, Rebecca C. 2017. "Tree Based Methods: Regression Trees." Chapter 8 ISL, STA 325, Duke University. Accessed 2020-11-11.
 STHDA. 2020. "Regression Analysis Essentials For Machine Learning." STHDA. Accessed 2020-11-11.
 Stoltzfus, Jill C. 2011. "Logistic Regression: A Brief Primer." Academic Emergency Medicine, 18:1099-1104. Accessed 2020-11-11.
 Tibshirani, Robert. 1996. "Regression shrinkage and selection via the lasso." J. Royal. Statist. Society, Series B (Methodological), vol. 58, no. 1, pp. 267-288. Accessed 2020-11-15.
 UCLA. 2020a. "Regression Models with Count Data." Statistical Consulting Group, UCLA. Accessed 2020-11-11.
 UCLA. 2020b. "Introduction to Generalized Linear Mixed Models." Statistical Consulting Group, UCLA. Accessed 2020-11-12.
 UCLA. 2020c. "Robust Regression." Stata Data Analysis Examples, UCLA. Accessed 2020-11-12.
 Wikipedia. 2020a. "Regression analysis." Wikipedia, October 20. Accessed 2020-11-11.
 Wikipedia. 2020b. "General linear model." Wikipedia, November 9. Accessed 2020-11-11.
Further Reading
 Bhalla, Deepanshu. 2018. "15 Types of Regression in Data Science." Listen Data, March. Accessed 2020-11-11.
 Statistics Solutions. 2020. "Selection Process for Multiple Regression." Statistics Solutions, June 23. Accessed 2020-11-11.
 Princeton University. 2020b. "Introduction to Regression." Data and Statistical Services, Princeton University Library, Princeton University. Accessed 2020-11-11.
 scikit-learn. 2020c. "Support Vector Regression (SVR) using linear and nonlinear kernels." v0.23.2, scikit-learn, August. Accessed 2020-11-11.
 Long, Jacob. 2020. "Tools for summarizing and visualizing regression models." Vignettes, jtools, on CRAN, June 22. Accessed 2020-11-11.
 Koenker, Roger and Kevin F. Hallock. 2001. "Quantile Regression." Journal of Economic Perspectives, vol. 15, no. 4, pp. 143-156. Accessed 2020-11-11.
See Also
 Regression Modelling
 Linear Regression
 Logistic Regression
 Stepwise Regression
 Support Vector Regression
 Generalized Linear Regression