Logistic Regression

Logistic regression model. Source: Polamuri 2017.

Suppose we're asked to classify emails into two categories: spam or not spam. Compare this with another application that attempts to predict product sales given recent advertising expense. Unlike the second example in which the target variable is continuous, email classification predicts a categorical variable.

Logistic regression is a statistical method used for classifying a target variable that is categorical in nature. It is an extension of a linear regression model. It uses a logistic function to estimate the probability of a target variable belonging to a particular class or category.

Discussion

  • Could you explain logistic regression with an example?
    Logistic regression is used for email classification. Source: Waseem 2020.

    Consider email classification as an example. To be able to predict if an email is spam or not, we will extract relevant information from the emails such as:

    • Sender of the email
    • Number of typos in the email
    • Occurrence of words or phrases such as "offer", "prize", "free gift", etc.

    The above information is converted into a vector of numerical features. These features are linearly combined and then transformed using a logistic function to give a score in the range 0 to 1. This score is the estimated probability that the email is spam. If the probability is higher than a chosen threshold, commonly 50%, the email is classified as spam; otherwise it's classified as not spam.
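
    The steps above can be sketched in a few lines of Python. This is only an illustration: the feature values, weights and bias below are hypothetical, whereas a real model would learn the weights from labelled emails.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical numerical features for one email (illustrative values only):
# [sender is unknown (0 or 1), number of typos, count of words such as "offer" or "prize"]
features = [1.0, 4.0, 3.0]

# Hypothetical weights and bias; a trained model would estimate these from data.
weights = [0.8, 0.3, 0.9]
bias = -4.0

# Linear combination of the features, followed by the logistic transform.
z = bias + sum(w * x for w, x in zip(weights, features))
p_spam = sigmoid(z)

# Classify using a 50% threshold.
label = "spam" if p_spam > 0.5 else "not spam"
print(f"z = {z:.2f}, P(spam) = {p_spam:.3f}, label = {label}")
```

    With these made-up numbers the linear score is positive, so the estimated probability is above 0.5 and the email is labelled as spam.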

  • What are different types of logistic regression?

    There are three types of logistic regression:

    • Binary or binomial: where the dependent variable can have only two outcomes. Examples: spam/not-spam, dead/alive, pass/fail.
    • Multiclass or multinomial: where the dependent variable is classified into three or more categories and these categories are not ordered. Examples: types of cuisines (Italian, Mediterranean, Chinese).
    • Ordinal: where the dependent variable is classified into three or more categories and these categories are ordered. Example: movie rating (1-5).

  • Why can't I use linear regression for predicting classes?
    Linear Regression vs Logistic Regression. Source: Jaiswal 2021.

    In classification problems, we are predicting the probability that the outcome variable belongs to a particular class. If linear regression is used for classification, it will treat the classes or categories as numbers. It will fit the best line that minimises the distance between the data points and the line. The linear regression equation would just give scores that lie along the best fit line. These scores cannot be interpreted as probabilities. A meaningful threshold cannot be set to distinguish the classes.

    Also, the linear regression model fits a straight line that extrapolates without bound. Its predictions can take any value from -∞ to ∞, including values below 0 or above 1. Since a probability must lie between 0 and 1, logistic regression applies a logistic function so that the predicted value always stays between 0 and 1.
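
    A small sketch makes the out-of-range problem concrete. The toy data below is made up for illustration. A least-squares line is fitted to 0/1 labels and then evaluated outside the training range. Passing the same linear score through the logistic function only demonstrates the bounding effect; it is not a fitted logistic regression model.

```python
import math

# Toy data (made up): one feature and binary class labels coded as 0 and 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Ordinary least-squares fit of a straight line, using the closed-form solution.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

def linear_score(value: float) -> float:
    """Prediction of the fitted straight line; unbounded."""
    return intercept + slope * value

def logistic(z: float) -> float:
    """Logistic function; output is always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for value in (-5.0, 0.0, 15.0):
    score = linear_score(value)
    print(f"x = {value:5.1f}  linear score = {score:6.3f}  logistic(score) = {logistic(score):.3f}")
```

    The linear scores at x = -5 and x = 15 fall below 0 and above 1 respectively, so they cannot be read as probabilities, while the transformed values remain inside the valid range.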

  • What is the logistic function?
    Logistic function. Source: Molnar 2021.

    The logistic function, also known as the sigmoid function, is an S-shaped curve that takes any real-valued number and transforms it into a number between 0 and 1 using the following equation:

    $$f(x)= \frac{1}{1+e^{-x}}$$

    In the above image, as x approaches ∞, f(x) approaches 1, and as x approaches -∞, f(x) approaches 0:

    $$\lim_{x\to\infty}\frac{1}{1+e^{-x}} = \frac{1}{1+0} = 1, \qquad \text{since } e^{-x}\to 0$$

    $$\lim_{x\to-\infty}\frac{1}{1+e^{-x}} = 0, \qquad \text{since } e^{-x}\to\infty$$
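
    The limiting behaviour can be checked numerically. The sketch below is one possible implementation, not taken from the cited sources. It switches to the algebraically equivalent form e^x / (1 + e^x) for negative inputs, because computing e^{-x} directly would overflow for large negative x.

```python
import math

def logistic(x: float) -> float:
    """Numerically stable logistic (sigmoid) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use e^x / (1 + e^x) so that exp() never receives a large positive argument.
    ex = math.exp(x)
    return ex / (1.0 + ex)

for x in (-1000.0, -5.0, 0.0, 5.0, 1000.0):
    print(f"f({x}) = {logistic(x)}")
```

    Mathematically f(x) never reaches 0 or 1, but in floating-point arithmetic very large inputs such as ±1000 round to exactly 1.0 and 0.0.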

  • What are GLMs and how are they relevant to logistic regression?

    Generalized Linear Models (GLMs) are a class of regression models that extend linear regression to cases where an ordinary linear model does not fit well. They're applicable when the outcome variable follows a non-normal distribution such as binomial, exponential, Poisson, etc.

    A GLM is represented by the following equation:

    $$\large{g(E(y))=\beta_0+\beta_1{}x_{1}+\ldots+\beta_p{}x_{p}}$$

    Where,

    • \(E(y)\) is the mean value or the expected value of the outcome variable that follows an assumed distribution
    • \(\beta_0+\beta_1{}x_{1}+\ldots+\beta_p{}x_{p}\) is the linear predictor, i.e. the weighted sum of features, where each \(\beta_j\) is a weight and each \(x_j\) is an explanatory variable.
    • \(g\) is the link function that mathematically links the expected value of the outcome variable and the linear predictor.

    A GLM is a generalised form of linear regression, and logistic regression is a specific type of GLM. For logistic regression, we can derive a specific link function \(g\) called the logit function.
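
    As an illustration, three commonly used GLMs differ only in the assumed distribution of the outcome and in the link function \(g\). Writing \(\mu=E(y)\), the standard canonical links are listed below for context; they are general textbook facts rather than something taken from the sources cited here:

    $$\text{Linear regression (normal outcome):}\quad g(\mu)=\mu$$

    $$\text{Logistic regression (binomial outcome):}\quad g(\mu)=\ln\left(\frac{\mu}{1-\mu}\right)$$

    $$\text{Poisson regression (Poisson outcome):}\quad g(\mu)=\ln(\mu)$$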

  • What is the logistic regression equation and the logit function?
    Effect of coefficients on the logistic function. Source: van den Berg 2020.

    Let's start with the linear regression equation:

    $$y=\beta_0+\beta_1{}x_{1}\qquad(1)$$

    We derive the link function for logistic regression. In linear regression, \(y\) is a continuous variable. Since we want a probability for logistic regression, we will wrap the linear predictor in a logistic function so that the values do not go below 0 or beyond 1. We will denote this as probability with \(p\):

    $$p=\frac{1}{1+e^{-(\beta_0+\beta_1{}x_{1})}}\qquad(2)$$

    The figure shows the probability that a person, given his/her age, will die within the next five years. We note that changing \(\beta_0\) shifts the curve while changing \(\beta_1\) affects steepness.

    Using (1) we can rewrite (2) as:

    $$p=\frac{1}{1+e^{-y}}=\frac{e^y}{1+ e^y}\qquad(3)$$

    If \(p\) is the probability that an email is spam, then the probability of a non-spam email can be written as:

    $$q=1-p=1-\frac{1}{1+e^{-y}}=\frac{1}{1+e^y}\qquad(4)$$

    Dividing (3) by (4) we get,

    $$\frac{p}{1-p}=e^y$$

    Taking the natural logarithm on both sides and substituting the value of \(y\), we get the logistic regression equation:

    $$\ln\left(\frac{p}{1-p}\right)=\beta_0+\beta_1{}x_{1}$$

    \(p/(1-p)\) is the odds, that is, the probability of the event divided by the probability of no event. \(\ln(p/(1-p))\) is the log-odds, which serves as the link function and is called the logit function. The output values from this function are called logits.
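
    The relationship between the logistic function and the logit can be verified with a short sketch. The coefficients below are hypothetical and chosen only for illustration. The check shows that the logit of \(p\) recovers the linear predictor, and that increasing \(x_1\) by one unit multiplies \(p/(1-p)\) by \(e^{\beta_1}\).

```python
import math

def logistic(z: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the logistic function: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -3.0, 0.5

def probability(x1: float) -> float:
    return logistic(beta0 + beta1 * x1)

def odds(x1: float) -> float:
    p = probability(x1)
    return p / (1.0 - p)

x1 = 4.0
p = probability(x1)
print(f"p = {p:.4f}")
print(f"logit(p) = {logit(p):.4f}, linear predictor = {beta0 + beta1 * x1:.4f}")

# A one-unit increase in x1 multiplies p / (1 - p) by exp(beta1).
odds_multiplier = odds(x1 + 1.0) / odds(x1)
print(f"multiplier = {odds_multiplier:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

    This is why logistic regression coefficients are often interpreted on the log-odds scale rather than directly as changes in probability.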

  • What is the cost function for logistic regression?
    Log loss curve. Source: Mcdonald 2018.

    A cost function quantifies the error between the predicted value and the expected value. The weights of the features in the model are estimated by minimising this cost function (equivalently, by maximising the likelihood).

    The cost function used in logistic regression is known as Log Loss or Negative Log-Likelihood (NLL). It is the negative average, over all instances in the training data, of the log of the probability that the model assigns to each instance's true class.

    $$-\frac{1}{N}\sum_{i=1}^N\left[y_i\cdot\ln(p(y_i))+(1-y_i)\cdot\ln(1-p(y_i))\right]$$

    Where,

    • \(N\) is the number of training samples
    • \(y_i\) is the actual value (0 or 1) of the i'th sample
    • \(p(y_i)\) is the predicted probability that the i'th sample is positive (y=1)

    We simplify this equation for the two possible outcomes for a single training sample:

    • True output y=1 (positive): \(-(1\cdot\ln(p) + (1-1)\cdot\ln(1-p)) = -\ln(p)\)
    • True output y=0 (negative): \(-(0\cdot\ln(p) + (1-0)\cdot\ln(1-p)) = -\ln(1-p)\)

    In the above graph, we can see that because the loss is logarithmic, it decreases slowly as the predicted probability gets closer to the true label. As the predicted probability diverges from the true label, the loss increases rapidly. This has the effect of heavily penalising confident but incorrect predictions.
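
    The log loss formula can be written as a short function to see this penalty in numbers. This is a minimal sketch; the example labels and probabilities are made up, and the clipping constant `eps` is an assumption added here to avoid taking the log of zero.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average negative log-likelihood for binary labels.

    y_true: actual labels (0 or 1)
    p_pred: predicted probabilities of the positive class (y=1)
    eps:    clipping constant (an assumption here) that keeps log() finite
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Made-up labels and two sets of predictions.
y_true = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]
mostly_wrong = [0.1, 0.9, 0.2, 0.8]

loss_right = log_loss(y_true, mostly_right)
loss_wrong = log_loss(y_true, mostly_wrong)
print(f"loss when mostly right: {loss_right:.4f}")
print(f"loss when mostly wrong: {loss_wrong:.4f}")
```

    The predictions that put high probability on the wrong class produce a much larger loss than those that put high probability on the correct class.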

Milestones

1838

The logistic function is introduced in a series of three papers by Pierre François Verhulst between 1838 and 1847. He uses it as a model of population growth by adjusting the exponential growth model, under the guidance of Adolphe Quetelet.

1889

The term regression is coined by Francis Galton to describe a biological phenomenon. He observes that the heights of descendants of tall ancestors tend to regress down towards a normal average, a phenomenon also known as regression toward the mean.

1943

Wilson and Worcester use the logistic model in bioassay, the first known application of its kind.

1966

Cox introduces the multinomial logit model. This extends the logit model to outcomes with more than two categories.

1973

Daniel McFadden links the multinomial logit to the theory of discrete choice, specifically Luce's choice axiom, showing that the multinomial logit follows from the assumption of independence of irrelevant alternatives and interpreting odds of alternatives as relative preferences. This gives a theoretical foundation for logistic regression. In 2000, McFadden is awarded the Nobel Prize for this contribution.

Sample Code

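  • # Source: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
    # Accessed 2021-01-22

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Load the iris dataset as a feature matrix X and a vector of class labels y
    X, y = load_iris(return_X_y=True)

    # Fit a logistic regression classifier on the full dataset
    clf = LogisticRegression(random_state=0).fit(X, y)

    print(clf.predict(X[:2, :]))        # predicted class labels for the first two samples
    print(clf.predict_proba(X[:2, :]))  # per-class probabilities for the same two samples
    print(clf.score(X, y))              # mean accuracy on the training data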

References

  1. Analytics Vidhya. 2015. "Simple Guide to Logistic Regression in R and Python." Blog, on Analytics Vidhya, November 1. Accessed 2021-01-15.
  2. Bock, Tim. 2018. "What is Linear Regression?" Blog, Display R, April 5. Updated 2020-12-09. Accessed 2021-01-22.
  3. Brownlee, Jason. 2016. "Logistic Regression for Machine Learning." Blog, on Machine Learning Mastery, April 1. Updated 2020-08-15. Accessed 2021-01-04.
  4. Goel, Aman. 2018. "4 Logistic Regressions Examples to Help You Understand." Post, on Magoosh, May 21. Accessed 2021-01-07.
  5. Grace-Martin, Karen. 2015. "What is a Logit Function and Why Use Logistic Regression?" The Analysis Factor, May 11. Updated 2018-12-14. Accessed 2021-01-22.
  6. HolyPython. 2020. "Logistic Regression History." Blog, on HolyPython, July 29. Accessed 2021-01-21.
  7. Jaiswal, Sonoo. 2021. "Linear Regression vs Logistic Regression." Tutorial, on Javatpoint. Accessed 2021-01-08.
  8. Krzyk, Kamil. 2018. "Coding Deep Learning for Beginners — Linear Regression (Part 2): Cost Function." Towards Data Science, on Medium, August 8. Accessed 2021-01-18.
  9. Lumen Learning. 2021. "Introduction to Logistic Regression." In: Introduction to Statistics, Lumen Learning. Accessed 2021-01-22.
  10. Mcdonald, Conor. 2018. "Log Loss: A short note." Blog, on Wordpress, March 3. Accessed 2021-01-19.
  11. Megha270396. 2020. "Binary Cross Entropy aka Log Loss-The cost function used in Logistic Regression." Blog, Analytics Vidhya, November 9. Accessed 2021-01-18.
  12. Molnar, Christoph. 2021. "Interpretable Machine Learning." Github, January 4. Accessed 2021-01-04.
  13. Polamuri, Saimadhu. 2017. "How the Logistic Regression Model Works." Blog, on Dataspirant, March 2. Accessed 2021-01-04.
  14. Reddy, Sushmith. 2020. "Understanding the log loss function." Analytics Vidhya, on Medium, July 6. Accessed 2021-01-18.
  15. Sheldon, Kerby. 2019. "Generalized Linear Models." Notes, Department of Statistics, University of Michigan, December 9. Accessed 2021-01-15.
  16. Swaminathan, Saishruthi. 2018. "Logistic Regression — Detailed Overview." Towards Data Science, on Medium, March 18. Accessed 2021-01-06.
  17. van den Berg, Ruben Geert. 2020. "Logistic Regression – Simple Introduction." SPSS Tutorials. Accessed 2021-01-22.
  18. Waseem, Mohammad. 2020. "How To Implement Classification In Machine Learning?" Blog, Edureka.co, July 21. Accessed 2021-01-05.
  19. Wikipedia. 2020a. "Logistic Regression." Wikipedia, December 18. Accessed 2021-01-04.
  20. Wikipedia. 2020b. "Regression analysis." Wikipedia, December 21. Accessed 2021-01-21.
  21. Wikipedia. 2021c. "Logistic Function." Wikipedia, January 10. Accessed 2021-01-21.

Further Reading

  1. Brooks-Bartlett, Jonny. 2018. "Probability concepts explained: Maximum likelihood estimation." Towards Data Science, on Medium, January 3. Accessed 2021-01-18.
  2. Agarwal, Rahul. 2019. "The 5 Classification Evaluation Metrics Every Data Scientist Must Know." Blog, on KDnuggets, October. Accessed 2021-01-18.
  3. Ray, Sunil. 2017. "Commonly used Machine Learning Algorithms (with Python and R Codes)." Blog, on Analytics Vidhya, September 7. Accessed 2021-01-18.


Cite As

Devopedia. 2021. "Logistic Regression." Version 9, January 22. Accessed 2021-02-08. https://devopedia.org/logistic-regression

Contributed by 2 authors

Last updated on 2021-01-22 08:58:07