# Logistic Regression

Suppose we're asked to classify emails into two categories: spam or not spam. Compare this with another application that attempts to predict product sales given recent advertising expense. Unlike the second example in which the target variable is continuous, email classification predicts a categorical variable.

Logistic regression is a statistical method used for classifying a target variable that is categorical in nature. It is an extension of a linear regression model. It uses a logistic function to estimate the probability of a target variable belonging to a particular class or category.

## Discussion

• Could you explain logistic regression with an example?

Consider email classification as an example. To be able to predict if an email is spam or not, we will extract relevant information from the emails such as:

• Sender of the email
• Number of typos in the email
• Occurrence of words or phrases such as "offer", "prize", "free gift", etc.

The above information is converted into a vector of numerical features. These features are linearly combined and then transformed using a logistic function to give a score in the range 0 to 1. This score is the estimated probability that the email is spam. If the probability is higher than 50%, the email is classified as spam; otherwise it is classified as not spam.
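
The scoring step can be sketched as follows. This is only an illustration: the feature values, weights and bias are hypothetical, whereas a real model learns its weights from labelled emails.

```python
import math

# Hypothetical feature vector for one email (illustrative values only):
# [sender is unknown (0/1), number of typos, count of spam-like phrases]
features = [1.0, 4.0, 2.0]

# Hypothetical weights and bias; a real model learns these from training data.
weights = [0.8, 0.3, 1.1]
bias = -3.0

def spam_probability(x, w, b):
    """Linearly combine the features, then squash the score into (0, 1)."""
    z = b + sum(w_j * x_j for w_j, x_j in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

p_spam = spam_probability(features, weights, bias)
label = "spam" if p_spam > 0.5 else "not spam"
print(f"P(spam) = {p_spam:.3f} -> {label}")
```

The 50% threshold is the one used in the text; an application that wants fewer false alarms could choose a higher cut-off.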

• What are different types of logistic regression?

There are three types of logistic regression:

• Binary or binomial: where the dependent variable can have only two outcomes. Examples: spam/not-spam, dead/alive, pass/fail.
• Multiclass or multinomial: where the dependent variable is classified into three or more categories and these categories are not ordered. Examples: types of cuisines (Italian, Mediterranean, Chinese).
• Ordinal: where the dependent variable is classified into three or more categories and these categories are ordered. Examples: movie rating (1-5).

• Why can't I use linear regression for predicting classes?

In classification problems, we are predicting the probability that the outcome variable belongs to a particular class. If linear regression is used for classification, it will treat the classes or categories as numbers. It will fit the best line that minimises the distance between the data points and the line. The linear regression equation would just give scores that lie along the best fit line. These scores cannot be interpreted as probabilities. A meaningful threshold cannot be set to distinguish the classes.

Also, the linear regression model fits a straight line that can extrapolate without bound. Its predictions can fall anywhere from -∞ to ∞, including below 0 or above 1. Since a probability must lie between 0 and 1, logistic regression applies a logistic function to the linear combination so that the predicted value always stays between 0 and 1.
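
A small sketch of the out-of-range problem, using made-up toy data with classes encoded as 0 and 1. It fits an ordinary least-squares line in plain Python and evaluates it outside the range of the training inputs.

```python
# Toy data (hypothetical): one feature, binary class labels encoded as 0 and 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(xs, ys)

# Extrapolating the straight line yields scores outside [0, 1],
# so they cannot be read as probabilities.
for x in (0, 4.5, 10):
    print(x, round(intercept + slope * x, 3))
```

For this data the line gives a negative score at x = 0 and a score above 1 at x = 10.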

• What is the logistic function?

The logistic function, also known as the sigmoid function, is an S-shaped curve that takes any real-valued number and maps it to a number between 0 and 1 using the following equation:

$$f(x)= \frac{1}{1+e^{-x}}$$

As x approaches ∞, f(x) approaches 1, and as x approaches -∞, f(x) approaches 0:

$$\lim_{x\to\infty}\frac{1}{1+e^{-x}} = 1, \qquad \text{since } e^{-x}\to 0$$

$$\lim_{x\to-\infty}\frac{1}{1+e^{-x}} = 0, \qquad \text{since } e^{-x}\to\infty$$
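
The limiting behaviour can be checked numerically. The sketch below is one possible way to write the function in Python; it splits on the sign of x because `math.exp` raises `OverflowError` for large positive arguments.

```python
import math

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x),
    # so math.exp never receives a large positive argument.
    ex = math.exp(x)
    return ex / (1.0 + ex)

for x in (-1000, -5, 0, 5, 1000):
    print(x, sigmoid(x))
```

Mathematically the function never reaches 0 or 1, but in floating-point arithmetic the extreme inputs above round to exactly 0.0 and 1.0.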

• What are GLMs and how are they relevant to logistic regression?

Generalized Linear Models (GLMs) are a class of regression models that extend linear regression to cases where an ordinary linear model does not fit well. They're applicable when the outcome variable follows a non-normal distribution such as binomial, exponential, Poisson, etc.

A GLM is represented by the following equation:

$$\large{g(E(y))=\beta_0+\beta_1{}x_{1}+\ldots+\beta_p{}x_{p}}$$

Where,

• $$E(y)$$ is the mean value or the expected value of the outcome variable that follows an assumed distribution
• $$\beta_0+\beta_1{}x_{1}+\ldots+\beta_p{}x_{p}$$ is the linear predictor, i.e. the weighted sum of features, where each $$\beta_j$$ is a weight and each $$x_j$$ is an explanatory variable.
• $$g$$ is the link function that mathematically links the expected value of the outcome variable and the linear predictor.

GLM is a generalised form of linear regression and logistic regression is a specific type of GLM. For logistic regression, we can derive a specific link function $$g$$ called the logit function.
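
To make the role of the link function concrete, here is a hedged sketch that pairs a few commonly used links with their inverses and applies them to the same linear predictor. The dictionary `LINKS`, the helper `linear_predictor` and the coefficient values are all hypothetical names and numbers chosen for illustration.

```python
import math

# Each entry pairs a link function g with its inverse g^-1.
# g maps the expected outcome E(y) to the linear predictor;
# g^-1 maps the linear predictor back to E(y).
LINKS = {
    "identity (linear regression)": (lambda mu: mu, lambda eta: eta),
    "logit (logistic regression)": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "log (Poisson regression)": (math.log, math.exp),
}

def linear_predictor(betas, xs):
    """beta_0 + beta_1*x_1 + ... + beta_p*x_p, with betas[0] as the intercept."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

eta = linear_predictor([-1.0, 0.5, 0.25], [2.0, 4.0])  # hypothetical values
for name, (g, g_inv) in LINKS.items():
    mean = g_inv(eta)
    print(f"{name}: E(y) = {mean:.4f}, g(E(y)) = {g(mean):.4f}")
```

In each case applying g to the recovered mean returns the original linear predictor, which is what it means for g to link the two quantities.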

• What is the logistic regression equation and the logit function?

Let's start with the linear regression equation:

$$y=\beta_0+\beta_1{}x_{1}\qquad(1)$$

We derive the link function for logistic regression. In linear regression, $$y$$ is a continuous variable. Since we want a probability for logistic regression, we will wrap the linear predictor in a logistic function so that the values do not go below 0 or beyond 1. We will denote this as probability with $$p$$:

$$p=\frac{1}{1+e^{-(\beta_0+\beta_1{}x_{1})}}\qquad(2)$$

As an example, such a curve could model the probability that a person, given their age, will die within the next five years. Changing $$\beta_0$$ shifts the curve along the x-axis while changing $$\beta_1$$ affects its steepness.

Using (1) we can rewrite (2) as:

$$p=\frac{1}{1+e^{-y}}=\frac{e^y}{1+ e^y}\qquad(3)$$

If $$p$$ is the probability that an email is spam, then the probability of a non-spam email can be written as:

$$q=1-p=1-\frac{1}{1+e^{-y}}=\frac{1}{1+e^y}\qquad(4)$$

Dividing (3) by (4) we get,

$$\frac{p}{1-p}=e^y$$

Taking the natural logarithm of both sides and substituting the value of $$y$$ from (1), we get the logistic regression equation:

$$\ln\left(\frac{p}{1-p}\right)=\beta_0+\beta_1{}x_{1}$$

$$p/(1-p)$$ is the odds of the positive outcome (not to be confused with an odds ratio, which compares two odds). $$\ln(p/(1-p))$$ is the link function or logit function. The output values from this function are called logits or log-odds.
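
The derivation can be verified numerically. In this sketch the coefficients `beta0` and `beta1` and the input `x1` are hypothetical; the code checks that the logit of $$p$$ recovers the linear predictor, and that a one-unit increase in $$x_1$$ multiplies $$p/(1-p)$$ by $$e^{\beta_1}$$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

beta0, beta1 = -4.0, 0.8   # hypothetical coefficients
x1 = 6.0                   # hypothetical feature value

p = sigmoid(beta0 + beta1 * x1)
odds = p / (1.0 - p)
print(f"p = {p:.4f}, odds = {odds:.4f}, logit = {logit(p):.4f}")

# Increasing x1 by one unit multiplies the odds by e^(beta1).
p_next = sigmoid(beta0 + beta1 * (x1 + 1.0))
odds_next = p_next / (1.0 - p_next)
print(f"ratio of odds for +1 in x1: {odds_next / odds:.4f}, e^beta1 = {math.exp(beta1):.4f}")
```

This is why a coefficient in logistic regression is often interpreted through $$e^{\beta_1}$$: it is the multiplicative change in the odds per unit change in the feature.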

• What is the cost function for logistic regression?

A cost function quantifies the error between the predicted value and the expected value. The weights of features in the model are estimated by minimising this cost function, which is equivalent to maximising the likelihood of the training data.

The cost function used in logistic regression is known as Log Loss or Negative Log-Likelihood (NLL). It is the negative average of the log of the probability that the model assigns to the correct class of each instance in the training data.

$$-\frac{1}{N}\sum_{i=1}^N\left[y_i\cdot\ln(p(y_i))+(1-y_i)\cdot\ln(1-p(y_i))\right]$$

Where,

• $$N$$ is the number of training samples
• $$y_i$$ is the actual label (0 or 1) of the i'th sample
• $$p(y_i)$$ is the predicted probability that the i'th sample belongs to the positive class

We simplify this equation for the two possible outcomes for a single training sample:

• True output y=1 (positive): $$-(1\cdot\ln(p) + (1-1)\cdot\ln(1-p)) = -\ln(p)$$
• True output y=0 (negative): $$-(0\cdot\ln(p) + (1-0)\cdot\ln(1-p)) = -\ln(1-p)$$

Because of the logarithm, the loss falls slowly towards 0 as the predicted probability gets closer to the true label. But as the predicted probability diverges from the true label, the loss increases rapidly and without bound. This has the effect of heavily penalising confident but incorrect predictions.
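
One common way to minimise this cost is gradient descent. The sketch below is an assumption-laden illustration, not the method of any particular library: it uses a single feature, made-up toy data, and a hypothetical learning rate and step count. It relies on the fact that the gradient of the log loss with respect to each weight is the average of (predicted probability minus label) times the corresponding input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(b0, b1, xs, ys):
    """Log loss of the model p = sigmoid(b0 + b1*x) on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(xs)

def fit(xs, ys, learning_rate=0.1, steps=5000):
    """Batch gradient descent on the log loss for one feature."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Toy data (hypothetical); the classes overlap, so the optimum is finite.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(xs, ys)
print("loss before training:", mean_log_loss(0.0, 0.0, xs, ys))
print("loss after training: ", mean_log_loss(b0, b1, xs, ys))
```

Starting from zero weights, every prediction is 0.5 and the loss is ln 2; training lowers it by moving the weights against the gradient.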

## Milestones

1838

The logistic function is introduced in a series of three papers by Pierre François Verhulst between 1838 and 1847. He uses it as a model of population growth by adjusting the exponential growth model, under the guidance of Adolphe Quetelet.

1889

The term regression is coined by Francis Galton to describe a biological phenomenon. He observes that the heights of descendants of tall ancestors tend to regress down towards a normal average, a phenomenon also known as regression toward the mean.

1943

Wilson and Worcester use the logistic model in bioassay, which is the first known application of its kind.

1966

Cox introduces the multinomial logit model, which extends logistic regression to outcomes with more than two categories.

1973

Daniel McFadden links the multinomial logit to the theory of discrete choice, specifically Luce's choice axiom, showing that the multinomial logit follows from the assumption of independence of irrelevant alternatives and interpreting odds of alternatives as relative preferences. This gives a theoretical foundation for logistic regression. In 2000, McFadden is awarded the Nobel Prize for this contribution.

## Sample Code

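```python
# Source: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
# Accessed 2021-01-22

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the iris dataset: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Fit a logistic regression classifier on the full dataset.
clf = LogisticRegression(random_state=0).fit(X, y)

print(clf.predict(X[:2, :]))        # predicted class labels for the first two samples
print(clf.predict_proba(X[:2, :]))  # per-class probabilities for the same samples
print(clf.score(X, y))              # mean accuracy on the training data
```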

## Cite As

Devopedia. 2022. "Logistic Regression." Version 10, January 19. Accessed 2022-09-22. https://devopedia.org/logistic-regression
