A quick overview of Logistic Regression.

Machine Learning, Classification

A pretty basic technique for binary classification.

Author
Andrea Bonvini

Published
May 20, 2021

Although the name might be confusing, note that this is a classification algorithm.

In Logistic Regression we define a set of weights $w$ that are combined (through a simple dot product) with some features $\phi$. For a two-class classification problem, the posterior probability of class $C_1$ can be written as a logistic sigmoid function:

$$p(C_1\mid\phi)=\frac{1}{1+e^{-w^T\phi}}=\sigma(w^T\phi)$$

and $p(C_2\mid\phi)=1-p(C_1\mid\phi)$.
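As a quick illustration, here is how the binary posterior could be computed with NumPy; the weights and features below are made-up values, not anything from a real dataset:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical weights and features, purely for illustration.
w = np.array([0.5, -1.2, 0.3])
phi = np.array([1.0, 0.4, 2.0])

p_c1 = sigmoid(w @ phi)   # p(C1 | phi) = sigma(w^T phi)
p_c2 = 1.0 - p_c1         # p(C2 | phi) = 1 - p(C1 | phi)
print(p_c1, p_c2)
```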

Applying the Maximum Likelihood approach…

Given a dataset $\mathcal{D}=\{(\phi_n, t_n)\mid n\in[1,N]\}$, with $t_n\in\{0,1\}$, we have to maximize the probability of getting the right label:

$$P(\mathbf{t}\mid\Phi,w)=\prod_{n=1}^{N}y_n^{t_n}(1-y_n)^{1-t_n},\qquad y_n=\sigma(w^T\phi_n)$$

Taking the negative log of the likelihood yields the cross-entropy error function, which has to be minimized:

$$L(w)=-\ln P(\mathbf{t}\mid\Phi,w)=-\sum_{n=1}^{N}\big(t_n\ln y_n+(1-t_n)\ln(1-y_n)\big)=\sum_{n=1}^{N}L_n$$
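A minimal sketch of this error function in NumPy, assuming a design matrix `Phi` of shape (N, M) and a binary label vector `t` (both names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t, eps=1e-12):
    """L(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]."""
    y = sigmoid(Phi @ w)              # y_n = sigma(w^T phi_n)
    y = np.clip(y, eps, 1.0 - eps)    # guard against log(0)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```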

Differentiating and using the chain rule:

$$\frac{\partial L_n}{\partial y_n}=\frac{y_n-t_n}{y_n(1-y_n)},\qquad \frac{\partial y_n}{\partial w}=y_n(1-y_n)\phi_n \;\Longrightarrow\; \frac{\partial L_n}{\partial w}=\frac{\partial L_n}{\partial y_n}\frac{\partial y_n}{\partial w}=(y_n-t_n)\phi_n$$
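One way to convince yourself of this result is to compare the analytic derivative $(y_n-t_n)\phi_n$ against a finite-difference approximation; the sketch below does exactly that on random made-up values:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_n(w, phi_n, t_n):
    """Single-sample cross-entropy L_n."""
    y = sigmoid(w @ phi_n)
    return -(t_n * np.log(y) + (1.0 - t_n) * np.log(1.0 - y))

rng = np.random.default_rng(0)
w = rng.normal(size=3)
phi_n = rng.normal(size=3)
t_n = 1.0

analytic = (sigmoid(w @ phi_n) - t_n) * phi_n     # (y_n - t_n) phi_n
numeric = np.array([
    (loss_n(w + h, phi_n, t_n) - loss_n(w - h, phi_n, t_n)) / 2e-6
    for h in 1e-6 * np.eye(3)                     # central differences
])
assert np.allclose(analytic, numeric, atol=1e-5)
```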

The gradient of the loss function is

$$\nabla L(w)=\sum_{n=1}^{N}(y_n-t_n)\phi_n$$

It has the same form as the gradient of the sum-of-squares error function for linear regression. In this case, however, $y_n$ is not a linear function of $w$, so there is no closed-form solution. The error function is convex (it has a single optimum) and can be minimized by standard gradient-based optimization techniques, which also makes it easy to adapt to the online learning setting.
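As a concrete example, a bare-bones batch gradient descent using the gradient above might look like this; the learning rate, iteration count, and all names are assumptions made for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(Phi, t, lr=0.01, n_iters=1000):
    """Batch gradient descent on the cross-entropy error."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)       # y_n = sigma(w^T phi_n)
        grad = Phi.T @ (y - t)     # sum_n (y_n - t_n) phi_n
        w -= lr * grad
    return w
```

For the online setting, the same update is simply applied one sample at a time, $w \leftarrow w - \eta\,(y_n - t_n)\phi_n$, which is stochastic gradient descent.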

Multiclass Logistic Regression

For the multiclass case, the posterior probabilities can be represented by a softmax transformation of linear functions of feature variables:

$$p(C_k\mid\phi)=y_k(\phi)=\frac{e^{w_k^T\phi}}{\sum_j e^{w_j^T\phi}}$$

$\phi(x)$ has been abbreviated to $\phi$ for simplicity.
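In code, the softmax posterior could be sketched as follows, with the standard max-subtraction trick for numerical stability; `W`, stacking the per-class weight vectors, is an assumed name:

```python
import numpy as np

def softmax_posterior(W, phi):
    """p(C_k | phi) for all k, given W of shape (K, M) and phi of shape (M,)."""
    a = W @ phi        # activations a_k = w_k^T phi
    a = a - a.max()    # softmax is invariant to this shift
    e = np.exp(a)
    return e / e.sum() # y_k = exp(a_k) / sum_j exp(a_j)
```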

Maximum Likelihood is used to determine the parameters directly:

$$p(T\mid\Phi,w_1,\dots,w_K)=\prod_{n=1}^{N}\Bigg(\prod_{k=1}^{K}p(C_k\mid\phi_n)^{t_{nk}}\Bigg)=\prod_{n=1}^{N}\Bigg(\underbrace{\prod_{k=1}^{K}y_{nk}^{t_{nk}}}_{\text{Term for correct class}}\Bigg)$$

where $y_{nk}=p(C_k\mid\phi_n)=\dfrac{e^{w_k^T\phi_n}}{\sum_j e^{w_j^T\phi_n}}$.

The cross-entropy function is:

$$L(w_1,\dots,w_K)=-\ln p(T\mid\Phi,w_1,\dots,w_K)=-\sum_{n=1}^{N}\Bigg(\sum_{k=1}^{K}t_{nk}\ln y_{nk}\Bigg)$$

Taking the gradient with respect to $w_j$:

$$\nabla_{w_j}L(w_1,\dots,w_K)=\sum_{n=1}^{N}(y_{nj}-t_{nj})\phi_n$$
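Putting the multiclass pieces together, one batch gradient step could look like this; `T` is an assumed one-hot target matrix of shape (N, K), `Phi` the (N, M) design matrix, and `W` the (M, K) weight matrix, all names chosen for illustration:

```python
import numpy as np

def softmax_rows(A):
    """Row-wise softmax with the usual max-subtraction for stability."""
    E = np.exp(A - A.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gradient_step(W, Phi, T, lr=0.01):
    """One batch update: w_j <- w_j - lr * sum_n (y_nj - t_nj) phi_n."""
    Y = softmax_rows(Phi @ W)    # Y[n, k] = y_nk
    grad = Phi.T @ (Y - T)       # column j holds sum_n (y_nj - t_nj) phi_n
    return W - lr * grad
```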

Footnotes

    Citation

    For attribution, please cite this work as

    Bonvini (2021, May 21). Last Week's Potatoes: A quick overview of Logistic Regression. Retrieved from https://lastweekspotatoes.com/posts/2021-07-22-a-quick-overview-of-logistic-regression/

    BibTeX citation

    @misc{bonvini2021a,
      author = {Bonvini, Andrea},
      title = {Last Week's Potatoes: A quick overview of Logistic Regression.},
      url = {https://lastweekspotatoes.com/posts/2021-07-22-a-quick-overview-of-logistic-regression/},
      year = {2021}
    }