Disadvantages

Model

We want:

\[ 0 \leqslant h_\theta(x) \leqslant 1 \]

We have the following hypothesis function:

\[ h_\theta(x) = g(\theta^Tx) \]

And the sigmoid/logistic function:

\[ g(z) = \frac{1}{1 + e^{-z}} \]

Therefore we have:

\[ h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}} \]
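
A minimal sketch of these two functions in Python (assuming NumPy, and assuming the feature vector already includes the bias term \[x_0 = 1\]):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); x is assumed to include the bias feature x_0 = 1
    return sigmoid(np.dot(theta, x))
```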

Interpretation

\[h_\theta(x)\] = estimated probability that y = 1 on input x. That means:

\[ h_\theta(x) = P(y = 1 | x; \theta) \]

(probability that y = 1, given x, parametrized by \[\theta\])
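
For example, \[h_\theta(x) = 0.7\] means the model estimates a 70% chance that y = 1 for this input. Since y is either 0 or 1, it also follows that \[P(y = 0 | x; \theta) = 1 - P(y = 1 | x; \theta)\].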

Decision boundary

Suppose:

\[ h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2) \]

Let's take: \[\theta_0 = -3, \theta_1 = 1, \theta_2 = 1\]

Predict y = 1 if \[ -3 + x_1 + x_2 \geqslant 0 \]

that is: \[ x_1 + x_2 \geqslant 3 \]

and predict y = 0 if

\[ x_1 + x_2 < 3 \]
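
A small sketch of this decision rule in Python (the sample point is made up for illustration; only \[\theta\] comes from the example above):

```python
import numpy as np

def predict(theta, x):
    # h_theta(x) >= 0.5 exactly when theta^T x >= 0,
    # so we can threshold the linear term directly
    return 1 if np.dot(theta, x) >= 0 else 0

theta = np.array([-3.0, 1.0, 1.0])   # theta_0 = -3, theta_1 = 1, theta_2 = 1
x = np.array([1.0, 2.0, 2.0])        # x_0 = 1 (bias), x_1 = 2, x_2 = 2
print(predict(theta, x))             # x_1 + x_2 = 4 >= 3, so this prints 1
```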

Examples:

(Figure: Logistic Regression - Decision Boundary)

Non-linear decision boundaries:
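
For example (a standard illustration, not worked out in these notes), adding polynomial features to the hypothesis:

\[ h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2) \]

With \[\theta_0 = -1, \theta_1 = 0, \theta_2 = 0, \theta_3 = 1, \theta_4 = 1\] we predict y = 1 if \[ x_1^2 + x_2^2 \geqslant 1 \], so the decision boundary is a circle of radius 1 rather than a straight line.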

Cost function

\[ Cost(h_\theta(x), y) = \begin{cases} -log(h_\theta(x)) & \text{if } y = 1 \\ -log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases} \]

even better (instead of having 2 lines):

\[ Cost(h_\theta(x), y) = -y \, log(h_\theta(x)) - (1-y) \, log(1-h_\theta(x)) \]

Therefore the logistic regression cost function is:

\[ \begin{array}{lll} J(\theta) &=& \frac{1}{m}\sum_{i=1}^{m}Cost(h_\theta(x^i), y^i)\\ &=& -\frac{1}{m}[\sum_{i=1}^{m}y^i log(h_\theta(x^i))+(1-y^i)log(1-h_\theta(x^i))] \end{array} \]
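
A minimal vectorized sketch of this cost in Python (assuming NumPy; `X` is an m x n design matrix whose first column is all ones, and `y` is a vector of 0/1 labels):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1 - y)*log(1 - h) )
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```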

Gradient descent

\[ J(\theta)=-\frac{1}{m}[\sum_{i=1}^{m}y^i log(h_\theta(x^i))+(1-y^i)log(1-h_\theta(x^i))] \]

We want \[\min_\theta J(\theta)\]:

\[ \begin{array}{lll} \text{Repeat and update simultaneously all } \theta_j:\\ \theta_j &:=& \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &:=& \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^i)-y^i)x_j^i \end{array} \]
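
A corresponding gradient-descent sketch in Python (same assumptions about `X` and `y` as above; `alpha` and `num_iters` are illustrative hyperparameters, not values from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, num_iters=1000):
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        # simultaneous update of all theta_j:
        # theta_j := theta_j - alpha * (1/m) * sum_i (h(x^i) - y^i) * x_j^i
        theta = theta - alpha * (1.0 / m) * (X.T @ (h - y))
    return theta
```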

The algorithm looks identical to the one for linear regression, but there is still a difference: the definition of \[h_\theta(x)\]:
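
For linear regression the hypothesis is linear in the features, whereas logistic regression passes the same linear term through the sigmoid:

\[ h_\theta(x) = \theta^Tx \quad \text{(linear regression)} \]

\[ h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}} \quad \text{(logistic regression)} \]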

Good articles