Disadvantages
- it is vulnerable to outliers: a single extreme example can shift the fitted line and move the decision threshold
Model
We want:
\[ 0 \leqslant h_\theta(x) \leqslant 1 \]
We have the following hypothesis function:
\[ h_\theta(x) = g(\theta^Tx) \]
And the sigmoid/logistic function:
\[ g(z) = \frac{1}{1 + e^{-z}} \]
Therefore we have:
\[ h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}} \]
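As a quick sanity check, here is a minimal NumPy sketch of the logistic function and the hypothesis; the names `sigmoid` and `h` are illustrative, not from the course:

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output is always in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)

# x_0 = 1 is the usual bias term
print(h(np.array([-3.0, 1.0, 1.0]), np.array([1.0, 2.0, 2.0])))  # ~0.73
```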
Interpretation
\[h_\theta(x) = \] estimated probability that y = 1 on input x. That means:
\[ h_\theta(x) = P(y = 1 | x; \theta) \]
(probability that y = 1, given x, parametrized by \[\theta\])
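Since y can only be 0 or 1, the complementary probability follows immediately:
\[ P(y = 0 | x; \theta) = 1 - P(y = 1 | x; \theta) = 1 - h_\theta(x) \]
For example, if \[h_\theta(x) = 0.7\], the model estimates a 70% chance that y = 1 (and a 30% chance that y = 0) for this input.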
Decision boundary
Suppose:
\[ h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2) \]
let’s take: \[\theta_0 = -3, \theta_1 = 1, \theta_2 = 1\]
Predict y = 1 if \[ -3 + x_1 + x_2 \geqslant 0 \]
that is: \[ x_1 + x_2 \geqslant 3 \]
and predict y = 0 if
\[ x_1 + x_2 < 3 \]
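A minimal sketch of this decision rule with the parameters above, assuming NumPy and a design matrix whose first column is the bias term \[x_0 = 1\] (the name `predict` is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. when theta^T x >= 0."""
    return (sigmoid(X @ theta) >= 0.5).astype(int)

theta = np.array([-3.0, 1.0, 1.0])   # theta_0 = -3, theta_1 = 1, theta_2 = 1
X = np.array([[1.0, 1.0, 1.0],       # x_1 + x_2 = 2 < 3  -> predict 0
              [1.0, 2.0, 2.0]])      # x_1 + x_2 = 4 >= 3 -> predict 1
print(predict(theta, X))             # [0 1]
```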
Examples:
(figure: Logistic Regression - Decision Boundary)
Non-linear decision boundaries: adding polynomial features allows more complex boundaries. For example, with \[ h_\theta(x) = g(\theta_0 + \theta_1x_1 + \theta_2x_2 + \theta_3x_1^2 + \theta_4x_2^2) \] and \[\theta_0 = -1, \theta_1 = 0, \theta_2 = 0, \theta_3 = 1, \theta_4 = 1\], we predict y = 1 if \[ x_1^2 + x_2^2 \geqslant 1 \], i.e. outside the unit circle.
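A sketch of that circular boundary, assuming the feature order \[[1, x_1, x_2, x_1^2, x_2^2]\] (the function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])  # theta_0 .. theta_4 from the example

def predict_circle(x1, x2):
    """Features [1, x1, x2, x1^2, x2^2]; predicts 1 exactly when x1^2 + x2^2 >= 1."""
    features = np.array([1.0, x1, x2, x1 ** 2, x2 ** 2])
    return int(sigmoid(features @ theta) >= 0.5)

print(predict_circle(0.5, 0.5))  # inside the unit circle  -> 0
print(predict_circle(1.0, 1.0))  # outside the unit circle -> 1
```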
Cost function
\[ Cost(h_\theta(x), y) = \left\{ \begin{array}{ll} -log(h_\theta(x)) & \text{if } y = 1 \\ -log(1 - h_\theta(x)) & \text{if } y = 0 \end{array} \right. \]
even better (instead of having 2 lines):
\[ Cost(h_\theta(x), y) = -y\,log(h_\theta(x)) - (1-y)\,log(1-h_\theta(x)) \]
Therefore the logistic regression cost function is:
\[ \begin{array}{lll} J(\theta) &=& \frac{1}{m}\sum_{i=1}^{m}Cost(h_\theta(x^i), y^i)\\ &=& -\frac{1}{m}[\sum_{i=1}^{m}y^i log(h_\theta(x^i))+(1-y^i)log(1-h_\theta(x^i))] \end{array} \]
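A vectorized sketch of \[J(\theta)\], assuming NumPy, a design matrix `X` with a leading column of ones, and a label vector `y` of 0s and 1s (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -1/m * sum( y*log(h) + (1-y)*log(1-h) )."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0]])
y = np.array([0.0, 1.0])
print(cost(np.array([-3.0, 1.0, 1.0]), X, y))  # ~0.313
```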
Gradient descent
\[ J(\theta)=-\frac{1}{m}[\sum_{i=1}^{m}y^i log(h_\theta(x^i))+(1-y^i)log(1-h_\theta(x^i))] \]
Want \[\min_\theta J(\theta)\]:
\[ \begin{array}{lll} \text{Repeat and update simultaneously all } \theta_j:\\ \theta_j &:=& \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &:=& \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^i)-y^i)x_j^i \end{array} \]
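A minimal sketch of the simultaneous update, vectorized over all \[\theta_j\] at once; `alpha`, `n_iters`, and the toy data are illustrative choices, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.5, n_iters=5000):
    """Repeat: theta_j := theta_j - alpha * 1/m * sum_i (h_theta(x^i) - y^i) * x_j^i,
    with all theta_j updated simultaneously (one vector operation per step)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m
        theta -= alpha * gradient
    return theta

# Tiny linearly separable toy set (first column is the bias term x_0 = 1)
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(sigmoid(X @ theta).round(2))  # approaches [0, 0, 1, 1] as training proceeds
```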
The algorithm looks identical to gradient descent for linear regression. But there is still a difference: the definition of \[h_\theta(x)\]:
- linear regression: \[h_\theta(x) = \theta^T x\]
- logistic regression: \[h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}\]