Testing a model ([[Logistic Function]]) with parameters $a,b$ on inputs $x_{i}$.
[[Probability]] of Success: $\sigma(x_{i};a,b)= \frac{1}{1+e^{-b(x_{i}-a)}}$
Given: inputs $x_{1},x_{2},x_{3}, \dots$ and outcomes $y_{1},y_{2},y_{3},\dots$ with $y_{i} \in \{0,1\}$
Testing: candidate parameters $a,b$
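A minimal Python sketch of the success probability, for experimenting with different $a,b$. The function name `sigma` and the sample values are illustrative choices, not part of the note. Branching on the sign of $b(x-a)$ is an added numerical precaution so that `math.exp` never receives a large positive argument.

```python
import math

def sigma(x, a, b):
    """Logistic probability of success at input x for parameters a, b."""
    z = b * (x - a)
    # Branch on the sign of z so math.exp is only called on non-positive values.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(sigma(2.0, 2.0, 1.5))  # x == a, so the exponent is 0 and the result is 0.5
```

Reading the formula this way, $a$ is the input at which success and failure are equally likely, and $b$ controls how sharply the probability changes around that point.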
[[Likelihood]] of my model (how well the model's predictions match what actually happened):
$\huge
\mathcal L\set{a,b} = \prod _{i=1}^{n} \begin{cases}
\sigma(x_{i}; a,b) & \text{if } y_{i} = 1\\
1-\sigma(x_{i}; a,b) & \text{if } y_{i} = 0\\
\end{cases}
$
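The product above can be sketched directly in Python. The data set and parameter values below are made up for illustration; `likelihood` is a hypothetical helper name.

```python
import math

def sigma(x, a, b):
    return 1.0 / (1.0 + math.exp(-b * (x - a)))

def likelihood(xs, ys, a, b):
    """Product over observations: sigma if y == 1, otherwise 1 - sigma."""
    total = 1.0
    for x, y in zip(xs, ys):
        p = sigma(x, a, b)
        total *= p if y == 1 else 1.0 - p
    return total

# Hypothetical toy data: larger x tends to go with success.
xs = [0.5, 1.0, 2.0, 3.0, 3.5]
ys = [0, 0, 1, 1, 1]

print(likelihood(xs, ys, a=1.5, b=2.0))   # curve rising through the data
print(likelihood(xs, ys, a=1.5, b=-2.0))  # same curve flipped, far less likely
```

Every factor lies between 0 and 1, so the product shrinks quickly as $n$ grows, which is one practical reason to work with the logarithm instead.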
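Using [[Gradient Descent]] with this:
We use the logarithm of the likelihood, which turns the product into a sum and has its maximum at the same $a,b$. Maximizing $\ln \mathcal L$ is the same as running gradient descent on $-\ln \mathcal L$.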
$\huge \begin{align}
\ln \pa{\mathcal L\{a,b\}} &=
\sum^{n}_{i=1} \begin{cases}
\ln(\sigma(x_{i}; a,b)) & \text{if } y_{i} = 1\\
\ln(1-\sigma(x_{i}; a,b)) & \text{if } y_{i} = 0\\
\end{cases}
\end{align}$
$\large \begin{align}
\ln(\sigma(x_{i};a,b)) &=
\ln\pa{ \frac{1}{1+e^{-b(x_{i}-a)}} } \\
\ln(1-\sigma(x_{i};a,b)) &=
\ln\pa{ \frac{1}{1+e^{b(x_{i}-a)}} }
\end{align}$
$\huge \begin{align}
\ln\pa{\mathcal L\{a,b\}} &= \sum_{i=1}^{n} \ln\pa{
\frac{
e^{by_{i}(x_{i}-a)}
}{
1+ e^{b(x_{i}-a)}
}
}
\end{align}$
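A quick check that the single fraction reproduces both cases, substituting $y_{i}=1$ and $y_{i}=0$ into the numerator:
$\large \begin{align}
y_{i}=1:\quad \frac{e^{b(x_{i}-a)}}{1+e^{b(x_{i}-a)}} &= \frac{1}{1+e^{-b(x_{i}-a)}} = \sigma(x_{i};a,b) \\
y_{i}=0:\quad \frac{1}{1+e^{b(x_{i}-a)}} &= 1-\sigma(x_{i};a,b)
\end{align}$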
$\large \begin{align}
\ln \pa{\frac{
e^{b(x_{i}-a)y_{i}}
}{
1+ e^{b(x_{i}-a)}
}
} &=
\ln\pa{e^{b(x_{i}-a)y_{i}}}
- \ln\pa{
1+e^{b(x_{i}-a)}
}\\
&=
b(x_{i}-a) y_{i} - \ln\pa{
1+ e^{b(x_{i}-a)}
}
\end{align}$
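As a sanity check, the per-observation term $b(x_{i}-a)y_{i} - \ln(1+e^{b(x_{i}-a)})$ can be summed in Python and compared against the case-by-case definition. This is only a sketch: the function names and toy data are assumptions, and the overflow-safe evaluation of $\ln(1+e^{z})$ is an added implementation detail.

```python
import math

def log_likelihood(xs, ys, a, b):
    """Sum of b*(x - a)*y - ln(1 + exp(b*(x - a))) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b * (x - a)
        # ln(1 + e^z) rewritten as max(z, 0) + log1p(e^(-|z|)) to avoid overflow.
        log_one_plus_exp = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += z * y - log_one_plus_exp
    return total

def log_likelihood_cases(xs, ys, a, b):
    """The same quantity computed from the two-case definition, for comparison."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-b * (x - a)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy data.
xs = [0.5, 1.0, 2.0, 3.0, 3.5]
ys = [0, 0, 1, 1, 1]

print(log_likelihood(xs, ys, 1.5, 2.0))
print(log_likelihood_cases(xs, ys, 1.5, 2.0))  # should agree up to rounding
```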
$\huge \begin{align}
\pderiv{
\pa{\ln \mathcal L\set{a,b}}
}{a} &=
\sum_{i=1}^{n} \pa{
-by_{i} +
\frac{be^{b(x_{i}-a)}}{1+e^{b(x_{i}-a)}}
} \\
&=
\sum_{i=1}^{n}
b \pa{ \sigma(x_{i};a,b) - y_{i} }
\end{align}$
$\huge
\begin{align}
\pderiv{
\pa{\ln \mathcal L\set{a,b}}
}{b} &=
\sum_{i=1}^{n} \pa{
(x_{i}-a )y_{i}
-
\frac{(x_{i}-a)e^{b(x_{i}-a)}}{1+e^{b(x_{i}-a)}}
} \\
&= \sum_{i=1}^{n}
(x_{i}-a)\pa{y_{i}-\sigma(x_{i};a,b)}
\end{align}
$
Both simplifications use $\frac{e^{b(x_{i}-a)}}{1+e^{b(x_{i}-a)}} = \sigma(x_{i};a,b)$.
We can use these to compute the gradient of the log-likelihood:
$\huge
\nabla \ln \mathcal L\set{a,b} =
\sum_{i=1}^{n} \mat{
b\pa{\sigma(x_{i};a,b)-y_{i}} \\
(x_{i}-a)\pa{y_{i}-\sigma(x_{i};a,b)} \\
}
$
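A sketch of how this gradient might be used, taking the components $\sum b(\sigma - y_{i})$ for $a$ and $\sum (x_{i}-a)(y_{i}-\sigma)$ for $b$. It first compares them with central finite differences, then runs plain gradient ascent on $\ln \mathcal L$ (equivalently, gradient descent on $-\ln \mathcal L$). The data, starting point, step size, and iteration count are all assumptions chosen for illustration.

```python
import math

def sigma(x, a, b):
    return 1.0 / (1.0 + math.exp(-b * (x - a)))

def log_likelihood(xs, ys, a, b):
    return sum(b * (x - a) * y - math.log1p(math.exp(b * (x - a)))
               for x, y in zip(xs, ys))

def gradient(xs, ys, a, b):
    """Return (d lnL / da, d lnL / db), each summed over all observations."""
    g_a = sum(b * (sigma(x, a, b) - y) for x, y in zip(xs, ys))
    g_b = sum((x - a) * (y - sigma(x, a, b)) for x, y in zip(xs, ys))
    return g_a, g_b

# Hypothetical toy data with overlapping outcomes, so a finite maximum exists.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 1, 0, 1, 1, 1]

# Compare the analytic gradient with central finite differences.
a0, b0, h = 1.0, 1.0, 1e-6
ana_a, ana_b = gradient(xs, ys, a0, b0)
num_a = (log_likelihood(xs, ys, a0 + h, b0) - log_likelihood(xs, ys, a0 - h, b0)) / (2 * h)
num_b = (log_likelihood(xs, ys, a0, b0 + h) - log_likelihood(xs, ys, a0, b0 - h)) / (2 * h)
print(ana_a, num_a)
print(ana_b, num_b)

# Gradient ascent: step uphill on ln L.
a, b = a0, b0
rate = 0.05  # assumed step size
for _ in range(5000):
    g_a, g_b = gradient(xs, ys, a, b)
    a, b = a + rate * g_a, b + rate * g_b
print(a, b)
```

A fixed step size is the simplest choice here. On other data it may need tuning, since too large a value can overshoot the maximum.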
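[[Hessian Optimization]]:
Differentiating the first derivatives once more, using $\pderiv{\sigma}{a} = -b\,\sigma(1-\sigma)$ and $\pderiv{\sigma}{b} = (x_{i}-a)\,\sigma(1-\sigma)$ with $\sigma = \sigma(x_{i};a,b)$:
$\begin{align}
\pderiv{^{2}\ln \mathcal L\set{a,b}}{a^{2}} &=
-\sum_{i=1}^{n} b^{2} \sigma(x_{i};a,b)(1-\sigma(x_{i};a,b)) \\
\pderiv{^{2}\ln \mathcal L\set{a,b}}{a\partial b} &=
\sum_{i=1}^{n} \pa{ \sigma(x_{i};a,b)-y_{i}+b(x_{i}-a)\sigma(x_{i};a,b)(1- \sigma(x_{i};a,b)) } \\
\pderiv{^{2}\ln \mathcal L\set{a,b}}{b^{2}} &=
-\sum_{i=1}^{n} (x_{i}-a)^{2} \sigma(x_{i};a,b)(1-\sigma(x_{i};a,b))
\end{align} $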
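One way these second derivatives could be used is a Newton update, $(a,b) \leftarrow (a,b) - H^{-1}\nabla \ln \mathcal L$, sketched below for the two-parameter case. The signs assumed here are $-\sum b^{2}\sigma(1-\sigma)$ for the $a^{2}$ entry, $\sum \left(\sigma - y_{i} + b(x_{i}-a)\sigma(1-\sigma)\right)$ for the mixed entry, and $-\sum (x_{i}-a)^{2}\sigma(1-\sigma)$ for the $b^{2}$ entry. The helper names, toy data, and starting guess are hypothetical. The starting guess is deliberately placed near the maximum, because in the $(a,b)$ parameterization the Hessian may not be negative definite far from it, and a plain Newton step is then not guaranteed to go uphill.

```python
import math

def sigma(x, a, b):
    return 1.0 / (1.0 + math.exp(-b * (x - a)))

def gradient(xs, ys, a, b):
    g_a = sum(b * (sigma(x, a, b) - y) for x, y in zip(xs, ys))
    g_b = sum((x - a) * (y - sigma(x, a, b)) for x, y in zip(xs, ys))
    return g_a, g_b

def hessian(xs, ys, a, b):
    """Return (h_aa, h_ab, h_bb), the entries of the symmetric 2x2 Hessian of ln L."""
    h_aa = h_ab = h_bb = 0.0
    for x, y in zip(xs, ys):
        s = sigma(x, a, b)
        w = s * (1.0 - s)
        h_aa += -b * b * w
        h_ab += (s - y) + b * (x - a) * w
        h_bb += -((x - a) ** 2) * w
    return h_aa, h_ab, h_bb

def newton_step(xs, ys, a, b):
    """One Newton update: solve H * step = -gradient by the 2x2 inverse formula."""
    g_a, g_b = gradient(xs, ys, a, b)
    h_aa, h_ab, h_bb = hessian(xs, ys, a, b)
    det = h_aa * h_bb - h_ab * h_ab
    step_a = -(h_bb * g_a - h_ab * g_b) / det
    step_b = -(h_aa * g_b - h_ab * g_a) / det
    return a + step_a, b + step_b

# Hypothetical toy data with overlapping outcomes.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [0, 0, 1, 0, 1, 1, 1]

a, b = 1.7, 2.0  # assumed starting guess near the maximum
for _ in range(10):
    a, b = newton_step(xs, ys, a, b)
print(a, b)
print(gradient(xs, ys, a, b))  # both components should be close to zero
```

Compared with a fixed-step gradient method, Newton's method typically needs far fewer iterations near the maximum, at the cost of computing and inverting the Hessian at every step.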