A modified version of [[Optimization]] / [[Gradient Descent]]. During the descent process, we use second-order [[Derivative]] / [[Gradient]] information (curvature) at each point.
$\huge \vec p = \vec x_{i} - \mathbf{H}_{\vec x_{i}} ^{ -1} \nabla f(\vec x_{i}) $
Where $\mathbf{H}$ refers to the [[Hessian Matrix]].
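In practice the inverse is rarely formed explicitly; one solves the linear system $\mathbf{H}\,\vec d = \nabla f$ instead. A minimal NumPy sketch of a single update on a hand-picked convex quadratic (the function, `newton_step`, and all values are illustrative assumptions, not from this note):

```python
import numpy as np

def newton_step(grad, hess, x):
    # x - H(x)^{-1} grad f(x), via a linear solve rather than an explicit inverse
    return x - np.linalg.solve(hess(x), grad(x))

# Illustrative convex quadratic: f(x, y) = x^2 + x*y + 3*y^2
grad = lambda v: np.array([2 * v[0] + v[1], v[0] + 6 * v[1]])
hess = lambda v: np.array([[2.0, 1.0], [1.0, 6.0]])

x = newton_step(grad, hess, np.array([5.0, -3.0]))
# For a quadratic the second-order model is exact,
# so one step lands on the minimizer (0, 0).
```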
[[Hessian Optimization]] tends to fail when the critical point has flat nearby behavior, $\det(\mathbf{H})\approx {0}$, since the near-singular Hessian makes its inverse ill-conditioned and the step erratic. The method is also harder to apply in high-dimensional settings, where computing (or even storing) the inverse is expensive.
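The instability can be seen directly: when $\det(\mathbf{H})\approx 0$, a tiny change in the gradient swings the Newton step wildly. The matrix below, and the Levenberg-style damping shown as a remedy, are illustrative additions, not part of this note:

```python
import numpy as np

# A nearly singular Hessian: det(H) ≈ 1e-8
H = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-8]])
g = np.array([1.0, 1.0])
g_noisy = g + np.array([0.0, 1e-6])  # tiny gradient perturbation

step = np.linalg.solve(H, g)              # ≈ [1, 0]
step_noisy = np.linalg.solve(H, g_noisy)  # ≈ [-99, 100]: wildly different

# Levenberg-style damping, H + lam*I, restores stability:
lam = 1e-3
step_damped = np.linalg.solve(H + lam * np.eye(2), g)
step_damped_noisy = np.linalg.solve(H + lam * np.eye(2), g_noisy)
# the two damped steps now agree to roughly 1e-3
```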
### Derivation (Incomplete)
At the critical point we are solving for, $\vec p=(p_x,p_y)$, we write the second-order Taylor series of $f$ about $\vec p$. Since $\vec p$ is a critical point, $\nabla f(\vec p)=\vec 0$ and the first-order terms vanish:
$ \begin{align}
f(\vec x_{i}) &\approx f(\vec p)\\
&+ \frac{1}{2} {f_{xx}(\vec p)} (x_{i}-p_{x})^{2} \\
&+ f_{xy}(\vec p)(x_{i}-p_{x})(y_{i}-p_{y}) \\
&+ \frac{1}{2} f_{yy}(\vec p)(y_{i}-p_{y})^{2} \\
\end{align}$
Differentiating this approximation with respect to $x_{i}$ and $y_{i}$ gives the components of the [[Gradient]] at $\vec x_{i}$:
$\begin{align}
\begin{cases}
f_{x}(x_{i},y_{i}) &= f_{x x}(\vec p)(x_{i}-p_{x}) + f_{xy}(\vec p)(y_{i}-p_{y}) \\
f_{y}(x_{i},y_{i}) &= f_{x y}(\vec p)(x_{i}-p_{x}) + f_{yy}(\vec p)(y_{i}-p_{y}) \\
\end{cases}
\end{align}$
$\huge \begin{align}
\begin{pmatrix}
f_{x x} & f_{x y} \\
f_{x y} & f_{y y}
\end{pmatrix}
\begin{pmatrix}
x_{i}- p_{x} \\
y_{i} - p_{y}
\end{pmatrix}
&= \begin{pmatrix}
f_{x}(x_{i},y_{i}) \\
f_{y}(x_{i},y_{i})
\end{pmatrix}
\end{align}$
Compactly, this system says: the [[Hessian Matrix]] of $f$ at $\vec p$, multiplied by $\vec x_i - \vec p$, equals the [[Gradient]] of $f$ at $\vec x_i$.
$\huge
\begin{align}
\mathbf{H}_{\vec p} (\vec x_{i} - \vec p) &= \nabla f(\vec x_{i}) \\
\vec p &= \vec x_{i} - \mathbf{H}_{\vec x_{i}} ^{ -1} \nabla f(\vec x_{i})
\end{align} $
Since $\vec p$ is not known in advance, the Hessian at $\vec p$ is approximated by the Hessian at the current point, $\mathbf{H}_{\vec p} \approx \mathbf{H}_{\vec x_{i}}$, which is why the update must be iterated rather than solved in one shot.
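Iterating the update (each new $\vec p$ becomes the next expansion point $\vec x_{i}$) gives the full method. A minimal NumPy sketch on a hand-picked smooth function with a unique minimum at the origin (the function and names are illustrative assumptions):

```python
import numpy as np

def newton_minimize(grad, hess, x, steps=10):
    # Repeatedly apply p = x_i - H(x_i)^{-1} grad f(x_i)
    for _ in range(steps):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# f(x, y) = cosh(x) + cosh(2y): unique critical point (a minimum) at (0, 0)
grad = lambda v: np.array([np.sinh(v[0]), 2 * np.sinh(2 * v[1])])
hess = lambda v: np.diag([np.cosh(v[0]), 4 * np.cosh(2 * v[1])])

p = newton_minimize(grad, hess, np.array([1.0, 0.5]))
# p converges to [0, 0] quadratically: a few steps reach machine precision
```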