A modified version of [[Optimization]] / [[Gradient Descent]]. During the descent process, we use second-order [[Derivative]] / [[Gradient]] information at each point:

$\huge \vec p = \vec x_{i} - \mathbf{H}_{\vec x_{i}} ^{ -1} \nabla f(\vec x_{i}) $

where $\mathbf{H}$ refers to the [[Hessian Matrix]]. [[Hessian Optimization]] tends to fail if the function is nearly flat around the critical point, $\det(\mathbf{H})\approx 0$, which makes the inverse ill-conditioned and the updates erratic. It is also harder to perform in high-dimensional settings, where taking an inverse is expensive.

### Derivation (Incomplete)

Let $\vec p=(p_x,p_y)$ be the critical point we are estimating, so $\nabla f(\vec p)=\vec 0$ and the first-order terms vanish. The second-order Taylor expansion of $f$ about $\vec p$ is

$ \begin{align} f(\vec x_{i}) &= f(\vec p)\\ &+ \frac{1}{2} f_{xx}(\vec p) (x_{i}-p_{x})^{2} \\ &+ f_{xy}(\vec p)(x_{i}-p_{x})(y_{i}-p_{y}) \\ &+ \frac{1}{2} f_{yy}(\vec p)(y_{i}-p_{y})^{2} \\ \end{align}$

Differentiating with respect to $x_i$ and $y_i$ gives the gradient components:

$\begin{align} \begin{cases} f_{x}(x_{i},y_{i}) &= f_{x x}(\vec p)(x_{i}-p_{x}) + f_{xy}(\vec p)(y_{i}-p_{y}) \\ f_{y}(x_{i},y_{i}) &= f_{x y}(\vec p)(x_{i}-p_{x}) + f_{yy}(\vec p)(y_{i}-p_{y}) \\ \end{cases} \end{align}$

In matrix form:

$\huge \begin{align} \mat{ f_{x x} & f_{x y} \\ f_{x y} & f_{y y} } \mat{ x_{i}- p_{x} \\ y_{i} - p_{y} } &= \mat{ f_{x}(x_{i},y_{i}) \\ f_{y}(x_{i},y_{i}) } \end{align}$

This system is the [[Hessian Matrix]] of $f$ at $\vec p$, multiplied by $\vec x_i - \vec p$, set equal to the [[Gradient]] of $f$ at $\vec x_i$. Solving for $\vec p$:

$\huge \begin{align} \mathbf{H} (\vec x_{i} - \vec p) &= \nabla f(\vec x_{i}) \\ \vec p &= \vec x_{i} - \mathbf{H}_{\vec x_{i}} ^{ -1} \nabla f(\vec x_{i}) \end{align} $
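The update rule above can be sketched in NumPy. This is a minimal illustration, not a production optimizer; the function and variable names (`newton_step`, `newton_method`, `grad`, `hess`) are my own. Rather than forming $\mathbf{H}^{-1}$ explicitly, each step solves the linear system $\mathbf{H}\,\vec d = \nabla f$, which is cheaper and numerically safer:

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton update: x - H(x)^{-1} grad f(x), via a linear solve."""
    return x - np.linalg.solve(hess(x), grad(x))

def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
    """Iterate Newton steps until the update is smaller than tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = newton_step(grad, hess, x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: f(x, y) = (x - 1)^2 + 2(y + 3)^2, critical point at (1, -3).
grad = lambda v: np.array([2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 4.0]])
p = newton_method(grad, hess, np.array([10.0, 10.0]))
# For a quadratic f, the Taylor expansion is exact, so a single
# Newton step lands on the critical point.
```

If `hess(x)` is (near-)singular, `np.linalg.solve` raises `LinAlgError` or returns a wildly scaled step, which is the $\det(\mathbf{H}) \approx 0$ failure mode described above.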