[[Gradient]] Descent is an iterative method for the [[Optimization]] of a multivariable [[Derivative|Differentiable]] [[Function]].

$\huge \mathbf{\vec x}_{n+1} = \mathbf{\vec x}_{n} - \gamma \nabla f(\mathbf{\vec x}_{n}) $

$\huge \gamma \in \R $

>[!note]
>Note that $\gamma$ is often called the 'learning rate' when referring to [[Machine Learning]].

>[!example] Example: 2 Variables
>$\huge \begin{align}
>z = f(x,y)
>\end{align} $
>
>Find $p_x,p_y$ such that $f(p_{x},p_{y})$ is [[Minima|minimized]].
>
>We take the [[Gradient]] of $f$ and use it for each iteration.
>
>$\huge \begin{align}
>\mat{x_{i+1} \\ y_{i+1}} =
>\mat{x_{i} \\ y_{i}} -
>\gamma
>\nabla f\pa{
>\mat{x_{i} \\ y_{i}}
>}
>\end{align} $

>[!info] [[Alexander Young|Young's]] Favorite Method
>A possible test for checking whether your learning rate is too high is to cut the learning rate $\gamma$ in half whenever $\nabla f(x_{i}) \cdot \nabla f(x_{i+1}) < -\iota$, where $\iota\in \R^{+}$ is the cutting threshold (e.g. $\frac{1}{2}$).
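The update rule and the rate-halving heuristic above can be sketched in Python with NumPy. This is a minimal illustration, not a definitive implementation; the function names and parameters (`gradient_descent`, `steps`, etc.) are my own, not from the note:

```python
import numpy as np

def gradient_descent(grad_f, x0, gamma=0.1, iota=0.5, steps=100):
    """Minimize f via gradient descent: x_{n+1} = x_n - gamma * grad_f(x_n).

    Also applies the halving heuristic from the note: if successive gradients
    point in sufficiently opposite directions (dot product below -iota), the
    step likely overshot the minimum, so gamma is cut in half.
    """
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    for _ in range(steps):
        x_next = x - gamma * g           # standard descent update
        g_next = grad_f(x_next)
        if np.dot(g, g_next) < -iota:    # gradients reversed: halve the rate
            gamma /= 2.0
        x, g = x_next, g_next
    return x

# Example: f(x, y) = x^2 + y^2 has gradient (2x, 2y) and its minimum at the origin.
grad = lambda v: 2.0 * v
p = gradient_descent(grad, [3.0, -4.0], gamma=0.4)  # converges toward (0, 0)
```

Here a fixed `gamma` already converges for this quadratic; the $\iota$ test only kicks in when the step size is large enough to make consecutive gradients flip direction.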