Implementation of conjugate prior regularization: an Inverse-Wishart prior over the covariance matrices, a Normal prior over the means and a Dirichlet prior over the weights.
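In symbols, the maximized (MAP) objective has the form below; Ψ, ν (Inverse-Wishart), m, κ (Normal) and α (Dirichlet) stand for the prior hyperparameters, with symbols chosen here purely for illustration:

    \max_{\{w_k,\,\mu_k,\,\Sigma_k\}} \;
      \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)
      + \sum_{k=1}^{K} \Bigl[ \log \mathcal{IW}(\Sigma_k \mid \Psi, \nu)
      + \log \mathcal{N}(\mu_k \mid m, \Sigma_k/\kappa) \Bigr]
      + \log \operatorname{Dir}(w \mid \alpha)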
Implementation of standard gradient ascent.
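A minimal sketch of one update, with illustrative names (`ascentStep` and `learningRate` are not necessarily the library's API):

    import breeze.linalg.DenseVector

    // one plain gradient ascent step: move *up* the gradient,
    // since the log-likelihood is being maximized
    def ascentStep(
        params: DenseVector[Double],
        grad: DenseVector[Double] => DenseVector[Double],
        learningRate: Double): DenseVector[Double] =
      params + grad(params) * learningRate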
Regularization term of the form scale * log(det(cov)).
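Since log(det(cov)) tends to minus infinity as cov approaches singularity, adding this term to the maximized objective penalizes degenerate covariance matrices. A minimal breeze sketch (function names are illustrative):

    import breeze.linalg.{DenseMatrix, det, inv}

    // value of the term scale * log(det(cov))
    def regValue(cov: DenseMatrix[Double], scale: Double): Double =
      scale * math.log(det(cov))

    // gradient w.r.t. cov: for symmetric positive-definite cov,
    // d log(det(cov)) / d cov = inv(cov)
    def regGradient(cov: DenseMatrix[Double], scale: Double): DenseMatrix[Double] =
      inv(cov) * scale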
Implementation of gradient ascent with momentum, as formulated in Goh, "Why Momentum Really Works", Distill, 2017. http://doi.org/10.23915/distill.00006
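In Goh's notation the iteration is z ← βz + ∇f(w), w ← w + αz (sign chosen here for maximization). A minimal sketch with illustrative names:

    import breeze.linalg.DenseVector

    // one momentum step: the velocity is an exponentially decaying
    // sum of past gradients, with decay rate beta
    def momentumStep(
        params: DenseVector[Double],
        velocity: DenseVector[Double],
        grad: DenseVector[Double],
        learningRate: Double,
        beta: Double): (DenseVector[Double], DenseVector[Double]) = {
      val newVelocity = velocity * beta + grad
      (params + newVelocity * learningRate, newVelocity)
    }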
Implementation of gradient ascent with Nesterov's correction.
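Nesterov's correction evaluates the gradient at a look-ahead point rather than at the current iterate; a minimal sketch (illustrative names, not necessarily the library's API):

    import breeze.linalg.DenseVector

    // the gradient is taken at params + velocity * beta, i.e. where the
    // accumulated velocity is about to carry us, not at params itself
    def nesterovStep(
        params: DenseVector[Double],
        velocity: DenseVector[Double],
        grad: DenseVector[Double] => DenseVector[Double],
        learningRate: Double,
        beta: Double): (DenseVector[Double], DenseVector[Double]) = {
      val lookAhead = params + velocity * beta
      val newVelocity = velocity * beta + grad(lookAhead) * learningRate
      (params + newVelocity, newVelocity)
    }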
Contains basic functionality for an object that can be modified by an Optimizer.
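A hypothetical sketch of the contract such an object must satisfy (trait and method names are invented for illustration):

    import breeze.linalg.DenseVector

    // expose the current parameters and accept updated ones
    // after an optimizer step
    trait Optimizable {
      def parameters: DenseVector[Double]
      def update(newParameters: DenseVector[Double]): Unit
    }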
Contains base hyperparameters with their respective getters and setters.
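A minimal sketch of the pattern (class, field and default values are hypothetical):

    // mutable hyperparameters with chainable setters
    class BaseHyperparams {
      private var learningRate: Double = 0.9

      def getLearningRate: Double = learningRate

      def setLearningRate(lr: Double): this.type = {
        require(lr > 0, "learningRate must be positive")
        learningRate = lr
        this // returning this allows chaining setter calls
      }
    }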
Contains common mathematical operations that can be performed on both matrices and vectors. Its purpose is to avoid duplicating code in the optimization algorithms' classes; see the sketch below.
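One way to achieve this is a small typeclass, so that each update rule is written once and reused for DenseVector and DenseMatrix alike; a hypothetical sketch:

    import breeze.linalg.{DenseMatrix, DenseVector}

    // the few operations the optimizers need, abstracted over the
    // parameter container (vector or matrix)
    trait ParamOps[A] {
      def plus(x: A, y: A): A
      def scale(x: A, c: Double): A
    }

    object ParamOps {
      implicit val vectorOps: ParamOps[DenseVector[Double]] =
        new ParamOps[DenseVector[Double]] {
          def plus(x: DenseVector[Double], y: DenseVector[Double]) = x + y
          def scale(x: DenseVector[Double], c: Double) = x * c
        }

      implicit val matrixOps: ParamOps[DenseMatrix[Double]] =
        new ParamOps[DenseMatrix[Double]] {
          def plus(x: DenseMatrix[Double], y: DenseMatrix[Double]) = x + y
          def scale(x: DenseMatrix[Double], c: Double) = x * c
        }
    }

    // a single generic ascent step covering both cases
    def step[A](params: A, grad: A, lr: Double)(implicit ops: ParamOps[A]): A =
      ops.plus(params, ops.scale(grad, lr))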
Contains basic functionality for a regularization term.
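A hypothetical sketch of the shape such a term must have; the scale * log(det(cov)) term above fits this interface:

    // a regularization term must report its value and its gradient,
    // to be added to the objective and to the gradient of the
    // log-likelihood, respectively
    trait Regularizer[A] {
      def evaluate(params: A): Double
      def gradient(params: A): A
    }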
Softmax mapping to fit the weights vector. The precise mapping is w_i => log(w_i/w_last); it is an implementation of the procedure described here.
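A minimal sketch of the mapping and its inverse (function names are illustrative); note that the image of the simplex under w_i => log(w_i/w_last) has its last coordinate pinned at 0, so a plain softmax recovers the weights:

    import breeze.linalg.{DenseVector, sum}
    import breeze.numerics.{exp, log}

    // simplex -> unconstrained space: s_i = log(w_i / w_last)
    def toUnconstrained(w: DenseVector[Double]): DenseVector[Double] =
      log(w / w(w.length - 1))

    // unconstrained space -> simplex: the softmax inverse
    def toSimplex(s: DenseVector[Double]): DenseVector[Double] = {
      val e = exp(s)
      e / sum(e)
    }

Gradient ascent then runs on the unconstrained vector s, and the recovered weights are positive and sum to one by construction.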
Contains the basic functionality for any mapping to be used to fit the weights of a mixture model.
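A hypothetical sketch of that base contract (trait and method names are invented; the Softmax and Ratio mappings would each provide the two directions):

    import breeze.linalg.DenseVector

    // any weights mapping must take the simplex to unconstrained space
    // and back, so the optimizer can work on the unconstrained side
    trait WeightsTransformation {
      def toUnconstrained(weights: DenseVector[Double]): DenseVector[Double]
      def toSimplex(unconstrained: DenseVector[Double]): DenseVector[Double]
    }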
Implementation of ADAM. See Adam: A Method for Stochastic Optimization. Kingma, Diederik P.; Ba, Jimmy, 2014. As of gradientgmm >= 1.4, using it is NOT recommended: ADAM can be unstable for GMM problems; use SGD or its accelerated versions instead.
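For reference, a minimal self-contained sketch of the update (ascent version, illustrative names, not the library's API):

    import breeze.linalg.DenseVector
    import breeze.numerics.sqrt

    case class AdamState(m: DenseVector[Double], v: DenseVector[Double], t: Int)

    // one ADAM step following Kingma & Ba (2014): m and v are decaying
    // averages of the gradient and the squared gradient
    def adamStep(
        params: DenseVector[Double],
        grad: DenseVector[Double],
        state: AdamState,
        lr: Double = 0.001,
        beta1: Double = 0.9,
        beta2: Double = 0.999,
        eps: Double = 1e-8): (DenseVector[Double], AdamState) = {
      val t = state.t + 1
      val m = state.m * beta1 + grad * (1 - beta1)
      val v = state.v * beta2 + (grad *:* grad) * (1 - beta2)
      val mHat = m / (1 - math.pow(beta1, t)) // bias correction
      val vHat = v / (1 - math.pow(beta2, t))
      (params + (mHat /:/ (sqrt(vHat) + eps)) * lr, AdamState(m, v, t))
    }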
Implementation of ADAMAX. See Adam: A Method for Stochastic Optimization. Kingma, Diederik P.; Ba, Jimmy, 2014. As of gradientgmm >= 1.4, using it is NOT recommended: ADAMAX can be unstable for GMM problems; use SGD or its accelerated versions instead.
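ADAMAX replaces ADAM's second-moment estimate with an infinity-norm bound u, per the same paper; a minimal sketch (illustrative names):

    import breeze.linalg.DenseVector

    case class AdamaxState(m: DenseVector[Double], u: DenseVector[Double], t: Int)

    // one ADAMAX step (ascent version): u is an elementwise,
    // exponentially decayed infinity norm of past gradients
    def adamaxStep(
        params: DenseVector[Double],
        grad: DenseVector[Double],
        state: AdamaxState,
        lr: Double = 0.002,
        beta1: Double = 0.9,
        beta2: Double = 0.999): (DenseVector[Double], AdamaxState) = {
      val t = state.t + 1
      val m = state.m * beta1 + grad * (1 - beta1)
      val u = DenseVector.tabulate(grad.length)(i =>
        math.max(state.u(i) * beta2, math.abs(grad(i))))
      // the small constant guards against division by zero early on
      val step = (m /:/ (u + 1e-8)) * (lr / (1 - math.pow(beta1, t)))
      (params + step, AdamaxState(m, u, t))
    }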
Ratio mapping to fit the weights vector. The precise mapping is w_i => w_i/w_last. As of gradientgmm >= 1.4, this transformation is experimental and numerically unstable; using it is NOT recommended: use the default Softmax transformation instead.
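A minimal sketch of the mapping and its inverse (names illustrative); dividing by the raw last weight, instead of working in log space as the Softmax mapping does, is what makes it prone to overflow when w_last is tiny:

    import breeze.linalg.{DenseVector, sum}

    // simplex -> unconstrained: r_i = w_i / w_last (so r_last = 1)
    def toRatio(w: DenseVector[Double]): DenseVector[Double] =
      w / w(w.length - 1)

    // inverse: renormalizing recovers the original weights
    def fromRatio(r: DenseVector[Double]): DenseVector[Double] =
      r / sum(r)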