We present ECCO-DNN, a stochastic first-order optimization method specialized for
deep neural networks (DNNs). The method models the trajectory of the optimization
variables as a dynamical system and uses a discretization algorithm that
adaptively selects step sizes based on the shape of that trajectory. This
approach rests on two key ideas: designing the dynamical system for fast
continuous-time convergence, and developing a time-stepping algorithm that
selects step sizes adaptively based on principles of numerical integration and
on neural network structure. The result is an optimizer whose performance is
insensitive to hyperparameter variations and comparable to that of
state-of-the-art optimizers, including ADAM, SGD, RMSProp, and AdaGrad. We
demonstrate this by training DNN models on datasets including CIFAR-10 and
CIFAR-100 with ECCO-DNN, and find that ECCO-DNN's single hyperparameter can be
changed by three orders of magnitude without affecting the trained models'
accuracies. This insensitivity reduces the data and computation needed for
hyperparameter tuning, making ECCO-DNN advantageous for rapid prototyping and
for applications with new datasets. To validate the efficacy of the proposed
optimizer, we train an LSTM architecture on a household power consumption
dataset with ECCO-DNN and achieve an optimal mean-square error without tuning
hyperparameters.
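The abstract does not give ECCO-DNN's update rule. As a generic illustration of the underlying idea, the sketch below treats minimization as integrating the simplest such dynamical system, the gradient flow dx/dt = -grad f(x), and picks each step size from a numerical-integration error estimate (step-doubling forward Euler with a relative tolerance). This is a textbook adaptive time-stepping scheme, not ECCO-DNN's algorithm, and its dynamical system may differ from the one the method designs. The function name `adaptive_gradient_flow`, the tolerance `tol`, and the controller constants (0.9 safety factor, step changes limited to a factor between 0.2 and 2) are hypothetical assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(c * c for c in v))


def adaptive_gradient_flow(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    tol: float = 1e-3,  # hypothetical accuracy knob of this sketch, not ECCO-DNN's hyperparameter
    h0: float = 0.1,
    max_steps: int = 10_000,
    grad_tol: float = 1e-6,
) -> Vector:
    """Integrate dx/dt = -grad(x) with step-doubling forward Euler and
    return the approximate stationary point (generic sketch)."""
    x, h = list(x0), h0
    for _ in range(max_steps):
        g = grad(x)
        if _norm(g) < grad_tol:
            break
        # One full forward-Euler step of size h.
        x_full = [xi - h * gi for xi, gi in zip(x, g)]
        # Two half steps of size h/2, a more accurate estimate of the trajectory.
        x_mid = [xi - 0.5 * h * gi for xi, gi in zip(x, g)]
        g_mid = grad(x_mid)
        x_half = [xi - 0.5 * h * gi for xi, gi in zip(x_mid, g_mid)]
        # Relative local-error estimate: how much the two estimates disagree.
        err = _norm([a - b for a, b in zip(x_half, x_full)]) / max(_norm(x), 1e-12)
        if err <= tol:
            x = x_half  # accept; on rejection, x is kept and h shrinks below
        # Forward Euler's local error is O(h^2), hence the square root.
        factor = 2.0 if err == 0.0 else 0.9 * math.sqrt(tol / err)
        h *= min(2.0, max(0.2, factor))
    return x


# Toy ill-conditioned quadratic f(x) = 0.5 * (x0^2 + 10 * x1^2).
quad_grad = lambda x: [x[0], 10.0 * x[1]]
for tol in (1e-2, 1e-3, 1e-4):
    print(tol, adaptive_gradient_flow(quad_grad, [1.0, 1.0], tol=tol))
```

On this toy problem, sweeping `tol` over two orders of magnitude changes how many steps are taken but not the point reached. That loosely mirrors the kind of insensitivity the abstract reports; it is not evidence for ECCO-DNN's results.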
No Creative Commons license.