Forward and reverse gradient-based hyperparameter optimization

In hyperparameter optimization (HO), the goal is to optimize a response function of the hyperparameters, which is usually the average loss on a validation set. Gradient-based HO refers to iteratively finding good hyperparameters via gradient updates, just as we train the neural network itself; the gradient of the response function with respect to the hyperparameters is called the hypergradient. One of the strengths of this work is that the framework accommodates many kinds of hyperparameters: the response function can be evaluated on the training set, the validation set, or both, and the hyperparameters can appear in the loss function, in a regularizer, or in the model architecture.
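To make the idea concrete, here is a minimal sketch (not the paper's code) of reverse-mode hypergradient computation in JAX: we unroll a short SGD run on training data and differentiate the resulting validation loss with respect to a single hyperparameter, the learning rate. All function names, data, and step counts below are illustrative assumptions.

```python
# Sketch: hypergradient of a validation loss w.r.t. the learning rate,
# obtained by differentiating through an unrolled SGD training loop.
import jax
import jax.numpy as jnp

def train_loss(w, X, y):
    # Simple least-squares loss for illustration.
    return jnp.mean((X @ w - y) ** 2)

def response(log_lr, w0, X_tr, y_tr, X_val, y_val, steps=20):
    # Response function: validation loss after `steps` SGD updates,
    # viewed as a function of the hyperparameter.
    lr = jnp.exp(log_lr)  # optimize log-lr so the rate stays positive
    w = w0
    for _ in range(steps):
        w = w - lr * jax.grad(train_loss)(w, X_tr, y_tr)
    return train_loss(w, X_val, y_val)

# Hypergradient: gradient of the response function w.r.t. the hyperparameter.
hypergrad = jax.grad(response, argnums=0)

# Toy data (illustrative).
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
true_w = jnp.arange(1.0, 6.0)
X_tr = jax.random.normal(k1, (100, 5))
y_tr = X_tr @ true_w + 0.1 * jax.random.normal(k2, (100,))
X_val = jax.random.normal(k3, (30, 5))
y_val = X_val @ true_w

# Gradient descent on the hyperparameter itself.
w0 = jnp.zeros(5)
log_lr = jnp.log(0.01)
for _ in range(50):
    log_lr = log_lr - 0.1 * hypergrad(log_lr, w0, X_tr, y_tr, X_val, y_val)
print("tuned learning rate:", jnp.exp(log_lr))
```

Unrolling the whole training run, as above, is the reverse-mode variant; it is memory-hungry because the full trajectory of weights is kept for backpropagation, which is exactly the trade-off against the forward-mode computation discussed in the paper.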