A gradient-based bilevel optimization approach for tuning regularization hyperparameters

Sinha, Ankur; Khandait, Tanmay; Mohanty, Raja

Please use this identifier to cite or link to this item: http://hdl.handle.net/11718/27004

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sinha, Ankur	-
dc.contributor.author	Khandait, Tanmay	-
dc.contributor.author	Mohanty, Raja	-
dc.date.accessioned	2024-01-03T09:14:33Z	-
dc.date.available	2024-01-03T09:14:33Z	-
dc.date.issued	2023-09-29	-
dc.identifier.issn	1862-4480	-
dc.identifier.uri	http://hdl.handle.net/11718/27004	-
dc.description.abstract	Hyperparameter tuning in machine learning is often carried out with naive techniques such as random search and grid search. However, these methods seldom lead to an optimal set of hyperparameters and often become very expensive. The hyperparameter optimization problem is inherently a bilevel optimization task, and earlier studies have applied bilevel solution methodologies to it. These techniques often assume a unique set of weights that minimizes the loss on the training set, an assumption that is violated by deep learning architectures. We propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from the drawbacks of the earlier studies. The proposed method is general and can be easily applied to any class of machine learning algorithms that involve continuous hyperparameters. The idea is to approximate the lower-level optimal value function mapping, which reduces the bilevel problem to a single-level constrained optimization task. The single-level constrained optimization problem is then solved using the augmented Lagrangian method. We perform an extensive computational study on three datasets that confirms the efficiency of the proposed method. A comparative study against grid search, random search, the Tree-structured Parzen Estimator and a Quasi-Monte Carlo sampler shows that the proposed algorithm is multiple times faster and leads to models that generalize better on the testing set.	en_US
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Optimization Letters	en_US
dc.subject	Bilevel optimization	en_US
dc.subject	Hyperparameter tuning	en_US
dc.subject	Machine learning	en_US
dc.subject	Model regularization	en_US
dc.subject	Augmented Lagrangian method	en_US
dc.title	A gradient-based bilevel optimization approach for tuning regularization hyperparameters	en_US
dc.type	Article	en_US
Appears in Collections:	Journal Articles
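
The reduction described in the abstract can be sketched in symbols. This is a minimal illustration under assumed notation, none of which is taken from the article: \lambda for the continuous hyperparameters, w for the model weights, F for the validation loss, f for the regularized training loss, \varphi for the lower-level optimal value function, \hat{\varphi} for its approximation, and \mu, \rho for a multiplier and a penalty parameter. The augmented Lagrangian shown is the standard textbook form for a single inequality constraint, which may differ from the variant the authors use.

```latex
% Notation assumed for illustration; it is not taken from the article.
\begin{align*}
\text{Bilevel problem:}\quad
  & \min_{\lambda,\, w}\; F(\lambda, w)
    \quad \text{s.t.}\quad w \in \operatorname*{arg\,min}_{u}\; f(\lambda, u) \\
\text{Optimal value function:}\quad
  & \varphi(\lambda) = \min_{u}\; f(\lambda, u) \\
\text{Single-level form:}\quad
  & \min_{\lambda,\, w}\; F(\lambda, w)
    \quad \text{s.t.}\quad g(\lambda, w) := f(\lambda, w) - \hat{\varphi}(\lambda) \le 0 \\
\text{Augmented Lagrangian:}\quad
  & \mathcal{L}_{\rho}(\lambda, w; \mu) = F(\lambda, w)
    + \frac{1}{2\rho}\Bigl( \max\{0,\; \mu + \rho\, g(\lambda, w)\}^{2} - \mu^{2} \Bigr) \\
\text{Multiplier update:}\quad
  & \mu \leftarrow \max\{0,\; \mu + \rho\, g(\lambda, w)\}
\end{align*}
```

The constraint f(\lambda, w) \le \varphi(\lambda) holds for every minimizer of the training loss, so this form stays well defined when the lower-level minimizer is not unique, which is the case the abstract raises for deep learning architectures.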
