
    A gradient-based bilevel optimization approach for tuning regularization hyperparameters

    Date: 2023-09-29
    Authors: Sinha, Ankur; Khandait, Tanmay; Mohanty, Raja

    Abstract
    Hyperparameter tuning in machine learning is often performed with naive techniques, such as random search and grid search. However, these methods seldom find an optimal set of hyperparameters and are often computationally expensive. The hyperparameter optimization problem is inherently a bilevel optimization task, and several studies have applied bilevel solution methodologies to it. These techniques typically assume a unique set of weights that minimizes the loss on the training set, an assumption that is violated by deep learning architectures. We propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from the drawbacks of the earlier studies. The proposed method is general and can be readily applied to any class of machine learning algorithms with continuous hyperparameters. The idea is based on approximating the lower-level optimal value function mapping, which reduces the bilevel problem to a single-level constrained optimization task. The single-level constrained optimization problem is then solved using the augmented Lagrangian method. We perform an extensive computational study on three datasets that confirms the efficiency of the proposed method. A comparative study against grid search, random search, the Tree-structured Parzen Estimator, and the Quasi Monte Carlo Sampler shows that the proposed algorithm is multiple times faster and leads to models that generalize better on the testing set.
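
    As a rough illustration of the reduction described above (not the authors' implementation), the idea can be sketched on a toy ridge-regression problem, where the lower-level optimal value function φ(λ) has a closed form. The bilevel problem min_λ f_val(w*(λ)) with w*(λ) = argmin_w f_tr(w, λ) is rewritten as the single-level problem min_{λ,w} f_val(w) subject to f_tr(w, λ) − φ(λ) ≤ 0, which is then handled with an augmented Lagrangian. All data, step sizes, and schedules below are illustrative assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic regression data, split into training and validation sets.
    n, d = 80, 10
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.5 * rng.normal(size=n)
    Xtr, ytr, Xv, yv = X[:60], y[:60], X[60:], y[60:]

    def f_tr(w, lam):
        """Lower-level (training) objective: MSE plus ridge penalty."""
        return np.mean((Xtr @ w - ytr) ** 2) + lam * (w @ w)

    def f_val(w):
        """Upper-level (validation) objective."""
        return np.mean((Xv @ w - yv) ** 2)

    def lower_level_solution(lam):
        """For ridge regression the lower-level argmin has a closed form."""
        A = Xtr.T @ Xtr / len(ytr) + lam * np.eye(d)
        return np.linalg.solve(A, Xtr.T @ ytr / len(ytr))

    # Augmented Lagrangian on the single-level reformulation:
    #   min_{lam, w} f_val(w)  s.t.  g(w, lam) = f_tr(w, lam) - phi(lam) <= 0
    w = np.zeros(d)
    theta = np.log(0.1)          # parameterize lam = exp(theta) to keep lam > 0
    mu, rho, lr = 0.0, 1.0, 0.005
    for it in range(1000):
        lam = np.exp(theta)
        w_star = lower_level_solution(lam)
        g = f_tr(w, lam) - f_tr(w_star, lam)   # constraint value, >= 0 by optimality
        coef = mu + rho * g                    # gradient weight of mu*g + (rho/2)*g^2
        grad_w = (2 / len(yv)) * Xv.T @ (Xv @ w - yv) \
            + coef * ((2 / len(ytr)) * Xtr.T @ (Xtr @ w - ytr) + 2 * lam * w)
        # d g / d lam = ||w||^2 - ||w_star||^2 (envelope theorem for phi);
        # chain rule through lam = exp(theta) multiplies by lam.
        grad_theta = coef * (w @ w - w_star @ w_star) * lam
        w -= lr * grad_w
        theta = np.clip(theta - lr * grad_theta, -8.0, 4.0)
        if it % 100 == 0:
            mu = max(0.0, mu + rho * g)        # periodic multiplier update

    print("tuned lambda:", np.exp(theta))
    print("validation loss:", f_val(w))
    ```

    In the paper's general setting φ(λ) has no closed form and is instead approximated, but the structure is the same: drive the constraint f_tr(w, λ) − φ(λ) toward zero with the penalty terms while descending on the validation loss.
    
    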
    URI: http://hdl.handle.net/11718/27004
    Collections: Journal Articles [3738]

    DSpace software copyright © 2002-2016  DuraSpace
    Contact Us | Send Feedback
    Theme by Atmire NV