TY - JOUR
T1 - Surrogate Modeling for Bayesian Optimization beyond a Single Gaussian Process
AU - Lu, Qin
AU - Polyzos, Konstantinos D.
AU - Li, Bingcong
AU - Giannakis, Georgios B.
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2023/9/1
Y1 - 2023/9/1
N2 - Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP)-based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on the fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS), which requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.
AB - Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP)-based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on the fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS), which requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.
KW - Bayesian optimization
KW - Bayesian regret analysis
KW - Gaussian processes
KW - Thompson sampling
KW - ensemble learning
UR - http://www.scopus.com/inward/record.url?scp=85153347308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153347308&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2023.3264741
DO - 10.1109/TPAMI.2023.3264741
M3 - Article
C2 - 37018108
AN - SCOPUS:85153347308
SN - 0162-8828
VL - 45
SP - 11283
EP - 11296
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
ER -