Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates

Yuhong Yang, Dan Zhu

Research output: Contribution to journal › Article › peer-review

59 Scopus citations

Abstract

We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships, combined with appropriate randomization, are used to choose which arm to play so as to obtain a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm against further exploration of the other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is almost surely asymptotically equivalent to that of the best arm (which depends on the covariates).
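The abstract does not spell out the randomization rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes an ε-greedy-style rule: at step n, the arm whose histogram estimate at the current covariate is largest is played with probability 1 − (l − 1)π_n, and each of the other l − 1 arms with probability π_n, where π_n decays to 0 but sums to infinity so every arm keeps being explored. The two reward functions, the Gaussian noise level, the Uniform(0, 1) covariate, the schedule π_n = min(1/l, 1/√(n+1)), and the names `ARM_MEANS` and `run_bandit` are all hypothetical. The bin count is held fixed for simplicity, whereas a consistent histogram estimator would let the bins shrink as data accumulate.

```python
import math
import random

# Hypothetical true mean-reward functions for two arms; the best arm depends on x.
ARM_MEANS = [lambda x: x, lambda x: 1.0 - x]


def run_bandit(horizon=5000, n_bins=10, noise_sd=0.1, seed=0):
    """Randomized allocation using histogram estimates of each arm's reward function.

    Returns the ratio of the accumulated mean reward of the chosen arms to that of
    the covariate-dependent best arm (an oracle that knows ARM_MEANS).
    """
    rng = random.Random(seed)
    n_arms = len(ARM_MEANS)
    # Per-arm, per-bin running sums and counts of observed (noisy) rewards.
    sums = [[0.0] * n_bins for _ in range(n_arms)]
    counts = [[0] * n_bins for _ in range(n_arms)]
    earned, oracle = 0.0, 0.0

    for n in range(horizon):
        x = rng.random()                      # covariate X_n ~ Uniform(0, 1)
        b = min(int(x * n_bins), n_bins - 1)  # histogram bin containing x
        # Histogram estimate of each arm's mean reward at x; untried bins get +inf
        # so the greedy step favors an arm not yet played in this bin.
        est = [sums[i][b] / counts[i][b] if counts[i][b] else math.inf
               for i in range(n_arms)]
        greedy = max(range(n_arms), key=lambda i: est[i])
        # Exploration probability for each non-greedy arm; decays to 0, sums to infinity.
        pi_n = min(1.0 / n_arms, 1.0 / math.sqrt(n + 1))
        if rng.random() < (n_arms - 1) * pi_n:
            # Pick one of the other arms uniformly, i.e. each with probability pi_n.
            arm = rng.choice([i for i in range(n_arms) if i != greedy])
        else:
            arm = greedy
        reward = ARM_MEANS[arm](x) + rng.gauss(0.0, noise_sd)
        sums[arm][b] += reward
        counts[arm][b] += 1
        # Track expected (noise-free) reward of the chosen arm vs. the best arm at x.
        earned += ARM_MEANS[arm](x)
        oracle += max(f(x) for f in ARM_MEANS)

    return earned / oracle


print(run_bandit())
```

Under these assumptions the returned ratio should approach 1 as `horizon` grows, an empirical analogue of the consistency property stated above. The fixed 10-bin histogram is adequate here only because the two hypothetical reward curves cross at x = 0.5, which lies on a bin edge; with other reward functions a fixed bin width leaves a persistent loss near the crossing point.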

Original language: English (US)
Pages (from-to): 100-121
Number of pages: 22
Journal: Annals of Statistics
Volume: 30
Issue number: 1
State: Published (Feb 2002)

Keywords

  • Concomitant variable
  • Multi-armed bandits
  • Nonparametric regression
  • Randomized allocation
  • Sequential allocation
