Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates

Yuhong Yang; Dan Zhu

doi:10.1214/aos/1015362186

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

59 Scopus citations

Abstract

We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
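The idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes a single covariate uniform on [0, 1), two hypothetical linear mean-reward functions, a fixed number of histogram bins, and an exploration probability that decays as t^(-1/3). All of these choices, and the function name `simulate`, are illustrative assumptions. It returns the ratio of the accumulated expected reward to that of the covariate-dependent best arm, which is the kind of quantity the consistency statement says should approach 1.

```python
import random


def simulate(n_rounds=20000, n_bins=10, seed=0):
    """Randomized allocation with per-arm histogram estimates (illustrative sketch)."""
    rng = random.Random(seed)
    # Hypothetical mean-reward functions of the covariate x in [0, 1).
    arms = [lambda x: 0.2 + 0.6 * x, lambda x: 0.8 - 0.6 * x]
    n_arms = len(arms)
    # Histogram regression state: reward sums and counts per arm and per bin.
    sums = [[0.0] * n_bins for _ in range(n_arms)]
    counts = [[0] * n_bins for _ in range(n_arms)]
    earned = 0.0  # accumulated expected reward of the arms actually played
    best = 0.0    # accumulated expected reward of the best arm for each covariate

    for t in range(1, n_rounds + 1):
        x = rng.random()
        b = min(int(x * n_bins), n_bins - 1)
        # Histogram estimate of each arm's mean reward in the bin containing x.
        # An arm with no data in this bin gets +inf so that it is tried first.
        est = [
            sums[a][b] / counts[a][b] if counts[a][b] else float("inf")
            for a in range(n_arms)
        ]
        greedy = max(range(n_arms), key=lambda a: est[a])
        # Assumed exploration schedule: explore uniformly with probability eps.
        eps = min(1.0, t ** (-1.0 / 3.0))
        arm = rng.randrange(n_arms) if rng.random() < eps else greedy

        mean = arms[arm](x)
        reward = mean + rng.gauss(0.0, 0.1)  # noisy observed reward
        sums[arm][b] += reward
        counts[arm][b] += 1

        earned += mean
        best += max(f(x) for f in arms)

    return earned / best


if __name__ == "__main__":
    print(f"reward ratio vs. best arm: {simulate():.3f}")
```

With more rounds the ratio should move closer to 1, since exploration becomes rarer while the per-bin estimates keep improving. The number of bins is held fixed here for simplicity, so this sketch works only because the two hypothetical reward curves cross exactly at a bin boundary.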

Original language	English (US)
Pages (from-to)	100-121
Number of pages	22
Journal	Annals of Statistics
Volume	30
Issue number	1
DOIs	https://doi.org/10.1214/aos/1015362186
State	Published - Feb 2002

Keywords

Concomitant variable
Multi-armed bandits
Nonparametric regression
Randomized allocation
Sequential allocation

Cite this

@article{0b92e29d49754e7f8d4ae8f1463ce905,
  title = "Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates",
  abstract = "We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.",
  keywords = "Concomitant variable, Multi-armed bandits, Nonparametric regression, Randomized allocation, Sequential allocation",
  author = "Yuhong Yang and Dan Zhu",
  year = "2002",
  month = feb,
  doi = "10.1214/aos/1015362186",
  language = "English (US)",
  volume = "30",
  pages = "100--121",
  journal = "Annals of Statistics",
  issn = "0090-5364",
  publisher = "Institute of Mathematical Statistics",
  number = "1",
}

TY  - JOUR
T1  - Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
AU  - Yang, Yuhong
AU  - Zhu, Dan
PY  - 2002/2
Y1  - 2002/2
AB  - We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
KW  - Concomitant variable
KW  - Multi-armed bandits
KW  - Nonparametric regression
KW  - Randomized allocation
KW  - Sequential allocation
UR  - http://www.scopus.com/inward/record.url?scp=0036108219&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0036108219&partnerID=8YFLogxK
U2  - 10.1214/aos/1015362186
DO  - 10.1214/aos/1015362186
M3  - Article
AN  - SCOPUS:0036108219
SN  - 0090-5364
VL  - 30
SP  - 100
EP  - 121
JO  - Annals of Statistics
JF  - Annals of Statistics
IS  - 1
ER  - 
