TY - JOUR
T1 - Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates
AU - Yang, Yuhong
AU - Zhu, Dan
PY - 2002/2
Y1 - 2002/2
N2 - We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
AB - We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
KW - Concomitant variable
KW - Multi-armed bandits
KW - Nonparametric regression
KW - Randomized allocation
KW - Sequential allocation
UR - http://www.scopus.com/inward/record.url?scp=0036108219&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036108219&partnerID=8YFLogxK
U2 - 10.1214/aos/1015362186
DO - 10.1214/aos/1015362186
M3 - Article
AN - SCOPUS:0036108219
SN - 0090-5364
VL - 30
SP - 100
EP - 121
JO - Annals of Statistics
JF - Annals of Statistics
IS - 1
ER -