Close the gaps: A learning-while-doing algorithm for single-product revenue management problems

Zizhuo Wang; Shiming Deng; Yinyu Ye

doi:10.1287/opre.2013.1245

Industrial and Systems Engineering

Research output: Contribution to journal › Article › peer-review

102 Scopus citations

Abstract

We consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process, the rate of which is influenced by a single action taken by the retailer (such as price adjustment, sales commission, advertisement intensity, etc.). The relationship between the action and the demand rate is not known in advance. However, the retailer is able to learn the optimal action on the fly as she maximizes her total expected revenue based on the observed demand reactions. Using the pricing problem as an example, we propose a dynamic learning-while-doing algorithm that only involves function value estimation to achieve a near-optimal performance. Our algorithm employs a series of shrinking price intervals and iteratively tests prices within that interval using a set of carefully chosen parameters. We prove that the performance of our algorithm is among the best of all possible algorithms in terms of the asymptotic regret (the relative loss compared to the full information optimal solution). Our result closes the performance gaps between parametric and nonparametric learning and between the post-price mechanism and the customer-bidding mechanism. Important managerial insight from this research is that the values of information on both the parametric form of the demand function as well as each customer's exact reservation price are less important than prior literature suggests. Our results also suggest that firms would be better off to perform dynamic learning and action concurrently rather than sequentially.

Original language	English (US)
Pages (from-to)	318-331
Number of pages	14
Journal	Operations Research
Volume	62
Issue number	2
DOIs	https://doi.org/10.1287/opre.2013.1245
State	Published - Jan 1 2014

Cite this

@article{ddd2fda775304885bec523ab3fc35766,

title = "Close the gaps: A learning-while-doing algorithm for single-product revenue management problems",

abstract = "We consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process, the rate of which is influenced by a single action taken by the retailer (such as price adjustment, sales commission, advertisement intensity, etc.). The relationship between the action and the demand rate is not known in advance. However, the retailer is able to learn the optimal action on the fly as she maximizes her total expected revenue based on the observed demand reactions. Using the pricing problem as an example, we propose a dynamic learning-while-doing algorithm that only involves function value estimation to achieve a near-optimal performance. Our algorithm employs a series of shrinking price intervals and iteratively tests prices within that interval using a set of carefully chosen parameters. We prove that the performance of our algorithm is among the best of all possible algorithms in terms of the asymptotic regret (the relative loss compared to the full information optimal solution). Our result closes the performance gaps between parametric and nonparametric learning and between the post-price mechanism and the customer-bidding mechanism. Important managerial insight from this research is that the values of information on both the parametric form of the demand function as well as each customer's exact reservation price are less important than prior literature suggests. Our results also suggest that firms would be better off to perform dynamic learning and action concurrently rather than sequentially.",

author = "Zizhuo Wang and Shiming Deng and Yinyu Ye",

year = "2014",

month = jan,

day = "1",

doi = "10.1287/opre.2013.1245",

language = "English (US)",

volume = "62",

pages = "318--331",

journal = "Operations Research",

issn = "0030-364X",

publisher = "INFORMS Inst. for Operations Res. and the Management Sciences",

number = "2",

}

TY - JOUR

T1 - Close the gaps

T2 - A learning-while-doing algorithm for single-product revenue management problems

AU - Wang, Zizhuo

AU - Deng, Shiming

AU - Ye, Yinyu

PY - 2014/1/1

Y1 - 2014/1/1

AB - We consider a retailer selling a single product with limited on-hand inventory over a finite selling season. Customer demand arrives according to a Poisson process, the rate of which is influenced by a single action taken by the retailer (such as price adjustment, sales commission, advertisement intensity, etc.). The relationship between the action and the demand rate is not known in advance. However, the retailer is able to learn the optimal action on the fly as she maximizes her total expected revenue based on the observed demand reactions. Using the pricing problem as an example, we propose a dynamic learning-while-doing algorithm that only involves function value estimation to achieve a near-optimal performance. Our algorithm employs a series of shrinking price intervals and iteratively tests prices within that interval using a set of carefully chosen parameters. We prove that the performance of our algorithm is among the best of all possible algorithms in terms of the asymptotic regret (the relative loss compared to the full information optimal solution). Our result closes the performance gaps between parametric and nonparametric learning and between the post-price mechanism and the customer-bidding mechanism. Important managerial insight from this research is that the values of information on both the parametric form of the demand function as well as each customer's exact reservation price are less important than prior literature suggests. Our results also suggest that firms would be better off to perform dynamic learning and action concurrently rather than sequentially.

UR - http://www.scopus.com/inward/record.url?scp=84899561628&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899561628&partnerID=8YFLogxK

U2 - 10.1287/opre.2013.1245

DO - 10.1287/opre.2013.1245

M3 - Article

AN - SCOPUS:84899561628

SN - 0030-364X

VL - 62

SP - 318

EP - 331

JO - Operations Research

JF - Operations Research

IS - 2

ER -
