Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals. (arXiv:2306.07071v2 [cs.LG] UPDATED)

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a
player chooses from $K$ arms with unknown expected rewards and costs. The goal
is to maximize the total reward under a budget constraint. A player thus seeks
to choose the arm with the highest reward-cost ratio as often as possible.
Current state-of-the-art policies for this problem have several issues, which
we illustrate. To overcome them, we propose a new upper confidence bound (UCB)
sampling policy, $\omega$-UCB, that uses asymmetric confidence intervals. These
intervals scale with the distance between the sample mean and the bounds of a
random variable, yielding a more accurate and tighter estimate of the
reward-cost ratio than competing policies. We show that our approach has
logarithmic regret and consistently outperforms existing policies in synthetic
and real settings.
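
To make the idea of an asymmetric interval concrete, here is a minimal sketch of a budgeted UCB-style loop. It is not taken from the paper: the abstract does not give the formula for $\omega$-UCB, so the sketch assumes Bernoulli rewards and costs and swaps in the standard Wilson score interval, which is a well-known interval that narrows on the side where the sample mean is close to a bound of the variable. The function names (`wilson_interval`, `ratio_index`, `run_budgeted_bandit`), the fixed confidence parameter `z`, and the stopping rule are all illustrative assumptions.

```python
import math
import random


def wilson_interval(mean, n, z):
    """Wilson score interval for the mean of n draws in [0, 1].

    Assumption: used here as a stand-in for an asymmetric interval;
    the paper's own interval may differ.
    """
    if n == 0:
        return 0.0, 1.0
    denom = 1.0 + z * z / n
    center = (mean + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(mean * (1 - mean) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def ratio_index(reward_mean, cost_mean, n, z, eps=1e-12):
    """Optimistic reward-cost ratio: upper reward bound over lower cost bound."""
    _, reward_hi = wilson_interval(reward_mean, n, z)
    cost_lo, _ = wilson_interval(cost_mean, n, z)
    return reward_hi / max(cost_lo, eps)


def run_budgeted_bandit(reward_probs, cost_probs, budget, z=2.0, seed=0):
    """Play Bernoulli arms until the next cost would exceed the budget.

    Hypothetical simulation harness; a fixed z is a simplification.
    """
    rng = random.Random(seed)
    k = len(reward_probs)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    total_reward, spent = 0.0, 0.0
    while True:
        untried = [i for i in range(k) if pulls[i] == 0]
        if untried:
            arm = untried[0]  # play every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: ratio_index(
                    reward_sum[i] / pulls[i], cost_sum[i] / pulls[i], pulls[i], z
                ),
            )
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        cost = 1.0 if rng.random() < cost_probs[arm] else 0.0
        if spent + cost > budget:
            break  # budget exhausted
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
        total_reward += reward
        spent += cost
    return total_reward, spent, pulls
```

With a sample mean of 0.9 over 10 draws, `wilson_interval(0.9, 10, 2.0)` extends further below the mean than above it, because the mean sits close to the upper bound 1. A symmetric interval of the same width would extend past 1 and have to be clipped there. Calling `run_budgeted_bandit([0.3, 0.8], [0.5, 0.5], budget=50)` returns the collected reward, the budget spent, and the per-arm pull counts.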

Latest Change: Aug. 16, 2023, 7:33 a.m. Changes: