Rotting Infinitely Many-armed Bandits beyond the Worst-case Rotting: An Adaptive Approach

arXiv:2404.14202v1 Announce Type: new
Abstract: We study the infinitely many-armed bandit problem in rotting environments, where the mean reward of an arm may decrease each time it is pulled and otherwise remains unchanged. We consider two scenarios that capture problem-dependent characteristics of the reward decay: the slow-rotting scenario, in which the cumulative amount of rotting is bounded by $V_T$, and the abrupt-rotting scenario, in which the number of rotting instances is bounded by $S_T$. To handle rotting rewards, we introduce an algorithm that uses UCB with an adaptive sliding window, designed to manage the bias-variance trade-off that rotting induces. The proposed algorithm achieves tight regret bounds in both the slow and abrupt rotting scenarios. Finally, we demonstrate its performance on synthetic datasets.
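
The abstract does not spell out the index, so the following is only a minimal sketch of one way a UCB with an adaptive sliding window could look, not the paper's algorithm. It assumes that, because rewards only decay, the average over any recent window tends to overestimate the arm's current mean, so the smallest "window mean plus confidence bonus" over several window sizes is the tightest available upper bound. The function names, the doubling window sizes, and the parameters `sigma` and `horizon` are all hypothetical choices made for illustration.

```python
import math


def adaptive_window_ucb(rewards, sigma=1.0, horizon=1000):
    """Hypothetical index: smallest UCB over doubling windows of recent rewards.

    Short windows track the current (rotted) mean but have a wide bonus;
    long windows have a narrow bonus but include stale, higher rewards.
    Taking the minimum trades these two effects off per arm.
    """
    n = len(rewards)
    if n == 0:
        return math.inf  # assumption: an unpulled arm is treated optimistically
    best = math.inf
    window = 1
    while window <= n:
        recent = rewards[n - window:]  # the `window` most recent observations
        mean = sum(recent) / window
        bonus = sigma * math.sqrt(2.0 * math.log(horizon) / window)
        best = min(best, mean + bonus)
        window *= 2  # assumption: doubling grid of window sizes
    return best


def full_history_ucb(rewards, sigma=1.0, horizon=1000):
    """Standard UCB over the whole (non-empty) history, for comparison."""
    n = len(rewards)
    return sum(rewards) / n + sigma * math.sqrt(2.0 * math.log(horizon) / n)


# An arm whose reward drops abruptly from 1 to 0 halfway through (noise-free).
history = [1.0] * 8 + [0.0] * 8
print(adaptive_window_ucb(history, sigma=0.1, horizon=100))
print(full_history_ucb(history, sigma=0.1, horizon=100))
```

On this toy history the adaptive index is driven by the windows that contain only post-drop rewards, while the full-history index stays inflated by the stale observations. In an infinitely many-armed setting, an index like this could be compared against a threshold to decide when to abandon the current arm and draw a fresh one; that stopping rule is likewise an assumption here, not something stated in the abstract.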

Latest Change: April 23, 2024, 7:31 a.m.