We study parameterized MDPs (PMDPs) in which the parameters of interest
are unknown and must be learned using Bayesian inference. A defining
feature of such models is the presence of "uninformative" actions that provide
no information about the unknown parameters. We contribute a set of assumptions
for PMDPs under which Thompson sampling guarantees an asymptotically optimal
expected regret bound of $O(T^{-1})$. These assumptions are easily verified for many
classes of problems, such as queuing, inventory control, and dynamic pricing.
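
To make the role of uninformative actions concrete, the following is a minimal sketch of Thompson sampling on a toy problem. It is not the model or the algorithm analyzed in the paper. It assumes a single-state, bandit-like simplification with a finite set of candidate parameters, one "safe" action whose known reward does not depend on the unknown parameter (and is therefore uninformative), and one action whose Bernoulli reward has the unknown parameter as its mean. The names `thompson_sampling`, `candidates`, and `safe_reward` are hypothetical.

```python
import random


def thompson_sampling(true_theta, candidates, horizon, safe_reward=0.5, seed=0):
    """Toy Thompson sampling with one uninformative action (illustrative only).

    Action 0 pays the known `safe_reward` regardless of theta, so observing it
    reveals nothing about theta. Action 1 pays a Bernoulli(theta) reward.
    """
    rng = random.Random(seed)
    # Uniform prior over the finite set of candidate parameters (an assumption).
    posterior = [1.0 / len(candidates)] * len(candidates)
    total_reward = 0.0

    for _ in range(horizon):
        # 1. Sample a parameter from the current posterior.
        sampled = rng.choices(candidates, weights=posterior, k=1)[0]
        # 2. Act optimally as if the sampled parameter were the truth.
        action = 1 if sampled > safe_reward else 0

        if action == 0:
            # Uninformative action: the likelihood is identical for every
            # candidate, so the posterior is left unchanged.
            total_reward += safe_reward
            continue

        # 3. Informative action: observe a Bernoulli reward and apply Bayes' rule.
        reward = 1 if rng.random() < true_theta else 0
        total_reward += reward
        likelihoods = [th if reward == 1 else 1.0 - th for th in candidates]
        unnormalized = [p * l for p, l in zip(posterior, likelihoods)]
        normalizer = sum(unnormalized)
        posterior = [u / normalizer for u in unnormalized]

    return posterior, total_reward
```

In this sketch, if every candidate parameter makes the safe action look optimal, the agent never plays the informative action and the posterior never moves. This is the kind of situation that makes uninformative actions delicate for learning.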
Latest Change: May 16, 2023, 7:31 a.m.