We study the problem of decentralized constrained POMDPs in a team setting
where multiple non-strategic agents have asymmetric information. Strong
duality is established for the setting of infinite-horizon expected total
discounted costs when the observations lie in a countable space, the actions
are chosen from a finite space, and the immediate cost functions are bounded.
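As a reference point, a constrained discounted-cost problem of this type is commonly written through a Lagrangian. The block below uses assumed notation rather than the paper's own: joint policy profile π, state S_t, joint action A_t, objective cost c, constraint costs d_k with bounds D_k, discount factor β ∈ (0, 1), and multipliers λ_k ≥ 0. Bounded immediate costs and β < 1 keep every expectation finite, and strong duality means the two optimization orders give the same value:

```latex
\mathcal{L}(\pi,\lambda)
  = \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \beta^{t} c(S_t, A_t)\Big]
  + \sum_{k=1}^{K} \lambda_k \Big( \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \beta^{t} d_k(S_t, A_t)\Big] - D_k \Big),
\qquad
\inf_{\pi} \sup_{\lambda \ge 0} \mathcal{L}(\pi,\lambda)
  = \sup_{\lambda \ge 0} \inf_{\pi} \mathcal{L}(\pi,\lambda).
```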
Following this, connections with the common-information and approximate
information-state approaches are established. The approximate
information states are characterized independently of the Lagrange-multiplier
vector, so that adapting the multipliers during learning does not necessitate
new representations. Finally, a primal-dual multi-agent reinforcement learning
(MARL) framework based on centralized training and distributed execution
(CTDE) and three-time-scale stochastic approximation is developed, using
recurrent and feedforward neural networks as function approximators.
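The primal-dual, three-time-scale idea can be illustrated on a much smaller problem. The sketch below is a hypothetical single-agent, single-state toy, not the paper's CTDE algorithm: it has no observations, recurrent networks, or information states, and all names, costs, and step sizes (`run_primal_dual`, `a_critic`, `a_actor`, `a_mult`, the entropy weight `tau`) are illustrative assumptions. A tabular critic on the fastest time scale tracks per-action costs from noisy samples, a logit-parametrized actor on the middle time scale follows the gradient of an entropy-regularized Lagrangian, and the multiplier on the slowest time scale performs projected dual ascent. The entropy term is added only so that this tiny linear problem has a unique, non-oscillating saddle point.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def run_primal_dual(steps: int = 20000, seed: int = 0):
    """Hypothetical toy: one state, two actions, one constraint.

    Action 0: objective cost 0, constraint cost 1.
    Action 1: objective cost 1, constraint cost 0.
    Constraint: expected constraint cost <= 0.5.
    """
    rng = random.Random(seed)
    true_c = [0.0, 1.0]   # objective cost per action
    true_d = [1.0, 0.0]   # constraint cost per action
    bound = 0.5           # constraint threshold
    tau = 0.5             # entropy weight (assumption, keeps the toy well-posed)
    noise = 0.1           # std of the cost observation noise

    # Step sizes ordered fastest -> slowest: critic, actor, multiplier.
    a_critic, a_actor, a_mult = 0.1, 0.05, 0.002

    c_hat = [0.0, 0.0]    # critic: estimated objective cost per action
    d_hat = [0.0, 0.0]    # critic: estimated constraint cost per action
    theta = 0.0           # actor: logit of P(action 0)
    lam = 0.0             # Lagrange multiplier

    for _ in range(steps):
        p = sigmoid(theta)
        a = 0 if rng.random() < p else 1

        # Fastest time scale: critic tracks the sampled action's costs.
        c_obs = true_c[a] + rng.gauss(0.0, noise)
        d_obs = true_d[a] + rng.gauss(0.0, noise)
        c_hat[a] += a_critic * (c_obs - c_hat[a])
        d_hat[a] += a_critic * (d_obs - d_hat[a])

        # Middle time scale: gradient descent on
        # p*Q0 + (1-p)*Q1 - tau*H(p), where Q_a = c_a + lam*d_a.
        q0 = c_hat[0] + lam * d_hat[0]
        q1 = c_hat[1] + lam * d_hat[1]
        grad = p * (1.0 - p) * (q0 - q1 + tau * theta)
        theta -= a_actor * grad

        # Slowest time scale: projected dual ascent keeps lam >= 0.
        j_d = p * d_hat[0] + (1.0 - p) * d_hat[1]
        lam = max(0.0, lam + a_mult * (j_d - bound))

    return sigmoid(theta), lam
```

With the default settings, the returned probability of action 0 should settle near 0.5 and the multiplier near 1. For this toy, that pair is also the saddle point of the unregularized problem (minimize 1 - p subject to p ≤ 0.5), because the entropy gradient vanishes at p = 0.5.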
No Creative Commons license.