Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling. (arXiv:2305.08062v1 [stat.ML])

We study off-policy evaluation (OPE) of contextual bandit policies for large
discrete action spaces where conventional importance-weighting approaches
suffer from excessive variance. To circumvent this variance issue, we propose a
new estimator, called OffCEM, that is based on the conjunct effect model (CEM),
a novel decomposition of the causal effect into a cluster effect and a residual
effect. OffCEM applies importance weighting only to action clusters and
addresses the residual causal effect through model-based reward estimation. We
show that the proposed estimator is unbiased under a new condition, called
local correctness, which only requires that the residual-effect model preserves
the relative expected reward differences of the actions within each cluster. To
best leverage the CEM and local correctness, we also propose a new two-step
procedure for performing model-based estimation that minimizes bias in the
first step and variance in the second step. We find that the resulting OffCEM
estimator substantially improves bias and variance compared to a range of
conventional estimators. Experiments demonstrate that OffCEM provides
substantial improvements in OPE, especially when the number of actions is large.
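
To make the estimator's structure concrete, here is a minimal sketch of how an OffCEM-style estimate could be computed from logged bandit data. It follows the description above: importance weights are formed over action clusters rather than individual actions, and a reward model handles the residual effect. The function name `offcem_estimate`, its argument names, the array layout, and the use of a context-independent clustering are all assumptions made for illustration. The sketch also takes the reward model's predictions as given and does not implement the two-step regression procedure.

```python
import numpy as np


def offcem_estimate(pi_e, pi_b, clusters, actions, rewards, f_hat):
    """Illustrative OffCEM-style value estimate (hypothetical helper).

    pi_e, pi_b : (n, n_actions) action probabilities of the target and
                 logging policies for each logged context.
    clusters   : (n_actions,) integer cluster id of each action
                 (assumed here not to depend on the context).
    actions    : (n,) indices of the logged actions.
    rewards    : (n,) observed rewards.
    f_hat      : (n, n_actions) reward-model predictions per context/action.
    """
    pi_e = np.asarray(pi_e, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    clusters = np.asarray(clusters)
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)

    n = pi_e.shape[0]
    n_clusters = int(clusters.max()) + 1

    # One-hot cluster membership, shape (n_actions, n_clusters).
    membership = np.eye(n_clusters)[clusters]

    # Marginal probability of each cluster under each policy.
    pi_e_cluster = pi_e @ membership
    pi_b_cluster = pi_b @ membership

    rows = np.arange(n)
    logged_cluster = clusters[actions]

    # Importance weight defined on the cluster of the logged action.
    w = pi_e_cluster[rows, logged_cluster] / pi_b_cluster[rows, logged_cluster]

    # Part of the logged reward not explained by the reward model.
    residual = rewards - f_hat[rows, actions]

    # Model-based term: expected model prediction under the target policy.
    baseline = (pi_e * f_hat).sum(axis=1)

    return float(np.mean(w * residual + baseline))
```

In this sketch, placing every action in its own cluster makes the weight the usual per-action importance weight, while placing all actions in a single cluster makes the weight equal to one, so the cluster granularity controls how much of the estimate rests on importance weighting versus the reward model.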

Latest Change: May 16, 2023, 7:31 a.m.
