We study the problem of transfer learning in the setting of stochastic linear
bandit tasks. We assume that a low-dimensional linear representation is
shared across the tasks, and study the benefit of learning this representation
in the multi-task setting. Following recent results on the design of
stochastic bandit policies, we propose an efficient greedy policy based on
trace-norm regularization. It implicitly learns a low-dimensional
representation by encouraging the matrix formed by the task regression vectors
to be of low rank. Unlike previous work in the literature, our policy does not
need to know the rank of the underlying matrix. We derive an upper bound on the
multi-task regret of our policy, which is, up to logarithmic factors, of order
$\sqrt{NdT(T+d)r}$, where $T$ is the number of tasks, $r$ the rank, $d$ the
number of variables, and $N$ the number of rounds per task. We show the benefit
of our strategy compared to the baseline $Td\sqrt{N}$ obtained by solving each
task independently. We also provide a lower bound on the multi-task regret.
Finally, we corroborate our theoretical findings with preliminary experiments
on synthetic data.
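
To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of the two ingredients the abstract describes: estimating the T x d matrix of task regression vectors with trace-norm (nuclear-norm) regularization, which encourages the matrix to be low rank without knowing the rank, and a greedy policy that plays the arm with the highest estimated reward. The solver (proximal gradient with singular-value soft-thresholding), the parameter names (lam, step, n_iters), and the data layout are illustrative assumptions.

import numpy as np

def svd_soft_threshold(M, tau):
    # Proximal operator of tau * nuclear norm: shrink singular values by tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def estimate_B(X, y, lam=1.0, step=None, n_iters=200):
    # X: list of (n_t, d) context matrices, one per task
    # y: list of (n_t,) observed rewards, one per task
    # Returns a (T, d) estimate of the task regression vectors,
    # encouraged to be low rank by the trace-norm penalty.
    T, d = len(X), X[0].shape[1]
    B = np.zeros((T, d))
    if step is None:
        # 1 / Lipschitz constant of the smooth (least-squares) part
        step = 1.0 / max(np.linalg.norm(Xt.T @ Xt, 2) for Xt in X)
    for _ in range(n_iters):
        grad = np.stack([Xt.T @ (Xt @ B[t] - yt)
                         for t, (Xt, yt) in enumerate(zip(X, y))])
        B = svd_soft_threshold(B - step * grad, step * lam)
    return B

def greedy_action(arms, b_hat):
    # Greedy policy for one task: play the arm (row of `arms`)
    # with the largest estimated reward under the current estimate b_hat.
    return int(np.argmax(arms @ b_hat))

In a bandit loop, each task t would append its newly observed (context, reward) pair to (X[t], y[t]), re-estimate B across all tasks, and act greedily with greedy_action(arms_t, B[t]); the shared low-rank structure is what lets data from one task improve the estimates for the others.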
