×
Well done. You've clicked the tower. This would actually achieve something if you had logged in first. Use the key for that. The name takes you home. This is where all the applicables sit. And you can't apply any changes to my site unless you are logged in.

Our policy is best summarized as "we don't care about _you_, we care about _them_", no emails, so no forgetting your password. You have no rights. It's like you don't even exist. If you publish material, I reserve the right to remove it, or use it myself.

Don't impersonate. Don't name someone involuntarily. You can lose everything if you cross the line, and no, I won't cancel your automatic payments first, so you'll have to do it the hard way. See how serious this sounds? That's how serious you're meant to take these.

×
Register


Required. 150 characters or fewer. Letters, digits and @/./+/-/_ only.
  • Your password can’t be too similar to your other personal information.
  • Your password must contain at least 8 characters.
  • Your password can’t be a commonly used password.
  • Your password can’t be entirely numeric.

Enter the same password as before, for verification.
Login

Grow A Dic
Define A Word
Make Space
Set Task
Mark Post
Apply Votestyle
Create Votes
(From: saved spaces)
Exclude Votes
Apply Dic
Exclude Dic

Click here to flash read.

arXiv:2402.03923v2 Announce Type: replace
Abstract: Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize the returns, but align the actual return with a specified target return, giving control over the agent's performance. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and is equipped with a mechanism to control the agent using the target return. Despite being designed to align the actual return with the target return, we have empirically identified a discrepancy between the actual return and the target return in DT. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. Our model decouples returns from the conventional input sequence, which typically consists of returns, states, and actions, to enhance the relationships between returns and states, as well as returns and actions. Extensive experiments show that RADT reduces the discrepancies between the actual return and the target return of DT-based methods.

Click here to read this post out
ID: 820263; Unique Viewers: 0
Unique Voters: 0
Total Votes: 0
Votes:
Latest Change: April 24, 2024, 7:31 a.m. Changes:
Dictionaries:
Words:
Spaces:
Views: 6
CC:
No creative common's license
Comments: