Trajectory Transformer

This model is in maintenance mode only, so we won't accept any new PRs changing its code.

If you run into any issues running this model, please reinstall the last version that supported this model: v4.30.0.
You can do so by running the following command: `pip install -U transformers==4.30.0`.

Overview

The Trajectory Transformer model was proposed in Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li and Sergey Levine.

The abstract from the paper is the following:

Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as a sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components common in offline RL algorithms. We demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL. Further, we show that this approach can be combined with existing model-free algorithms to yield a state-of-the-art planner in sparse-reward, long-horizon tasks.

This model was contributed by CarlCochet. The original code can be found here.

Usage tips

This Transformer is used for deep reinforcement learning. To use it, you need to build sequences from the actions, states and rewards of all previous timesteps. The model treats all of these elements together as one big sequence (a trajectory).

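Below is a minimal sketch of a forward pass. It uses a randomly initialized model built from the default configuration, and the batch size, sequence length and random token ids are illustrative assumptions; in practice the trajectory tokens come from discretizing real states, actions and rewards, and you would load a pretrained checkpoint with `from_pretrained` instead.

```python
# Requires transformers v4.30.0, the last release that includes this model.
import torch

from transformers import TrajectoryTransformerConfig, TrajectoryTransformerModel

# Hypothetical setup: a randomly initialized model from the default config.
# For real use, load a pretrained checkpoint via `from_pretrained` instead.
config = TrajectoryTransformerConfig()
model = TrajectoryTransformerModel(config)
model.eval()

# A trajectory is one long sequence of discretized tokens obtained by
# concatenating the state, action and reward tokens of every timestep.
# Random ids are used here purely to illustrate the expected input shape.
batch_size, sequence_length = 1, 24
trajectories = torch.randint(0, config.vocab_size, (batch_size, sequence_length))

with torch.no_grad():
    outputs = model(trajectories=trajectories)

# Per-position logits over the discretized token vocabulary, which a planner
# such as beam search can use to propose the next tokens of the trajectory.
print(outputs.logits.shape)
```
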
TrajectoryTransformerConfig

[[autodoc]] TrajectoryTransformerConfig

TrajectoryTransformerModel

[[autodoc]] TrajectoryTransformerModel
    - forward