Introduction
Basic Description of the Agents Module
Three variants of PPO are currently implemented:
| Method | Python Module | Notes | Action Space | Observation Space | Compatible Environments |
| --- | --- | --- | --- | --- | --- |
| PPO Distributed Centralized Critic | `ppo_distributed` | Centralized Training with Decentralized Execution | Box / Discrete | Tuple | 1.1 - 1.7, 2.1 - 2.6 |
| PPO OneStep | `ppo_one_step` | Optimized Implementation for Single-Step RL | Box / Discrete | Box | 3.1 |
| PPO Alternating Optimization | `ppo_altopt` | PPO with Bi-Level Optimization | Tuple[Box / Discrete, Box / Discrete] | Tuple[Box, Tuple] | 3.2 - 3.5 |
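As a rough illustration of how the space signatures in the table compose, the sketch below uses the standard `gym.spaces` types. It assumes the module follows the usual Gym conventions; the bounds and shapes are placeholders, not the actual environment definitions.

```python
from gym.spaces import Box, Discrete, Tuple
import numpy as np

# Action space of the form Tuple[Box / Discrete, Box / Discrete],
# as listed for PPO Alternating Optimization (placeholder shapes/bounds).
action_space = Tuple((
    Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
    Discrete(4),
))

# Observation space of the form Tuple[Box, Tuple] (placeholder shapes/bounds).
observation_space = Tuple((
    Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32),
    Tuple((
        Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32),
        Box(low=0.0, high=1.0, shape=(3,), dtype=np.float32),
    )),
))

print(action_space.sample())
print(observation_space.sample())
```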
To view the training configurations run:
The training has been tested with `horovodrun`. If you want to use only a single process, run it with `horovodrun -np 1`.
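For reference, a minimal Horovod bootstrap sketch is shown below. The script name, process count, and training body are placeholders and not the module's actual entry point; it only illustrates how a script launched with `horovodrun` typically initializes its workers.

```python
# Hypothetical train.py sketch.
# Launch with multiple processes:  horovodrun -np 4 python train.py
# Launch with a single process:    horovodrun -np 1 python train.py
import horovod.torch as hvd

def main():
    hvd.init()  # one process per worker; Horovod assigns ranks
    print(f"worker {hvd.rank()} of {hvd.size()} (local rank {hvd.local_rank()})")
    # ... build the model and optimizer here, wrapping the optimizer with
    # hvd.DistributedOptimizer so gradients are averaged across workers.

if __name__ == "__main__":
    main()
```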
The training module is expected to work seamlessly in a distributed setup, even under preemption. In the case of preemption, we load the latest checkpoint from the save directory. For this to work as expected, save to a directory that is unique to a particular SLURM job ID (e.g. `<base path>/$SLURM_JOB_ID`).
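A minimal sketch of the kind of save-directory and resume logic described above is given below. It assumes checkpoints are written as files under the save directory; the base path and file pattern are placeholders, not the module's actual implementation.

```python
import os
from pathlib import Path

# Hypothetical base path; in practice use storage shared by all workers.
BASE_PATH = Path("/checkpoints/agents")

# Make the save directory unique to the SLURM job so a preempted and
# requeued job (same job id) finds its own checkpoints again.
save_dir = BASE_PATH / os.environ.get("SLURM_JOB_ID", "local")
save_dir.mkdir(parents=True, exist_ok=True)

def latest_checkpoint(directory: Path):
    """Return the most recently written checkpoint file, or None if empty."""
    checkpoints = sorted(directory.glob("checkpoint_*.pt"),
                         key=lambda p: p.stat().st_mtime)
    return checkpoints[-1] if checkpoints else None

ckpt = latest_checkpoint(save_dir)
if ckpt is not None:
    print(f"Resuming from {ckpt}")  # load model/optimizer state here
else:
    print(f"No checkpoint found in {save_dir}; starting from scratch")
```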