Introduction

Basic Description of the Agents Module

Three variants of PPO are currently implemented:

| Method | Python Module | Notes | Action Space | Observation Space | Compatible Environments |
| --- | --- | --- | --- | --- | --- |
| PPO Distributed Centralized Critic | `ppo_distributed` | Centralized Training with Decentralized Execution | Box / Discrete | Tuple | 1.1 - 1.7, 2.1 - 2.6 |
| PPO OneStep | `ppo_one_step` | Optimized Implementation for Single Step RL | Box / Discrete | Box | 3.1 |
| PPO Alternating Optimization | `ppo_altopt` | PPO with Bi-Level Optimization | Tuple[Box / Discrete, Box / Discrete] | Tuple[Box, Tuple] | 3.2 - 3.5 |

To view the training configuration options, run:

$ python -m sdriving.agents.<module>.train --help
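
For example, to inspect the options for the `ppo_distributed` variant listed above:

$ python -m sdriving.agents.ppo_distributed.train --help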

The training has been tested with `horovodrun`. If you want to use only a single process, run it with `horovodrun -np 1`.
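
A minimal single-process launch might then look like the following (any training arguments would follow the module path, as reported by `--help`):

$ horovodrun -np 1 python -m sdriving.agents.ppo_distributed.train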

The training module is expected to work seamlessly in a distributed setup (even with preemption). In case of preemption, the latest checkpoint is loaded from the save directory. For this to work as expected, save to a directory that is unique to a particular Slurm job id (e.g. `<base path>/$SLURM_JOB_ID`).
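
A sketch of a preemption-safe multi-process launch under Slurm, assuming the training script exposes a save-directory argument (the `--save-dir` flag name here is hypothetical; check `--help` for the actual option):

$ horovodrun -np 4 python -m sdriving.agents.ppo_distributed.train \
      --save-dir <base path>/$SLURM_JOB_ID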
