# Introduction

Three variants of PPO are currently implemented:

| Method                             | Python Module    | Notes                                             | Action Space                           | Observation Space  | Compatible Environments |
| ---------------------------------- | ---------------- | ------------------------------------------------- | -------------------------------------- | ------------------ | ----------------------- |
| PPO Distributed Centralized Critic | ppo\_distributed | Centralized Training with Decentralized Execution | Box / Discrete                         | Tuple              | 1.1 - 1.7, 2.1 - 2.6    |
| PPO OneStep                        | ppo\_one\_step   | Optimized Implementation for Single Step RL       | Box / Discrete                         | Box                | 3.1                     |
| PPO Alternating Optimization       | ppo\_altopt      | PPO with Bi-Level Optimization                    | Tuple\[Box / Discrete, Box / Discrete] | Tuple\[Box, Tuple] | 3.2 - 3.5               |

To view the training configuration options, run:

```bash
$ python -m sdriving.agents.<module>.train --help
```

{% hint style="info" %}
The training has been tested with `horovodrun`. To use only a single process, run it with `horovodrun -np 1`.
{% endhint %}
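For example, a full single-process launch might look like the following. This is a sketch: the module name (`ppo_distributed`) is just one of the three listed above, and both `horovod` and `sdriving` must be installed for the command to actually execute.

```bash
# Illustrative single-process launch command; falls back to printing the
# command when horovodrun is not on PATH.
LAUNCH_CMD="horovodrun -np 1 python -m sdriving.agents.ppo_distributed.train --help"

if command -v horovodrun >/dev/null 2>&1; then
  $LAUNCH_CMD
else
  echo "horovodrun not found; would run: $LAUNCH_CMD"
fi
```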

{% hint style="info" %}
The training module is expected to work seamlessly in a distributed setup, even with preemption. On preemption, the latest checkpoint is loaded from the save directory. For this to work as expected, save checkpoints in a directory unique to a particular SLURM job ID (e.g. `<base path>/$SLURM_JOB_ID`).
{% endhint %}
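A minimal sketch of that resume pattern, assuming a `SAVE_ROOT` base path and `.ckpt` checkpoint files (both illustrative names, not from the codebase):

```bash
# Derive a per-job save directory so a preempted, requeued job writes to
# (and resumes from) the same location. "local" is a fallback for runs
# outside SLURM.
SAVE_ROOT="${SAVE_ROOT:-/tmp/ckpts}"
SAVE_DIR="${SAVE_ROOT}/${SLURM_JOB_ID:-local}"
mkdir -p "$SAVE_DIR"

# Pick the most recently written checkpoint, if any, to resume from.
LATEST=$(ls -t "$SAVE_DIR"/*.ckpt 2>/dev/null | head -n 1)
if [ -n "$LATEST" ]; then
  echo "Resuming from $LATEST"
else
  echo "No checkpoint found in $SAVE_DIR; starting fresh"
fi
```

The key point is only that the directory is a deterministic function of the job ID, so the requeued job lands in the same place as the preempted one.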
