PPO Distributed Centralized Critic

Environment Requirements

  • env: Must be registered in sdriving.environments.REGISTRY. The environment's step function takes an action whose batch dimension has size N (the number of agents). N must currently stay fixed over time; supporting a variable N would be simple to implement, so open an issue if you need it. The step function must return the observation for the next timestep, a BoolTensor of size N x 1 indicating whether the simulation has completed for each agent, a reward Tensor of size N x 1, and an info object similar to that of OpenAI Gym environments.

  • env_params: Passed to the environment as env(**env_params), so the keys must match the arguments that the chosen environment accepts.
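
As a rough illustration of the contract above, the sketch below builds a toy environment and instantiates it through env(**env_params). ToyEnv and TOY_REGISTRY are hypothetical stand-ins and are not part of sdriving; plain nested lists are used in place of torch tensors so the shapes are easy to see, and the order of the returned tuple follows the Gym convention, which is an assumption here.

```python
# Hypothetical stand-in for sdriving.environments.REGISTRY
# (assumed here to map environment names to classes).
TOY_REGISTRY = {}


class ToyEnv:
    """Illustrative environment with a fixed number of agents N."""

    def __init__(self, nagents=4, horizon=3):
        self.nagents = nagents  # N, fixed for the whole run
        self.horizon = horizon  # every agent finishes after this many steps
        self.t = 0

    def step(self, action):
        # The action must carry one row per agent (batch dim == N).
        assert len(action) == self.nagents
        self.t += 1
        # Next observation, one row per agent (contents are made up).
        obs = [[float(self.t)] for _ in range(self.nagents)]
        # N x 1 rewards; a real environment would return a torch Tensor.
        reward = [[-abs(a[0])] for a in action]
        # N x 1 completion flags; a real environment would return a BoolTensor.
        done = [[self.t >= self.horizon] for _ in range(self.nagents)]
        info = {}  # free-form diagnostics, as in Gym
        return obs, reward, done, info


TOY_REGISTRY["ToyEnv"] = ToyEnv

# env_params is unpacked into the constructor, so its keys must match.
env_params = {"nagents": 2, "horizon": 3}
env = TOY_REGISTRY["ToyEnv"](**env_params)

obs, reward, done, info = env.step([[0.5], [-1.0]])
```

A key in env_params that the constructor does not accept (for example a misspelled "nagent") raises a TypeError at construction time, which is the usual symptom of an incompatible env_params.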

Logging / Checkpointing

  • log_dir: Path to the directory in which logs and checkpoints are stored

  • wandb_id: Run id used for logging to wandb. If it matches the id of an earlier run, the new logs are appended to that run

  • load_path: Checkpoint from which to load a previously trained model

  • save_freq: The frequency with which to checkpoint models

Configurable Hyperparameters

  • ac_kwargs: Arguments passed to PPOLidarActorCritic. observation_space and action_space are taken from the env automatically, so they should not be included here

  • seed: Random seed

  • steps_per_epoch: Number of observation-action pairs collected before each training phase. This total is split equally across all processes

  • epochs: Total number of training epochs

  • gamma: Discount factor

  • clip_ratio: Clip ratio used in the PPO algorithm

  • pi_lr: Learning rate for the actor

  • vf_lr: Learning rate for the critic

  • train_iters: If greater than 1, the training data is split into mini-batches, with batch_size = steps_per_epoch / (train_iters * num_of_processes)

  • entropy_coeff: Coefficient for entropy regularization

  • lam: Lambda for GAE-Lambda. (Always between 0 and 1, usually close to 1.)

  • target_kl: Roughly the KL divergence considered acceptable between the new and old policies after an update. It is used for early stopping. (Usually small, e.g. 0.01 or 0.05.)
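
To make the interplay of these hyperparameters more concrete, here is a minimal sketch in plain Python. It applies the batch_size formula given for train_iters, the standard GAE-Lambda recursion driven by gamma and lam, and the standard PPO clipped surrogate controlled by clip_ratio. The function names, the integer division, and the example values are assumptions made for illustration; this is not the code used by the trainer.

```python
def local_steps(steps_per_epoch, num_procs):
    """Steps collected by each process (steps_per_epoch is split equally)."""
    return steps_per_epoch // num_procs


def minibatch_size(steps_per_epoch, train_iters, num_procs):
    """batch_size = steps_per_epoch / (train_iters * num_of_processes).

    Integer division is an assumption of this sketch.
    """
    return steps_per_epoch // (train_iters * num_procs)


def gae_advantages(rewards, values, last_value, gamma, lam):
    """Standard GAE-Lambda advantages for a single trajectory.

    rewards and values are equal-length lists of floats; last_value is the
    value estimate for the state reached after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_ratio):
    """PPO clipped objective for one sample (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s).
    """
    clipped = max(1.0 - clip_ratio, min(ratio, 1.0 + clip_ratio))
    return min(ratio * advantage, clipped * advantage)


# Illustrative values only: 4000 steps per epoch, 4 processes, train_iters=20.
print(local_steps(4000, 4))          # 1000 steps per process
print(minibatch_size(4000, 20, 4))   # 50 samples per mini-batch
```

With lam close to 1 the advantage estimate leans on long sums of observed rewards (lower bias, higher variance), while smaller values lean more heavily on the critic. The clipped surrogate removes the incentive to move the probability ratio outside [1 - clip_ratio, 1 + clip_ratio].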
