Example Slurm Script

Example Slurm Script to train agents

sdriving uses Slurm to manage the training workflow. If you are not familiar with Slurm, see the sbatch documentation at https://slurm.schedmd.com/sbatch.html.

#!/bin/bash
#SBATCH --partition=rtx6000
#SBATCH --gres=gpu:4
#SBATCH -c 1 --ntasks 40
#SBATCH --mem=100G
#SBATCH --time=48:00:00

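# Load the shell configuration and the project environment setup file,
# then move to the repository root.
source $HOME/.bashrc
. $HOME/envsetups/matc
cd $HOME/research/sdriving/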

args="-s /checkpoint/$USER/$SLURM_JOB_ID --env MultiAgentNuscenesIntersectionDrivingDiscreteEnvironment --eid ckpt"
args="$args -se 32000 -e 10000 --pi-lr 1e-3 --vf-lr 1e-3 --seed $RANDOM --entropy-coeff 0.02 --target-kl 0.01 -ti 20 -wid $SLURM_JOB_ID"

horovodrun -np 40 python -W ignore -m sdriving.agents.ppo_distributed.train $args \
    --ac-kwargs '{"hidden_sizes": [256, 256], "history_len": 5, "permutation_invariant": true}' \
    --env-kwargs '{"map_path": "data/*.pth", "horizon": 300, "nagents": 16, "lidar_noise": 0.0, "history_len": 5, "timesteps": 10, "npoints": 100}'
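
The script hard-codes 40 both in --ntasks and in horovodrun -np, so the two values have to be changed together. As a minimal sketch, and not part of sdriving itself, the process count could instead be read from SLURM_NTASKS, which sbatch sets when --ntasks is given. The helper name num_procs and the fallback to a single process are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: derive the horovodrun process count from the Slurm allocation.

# Print SLURM_NTASKS when Slurm has set it; otherwise fall back to 1
# (an assumed default) so the script can be smoke-tested outside a job.
num_procs() {
    echo "${SLURM_NTASKS:-1}"
}

np="$(num_procs)"
echo "Launching with horovodrun -np $np"
```

Inside the batch script, the training line would then use -np "$np" in place of -np 40.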

This Slurm script demonstrates the basic workflow using the agents module. The configurable training parameters for a given module are listed by python -m sdriving.agents.<module>.train --help, where <module> is the agent module (ppo_distributed in the script above).
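
As a rough sketch of how these pieces fit together, the snippet below prints the help command for the module used in the script and then submits the job. The filename train_agents.sh and the helper help_cmd are hypothetical; sbatch and squeue are the standard Slurm commands, and the submission step is skipped when they are not available.

```shell
#!/bin/bash
# Print the help command for a given agents module name.
help_cmd() {
    echo "python -m sdriving.agents.$1.train --help"
}

# ppo_distributed is the module used in the script above.
help_cmd ppo_distributed

# Hypothetical filename under which the script above was saved.
job_script="train_agents.sh"

# Submit the job and list the user's queued jobs, but only where Slurm
# and the script file are actually present.
if command -v sbatch >/dev/null 2>&1 && [ -f "$job_script" ]; then
    sbatch "$job_script"
    squeue -u "$USER"
else
    echo "Skipping submission: sbatch or $job_script not available"
fi
```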
