Hyperparameter tuning

To tune your lunar lander agent to perform in different scenarios, you will change hyperparameters. Detailed instructions for tuning them are provided in the SageMaker notebook.

  • In the Jupyter notebook environment, navigate into the /src folder and open the train-lunarlander-PPO.py file.

  • In this file, you will see the hyperparameters used to train the agent.

You can change the default hyperparameter values in this file to tune your lunar lander agent for different scenarios.

The hyperparameters you can optimize are listed below (a configuration sketch follows the list):

  • "lambda" - Lambda parameter corresponds to the discount factor used in discounted formulations of Markov Decision Processes (MDPs)

  • "kl_coeff" - Initial coefficient of Kullback-Leibler (KL) Divergence between the previous value function and the new value function

  • "vf_loss_coeff" - Scaling value function loss to make it comparable in scale to the policy loss

  • "num_sgd_iter" - Number of Stochastic Gradient Descent (SGD) iterations in each outer loop while learning the value function

  • "clip_param" - Proximal Policy Optimization (PPO) clip parameter for limiting the surrogate function during exploration

  • "training_iteration" - You can minimize training time by setting this to something lower than what is it by default, which is 50 training iterations. For example, to lower training time you can change it to 20 iterations.

Hyperparameter tuning hints

Here are some hints on how to tune the hyperparameters above to best optimize your lunar lander agent (an illustrative example follows the list):

  • Dramatically reduce "clip_param"
  • Slightly reduce "lambda"
  • Reduce "num_sgd_iter"
  • Reduce "vf_loss_coeff"
  • Reduce "kl_coeff"
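
As a hypothetical example of applying these hints, you might override the configuration sketched earlier as shown below; the specific numbers are illustrative starting points, not values prescribed by the tutorial.

    # Hypothetical tuning overrides illustrating the hints above; the exact
    # numbers are examples, not prescribed values.
    tuned_overrides = {
        "clip_param": 0.05,    # dramatically reduced
        "lambda": 0.9,         # slightly reduced
        "num_sgd_iter": 10,    # reduced
        "vf_loss_coeff": 0.5,  # reduced
        "kl_coeff": 0.1,       # reduced
    }
    config.update(tuned_overrides)  # "config" refers to the earlier sketch

After changing the values, retrain the agent and compare its performance against your previous run.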

Detailed, step-by-step instructions are in the SageMaker notebook called lunarlander-tutorial.ipynb.