PPO Agent Playing LunarLander-v2

This is a trained PPO (Proximal Policy Optimization) agent for the Lunar Lander environment.

The agent was trained with Gymnasium on LunarLander-v3. The model card is tagged LunarLander-v2 because the Hugging Face Deep RL Course Unit 8 checker still looks for that environment tag.

Evaluation

Mean reward: 240.74
Standard deviation: 16.21
Course score (mean reward minus standard deviation): 224.53
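
The course score matches the mean reward minus one standard deviation, which is how the Deep RL Course leaderboard ranks submissions. A minimal sketch of that arithmetic, using only the numbers reported here (the variable names are illustrative, not taken from the training script):

```python
# Evaluation numbers copied from this model card.
mean_reward = 240.74
std_reward = 16.21

# Leaderboard-style score: penalize the mean by one standard deviation,
# so a high but unstable policy ranks lower than a steady one.
course_score = mean_reward - std_reward

print(f"{course_score:.2f}")
```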

Hyperparameters

{
  "anneal_lr": true,
  "batch_size": 16384,
  "capture_video": false,
  "clip_coef": 0.2,
  "clip_vloss": true,
  "course_env_id": "LunarLander-v2",
  "cuda": true,
  "ent_coef": 0.01,
  "env_id": "LunarLander-v3",
  "eval_deterministic": true,
  "eval_episodes": 10,
  "eval_video_dir": null,
  "evaluate": false,
  "exp_name": "ppo_lunarlander",
  "gae": true,
  "gae_lambda": 0.98,
  "gamma": 0.999,
  "headless": false,
  "learning_rate": 0.0003,
  "load_model_path": "checkpoints/ppo_lunarlander.pt",
  "logs_dir": "runs/LunarLander-v3__ppo_lunarlander__3__1779423167",
  "max_grad_norm": 0.5,
  "minibatch_size": 64,
  "norm_adv": true,
  "num_envs": 16,
  "num_minibatches": 256,
  "num_steps": 1024,
  "package_dir": "hub_package",
  "repo_id": "ZuzEL/LunarLander-v2",
  "save_model_path": "checkpoints/ppo_lunarlander.pt",
  "seed": 3,
  "target_kl": null,
  "torch_deterministic": true,
  "total_timesteps": 1000000,
  "track": false,
  "update_epochs": 4,
  "upload_to_hub": true,
  "vf_coef": 0.5,
  "video_fps": 30,
  "wandb_entity": null,
  "wandb_project_name": "cleanRL"
}
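
Several of these values are derived from one another. The sketch below reproduces that arithmetic, assuming the CleanRL-style conventions that the `wandb_project_name` hints at: one rollout batch holds `num_envs * num_steps` transitions, and `anneal_lr` decays the learning rate linearly over the updates. The `annealed_lr` helper is hypothetical and only illustrates that assumed schedule; the actual training script may differ.

```python
# Values copied from the hyperparameters above.
num_envs = 16
num_steps = 1024
num_minibatches = 256
update_epochs = 4
total_timesteps = 1_000_000
learning_rate = 3e-4

# One rollout collects num_steps transitions from each parallel environment.
batch_size = num_envs * num_steps                # 16384, matches "batch_size"
minibatch_size = batch_size // num_minibatches   # 64, matches "minibatch_size"

# Number of policy updates that fit in the timestep budget (integer division).
num_updates = total_timesteps // batch_size

# Gradient steps per update: every epoch visits every minibatch once.
grad_steps_per_update = update_epochs * num_minibatches


def annealed_lr(update: int) -> float:
    """Assumed linear decay (hypothetical helper); `update` is 1-indexed."""
    frac = 1.0 - (update - 1) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates, grad_steps_per_update)
```

With a minibatch of only 64 samples, each update performs many small gradient steps, which is worth keeping in mind when comparing these settings with other PPO configurations.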

