PPO Agent Playing LunarLander-v2
This is a trained PPO agent for the LunarLander environment.
The agent was trained with Gymnasium on LunarLander-v3. The model card is tagged LunarLander-v2 only because the Hugging Face Deep RL Course Unit 8 checker still looks for that environment tag.
Evaluation
- Mean reward: 240.74
- Standard deviation: 16.21
- Course score (mean reward minus standard deviation): 224.53
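
The three numbers above are consistent with the course score being the mean reward minus one standard deviation (240.74 - 16.21 = 224.53). A minimal sketch of that arithmetic follows. The helper name `course_score` is hypothetical and is not taken from the training script.

```python
# Values copied from the Evaluation section of this card.
mean_reward = 240.74
std_reward = 16.21


def course_score(mean: float, std: float) -> float:
    """Hypothetical helper: penalize the mean by one standard deviation."""
    return mean - std


score = course_score(mean_reward, std_reward)
print(f"Course score: {score:.2f}")
```

Subtracting the standard deviation rewards agents that are consistently good rather than good on average with occasional crashes.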
Hyperparameters
{
"anneal_lr": true,
"batch_size": 16384,
"capture_video": false,
"clip_coef": 0.2,
"clip_vloss": true,
"course_env_id": "LunarLander-v2",
"cuda": true,
"ent_coef": 0.01,
"env_id": "LunarLander-v3",
"eval_deterministic": true,
"eval_episodes": 10,
"eval_video_dir": null,
"evaluate": false,
"exp_name": "ppo_lunarlander",
"gae": true,
"gae_lambda": 0.98,
"gamma": 0.999,
"headless": false,
"learning_rate": 0.0003,
"load_model_path": "checkpoints/ppo_lunarlander.pt",
"logs_dir": "runs/LunarLander-v3__ppo_lunarlander__3__1779423167",
"max_grad_norm": 0.5,
"minibatch_size": 64,
"norm_adv": true,
"num_envs": 16,
"num_minibatches": 256,
"num_steps": 1024,
"package_dir": "hub_package",
"repo_id": "ZuzEL/LunarLander-v2",
"save_model_path": "checkpoints/ppo_lunarlander.pt",
"seed": 3,
"target_kl": null,
"torch_deterministic": true,
"total_timesteps": 1000000,
"track": false,
"update_epochs": 4,
"upload_to_hub": true,
"vf_coef": 0.5,
"video_fps": 30,
"wandb_entity": null,
"wandb_project_name": "cleanRL"
}
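
Several of the values above depend on one another. The sketch below shows how they fit together, assuming the training script follows the CleanRL PPO convention (suggested by `wandb_project_name`, but not confirmed by this card): the batch is one rollout across all environments, and `anneal_lr` decays the learning rate linearly per update. The function name `annealed_lr` is hypothetical.

```python
# Values copied from the hyperparameter dump above.
num_envs = 16
num_steps = 1024
num_minibatches = 256
update_epochs = 4
total_timesteps = 1_000_000
learning_rate = 3e-4

# One rollout batch: every environment stepped num_steps times.
batch_size = num_envs * num_steps
# Each epoch splits the batch into num_minibatches chunks.
minibatch_size = batch_size // num_minibatches
# Whole policy updates that fit in the timestep budget (assumed convention).
num_updates = total_timesteps // batch_size
# Gradient steps taken per update: epochs times minibatches.
grad_steps_per_update = update_epochs * num_minibatches


def annealed_lr(update: int) -> float:
    """Linear decay from learning_rate toward 0; update is 1-indexed.

    Assumption: mirrors the CleanRL-style schedule, not verified
    against the actual training script.
    """
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates, grad_steps_per_update)
print(annealed_lr(1), annealed_lr(num_updates))
```

Under these assumptions the derived `batch_size` (16384) and `minibatch_size` (64) match the dumped values, and the run performs 61 updates, so it consumes 999,424 of the 1,000,000 budgeted timesteps.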
Evaluation results
- mean_reward on LunarLander-v2 (self-reported): 240.74 +/- 16.21