PPO experiments - a amang1802 Collection

Updated Jan 23, 2025

Using PPO with simpler reward functions

amang1802/summary_train
Viewer • Updated Nov 21, 2024 • 1.28k • 6

amang1802/summary_train_med
Viewer • Updated Jun 6, 2025 • 18.4k • 10

amang1802/Llama3.2-1B-summary-length-1024-1ep
Text Generation • 1B • Updated Nov 21, 2024 • 3

amang1802/Llama3.2-1B-summary-length-exp2
Text Generation • 1B • Updated Nov 21, 2024

amang1802/Llama3.2-1B-summary-length-exp3
Text Generation • 1B • Updated Nov 21, 2024 • 2

amang1802/Llama3.2-1B-summary-length-exp4
Text Generation • 1B • Updated Nov 21, 2024 • 9

amang1802/Llama3.2-1B-summary-length-exp6
Text Generation • 1B • Updated Nov 25, 2024 • 7

amang1802/Llama3.2-1B-summary-length-exp7
Text Generation • 1B • Updated Nov 25, 2024 • 1