Submitted by Aswin Ravikumar Rangsasamy Veerasamy
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
Arizona State University