I fine-tuned a model for free from one prompt, with TRL and the Google Colab CLI
Last week, Google released the Colab CLI: full Colab runtimes you can drive from your terminal. It is a much-needed piece for the era of agents, and since I've always been a fan of Colab (it has helped me and so many others throughout my career), I had to test it. I first saw it through @osanseviero and @_philschmid.
The idea is simple: you tell your agent "fine-tune a model on this dataset" and it handles the rest, fully automatic. Google Colab provides the GPU, and the Hugging Face stack does the work: TRL trains, trackio tracks, and the Hub holds the dataset and the model. The agent just wires it together.
Here is the whole run, start to finish:
What just happened
I gave the agent a single prompt. From there, it did everything:
- It read the SFT examples in the TRL repo to learn the conventions, then wrote its own training script for my task.
- It provisioned a remote GPU through the brand-new Google Colab CLI.
- It installed the dependencies, authenticated with Hugging Face, and launched QLoRA training with TRL.
- It streamed live metrics to a trackio Space on the Hub.
- It pushed the trained adapter to my Hugging Face account.
- It tore the session down when it finished.
Nothing on my machine. No babysitting. No GPU.
The part I keep coming back to: I never handed it a script, and I never explained the Colab CLI. It learned the training conventions from TRL's examples and the commands from the CLI's built-in agent skill, then wrote and ran everything itself.
The prompt
This is all I typed. Everything after it was the agent:
You're in the TRL repo. Read the SFT examples in examples/scripts/ to learn the
project's conventions, then adapt them into a small, self-contained training script
for this task: fine-tune Qwen/Qwen2.5-0.5B-Instruct with QLoRA on
philschmid/gretel-synthetic-text-to-sql (format schema + question -> SQL as chat
messages). Run it on a remote Colab T4 via the Google Colab CLI: provision the GPU,
install deps, log in to Hugging Face on the runtime, run a short demo run, stream
metrics to a trackio Space, push the trained adapter to the Hub, and tear the session
down. Report the final loss and the model URL.
My favorite part is how little it takes to retarget. I change one part of the prompt, the model or the dataset, and the same recipe trains something completely different. I treat it as a template, not a one-off.
It cost me nothing
The whole run was on a free Colab T4. Qwen2.5-0.5B-Instruct is tiny, so a short QLoRA run finishes in a couple of minutes. And the setup is almost nothing, because Colab already ships PyTorch, transformers, datasets and the rest preinstalled. The agent only had to add the few missing pieces (TRL, trackio, and the 4-bit quantization library). What it wrote was a standard SFTTrainer setup with LoRA, nothing exotic.
And all of it ran on a tiny model, on free hardware, with nothing on my laptop.
I watched it train live
Because the agent wired up trackio, my run streamed live to a Hugging Face Space. I could open it in any browser and watch the loss curve update in real time while the GPU did the work somewhere else. When training finished, the Space stayed up as a record of the run.
The loss dropped steadily over the run. You can see the curve in the live Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
And I actually kept the model: the agent pushed the trained adapter to my Hub account, so it is sitting there ready to use: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
It debugged itself
Partway through, the run hit a hardware quirk on the free T4 (the GPU does not support a precision mode the script first tried). It read the error, fixed the setting, and re-ran on its own, with no input from me. I was expecting it to handle edges like this, and it did. The model still came out the other end.
Try it yourself
Here is how you can do the same:
- Install the Colab CLI:
uv tool install google-colab-cli - Open your coding agent in a checkout of the TRL repo.
- Paste the prompt above (swap in any model or dataset you like).
- Watch it go.
Resources
- Google Colab CLI: https://github.com/googlecolab/google-colab-cli
- TRL: https://github.com/huggingface/trl
- trackio: https://github.com/gradio-app/trackio
- The fine-tuned model: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
- The live metrics Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
- Related: fine-tuning with agents on Hugging Face Jobs: https://huggingface.co/blog/hf-skills-training and https://huggingface.co/blog/hf-skills-training-codex
