I fine-tuned a model for free from one prompt, with TRL and the Google Colab CLI

Published June 15, 2026

I opened a coding agent, wrote one prompt, and walked away. A couple of minutes later I had a fine-tuned model, trained on a free cloud GPU, with its metrics on a live trackio dashboard and its weights waiting for me on the Hub. I didn't touch a GPU, and I didn't write a line of the training code.

Last week, Google released the Colab CLI: full Colab runtimes you can drive from your terminal. It is a much-needed piece for the era of agents, and since I've always been a fan of Colab (it has helped me and so many others throughout my career), I had to test it. I first saw it through @osanseviero and @_philschmid.

The idea is simple: you tell your agent "fine-tune a model on this dataset" and it handles the rest, fully automatic. Google Colab provides the GPU, and the Hugging Face stack does the work: TRL trains, trackio tracks, and the Hub holds the dataset and the model. The agent just wires it together.

Here is the whole run, start to finish:

What just happened

I gave the agent a single prompt. From there, it did everything:

It read the SFT examples in the TRL repo to learn the conventions, then wrote its own training script for my task.
It provisioned a remote GPU through the brand-new Google Colab CLI.
It installed the dependencies, authenticated with Hugging Face, and launched QLoRA training with TRL.
It streamed live metrics to a trackio Space on the Hub.
It pushed the trained adapter to my Hugging Face account.
It tore the session down when it finished.

Nothing on my machine. No babysitting. No GPU.

The part I keep coming back to: I never handed it a script, and I never explained the Colab CLI. It learned the training conventions from TRL's examples and the commands from the CLI's built-in agent skill, then wrote and ran everything itself.

The prompt

This is all I typed. Everything after it was the agent:

You're in the TRL repo. Read the SFT examples in examples/scripts/ to learn the
project's conventions, then adapt them into a small, self-contained training script
for this task: fine-tune Qwen/Qwen2.5-0.5B-Instruct with QLoRA on
philschmid/gretel-synthetic-text-to-sql (format schema + question -> SQL as chat
messages). Run it on a remote Colab T4 via the Google Colab CLI: provision the GPU,
install deps, log in to Hugging Face on the runtime, run a short demo run, stream
metrics to a trackio Space, push the trained adapter to the Hub, and tear the session
down. Report the final loss and the model URL.

My favorite part is how little it takes to retarget. I change one part of the prompt, the model or the dataset, and the same recipe trains something completely different. I treat it as a template, not a one-off.

It cost me nothing

The whole run was on a free Colab T4. Qwen2.5-0.5B-Instruct is tiny, so a short QLoRA run finishes in a couple of minutes. And the setup is almost nothing, because Colab already ships PyTorch, transformers, datasets and the rest preinstalled. The agent only had to add the few missing pieces (TRL, trackio, and the 4-bit quantization library). What it wrote was a standard SFTTrainer setup with LoRA, nothing exotic.

And all of it ran on a tiny model, on free hardware, with nothing on my laptop.

I watched it train live

Because the agent wired up trackio, my run streamed live to a Hugging Face Space. I could open it in any browser and watch the loss curve update in real time while the GPU did the work somewhere else. When training finished, the Space stayed up as a record of the run.

The loss dropped steadily over the run. You can see the curve in the live Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio

And I actually kept the model: the agent pushed the trained adapter to my Hub account, so it is sitting there ready to use: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora

It debugged itself

Partway through, the run hit a hardware quirk on the free T4 (the GPU does not support a precision mode the script first tried). It read the error, fixed the setting, and re-ran on its own, with no input from me. I was expecting it to handle edges like this, and it did. The model still came out the other end.

Try it yourself

Here is how you can do the same:

Install the Colab CLI: uv tool install google-colab-cli
Open your coding agent in a checkout of the TRL repo.
Paste the prompt above (swap in any model or dataset you like).
Watch it go.

Resources

Google Colab CLI: https://github.com/googlecolab/google-colab-cli
TRL: https://github.com/huggingface/trl
trackio: https://github.com/gradio-app/trackio
The fine-tuned model: https://huggingface.co/sergiopaniego/Qwen2.5-0.5B-Instruct-text-to-sql-qlora
The live metrics Space: https://huggingface.co/spaces/sergiopaniego/trl-text-to-sql-trackio
Related: fine-tuning with agents on Hugging Face Jobs: https://huggingface.co/blog/hf-skills-training and https://huggingface.co/blog/hf-skills-training-codex

Models mentioned in this article 1

Spaces mentioned in this article 1

Continuous batching for GRPO, now in TRL

June 19, 2026

Bringing Autonomous Driving RL to OpenEnv and TRL

February 26, 2026

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

I fine-tuned a model for free from one prompt, with TRL and the Google Colab CLI

What just happened

The prompt

It cost me nothing

I watched it train live

It debugged itself

Try it yourself

Resources

Models mentioned in this article 1

Spaces mentioned in this article 1

Trl Text To Sql Trackio

Continuous batching for GRPO, now in TRL

Bringing Autonomous Driving RL to OpenEnv and TRL

Community

Models mentioned in this article 1

Spaces mentioned in this article 1

Trl Text To Sql Trackio