Building "YourGymBuddy" – A Race Against Time, Modal Fine-Tuning, and the Power of Local AI

Community Article
Published June 15, 2026

YGB

1. Serendipity and the Last-Minute Sprint

Sometimes the best projects start by pure coincidence. I was hanging out with a friend, testing out the newly released Gemma 4 (12B). We were passing it raw gym logs and workout routines just to see if a model of that size could actually understand fitness progression and volume tracking.

Right in the middle of our testing, I stumbled upon the Build Small Hackathon. I hadn't known it was happening, and to my surprise, I realized the registration window was closing in just a few minutes. Seeing that our local LLM gym experiment perfectly matched the hackathon's "Build Small" philosophy, I scrambled to register before the deadline. That serendipitous discovery became the foundation of YourGymBuddy: a privacy-focused AI fitness coach built to run entirely on tiny architectures.

2. Choosing the Right Roster of Small Models

To maximize the potential of the project and explore different ecosystems (and sponsor tracks!), I designed the architecture around llama.cpp for broad GGUF compatibility. I selected four distinct models:

  1. Gemma 4 (12B): The original inspiration for the project, handling the heavier reasoning tasks.
  2. MiniCPM (1B) by OpenBMB: An ultra-lightweight, highly efficient model.
  3. Nemotron-3-Nano (4B) by NVIDIA: To test NVIDIA's compact edge capabilities.
  4. Cohere-Transcribe-03-2026 by Cohere: To transcribe voice to text and do the chat more interactive.

The goal was to build a responsive Gradio interface (complete with a custom logo and favicon) that worked seamlessly on both desktop and mobile, allowing users to easily upload their training histories from apps like Hevy and get actionable, AI-driven insights.

3. The Localization Challenge: Spanish and Synthetic Data

Since I am based in Colombia, it was non-negotiable that YourGymBuddy could interact fluently in Spanish. Out of the box, ultra-small models like MiniCPM (1B) and Nemotron (4B) often struggle with localized fitness jargon.

To fix this, I generated a custom synthetic dataset containing between 1,500 and 2,000 highly tailored fitness interactions in Spanish. This dataset—which is fully open and hosted on my Hugging Face profile (PedroRuizCode)—served as the core training material to teach these tiny architectures how to properly interpret workout volumes, rest intervals, and muscle group targeting in Spanish.

4. Automated Fine-Tuning in the Cloud with Modal and Optuna

While my initial model testing happens locally, the actual fine-tuning pipeline was executed entirely in the cloud, leveraging the compute credits provided by Modal.

To do this, I built an end-to-end training app that handles the entire process automatically. Before the actual fine-tuning begins, the system uses Optuna to perform automated hyperparameter optimization. Once the best parameters are identified, the pipeline fine-tunes the MiniCPM and Nemotron models on Modal's infrastructure. Finally, it exports the pure weights and converts them into GGUF format, making them immediately ready for llama.cpp or Ollama.

5. The Deployment Pivot: HF Spaces vs. The Local Ideal

No hackathon project is complete without a few deployment hurdles. Transitioning from development to the Hugging Face Space brought a significant roadblock: The Mamba Dependency Trap.

When trying to deploy the fine-tuned NVIDIA Nemotron model to the remote Space backend, I hit unresolvable environment conflicts with Mamba dependencies. Furthermore, running GGUF models purely on the Space's CPU was far too slow for a good user experience.

I had to pivot. For the live Hugging Face Space, I implemented ZeroGPU support to serve the Gemma 4 (12B) and the fine-tuned MiniCPM (1B) models with lightning-fast token generation, opting to remove Nemotron from the remote build entirely to ensure stability.

The Local Execution Advantage However, the live Space is just a showcase; the true ideal state for YourGymBuddy is local execution. By pulling the repository and running the included install_local_gpu.sh script on a local computer, you bypass all cloud dependency limits. Running it locally means you can seamlessly load up the Nemotron GGUF alongside the others, keeping your personal health data completely offline while enjoying zero-latency inference.

6. Current Features & What's Next

The current build features:

  • Seamless Data Ingestion: Users can drop their exported .csv data directly from the Hevy app (or use pre-loaded sample data to test it out).
  • Voice-to-Text Coach: You don't have to type between gym sets. The app includes an audio input feature that utilizes CohereLabs/cohere-transcribe-03-2026 for highly accurate speech-to-text transcription, letting you speak directly to your coach effortlessly.
  • HF Ecosystem Integration: Optional Hugging Face OAuth login for a personalized experience.

Moving forward, I plan to:

  • Expand data ingestion to parse exports from other major fitness apps.
  • Continue polishing the UI/UX for an even smoother mobile gym-companion experience.

Building YourGymBuddy was a chaotic, last-minute sprint that proved just how capable 1B to 12B models can be when paired with good synthetic data, smart cloud fine-tuning on Modal, and the raw power of local execution.

Community

Sign up or log in to comment