Woohoo 🥳 I have finished my 2025 GPU workstation build and I am very excited to train awesome new open source models on it.
I built my last GPU workstation 5 years ago. It featured an AMD Ryzen 9 5900X and 64GB of G.SKILL Trident Z RGB on an ASRock X570 Taichi, cooled by an Alphacool Eisbär 420. The GPU was a Zotac RTX 3090 AMP Extreme. Unfortunately, I was never satisfied with the case, a Fractal Define 7: it is definitely too small, the airflow is poor enough that I had to keep the front door open all the time, and it arrived with a partly damaged side panel.
For my new build, I've used the following components: an outstanding new AMD Ryzen 9 9950X3D with 64GB of Corsair Dominator Titanium (what a name). As a huge Noctua fan - warm greetings to my Austrian neighbors - I am using the brand new Noctua NH-D15 G2 on an ASRock X870E Taichi in an amazing Lian Li LANCOOL III chassis. One joke that only NVIDIA Blackwell users will understand: you definitely need a tempered glass panel to check whether your GPU cables/connectors are starting to melt 😂
And the best is yet to come: I returned my previously bought Zotac RTX 5090 Solid to the eBay seller (because of... missing ROPs, which again only NVIDIA Blackwell users will understand) and bought a Zotac RTX 5090 AMP Extreme INFINITY (yes, the long name indicates that this is Zotac's flagship model) from a more trustworthy source (NBB in Germany).
I am so happy to start training and fine-tuning new open source models - stay tuned!!!
I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS
I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).
What happens when you make an LLM drive a car where physics are real and actions can't be undone?
I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.
The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
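To make the tool-call loop concrete, here is a minimal sketch of the interaction pattern described above. Everything in it is hypothetical: `ToyDrivingEnv`, its one-dimensional physics, and the reward values are stand-ins invented for illustration, not the CARLA, OpenEnv, or TRL APIs. Only the tool names (observe, brake, change lane) come from the post.

```python
from dataclasses import dataclass


@dataclass
class ToyDrivingEnv:
    """Hypothetical 1-D stand-in for a simulator: a car approaches a pedestrian."""
    distance_m: float = 30.0  # gap to the pedestrian standing in lane 0
    speed_mps: float = 10.0
    lane: int = 0
    done: bool = False

    def observe(self) -> dict:
        """Return the state the policy (for example, an LLM) gets to see."""
        return {"distance_m": self.distance_m,
                "speed_mps": self.speed_mps,
                "lane": self.lane}

    def step(self, tool: str) -> float:
        """Apply one tool call, advance one second, and return the reward."""
        if self.done:
            raise RuntimeError("episode finished; actions cannot be undone")
        if tool == "brake":
            self.speed_mps = max(0.0, self.speed_mps - 5.0)
        elif tool == "change_lane":
            self.lane = 1
        elif tool != "observe":
            raise ValueError(f"unknown tool: {tool}")
        self.distance_m -= self.speed_mps
        if self.lane == 0 and self.distance_m <= 0.0:
            self.done = True
            return -1.0  # collision with the pedestrian
        if self.lane != 0 or self.speed_mps == 0.0:
            self.done = True
            return 1.0   # pedestrian avoided by swerving or stopping
        return 0.0       # nothing decided yet


def run_episode(policy, max_steps: int = 10) -> float:
    """Roll out one episode; `policy` maps an observation to a tool name."""
    env = ToyDrivingEnv()
    total_reward = 0.0
    for _ in range(max_steps):
        total_reward += env.step(policy(env.observe()))
        if env.done:
            break
    return total_reward
```

A policy that only ever observes ends in a collision and a negative return, while braking or changing lane earns a positive one. A training loop would use that scalar to update the model.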
In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.
The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.
This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, and rubric-based rewards, and made it trainable end-to-end.
We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!
1️⃣ Q1 - Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train one and compete with closed frontier models. OpenAI debuts Deep Research.
Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)
2️⃣ Q2 - Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.
Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4
3️⃣ Q3 - "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.
Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5
4️⃣ Q4 - Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!
Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient in vision tokens needed for a given performance
> covers 100 languages
If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible. We wanted to change that.
FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.
In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks
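As a rough illustration of the last two steps, here is a minimal sketch of cross-source deduplication and benchmark decontamination. It assumes exact byte-level matching via SHA-256, which is a deliberate simplification swapped in for illustration: the paper's own pipeline may use a different, more robust notion of image similarity. The function names and the sample layout (`image`, `source`) are hypothetical.

```python
import hashlib


def fingerprint(image_bytes: bytes) -> str:
    """Exact-match fingerprint of an image (a simplification for illustration)."""
    return hashlib.sha256(image_bytes).hexdigest()


def dedup_and_decontaminate(samples, benchmark_images):
    """Drop samples whose image repeats across sources or appears in a benchmark.

    samples: iterable of dicts with an 'image' (bytes) and a 'source' key.
    benchmark_images: iterable of raw image bytes taken from test sets.
    """
    blocked = {fingerprint(img) for img in benchmark_images}
    seen = set()
    kept = []
    for sample in samples:
        fp = fingerprint(sample["image"])
        if fp in blocked or fp in seen:
            continue  # contaminated or already kept from an earlier source
        seen.add(fp)
        kept.append(sample)
    return kept
```

The first occurrence of each image wins, so the order in which sources are processed decides which copy survives.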
My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets. NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!
🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉 HuggingFaceM4/FineVision_full_shuffled
It’s ready to stream, so you can start training your own models right away:
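    from datasets import load_dataset

    # Stream the dataset so nothing has to be downloaded in full first
    d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
    # Pull and print the first sample
    print(next(iter(d)))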
A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!
Qwen3-VL-4B is incredibly easy to fine-tune! We've trained the first DSE model based on this model, and it's already performing at the same level as Jina v4!
While Jina Embeddings v4 is built on Qwen2.5-VL-3B (which has a non-commercial license), our model is based on Qwen3-VL-4B and released under Apache 2.0, so it can be used commercially.
At Robonine, we have released an open-source, 3D-printed parallel gripper designed for robotics applications and compatible with popular budget servos such as the Feetech STS3215 and Waveshare ST3215.
This precision gripper offers parallel jaw movement, real-time monitoring, and positioning accuracy of ±0.1°, making it ideal for both robotics enthusiasts and professionals. Complete build cost: Just $69.45–$74.45, with all components available for purchase on Amazon. Direct links are provided in the Bill of Materials on GitHub.
We encourage you to Watch, Fork, and Star the repository to support our open-source initiative and stay updated on future developments. Your feedback is also welcome!
🤗 Sentence Transformers is joining Hugging Face! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face! Details:
Today, the Ubiquitous Knowledge Processing (UKP) Lab is transferring the project to Hugging Face. Sentence Transformers will remain a community-driven, open-source project, with the same open-source license (Apache 2.0) as before. Contributions from researchers, developers, and enthusiasts are welcome and encouraged. The project will continue to prioritize transparency, collaboration, and broad accessibility.
We see an increasing wish from companies to move from large LLM APIs to local models for better control and privacy, reflected in the library's growth: in just the last 30 days, Sentence Transformer models have been downloaded >270 million times, second only to transformers.
I would like to thank the UKP Lab, and especially Nils Reimers and Iryna Gurevych, both for their dedication to the project and for their trust in me, both now and two years ago. Back then, neither of you knew me well, yet you trusted me to take the project to new heights. That choice ended up being very valuable for the embedding & Information Retrieval community, and I think this choice of granting Hugging Face stewardship will be similarly successful.
I'm very excited about the future of the project, and for the world of embeddings and retrieval at large!
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models
How do you stop a million-line library built by thousands of contributors from collapsing under its own weight? At 🤗 Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.
🔍 Inside the post:
– One Model, One File: readability first. You can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.
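The "config-level features, not rewrites" idea can be sketched with a small registry. This is a hypothetical toy, not the library's code: `KERNELS`, `ToyConfig`, and `ToyModel` are invented names, and a softmax over a list of scores stands in for a full attention backend. The point is only that the model code stays the same while a config string selects the implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Kernel = Callable[[List[float]], List[float]]
KERNELS: Dict[str, Kernel] = {}  # hypothetical registry: name -> implementation


def register_kernel(name: str):
    """Decorator that makes an implementation selectable by name."""
    def decorator(fn: Kernel) -> Kernel:
        KERNELS[name] = fn
        return fn
    return decorator


@register_kernel("eager")
def eager_softmax(scores: List[float]) -> List[float]:
    """Readable reference version: find the max, then normalize."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


@register_kernel("online")
def online_softmax(scores: List[float]) -> List[float]:
    """Computes max and normalizer together in one pass with a running max."""
    running_max = float("-inf")
    denom = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the partial sum whenever the running max changes.
        denom = denom * math.exp(running_max - new_max) + math.exp(s - new_max)
        running_max = new_max
    return [math.exp(s - running_max) / denom for s in scores]


@dataclass
class ToyConfig:
    kernel: str = "eager"  # the only thing a user changes


class ToyModel:
    """Model code is identical for every kernel; the config picks one."""
    def __init__(self, config: ToyConfig):
        self.kernel = KERNELS[config.kernel]

    def forward(self, scores: List[float]) -> List[float]:
        return self.kernel(scores)
```

Switching from `ToyConfig("eager")` to `ToyConfig("online")` changes how the result is computed without touching `ToyModel`, and both should agree up to floating-point error.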
Written with @lysandre, @pcuenq and @yonigozlan, this is a deep dive into how Transformers stays fast, open, and maintainable.
So 🐋DeepSeek🐋 hits the mainstream media. But it has been a star in our little cult for at least 6 months. Its meteoric success did not come overnight; it was two years in the making.
* End of 2023, they launched their first model (pretrained by themselves), following the Llama 2 architecture
* June 2024, v2 (MoE architecture) surpassed Gemini 1.5, but was behind Mistral
* September, v2.5 surpassed GPT 4o mini
* December, v3 surpassed GPT 4o
* Now R1 surpassed o1
Most importantly, if you think DeepSeek's success is singular and unrivaled, that's WRONG. The following models are also near or at the o1 bar.
🚀 Real-Time On-Device AI Agent with Polaris-4B — Run It Yourself, No Cloud, No Cost
We just deployed a real-time on-device AI agent using the Polaris-4B-Preview model — one of the top-performing <6B open LLMs on Hugging Face.
📱 What’s remarkable? This model runs entirely on a mobile device, without cloud, and without any manual optimization. It was built using ZETIC.MLange, and the best part?
➡️ It’s totally automated, free to use, and anyone can do it. You don’t need to write deployment code, tweak backends, or touch device-specific SDKs. Just upload your model — and ZETIC.MLange handles the rest.
🧠 About the Model
- Model: Polaris-4B-Preview
- Size: ~4B parameters
- Ranking: Top 3 on Hugging Face LLM Leaderboard (<6B)
- Tokenizer: Token-incremental inference supported
- Modifications: None (stock weights, just optimized for mobile)
⚙️ What ZETIC.MLange Does
ZETIC.MLange is a fully automated deployment framework for On-Device AI, built for AI engineers who want to focus on models, not infrastructure.
Here’s what it does in minutes:
- 📊 Analyzes model structure
- ⚙️ Converts to mobile-optimized format (e.g., GGUF, ONNX)
- 📦 Generates a runnable runtime environment with pre/post-processing
- 📱 Targets real mobile hardware (CPU, GPU, NPU, including Qualcomm, MediaTek, Apple)
- 🎯 Gives you a downloadable SDK or mobile app component, ready to run
And yes, this is available now, for free, at https://mlange.zetic.ai
🧪 For AI Engineers Like You
If you want to:
- Test LLMs directly on-device
- Run models offline with no network latency
- Avoid cloud GPU costs
- Deploy to mobile without writing app-side inference code
Then this is your moment. You can do exactly what we did, using your own models — all in a few clicks.