Distilling the 13B SpaceLLaVA VLM-as-a-Judge into a Florence-2 model to efficiently quality filter spatialVQA datasets like OpenSpaces
Salma Mayorquin PRO
salma-remyx
Recent Activity
reacted to sergiopaniego's post with 🔥 about 4 hours ago
OpenEnv has a new home: github.com/huggingface/OpenEnv
Starting today, it's coordinated by a committee that includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face.
Frontier labs train their models and their harnesses together. Claude knows Claude Code. GPT-5.5 knows Codex. That's not an accident, it's training. Open-source models deserve the same magic, but pulling that off requires infrastructure that belongs to everyone, not one lab.
OpenEnv is that layer: one API, any harness, any trainer, any environment.
Rewards and training loops stay in TRL, Unsloth, wherever you already work. OpenEnv is the socket they all plug into.
Get involved!
Full announcement: https://huggingface.co/blog/openenv-agentic-rl
reacted to pbhappliedsystems's post with 🔥 about 4 hours ago
**New flagship dataset, and an argument about what a dataset card should be.**
Most synthetic datasets on the Hub ship row counts, a license, and little else: pipeline opaque, rejection criteria unstated, compliance unaudited. We published the opposite.
**SynthEval Cloud - Regulated-Domain Synthetic Instruction Dataset**
https://huggingface.co/datasets/pbhappliedsystems/syntheval-cloud-regulated-instruct-1k
**1,116** quality-gated instruction records across **7 regulated domains** (medical, legal, GDPR, privacy, education, e-commerce, transport). Every record cleared a documented cascade, not a vibe check:
- 🧪 **Dual-signal hallucination gate**: rejects only when embedding cosine *and* keyword-overlap both fail; a low score alone never rejects.
- **Layered PII masking + independent leak audit**: a separate over-reporting scanner found **0.0% residual leak** across all 1,116 records.
- **Whole-corpus evaluation, not a sample**: MATTR **0.769**, mean cosine **0.73**, **0%** near-duplicates, **96.9%** yield.
- 🧾 **The 36 rejections ship too**, each tagged with its failing gate. Removal at the gate is the product; we show our work.
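The dual-signal rule in the first bullet can be illustrated with a minimal Python sketch. This is not the SynthEval implementation: the thresholds `COSINE_MIN` and `OVERLAP_MIN`, the overlap definition, and all function names are hypothetical, and the embeddings are assumed to be precomputed vectors. It only shows the AND logic, in which a record is rejected when both signals fail and kept when either one passes.

```python
import math
import re

# Hypothetical thresholds; the post does not state the actual values.
COSINE_MIN = 0.5
OVERLAP_MIN = 0.2


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(source, response):
    """Fraction of distinct response words that also appear in the source."""
    src = set(re.findall(r"[a-z0-9]+", source.lower()))
    resp = set(re.findall(r"[a-z0-9]+", response.lower()))
    return len(src & resp) / len(resp) if resp else 0.0


def gate(source_vec, response_vec, source, response):
    """Reject only when BOTH signals fall below their thresholds."""
    cos_fail = cosine(source_vec, response_vec) < COSINE_MIN
    kw_fail = keyword_overlap(source, response) < OVERLAP_MIN
    return "reject" if (cos_fail and kw_fail) else "accept"
```

With this rule, a response whose embedding drifts from the source but whose vocabulary still matches it is kept, and so is the reverse case. The reported figures are also consistent with yield being accepted records over all generated records, although the post does not define it: 1,116 / (1,116 + 36) = 0.96875, or about 96.9%.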
Every number on the card is a field in the `evaluation_report.json` shipped beside the data, with full methodology + provenance (Mistral-Nemo AWQ W4A16 · vLLM 0.8.5.post1 · Modal A10G).
One release from **SynthEval**: Studio (local GPU) + Cloud (Modal+vLLM), proving quality parity across substrates.
Whitepaper: https://pbhappliedsystems.com/SynthEval_Studio_and_Cloud_Quality-Gated_Synthetic_Data_Generation.pdf
Overview: https://pbhappliedsystems.com/synthetic-data.html
**CC BY 4.0**: commercial use welcome, just credit it. Need defensible synthetic data at scale? Let's talk.
-- Patrick Hill, PBH Applied Systems
liked a model 3 days ago
remyxai/dockergen-0.5b