Multi-domain GRPO hardening across three professional verticals using real benchmarks with verifiable rewards.
| Domain | Dataset | Size / Notes | Task Type |
|---|---|---|---|
| Legal | nguha/legalbench | 162 tasks (NeurIPS 2023) | Classification, QA |
| Medical | GBaker/MedQA-USMLE-4-options | 10K+ USMLE questions | 4-choice MCQ |
| Finance | PatronusAI/financebench | 150 SEC 10-K QA pairs | Numeric extraction |
| Finance | TheFinAI/flare-headlines | Market-news headlines | Binary classification |
| Structured | Generated | JSON work products | RL-Struct 5-component |
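The dataset mix above can be wired up with a small registry that resolves each domain to its Hugging Face dataset ID. This is a minimal sketch; the domain keys and the `dataset_for` helper are hypothetical names, only the dataset IDs come from the table.

```python
# Registry mapping domain keys (hypothetical names) to the Hugging Face
# dataset IDs listed in the table above.
DOMAIN_DATASETS = {
    "legal": "nguha/legalbench",
    "medical": "GBaker/MedQA-USMLE-4-options",
    "finance_qa": "PatronusAI/financebench",
    "finance_cls": "TheFinAI/flare-headlines",
}

def dataset_for(domain: str) -> str:
    """Resolve a domain key to its benchmark dataset ID."""
    try:
        return DOMAIN_DATASETS[domain]
    except KeyError:
        raise ValueError(f"unknown domain: {domain}") from None
```

From here, `datasets.load_dataset(dataset_for("medical"))` would pull the MedQA benchmark (network access and, for some of these datasets, per-task config names are required).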
| Weight | Reward component | What it measures |
|---|---|---|
| 0.45 | Domain Correctness | MCQ letter match, numeric within ±2%, exact classification |
| 0.20 | Structured Output | JSON validity + schema + types + content (RL-Struct) |
| 0.20 | Professional Quality | Domain terminology + evidence citation + structure |
| 0.10 | Reasoning Depth | Think tags + logical connectors |
| 0.05 | Length Penalty | DAPO soft overlong |
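The weighted components above can be sketched as a single composite reward. This is an illustrative sketch, not the project's actual code: the component scorers, the tolerance handling, and the DAPO-style length limits (`max_len`, `buffer`) are assumptions; only the weights and the ±2% numeric tolerance come from the table.

```python
# Weights from the reward table (sum to 1.0).
WEIGHTS = {
    "correctness": 0.45,  # Domain Correctness
    "structure": 0.20,    # Structured Output (RL-Struct)
    "quality": 0.20,      # Professional Quality
    "reasoning": 0.10,    # Reasoning Depth
    "length": 0.05,       # Length Penalty (DAPO soft overlong)
}

def domain_correctness(pred: str, gold: str, numeric: bool = False) -> float:
    """Exact match for MCQ letters / class labels; numeric answers within +/-2%."""
    if numeric:
        try:
            p, g = float(pred), float(gold)
        except ValueError:
            return 0.0
        if g == 0.0:
            return 1.0 if p == 0.0 else 0.0
        return 1.0 if abs(p - g) / abs(g) <= 0.02 else 0.0
    return 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0

def soft_overlong_penalty(length: int, max_len: int = 1024, buffer: int = 256) -> float:
    """DAPO-style soft overlong: zero penalty up to max_len - buffer,
    then a linear ramp down to -1 at max_len (limits are assumptions)."""
    soft_start = max_len - buffer
    if length <= soft_start:
        return 0.0
    if length >= max_len:
        return -1.0
    return (soft_start - length) / buffer

def composite_reward(components: dict[str, float]) -> float:
    """Weighted sum over the five reward components."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
```

A completion that scores 1.0 on the four quality components with no length penalty earns 0.95; the remaining 0.05 is only reachable if the length component is defined to contribute positively, which depends on how the penalty term is normalized.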
```shell
python professional_hardening.py
```