pyrrho-nano-g5
pyrrho-nano-g5 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.
It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a retrieval package as a fast evidence-quality layer. Compared with pyrrho-nano-g5-alpha, this package expands the V10 retrieval-obligation training block from the initial 5/5/5 set to the full target-10 fitz-gov V10.0.0 set and reports results from a completed three-seed run.
Governance Labels
| Label | Meaning |
|---|---|
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |
Multitask Heads
| Head | Labels / values | Intended use |
|---|---|---|
| governance | ABSTAIN, DISPUTED, TRUSTWORTHY | Post-retrieval evidence sufficiency and conflict decision. |
| query_contract | evidence_sufficiency, structured_lookup, temporal_grounding, exhaustive_coverage, comparison_coverage, representative_overview | Pre-retrieval routing signal for what kind of evidence the query needs. |
| route | science_medicine, law_policy, history_geography, technology_computing, economics_finance, culture_society, general_commonsense | Semantic route/domain signal for retrieval policy and logging. |
| taxonomy | 23 fitz-gov taxonomy patterns | Failure/support pattern signal for audit and diagnostics. |
| scalars | evidence_sufficiency, query_evidence_alignment, answer_coverage, conflict_density, retrieval_retry_value, false_trustworthy_risk, evidence_failure_severity | Continuous governance signals for retry, ranking, and monitoring. |
| retrieval_action | answer_now, retrieve_more, broaden_search, resolve_conflict, ask_clarifying_question, structured_lookup | Retrieval policy hint for the next pipeline action. |
| gap_type | 12 evidence-gap labels | More specific reason why retrieval is insufficient or conflicting. |
| answerability_shape | direct_answer, synthesis_answer, set_answer, structured_reasoning | Query-only collapsed answer shape for retrieval planning. |
| retrieval_modality | unstructured_text, structured_table, code, configuration, log_trace, pdf_layout, mixed | Query-only hint for the preferred retrieval substrate. |
| retrieval_obligation | 31 V10 obligation labels | Query-only target/closure obligation for corpus-aware retrieval planning. |
Outputs
This is a custom multitask package, not a standard single-head AutoModelForSequenceClassification artifact. The recommended runtime is pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.
The predictor returns a structured object:
| Field | Meaning |
|---|---|
| governance.final_label | Final calibrated label after the TRUSTWORTHY threshold rule. |
| governance.raw_label | Highest-probability governance label before threshold calibration. |
| governance.probabilities | Probability distribution over ABSTAIN, DISPUTED, TRUSTWORTHY. |
| governance.threshold | TRUSTWORTHY probability threshold used by the package. |
| query_contract.final_label | Query-only contract prediction. |
| route.final_label | Query-only semantic route/domain prediction. |
| taxonomy.final_label | Query+evidence taxonomy-pattern prediction. |
| scalars | 7 bounded scalar governance signals. |
| retrieval_action.final_label | Retrieval policy hint. |
| gap_type.final_label | Evidence-gap type prediction. |
| answerability_shape.final_label | Query-only answer-shape prediction. |
| retrieval_modality.final_label | Query-only retrieval-modality prediction. |
| retrieval_obligation.final_label | Query-only retrieval-obligation prediction. |
| timing_ms | Local inference timing for the call. |
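The card does not spell out the TRUSTWORTHY threshold rule. The sketch below shows one plausible reading, which is an assumption and not the package's code: keep TRUSTWORTHY only when its probability reaches the threshold, and otherwise fall back to the more probable of ABSTAIN and DISPUTED. The real rule may differ (for example, it could instead promote TRUSTWORTHY whenever its probability clears the threshold). apply_trustworthy_threshold is a hypothetical helper.

```python
def apply_trustworthy_threshold(probabilities, threshold):
    """Hypothetical threshold rule (an assumption, not pyrrho's implementation).

    probabilities: dict mapping ABSTAIN / DISPUTED / TRUSTWORTHY to floats.
    Returns (raw_label, final_label, used_threshold_fallback).
    """
    raw_label = max(probabilities, key=probabilities.get)
    if raw_label == "TRUSTWORTHY" and probabilities["TRUSTWORTHY"] < threshold:
        # Demote an under-confident TRUSTWORTHY to the stronger cautious label.
        fallback = max(("ABSTAIN", "DISPUTED"), key=probabilities.get)
        return raw_label, fallback, True
    return raw_label, raw_label, False
```

With the probabilities from the example output on this card (TRUSTWORTHY 0.84, threshold 0.34), this rule keeps TRUSTWORTHY and reports no fallback, which is consistent with that example.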
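Example normalized output shape (abbreviated: gap_type, answerability_shape, retrieval_modality, retrieval_obligation, and timing_ms are omitted):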
```json
{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.34,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "retrieval_action": {
    "final_label": "answer_now"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
    "evidence_failure_severity": 0.07
  }
}
```
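A downstream consumer may want to sanity-check a prediction before acting on it. A minimal sketch, assuming the scalars lie in [0, 1] (the card only says they are bounded; the example values all fall in that range) and that the governance probabilities should sum to roughly 1. check_prediction is a hypothetical helper, not part of pyrrho.

```python
import math

GOVERNANCE_LABELS = {"ABSTAIN", "DISPUTED", "TRUSTWORTHY"}


def check_prediction(result, tol=1e-3):
    """Return a list of problems found in a normalized prediction dict."""
    problems = []
    gov = result.get("governance", {})
    if gov.get("final_label") not in GOVERNANCE_LABELS:
        problems.append(f"unexpected governance label: {gov.get('final_label')!r}")

    probs = gov.get("probabilities", {})
    if set(probs) != GOVERNANCE_LABELS:
        problems.append("governance probabilities do not cover all three labels")
    elif not math.isclose(sum(probs.values()), 1.0, abs_tol=tol):
        problems.append(f"probabilities sum to {sum(probs.values()):.4f}")

    for name, value in result.get("scalars", {}).items():
        # Assumption: "bounded" means the [0, 1] range seen in the example.
        if not 0.0 <= value <= 1.0:
            problems.append(f"scalar {name} out of [0, 1]: {value}")
    return problems
```

An empty list means the prediction passed these checks; anything else can be logged before the pipeline falls back to a cautious path.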
The model does not generate answers, citations, source spans, retrieval results, or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.
Intended Use
Use this model when a RAG or retrieval package needs fast local signals about:
- whether retrieved evidence is enough to answer,
- whether retrieved evidence conflicts,
- what kind of evidence the query needs before retrieval,
- which semantic/domain route the query belongs to,
- which fitz-gov support/failure pattern is active,
- what retrieval action and gap type the evidence state suggests,
- whether retrieval should retry, broaden, or escalate.
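In a pipeline, these signals typically gate the generator. The sketch below is a hypothetical policy, not one recommended by the package: the step names and the 0.5 false_trustworthy_risk cutoff are assumptions chosen for illustration. It reads only fields that appear in the normalized output on this card.

```python
def next_step(result, max_false_trustworthy_risk=0.5):
    """Map a normalized prediction dict to a pipeline step (hypothetical policy)."""
    label = result["governance"]["final_label"]
    action = result["retrieval_action"]["final_label"]
    risk = result["scalars"]["false_trustworthy_risk"]

    if label == "TRUSTWORTHY" and risk <= max_false_trustworthy_risk:
        return "generate"
    if label == "DISPUTED":
        return "surface_conflict"   # present the conflicting sources instead of picking one
    if action in ("retrieve_more", "broaden_search", "structured_lookup"):
        return "retrieve_again"     # follow the model's retrieval hint
    if action == "ask_clarifying_question":
        return "ask_user"
    return "abstain"
```

Keeping the policy in a plain function like this makes it easy to tune the cutoff or the action mapping without touching the model.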
This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.
Quick Start
Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:
```python
from huggingface_hub import snapshot_download
from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

MODEL_ID = "yafitzdev/pyrrho-nano-g5"

# Download the model package from the Hugging Face Hub; returns the local directory.
PACKAGE_DIR = snapshot_download(MODEL_ID)

# One user question plus the passages your retriever returned for it.
query = "Which quarterly report is relevant?"
contexts = [
    "The Q2 report lists revenue, churn, and roadmap changes.",
]

# Load the multitask predictor from the downloaded package and run it on CPU.
predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)

# Each classification head exposes its prediction under "final_label".
print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["retrieval_action"]["final_label"])
print(result["gap_type"]["final_label"])
print(result["retrieval_obligation"]["final_label"])
print(result["scalars"])
```
For local package testing:
```shell
python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g5 --device cpu
```
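One way to act on retrieval_action is a bounded retry loop around retrieval. In this sketch, retrieve is a hypothetical callable you supply, taking (query, k) and returning a list of passage strings; doubling k and capping at three attempts are arbitrary choices. The only predictor method used is predict(query, contexts), as in the Quick Start example, so any object with that method works.

```python
def govern_with_retries(predictor, retrieve, query, k=5, max_attempts=3):
    """Re-run retrieval while the model asks for more or broader evidence.

    predictor: any object with predict(query, contexts) -> dict,
        e.g. a PyrrhoMultiTaskPredictor.
    retrieve: hypothetical callable (query, k) -> list[str].
    Returns (result, contexts) from the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        contexts = retrieve(query, k)
        result = predictor.predict(query, contexts)
        action = result["retrieval_action"]["final_label"]
        if action not in ("retrieve_more", "broaden_search"):
            break
        k *= 2  # widen the candidate pool before the next attempt
    return result, contexts
```

The caller still decides what to do with the final result, for example by checking governance.final_label before generating.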
Release Selection
- Seed: 7
- TRUSTWORTHY threshold: 0.34
- Selection reason: Seed 7 was selected because it had the strongest held-out retrieval-obligation macro F1 while staying inside the governance release gates.
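The selection reason describes a filter-then-argmax rule. A minimal sketch of that rule follows; the gate fields and their limits are placeholders supplied by the caller, since the card does not publish the actual release gates, and select_release_seed is a hypothetical helper.

```python
def select_release_seed(runs, max_false_trustworthy_rate, min_governance_accuracy):
    """Pick the seed with the best retrieval-obligation macro F1 among runs
    that pass the governance gates.

    runs: list of dicts with keys seed, false_trustworthy_rate,
        governance_accuracy, retrieval_obligation_macro_f1.
    Gate values are caller-supplied placeholders, not the real release gates.
    """
    eligible = [
        r for r in runs
        if r["false_trustworthy_rate"] <= max_false_trustworthy_rate
        and r["governance_accuracy"] >= min_governance_accuracy
    ]
    if not eligible:
        raise ValueError("no run passes the governance gates")
    return max(eligible, key=lambda r: r["retrieval_obligation_macro_f1"])["seed"]
```

Filtering before ranking means a seed with a better obligation score can never be released if it breaks a governance gate.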
Held-Out Test Metrics
| Metric | Result |
|---|---|
| Governance accuracy | 0.9744 |
| False-TRUSTWORTHY rate | 0.0129 |
| Query-contract accuracy | 0.8875 |
| Query-contract macro F1 | 0.8624 |
| Route accuracy | 0.9346 |
| Route macro F1 | 0.9340 |
| Taxonomy accuracy | 0.7823 |
| Taxonomy macro F1 | 0.7929 |
| Scalar MAE | 0.0691 |
| Retrieval-action macro F1 | 0.8680 |
| Gap-type macro F1 | 0.8333 |
| Answerability-shape macro F1 | 0.9287 |
| Retrieval-modality macro F1 | 0.8601 |
| Retrieval-obligation macro F1 | 0.8313 |
Three-seed headline from the local release summary:
| Metric | Mean +/- std |
|---|---|
| Governance accuracy | 97.39 +/- 0.13% |
| False-TRUSTWORTHY rate | 1.09 +/- 0.25% |
| Query-contract macro F1 | 86.27 +/- 0.14% |
| Route accuracy | 93.65 +/- 0.14% |
| Taxonomy accuracy | 78.42 +/- 0.22% |
| Scalar MAE | 0.0689 +/- 0.0002 |
| Retrieval-action macro F1 | 86.67 +/- 0.18% |
| Gap-type macro F1 | 83.47 +/- 0.18% |
| Answerability-shape macro F1 | 93.27 +/- 0.32% |
| Retrieval-modality macro F1 | 85.98 +/- 0.10% |
| Retrieval-obligation macro F1 | 82.48 +/- 0.47% |
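Each row aggregates the three seeds. The card does not say whether the summary uses the sample (n - 1) or population (n) standard deviation, and the per-seed values are not published here; the sketch below assumes the sample version from Python's statistics module. mean_pm_std is a hypothetical helper.

```python
import statistics


def mean_pm_std(values, as_percent=True):
    """Format per-seed metric values as 'mean +/- std' (sample std, n - 1).

    values: per-seed metric values as fractions (e.g. 0.9739), at least two.
    as_percent: format as percentages, as most rows above do; use False for
        unitless metrics such as Scalar MAE.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if as_percent:
        return f"{mean * 100:.2f} +/- {std * 100:.2f}%"
    return f"{mean:.4f} +/- {std:.4f}"
```

With only three seeds, the choice between n - 1 and n changes the reported spread by a factor of about 1.22 (the square root of 3/2), so it is worth stating when comparing runs.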
Training Data
Trained on the fitz-gov V10.0.0 row set: the published V9.0.0 rows plus 12,748 V10 target-10 retrieval-planning rows. The V10 block combines 6,058 rows from the initial 5/5/5 set with 6,690 target-10 continuation rows; after repair, blind-label QA reported 12,748/12,748 agreement, with 0 triage, 0 missing, 0 invalid, and 0 error rows. The g5 training prep used local splits train=42,826 / validation=5,372 / test=5,305 over the same 53,503 rows, which differ slightly from the public fitz-gov V10.0.0 dataset split manifest (train=42,814 / validation=5,346 / test=5,343). The release package records the local training config in training_config.yaml and detailed metrics in reports/summary.json.
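The row counts in this section are internally consistent, which a short check confirms. The implied V9.0.0 row count assumes that V10.0.0 is exactly V9.0.0 plus the V10 block with no overlap; the card does not state that figure directly.

```python
# Counts copied from the Training Data paragraph.
V10_INITIAL_5_5_5 = 6_058
V10_TARGET10_CONTINUATION = 6_690
V10_BLOCK = 12_748
TOTAL_ROWS = 53_503

local_splits = {"train": 42_826, "validation": 5_372, "test": 5_305}
public_splits = {"train": 42_814, "validation": 5_346, "test": 5_343}

assert V10_INITIAL_5_5_5 + V10_TARGET10_CONTINUATION == V10_BLOCK
assert sum(local_splits.values()) == TOTAL_ROWS
assert sum(public_splits.values()) == TOTAL_ROWS

# Implied size of the V9.0.0 portion (assumes no overlap with the V10 block).
v9_rows = TOTAL_ROWS - V10_BLOCK
print(v9_rows)  # 40755
```

Both split layouts cover the same 53,503 rows; they differ only in how rows are assigned to train, validation, and test.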
Limitations
- This is a governance and routing co-processor, not a generator.
- The auxiliary heads are useful signals, not ground-truth explanations.
- Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
- Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
- The retrieval-obligation head is trained only on the V10 rows; V6-V9 rows are masked for that head.
- Retrieval obligation is the main remaining weak head. Low-support or fine-grained obligations such as row-key lookup, column-value lookup, stale-row versioning, and mixed-modality obligations should be treated as planning hints, not hard guarantees.
- The license is CC BY-NC 4.0. Commercial use requires a separate license.
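Masking a head for some rows usually means those rows contribute nothing to that head's loss. A minimal pure-Python sketch of a masked cross-entropy, assuming unlabeled rows are marked with None; this illustrates the general technique behind the V6-V9 masking limitation, not the package's training code.

```python
import math


def masked_cross_entropy(logits_batch, labels):
    """Mean cross-entropy over rows that have a label; None rows are skipped.

    logits_batch: list of per-row logit lists for one head.
    labels: list of class indices, or None where the head is masked.
    Returns 0.0 when every row is masked, so the head adds no loss.
    """
    total, count = 0.0, 0
    for logits, label in zip(logits_batch, labels):
        if label is None:
            continue  # e.g. a V6-V9 row with no retrieval-obligation label
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_norm - logits[label]
        count += 1
    return total / count if count else 0.0
```

Averaging only over labeled rows keeps the head's loss scale independent of how many masked rows share the batch.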
Base model: answerdotai/ModernBERT-base