pyrrho-nano-g4
pyrrho-nano-g4 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.
It is not an answer generator and not an open-world fact checker. It sits between
retrieval and generation, or beside a retrieval package as a fast evidence
quality layer. Compared with pyrrho-nano-g3.2, this package keeps the retrieval-control head surface but collapses answerability into the four V9 planning labels: direct_answer, synthesis_answer, set_answer, and structured_reasoning.
Governance Labels
| Label | Meaning |
|---|---|
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |
Multitask Heads
| Head | Labels / values | Intended use |
|---|---|---|
| governance | ABSTAIN, DISPUTED, TRUSTWORTHY | Post-retrieval evidence sufficiency and conflict decision. |
| query_contract | evidence_sufficiency, structured_lookup, temporal_grounding, exhaustive_coverage, comparison_coverage, representative_overview | Pre-retrieval routing signal for what kind of evidence the query needs. |
| route | science_medicine, law_policy, history_geography, technology_computing, economics_finance, culture_society, general_commonsense | Semantic route/domain signal for retrieval policy and logging. |
| taxonomy | 23 fitz-gov taxonomy patterns | Failure/support pattern signal for audit and diagnostics. |
| scalars | evidence_sufficiency, query_evidence_alignment, answer_coverage, conflict_density, retrieval_retry_value, false_trustworthy_risk, evidence_failure_severity | Continuous governance signals for retry, ranking, and monitoring. |
| retrieval_action | answer_now, retrieve_more, broaden_search, resolve_conflict, ask_clarifying_question, structured_lookup | Retrieval policy hint for the next pipeline action. |
| gap_type | 12 evidence-gap labels | More specific reason why retrieval is insufficient or conflicting. |
| answerability_shape | direct_answer, synthesis_answer, set_answer, structured_reasoning | Query-only collapsed answer shape for retrieval planning. |
| retrieval_modality | unstructured_text, structured_table, code, configuration, log_trace, pdf_layout, mixed | Query-only hint for the preferred retrieval substrate. |
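For client code that consumes these heads, the enumerated label sets can be written down as a small lookup table. The sketch below copies the labels from the table above; the names `HEAD_LABELS`, `SCALAR_NAMES`, `QUERY_ONLY_HEADS`, and `is_known_label` are illustrative and not part of the pyrrho package. The taxonomy and gap_type heads are left out because this card does not list their labels.

```python
# Label sets copied from the head table in this card. The tuple order is for
# readability only; it is NOT claimed to match the model's class indices.
HEAD_LABELS: dict[str, tuple[str, ...]] = {
    "governance": ("ABSTAIN", "DISPUTED", "TRUSTWORTHY"),
    "query_contract": (
        "evidence_sufficiency", "structured_lookup", "temporal_grounding",
        "exhaustive_coverage", "comparison_coverage", "representative_overview",
    ),
    "route": (
        "science_medicine", "law_policy", "history_geography",
        "technology_computing", "economics_finance", "culture_society",
        "general_commonsense",
    ),
    "retrieval_action": (
        "answer_now", "retrieve_more", "broaden_search", "resolve_conflict",
        "ask_clarifying_question", "structured_lookup",
    ),
    "answerability_shape": (
        "direct_answer", "synthesis_answer", "set_answer", "structured_reasoning",
    ),
    "retrieval_modality": (
        "unstructured_text", "structured_table", "code", "configuration",
        "log_trace", "pdf_layout", "mixed",
    ),
}

# The seven continuous signals emitted by the scalars head.
SCALAR_NAMES = (
    "evidence_sufficiency", "query_evidence_alignment", "answer_coverage",
    "conflict_density", "retrieval_retry_value", "false_trustworthy_risk",
    "evidence_failure_severity",
)

# Heads this card describes as query-only (no retrieved evidence needed).
QUERY_ONLY_HEADS = frozenset(
    {"query_contract", "route", "answerability_shape", "retrieval_modality"}
)


def is_known_label(head: str, label: str) -> bool:
    """Return True if `label` is listed for `head`; raises KeyError for unlisted heads."""
    return label in HEAD_LABELS[head]
```

Note that structured_lookup appears in both query_contract and retrieval_action, so a label string should always be interpreted together with its head name.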
Outputs
This is a custom multitask package, not a standard single-head
AutoModelForSequenceClassification artifact. The recommended runtime is
pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.
The predictor returns a structured object:
| Field | Meaning |
|---|---|
| governance.final_label | Final calibrated label after the TRUSTWORTHY threshold rule. |
| governance.raw_label | Highest-probability governance label before threshold calibration. |
| governance.probabilities | Probability distribution over ABSTAIN, DISPUTED, TRUSTWORTHY. |
| governance.threshold | TRUSTWORTHY probability threshold used by the package. |
| query_contract.final_label | Query-only contract prediction. |
| route.final_label | Query-only semantic route/domain prediction. |
| taxonomy.final_label | Query+evidence taxonomy-pattern prediction. |
| scalars | 7 bounded scalar governance signals. |
| retrieval_action.final_label | Retrieval policy hint. |
| gap_type.final_label | Evidence-gap type prediction. |
| answerability_shape.final_label | Query-only answer-shape prediction. |
| retrieval_modality.final_label | Query-only retrieval-modality prediction. |
| timing_ms | Local inference timing for the call. |
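The relationship between raw_label, final_label, and threshold can be illustrated with a small sketch. This card does not spell out the exact TRUSTWORTHY threshold rule, so the logic below is an assumption: TRUSTWORTHY is kept only when its probability reaches the threshold, and otherwise the more probable of ABSTAIN and DISPUTED is used. The function name `apply_trustworthy_threshold` is hypothetical and is not the package's implementation.

```python
def apply_trustworthy_threshold(probabilities: dict[str, float],
                                threshold: float = 0.48) -> dict:
    """Assumed calibration rule; the real pyrrho rule may differ."""
    # raw_label: highest-probability governance label before calibration.
    raw_label = max(probabilities, key=probabilities.get)
    final_label = raw_label
    used_fallback = False

    # Assumption: a TRUSTWORTHY argmax below the threshold is not trusted,
    # so fall back to the stronger of the two non-TRUSTWORTHY labels.
    if raw_label == "TRUSTWORTHY" and probabilities["TRUSTWORTHY"] < threshold:
        final_label = max(("ABSTAIN", "DISPUTED"), key=probabilities.get)
        used_fallback = True

    return {
        "raw_label": raw_label,
        "final_label": final_label,
        "used_threshold_fallback": used_fallback,
        "threshold": threshold,
    }


confident = apply_trustworthy_threshold(
    {"ABSTAIN": 0.08, "DISPUTED": 0.08, "TRUSTWORTHY": 0.84}
)
borderline = apply_trustworthy_threshold(
    {"ABSTAIN": 0.30, "DISPUTED": 0.25, "TRUSTWORTHY": 0.45}
)
```

Under this assumed rule, the first call keeps TRUSTWORTHY, while the second has a TRUSTWORTHY argmax below 0.48 and falls back to ABSTAIN. A rule of this kind trades some TRUSTWORTHY recall for a lower false-TRUSTWORTHY rate.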
Example normalized output shape:

    {
      "schema_version": "pyrrho_multitask_prediction_v1",
      "governance": {
        "raw_label": "TRUSTWORTHY",
        "final_label": "TRUSTWORTHY",
        "used_threshold_fallback": false,
        "threshold": 0.48,
        "confidence": 0.84,
        "probabilities": {
          "ABSTAIN": 0.08,
          "DISPUTED": 0.08,
          "TRUSTWORTHY": 0.84
        }
      },
      "query_contract": {
        "final_label": "structured_lookup"
      },
      "route": {
        "final_label": "economics_finance"
      },
      "taxonomy": {
        "final_label": "direct_answer"
      },
      "retrieval_action": {
        "final_label": "answer_now"
      },
      "scalars": {
        "evidence_sufficiency": 0.91,
        "query_evidence_alignment": 0.88,
        "answer_coverage": 0.86,
        "conflict_density": 0.08,
        "retrieval_retry_value": 0.12,
        "false_trustworthy_risk": 0.09,
        "evidence_failure_severity": 0.07
      }
    }

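A consumer may want to sanity-check a prediction before acting on it. The sketch below parses a trimmed copy of the example output and checks two properties: the governance probabilities sum to 1, and every scalar lies in [0, 1]. The [0, 1] range is an assumption drawn from the word "bounded" and the example values; `check_prediction` is an illustrative helper, not part of the pyrrho runtime.

```python
import json
import math

# Trimmed copy of the example output shown in this card.
SAMPLE = json.loads("""
{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "final_label": "TRUSTWORTHY",
    "probabilities": {"ABSTAIN": 0.08, "DISPUTED": 0.08, "TRUSTWORTHY": 0.84}
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "conflict_density": 0.08,
    "false_trustworthy_risk": 0.09
  }
}
""")


def check_prediction(pred: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Governance probabilities should form a distribution.
    probs = pred.get("governance", {}).get("probabilities", {})
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-6):
        problems.append("governance probabilities do not sum to 1")

    # Assumption: "bounded" scalars are expected to lie in [0, 1].
    for name, value in pred.get("scalars", {}).items():
        if not 0.0 <= value <= 1.0:
            problems.append(f"scalar {name} out of range: {value}")

    return problems


problems = check_prediction(SAMPLE)
```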
The model does not generate answers, citations, source spans, retrieval results,
or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.
Intended Use
Use this model when a RAG or retrieval package needs fast local signals about:
- whether retrieved evidence is enough to answer,
- whether retrieved evidence conflicts,
- what kind of evidence the query needs before retrieval,
- which semantic/domain route the query belongs to,
- which fitz-gov support/failure pattern is active,
- what retrieval action and gap type the evidence state suggests,
- whether retrieval should retry, broaden, or escalate.
This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.
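One way to use these signals is as a gate between retrieval and generation. The sketch below is a hypothetical policy, not something prescribed by this card: the step names (`generate_answer`, `surface_conflict`, `ask_user`, `retry_retrieval`, `decline_to_answer`), the `retries_left` parameter, and the choice of which retrieval actions count as retries are all assumptions. It only relies on the governance.final_label and retrieval_action.final_label fields described above.

```python
# Assumption: these retrieval_action labels are treated as "try retrieval again".
RETRY_ACTIONS = frozenset({"retrieve_more", "broaden_search", "structured_lookup"})


def next_step(result: dict, retries_left: int) -> str:
    """Map a prediction to a pipeline step under a hypothetical policy."""
    label = result["governance"]["final_label"]
    action = result.get("retrieval_action", {}).get("final_label")

    if label == "TRUSTWORTHY":
        # Evidence consistently supports answering: hand off to the generator.
        return "generate_answer"
    if label == "DISPUTED":
        # Sources conflict: report the conflict instead of picking a side.
        return "surface_conflict"

    # ABSTAIN: evidence is insufficient, so consult the retrieval policy hint.
    if action == "ask_clarifying_question":
        return "ask_user"
    if action in RETRY_ACTIONS and retries_left > 0:
        return "retry_retrieval"
    return "decline_to_answer"


step = next_step(
    {
        "governance": {"final_label": "ABSTAIN"},
        "retrieval_action": {"final_label": "broaden_search"},
    },
    retries_left=1,
)
```

In high-stakes settings, a policy like this would still sit under human review rather than replace it.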
Quick Start
Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:

    from huggingface_hub import snapshot_download
    from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

    MODEL_ID = "yafitzdev/pyrrho-nano-g4"

    # Download the model package from the Hub and get its local directory.
    PACKAGE_DIR = snapshot_download(MODEL_ID)

    # One user question plus the passages returned by the retriever.
    query = "Which quarterly report is relevant?"
    contexts = [
        "The Q2 report lists revenue, churn, and roadmap changes.",
    ]

    predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
    result = predictor.predict(query, contexts)

    # Calibrated governance decision, followed by the auxiliary heads.
    print(result["governance"]["final_label"])
    print(result["query_contract"]["final_label"])
    print(result["route"]["final_label"])
    print(result["taxonomy"]["final_label"])
    print(result["retrieval_action"]["final_label"])
    print(result["gap_type"]["final_label"])
    print(result["scalars"])

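The Quick Start code indexes each head directly, while the example output in this card lists only some of the heads. Whether the predictor always returns every head is not stated here, so a defensive consumer could collect whatever final labels are present. The sketch below assumes the result behaves like a nested dict, as the Quick Start indexing suggests; `final_labels` is an illustrative helper and the `stub_result` value is made up for demonstration.

```python
def final_labels(result: dict) -> dict[str, str]:
    """Collect final_label from every head that provides one, skipping other fields."""
    labels = {}
    for head, payload in result.items():
        # Fields such as schema_version, scalars, and timing_ms have no final_label.
        if isinstance(payload, dict) and "final_label" in payload:
            labels[head] = payload["final_label"]
    return labels


# Stand-in for a predictor result; values are illustrative only.
stub_result = {
    "schema_version": "pyrrho_multitask_prediction_v1",
    "governance": {"raw_label": "ABSTAIN", "final_label": "ABSTAIN"},
    "route": {"final_label": "economics_finance"},
    "scalars": {"evidence_sufficiency": 0.21},
    "timing_ms": 12.5,
}

summary = final_labels(stub_result)
gap_type = summary.get("gap_type")  # None when the head is absent
```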
For local package testing:

    python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g4 --device cpu

Release Selection
- Seed: 1337
- TRUSTWORTHY threshold: 0.48
- Selection reason: Seed 1337 was selected from the completed official fitz-gov V9.0.0 3-seed run because it has the strongest validation composite score and the best balanced auxiliary-head tradeoff.
Held-Out Test Metrics
| Metric | Result |
|---|---|
| Governance accuracy | 0.9742 |
| False-TRUSTWORTHY rate | 0.0119 |
| Query-contract accuracy | 0.8756 |
| Query-contract macro F1 | 0.8591 |
| Route accuracy | 0.9344 |
| Route macro F1 | 0.9330 |
| Taxonomy accuracy | 0.7804 |
| Taxonomy macro F1 | 0.7888 |
| Scalar MAE | 0.0687 |
| Retrieval-action macro F1 | 0.8618 |
| Gap-type macro F1 | 0.7055 |
| Answerability-shape macro F1 | 0.9511 |
| Retrieval-modality macro F1 | 0.5139 |
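For readers who want to reproduce metrics of this kind on their own predictions, the sketch below computes macro F1 and a false-TRUSTWORTHY rate in plain Python. Macro F1 is the standard unweighted mean of per-class F1. This card does not define the false-TRUSTWORTHY rate precisely, so the version here is an assumption: the share of examples whose gold label is not TRUSTWORTHY that are nevertheless predicted TRUSTWORTHY. The gold and predicted labels are toy data, not fitz-gov results.

```python
def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def false_trustworthy_rate(gold: list[str], pred: list[str]) -> float:
    """Assumed definition: P(pred == TRUSTWORTHY | gold != TRUSTWORTHY)."""
    negatives = [(g, p) for g, p in zip(gold, pred) if g != "TRUSTWORTHY"]
    if not negatives:
        return 0.0
    return sum(p == "TRUSTWORTHY" for _, p in negatives) / len(negatives)


# Toy labels for illustration only.
gold = ["TRUSTWORTHY", "TRUSTWORTHY", "ABSTAIN", "DISPUTED"]
pred = ["TRUSTWORTHY", "ABSTAIN", "ABSTAIN", "TRUSTWORTHY"]

toy_macro_f1 = macro_f1(gold, pred)
toy_ftr = false_trustworthy_rate(gold, pred)
```

If the release used a different denominator for the false-TRUSTWORTHY rate (for example, all test rows), the numbers would not be directly comparable with this helper.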
Three-seed headline from the local release summary:
| Metric | Mean +/- std |
|---|---|
| Governance accuracy | 97.46 +/- 0.09% |
| False-TRUSTWORTHY rate | 1.21 +/- 0.06% |
| Query-contract macro F1 | 86.00 +/- 0.23% |
| Route accuracy | 93.53 +/- 0.09% |
| Taxonomy accuracy | 77.95 +/- 0.25% |
| Scalar MAE | 0.0690 +/- 0.0003 |
| Retrieval-action macro F1 | 85.92 +/- 0.43% |
| Gap-type macro F1 | 69.85 +/- 1.00% |
| Answerability-shape macro F1 | 94.92 +/- 0.19% |
| Retrieval-modality macro F1 | 51.94 +/- 0.59% |
Training Data
Trained on the published fitz-gov V9.0.0 Hugging Face release with official query-grouped splits. Total prepared rows: 40,755 = 2,980 V6 rows + 7,520 V7 rows + 14,092 V8 rows + 16,163 V9 rows. Splits are train=32,625 / validation=4,104 / test=4,026. Split assignments come from v9/split_assignments.jsonl at dataset commit 874fd18d4952eec0e72b6df2264f8281615fd350. The release package records the local training config in training_config.yaml and detailed metrics in reports/summary.json.
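The row counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in this section and confirms that the per-version rows and the split sizes both add up to the same total; the variable names are illustrative.

```python
# Row counts as stated in the Training Data section.
rows_by_version = {"V6": 2_980, "V7": 7_520, "V8": 14_092, "V9": 16_163}
rows_by_split = {"train": 32_625, "validation": 4_104, "test": 4_026}
total_rows = 40_755

# Both breakdowns should account for every prepared row.
version_total = sum(rows_by_version.values())
split_total = sum(rows_by_split.values())

# Share of the data in each split, rounded to four decimals.
split_fractions = {
    name: round(count / total_rows, 4) for name, count in rows_by_split.items()
}
```

Both sums equal 40,755, and the split works out to roughly 80% train, 10% validation, and 10% test.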
Limitations
- This is a governance and routing co-processor, not a generator.
- The auxiliary heads are useful signals, not ground-truth explanations.
- Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
- Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
- The four-way answerability head is intentionally collapsed for fitz-sage integration; it does not expose the old eleven detailed answerability labels.
- Retrieval modality remains the weakest auxiliary head; sparse subclasses such as pdf_layout, code, and configuration should be treated as hints, not hard guarantees.
- The license is CC BY-NC 4.0. Commercial use requires a separate license.
Base Model
answerdotai/ModernBERT-base