pyrrho-nano-g5

pyrrho-nano-g5 is a small multitask RAG governance co-processor for anti-hallucination and retrieval-quality pipelines. It reads a user question plus retrieved source passages, then returns a calibrated evidence-state decision and auxiliary signals that fitz-sage can use before answer generation.

It is not an answer generator and not an open-world fact checker. It sits between retrieval and generation, or beside a retrieval package as a fast evidence quality layer. Compared with pyrrho-nano-g5-alpha, this package expands the V10 retrieval-obligation training block from the initial 5/5/5 set to the full target-10 fitz-gov V10.0.0 set and uses a completed 3-seed run.

Governance Labels

| Label | Meaning |
| --- | --- |
| ABSTAIN | The retrieved sources do not contain enough evidence to answer the question. |
| DISPUTED | The retrieved sources conflict on the answer. |
| TRUSTWORTHY | The retrieved sources consistently support answering the question. |

Multitask Heads

| Head | Labels / values | Intended use |
| --- | --- | --- |
| governance | ABSTAIN, DISPUTED, TRUSTWORTHY | Post-retrieval evidence sufficiency and conflict decision. |
| query_contract | evidence_sufficiency, structured_lookup, temporal_grounding, exhaustive_coverage, comparison_coverage, representative_overview | Pre-retrieval routing signal for what kind of evidence the query needs. |
| route | science_medicine, law_policy, history_geography, technology_computing, economics_finance, culture_society, general_commonsense | Semantic route/domain signal for retrieval policy and logging. |
| taxonomy | 23 fitz-gov taxonomy patterns | Failure/support pattern signal for audit and diagnostics. |
| scalars | evidence_sufficiency, query_evidence_alignment, answer_coverage, conflict_density, retrieval_retry_value, false_trustworthy_risk, evidence_failure_severity | Continuous governance signals for retry, ranking, and monitoring. |
| retrieval_action | answer_now, retrieve_more, broaden_search, resolve_conflict, ask_clarifying_question, structured_lookup | Retrieval policy hint for the next pipeline action. |
| gap_type | 12 evidence-gap labels | More specific reason why retrieval is insufficient or conflicting. |
| answerability_shape | direct_answer, synthesis_answer, set_answer, structured_reasoning | Query-only collapsed answer shape for retrieval planning. |
| retrieval_modality | unstructured_text, structured_table, code, configuration, log_trace, pdf_layout, mixed | Query-only hint for the preferred retrieval substrate. |
| retrieval_obligation | 31 V10 obligation labels | Query-only target/closure obligation for corpus-aware retrieval planning. |

Outputs

This is a custom multitask package, not a standard single-head AutoModelForSequenceClassification artifact. The recommended runtime is pyrrho.multitask_inference.PyrrhoMultiTaskPredictor from the pyrrho repository.

The predictor returns a structured object:

| Field | Meaning |
| --- | --- |
| governance.final_label | Final calibrated label after the TRUSTWORTHY threshold rule. |
| governance.raw_label | Highest-probability governance label before threshold calibration. |
| governance.probabilities | Probability distribution over ABSTAIN, DISPUTED, TRUSTWORTHY. |
| governance.threshold | TRUSTWORTHY probability threshold used by the package. |
| query_contract.final_label | Query-only contract prediction. |
| route.final_label | Query-only semantic route/domain prediction. |
| taxonomy.final_label | Query+evidence taxonomy-pattern prediction. |
| scalars | 7 bounded scalar governance signals. |
| retrieval_action.final_label | Retrieval policy hint. |
| gap_type.final_label | Evidence-gap type prediction. |
| answerability_shape.final_label | Query-only answer-shape prediction. |
| retrieval_modality.final_label | Query-only retrieval-modality prediction. |
| retrieval_obligation.final_label | Query-only retrieval-obligation prediction. |
| timing_ms | Local inference timing for the call. |
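
The field descriptions above mark some heads as query-only and others as reading both the query and the retrieved passages. The lookup below is a minimal sketch of that split, which a pipeline could use to decide which signals are meaningful before retrieval has run. `HEAD_SCOPE`, `heads_for`, and the scope strings are illustrative names, not part of the pyrrho package, and the scope given to `scalars`, `retrieval_action`, and `gap_type` is inferred from their descriptions rather than stated explicitly.

```python
# Illustrative lookup; the names and scope strings are assumptions, not pyrrho API.
QUERY_ONLY = "query_only"
QUERY_AND_EVIDENCE = "query_and_evidence"

HEAD_SCOPE = {
    "governance": QUERY_AND_EVIDENCE,
    "query_contract": QUERY_ONLY,
    "route": QUERY_ONLY,
    "taxonomy": QUERY_AND_EVIDENCE,
    "scalars": QUERY_AND_EVIDENCE,           # inferred from the signal names
    "retrieval_action": QUERY_AND_EVIDENCE,  # inferred from its description
    "gap_type": QUERY_AND_EVIDENCE,          # inferred from its description
    "answerability_shape": QUERY_ONLY,
    "retrieval_modality": QUERY_ONLY,
    "retrieval_obligation": QUERY_ONLY,
}


def heads_for(scope):
    """Return the head names with the given scope, sorted alphabetically."""
    return sorted(head for head, s in HEAD_SCOPE.items() if s == scope)


print(heads_for(QUERY_ONLY))
```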

Example normalized output shape:

{
  "schema_version": "pyrrho_multitask_prediction_v1",
  "governance": {
    "raw_label": "TRUSTWORTHY",
    "final_label": "TRUSTWORTHY",
    "used_threshold_fallback": false,
    "threshold": 0.34,
    "confidence": 0.84,
    "probabilities": {
      "ABSTAIN": 0.08,
      "DISPUTED": 0.08,
      "TRUSTWORTHY": 0.84
    }
  },
  "query_contract": {
    "final_label": "structured_lookup"
  },
  "route": {
    "final_label": "economics_finance"
  },
  "taxonomy": {
    "final_label": "direct_answer"
  },
  "retrieval_action": {
    "final_label": "answer_now"
  },
  "scalars": {
    "evidence_sufficiency": 0.91,
    "query_evidence_alignment": 0.88,
    "answer_coverage": 0.86,
    "conflict_density": 0.08,
    "retrieval_retry_value": 0.12,
    "false_trustworthy_risk": 0.09,
    "evidence_failure_severity": 0.07
  }
}
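
To make the relationship between `raw_label`, `threshold`, `final_label`, and `used_threshold_fallback` concrete, here is a minimal sketch of how such fields could be derived from the probabilities. Only the argmax step for `raw_label` follows from the field descriptions. The fallback rule is an assumption made for illustration: the package's actual TRUSTWORTHY threshold rule lives in the pyrrho runtime and may differ. `calibrate_governance` is a hypothetical helper name.

```python
def calibrate_governance(probabilities, threshold):
    """Return (raw_label, final_label, used_threshold_fallback).

    ASSUMPTION: the fallback below is one plausible reading of a
    "TRUSTWORTHY threshold rule", not the rule shipped in the package.
    """
    # raw_label is the highest-probability governance label.
    raw_label = max(probabilities, key=probabilities.get)
    final_label = raw_label
    used_fallback = False
    if raw_label == "TRUSTWORTHY" and probabilities["TRUSTWORTHY"] < threshold:
        # Hypothetical fallback: choose the stronger of the two cautious labels.
        cautious = {k: v for k, v in probabilities.items() if k != "TRUSTWORTHY"}
        final_label = max(cautious, key=cautious.get)
        used_fallback = True
    return raw_label, final_label, used_fallback


probs = {"ABSTAIN": 0.08, "DISPUTED": 0.08, "TRUSTWORTHY": 0.84}
print(calibrate_governance(probs, threshold=0.34))
# ('TRUSTWORTHY', 'TRUSTWORTHY', False)
```

With the example probabilities, TRUSTWORTHY clears the 0.34 threshold comfortably, so the raw and final labels agree and no fallback is recorded, matching the example output.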

The model does not generate answers, citations, source spans, retrieval results, or natural-language explanations. It classifies and scores the (query, retrieved_contexts) evidence state.

Intended Use

Use this model when a RAG or retrieval package needs fast local signals about:

  • whether retrieved evidence is enough to answer,
  • whether retrieved evidence conflicts,
  • what kind of evidence the query needs before retrieval,
  • which semantic/domain route the query belongs to,
  • which fitz-gov support/failure pattern is active,
  • what retrieval action and gap type the evidence state suggests,
  • whether retrieval should retry, broaden, or escalate.
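
The signals listed above can drive a simple gate in front of the generator. The sketch below maps a prediction dict to a coarse next step using only documented fields (`governance.final_label` and `scalars.retrieval_retry_value`). The function name `next_step`, the returned action strings, and the `retry_cutoff` default are hypothetical choices for illustration, not documented defaults.

```python
def next_step(result, retry_cutoff=0.5):
    """Map a prediction dict to a coarse pipeline action.

    retry_cutoff is a hypothetical tuning knob, not a documented default.
    """
    label = result["governance"]["final_label"]
    if label == "TRUSTWORTHY":
        return "generate"
    if label == "DISPUTED":
        return "surface_conflict"
    # ABSTAIN: retry retrieval only when the model scores a retry as worthwhile.
    retry_value = result.get("scalars", {}).get("retrieval_retry_value", 0.0)
    return "retry_retrieval" if retry_value >= retry_cutoff else "decline"


example = {
    "governance": {"final_label": "ABSTAIN"},
    "scalars": {"retrieval_retry_value": 0.7},
}
print(next_step(example))  # retry_retrieval
```

In a real pipeline the cutoff would be tuned on held-out traffic, and high-stakes paths would still route to human review.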

This model is not intended to write answers, verify facts outside the provided sources, replace a retriever, or replace human review in high-stakes settings.

Quick Start

Install the pyrrho package from the repository that contains this runtime, then load the package with the multitask predictor:

from huggingface_hub import snapshot_download

from pyrrho.multitask_inference import PyrrhoMultiTaskPredictor

# Download the package files from the Hugging Face Hub (cached locally).
MODEL_ID = "yafitzdev/pyrrho-nano-g5"
PACKAGE_DIR = snapshot_download(MODEL_ID)

# One user question plus the passages the retriever returned for it.
query = "Which quarterly report is relevant?"
contexts = [
    "The Q2 report lists revenue, churn, and roadmap changes.",
]

predictor = PyrrhoMultiTaskPredictor.from_pretrained(PACKAGE_DIR, device="cpu")
result = predictor.predict(query, contexts)

# Calibrated governance decision first, then the auxiliary head predictions.
print(result["governance"]["final_label"])
print(result["query_contract"]["final_label"])
print(result["route"]["final_label"])
print(result["taxonomy"]["final_label"])
print(result["retrieval_action"]["final_label"])
print(result["gap_type"]["final_label"])
print(result["retrieval_obligation"]["final_label"])
print(result["scalars"])
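
For logging or monitoring, it can be convenient to collapse the nested result into one flat row. The helper below is a sketch that assumes the result behaves like a plain dict shaped like the example output. `flatten_prediction` and the `scalar.` key prefix are hypothetical, not part of the pyrrho runtime.

```python
def flatten_prediction(result):
    """Collapse a nested prediction dict into a flat {column: value} row."""
    row = {}
    for head, value in result.items():
        # Keep the final label of every head that reports one.
        if isinstance(value, dict) and "final_label" in value:
            row[head] = value["final_label"]
    # Scalars have no final_label, so copy them under a prefixed key.
    for name, score in result.get("scalars", {}).items():
        row[f"scalar.{name}"] = score
    return row


sample = {
    "governance": {"final_label": "TRUSTWORTHY", "threshold": 0.34},
    "route": {"final_label": "economics_finance"},
    "scalars": {"conflict_density": 0.08},
}
print(flatten_prediction(sample))
```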

For local package testing:

python scripts/package_multitask_encoder.py verify --package-dir models/pyrrho-nano-g5 --device cpu

Release Selection

  • Seed: 7
  • TRUSTWORTHY threshold: 0.34
  • Selection reason: Seed 7 had the strongest held-out retrieval-obligation macro F1 of the three seeds while staying inside the governance release gates.

Held-Out Test Metrics

| Metric | Result |
| --- | --- |
| Governance accuracy | 0.9744 |
| False-TRUSTWORTHY rate | 0.0129 |
| Query-contract accuracy | 0.8875 |
| Query-contract macro F1 | 0.8624 |
| Route accuracy | 0.9346 |
| Route macro F1 | 0.9340 |
| Taxonomy accuracy | 0.7823 |
| Taxonomy macro F1 | 0.7929 |
| Scalar MAE | 0.0691 |
| Retrieval-action macro F1 | 0.8680 |
| Gap-type macro F1 | 0.8333 |
| Answerability-shape macro F1 | 0.9287 |
| Retrieval-modality macro F1 | 0.8601 |
| Retrieval-obligation macro F1 | 0.8313 |

Three-seed headline from the local release summary:

| Metric | Mean +/- std |
| --- | --- |
| Governance accuracy | 97.39 +/- 0.13% |
| False-TRUSTWORTHY rate | 1.09 +/- 0.25% |
| Query-contract macro F1 | 86.27 +/- 0.14% |
| Route accuracy | 93.65 +/- 0.14% |
| Taxonomy accuracy | 78.42 +/- 0.22% |
| Scalar MAE | 0.0689 +/- 0.0002 |
| Retrieval-action macro F1 | 86.67 +/- 0.18% |
| Gap-type macro F1 | 83.47 +/- 0.18% |
| Answerability-shape macro F1 | 93.27 +/- 0.32% |
| Retrieval-modality macro F1 | 85.98 +/- 0.10% |
| Retrieval-obligation macro F1 | 82.48 +/- 0.47% |
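
The held-out table reports fractions while the three-seed table reports percentages, so comparing them takes a unit conversion. The sketch below copies the percentage-based metrics that appear in both tables and expresses each held-out value as a distance from the three-seed mean, in units of the reported standard deviation. Scalar MAE is left out because it is not a percentage. The dictionary and function names are illustrative.

```python
# Held-out test values (fractions) copied from the first table.
HELD_OUT = {
    "governance_accuracy": 0.9744,
    "false_trustworthy_rate": 0.0129,
    "query_contract_macro_f1": 0.8624,
    "route_accuracy": 0.9346,
    "taxonomy_accuracy": 0.7823,
    "retrieval_action_macro_f1": 0.8680,
    "gap_type_macro_f1": 0.8333,
    "answerability_shape_macro_f1": 0.9287,
    "retrieval_modality_macro_f1": 0.8601,
    "retrieval_obligation_macro_f1": 0.8313,
}

# Three-seed (mean, std) values in percent, copied from the second table.
THREE_SEED = {
    "governance_accuracy": (97.39, 0.13),
    "false_trustworthy_rate": (1.09, 0.25),
    "query_contract_macro_f1": (86.27, 0.14),
    "route_accuracy": (93.65, 0.14),
    "taxonomy_accuracy": (78.42, 0.22),
    "retrieval_action_macro_f1": (86.67, 0.18),
    "gap_type_macro_f1": (83.47, 0.18),
    "answerability_shape_macro_f1": (93.27, 0.32),
    "retrieval_modality_macro_f1": (85.98, 0.10),
    "retrieval_obligation_macro_f1": (82.48, 0.47),
}


def deviation_in_std(metric):
    """Held-out value minus the three-seed mean, divided by the reported std."""
    mean, std = THREE_SEED[metric]
    return (HELD_OUT[metric] * 100.0 - mean) / std


deviations = {name: deviation_in_std(name) for name in HELD_OUT}
for name, z in sorted(deviations.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:32s} {z:+.2f}")
```

On these numbers, retrieval-obligation macro F1 sits furthest above its three-seed mean, which is consistent with the stated reason for selecting seed 7.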

Training Data

Trained on the fitz-gov V10.0.0 row set: the published V9.0.0 rows plus 12,748 V10 target-10 retrieval-planning rows. The V10 block combines 6,058 initial 5/5/5 rows with 6,690 target-10 continuation rows. After repair, blind-label QA scored 12,748/12,748 agreement, with 0 triage, 0 missing, 0 invalid, and 0 error rows.

The g5 training prep used local splits of train=42,826 / validation=5,372 / test=5,305. The public fitz-gov V10.0.0 dataset split manifest covers the same 53,503 rows with a different partition: train=42,814 / validation=5,346 / test=5,343. The release package records the local training config in training_config.yaml and detailed metrics in reports/summary.json.
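
The row counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this section. The V9.0.0 row count it prints is implied by those figures (total rows minus the V10 block) rather than stated directly, and the variable names are illustrative.

```python
V10_INITIAL = 6_058        # initial 5/5/5 rows
V10_CONTINUATION = 6_690   # target-10 continuation rows
V10_BLOCK = 12_748         # V10 retrieval-planning rows in total
TOTAL_ROWS = 53_503

LOCAL_SPLIT = {"train": 42_826, "validation": 5_372, "test": 5_305}
PUBLIC_SPLIT = {"train": 42_814, "validation": 5_346, "test": 5_343}

# The two V10 sub-blocks add up to the full V10 block.
assert V10_INITIAL + V10_CONTINUATION == V10_BLOCK

# Both partitions cover the same number of rows.
assert sum(LOCAL_SPLIT.values()) == TOTAL_ROWS
assert sum(PUBLIC_SPLIT.values()) == TOTAL_ROWS

# Implied size of the published V9.0.0 portion (derived, not quoted).
v9_rows = TOTAL_ROWS - V10_BLOCK
print(v9_rows)  # 40755
```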

Limitations

  • This is a governance and routing co-processor, not a generator.
  • The auxiliary heads are useful signals, not ground-truth explanations.
  • Query-contract and route predictions are query-only and can be wrong when the user query is underspecified.
  • Taxonomy and scalar outputs are trained on fitz-gov labels/signals and should be treated as decision-support metadata, not universal factual judgments.
  • The retrieval-obligation head is trained only on the V10 rows; V6-V9 rows are masked for that head.
  • Retrieval obligation is the main remaining weak head. Low-support or fine-grained obligations such as row-key lookup, column-value lookup, stale-row versioning, and mixed-modality obligations should be treated as planning hints, not hard guarantees.
  • The license is CC BY-NC 4.0. Commercial use requires a separate license.