# Quotebound 27B

A 27B LoRA adapter for evidence-faithful reasoning over closed packets of source text.
Quotebound 27B is the standalone model release from the
Evidence-Faithful Reasoning project. It is trained to read a bounded evidence
packet, identify the supporting units, copy exact quotes, and abstain with
`Insufficient evidence.` when the packet does not justify an answer.
The project asks a stricter question than "did the model get the answer right?" It asks whether the answer is recoverably grounded in the supplied text.
On a fresh 36-task public holdout, Quotebound 27B improves task accuracy,
evidence F1, and quote F1 over the prior bridge model. The largest raw gain is
quote faithfulness: 0.3343 -> 0.6815.
## Result snapshot
| Question | Answer |
|---|---|
| What ships here? | A PEFT/LoRA adapter for `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`. |
| What changed inside the model? | Raw quote F1 roughly doubled on the fresh public holdout: 0.3343 -> 0.6815. |
| Best standalone-system row on that holdout | Quotebound + `deterministic_v3`: task 0.8889, strict 0.5833, evidence F1 0.9093, quote F1 0.9093. |
| Output reliability | Zero invalid outputs across every reported evaluation surface. |
| Important boundary | Perfect `probe_v0` belongs to the benchmark-winning hybrid stack, not to this adapter alone. |
## Why this model exists
Reasoning-tuned models can sound structured while grounding badly: they may answer correctly but cite the wrong evidence, corrupt a quote, or keep going when the packet is actually insufficient.
Quotebound 27B is trained for a narrower, auditable behavior:
- choose the smallest sufficient evidence units,
- quote those units verbatim,
- answer only from those units,
- refuse cleanly when the packet runs out.
Correctness alone is not credited. The model is meant for settings where a user needs the answer and the support to survive inspection together.
## Quick start
Install the usual Transformers + PEFT stack, then load the base model and attach the adapter:
```bash
pip install -U transformers peft accelerate bitsandbytes
```
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2"
adapter_id = "darcar0/quotebound-27b"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```
The base is a 27B-parameter model. Use the quantization and serving setup your
hardware requires; 4-bit loading with bitsandbytes is a practical inference
path on constrained GPUs.
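As a concrete sketch of that 4-bit path (assumptions: a CUDA-capable GPU and the packages installed above; this is an illustrative setup, not the project's official serving configuration):

```python
# Sketch: 4-bit NF4 loading via bitsandbytes for constrained GPUs.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2"
adapter_id = "darcar0/quotebound-27b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4 tends to preserve quality well
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,       # small extra memory saving
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```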
## Model details
| Field | Value |
|---|---|
| Adapter | `darcar0/quotebound-27b` |
| Base model | `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2` |
| Artifact type | LoRA / PEFT adapter |
| Primary behavior | Closed-packet grounded QA, claim verification, exact quote attribution, and abstention |
| Output style | JSON with answer, evidence IDs, verbatim quotes, and short justification |
| Training sources | Public FEVER-style verify-claim data, public HotpotQA-style grounded-QA data, and project-local packet scaffolding derived from those sources |
| License | Apache 2.0 |
## Prompt contract
The model is trained for an evidence-first prompt that makes the answer subordinate to the cited text. A minimal version:
```text
You are answering from a bounded evidence packet only.

Work in this order:
1. Identify the smallest set of packet units that matters.
2. Copy exact quote(s) from those units.
3. Only then give the final answer.

Rules:
- No outside facts.
- Return valid JSON only.
- Every quote must be a verbatim substring of the cited unit.
- Do not paraphrase, ellipsize, or stitch quotes.
- If the packet is insufficient, the answer field must be exactly
  "Insufficient evidence."
```
Expected output shape:
```json
{
  "task_id": "<task id>",
  "label": "support|contradict|insufficient|null",
  "answer": "<one-sentence answer>",
  "evidence_ids": ["unit_id_1", "unit_id_2"],
  "quotes": [
    {"unit_id": "unit_id_1", "quote": "<exact quote>"}
  ],
  "justification": "<one short sentence tied to the cited evidence>"
}
```
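The mechanical parts of this contract (valid JSON, required fields, verbatim quotes) can be checked programmatically. A minimal validator sketch, assuming packets map unit IDs to unit text; the function and schema names here are illustrative, not part of the release:

```python
import json

def validate_output(raw: str, packet: dict) -> list:
    """Return contract violations for one model output.

    `packet` maps unit_id -> unit text (the packet schema is an assumption).
    An empty list means the mechanical parts of the contract hold.
    """
    try:
        out = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]

    problems = []
    # All fields from the expected output shape must be present.
    for key in ("task_id", "label", "answer", "evidence_ids", "quotes", "justification"):
        if key not in out:
            problems.append(f"missing field: {key}")

    # Every quote must be a verbatim substring of the unit it cites.
    for q in out.get("quotes", []):
        unit = packet.get(q.get("unit_id", ""))
        if unit is None:
            problems.append(f"quote cites unknown unit: {q.get('unit_id')}")
        elif q.get("quote", "") not in unit:
            problems.append(f"quote not a verbatim substring of {q['unit_id']}")
    return problems
```

For example, an output quoting `"signed in 1649"` against a unit that reads `"...signed in 1648..."` fails the verbatim check, while an exact copy passes.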
## Evaluation

### Fresh 36-task mixed public holdout
The main standalone comparison uses a fresh 36-task public holdout: 18 FEVER
verify-claim tasks and 18 HotpotQA grounded-QA tasks. Source rows were
de-duplicated against training, dev, and probe_v0 rows.
| Stack | Task | Strict | Evidence F1 | Quote F1 |
|---|---|---|---|---|
| Bridge raw | 0.8611 | 0.2222 | 0.8815 | 0.3343 |
| Quotebound raw | 0.8889 | 0.4444 | 0.9093 | 0.6815 |
| Bridge + `deterministic_v3` | 0.8611 | 0.5833 | 0.8815 | 0.8815 |
| Quotebound + `deterministic_v3` | 0.8889 | 0.5833 | 0.9093 | 0.9093 |
How to read this table:

- Raw rows measure the model outputs before quote repair.
- `deterministic_v3` rows add the packet-local quote normalizer from the project repository.
- Quotebound improves task accuracy, evidence F1, and quote F1 in both raw and normalized form; it also ties normalized strict success.
- The largest model-side gain is raw quote faithfulness, from 0.3343 to 0.6815.
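Evidence F1 and quote F1 in these tables are per-task overlap scores between predicted and gold items. A common set-based formulation is sketched below; the project's exact scoring (e.g. how partial quote matches are credited) may differ:

```python
def set_f1(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over two sets.

    Convention assumed here: both sets empty -> perfect (1.0);
    exactly one empty -> 0.0.
    """
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)            # true positives: items in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Example: model cites {u1, u3}, gold evidence is {u1, u2}.
# precision = 1/2, recall = 1/2, so F1 = 0.5.
```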
### Fixed dev triage slice
| Stack | Task | Strict | Evidence F1 | Quote F1 |
|---|---|---|---|---|
| Quotebound + `deterministic_v3` | 1.0000 | 0.6190 | 0.8320 | 0.7095 |
### Untouched 104-task HotpotQA shadow slice
On a 104-task HotpotQA shadow slice that was never touched during selection,
Quotebound raw improved quote-faithful behavior over the prior bridge model.
Quotebound plus `deterministic_v3` matched bridge plus `deterministic_v3` at
the system level. This surface is reported as a narrative parity result because
the freeze memo does not publish per-metric cells for it.
## Release architecture
The project ends in two finished results that are intentionally reported separately:
| Result | What it is | What it proves |
|---|---|---|
| Quotebound 27B | The downloadable LoRA adapter on this page. | More of the evidence-faithful behavior moved into the model itself, with gains across non-`probe_v0` surfaces. |
| Benchmark-winning hybrid stack | A trained bridge checkpoint plus the `deterministic_v3` packet-local quote normalizer. | The full system clears every gate of the strict contract on frozen held-out `probe_v0`. |
These are connected, but they are not the same claim. Quotebound 27B is the
standalone model release. The hybrid stack is the benchmark-facing winner.
Perfect `probe_v0` belongs to the hybrid stack, not to this adapter alone.
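The `deterministic_v3` normalizer itself is published in the project repository, not here. As an illustration of what packet-local quote repair can look like, the sketch below snaps a near-miss quote to the closest verbatim window of its cited unit; the matching strategy and threshold are assumptions, not the project's algorithm:

```python
import difflib

def snap_quote(quote: str, unit_text: str, min_ratio: float = 0.8):
    """Return the closest verbatim substring of `unit_text` to `quote`,
    or None if nothing similar enough exists.

    Already-verbatim quotes are returned unchanged, so repair is a no-op
    on outputs that satisfy the contract.
    """
    if quote in unit_text:
        return quote
    n = len(quote)
    best = None
    # Slide a quote-length window over the unit and keep the best match.
    for start in range(0, max(1, len(unit_text) - n + 1)):
        window = unit_text[start:start + n]
        ratio = difflib.SequenceMatcher(None, window, quote).ratio()
        if best is None or ratio > best[0]:
            best = (ratio, window)
    if best and best[0] >= min_ratio:
        return best[1]
    return None
```

Because every candidate window is taken from the unit text, any repaired quote is guaranteed to be a verbatim substring of the cited unit; quotes with no close match are dropped rather than invented.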
## Intended use
Use this release when answers must stay inside a fixed body of supplied text:
- bounded document QA with explicit evidence requirements,
- claim verification over closed packets of source text,
- policy, compliance, contract, and internal-document review where answers need source-text support,
- research on evidence-faithful reasoning, quote fidelity, and abstention.
## Limitations
- This is not a general chatbot. Open-domain QA, open chat, and free-form generation outside the closed-packet setup are not characterized.
- The downloadable artifact is the LoRA adapter only; the 27B base model is required.
- `deterministic_v3` is not shipped as part of this model repo. It is a separate packet-local post-processing step in the project repository.
- Perfect `probe_v0` belongs to the benchmark-winning hybrid stack, not to this adapter alone.
- Raw item-level contents of the frozen held-out probe are intentionally not published; the held-out gate has to stay closed to remain meaningful.
- For high-stakes use, treat the model as an evidence-grounding component that still requires human review and application-specific validation.
## Read next
- Technical note - full method, release boundary, and result discussion.
- Frozen benchmark progression chart
- Fresh holdout comparison chart
## Citation
```bibtex
@misc{quotebound_27b_2026,
  title        = {Quotebound 27B: Evidence-Faithful Reasoning Standalone Release},
  author       = {{darcar0}},
  year         = {2026},
  howpublished = {Hugging Face model release},
  url          = {https://huggingface.co/darcar0/quotebound-27b}
}
```
## References
- Base model: `Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2`
- Datasets: `fever/fever`, `hotpotqa/hotpot_qa`
- Technical note: `technical_note_evidence_faithful_reasoning.md`