ORCA6 v0.1-rc1

ORCA6 is an orchestration-advisor model project focused on AI tool selection, workflow architecture, RAG design, model-routing tradeoffs, and developer automation strategy.

Current best local adapter:

qwen3_14b_orca_refusal_smoke/

This checkpoint is a guarded local release candidate, not a general-purpose, public-quality model. Use it only with the source-packet runtime guard documented in the evaluation report, runbook, and local adapter CLI.

Intended Use

Recommend orchestration patterns for AI developer workflows.
Compare tools such as n8n, LangGraph, LiteLLM, Qdrant, Langfuse, Promptfoo, MCP servers, local inference stacks, and related infrastructure.
Provide architecture tradeoffs, implementation plans, and conservative next steps.

Not Intended For

Executing code or tools directly.
Legal, medical, financial, or safety-critical decisions.
General code-completion benchmarks unrelated to orchestration.

Training Data

SFT train rows: 41
SFT validation rows: 3
DPO preference rows: 30
Preference source: auto-graded bootstrap preferences
Corpus source: GitHub documentation chunks from the ORCA6 pilot retrieval set.
Final SFT source mix: {"graded_preference": 27, "grounded_sft_builder": 8, "refusal_sft_builder": 6}
Final answer word count: min=102, max=239, avg=187.05
Grounded SFT rows include retrieved-evidence citation examples and refusal hard negatives for empty evidence, unsupported claims, high-risk automation, medical-record access, and certification/compliance claims.

Training Dataset Audit

| Dataset | Rows | Sources | Avg. answer words |
| --- | --- | --- | --- |
| `data/training_sft.jsonl` | 27 | `{"graded_preference": 27}` | 202.22 |
| `data/validation_sft.jsonl` | 3 | `{"graded_preference": 3}` | 213.0 |
| `data/training_sft_plus_grounded.jsonl` | 35 | `{"graded_preference": 27, "grounded_sft_builder": 8}` | 198.37 |
| `data/refusal_sft_hard_negatives.jsonl` | 6 | `{"refusal_sft_builder": 6}` | 121.0 |
| `data/training_sft_plus_grounded_refusals.jsonl` | 41 | `{"graded_preference": 27, "grounded_sft_builder": 8, "refusal_sft_builder": 6}` | 187.05 |
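
An audit like the one above can be reproduced with a few lines of Python. This is a minimal sketch, not the project's audit script: the field names `source` and `answer` are assumptions about the JSONL row schema and may need adjusting.

```python
import json
from collections import Counter
from pathlib import Path


def audit_jsonl(path, source_key="source", answer_key="answer"):
    """Return row count, per-source counts, and mean answer word count.

    `source_key` and `answer_key` are assumed field names, not the
    confirmed ORCA6 schema; adjust them to match the real rows.
    """
    rows = [
        json.loads(line)
        for line in Path(path).read_text(encoding="utf-8").splitlines()
        if line.strip()  # skip blank lines
    ]
    sources = Counter(row.get(source_key, "unknown") for row in rows)
    # Word count is a plain whitespace split of the answer text.
    word_counts = [len(str(row.get(answer_key, "")).split()) for row in rows]
    avg_words = round(sum(word_counts) / len(word_counts), 2) if word_counts else 0.0
    return {
        "rows": len(rows),
        "sources": dict(sources),
        "avg_answer_words": avg_words,
    }
```

Calling `audit_jsonl("data/training_sft_plus_grounded_refusals.jsonl")` would return one table row's worth of statistics, provided the assumed field names match.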

Models and Infrastructure Used

| Role | Model / System | Notes |
| --- | --- | --- |
| Initial smoke base | `Qwen/Qwen2.5-0.5B-Instruct` | Small-model SFT smoke path validation. |
| Attempted 14B base | `Qwen/Qwen2.5-14B-Instruct` | Initial fetch was too slow/stalled; not the release base. |
| Release base | `Qwen/Qwen3-14B` | Official adapter base and Hugging Face `base_model`. |
| Local downloaded base path | `.cache/orca6-qwen3-14b-download` | Local path used by current helper defaults; historical run used `/tmp/orca6-qwen3-14b-download`. |
| Preference/judge/bootstrap generation | `qwen3-coder:30b` via Ollama | Used for local answer generation / auto-graded bootstrap preference data. |
| Embedding model | `nomic-embed-text:latest` via Ollama | Used to embed pilot corpus chunks into Qdrant. |
| Fine-tuning stack | Unsloth + TRL SFTTrainer + PEFT LoRA / QLoRA | 4-bit training profile on local RTX 3090. |

Training Rounds

| Round | Base model | Adapter output | Train rows | Validation rows | Profile | Final losses / steps | Eval / pass result |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Small-model smoke | `Qwen/Qwen2.5-0.5B-Instruct` | `qwen_finetuned_v0_smoke2/` | 27 | 3 | smoke SFT | `eval_loss=1.762`, `train_loss=2.260`, 14 steps | Pipeline smoke only |
| Qwen3 14B smoke | `Qwen/Qwen3-14B` | `qwen3_14b_orca_smoke/` | 27 | 3 | 14B LoRA smoke | `eval_loss=1.264`, `train_loss=1.537`, 14 steps | 8 held-out prompts generated; quality not release-ready |
| Grounded fit-check | `Qwen/Qwen3-14B` | `qwen3_14b_orca_grounded_smoke/` | 35 | 3 | `seq_length=512`, `lora_r=16`, `lora_alpha=32`, `lora_dropout=0`, grad accum 2 | `eval_loss=1.211`, `train_loss=1.592`, 36 steps | Grounded eval 11/12 = 91.7% |
| Refusal fit-check / release adapter | `Qwen/Qwen3-14B` | `qwen3_14b_orca_refusal_smoke/` | 41 | 3 | `seq_length=512`, `lora_r=16`, `lora_alpha=32`, `lora_dropout=0`, grad accum 2 | `eval_loss=1.206`, `train_loss=1.510`, 42 steps | Guarded eval 12/12 = 100.0%; expanded release eval 54/54 = 100.0% |

Earlier grounded-profile attempts at 2048 tokens with LoRA rank 64 and at 1024 tokens with LoRA rank 32 ran out of CUDA memory (OOM) on the longer grounded examples. The release fit-check profile therefore settled on a 512-token sequence length and LoRA rank 16, which fit on the RTX 3090.

Retrieval and Vector Database Stack

The model was trained and evaluated around a source-packet workflow rather than free-form citation generation.

| Component | Setting |
| --- | --- |
| Vector database used for active retrieval | Qdrant |
| Qdrant collection | `orca6_pilot` |
| Embedding model | `nomic-embed-text:latest` through Ollama |
| Embedding dimension | 768 |
| Vector distance | Cosine |
| Lexical retrieval | In-process BM25 over `data/pilot_orchestration_chunks.jsonl` |
| Rank fusion | Reciprocal-rank fusion plus exact-match/domain-cue boosts |
| Vector DBs represented in corpus/tool coverage | Qdrant, Chroma, Weaviate, pgvector |
| Other retrieval/RAG tools represented | LlamaIndex, Ragas, LangGraph, Langfuse, LangSmith, Promptfoo, LiteLLM, Ollama, llama.cpp, vLLM |
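
The rank-fusion step can be illustrated with a short sketch of reciprocal-rank fusion (RRF) over a vector ranking and a BM25 ranking. This is not the ORCA6 retriever: the constant `k=60` is a commonly used default rather than a documented project setting, and modelling the exact-match/domain-cue boosts as additive bonuses is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, boosts=None):
    """Fuse ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. `boosts` maps doc_id -> additive bonus; this
    stands in for exact-match/domain-cue boosts and is an assumption.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    for doc_id, bonus in (boosts or {}).items():
        if doc_id in scores:  # only boost documents that were retrieved
            scores[doc_id] += bonus
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]  # e.g. cosine-similarity order from Qdrant
bm25_hits = ["b", "c", "a"]    # e.g. lexical order from BM25
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF uses only rank positions, so cosine similarities and BM25 scores never need to be put on a common scale before fusing.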

Evaluation

Latest recorded retrieval metrics:

{
  "queries": 20,
  "calibrated_pass_at_1": 0.7,
  "calibrated_pass_at_3": 1.0,
  "calibrated_pass_at_5": 1.0,
  "hit_at_1": 1.0,
  "hit_at_3": 1.0,
  "hit_at_5": 1.0,
  "all_expected_at_3": 0.95,
  "all_expected_at_5": 1.0,
  "all_expected_at_10": 1.0
}
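
For orientation, the `hit_at_k` and `all_expected_at_k` figures can be computed as in the sketch below. The definitions used here (any expected chunk in the top k, and every expected chunk in the top k) are the conventional readings of those names and are assumptions about the ORCA6 evaluator; `calibrated_pass_at_k` is left out because its definition is not given here.

```python
def hit_at_k(retrieved, expected, k):
    """True if at least one expected ID appears in the top-k results."""
    return any(doc_id in expected for doc_id in retrieved[:k])


def all_expected_at_k(retrieved, expected, k):
    """True if every expected ID appears in the top-k results."""
    return set(expected) <= set(retrieved[:k])


def summarize_retrieval(results, ks=(1, 3, 5)):
    """Aggregate per-query results into rates.

    `results` is a list of (retrieved_ids, expected_ids) pairs, a
    hypothetical input shape chosen for this example.
    """
    n = len(results)
    summary = {"queries": n}
    for k in ks:
        hits = sum(hit_at_k(r, e, k) for r, e in results)
        full = sum(all_expected_at_k(r, e, k) for r, e in results)
        summary[f"hit_at_{k}"] = hits / n if n else 0.0
        summary[f"all_expected_at_{k}"] = full / n if n else 0.0
    return summary
```

Under these definitions `all_expected_at_k` can never exceed `hit_at_k` for queries with at least one expected chunk, which matches the recorded pattern of `hit_at_3` at 1.0 and `all_expected_at_3` at 0.95.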

Latest guarded grounded-answer eval:

{
  "outputs": "evals/qwen3_14b_orca_refusal_guarded_eval_outputs.jsonl",
  "total": 12,
  "passed": 12,
  "pass_rate": 1.0,
  "by_type": {
    "source_packet": {
      "passed": 8,
      "total": 8
    },
    "hard_negative": {
      "passed": 4,
      "total": 4
    }
  }
}

Expanded v0.1-rc1 grounded release eval:

{
  "total": 54,
  "passed": 54,
  "pass_rate": 1.0,
  "by_type": {
    "source_packet": {
      "passed": 50,
      "total": 50
    },
    "hard_negative": {
      "passed": 4,
      "total": 4
    }
  },
  "outputs": "evals/qwen3_14b_orca_refusal_release_grounded_outputs.jsonl"
}
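
A summary in this shape can be derived from per-prompt records as sketched below. The record fields `type` and `passed` are assumptions about the contents of the outputs JSONL files, not a documented schema.

```python
from collections import defaultdict


def summarize_eval(records):
    """Build a total/passed/pass_rate/by_type summary.

    Each record is assumed to be a dict with a `type` string (for example
    "source_packet" or "hard_negative") and a boolean `passed`.
    """
    by_type = defaultdict(lambda: {"passed": 0, "total": 0})
    for rec in records:
        bucket = by_type[rec["type"]]
        bucket["total"] += 1
        bucket["passed"] += int(bool(rec["passed"]))
    total = sum(b["total"] for b in by_type.values())
    passed = sum(b["passed"] for b in by_type.values())
    return {
        "total": total,
        "passed": passed,
        "pass_rate": round(passed / total, 3) if total else 0.0,
        "by_type": dict(by_type),
    }
```

Keeping the per-type breakdown next to the overall rate makes it visible that the hard-negative slice contains only 4 cases, so a single failure there would move that slice's rate by 25 points.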

Evaluation Matrix

| Evaluation | Passed | Total | Pass rate | Notes |
| --- | --- | --- | --- | --- |
| Retrieval calibrated pass@1 | 14 | 20 | 70.0% | Smoke retrieval exact/semantic check |
| Retrieval calibrated pass@3 | 20 | 20 | 100.0% | Re-check matched recorded metrics |
| Retrieval calibrated pass@5 | 20 | 20 | 100.0% | Re-check matched recorded metrics |
| Retrieval all-expected@3 | 19 | 20 | 95.0% | Multi-expected query coverage |
| Retrieval all-expected@5 | 20 | 20 | 100.0% | Multi-expected query coverage |
| Grounded adapter, unguarded | 11 | 12 | 91.7% | Pre-refusal grounded adapter; failed one empty-evidence hard negative |
| Refusal adapter, unguarded | 11 | 12 | 91.7% | Still failed one empty-evidence high-risk citation case |
| Refusal adapter + runtime guard | 12 | 12 | 100.0% | 8/8 source-packet, 4/4 hard-negative |
| Expanded release grounded eval | 54 | 54 | 100.0% | 50/50 source-packet, 4/4 hard-negative |

Without the runtime guard, the refusal adapter passed 11/12. The one remaining failure was an empty-evidence, high-risk payment-automation prompt for which the model invented a source citation. The release gate therefore requires the runtime source-packet guard.
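
The failure mode described above, a citation with no matching retrieved source, can be caught by a post-generation check along the following lines. This is an illustrative sketch only, not the ORCA6 runtime guard: the `[S1]`-style citation format and the function name are assumptions.

```python
import re

# Assumed citation format: bracketed source IDs such as [S1] or [S12].
CITATION_RE = re.compile(r"\[(S\d+)\]")


def check_citations(answer, packet_ids):
    """Flag citations that do not correspond to a retrieved source.

    `packet_ids` holds the source IDs present in the source packet.
    With an empty packet, any citation at all counts as invented.
    """
    allowed = set(packet_ids)
    cited = CITATION_RE.findall(answer)
    invented = sorted({c for c in cited if c not in allowed})
    return {"ok": not invented, "invented": invented}
```

For example, `check_citations("Automate refunds as in [S1].", [])` reports `S1` as invented, at which point a caller could replace the answer with a refusal instead of returning it.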

Artifact Statistics

| Artifact | Path | Size / count / contents |
| --- | --- | --- |
| Published adapter package | `adapter/` on Hugging Face | LoRA adapter, tokenizer, chat template, and config; merged shards/runs/checkpoints excluded from upload |
| Adapter weights | `qwen3_14b_orca_refusal_smoke/adapter_model.safetensors` | 256,976,504 bytes |
| Local adapter tree | `qwen3_14b_orca_refusal_smoke` | 27.77 GB including local merged model artifacts under the ignored working tree |
| Local merged model | `qwen3_14b_orca_refusal_smoke/merged` | 27.52 GB; 6 safetensors shards |
| Local Q8_0 GGUF | `release/gguf/orca6-qwen3-14b-refusal-q8_0.gguf` | 14.62 GB |
| Release manifest | `release/release_manifest.json` | 126 tracked release artifacts |
| GitHub release candidate assets | `v0.1-rc1` | 109/109 expected assets attached |
| Hugging Face model repo | `veroarc/ORCA6` | Adapter, tokenizer, model card, eval reports, release notes |
| Hugging Face feedback Space | `veroarc/orca6-feedback` | Manual feedback intake UI |

Release artifact checksums are recorded in:

release/release_manifest.json
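
Downloaded artifacts can be checked against recorded checksums with a short script. The sketch below assumes SHA-256 digests and a manifest shaped like `{"artifacts": [{"path": "...", "sha256": "..."}]}`; both the hash algorithm and that layout are assumptions, so inspect the real manifest before relying on it.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, root="."):
    """Return the paths that are missing or whose digest differs.

    Assumes a hypothetical manifest layout with an "artifacts" list of
    {"path", "sha256"} entries; adapt the keys to the actual file.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    mismatches = []
    for entry in manifest["artifacts"]:
        target = Path(root) / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            mismatches.append(entry["path"])
    return mismatches
```

An empty list from `verify_manifest("release/release_manifest.json")` would mean every listed file is present and matches its recorded digest.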

Limitations

The current dataset is small and should be treated as a v0 bootstrap.
Auto-graded preferences are useful for pipeline validation but should be replaced or supplemented with human preference labels.
Recommendations are only as current as the indexed source corpus.
The adapter is not intended for unguarded citation-heavy answering. Use a runtime prompt guard that forbids invented source IDs, URLs, integrations, certifications, guarantees, and high-risk actions without retrieved evidence.
The model must not execute tools or approve irreversible actions.

Release Notes

Generated: 2026-06-26
Version: v0.1-rc1
Base model target: Qwen/Qwen3-14B
Adapter target: qwen3_14b_orca_refusal_smoke/
