Instructions to use sammiset/finops-resolver with libraries and local apps.

Libraries

How to use sammiset/finops-resolver with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="sammiset/finops-resolver",
	filename="models_gguf/qwen3-8b.Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use sammiset/finops-resolver with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sammiset/finops-resolver:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sammiset/finops-resolver:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf sammiset/finops-resolver:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf sammiset/finops-resolver:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sammiset/finops-resolver:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf sammiset/finops-resolver:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sammiset/finops-resolver:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sammiset/finops-resolver:Q4_K_M

Use Docker

docker model run hf.co/sammiset/finops-resolver:Q4_K_M

vLLM

How to use sammiset/finops-resolver with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sammiset/finops-resolver"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sammiset/finops-resolver",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use sammiset/finops-resolver with Ollama:
```
ollama run hf.co/sammiset/finops-resolver:Q4_K_M
```

Unsloth Studio

How to use sammiset/finops-resolver with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sammiset/finops-resolver to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sammiset/finops-resolver to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sammiset/finops-resolver to start chatting

Pi

How to use sammiset/finops-resolver with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sammiset/finops-resolver:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sammiset/finops-resolver:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use sammiset/finops-resolver with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf sammiset/finops-resolver:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sammiset/finops-resolver:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use sammiset/finops-resolver with Docker Model Runner:
```
docker model run hf.co/sammiset/finops-resolver:Q4_K_M
```

Lemonade

How to use sammiset/finops-resolver with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull sammiset/finops-resolver:Q4_K_M

Run and chat with the model

lemonade run user.finops-resolver-Q4_K_M

List all available models

lemonade list

finops-resolver

Fine-tuned Qwen3-8B model that recommends ordered resolution sequences for post-trade CNS settlement fails. Given a triage classification, inventory snapshot, and pending FTR list, the model produces a step-by-step resolution plan with mathematical coverage tracking and a plain-English narrative.

This is Stage 2 in a two-model pipeline:

Stage	Model	Task	GGUF Size
1	finops-triage (Qwen3.5-9B)	Classify, score, and route fails	~5.7 GB
2	finops-resolver (Qwen3-8B)	Recommend resolution sequence	~5.0 GB

All training data is synthetic. All resolution logic is grounded in a curated knowledge base of post-trade settlement rules.

Architecture

Base model: unsloth/Qwen3-8B (text-only, no VL overhead)

Fine-tuning method: QLoRA via Unsloth + trl SFTTrainer

Parameter	Value
LoRA rank (r)	16
LoRA alpha	32
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
LoRA dropout	0
Quantization	4-bit NF4 (double quant)
Gradient checkpointing	Unsloth optimized
Max sequence length	5,120 tokens
Effective batch size	8 (2 per device x 4 accumulation)
Epochs	3
Learning rate	2e-4 (cosine schedule, 90 warmup steps)
Precision	bf16
Packing	Enabled (SFTTrainer)

Training data: 8,000 train / 2,000 eval examples in ChatML format. Each example includes a structured <think>...</think> reasoning trace in the assistant turn, followed by the resolution JSON. The thinking trace walks through the full resolution logic: problem statement, inventory check, FTR ranking, chase walkthrough, fallback reasoning, Reg SHO timeline check, gridlock evaluation, and escalation decision.

Trainable parameters: 43.6M of 8.2B (0.53%)
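The 43.6M figure can be reproduced from the LoRA settings in the table above. This is a minimal arithmetic sketch, assuming the layer dimensions published in the public Qwen3-8B config (hidden size 4096, MLP width 12288, 36 decoder layers, 32 query heads and 8 key/value heads of dimension 128); those values are not stated in this card. Each adapted linear layer with input width `in` and output width `out` adds `r * (in + out)` parameters (the A and B matrices).

```python
# Assumed Qwen3-8B dimensions (taken from the public model config, not this card)
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 12288   # intermediate_size (MLP width)
LAYERS = 36            # num_hidden_layers
HEAD_DIM = 128
N_HEADS = 32           # query heads
N_KV_HEADS = 8         # key/value heads (grouped-query attention)
RANK = 16              # LoRA rank from the table above

# (in_features, out_features) of each LoRA target module in one decoder layer
MODULES = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Trainable LoRA parameters: A (rank x in) + B (out x rank) per module, per layer."""
    per_layer = sum(rank * (fin + fout) for fin, fout in MODULES.values())
    return per_layer * LAYERS


trainable = lora_params(RANK)
print(f"{trainable:,}")              # 43,646,976
print(f"{trainable / 8.2e9:.2%}")    # 0.53% of ~8.2B total parameters
```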

Input Schema

The model expects a JSON object containing the triage output from Stage 1, the current inventory snapshot, pending FTRs, and any related fails.

{
  "triage": {
    "category": "CNS_FAIL",
    "cns_direction": "FTD",
    "lifecycle_state": "ESCALATED",
    "priority_score": 88.2,
    "priority_tier": "CRITICAL",
    "score_components": {
      "age": 18.0,
      "value": 12.5,
      "regulatory": 28.0,
      "counterparty": 3.0
    },
    "reason": "CNS FTD 32000 shs, threshold security, high priority",
    "action": "LOCATE_AND_DELIVER",
    "escalation_level": "L3",
    "deadline": "T+13",
    "flags": ["THRESHOLD_SECURITY", "HIGH_VALUE"]
  },
  "cusip": "594918104",
  "ftd_qty": 32000,
  "inventory": {
    "box_qty": 8000,
    "stock_loan_available": 3000,
    "recall_outstanding": 5000,
    "pending_receives": 2000
  },
  "ftrs": [
    {
      "dtc": "DTC-0161",
      "qty": 12000,
      "age_days": 3,
      "settlement_type": "RVP",
      "settlement_date": "T+1",
      "cp_fail_rate_pct": 2.1,
      "partial_delivery_history": true
    }
  ],
  "related_fails": [
    {
      "category": "DVP_FAIL",
      "dtc": "DTC-0352",
      "qty": 5000,
      "side": "Sell",
      "age_days": 2
    }
  ]
}

Input Field Reference

triage (from Stage 1 finops-triage model):

Field	Type	Description
`category`	string	Fail classification (`CNS_FAIL`)
`cns_direction`	string	`FTD` (fail to deliver) or `FTR` (fail to receive)
`lifecycle_state`	string	`NEW`, `OPEN`, `ESCALATED`, `AGED`
`priority_score`	float	0-100, weighted composite of age/value/regulatory/CP factors
`priority_tier`	string	`LOW` (0-25), `MEDIUM` (26-50), `HIGH` (51-75), `CRITICAL` (76-100)
`score_components`	object	Breakdown: age (30%), value (25%), regulatory (35%), counterparty (10%)
`reason`	string	Human-readable triage summary
`action`	string	Recommended action class from triage
`escalation_level`	string	`L1`, `L2`, `L3`
`deadline`	string	Reg SHO close-out deadline (e.g., `T+13` standard, `T+6` threshold)
`flags`	array	Risk flags: `THRESHOLD_SECURITY`, `HIGH_VALUE`, `REG_SHO_CLOSE_OUT`, `AGED_FAIL`, `LARGE_POSITION`, `ILLIQUID`
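The tier bands can be expressed as a small lookup. This is a sketch under one assumption: the card lists integer bands (0-25, 26-50, 51-75, 76-100) while `priority_score` is a float, so here each band's upper bound is treated as inclusive and fractional scores such as 25.5 fall into the next tier. `priority_tier` is a hypothetical helper name.

```python
def priority_tier(score: float) -> str:
    """Map a 0-100 priority_score to LOW / MEDIUM / HIGH / CRITICAL.

    Assumption: upper bounds (25, 50, 75) are inclusive; anything above a
    bound, including fractional scores, belongs to the next tier.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"priority_score out of range: {score}")
    if score <= 25:
        return "LOW"
    if score <= 50:
        return "MEDIUM"
    if score <= 75:
        return "HIGH"
    return "CRITICAL"


print(priority_tier(88.2))  # CRITICAL, matching the sample input
```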

inventory:

Field	Type	Description
`box_qty`	int	Unencumbered settled long positions (ex-SEG)
`stock_loan_available`	int	External borrow available
`recall_outstanding`	int	Stock lent that can be recalled
`pending_receives`	int	Expected inbound (T+1 receives, pending recalls)

ftrs[]:

Field	Type	Description
`dtc`	string	DTC participant number of counterparty
`qty`	int	FTR share quantity
`age_days`	int	Days since original settlement date
`settlement_type`	string	`RVP` (receive vs payment), `DVP` (deliver vs payment), `FOP` (free of payment)
`settlement_date`	string	Expected settlement date
`cp_fail_rate_pct`	float	Counterparty 15-day rolling fail rate
`partial_delivery_history`	bool	Whether this counterparty has delivered partials before

related_fails[]:

Field	Type	Description
`category`	string	`DVP_FAIL`, `CNS_FAIL`, `DEPOT_FAIL`
`dtc`	string	Counterparty DTC number
`qty`	int	Fail quantity
`side`	string	`Buy` or `Sell`
`age_days`	int	Fail age

Output Schema

{
  "resolution_steps": [
    {
      "step": 1,
      "action": "CHASE_FTR",
      "dtc": "DTC-0161",
      "qty": 12000,
      "settlement_type": "RVP",
      "settlement_date": "T+1",
      "rationale": "Age 3d 12000sh",
      "coverage_after_step_pct": 37.5,
      "remaining_short": 20000
    }
  ],
  "additional_ftrs_chased": 0,
  "fallback_strategy": "INITIATE_RECALL",
  "fallback_qty": 5000,
  "secondary_fallback": "SOURCE_BORROW",
  "secondary_fallback_qty": 3000,
  "total_coverable": 28000,
  "residual_short": 4000,
  "residual_action": "ESCALATE",
  "gridlock_detected": false,
  "gridlock_parties": [],
  "escalation_required": true,
  "escalation_reason": "Residual 4000 shs uncoverable",
  "narrative": "Chase DTC-0161 RVP T+1 for 12000 shs. INITIATE_RECALL 5000, SOURCE_BORROW 3000, escalate residual 4000 shs."
}

Output Field Reference

Field	Type	Description
`resolution_steps[]`	array	Ordered sequence of resolution actions
`resolution_steps[].step`	int	1-indexed step number
`resolution_steps[].action`	string	Action enum value (see below)
`resolution_steps[].dtc`	string	Target counterparty DTC number
`resolution_steps[].qty`	int	Share quantity for this step
`resolution_steps[].settlement_type`	string	`RVP`, `DVP`, or `FOP`
`resolution_steps[].settlement_date`	string	Expected settlement date
`resolution_steps[].rationale`	string	Why this step was prioritized
`resolution_steps[].coverage_after_step_pct`	float	Cumulative coverage percentage after this step
`resolution_steps[].remaining_short`	int	Shares still uncovered after this step
`additional_ftrs_chased`	int	FTR chases beyond the first 10 (truncated for token efficiency)
`fallback_strategy`	string	Primary fallback action after FTR chasing
`fallback_qty`	int	Shares covered by primary fallback
`secondary_fallback`	string	Secondary fallback action
`secondary_fallback_qty`	int	Shares covered by secondary fallback
`total_coverable`	int	Sum of all sources (FTRs + fallbacks)
`residual_short`	int	`ftd_qty - total_coverable` (floor 0)
`residual_action`	string	`ESCALATE` if residual > 0, else `NONE`
`gridlock_detected`	bool	Whether gridlock signals triggered
`gridlock_parties`	array	DTC numbers of parties in the gridlock chain
`escalation_required`	bool	True when residual_short > 0 after all steps
`escalation_reason`	string	Human-readable escalation reason (empty if not required)
`narrative`	string	Plain-English resolution summary

Mathematical Invariants

These hold for every valid output:

coverage_after_step_pct = (cumulative_qty_covered / ftd_qty) * 100 at each step
remaining_short decreases monotonically across steps
total_coverable = sum of all step quantities + fallback_qty + secondary_fallback_qty
residual_short = max(0, ftd_qty - total_coverable)
escalation_required = (residual_short > 0)
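These invariants translate directly into a post-hoc check on model output. The sketch below is illustrative only and is not the project's `scripts/validate_resolver.py`. One observation from the sample output above: `total_coverable` (28,000) only balances if the 8,000-share box position is counted alongside the FTR step (12,000) and both fallbacks (5,000 + 3,000), even though no `APPLY_BOX` step is listed. The sketch therefore accepts the applied box quantity as a separate, hypothetical `box_applied` argument. The 0.1-point tolerance on coverage percentages is also an assumption.

```python
import math


def check_invariants(output: dict, ftd_qty: int, box_applied: int = 0) -> list[str]:
    """Return invariant violations for one resolver output (empty list = pass).

    box_applied (hypothetical): box shares counted in total_coverable
    without a dedicated APPLY_BOX step.
    """
    errors = []
    covered = 0
    prev_remaining = ftd_qty
    for step in output["resolution_steps"]:
        covered += step["qty"]
        # coverage_after_step_pct = cumulative_qty_covered / ftd_qty * 100
        expected_pct = covered / ftd_qty * 100
        if not math.isclose(step["coverage_after_step_pct"], expected_pct, abs_tol=0.1):
            errors.append(f"step {step['step']}: coverage_after_step_pct "
                          f"{step['coverage_after_step_pct']} != {expected_pct:.1f}")
        # remaining_short must never increase from one step to the next
        if step["remaining_short"] > prev_remaining:
            errors.append(f"step {step['step']}: remaining_short increased")
        prev_remaining = step["remaining_short"]

    expected_total = (covered + box_applied
                      + output["fallback_qty"] + output["secondary_fallback_qty"])
    if output["total_coverable"] != expected_total:
        errors.append(f"total_coverable {output['total_coverable']} != {expected_total}")
    if output["residual_short"] != max(0, ftd_qty - output["total_coverable"]):
        errors.append("residual_short != max(0, ftd_qty - total_coverable)")
    if output["escalation_required"] != (output["residual_short"] > 0):
        errors.append("escalation_required != (residual_short > 0)")
    return errors


# Fields copied from the sample output above (ftd_qty = 32000, box_qty = 8000)
sample = {
    "resolution_steps": [{"step": 1, "qty": 12000,
                          "coverage_after_step_pct": 37.5, "remaining_short": 20000}],
    "fallback_qty": 5000,
    "secondary_fallback_qty": 3000,
    "total_coverable": 28000,
    "residual_short": 4000,
    "escalation_required": True,
}
print(check_invariants(sample, ftd_qty=32000))                    # ['total_coverable 28000 != 20000']
print(check_invariants(sample, ftd_qty=32000, box_applied=8000))  # []
```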

Action Enum

Action	Description
`CHASE_FTR`	Contact counterparty to push FTR delivery
`OFFSET_FTR`	Apply FTR directly against FTD at CNS
`APPLY_BOX`	Use free float inventory (unencumbered, ex-SEG)
`INITIATE_RECALL`	Recall stock lent via executing broker relationship
`SOURCE_BORROW`	Obtain stock loan externally (last resort)
`PARTIAL_DELIVER`	Proactively deliver partial quantity to CNS
`DEPOT_MOVEMENT`	Move shares from local market to DTC (ADR/cross-market)
`NET_GRIDLOCK`	Propose bilateral or tri-party net settlement
`BUY_IN_NOTICE`	Formal buy-in threat (B2B escalation)
`SPO_SETTLEMENT`	Cash settle free of payment (CA event)
`ESCALATE`	Residual requires human intervention

Resolution Logic

All resolution logic is sourced from a curated knowledge base of post-trade settlement rules. The model is trained to apply these rules, not to introduce new ones.

CNS FTD Waterfall

The core resolution follows a strict priority waterfall — exhaust cheaper/faster options before escalating:

1. Chase all FTRs (oldest first, largest as tiebreaker)
   ↓ residual?
2. Apply free float box (unencumbered, ex-SEG only)
   ↓ residual?
3. Initiate recall (cheaper than borrow, always attempt first)
   ↓ residual?
4. Source stock loan (external borrow, last resort)
   ↓ residual?
5. Escalate
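The waterfall can be read as a greedy allocation over sources in a fixed order. This is a simplified sketch, not the project's generator: it assumes every chased FTR delivers in full, draws each inventory source at most once, ignores `pending_receives`, and does not model Reg SHO parallel sourcing. `run_waterfall` and `Step` are hypothetical names. Run against the sample input, it reproduces the sample output's `total_coverable` of 28,000 and `residual_short` of 4,000.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    qty: int
    remaining_short: int


def run_waterfall(ftd_qty: int, ftrs: list[dict], inventory: dict) -> tuple[list[Step], int]:
    """Allocate sources in waterfall order; return (steps, residual_short)."""
    remaining = ftd_qty
    steps: list[Step] = []

    def take(action: str, available: int) -> None:
        # Use as much of this source as is still needed, then record the step
        nonlocal remaining
        qty = min(available, remaining)
        if qty > 0:
            remaining -= qty
            steps.append(Step(action, qty, remaining))

    # 1. Chase FTRs: oldest first, largest as tiebreaker
    for ftr in sorted(ftrs, key=lambda f: (-f["age_days"], -f["qty"])):
        take("CHASE_FTR", ftr["qty"])
    # 2. Free box, 3. recall (cheaper), 4. external borrow (last resort)
    take("APPLY_BOX", inventory["box_qty"])
    take("INITIATE_RECALL", inventory["recall_outstanding"])
    take("SOURCE_BORROW", inventory["stock_loan_available"])
    # 5. Whatever is left goes to a human
    if remaining > 0:
        steps.append(Step("ESCALATE", remaining, remaining))
    return steps, remaining


steps, residual = run_waterfall(
    32000,
    [{"dtc": "DTC-0161", "qty": 12000, "age_days": 3}],
    {"box_qty": 8000, "stock_loan_available": 3000,
     "recall_outstanding": 5000, "pending_receives": 2000},
)
for s in steps:
    print(s.action, s.qty, s.remaining_short)
# CHASE_FTR 12000 20000
# APPLY_BOX 8000 12000
# INITIATE_RECALL 5000 7000
# SOURCE_BORROW 3000 4000
# ESCALATE 4000 4000
```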

FTR Prioritization

FTRs are chased simultaneously, but ranked for allocation:

Priority	Rule
Primary sort	Age descending (oldest first)
Tiebreaker	Quantity descending (largest first)
Settlement type	Does NOT affect priority
Broker relationship	Does NOT affect priority
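The ranking rule amounts to a two-part sort key in which settlement type and counterparty play no role. A minimal sketch; `rank_ftrs` is a hypothetical helper, and DTC-0420 and DTC-0777 are made-up counterparties for illustration. Python's `sorted` is stable, so FTRs tied on both age and quantity keep their input order, which is one reasonable (assumed) way to break a full tie.

```python
def rank_ftrs(ftrs: list[dict]) -> list[dict]:
    """Order FTRs for allocation: age_days descending, then qty descending.

    settlement_type and dtc are deliberately excluded from the key.
    """
    return sorted(ftrs, key=lambda f: (-f["age_days"], -f["qty"]))


ftrs = [
    {"dtc": "DTC-0161", "qty": 12000, "age_days": 3, "settlement_type": "RVP"},
    {"dtc": "DTC-0420", "qty": 4000, "age_days": 5, "settlement_type": "FOP"},
    {"dtc": "DTC-0777", "qty": 9000, "age_days": 3, "settlement_type": "DVP"},
]
print([f["dtc"] for f in rank_ftrs(ftrs)])
# ['DTC-0420', 'DTC-0161', 'DTC-0777']
```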

Partial Delivery Policy

Always apply partial deliveries immediately. Never hold for full delivery. A partial today that keeps you inside the Reg SHO window is always preferred over a full delivery that arrives too late.

This applies to both FTDs (delivering partials to CNS) and FTRs (accepting partials from counterparties).

Cross-Date Netting

T+5 FTRs can fulfill T+3 FTDs. The model evaluates next-day obligation chains before recommending external sourcing, avoiding unnecessary stock loan cost.

Reg SHO Parallel Sourcing

Default behavior is sequential: recall first, borrow only if recall is insufficient.

Exception: If the recall notice period (3 business days) would breach the Reg SHO close-out deadline, the model initiates recall AND stock loan simultaneously. This is triggered when deadline_days <= recall_notice_period.

Reg SHO Timelines:

Security Type	Grace Period	Close-out Deadline
Standard	T+4 through T+12	Beginning of T+13
Threshold	T+2 through T+5	Beginning of T+6
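The parallel-sourcing trigger reduces to a single comparison against the close-out day in the table above. A minimal sketch, assuming `deadline_days` is the number of business days from the current settlement day (T+n) to the close-out day; `sourcing_plan`, `current_day`, and the return labels are illustrative names, not part of the model's output schema.

```python
RECALL_NOTICE_DAYS = 3                            # recall notice period (business days)
CLOSE_OUT_DAY = {"standard": 13, "threshold": 6}  # beginning of T+13 / T+6


def sourcing_plan(security_type: str, current_day: int) -> str:
    """Return 'PARALLEL' (recall and borrow together) or 'SEQUENTIAL' (recall first).

    current_day is the T+n business day on which the plan is made (assumption).
    """
    deadline_days = CLOSE_OUT_DAY[security_type] - current_day
    # Parallel sourcing when the recall notice period would reach the deadline
    return "PARALLEL" if deadline_days <= RECALL_NOTICE_DAYS else "SEQUENTIAL"


print(sourcing_plan("standard", 5))   # SEQUENTIAL (8 days to close-out)
print(sourcing_plan("threshold", 3))  # PARALLEL (3 days to close-out)
```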

CA Event Resolution

Corporate action event fails cannot be resolved unilaterally. The model recommends one of three paths:

Option	Method	When
A	Deliver pre-event shares	Shares still valid post-event
B	Deliver post-event equivalent	New CUSIP issued, old invalid
C	SPO (cash settle free of payment)	Share delivery impractical post-CA

All three require counterparty agreement.

Depot Movement

Triggered for ADR or cross-market securities where free float exists in a local market but not at DTC. Before recommending a depot movement, the model weighs the movement timeline against the Reg SHO deadline.

B2B Escalation Path

For persistent street-side FTR failures:

1. Standard FTR chase (age + size priority)
2. Formal buy-in notice (threat creates legal/financial pressure)
3. Execute buy-in (if no delivery follows notice)

The buy-in threat alone typically resolves the fail.

Gridlock Detection

Gridlock is a circular dependency where multiple parties are each waiting on another to deliver the same CUSIP. Full visibility into counterparty positions is never available — detection relies on patterns in your own FTR/FTD data.

Four-Signal Decision Tree

The model evaluates four signals sequentially:

Signal	Test	If No
S1	Same CUSIP appears in both your FTD and FTR positions?	Standard resolution (not mid-chain)
S2	3+ brokers involved in same CUSIP fails?	Bilateral issue (not gridlock)
S3	FTR age increasing despite repeated chase attempts?	Continue chasing, monitor 1-2 days
S4	Similar quantities across involved brokers?	Partial gridlock (one party is bottleneck)

All four signals positive = full gridlock confirmed → initiate multi-party net settlement.

S1-S3 positive, S4 negative = partial gridlock → one party is likely the bottleneck; escalate to that party specifically.
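Because the signals are evaluated in order and the first negative one decides the outcome, the tree can be written as a chain of early returns. A sketch with hypothetical outcome labels; `signal_s2` additionally assumes that "brokers involved" means distinct `dtc` values across the CUSIP's `ftrs` and `related_fails`. S3 (aging despite chasing) and S4 (similar quantities) need chase history and a similarity threshold the card does not define, so they are passed in as booleans.

```python
def classify_gridlock(s1: bool, s2: bool, s3: bool, s4: bool) -> str:
    """Walk S1-S4 in order; the first negative signal decides the outcome."""
    if not s1:
        return "STANDARD_RESOLUTION"   # not mid-chain
    if not s2:
        return "BILATERAL"             # fewer than 3 brokers: not gridlock
    if not s3:
        return "CONTINUE_CHASING"      # monitor 1-2 days
    if not s4:
        return "PARTIAL_GRIDLOCK"      # one party is likely the bottleneck
    return "FULL_GRIDLOCK"             # initiate multi-party net settlement


def signal_s2(ftrs: list[dict], related_fails: list[dict]) -> bool:
    """S2 sketch: 3+ distinct counterparties (by dtc) across the CUSIP's fails."""
    return len({f["dtc"] for f in ftrs + related_fails}) >= 3


# The sample input has only DTC-0161 and DTC-0352, so S2 is negative
s2 = signal_s2([{"dtc": "DTC-0161"}], [{"dtc": "DTC-0352"}])
print(classify_gridlock(True, s2, True, True))  # BILATERAL
```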

Gridlock Resolution

When gridlock is confirmed:

Identify circular dependency from available data
Simultaneous outreach to all parties in the chain
Understand each party's obligation for the CUSIP
Propose net settlement to break the dependency
Accept partial net if full resolution not possible

Gridlock vs Non-Gridlock

Factor	Gridlock	Not Gridlock
Parties	3+ brokers on same CUSIP	2 parties (bilateral)
Chase response	"Waiting on our source"	Specific delivery timeline
FTR aging	Steady increase despite chasing	Episodic delays
CUSIP concentration	Same CUSIP across multiple fails	Different CUSIPs failing
Resolution progress	None despite multiple attempts	Partial deliveries arriving

Training Data

Generation

Training data is generated programmatically (scripts/gen_resolver.py + scripts/trace_generator.py). No LLM-generated examples. The generator:

Samples FTR count from the complexity distribution
Generates FTRs with realistic age/quantity/settlement distributions
Generates independent inventory snapshots
Walks the resolution waterfall deterministically (chase → box → recall → borrow)
Evaluates all four gridlock signals from the generated positions
Selects fallback strategies following KB waterfall order
Generates a deterministic <think> reasoning trace narrating each decision
Validates mathematical invariants before writing

Complexity Distribution

Tier	FTR Count	% of Dataset
Simple	1-3 FTRs	30%
Medium	5-10 FTRs	40%
Complex	11-28 FTRs across 10+ brokers	30%

FTR Age Distribution

97% of FTRs are aged T+2 through T+5 (Gaussian centered at T+3), reflecting normal settlement. The remaining 3% form a tail from T+6 through T+10 for aged/escalated scenarios.
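The two distributions above can be sampled with the standard library. This is a sketch of one way to do it, not the code in `scripts/gen_resolver.py`: the Gaussian's standard deviation (1.0), the clamp-and-round scheme, and the uniform draws within each range are assumptions. The tier ranges follow the complexity table as written, which lists no tier for exactly 4 FTRs.

```python
import random

# (tier, min FTRs, max FTRs, share of dataset) from the complexity table
TIERS = [("simple", 1, 3, 0.30), ("medium", 5, 10, 0.40), ("complex", 11, 28, 0.30)]


def sample_ftr_count(rng: random.Random) -> tuple[str, int]:
    """Pick a tier by weight, then a uniform FTR count inside its range (assumed)."""
    name, lo, hi, _ = rng.choices(TIERS, weights=[t[3] for t in TIERS])[0]
    return name, rng.randint(lo, hi)


def sample_age_days(rng: random.Random, sigma: float = 1.0) -> int:
    """97%: Gaussian around T+3 clamped to T+2..T+5; 3%: uniform tail T+6..T+10.

    sigma and the clamping/rounding scheme are assumptions, not from the card.
    """
    if rng.random() < 0.97:
        return min(5, max(2, round(rng.gauss(3, sigma))))
    return rng.randint(6, 10)


rng = random.Random(42)  # fixed seed for reproducible examples
tier, n = sample_ftr_count(rng)
ages = [sample_age_days(rng) for _ in range(n)]
print(tier, n, ages)
```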

Scenario Coverage

The training data includes all of the following outcome types:

Full coverage via FTRs alone (no fallback needed)
Partial FTR coverage + box covers residual
Partial FTR + box + recall
Partial FTR + box + recall + borrow (full waterfall)
Gridlock detected with multi-party outreach
B2B escalation to buy-in notice
CA event SPO settlement
ADR depot movement required
Cross-date netting (T+5 FTR fulfilling T+3 FTD)
Parallel recall + loan (Reg SHO deadline conflict)
Problem counterparty with partial delivery history
Zero inventory (all external sourcing)
Full inventory (no external sourcing needed)

Thinking Trace Methodology

Each training example includes a structured <think>...</think> block in the assistant turn. The trace is generated deterministically from the input/output pair (no LLM calls) and follows a fixed 10-section structure:

Problem statement — CUSIP, FTD quantity, priority tier, deadline, flags
Inventory check — Box, recall, borrow, pending receives, coverage ratio
FTR ranking — Sorted by age desc then qty desc, top 10 shown with ranking notes
Chase walkthrough — Step-by-step coverage tracking (first 5 shown, rest summarized)
Fallback reasoning — Primary and secondary strategies with source quantities
Reg SHO check — Deadline vs recall notice period, sequential vs parallel decision
Gridlock evaluation — All 4 signals evaluated explicitly (S1-S4)
Escalation decision — Required/not required with reason
Conclusion — Final coverage percentage, residual action

This teaches the model to reason through the waterfall before producing the resolution JSON.
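Because each assistant turn places the reasoning trace before the resolution JSON, client code may need to separate the two before parsing. A minimal sketch, assuming the runtime returns the `<think>` block inline in the completion text (some runtimes strip or separate it, and some chat templates emit only the closing `</think>` tag); `split_response` is a hypothetical helper. It decodes the first JSON object after the trace, so stray text before or after the object is tolerated.

```python
import json


def split_response(text: str) -> tuple[str, dict]:
    """Split a raw completion into (reasoning trace, resolution JSON)."""
    head, sep, tail = text.rpartition("</think>")
    if sep:
        trace = head.replace("<think>", "", 1).strip()
        body = tail
    else:  # no reasoning block in the output
        trace, body = "", text
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    resolution, _ = json.JSONDecoder().raw_decode(body[start:])
    return trace, resolution


raw = ('<think>Problem statement: CUSIP 594918104, FTD 32000 shs.</think>\n'
       '{"residual_short": 4000, "escalation_required": true}')
trace, resolution = split_response(raw)
print(resolution["residual_short"])  # 4000
```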

Scope and Limitations

v1 Training Scope

This model covers:

CNS FTD resolution waterfall (chase → box → recall → borrow → escalate)
Gridlock detection via 4-signal decision tree (S1-S4)
Reg SHO parallel sourcing (recall + borrow when deadline is tight)
CA event SPO settlement paths
Depot movement for ADR/cross-market securities
Cross-date netting (T+5 FTR vs T+3 FTD)
B2B buy-in notice escalation

Deferred to Future Training Iterations

B2B buy-in execute path — the model recommends buy-in notice (threat) but the full execute-buy-in workflow is not included in the current training run. The buy-in threat alone resolves the majority of cases; the execute path is planned for a future iteration.
Synthetic FTR cap at 28 — production environments can have unbounded FTR counts. The current training data caps at 28 FTRs across 10+ brokers. The architecture supports extension to higher counts in future runs.
Multi-CUSIP resolution — current scope is single-CUSIP per inference call. Cross-CUSIP optimization (e.g., portfolio-level netting) is planned.
Real-time inventory refresh — the model operates on a point-in-time inventory snapshot. Integration with streaming inventory updates is an inference-layer concern, not a model limitation.

The architecture supports extension across all of these dimensions. This is an actively developed model.

Usage

Ollama Setup

Create a Modelfile:

FROM ./finops-resolver-qwen3-8b-q4_k_m.gguf
PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER num_ctx 5120

SYSTEM "You are a post-trade settlement resolution assistant. Given a triage output, inventory snapshot, and pending FTRs for a CUSIP, recommend an ordered resolution sequence following the KB resolution logic. Chase FTRs first (oldest then largest), apply free box, recall before borrow. Apply partials immediately. Detect and flag gridlock. Output JSON only — no explanation, no markdown, no preamble."

ollama create finops-resolver -f Modelfile

Two-Model Pipeline

# Stage 1 — triage
TRIAGE_OUTPUT=$(ollama run finops-triage "Triage this fail record: {fail_record_json}")

# Stage 2 — resolver
# Combine triage output with inventory + FTR data
ollama run finops-resolver "Resolve this fail:
{
  \"triage\": $TRIAGE_OUTPUT,
  \"cusip\": \"594918104\",
  \"ftd_qty\": 32000,
  \"inventory\": {
    \"box_qty\": 8000,
    \"stock_loan_available\": 3000,
    \"recall_outstanding\": 5000,
    \"pending_receives\": 2000
  },
  \"ftrs\": [...],
  \"related_fails\": [...]
}"

Direct Inference

ollama run finops-resolver "Resolve this fail:
{\"triage\":{\"category\":\"CNS_FAIL\",\"cns_direction\":\"FTD\",\"lifecycle_state\":\"ESCALATED\",\"priority_score\":88.2,\"priority_tier\":\"CRITICAL\",\"score_components\":{\"age\":18.0,\"value\":12.5,\"regulatory\":28.0,\"counterparty\":3.0},\"reason\":\"CNS FTD 32000 shs, threshold security\",\"action\":\"LOCATE_AND_DELIVER\",\"escalation_level\":\"L3\",\"deadline\":\"T+13\",\"flags\":[\"THRESHOLD_SECURITY\"]},\"cusip\":\"594918104\",\"ftd_qty\":32000,\"inventory\":{\"box_qty\":8000,\"stock_loan_available\":3000,\"recall_outstanding\":5000,\"pending_receives\":2000},\"ftrs\":[{\"dtc\":\"DTC-0161\",\"qty\":12000,\"age_days\":3,\"settlement_type\":\"RVP\",\"settlement_date\":\"T+1\",\"cp_fail_rate_pct\":2.1,\"partial_delivery_history\":true}],\"related_fails\":[]}"
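Hand-escaping quotes in a shell string is error-prone. A minimal sketch that assembles the same `Resolve this fail:` prompt in Python; `build_resolver_prompt` is a hypothetical helper, and the compact separators only mirror the one-line JSON above. If the Stage 1 output arrives as a string, parse it with `json.loads` before passing it in.

```python
import json


def build_resolver_prompt(triage: dict, cusip: str, ftd_qty: int,
                          inventory: dict, ftrs: list, related_fails: list) -> str:
    """Assemble the Stage 2 prompt; json.dumps handles all quoting and escaping."""
    payload = {
        "triage": triage,
        "cusip": cusip,
        "ftd_qty": ftd_qty,
        "inventory": inventory,
        "ftrs": ftrs,
        "related_fails": related_fails,
    }
    return "Resolve this fail:\n" + json.dumps(payload, separators=(",", ":"))


# Triage trimmed for brevity; pass the full Stage 1 object in practice
triage = {"category": "CNS_FAIL", "cns_direction": "FTD", "priority_tier": "CRITICAL"}
prompt = build_resolver_prompt(
    triage=triage,
    cusip="594918104",
    ftd_qty=32000,
    inventory={"box_qty": 8000, "stock_loan_available": 3000,
               "recall_outstanding": 5000, "pending_receives": 2000},
    ftrs=[{"dtc": "DTC-0161", "qty": 12000, "age_days": 3, "settlement_type": "RVP",
           "settlement_date": "T+1", "cp_fail_rate_pct": 2.1,
           "partial_delivery_history": True}],
    related_fails=[],
)
# The prompt can then be passed as one argument with no shell escaping, e.g.:
# subprocess.run(["ollama", "run", "finops-resolver", prompt], capture_output=True, text=True)
```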

Hugging Face

Repository: sammiset/finops-resolver

GGUF download: finops-resolver-qwen3-8b-q4_k_m.gguf

Quantization: Q4_K_M (4-bit, k-quant mixed precision)

Base model: Qwen/Qwen3-8B

Training

pip install unsloth
python scripts/train.py

Produces the LoRA adapter in adapters/checkpoint-final/ and exports models_gguf/finops-resolver-qwen3-8b-q4_k_m.gguf automatically.

Training Results

Trained on RunPod A100 80GB. 3 epochs, 1,758 steps, ~5.5 hours.

Metric	Value
Final train loss (avg)	0.1625
Final eval loss	0.1424
Eval loss (epoch 2.56)	0.1425
Train runtime	20,020s (~5h 34m)
Train samples/sec	0.702
Train steps/sec	0.088
Eval samples/sec	2.457
Total steps	1,758
Final learning rate	1.437e-08
Final gradient norm	0.037

Convergence notes:

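Run-average train loss (0.1625) and final eval loss (0.1424) are close, with no sign of overfitting; note that the train figure averages over the whole run, including early high-loss steps, so it is not a like-for-like comparison with the final eval loss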
Gradient norms stable at 0.03-0.04 throughout final epoch, indicating clean convergence
Cosine LR schedule decayed smoothly from 2e-4 to ~1.4e-08
Eval loss plateaued between epochs 2.56 and 3.0 (0.1425 → 0.1424), suggesting that additional epochs would have added little

Validation

python scripts/validate_resolver.py --input data/train.jsonl

Enforces: schema conformance, mathematical correctness of coverage percentages, monotonic remaining_short, valid action enum values, gridlock signal consistency, and escalation flag accuracy.

License and Credits

This project is licensed under the Apache License 2.0, consistent with the base model license.

Base model: Qwen/Qwen3-8B by the Qwen Team, Alibaba Group. Qwen3-8B is released under the Apache 2.0 license.

Fine-tuning framework: Unsloth for QLoRA training and GGUF export.

Training infrastructure: trl SFTTrainer on RunPod A100 80GB.

Model tree: Qwen/Qwen3-8B-Base → Qwen/Qwen3-8B (fine-tune) → unsloth/Qwen3-8B (fine-tune) → sammiset/finops-resolver (listed as a quantized derivative)