ResilientAgent-CAIP LoRA Adapter

LoRA adapter for a constraint-aware hospital incident workflow environment built for RL with verifiable rewards.

Model Details

Adapter type: LoRA / QLoRA-compatible adapter
Base model: Qwen/Qwen2.5-1.5B-Instruct
Primary objective: Improve policy quality in a verifiable OpenEnv-style task
Environment contract: 2026-04-25.v1

Task

The policy acts step-by-step in a hospital incident workflow:

inspect
isolate_zone
apply_patch
verify_fix
commit_fix

Each step is scored on independent reward components to reduce the risk of reward hacking.

Reward and Safety Design

Reward columns:

success
correctness
format
timeout
safety
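
As a rough illustration of keeping these columns separate, the sketch below stores one row per step with a field for each column. The class name `StepReward`, the unweighted `total()`, and the example values are assumptions for illustration only; the environment's actual aggregation and sign conventions (for example, whether `timeout` is a penalty) are not specified here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StepReward:
    """One row of per-step reward components, stored separately rather than pre-summed."""
    success: float
    correctness: float
    format: float
    timeout: float
    safety: float

    def total(self) -> float:
        # Assumption: a plain unweighted sum; the real aggregation may differ.
        return sum(asdict(self).values())


# Illustrative values only, not results from a real run.
row = StepReward(success=0.0, correctness=1.0, format=1.0, timeout=0.0, safety=1.0)
print(asdict(row))
print(row.total())
```

Logging the columns individually makes it easier to spot a policy that improves one component (for example `format`) while another (for example `safety`) degrades.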

Safety and anti-cheat checks include:

invalid sequence detection
replay-pattern detection
mutation and state consistency checks
QA gate before paid training start
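
The first two checks can be sketched as below. This is a minimal illustration, not the environment's implementation: the strict-order rule, the function names, and the `window` / `min_repeats` parameters are all assumptions; only the action names come from the task description.

```python
from typing import List, Optional

# Action names from the Task section; treating them as a strict order is an assumption.
WORKFLOW = ["inspect", "isolate_zone", "apply_patch", "verify_fix", "commit_fix"]


def first_invalid_step(actions: List[str]) -> Optional[int]:
    """Return the index of the first action that breaks the assumed order, or None."""
    for i, action in enumerate(actions):
        if i >= len(WORKFLOW) or action != WORKFLOW[i]:
            return i
    return None


def has_replay_pattern(actions: List[str], window: int = 2, min_repeats: int = 3) -> bool:
    """Return True if some block of `window` actions repeats back-to-back `min_repeats` times."""
    span = window * min_repeats
    for start in range(len(actions) - span + 1):
        chunk = actions[start:start + window]
        if all(
            actions[start + k * window:start + (k + 1) * window] == chunk
            for k in range(min_repeats)
        ):
            return True
    return False


print(first_invalid_step(["inspect", "apply_patch"]))
print(has_replay_pattern(["inspect", "verify_fix"] * 3))
```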

Training Setup

Trainer stack: TRL-style GRPO pipeline scaffold
Profiles: smoke / budget / full
Budget-oriented run order:
- local QA first
- smoke profile
- eval + regression checks
- larger profile only after gates pass
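
The run order above amounts to a chain of gates, where each stage runs only if the previous one passed. A minimal sketch follows, assuming each stage is a callable that returns True on success; the `run_gated` helper and the placeholder lambdas are hypothetical and stand in for the real QA, training, and eval commands.

```python
from typing import Callable, List, Tuple


def run_gated(stages: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
    """Run stages in order and stop at the first one that fails its gate."""
    completed = []
    for name, stage in stages:
        if not stage():
            break  # do not spend budget on later stages
        completed.append(name)
    return completed


# Placeholder stages; here the eval/regression gate fails, so the larger profile never runs.
stages = [
    ("local_qa", lambda: True),
    ("smoke", lambda: True),
    ("eval_regression", lambda: False),
    ("larger_profile", lambda: True),
]
print(run_gated(stages))
```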

Evaluation

Expected artifacts (uploaded under reports):

grpo_training_report.json
eval_baseline_vs_trained.json
eval_regression_suite.json
qa_gate_report.json
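
A quick way to confirm that a local checkout contains all four artifacts is sketched below. The `missing_reports` helper is hypothetical, and it assumes the files sit directly inside a `reports` directory; it checks only for presence, since the report schemas are not described here.

```python
from pathlib import Path
from typing import List

# File names as listed above.
EXPECTED_REPORTS = [
    "grpo_training_report.json",
    "eval_baseline_vs_trained.json",
    "eval_regression_suite.json",
    "qa_gate_report.json",
]


def missing_reports(reports_dir: str = "reports") -> List[str]:
    """Return the expected report files that are absent from reports_dir."""
    root = Path(reports_dir)
    return [name for name in EXPECTED_REPORTS if not (root / name).is_file()]


print(missing_reports("reports"))
```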

Intended Use

This adapter is intended for research and hackathon demonstration of:

RL with verifiable rewards
environment-first training
safety and anti-cheat constraints
reproducible demo deployment via Hugging Face Spaces

Limitations

During hackathon iteration, the reports in this repo may be scaffold outputs generated from a profile rather than results of a full training run.
Not intended for clinical deployment.
Output quality depends on reward/verifier coverage and environment robustness.

Quick Start (PEFT)

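from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model and adapter repo IDs on the Hugging Face Hub
base_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "Ajay1232/resilient-agent-caip-lora"

# Load the tokenizer and base weights, then attach the LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()  # switch to inference mode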
