ResilientAgent-CAIP LoRA Adapter
LoRA adapter for a constraint-aware hospital incident workflow environment built for RL with verifiable rewards.
Model Details
- Adapter type: LoRA / QLoRA-compatible adapter
- Base model: Qwen/Qwen2.5-1.5B-Instruct
- Primary objective: Improve policy quality in a verifiable OpenEnv-style task
- Environment contract: 2026-04-25.v1
Task
The policy acts step-by-step in a hospital incident workflow:
inspect, isolate_zone, apply_patch, verify_fix, commit_fix
Each step is scored with independent reward components to reduce the risk of reward hacking.
Reward and Safety Design
Reward columns:
- success
- correctness
- format
- timeout
- safety
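As a rough illustration of keeping the reward columns independent, the sketch below stores one score per column and only combines them at the very end. The `StepReward` class, the `total_reward` helper, and the weighting scheme are hypothetical and not taken from this repo; only the column names come from the list above.

```python
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass(frozen=True)
class StepReward:
    """One score per reward column (hypothetical container, not the repo's API)."""
    success: float
    correctness: float
    format: float
    timeout: float
    safety: float

    def as_columns(self) -> Dict[str, float]:
        # Expose each component separately so it can be logged and inspected on its own.
        return asdict(self)


def total_reward(reward: StepReward, weights: Dict[str, float]) -> float:
    """Weighted sum of the columns; columns without a weight contribute nothing.

    The weighted-sum aggregation is an assumption made for this sketch.
    """
    columns = reward.as_columns()
    return sum(weights.get(name, 0.0) * value for name, value in columns.items())
```

Keeping the columns separate until aggregation makes it easier to see when a policy raises one component (for example format) while another (for example safety) degrades.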
Safety and anti-cheat checks include:
- invalid sequence detection
- replay-pattern detection
- mutation and state consistency checks
- QA gate before paid training start
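The first two checks might look roughly like the sketch below. It assumes that the five actions named in the Task section must occur in exactly that order, and it treats a "replay pattern" as the same action repeated more than `max_repeats` times in a row. Both rules, and the function names, are assumptions made for illustration; the environment's actual checks may differ.

```python
from typing import Optional, Sequence

# Action names come from the Task section; the strict ordering is an assumption.
WORKFLOW = ("inspect", "isolate_zone", "apply_patch", "verify_fix", "commit_fix")


def first_invalid_step(actions: Sequence[str]) -> Optional[int]:
    """Return the index of the first action that breaks the assumed order, or None."""
    for i, action in enumerate(actions):
        if i >= len(WORKFLOW) or action != WORKFLOW[i]:
            return i
    return None


def has_replay_pattern(actions: Sequence[str], max_repeats: int = 2) -> bool:
    """Flag any run of identical consecutive actions longer than max_repeats.

    This is a simplified, hypothetical definition of a replay pattern.
    """
    run = 1
    for prev, cur in zip(actions, actions[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False
```

For example, `first_invalid_step(["inspect", "apply_patch"])` reports index 1, because `isolate_zone` was skipped.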
Training Setup
- Trainer stack: TRL-style GRPO pipeline scaffold
- Profiles: smoke / budget / full
- Budget-oriented run order:
  - local QA first
  - smoke profile
  - eval + regression checks
  - larger profile only after gates pass
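The gated run order above can be sketched as a loop that stops at the first failing stage, so that a more expensive profile never starts unless every earlier gate has passed. The `run_gated` helper and the stage callables are hypothetical placeholders, not the repo's actual entry points.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True when its gate passes.
Stage = Tuple[str, Callable[[], bool]]


def run_gated(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, run in stages:
        if not run():
            break  # Later (more expensive) stages are never started.
        completed.append(name)
    return completed
```

A call might pass stages named `local_qa`, `smoke`, `eval_regression`, and `full`, each wrapping whatever command launches that step in a given setup.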
Evaluation
Expected artifacts (uploaded under reports):
- grpo_training_report.json
- eval_baseline_vs_trained.json
- eval_regression_suite.json
- qa_gate_report.json
Intended Use
This adapter is intended for research and hackathon demonstration of:
- RL with verifiable rewards
- environment-first training
- safety and anti-cheat constraints
- reproducible demo deployment via Hugging Face Spaces
Limitations
- This repo may contain profile-based scaffold reports during hackathon iteration.
- Not intended for clinical deployment.
- Output quality depends on reward/verifier coverage and environment robustness.
Quick Start (PEFT)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "Ajay1232/resilient-agent-caip-lora"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)