ResilientAgent-CAIP LoRA Adapter

LoRA adapter for a constraint-aware hospital incident workflow environment built for RL with verifiable rewards.

Model Details

  • Adapter type: LoRA / QLoRA-compatible adapter
  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • Primary objective: Improve policy quality in a verifiable OpenEnv-style task
  • Environment contract: 2026-04-25.v1

Task

The policy acts step-by-step in a hospital incident workflow:

  1. inspect
  2. isolate_zone
  3. apply_patch
  4. verify_fix
  5. commit_fix

Each step is scored with independent reward components, so a high score on one component cannot mask a failure on another. This reduces the risk of reward hacking.
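The five-step order above can be checked mechanically. The following is a minimal sketch, assuming the environment requires the steps in exactly the listed order; the function names are hypothetical and not part of the released environment.

```python
from typing import Optional, Sequence

# Step names are taken from the task list; the strict-order rule is an assumption.
WORKFLOW = ("inspect", "isolate_zone", "apply_patch", "verify_fix", "commit_fix")


def first_invalid_step(actions: Sequence[str]) -> Optional[int]:
    """Return the index of the first action that breaks the expected order, or None."""
    for i, action in enumerate(actions):
        if i >= len(WORKFLOW) or action != WORKFLOW[i]:
            return i
    return None


def is_complete_episode(actions: Sequence[str]) -> bool:
    """True only when every step was taken, once each, in order."""
    return list(actions) == list(WORKFLOW)
```

A partial but correctly ordered trajectory returns None from first_invalid_step, so the same check can run after every step rather than only at episode end.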

Reward and Safety Design

Reward columns:

  • success
  • correctness
  • format
  • timeout
  • safety
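One way to keep these columns independent is to store them as separate fields and combine them only when a scalar is needed. The sketch below assumes each column is a float in [0, 1] where higher is better (for timeout, 1.0 meaning the step finished in time); the weights are hypothetical and are not the values used in training.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StepReward:
    """One record per step; field names mirror the reward columns."""
    success: float
    correctness: float
    format: float
    timeout: float
    safety: float


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {
    "success": 0.4,
    "correctness": 0.3,
    "format": 0.1,
    "timeout": 0.1,
    "safety": 0.1,
}


def total_reward(reward: StepReward) -> float:
    """Weighted sum of the components; the per-column values stay inspectable."""
    parts = asdict(reward)
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

Logging the full StepReward alongside the scalar makes it possible to spot a policy that raises the total by exploiting a single column.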

Safety and anti-cheat checks include:

  • invalid sequence detection
  • replay-pattern detection
  • mutation and state consistency checks
  • a QA gate that must pass before paid training starts
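As an illustration of replay-pattern detection, the sketch below flags a trajectory when the same short window of actions repeats back-to-back. The definition of a replay pattern, the function name, and the default thresholds are all assumptions; the environment's actual check may differ.

```python
from typing import Sequence


def has_replay_pattern(actions: Sequence[str], window: int = 2, min_repeats: int = 3) -> bool:
    """True if some run of `window` actions occurs `min_repeats` times in a row."""
    actions = list(actions)
    span = window * min_repeats
    for start in range(len(actions) - span + 1):
        chunk = actions[start:start + window]
        # Compare each following window of the same size against the first one.
        if all(
            actions[start + k * window:start + (k + 1) * window] == chunk
            for k in range(min_repeats)
        ):
            return True
    return False
```

For example, a policy that alternates inspect and verify_fix three times in a row would be flagged, while a single pass through the five workflow steps would not.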

Training Setup

  • Trainer stack: TRL-style GRPO pipeline scaffold
  • Profiles: smoke / budget / full
  • Budget-oriented run order:
    • local QA first
    • smoke profile
    • eval + regression checks
    • larger profile only after gates pass
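The run order above amounts to a chain of gates in which each stage runs only if the previous one passed. A minimal sketch follows, with hypothetical stage names and stub checks standing in for the real QA, training, and evaluation commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a zero-argument check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_gated(stages: List[Stage]) -> List[str]:
    """Run stages in order, stop at the first failure, return the stages that passed."""
    passed = []
    for name, check in stages:
        if not check():
            break
        passed.append(name)
    return passed


# Stub checks for illustration only; here the regression gate fails.
example_stages = [
    ("local_qa", lambda: True),
    ("smoke", lambda: True),
    ("eval_regression", lambda: False),
    ("larger_profile", lambda: True),
]
print(run_gated(example_stages))  # ['local_qa', 'smoke']
```

Because the loop stops at the first failing gate, the larger (paid) profile is never reached unless every cheaper check has passed.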

Evaluation

Expected artifacts (uploaded under reports):

  • grpo_training_report.json
  • eval_baseline_vs_trained.json
  • eval_regression_suite.json
  • qa_gate_report.json
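A quick way to confirm that a run produced all four artifacts is to check for the files by name. This sketch assumes the reports sit directly inside a local directory whose path you pass in; the helper name is hypothetical, and the JSON contents are not inspected because their schema is not documented here.

```python
from pathlib import Path
from typing import List

# File names are taken from the list above.
EXPECTED_REPORTS = (
    "grpo_training_report.json",
    "eval_baseline_vs_trained.json",
    "eval_regression_suite.json",
    "qa_gate_report.json",
)


def missing_reports(reports_dir: str) -> List[str]:
    """Return the expected report names that are not present as files in reports_dir."""
    root = Path(reports_dir)
    return [name for name in EXPECTED_REPORTS if not (root / name).is_file()]
```

An empty return value means every expected report is present.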

Intended Use

This adapter is intended for research and hackathon demonstration of:

  • RL with verifiable rewards
  • environment-first training
  • safety and anti-cheat constraints
  • reproducible demo deployment via Hugging Face Spaces

Limitations

  • During hackathon iteration, the reports in this repo may be scaffold outputs generated from a profile rather than results of a full training run.
  • Not intended for clinical deployment.
  • Output quality depends on reward/verifier coverage and environment robustness.

Quick Start (PEFT)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Base model and adapter repository IDs on the Hugging Face Hub.
base_id = "Qwen/Qwen2.5-1.5B-Instruct"
adapter_id = "Ajay1232/resilient-agent-caip-lora"

# Load the tokenizer and base model, then attach the LoRA adapter weights.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)