🚛 Logistics Hackathon Agent (GRPO-Trained)

This is a LoRA adapter for Qwen2.5-1.5B-Instruct, heavily fine-tuned using Group Relative Policy Optimization (GRPO) to act as a centralized AI logistics coordinator.

It was built and trained specifically for the Meta PyTorch OpenEnv Hackathon 2026.

🚀 Live Environment & Dashboard

To see the environment this agent was trained on, visit our Hugging Face Space: 👉 Logistics Shipment Env (Live Demo)

📈 Training Details

The model was trained entirely on a live OpenEnv simulator of an Indian freight network experiencing cascading disruptions (port strikes, accidents, capacity saturation).

Algorithm: GRPO (via Hugging Face TRL & Unsloth)
Curriculum: 3-Phase progressive difficulty (Easy → Medium → Hardening)
Improvement: +327% jump in cumulative episode reward over the untrained baseline.

Reward Functions (Anti-Hacked)

The agent was optimized using 3 independent, verifiable reward signals:

Delay Reduction: Maximizing SLA compliance and minimizing total cargo delay hours.
Routing Logic: Heavy penalties (-0.6) for attempting to use non-existent or overloaded routes.
Communication: Rewarded for empathetic customer updates; instantly penalized (-0.5) for message spamming.

💻 Usage

Since this is a standard PEFT adapter, it can be loaded on top of the base Qwen2.5-1.5B model:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "Leavin1611/logistics-hackathon-model")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")