---
title: Trainx
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
pinned: true
license: apache-2.0
---
# SecureCodeEnv
**An RL environment for training LLM agents to write production-ready, secure Python code.**
---
## The Problem
Studies show **12–65% of LLM-generated code contains security vulnerabilities**. Secure-pass@1 rates remain below 12% for all frontier models even when functional pass@1 exceeds 50%.
Every existing RL environment trains agents to write code that **works**. None train agents to write code that is **safe, consistent, and production-ready**. SecureCodeEnv closes that gap.
---
## What Makes This Environment Different
| Feature | SecureCodeEnv | Typical RL Code Envs |
|---|---|---|
| Dynamic adversarial grading | ✅ Real attacks fired per episode | ❌ Static patterns only |
| CodeGraph memory | ✅ Cross-step convention tracking | ❌ Single-function only |
| CWE-grounded tasks | ✅ 9 tasks, 12+ CWE IDs | ❌ Generic correctness |
| Security gate on done | ✅ Attack + static thresholds | ❌ Pass/fail only |
| Anti-reward-hacking | ✅ Seeded random payloads | ❌ Fixed test cases |
---
## Reward System: 7 Dimensions
| Dimension | Weight | Tool | What It Measures |
|---|---|---|---|
| correctness | 25% | Custom test runner | Test cases passed |
| attack_resist | 25% | Dynamic harness | Real attack payloads blocked |
| static_security | 20% | bandit + AST | CWE-mapped vulnerability patterns |
| consistency | 10% | CodeGraph | Convention adherence across steps |
| performance | 8% | timeit | Speed vs naive/optimal baselines |
| documentation | 7% | AST | Docstring + type hint coverage |
| code_structure | 5% | AST | Clean code (no bare print/except) |
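The `code_structure` row penalizes bare `print` calls and bare `except` clauses, and it uses AST inspection to find them. The following is a minimal sketch of that kind of check using Python's standard `ast` module. `structure_issues` is a hypothetical helper, not the environment's grader:

```python
import ast


def structure_issues(source: str) -> list[str]:
    """List bare `except:` clauses and print() calls found in the source."""
    issues = []
    for node in ast.walk(ast.parse(source)):
        # `except:` with no exception type parses to an ExceptHandler whose .type is None.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            issues.append(f"line {node.lineno}: bare except")
        # print(...) is a Call whose callee is the plain name `print`.
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "print"):
            issues.append(f"line {node.lineno}: print() call")
    return issues


sample = "def f(x):\n    try:\n        return 1 / x\n    except:\n        print('bad')\n"
print(structure_issues(sample))  # ['line 4: bare except', 'line 5: print() call']
```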
**Security gate:** episode cannot complete unless `attack_resist ≥ 0.75` AND `static_security ≥ 0.70` AND `correctness ≥ 0.80`.
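The weights and gate thresholds are listed here, but the exact aggregation formula is not. Assume `total_reward` is a plain weighted sum of per-dimension scores, each in [0, 1]. Under that assumption, the scoring and the gate could be sketched like this (the function names are hypothetical):

```python
# Weights from the reward table (assumed to combine as a linear weighted sum).
WEIGHTS = {
    "correctness": 0.25,
    "attack_resist": 0.25,
    "static_security": 0.20,
    "consistency": 0.10,
    "performance": 0.08,
    "documentation": 0.07,
    "code_structure": 0.05,
}

# Minimum scores required by the security gate.
GATE = {"attack_resist": 0.75, "static_security": 0.70, "correctness": 0.80}


def total_reward(scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores; missing dimensions count as 0."""
    return sum(w * scores.get(name, 0.0) for name, w in WEIGHTS.items())


def passes_gate(scores: dict[str, float]) -> bool:
    """True only if every gated dimension meets its threshold."""
    return all(scores.get(name, 0.0) >= t for name, t in GATE.items())


scores = {"correctness": 0.9, "attack_resist": 0.8, "static_security": 0.7,
          "consistency": 1.0, "performance": 0.5, "documentation": 0.6,
          "code_structure": 1.0}
print(round(total_reward(scores), 3), passes_gate(scores))
```

With these sample scores the weighted sum is about 0.797, and the gate passes because each gated score is at or above its threshold. The sum and the gate are kept separate on purpose. A submission can earn a high weighted reward and still be unable to complete the episode if one gated dimension falls short.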
---
## Tasks: 9 Tasks Across 3 Difficulty Levels
### Easy
| Task | CWE Targets |
|---|---|
| Password Validator | CWE-916, CWE-521 |
| Input Sanitizer | CWE-20, CWE-116 |
| Token Generator | CWE-338, CWE-330 |
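For the easy tasks, the standard library already covers the secure patterns these CWEs point to:

- the `secrets` module for unpredictable tokens (CWE-338, CWE-330);
- a deliberately slow, salted key-derivation function for stored passwords (CWE-916).

The sketch below uses `hashlib.pbkdf2_hmac`. The function names, the `iterations$salt$digest` storage format, and the default iteration count are illustrative assumptions. The tasks' required signatures may differ.

```python
import hashlib
import hmac
import secrets


def generate_token(nbytes: int = 32) -> str:
    """URL-safe token from the OS CSPRNG (never the `random` module, CWE-338)."""
    return secrets.token_urlsafe(nbytes)


def hash_password(password: str, iterations: int = 600_000) -> str:
    """Salted PBKDF2-HMAC-SHA256 hash; slow by design, unlike a bare SHA-256 (CWE-916)."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and iteration count."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                    bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)
```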
### Medium
| Task | CWE Targets |
|---|---|
| SQL Query Builder | CWE-89 |
| File Path Handler | CWE-22 |
| Rate Limiter | CWE-770, CWE-400 |
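For the File Path Handler (CWE-22), the usual defence is to resolve the combined path and then confirm it still sits under the base directory. Below is a minimal sketch with `pathlib`, which needs Python 3.9+ for `Path.is_relative_to`. `safe_resolve` is a hypothetical name, not the task's starter signature.

```python
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve user_path inside base_dir, rejecting traversal (CWE-22)."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks before the check;
    # an absolute user_path replaces base entirely and is then rejected below.
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target


print(safe_resolve("/srv/uploads", "reports/q1.txt"))
```

Resolving before the check matters. A string-prefix test on the unresolved path would accept `reports/../../etc/passwd`. `is_relative_to` compares whole path components, so it also avoids the `/srv/uploads-evil` false match that a prefix check on resolved strings would allow.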
### Hard
| Task | CWE Targets |
|---|---|
| File Upload Handler | CWE-22, CWE-434 |
| JWT Validator | CWE-347, CWE-613 |
| Auth Middleware | CWE-287, CWE-352 |
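For the JWT Validator, CWE-347 and CWE-613 come down to three steps:

1. pin the algorithm;
2. verify the signature in constant time;
3. enforce `exp`.

The following standard-library sketch handles HS256 only and assumes a shared secret. It is illustrative rather than the task's reference solution, and production code would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore '=' padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid HS256 JWT, or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts

    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm instead of trusting the header ('none', alg confusion): CWE-347.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison of recomputed and supplied signatures.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")

    claims = json.loads(_b64url_decode(payload_b64))
    # Require an expiry claim and enforce it: CWE-613.
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, (int, float)):
        raise ValueError("missing exp claim")
    if (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return claims
```

The optional `now` argument exists only to make expiry testable without waiting on the clock.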
---
## Quick Start
```python
import requests

BASE = "http://localhost:7860"

# Start episode
ep = requests.post(f"{BASE}/reset", json={"difficulty": "medium"}).json()
sid = ep["session_id"]
print(ep["problem_statement"])

# Submit code
result = requests.post(f"{BASE}/step", json={
    "session_id": sid,
    "code": "def build_user_query(u, r):\n    return ('SELECT * FROM users WHERE username=%s', (u,))",
    "filename": "solution.py",
}).json()
print(f"reward={result['total_reward']:.3f}")
print(result["feedback"]["summary"])
```
---
## API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /reset | Start new episode |
| POST | /step | Submit code for grading |
| GET | /state | Current episode state |
| GET | /tasks | List all tasks |
| GET | /tasks/{id} | Task detail + starter code |
| GET | /docs | Swagger UI |
---
## Setup
```bash
# Docker (recommended)
docker build -t secure-code-env .
docker run -p 7860:7860 secure-code-env
# Direct
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 7860
```
## Run Baseline Inference
```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4o-mini
export HF_TOKEN=your_token
export ENV_URL=http://localhost:7860
python inference.py
```
## Pre-submission Validation
```bash
python validate.py --url http://localhost:7860
```
---
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `API_BASE_URL` | Yes (inference) | LLM API endpoint |
| `MODEL_NAME` | Yes (inference) | Model identifier |
| `HF_TOKEN` | Yes (inference) | API authentication token |
| `ENV_URL` | No | Override environment URL (default: `http://localhost:7860`) |
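The baseline `inference.py` is configured through these variables. A custom client could load them with a fail-fast loader like the sketch below. `InferenceConfig` and `load_config` are hypothetical names; only the variable names and the `ENV_URL` default come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InferenceConfig:
    api_base_url: str
    model_name: str
    hf_token: str
    env_url: str


def load_config(environ: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the documented variables; raise early if a required one is missing."""
    required = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return InferenceConfig(
        api_base_url=environ["API_BASE_URL"],
        model_name=environ["MODEL_NAME"],
        hf_token=environ["HF_TOKEN"],
        # ENV_URL is optional and falls back to the local server.
        env_url=environ.get("ENV_URL", "http://localhost:7860"),
    )
```

Accepting a mapping instead of always reading `os.environ` lets the loader be exercised with a plain dict.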