Spaces:
Sleeping
Sleeping
| title: Trainx | |
| emoji: π | |
| colorFrom: red | |
| colorTo: blue | |
| sdk: docker | |
| pinned: true | |
| license: apache-2.0 | |
| # SecureCodeEnv | |
| **An RL environment for training LLM agents to write production-ready, secure Python code.** | |
| --- | |
| ## The Problem | |
| Studies show **12β65% of LLM-generated code contains security vulnerabilities**. Secure-pass@1 rates remain below 12% for all frontier models even when functional pass@1 exceeds 50%. | |
| Every existing RL environment trains agents to write code that **works**. None train agents to write code that is **safe, consistent, and production-ready**. SecureCodeEnv closes that gap. | |
| --- | |
| ## What Makes This Environment Different | |
| | Feature | SecureCodeEnv | Typical RL Code Envs | | |
| |---|---|---| | |
| | Dynamic adversarial grading | β Real attacks fired per episode | β Static patterns only | | |
| | CodeGraph memory | β Cross-step convention tracking | β Single-function only | | |
| | CWE-grounded tasks | β 9 tasks, 12+ CWE IDs | β Generic correctness | | |
| | Security gate on done | β Attack + static thresholds | β Pass/fail only | | |
| | Anti-reward-hacking | β Seeded random payloads | β Fixed test cases | | |
| --- | |
| ## Reward System β 7 Dimensions | |
| | Dimension | Weight | Tool | What It Measures | | |
| |---|---|---|---| | |
| | correctness | 25% | Custom test runner | Test cases passed | | |
| | attack_resist | 25% | Dynamic harness | Real attack payloads blocked | | |
| | static_security | 20% | bandit + AST | CWE-mapped vulnerability patterns | | |
| | consistency | 10% | CodeGraph | Convention adherence across steps | | |
| | performance | 8% | timeit | Speed vs naive/optimal baselines | | |
| | documentation | 7% | AST | Docstring + type hint coverage | | |
| | code_structure | 5% | AST | Clean code (no bare print/except) | | |
| **Security gate:** episode cannot complete unless `attack_resist β₯ 0.75` AND `static_security β₯ 0.70` AND `correctness β₯ 0.80`. | |
| --- | |
| ## Tasks β 9 Tasks Across 3 Difficulty Levels | |
| ### Easy | |
| | Task | CWE Targets | | |
| |---|---| | |
| | Password Validator | CWE-916, CWE-521 | | |
| | Input Sanitizer | CWE-20, CWE-116 | | |
| | Token Generator | CWE-338, CWE-330 | | |
| ### Medium | |
| | Task | CWE Targets | | |
| |---|---| | |
| | SQL Query Builder | CWE-89 | | |
| | File Path Handler | CWE-22 | | |
| | Rate Limiter | CWE-770, CWE-400 | | |
| ### Hard | |
| | Task | CWE Targets | | |
| |---|---| | |
| | File Upload Handler | CWE-22, CWE-434 | | |
| | JWT Validator | CWE-347, CWE-613 | | |
| | Auth Middleware | CWE-287, CWE-352 | | |
| --- | |
| ## Quick Start | |
| ```python | |
| import requests | |
| BASE = "http://localhost:7860" | |
| # Start episode | |
| ep = requests.post(f"{BASE}/reset", json={"difficulty": "medium"}).json() | |
| sid = ep["session_id"] | |
| print(ep["problem_statement"]) | |
| # Submit code | |
| result = requests.post(f"{BASE}/step", json={ | |
| "session_id": sid, | |
| "code": "def build_user_query(u, r):\n return ('SELECT * FROM users WHERE username=%s', (u,))", | |
| "filename": "solution.py" | |
| }).json() | |
| print(f"reward={result['total_reward']:.3f}") | |
| print(result["feedback"]["summary"]) | |
| ``` | |
| --- | |
| ## API Endpoints | |
| | Method | Endpoint | Description | | |
| |---|---|---| | |
| | GET | /health | Health check | | |
| | POST | /reset | Start new episode | | |
| | POST | /step | Submit code for grading | | |
| | GET | /state | Current episode state | | |
| | GET | /tasks | List all tasks | | |
| | GET | /tasks/{id} | Task detail + starter code | | |
| | GET | /docs | Swagger UI | | |
| --- | |
| ## Setup | |
| ```bash | |
| # Docker (recommended) | |
| docker build -t secure-code-env . | |
| docker run -p 7860:7860 secure-code-env | |
| # Direct | |
| pip install -r requirements.txt | |
| uvicorn app.main:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| ## Run Baseline Inference | |
| ```bash | |
| export API_BASE_URL=https://api.openai.com/v1 | |
| export MODEL_NAME=gpt-4o-mini | |
| export HF_TOKEN=your_token | |
| export ENV_URL=http://localhost:7860 | |
| python inference.py | |
| ``` | |
| ## Pre-submission Validation | |
| ```bash | |
| python validate.py --url http://localhost:7860 | |
| ``` | |
| --- | |
| ## Environment Variables | |
| | Variable | Required | Description | | |
| |---|---|---| | |
| | `API_BASE_URL` | Yes (inference) | LLM API endpoint | | |
| | `MODEL_NAME` | Yes (inference) | Model identifier | | |
| | `HF_TOKEN` | Yes (inference) | API authentication token | | |
| | `ENV_URL` | No | Override environment URL (default: localhost:7860) | | |