Spaces:

vishaldhakad
/

SecureCodeEnv

Sleeping

App Files Files Community

SecureCodeEnv / README.md

vishaldhakad

removing unnescearry things

31726f5 3 months ago

preview code

Raw

History Blame Contribute Delete

4.1 kB

	---
	title: Trainx
	emoji: 🔒
	colorFrom: red
	colorTo: blue
	sdk: docker
	pinned: true
	license: apache-2.0
	---

	# SecureCodeEnv

	An RL environment for training LLM agents to write production-ready, secure Python code.

	---

	## The Problem

	Studies show 12–65% of LLM-generated code contains security vulnerabilities. Secure-pass@1 rates remain below 12% for all frontier models even when functional pass@1 exceeds 50%.

	Every existing RL environment trains agents to write code that works. None train agents to write code that is safe, consistent, and production-ready. SecureCodeEnv closes that gap.

	---

	## What Makes This Environment Different

	\| Feature \| SecureCodeEnv \| Typical RL Code Envs \|
	\|---\|---\|---\|
	\| Dynamic adversarial grading \| ✅ Real attacks fired per episode \| ❌ Static patterns only \|
	\| CodeGraph memory \| ✅ Cross-step convention tracking \| ❌ Single-function only \|
	\| CWE-grounded tasks \| ✅ 9 tasks, 12+ CWE IDs \| ❌ Generic correctness \|
	\| Security gate on done \| ✅ Attack + static thresholds \| ❌ Pass/fail only \|
	\| Anti-reward-hacking \| ✅ Seeded random payloads \| ❌ Fixed test cases \|

	---

	## Reward System — 7 Dimensions

	\| Dimension \| Weight \| Tool \| What It Measures \|
	\|---\|---\|---\|---\|
	\| correctness \| 25% \| Custom test runner \| Test cases passed \|
	\| attack_resist \| 25% \| Dynamic harness \| Real attack payloads blocked \|
	\| static_security \| 20% \| bandit + AST \| CWE-mapped vulnerability patterns \|
	\| consistency \| 10% \| CodeGraph \| Convention adherence across steps \|
	\| performance \| 8% \| timeit \| Speed vs naive/optimal baselines \|
	\| documentation \| 7% \| AST \| Docstring + type hint coverage \|
	\| code_structure \| 5% \| AST \| Clean code (no bare print/except) \|

	Security gate: episode cannot complete unless `attack_resist ≥ 0.75` AND `static_security ≥ 0.70` AND `correctness ≥ 0.80`.

	---

	## Tasks — 9 Tasks Across 3 Difficulty Levels

	### Easy
	\| Task \| CWE Targets \|
	\|---\|---\|
	\| Password Validator \| CWE-916, CWE-521 \|
	\| Input Sanitizer \| CWE-20, CWE-116 \|
	\| Token Generator \| CWE-338, CWE-330 \|

	### Medium
	\| Task \| CWE Targets \|
	\|---\|---\|
	\| SQL Query Builder \| CWE-89 \|
	\| File Path Handler \| CWE-22 \|
	\| Rate Limiter \| CWE-770, CWE-400 \|

	### Hard
	\| Task \| CWE Targets \|
	\|---\|---\|
	\| File Upload Handler \| CWE-22, CWE-434 \|
	\| JWT Validator \| CWE-347, CWE-613 \|
	\| Auth Middleware \| CWE-287, CWE-352 \|

	---

	## Quick Start

	```python
	import requests

	BASE = "http://localhost:7860"

	# Start episode
	ep = requests.post(f"{BASE}/reset", json={"difficulty": "medium"}).json()
	sid = ep["session_id"]
	print(ep["problem_statement"])

	# Submit code
	result = requests.post(f"{BASE}/step", json={
	"session_id": sid,
	"code": "def build_user_query(u, r):\n return ('SELECT * FROM users WHERE username=%s', (u,))",
	"filename": "solution.py"
	}).json()

	print(f"reward={result['total_reward']:.3f}")
	print(result["feedback"]["summary"])
	```

	---

	## API Endpoints

	\| Method \| Endpoint \| Description \|
	\|---\|---\|---\|
	\| GET \| /health \| Health check \|
	\| POST \| /reset \| Start new episode \|
	\| POST \| /step \| Submit code for grading \|
	\| GET \| /state \| Current episode state \|
	\| GET \| /tasks \| List all tasks \|
	\| GET \| /tasks/{id} \| Task detail + starter code \|
	\| GET \| /docs \| Swagger UI \|

	---

	## Setup

	```bash
	# Docker (recommended)
	docker build -t secure-code-env .
	docker run -p 7860:7860 secure-code-env

	# Direct
	pip install -r requirements.txt
	uvicorn app.main:app --host 0.0.0.0 --port 7860
	```

	## Run Baseline Inference

	```bash
	export API_BASE_URL=https://api.openai.com/v1
	export MODEL_NAME=gpt-4o-mini
	export HF_TOKEN=your_token
	export ENV_URL=http://localhost:7860
	python inference.py
	```

	## Pre-submission Validation

	```bash
	python validate.py --url http://localhost:7860
	```

	---

	## Environment Variables

	\| Variable \| Required \| Description \|
	\|---\|---\|---\|
	\| `API_BASE_URL` \| Yes (inference) \| LLM API endpoint \|
	\| `MODEL_NAME` \| Yes (inference) \| Model identifier \|
	\| `HF_TOKEN` \| Yes (inference) \| API authentication token \|
	\| `ENV_URL` \| No \| Override environment URL (default: localhost:7860) \|