Active filters: rl
Text Generation
• 4B • Updated
Text Generation
• 4B • Updated • 3
Text Generation
• 4B • Updated • 1
Text Generation
• 4B • Updated
Text Generation
• 4B • Updated • 2
Text Generation
• 4B • Updated • 1
Text Generation
• 4B • Updated
Text Generation
• 4B • Updated • 1
Text Generation
• 4B • Updated • 1
HarleyCooper/Qwen3-30B-Dakota1890
Text Generation
• Updated • 7
• 2
HerrHruby/offline_acemath_rl_4b_inst_hard_with_dishsoap_16k_no_summ_curr_step_120
Text Generation
• 4B • Updated • 6
HarleyCooper/Qwen3-30B-ThinkingMachines-Dakota1890
Reinforcement Learning
• Updated • 4
Text Generation
• 21B • Updated • 4
mradermacher/CAI-20B-v2-GGUF
Text Generation
• 21B • Updated • 36
mradermacher/CAI-20B-v2-i1-GGUF
Text Generation
• 21B • Updated • 121
socaitcy/SOCAIT-Hermes-14B
Text Generation
• Updated
ash256/qwen3-4b-question-gen
Text Generation
• 4B • Updated • 4
• 1
pankajmathur/nanochat-d34-rl-all-ckpts
Text Generation
• Updated • 1
pankajmathur/nanochat-d34-rl
Text Generation
• Updated
pankajmathur/RenCoder-Devstral-Small-2507
Text Generation
• 24B • Updated • 27
• 1
HallD/SkeptiSTEM-4B-v2-stageR3-grpo-lora
Text Generation
• Updated
anakin87/LFM2-2.6B-ttt-rl
Text Generation
• Updated • 2
anakin87/LFM2-2.6B-ttt-rl-merged
Text Generation
• 3B • Updated • 2
Any-to-Any
• 7B • Updated • 9
ModalityDance/Omni-R1-Zero
Any-to-Any
• 7B • Updated • 10
ibrahima2222/nanochat-d32
Updated
IIGroup/X-Coder-RL-Qwen2.5-7B
8B • Updated • 44
• 1
IIGroup/X-Coder-RL-Qwen3-8B
8B • Updated • 9
• 1
mradermacher/X-Coder-RL-Qwen3-8B-GGUF
8B • Updated • 200
• 1
mradermacher/X-Coder-RL-Qwen2.5-7B-GGUF
8B • Updated • 297