AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence Paper • 2511.01144 • Published Nov 3, 2025 • 4
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models Article • May 24, 2024 • 22
trend-cybertron/Llama-Primus-Nemotron-70B-Instruct Text Generation • 71B • Updated Aug 9, 2025 • 652 • 14
REAL-MM-RAG-Bench Collection REAL-MM-RAG-Bench is a benchmark designed to evaluate multi-modal retrieval models under realistic and challenging conditions. • 4 items • Updated Mar 13, 2025 • 11