Evaluation tool for web research LLMs: accuracy + hallucination + cost

by vigneshwar234 - opened 4 days ago

Hi OSU NLP team 👋

QUEST's multi-source web research approach is impressive. For teams deploying web research LLMs, hallucination is the most critical failure mode: a fabricated citation is worse than no answer.

I built an open-source LLM Evaluation Framework with a dedicated hallucination metric:

→ 🔍 Hallucination Rate — detects ungrounded claims, runs locally on any output
→ 🎯 Accuracy — verified against ground truth
→ 🧠 Reasoning Quality — CoT depth, important for research-style multi-step answers
→ 💰 Cost per 1K tokens — web research tasks are token-heavy
→ ⚡ Latency p95
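
For anyone who wants the definitions spelled out, here is a rough sketch of how metrics like these could be computed from per-run records. It is an illustration only, not the framework's actual API: the `Run` record and all function names are hypothetical, and it assumes claims have already been extracted and labeled as grounded or ungrounded by some upstream step.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One evaluated response (hypothetical record layout, not the framework's)."""
    correct: bool           # answer matched the ground truth
    claims: int             # number of claims extracted from the answer
    ungrounded_claims: int  # claims with no supporting source
    tokens: int             # prompt + completion tokens
    cost_usd: float         # billed cost for this run
    latency_s: float        # wall-clock latency in seconds


def accuracy(runs):
    """Fraction of runs whose answer matched the ground truth."""
    return sum(r.correct for r in runs) / len(runs)


def hallucination_rate(runs):
    """Share of all extracted claims that had no supporting source."""
    total = sum(r.claims for r in runs)
    return sum(r.ungrounded_claims for r in runs) / total if total else 0.0


def cost_per_1k_tokens(runs):
    """Total cost divided by total tokens, scaled to 1,000 tokens."""
    tokens = sum(r.tokens for r in runs)
    return 1000 * sum(r.cost_usd for r in runs) / tokens if tokens else 0.0


def latency_p95(runs):
    """Nearest-rank 95th percentile of latency (assumed percentile method)."""
    if not runs:
        raise ValueError("latency_p95 needs at least one run")
    ordered = sorted(r.latency_s for r in runs)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


runs = [
    Run(True, 4, 1, 2000, 0.02, 1.0),
    Run(False, 6, 2, 3000, 0.03, 3.0),
]
print(accuracy(runs), hallucination_rate(runs),
      cost_per_1k_tokens(runs), latency_p95(runs))
```

The hallucination rate here is pooled over claims rather than averaged per run, so long answers with many claims weigh more; averaging per run is an equally reasonable choice depending on what you want to penalize.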

Live demo: https://huggingface.co/spaces/vigneshwar234/llm-eval-demo
GitHub: https://github.com/vignesh2027/LLM-Evaluation-Framework

Open source. Free forever. Happy to discuss web research LLM evaluation approaches!
