Quick-and-dirty LLM-based evaluations for your AI product traces. Upload a dataset, tell the LLM what to look for, and get results in minutes.
- Upload datasets — drop in a CSV of user conversations and preview it instantly
- Create evals — write a plain-English prompt describing what to evaluate. Classify (yes/no), score (1-5), categorize, or get freeform comments
- Run — pick a dataset, pick an eval, hit run. Results stream in live as the LLM works through your traces (see the run sketch after this list)
- Charts & insights — see results at a glance with pie charts, bar charts, and per-trace reasoning
- Export — download everything as a CSV with eval results as new columns, ready for your spreadsheet (see the export sketch below)
- Iterate — update your eval prompt, re-run on the same dataset, and compare versions
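
Conceptually, a run is one LLM call per trace with the eval prompt attached, plus parsing of a structured verdict. Here is a minimal sketch of that loop, assuming the OpenAI Python client, a `gpt-4o-mini` model, a `traces.csv` file, and a hypothetical `conversation` column; the product's actual internals may differ:

```python
import csv
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Example plain-English eval prompt; classify-style (yes/no) for this sketch.
EVAL_PROMPT = "Did the assistant resolve the user's issue? Answer yes or no."

def run_eval(trace: str) -> dict:
    """Ask the model for a verdict plus its reasoning for one trace."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "You are an evaluator. " + EVAL_PROMPT +
                ' Reply with JSON: {"verdict": "yes" or "no", "reasoning": "..."}'
            )},
            {"role": "user", "content": trace},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

with open("traces.csv", newline="") as f:
    rows = list(csv.DictReader(f))

results = []
for row in rows:
    result = run_eval(row["conversation"])  # hypothetical column name
    print(result["verdict"], "-", result["reasoning"][:60])  # stream verdicts as they arrive
    results.append(result)
```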
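
Export then follows naturally: the original columns pass through unchanged and each eval contributes new ones. Continuing from the loop above (reusing its `rows` and `results`), a sketch of writing the merged CSV:

```python
# Original columns first, then one new column per eval output field.
fieldnames = list(rows[0].keys()) + ["eval_verdict", "eval_reasoning"]

with open("traces_with_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row, result in zip(rows, results):
        writer.writerow({**row,
                         "eval_verdict": result["verdict"],
                         "eval_reasoning": result["reasoning"]})
```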