Changelog

Follow new updates and improvements to beval.

February 9th, 2026

New

New users now see a "Try an example dataset" button when they have no datasets yet. One click loads 3 trip-planning assistant conversations so you can start exploring evals right away, no CSV needed.

February 9th, 2026

New

Quick and dirty LLM-based evaluations for your AI product traces. Upload a dataset, tell the LLM what to look for, and get results in minutes.

  • Upload datasets: drop in a CSV of user conversations and preview it instantly
  • Create evals: write a plain-English prompt describing what to evaluate. Classify (yes/no), score (1-5), categorize, or get freeform comments
  • Run: pick a dataset, pick an eval, hit run. Results stream in live as the LLM works through your traces
  • Charts & insights: see results at a glance with pie charts, bar charts, and per-trace reasoning
  • Export: download everything as a CSV with eval results as new columns, ready for your spreadsheet
  • Iterate: update your eval prompt, re-run on the same dataset, and compare versions
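
For a concrete picture of what "eval results as new columns" could look like in an export, here is a minimal Python sketch. It is an illustration only, not beval's code or its actual export format: the function names, the column names (`trace_id`, `conversation`, `helpful_result`, `helpful_reasoning`), and the sample rows are all hypothetical.

```python
import csv
import io


def append_eval_columns(rows, results, eval_name):
    """Return copies of the rows with one eval's result and reasoning added as new columns.

    `rows` and `results` are parallel lists; the column naming scheme
    (<eval_name>_result, <eval_name>_reasoning) is an assumption for illustration.
    """
    if len(rows) != len(results):
        raise ValueError("each trace needs exactly one eval result")
    merged = []
    for row, result in zip(rows, results):
        out = dict(row)  # copy so the original dataset row is left untouched
        out[f"{eval_name}_result"] = result["result"]
        out[f"{eval_name}_reasoning"] = result["reasoning"]
        merged.append(out)
    return merged


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical dataset rows and yes/no eval results.
traces = [
    {"trace_id": "1", "conversation": "User: Plan a weekend in Lisbon."},
    {"trace_id": "2", "conversation": "User: Find me a flight to Tokyo."},
]
results = [
    {"result": "yes", "reasoning": "The assistant proposed a full itinerary."},
    {"result": "no", "reasoning": "The assistant did not ask for travel dates."},
]

exported = append_eval_columns(traces, results, "helpful")
print(to_csv(exported))
```

Keeping the original columns first and appending the eval columns at the end means the exported file still opens like the uploaded dataset, with the results alongside each trace.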