Quick-and-dirty LLM-based evaluations for your AI product traces. Upload a dataset, tell the LLM what to look for, and get results in minutes.
- Upload datasets — drop in a CSV of user conversations and preview it instantly
- Create evals — write a plain-English prompt describing what to evaluate. Classify (yes/no), score (1-5), categorize, or get freeform comments
- Run — pick a dataset, pick an eval, hit run. Results stream in live as the LLM works through your traces (see the run sketch after this list)
- Charts & insights — see results at a glance with pie charts, bar charts, and per-trace reasoning
- Export — download everything as a CSV with eval results as new columns, ready for your spreadsheet (see the export sketch below)
- Iterate — update your eval prompt, re-run on the same dataset, and compare versions
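
Conceptually, a run is one LLM call per trace with the eval prompt attached, plus parsing of a structured verdict. Here is a minimal sketch of that loop, assuming the OpenAI Python client, a `gpt-4o-mini` model, a `traces.csv` file, and a hypothetical `conversation` column; the product's actual internals may differ:

```python
import csv
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Example plain-English eval prompt; classify-style (yes/no) for this sketch.
EVAL_PROMPT = "Did the assistant resolve the user's issue? Answer yes or no."

def run_eval(trace: str) -> dict:
    """Ask the model for a verdict plus its reasoning for one trace."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "You are an evaluator. " + EVAL_PROMPT +
                ' Reply with JSON: {"verdict": "yes" or "no", "reasoning": "..."}'
            )},
            {"role": "user", "content": trace},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

with open("traces.csv", newline="") as f:
    rows = list(csv.DictReader(f))

results = []
for row in rows:
    result = run_eval(row["conversation"])  # hypothetical column name
    print(result["verdict"], "-", result["reasoning"][:60])  # stream verdicts as they arrive
    results.append(result)
```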
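
Export then follows naturally: the original columns pass through unchanged and each eval contributes new ones. Continuing from the loop above (reusing its `rows` and `results`), a sketch of writing the merged CSV:

```python
# Original columns first, then one new column per eval output field.
fieldnames = list(rows[0].keys()) + ["eval_verdict", "eval_reasoning"]

with open("traces_with_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row, result in zip(rows, results):
        writer.writerow({**row,
                         "eval_verdict": result["verdict"],
                         "eval_reasoning": result["reasoning"]})
```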