> AI agent evaluation framework
Evaluate agents with terminal precision
No server. No signup. Multi-objective scoring from YAML specs. Deterministic code judges + customizable LLM judges, version-controlled in Git.
$ agentv eval ./evals/math.yaml
Running 3 eval cases...
PASS addition score: 1.0
PASS multiplication score: 1.0
FAIL division score: 0.4
Results: 2 passed 1 failed
$ agentv compare run-a run-b
Comparing 2 runs...
correctness +12.5% (0.72 -> 0.81)
latency -340ms (1.2s -> 0.86s)
cost +$0.02 ($0.05 -> $0.07)
Overall: improved
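The deltas in the comparison output above can be reproduced by hand. The sketch below is only an illustration of the arithmetic, not AgentV's implementation: it assumes correctness is reported as a change relative to the earlier run, while latency and cost are reported as absolute differences. The helper names `relative_change` and `absolute_change` are hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Change as a fraction of the earlier value."""
    return (after - before) / before


def absolute_change(before: float, after: float) -> float:
    """Plain difference between the later and earlier value."""
    return after - before


# Figures taken from the sample `agentv compare` output above.
correctness = relative_change(0.72, 0.81)        # score, 0..1
latency_ms = absolute_change(1.2, 0.86) * 1000   # seconds -> milliseconds
cost = absolute_change(0.05, 0.07)               # dollars

print(f"correctness {correctness:+.1%}")
print(f"latency     {latency_ms:+.0f}ms")
print(f"cost        {'+' if cost >= 0 else '-'}${abs(cost):.2f}")
```

Under those assumptions the three printed deltas match the sample: +12.5%, -340ms, and +$0.02.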
Built for your workflow
Local Execution
No cloud dependency. All data stays on your machine. Zero overhead to get started.
Multi-Objective Scoring
Correctness, latency, cost, and safety measured in a single evaluation run.
Code + LLM Judges
Deterministic code validators and customizable LLM judges, composable and extensible.
LLM & Agent Targets
Direct LLM providers, plus Claude Code, Codex, Pi, Copilot, and OpenCode as agent targets.
Rubric Grading
Structured criteria with weights and auto-generation. Google ADK-style object rubrics.
A/B Comparison
Compare evaluation runs side-by-side with statistical deltas and regression detection.
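To make the ideas of a deterministic code judge and a weighted rubric concrete, here is a minimal, generic sketch. It does not use AgentV's judge or rubric API: `Criterion`, `states_number`, and `rubric_score` are hypothetical names, and the pass/fail-per-criterion model is an assumption made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: float
    passed: bool


def states_number(output: str, expected: str) -> bool:
    """Deterministic check: does the output contain the expected number as a whole token?"""
    return expected in re.findall(r"-?\d+(?:\.\d+)?", output)


def rubric_score(criteria: list[Criterion]) -> float:
    """Weighted share of criteria that passed, in the range 0..1."""
    total = sum(c.weight for c in criteria)
    if total == 0:
        raise ValueError("rubric needs at least one criterion with non-zero weight")
    return sum(c.weight for c in criteria if c.passed) / total


answer = "15 + 27 = 42."
criteria = [
    Criterion("states the correct sum", weight=3, passed=states_number(answer, "42")),
    Criterion("explains the steps", weight=1, passed=False),
]
score = rubric_score(criteria)
print(f"score: {score}")
```

The same input always yields the same score, which is what makes a code judge suitable for regression checks. An LLM judge would replace the hard-coded `passed` values with model-graded ones.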
Quick Start
1
Install
npm install -g agentv
2
Initialize
agentv init
3
Configure
Copy .env.example to .env and add your API keys.
4
Create an eval
description: Math evaluation
execution:
  target: default
evalcases:
  - id: addition
    expected_outcome: Correctly calculates 15 + 27 = 42
    input_messages:
      - role: user
        content: What is 15 + 27?
5
Run
agentv eval ./evals/example.yaml
How AgentV Compares
| Feature | AgentV | LangWatch | LangSmith | LangFuse |
|---|---|---|---|---|
| Setup | npm install | Cloud account + API key | Cloud account + API key | Cloud account + API key |
| Server | None (local) | Managed cloud | Managed cloud | Managed cloud |
| Privacy | All local | Cloud-hosted | Cloud-hosted | Cloud-hosted |
| CLI-first | ✓ | ✗ | Limited | Limited |
| CI/CD ready | ✓ | Requires API calls | Requires API calls | Requires API calls |
| Version control | ✓ YAML in Git | ✗ | ✗ | ✗ |
| Evaluators | Code + LLM + Custom | LLM only | LLM + Code | LLM only |