Core generic types (source: datasets.py):
- `Case[Input, Output, Metadata]`: a single named test case
- `Dataset[Input, Output, Metadata]`: a named collection of cases evaluated together
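Both types are generic over the input, output, and metadata types. A minimal sketch of pinning them explicitly; the `CaseMeta` dataclass and the `metadata` field are assumptions for illustration, not part of the documented API:

```python
from dataclasses import dataclass

from mcp_eval import Case

@dataclass
class CaseMeta:
    # Hypothetical metadata payload; any type can fill the Metadata slot
    category: str

# Input=str, Output=str, Metadata=CaseMeta
case: Case[str, str, CaseMeta] = Case(
    name="typed_fetch",
    inputs="Fetch https://example.com",
    metadata=CaseMeta(category="smoke"),  # assumes Case accepts a metadata field
)
```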
Programmatic
```python
from mcp_eval import Case, Dataset, ToolWasCalled, ResponseContains

cases = [
    Case(
        name="fetch_example",
        inputs="Fetch https://example.com",
        evaluators=[ToolWasCalled("fetch"), ResponseContains("Example Domain")],
    )
]

dataset = Dataset(name="Fetch Suite", cases=cases)
report = await dataset.evaluate(lambda inputs, agent, session: agent.generate_str(inputs))
report.print(include_input=True, include_output=True)
```
Parallel evaluation:
```python
report = await dataset.evaluate(
    lambda inputs, agent, session: agent.generate_str(inputs),
    max_concurrency=4,
)
```
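The task passed to `evaluate` need not be a lambda; any callable with the same `(inputs, agent, session)` signature should work. A minimal sketch, assuming `evaluate` awaits the task's result:

```python
async def fetch_task(inputs, agent, session):
    # Drive the agent with the case input and return its final text response
    return await agent.generate_str(inputs)

report = await dataset.evaluate(fetch_task, max_concurrency=4)
```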
YAML/JSON
Save and load datasets with `Dataset.to_file` and `Dataset.from_file`. The file format follows the `mcpeval.config.schema.json` schema.
YAML example (from basic_fetch_dataset.yaml):
```yaml
name: "Basic Fetch Dataset"
server_name: "fetch"
cases:
  - name: "simple_fetch"
    inputs: "Fetch https://example.com"
    expected_output: "Example Domain"
    evaluators:
      - ToolWasCalled:
          tool_name: "fetch"
      - ResponseContains:
          text: "Example Domain"
```
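A file like this can be round-tripped programmatically. A minimal sketch, assuming `to_file` and `from_file` take a path as their first argument (the path itself is illustrative):

```python
from mcp_eval import Dataset

# Load the YAML dataset shown above
dataset = Dataset.from_file("basic_fetch_dataset.yaml")

# ...adjust cases in code, then write the dataset back out
dataset.to_file("basic_fetch_dataset.yaml")
```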
Concurrency
Pass `max_concurrency=N` to `Dataset.evaluate(...)` to run up to N cases in parallel, as in the parallel evaluation example above.