The best way to test an MCP server is to connect it to an agent and exercise realistic flows.

Typical assertions

  • Correctness: Expect.content.contains, Expect.tools.output_matches
  • Tool usage: Expect.tools.was_called, called_with, count, success_rate, failed
  • Efficiency: Expect.performance.max_iterations, Expect.path.efficiency, Expect.tools.sequence
  • Quality: Expect.judge.llm (rubric‑based)
For example, several of these combined in one test (the remaining families are sketched right after this block):

from mcp_eval import Expect, task

@task("Fetch a page and summarize it")
async def test_fetch_summary(agent, session):
    resp = await agent.generate_str("Fetch https://httpbin.org/html and summarize")
    await session.assert_that(Expect.tools.was_called("fetch"), name="fetch_called", response=resp)
    await session.assert_that(Expect.content.contains("html", case_sensitive=False), name="mentions_html", response=resp)
    judge = Expect.judge.llm("Provides a meaningful summary of the HTML page", min_score=0.8)
    await session.assert_that(judge, name="quality_summary", response=resp)
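
The remaining families in the list above follow the same assert_that pattern. A sketch, continuing inside the test body above; the keyword names used here (match_type, expected_count, min_rate) are assumptions about the exact signatures, so check the API reference before relying on them:

# Sketch only: continues inside test_fetch_summary; kwarg names are assumed.
# Correctness: match against the raw tool output
# (httpbin.org/html serves a Moby-Dick excerpt by Herman Melville).
await session.assert_that(
    Expect.tools.output_matches("fetch", "Herman Melville", match_type="contains"),
    name="fetch_output", response=resp,
)
# Tool usage: arguments, call count, failures.
await session.assert_that(Expect.tools.called_with("fetch", {"url": "https://httpbin.org/html"}), name="fetch_args")
await session.assert_that(Expect.tools.count("fetch", expected_count=1), name="fetch_once")
await session.assert_that(Expect.tools.success_rate(min_rate=1.0, tool_name="fetch"), name="no_failures")
# Efficiency: bound the agent's tool-use loop.
await session.assert_that(Expect.performance.max_iterations(3), name="few_iterations")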

Examples

Golden paths & sequences

Use Expect.path.efficiency and Expect.tools.sequence to encode expected tool paths and detect backtracking/repeats.
await session.assert_that(
  Expect.path.efficiency(
    expected_tool_sequence=["fetch"],
    allow_extra_steps=1,
    tool_usage_limits={"fetch": 1},
  ),
  name="fetch_path_efficiency",
)

await session.assert_that(Expect.tools.sequence(["fetch"], allow_other_calls=True), name="fetch_sequence")

Artifacts

  • Per‑test JSON and OTEL .jsonl traces in ./test-reports
  • Combined JSON/Markdown/HTML reports via runner options
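
The per-test OTEL traces are plain JSON-lines files, so they are easy to inspect ad hoc. A minimal sketch, assuming only the ./test-reports location named above (one JSON-encoded span per line; the span schema itself is not assumed here):

import json
from pathlib import Path

# Count spans in each per-test OTEL trace under ./test-reports.
for trace_file in Path("test-reports").glob("**/*.jsonl"):
    spans = [json.loads(ln) for ln in trace_file.read_text().splitlines() if ln.strip()]
    print(f"{trace_file.name}: {len(spans)} spans")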