The best way to test an MCP server is to connect it to an agent and exercise realistic flows.

Typical assertions

  • Correctness: Expect.content.contains, Expect.tools.output_matches
  • Tool usage: Expect.tools.was_called, called_with, count, success_rate, failed
  • Efficiency: Expect.performance.max_iterations, Expect.path.efficiency, Expect.tools.sequence
  • Quality: Expect.judge.llm (rubric‑based)
For example, several of these combined in one test (the remaining families are sketched right after this block):

from mcp_eval import Expect, task

@task("Fetch a page and summarize it")
async def test_fetch_summary(agent, session):
    resp = await agent.generate_str("Fetch https://httpbin.org/html and summarize")
    await session.assert_that(Expect.tools.was_called("fetch"), name="fetch_called", response=resp)
    await session.assert_that(Expect.content.contains("html", case_sensitive=False), name="mentions_html", response=resp)
    judge = Expect.judge.llm("Provides a meaningful summary of the HTML page", min_score=0.8)
    await session.assert_that(judge, name="quality_summary", response=resp)
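
The remaining families in the list above follow the same assert_that pattern. A sketch, continuing inside the test body above; the keyword names used here (match_type, expected_count, min_rate) are assumptions about the exact signatures, so check the API reference before relying on them:

# Sketch only: continues inside test_fetch_summary; kwarg names are assumed.
# Correctness: match against the raw tool output
# (httpbin.org/html serves a Moby-Dick excerpt by Herman Melville).
await session.assert_that(
    Expect.tools.output_matches("fetch", "Herman Melville", match_type="contains"),
    name="fetch_output", response=resp,
)
# Tool usage: arguments, call count, failures.
await session.assert_that(Expect.tools.called_with("fetch", {"url": "https://httpbin.org/html"}), name="fetch_args")
await session.assert_that(Expect.tools.count("fetch", expected_count=1), name="fetch_once")
await session.assert_that(Expect.tools.success_rate(min_rate=1.0, tool_name="fetch"), name="no_failures")
# Efficiency: bound the agent's tool-use loop.
await session.assert_that(Expect.performance.max_iterations(3), name="few_iterations")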

Examples

Golden paths & sequences

Use Expect.path.efficiency and Expect.tools.sequence to encode expected tool paths and detect backtracking/repeats.
await session.assert_that(
  Expect.path.efficiency(
    expected_tool_sequence=["fetch"],
    allow_extra_steps=1,
    tool_usage_limits={"fetch": 1},
  ),
  name="fetch_path_efficiency",
)

await session.assert_that(Expect.tools.sequence(["fetch"], allow_other_calls=True), name="fetch_sequence")

Artifacts

  • Per‑test JSON and OTEL .jsonl traces in ./test-reports
  • Combined JSON/Markdown/HTML reports via runner options
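
The per-test OTEL traces are plain JSON-lines files, so they are easy to inspect ad hoc. A minimal sketch, assuming only the ./test-reports location named above (one JSON-encoded span per line; the span schema itself is not assumed here):

import json
from pathlib import Path

# Count spans in each per-test OTEL trace under ./test-reports.
for trace_file in Path("test-reports").glob("**/*.jsonl"):
    spans = [json.loads(ln) for ln in trace_file.read_text().splitlines() if ln.strip()]
    print(f"{trace_file.name}: {len(spans)} spans")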