LLMPerf

Open-source LLM performance benchmark from the Ray Project. It runs load tests and correctness tests against LLM APIs, and measures inter-token latency, generation throughput, and concurrent request handling.
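
To make the two headline metrics concrete, here is a minimal sketch of how inter-token latency and generation throughput could be derived from the arrival times of streamed tokens. This is not LLMPerf's own code: the function names and the sample timestamps are hypothetical, chosen only for illustration.

```python
from statistics import mean


def inter_token_latencies(token_times):
    """Gaps in seconds between consecutive token arrival times."""
    return [later - earlier for earlier, later in zip(token_times, token_times[1:])]


def summarize(request_start, token_times):
    """Summarize one streamed response (illustrative definitions, not LLMPerf's)."""
    gaps = inter_token_latencies(token_times)
    total_s = token_times[-1] - request_start
    return {
        "mean_inter_token_latency_s": mean(gaps) if gaps else 0.0,
        "throughput_tokens_per_s": len(token_times) / total_s,
    }


# Hypothetical arrival times (seconds after the request) for a five-token response
summary = summarize(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(summary)
```

Under these definitions, a load test would collect one such summary per request and aggregate them across concurrent requests.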

ai · testing · benchmark · platform_profile

Score Breakdown

Latency 9/10

Auth Simplicity 9/10

Parseability 9/10

Consistency 8/10

Token Efficiency 8/10

Documentation 7/10

Error Clarity 6/10

First-Try Success 6/10

Benchmark Analysis Log

Full LLM thinking from the 4-phase benchmark pipeline.

Analyze

```json
{
  "service_type": "platform",
  "base_url": "https://github.com/ray-project/llmperf",
  "auth_method": "none",
  "auth_config": {},
  "endpoints": [],
  "pricing_model": {
    "type": "free",
    "details": {
      "open_source": true,
      "license": "Apache-2.0"
    }
  },
  "rate_limits": {},
  "capabilities": [
    "LLM API load testing",
    "LLM correctness testing",
    "Inter-token latency measurement",
    "Generation throughput analysis",
    "Concurrent request handling evaluation",
    "Performance benchmarking",
    "Distributed testing via Ray",
    "Multi-model comparison",
    "Statistical performance analysis",
    "Custom benchmark configuration"
  ],
  "raw_analysis": "LLMPerf is an open-source benchmarking framework developed by the Ray Project specifically for evaluating Large Language Model (LLM) API performance. As a testing platform, it provides comprehensive load testing and correctness validation capabilities for various LLM services.\n\nThe platform targets AI developers, researchers, ML engineers, and organizations deploying LLM applications who need reliable performance metrics before production deployment. It's particularly valuable for teams comparing different LLM providers or optimizing their LLM infrastructure.\n\nKey technical capabilities include measuring critical performance metrics like inter-token latency (how quickly tokens are generated), overall generation throughput, and how well APIs handle concurrent requests under load. This is crucial for applications requiring real-time or high-throughput LLM interactions.\n\nBeing part of the Ray ecosystem gives LLMPerf significant advantages in distributed testing scenarios, allowing users to simulate realistic load patterns across multiple nodes. The platform's maturity benefits from Ray Project's established infrastructure and community.\n\nIntegration-wise, it works with major LLM providers (OpenAI, Anthropic, Cohere, etc.) and can be incorporated into CI/CD pipelines for continuous performance monitoring. The open-source nature allows for customization and community contributions.\n\nThis platform is particularly relevant for agents needing to select optimal LLM providers based on performance characteristics rather than just accuracy metrics. It fills a critical gap in the LLM tooling ecosystem by providing standardized performance benchmarking capabilities."
}
```

Execute

1/3 tests passed

Test	Endpoint	Status	Latency
website_uptime	GET /	200	481ms
robots_txt	GET /robots.txt	404	47ms
llms_txt	GET /llms.txt	404	45ms
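
The three checks in the table can be sketched as a small probe script. This is an illustration, not the pipeline's actual code: the helper names, the placeholder host, and the pass criterion (HTTP 200) are all assumptions, although that criterion is consistent with the "1/3 tests passed" summary. The example replays the recorded results instead of touching the network.

```python
import time
import urllib.error
import urllib.request

# Check names and paths taken from the table above
CHECKS = [("website_uptime", "/"), ("robots_txt", "/robots.txt"), ("llms_txt", "/llms.txt")]


def http_get(url):
    """Return (status_code, latency_ms) for a GET request; network errors propagate."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still count as a measured result
    return status, round((time.perf_counter() - start) * 1000)


def run_checks(base_url, fetch=http_get):
    """Run every check against base_url using the given fetch function."""
    results = []
    for name, path in CHECKS:
        status, latency_ms = fetch(base_url.rstrip("/") + path)
        results.append({"test": name, "endpoint": f"GET {path}",
                        "status": status, "latency_ms": latency_ms})
    return results


def passed(results):
    """Assumed pass criterion: the endpoint answered with HTTP 200."""
    return sum(1 for r in results if r["status"] == 200)


# Offline replay of the recorded values; BASE is a placeholder, not the real host
BASE = "https://example.invalid"
RECORDED = {"/": (200, 481), "/robots.txt": (404, 47), "/llms.txt": (404, 45)}


def replay(url):
    return RECORDED[url[len(BASE):]]


results = run_checks(BASE, fetch=replay)
summary = f"{passed(results)}/{len(results)} tests passed"
print(summary)
```

Passing the fetch function as a parameter keeps the check logic testable without network access; calling `run_checks` with the default `http_get` would issue real requests.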

Interpret

{"multi_model": true, "models_used": ["openai", "claude_cli"], "model_scores": {"GPT-4o": {"overall": 75, "dimensions": {"token_efficiency": 8.0, "first_try_success": 6.0, "response_parseability": 9.0, "error_clarity": 7.0, "doc_quality": 7.0, "auth_simplicity": 10.0, "latency": 8.0, "consistency": 8.0}}, "Claude CLI": {"overall": 77, "dimensions": {"token_efficiency": 8.5, "first_try_success": 6.0, "response_parseability": 9.0, "error_clarity": 6.0, "doc_quality": 6.5, "auth_simplicity": 7.5, "latency": 9.5, "consistency": 8.5}}}, "averaged": true}
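
The Score Breakdown figures can be reproduced from the per-model dimension scores above by taking the plain mean of the two models and rounding. The sketch below assumes that is how the averaging works; the rounding rule in particular is a guess. Python's `round` rounds halves to the nearest even integer, which matches the published 6/10 for Error Clarity (mean 6.5), but the pipeline may use a different rule. The `overall` scores are not derived here.

```python
# Dimension scores copied from the Interpret output above
model_scores = {
    "GPT-4o": {
        "token_efficiency": 8.0, "first_try_success": 6.0,
        "response_parseability": 9.0, "error_clarity": 7.0,
        "doc_quality": 7.0, "auth_simplicity": 10.0,
        "latency": 8.0, "consistency": 8.0,
    },
    "Claude CLI": {
        "token_efficiency": 8.5, "first_try_success": 6.0,
        "response_parseability": 9.0, "error_clarity": 6.0,
        "doc_quality": 6.5, "auth_simplicity": 7.5,
        "latency": 9.5, "consistency": 8.5,
    },
}


def average_dimensions(scores):
    """Unweighted mean of each dimension across models (assumed method)."""
    dimensions = next(iter(scores.values())).keys()
    return {d: sum(m[d] for m in scores.values()) / len(scores) for d in dimensions}


averaged = average_dimensions(model_scores)
# round() uses round-half-to-even, so 6.5 becomes 6 (assumed rounding rule)
rounded = {d: round(v) for d, v in averaged.items()}

for dimension, score in rounded.items():
    print(f"{dimension}: {score}/10 (mean {averaged[dimension]})")
```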

Agent Readiness

x402 Payments

Not supported

Streaming

Sandbox

None

Agent Auth

Unknown

SDKs

None listed

MCP Support
