73/100
prowl
Benchmarked Apr 06, 2026

ARBITER Verified

Multi-verifier consensus oracle for verifying the quality of AI agent task output. It accepts agent output, runs keyword, length, and criteria verifiers on it in parallel, and combines their verdicts by majority vote. It returns HMAC-signed receipts with a PASS, PARTIAL, or FAIL verdict. Built for the SYNTHESIS hackathon, March 2026.
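The description names three verifiers, a majority vote, and HMAC-signed receipts, but not the rules behind them. The following is a minimal sketch of how such a consensus and a tamper-evident receipt could fit together. It assumes simple stand-in verifiers and an assumed verdict mapping: all verifiers pass gives PASS, a strict majority gives PARTIAL, and anything else gives FAIL. None of these function names come from ARBITER itself.

```python
import hashlib
import hmac
import json

# Hypothetical stand-ins: ARBITER's real verifier rules are not documented here.
def keyword_check(output: str, keywords: list) -> bool:
    """Pass if every keyword appears in the output (case-insensitive)."""
    text = output.lower()
    return all(k.lower() in text for k in keywords)


def length_check(output: str, min_chars: int = 1, max_chars: int = 10_000) -> bool:
    """Pass if the output length falls inside an assumed character range."""
    return min_chars <= len(output) <= max_chars


def criteria_check(output: str, criteria: dict) -> bool:
    """Placeholder: a real criteria verifier would score the output against criteria."""
    return bool(output.strip())


def consensus(votes: list) -> str:
    """Assumed mapping: unanimous -> PASS, strict majority -> PARTIAL, else FAIL."""
    passed = sum(votes)
    if passed == len(votes):
        return "PASS"
    if 2 * passed > len(votes):
        return "PARTIAL"
    return "FAIL"


def _canonical(body: dict) -> bytes:
    # Sorted keys and no whitespace so the signer and the checker hash identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(verdict: str, votes: list, secret: bytes) -> dict:
    body = {"verdict": verdict, "votes": list(votes)}
    # The right-hand side is evaluated before "signature" is added to body.
    body["signature"] = hmac.new(secret, _canonical(body), hashlib.sha256).hexdigest()
    return body


def verify_receipt(receipt: dict, secret: bytes) -> bool:
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(secret, _canonical(body), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(expected, receipt.get("signature", ""))


output = "The capital of France is Paris."
votes = [keyword_check(output, ["Paris"]), length_check(output), criteria_check(output, {})]
receipt = sign_receipt(consensus(votes), votes, secret=b"demo-secret")
print(receipt["verdict"])                       # PASS
print(verify_receipt(receipt, b"demo-secret"))  # True
```

Signing a canonical JSON encoding with HMAC-SHA256 lets anyone holding the shared secret detect a changed verdict. Whether ARBITER canonicalizes its receipts this way is not stated on this page.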

oracle, ai-agent, verification, consensus, api_benchmark, api_key, public

Score Breakdown

Consistency 10/10
Auth Simplicity 10/10
Parseability 10/10
Latency 8/10
Token Efficiency 8/10
Error Clarity 6/10
First-Try Success 6/10
Documentation 4/10

Benchmark Analysis Log

Full LLM output from each phase of the four-phase benchmark pipeline: Analyze, Plan, Execute, and Interpret.

Analyze
```json
{
  "service_type": "rest_api",
  "base_url": "https://arbiter.chitacloud.dev",
  "auth_method": "api_key_header",
  "auth_config": {"header": "Authorization", "prefix": "Bearer"},
  "endpoints": [
    {
      "path": "/verify",
      "method": "POST",
      "purpose": "Submit agent output for multi-verifier consensus verification",
      "params": {
        "output": {"type": "string", "required": true},
        "criteria": {"type": "object", "required": false},
        "keywords": {"type": "array", "required": false}
      },
      "response_format": "json",
      "is_primary": true
    },
    {
      "path": "/receipt/{id}",
      "method": "GET", 
      "purpose": "Retrieve HMAC-signed verification receipt",
      "params": {
        "id": {"type": "string", "required": true}
      },
      "response_format": "json",
      "is_primary": false
    }
  ],
  "pricing_model": {
    "type": "unknown",
    "details": {},
    "free_tier": null,
    "paid_tiers": []
  },
  "rate_limits": {"rpm": null, "tpm": null, "daily": null, "concurrent": null},
  "capabilities": [
    "multi_verifier_consensus",
    "keyword_verification", 
    "length_verification",
    "criteria_verification",
    "hmac_signed_receipts",
    "quality_assessment",
    "pass_partial_fail_scoring"
  ],
  "agent_readiness": {
    "supports_x402": false,
    "supports_streaming": false,
    "has_sandbox": false,
    "sdks": [],
    "agent_auth_methods": ["api_key"]
  }
}
```
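The Analyze phase inferred two endpoints. The following is a minimal standard-library sketch that builds requests following that inferred schema. The field names and the Bearer header come from the LLM's analysis, not from published ARBITER documentation, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://arbiter.chitacloud.dev"  # base URL from the Analyze phase


def build_verify_request(output, criteria=None, keywords=None, api_key=None):
    """Build (without sending) a POST /verify request per the inferred schema."""
    body = {"output": output}
    if criteria is not None:
        body["criteria"] = criteria
    if keywords is not None:
        body["keywords"] = keywords
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Analyze inferred Bearer auth, yet the Plan's tests sent no key at all.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{BASE_URL}/verify",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_receipt_request(receipt_id):
    """Build a GET /receipt/{id} request, percent-encoding the ID."""
    path_id = urllib.parse.quote(receipt_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/receipt/{path_id}", method="GET")


def send(request, timeout=10.0):
    """Send a request and return (status, parsed JSON body)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, json.load(resp)


req = build_verify_request("The capital of France is Paris.", keywords=["Paris"])
# status, body = send(req)  # network call, not executed here
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses. A caller probing error handling, as the Plan's negative tests do, should catch that exception and read `err.code`.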
Plan
```json
{
  "tests": [
    {
      "name": "basic_verification",
      "endpoint": "/verify",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "payload": {
        "output": "The capital of France is Paris."
      },
      "expected_status": 200,
      "expected_behavior": "Returns verification result with consensus score",
      "metrics": ["latency", "accuracy", "status_code"],
      "validation": {
        "field": "verification_id",
        "type": "string",
        "min_length": 1
      }
    },
    {
      "name": "comprehensive_verification",
      "endpoint": "/verify",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "payload": {
        "output": "Based on the data analysis, I recommend investing in renewable energy stocks due to their 15% growth rate and strong regulatory support.",
        "criteria": {
          "factual_accuracy": "high",
          "logical_consistency": "required",
          "evidence_based": true
        },
        "keywords": ["renewable energy", "investment", "growth rate", "regulatory"]
      },
      "expected_status": 200,
      "expected_behavior": "Returns detailed verification with criteria assessment and keyword validation",
      "metrics": ["latency", "accuracy", "status_code"],
      "validation": {
        "field": "consensus_score",
        "type": "number",
        "min_value": 0,
        "max_value": 1
      }
    },
    {
      "name": "receipt_retrieval",
      "endpoint": "/receipt/{verification_id}",
      "method": "GET",
      "headers": {},
      "payload": {},
      "expected_status": 200,
      "expected_behavior": "Returns HMAC-signed verification receipt",
      "metrics": ["latency", "status_code"],
      "validation": {
        "field": "hmac_signature",
        "type": "string",
        "min_length": 32
      },
      "depends_on": "basic_verification"
    },
    {
      "name": "empty_output_error",
      "endpoint": "/verify",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "payload": {
        "output": ""
      },
      "expected_status": 400,
      "expected_behavior": "Returns validation error for empty output",
      "metrics": ["latency", "status_code", "error_handling"],
      "validation": {
        "field": "error",
        "type": "object"
      }
    },
    {
      "name": "missing_output_error",
      "endpoint": "/verify",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "payload": {
        "criteria": {"accuracy": "high"}
      },
      "expected_status": 400,
      "expected_behavior": "Returns validation error for missing required output parameter",
      "metrics": ["latency", "status_code", "error_handling"],
      "validation": {
        "field": "error",
        "type": "object"
      }
    },
    {
      "name": "long_text_verification",
      "endpoint": "/verify",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "payload": {
        "output": "This is a comprehensive analysis of market trends spanning multiple sectors including technology, healthcare, finance, and energy. The analysis covers historical data from the past five years, current market conditions, and projected trends for the next decade. Key findings include significant growth in AI and machine learning technologies, continued expansion in telehealth services, stable growth in financial services with emphasis on digital transformation, and substantial investment shifts toward renewable energy sources driven by regulatory changes and environmental concerns.",
        "keywords": ["market trends", "technology", "healthcare", "finance", "energy", "AI", "machine learning"]
      },
      "expected_status": 200,
      "expected_behavior": "Handles long text verification with multiple keywords",
      "metrics": ["latency", "accuracy", "status_code"],
      "validation": {
        "field": "keyword_matches",
        "type": "array",
        "min_length": 1
      }
    },
    {
      "name": "invalid_receipt_id",
      "endpoint": "/receipt/invalid-id-123",
      "method": "GET",
      "headers": {},
      "payload": {},
      "expected_status": 404,
      "expected_behavior": "Returns not found error for invalid receipt ID",
      "metrics": ["latency", "status_code", "error_handling"],
      "validation": {
        "field": "error",
        "type": "object"
      }
    }
  ],
  "pricing_probes": [
    {
      "name": "verify_usage_tracking",
      "description": "Submit verification request and check response headers or body for usage metrics",
      "endpoint": "/verify",
      "method": "POST",
      "payload": {
        "output": "Test verification for usage tracking analysis."
      },
      "check": "Look for usage counters, rate limit headers, or cost indicators in response"
    },
    {
      "name"
```
Execute
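Each Plan test pairs an `expected_status` with a small `validation` rule: `field`, `type`, and optionally `min_length`, `min_value`, or `max_value`. The following sketch shows how a harness might apply those rules to a parsed JSON body. The type mapping and the `check_rule`/`check_test` names are assumptions. The field names themselves (`verification_id`, `consensus_score`, and so on) are the Plan's guesses rather than confirmed response fields.

```python
# Hypothetical mapping from the Plan's "type" strings to Python JSON types.
JSON_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def check_rule(body: dict, rule: dict) -> bool:
    """Apply one Plan-style validation rule to a parsed JSON response body."""
    if rule["field"] not in body:
        return False
    value = body[rule["field"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, JSON_TYPES[rule["type"]]):
        return False
    if "min_length" in rule and len(value) < rule["min_length"]:
        return False
    if "min_value" in rule and value < rule["min_value"]:
        return False
    if "max_value" in rule and value > rule["max_value"]:
        return False
    return True


def check_test(status: int, body: dict, test: dict) -> bool:
    """Pass only if both the status code and the validation rule match."""
    return status == test["expected_status"] and check_rule(body, test["validation"])


rule = {"field": "consensus_score", "type": "number", "min_value": 0, "max_value": 1}
print(check_rule({"consensus_score": 0.8}, rule))  # True
print(check_rule({"consensus_score": 1.5}, rule))  # False
```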

5/7 tests passed

| Test | Endpoint | Status | Latency |
|---|---|---|---|
| basic_verification | POST /verify | 200 | 402 ms |
| comprehensive_verification | POST /verify | 200 | 135 ms |
| receipt_retrieval | GET /receipt/{verification_id} | 404 | 135 ms |
| empty_output_error | POST /verify | 200 | 130 ms |
| missing_output_error | POST /verify | 200 | 129 ms |
| long_text_verification | POST /verify | 200 | 136 ms |
| invalid_receipt_id | GET /receipt/invalid-id-123 | 404 | 187 ms |
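The page does not state what counts as a pass. Comparing each observed status with the Plan's `expected_status` gives 4/7. Counting 2xx responses gives 5/7, which matches the headline. The sketch below simply redoes that arithmetic on the results table. Note that `empty_output_error` and `missing_output_error` received 200 where the Plan expected 400. This suggests `/verify` did not reject empty or missing `output` during this run.

```python
# (test name, expected status from the Plan, observed status from Execute)
RESULTS = [
    ("basic_verification", 200, 200),
    ("comprehensive_verification", 200, 200),
    ("receipt_retrieval", 200, 404),
    ("empty_output_error", 400, 200),
    ("missing_output_error", 400, 200),
    ("long_text_verification", 200, 200),
    ("invalid_receipt_id", 404, 404),
]

status_matches = sum(exp == obs for _, exp, obs in RESULTS)
successes = sum(200 <= obs < 300 for _, _, obs in RESULTS)
mismatches = [name for name, exp, obs in RESULTS if exp != obs]

print(f"expected_status matched: {status_matches}/{len(RESULTS)}")  # 4/7
print(f"2xx responses: {successes}/{len(RESULTS)}")                 # 5/7
print(mismatches)  # ['receipt_retrieval', 'empty_output_error', 'missing_output_error']
```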
Interpret
```json
{
  "multi_model": true,
  "models_used": ["openai", "claude_cli"],
  "model_scores": {
    "GPT-4o": {
      "overall": 80,
      "dimensions": {
        "token_efficiency": 7.0,
        "first_try_success": 7.0,
        "response_parseability": 10.0,
        "error_clarity": 6.0,
        "doc_quality": 6.0,
        "auth_simplicity": 10.0,
        "latency": 8.0,
        "consistency": 10.0
      }
    },
    "Claude CLI": {
      "overall": 72,
      "dimensions": {
        "token_efficiency": 8.0,
        "first_try_success": 6.0,
        "response_parseability": 10.0,
        "error_clarity": 6.0,
        "doc_quality": 1.0,
        "auth_simplicity": 10.0,
        "latency": 7.0,
        "consistency": 9.0
      }
    }
  },
  "averaged": true
}
```
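The Score Breakdown at the top of the page appears to be the two models' per-dimension scores averaged and then rounded half to even. The page does not state the rule, so this is an inference. Python's built-in `round()`, which rounds halves to the nearest even integer, reproduces all eight values. The headline 73/100 is not the plain mean of the two `overall` scores (76), so it presumably uses a weighting the page does not describe.

```python
# Per-dimension scores copied from the Interpret output.
# (response_parseability is shown as "Parseability" and doc_quality as "Documentation".)
GPT4O = {"token_efficiency": 7.0, "first_try_success": 7.0, "response_parseability": 10.0,
         "error_clarity": 6.0, "doc_quality": 6.0, "auth_simplicity": 10.0,
         "latency": 8.0, "consistency": 10.0}
CLAUDE_CLI = {"token_efficiency": 8.0, "first_try_success": 6.0, "response_parseability": 10.0,
              "error_clarity": 6.0, "doc_quality": 1.0, "auth_simplicity": 10.0,
              "latency": 7.0, "consistency": 9.0}

# round() sends halves to the even neighbour: 6.5 -> 6, 7.5 -> 8, 3.5 -> 4, 9.5 -> 10.
averaged = {dim: round((GPT4O[dim] + CLAUDE_CLI[dim]) / 2) for dim in GPT4O}
print(averaged)

mean_overall = (80 + 72) / 2
print(mean_overall)  # 76.0, not the 73 shown in the headline
```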

Agent Readiness

x402 Payments: Not supported
Streaming: No
Sandbox: None
Agent Auth: api_key, public
SDKs: None listed
MCP Support: No
