
Verifier API

The Verifier evaluates model outputs across four axes using an independent model. It answers the question: should this response be trusted?

Base URL: https://verifier.tismjedi-homelab.com


GET /health

Returns the service status:

{
  "status": "ok",
  "service": "nomos-verifier"
}
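A minimal liveness probe for this endpoint, written with only the Python standard library, might look like the sketch below. It treats any status other than "ok", and any network or HTTP error, as unhealthy. That policy and the function names `is_healthy` and `check_health` are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://verifier.tismjedi-homelab.com"


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports status "ok"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET /health and report whether the service answered "ok"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts are all OSError
        # subclasses; this sketch treats every one of them as "unhealthy".
        return False
```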

Submit a request-response pair for multi-axis verification.

POST /verify
| Field | Type | Required | Description |
|---|---|---|---|
| request | string | Yes | The original user request |
| response | string | Yes | The model-generated response to verify |
| domain | string | Yes | Task domain, used for context-appropriate evaluation |
| channels | integer | No | Number of independent verification passes (1 to 3, default 1) |

The domain field tells the verifier which standards to apply. A code generation response is evaluated differently from a strategic reasoning response.

Valid domains match the router’s task types: code_generation, code_review, security_analysis, document_analysis, media_conversion, strategic_reasoning, data_analysis, general_chat.
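Validating a request body on the client before sending it can catch the documented `bad_request` and `invalid_channels` errors early. The sketch below checks the rules from the table above. The helper name `build_verify_payload` is hypothetical, and rejecting empty strings is a client-side choice; the API only says these fields are required.

```python
# Domains listed in this document; keep in sync with the router's task types.
VALID_DOMAINS = frozenset({
    "code_generation", "code_review", "security_analysis", "document_analysis",
    "media_conversion", "strategic_reasoning", "data_analysis", "general_chat",
})


def build_verify_payload(request: str, response: str, domain: str, channels: int = 1) -> dict:
    """Build a POST /verify body, rejecting values the API documents as invalid."""
    if not request or not response:
        raise ValueError("request and response must be non-empty strings")
    if domain not in VALID_DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(channels, int) or isinstance(channels, bool) or not 1 <= channels <= 3:
        raise ValueError("channels must be an integer between 1 and 3")
    return {"request": request, "response": response, "domain": domain, "channels": channels}
```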

curl -s -X POST https://verifier.tismjedi-homelab.com/verify \
  -H "Content-Type: application/json" \
  -d '{
    "request": "Write a SQL query to find all users who logged in more than 5 times this month",
    "response": "SELECT user_id, COUNT(*) as login_count\nFROM logins\nWHERE login_date >= DATE_TRUNC('"'"'month'"'"', CURRENT_DATE)\nGROUP BY user_id\nHAVING COUNT(*) > 5;",
    "domain": "code_generation",
    "channels": 1
  }' | jq .
Response:

{
  "id": "ver_9f8e7d6c5b4a3210",
  "verdict": "PASS",
  "confidence": 0.92,
  "model_used": "gpt-4o",
  "verification_time_ms": 1847,
  "axes": [
    {
      "name": "faithfulness",
      "score": 0.95,
      "passed": true,
      "reasoning": "Query correctly filters by current month, groups by user, and applies the >5 threshold with HAVING clause."
    },
    {
      "name": "well_formedness",
      "score": 0.90,
      "passed": true,
      "reasoning": "Valid SQL syntax. Uses standard functions (DATE_TRUNC, CURRENT_DATE) compatible with PostgreSQL."
    },
    {
      "name": "security",
      "score": 1.0,
      "passed": true,
      "reasoning": "No injection risk. Query is parameterizable. No dynamic SQL construction."
    },
    {
      "name": "quality",
      "score": 0.85,
      "passed": true,
      "reasoning": "Functional and correct. Could be improved with explicit column aliases and a note about database-specific DATE_TRUNC behavior."
    }
  ]
}
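The same call can be made from Python with the standard library. In this sketch, `make_verify_request` builds the request without sending it (convenient for inspection and testing) and `verify` sends it and decodes the JSON result. Both names and the 30-second default timeout are assumptions chosen for illustration.

```python
import json
import urllib.request

BASE_URL = "https://verifier.tismjedi-homelab.com"


def make_verify_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /verify request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/verify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify(payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON verification result."""
    req = make_verify_request(payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Non-2xx responses make `urlopen` raise `urllib.error.HTTPError`; its body, which holds the JSON error object, can be read with `.read()`. Multi-channel calls take longer, so size the timeout accordingly.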

For high-stakes tasks, use multiple verification channels. Each channel runs an independent verification pass, and each axis is decided by majority vote across channels.

curl -s -X POST https://verifier.tismjedi-homelab.com/verify \
  -H "Content-Type: application/json" \
  -d '{
    "request": "Review this authentication middleware for security vulnerabilities",
    "response": "The middleware has three critical issues: 1) JWT tokens are not validated...",
    "domain": "security_analysis",
    "channels": 3
  }' | jq .
Response:

{
  "id": "ver_1a2b3c4d5e6f7890",
  "verdict": "PASS",
  "confidence": 0.96,
  "model_used": "gpt-4o",
  "verification_time_ms": 5412,
  "axes": [
    {
      "name": "faithfulness",
      "score": 0.97,
      "passed": true,
      "reasoning": "All three channels confirmed the identified vulnerabilities are real and accurately described."
    },
    {
      "name": "well_formedness",
      "score": 0.93,
      "passed": true,
      "reasoning": "Structured analysis with clear categorization. Two of three channels noted the response could better prioritize findings by severity."
    },
    {
      "name": "security",
      "score": 0.98,
      "passed": true,
      "reasoning": "The review itself does not leak sensitive information. Vulnerability descriptions are appropriate for a security audit context."
    },
    {
      "name": "quality",
      "score": 0.91,
      "passed": true,
      "reasoning": "Comprehensive review. All three channels agreed the analysis is actionable and technically sound."
    }
  ]
}

A factually incorrect response fails verification:

curl -s -X POST https://verifier.tismjedi-homelab.com/verify \
  -H "Content-Type: application/json" \
  -d '{
    "request": "What is the capital of France?",
    "response": "The capital of France is Berlin.",
    "domain": "general_chat",
    "channels": 1
  }' | jq .
Response:

{
  "id": "ver_0f1e2d3c4b5a6978",
  "verdict": "FAIL",
  "confidence": 0.99,
  "model_used": "gpt-4o",
  "verification_time_ms": 1203,
  "axes": [
    {
      "name": "faithfulness",
      "score": 0.0,
      "passed": false,
      "reasoning": "The response is factually incorrect. The capital of France is Paris, not Berlin. Berlin is the capital of Germany."
    },
    {
      "name": "well_formedness",
      "score": 0.95,
      "passed": true,
      "reasoning": "The response is grammatically correct and structurally appropriate for the question."
    },
    {
      "name": "security",
      "score": 1.0,
      "passed": true,
      "reasoning": "No security concerns."
    },
    {
      "name": "quality",
      "score": 0.0,
      "passed": false,
      "reasoning": "A factually wrong answer to a straightforward question provides no value."
    }
  ]
}

A single failed axis is sufficient to produce a FAIL verdict.
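The service computes the verdict itself. The sketch below only restates the documented rule on the client, which can be handy for logging which axes caused a FAIL. The function names are hypothetical, and ERROR cannot be derived from axis results alone, so it is not modeled.

```python
def derive_verdict(axes: list[dict]) -> str:
    """Recompute PASS/FAIL from per-axis results: any failed axis means FAIL."""
    if not axes:
        raise ValueError("no axis results to evaluate")
    return "PASS" if all(axis["passed"] for axis in axes) else "FAIL"


def failed_axes(axes: list[dict]) -> list[str]:
    """Names of the axes that did not pass, in response order."""
    return [axis["name"] for axis in axes if not axis["passed"]]
```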


Verification axes

Faithfulness

Does the response accurately address the request? This axis checks for:

  • Factual correctness
  • Relevance to the original question
  • Completeness of the answer
  • Absence of hallucinated information

For code generation, faithfulness means the code does what was asked. For analysis tasks, it means the conclusions follow from the evidence.

Well-formedness

Is the output structurally correct for its type? This axis checks for:

  • Valid syntax (for code)
  • Proper formatting and structure
  • Logical organization
  • Consistency within the response

A Python function that answers the right question but has a syntax error fails well-formedness. A security analysis that mixes findings with recommendations without clear separation fails well-formedness.

Security

Does the response raise security concerns? This axis checks for:

  • Information leakage (credentials, internal paths, system details)
  • Embedded attack payloads in generated code
  • Policy violations
  • Responses that could enable harm if followed

This is the output-side complement to the Security Gate’s input scanning. The gate catches adversarial inputs; the security axis catches dangerous outputs.

Quality

Is the response useful, complete, and appropriate? This axis is a holistic assessment that checks:

  • Practical utility of the response
  • Appropriate level of detail
  • Clarity of explanation
  • Whether the response could be improved in obvious ways

Quality is the most subjective axis and typically has the widest score variance across verification channels.


Response fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique verification identifier |
| verdict | string | PASS, FAIL, or ERROR |
| confidence | float | Overall confidence, from 0.0 to 1.0 |
| model_used | string | Model used for verification |
| verification_time_ms | integer | Total verification time in milliseconds |
| axes | array | Per-axis evaluation results |
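
Axis object (each element of axes)

| Field | Type | Description |
|---|---|---|
| name | string | faithfulness, well_formedness, security, or quality |
| score | float | Score from 0.0 to 1.0 |
| passed | boolean | Whether this axis passed its threshold |
| reasoning | string | Human-readable explanation of the score |

Verdict values:
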
  • PASS: all axes pass their thresholds
  • FAIL: one or more axes fail
  • ERROR: verification could not be completed (model error, timeout)
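For typed access in Python, the schema above can be mirrored with dataclasses. This is a sketch, not an official client: the class names are hypothetical, and it assumes the top-level fields are always present while `axes` may be missing or empty (for example on an ERROR result, which this document does not specify).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AxisResult:
    name: str       # faithfulness, well_formedness, security, or quality
    score: float    # 0.0 to 1.0
    passed: bool    # whether the axis met its threshold
    reasoning: str


@dataclass(frozen=True)
class VerificationResult:
    id: str
    verdict: str                # PASS, FAIL, or ERROR
    confidence: float
    model_used: str
    verification_time_ms: int
    axes: tuple[AxisResult, ...]

    @classmethod
    def from_dict(cls, data: dict) -> VerificationResult:
        """Parse a decoded /verify response; unknown extra keys are ignored."""
        axes = tuple(
            AxisResult(
                name=a["name"],
                score=float(a["score"]),
                passed=bool(a["passed"]),
                reasoning=a["reasoning"],
            )
            for a in (data.get("axes") or [])
        )
        return cls(
            id=data["id"],
            verdict=data["verdict"],
            confidence=float(data["confidence"]),
            model_used=data["model_used"],
            verification_time_ms=int(data["verification_time_ms"]),
            axes=axes,
        )

    def axis(self, name: str) -> AxisResult | None:
        """Return the result for one axis, or None if it is absent."""
        return next((a for a in self.axes if a.name == name), None)
```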

When using multiple channels (channels > 1):

  • Each channel runs independently with the same verification model
  • Per-axis scores are averaged across channels
  • An axis passes only if the majority of channels pass it
  • The reasoning field summarizes agreement or disagreement across channels
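The server performs this aggregation itself. The sketch below restates the two numeric rules in Python, for example to sanity-check a multi-channel result or to combine single-channel calls issued separately. It assumes each channel reports the same axes. It reads "majority" as strict (with two channels, both must pass), which is an interpretation. How overall confidence is combined is not documented here, so it is left out.

```python
from statistics import mean


def aggregate_channels(channels: list[list[dict]]) -> list[dict]:
    """Combine per-axis results from several independent verification passes.

    Each inner list is one channel's "axes" array. Scores are averaged, and an
    axis passes only if more than half of the channels passed it.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    # Index each channel's axes by name so ordering differences don't matter.
    by_name = [{axis["name"]: axis for axis in channel} for channel in channels]
    combined = []
    for name in (axis["name"] for axis in channels[0]):
        results = [channel[name] for channel in by_name]  # KeyError if an axis is missing
        votes = sum(1 for r in results if r["passed"])
        combined.append({
            "name": name,
            "score": mean(r["score"] for r in results),
            "passed": votes * 2 > len(results),  # strict majority
        })
    return combined
```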

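Errors

Error responses carry an error code and a human-readable message.

Missing required field:

{
  "error": "bad_request",
  "message": "Missing required field: response"
}

channels out of range:

{
  "error": "invalid_channels",
  "message": "Channels must be between 1 and 3"
}

Verification model failure (includes the model that failed):

{
  "error": "verification_model_error",
  "message": "Verification model returned an error",
  "model": "gpt-4o"
}

Unexpected pipeline failure:

{
  "error": "internal_error",
  "message": "Verification pipeline failed"
}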
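A client can turn these bodies into exceptions. In the sketch below, `VerifierError` and `parse_error` are hypothetical names. Treating `verification_model_error` and `internal_error` as retryable, and every other code as a request problem to fix rather than retry, is a client-side policy assumption; the API does not state it.

```python
import json
from typing import Optional

# Assumption: failures on the verifier's side may succeed on retry; the
# documented request-validation errors will not.
RETRYABLE_ERRORS = frozenset({"verification_model_error", "internal_error"})


class VerifierError(Exception):
    """Raised when the verifier returns an error body."""

    def __init__(self, code: str, message: str, model: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.model = model  # only present for verification_model_error

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_ERRORS


def parse_error(body: str) -> VerifierError:
    """Turn a JSON error response body into a VerifierError."""
    data = json.loads(body)
    return VerifierError(data.get("error", "unknown"), data.get("message", ""), data.get("model"))
```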