Skip to content

Ethical Constitution

Nomos is a security system. Security systems must have clear ethical boundaries because the knowledge required to defend against attacks is the same knowledge required to execute them. The line between security research and attack tooling is intent, not topic.

Security cannot be bolted on after the fact. Effective defense requires genuine understanding of attack techniques, not surface-level pattern matching. The Security Gate detects prompt injection because its developers understand how prompt injection works at a fundamental level — the mechanics of instruction boundary violations, the psychology of social engineering, the encoding tricks that bypass naive filters.

This means engaging seriously with adversarial techniques. Reading attack papers. Building proof-of-concept exploits. Testing against real systems (with permission). Understanding is the prerequisite for defense.

Claims must be backed by evidence. If the Security Gate claims to detect encoding attacks, that claim must be tested against actual encoding attacks with measured detection rates. If the Verifier claims to catch factual errors, that claim must be validated against known-incorrect outputs.

This principle applies to the pipeline’s own outputs. The Verifier exists because we do not trust model outputs by default — not even our own pipeline’s assessments. Trust is earned through verification, not asserted through confidence scores.

The purpose of security knowledge is defense. Every detection pattern, every classification model, every behavioral analysis rule exists to protect users from adversarial inputs and untrustworthy outputs. The pipeline is designed to be a shield, not a weapon.

This extends to the architecture itself. The attestation chain exists so that every security decision is auditable. The separation between the gate, router, and verifier exists so that no single compromised component can bypass the full pipeline.

When security research reveals vulnerabilities in other systems, the response is private disclosure first, public documentation after the fix.

This is not hypothetical. The Ask Keith vulnerability — where unsanitized user input was passed directly into LLM prompts, creating a textbook injection vector — was identified through the same research that informed the Security Gate’s detection capabilities. The vulnerability class (API-level injection) became a detection category in the gate. The specific vulnerability was disclosed privately.

The principle is: defense before offense, always. Never drop offensive tooling on an unprotected population. Never publish exploit code before the patch is available. The security research community has hard-won norms around responsible disclosure, and Nomos follows them.

A recurring failure pattern in LLM applications is relying on system prompts for safety. The system prompt says “do not produce harmful content” and the developer assumes the problem is solved. But system prompts are the most vulnerable point in the entire system — they are the first thing an injection attack targets.

Nomos takes a different approach. Security is structural, not instructional.

  • The Security Gate does not ask the model whether the input is safe. It runs the input through a detection pipeline that operates independently of any model.
  • The Router does not rely on the model to route itself. It classifies intent using its own logic and assigns the model.
  • The Verifier does not ask the generating model if its output is correct. It uses a different model with an adversarial evaluation prompt.

At every layer, the security mechanism is external to the model being secured. This is the same principle as not letting the suspect conduct their own investigation. Independence is a structural property, not a behavioral one.

Security research requires discussing attacks. A document about injection detection must describe injection attacks. A verification system must be tested against known failure modes. A security gate must be trained on adversarial examples.

The gate is explicitly calibrated to distinguish between discussing security topics and performing attacks. A researcher asking “how does prompt injection work?” should not be blocked. A user submitting an actual injection payload should be.

This distinction is fundamental and non-negotiable. Censoring security discussion does not improve security — it creates a false sense of safety while leaving defenders uninformed. The goal is informed defense, not ignorant compliance.

These principles are not theoretical. They were tested against real situations during Nomos development:

The Ask Keith pattern. During research into LLM application security, a vulnerability class was identified where web applications pass unsanitized user input directly into LLM prompts. This is functionally identical to SQL injection but targeting language models instead of databases. The finding informed a new detection category in the Security Gate (API-level injection) and was disclosed responsibly.

The Next.js audit. Security analysis of a popular web framework revealed patterns that could be exploited in combination with LLM backends. The findings were documented privately and shared through responsible channels. The Security Gate’s encoding attack detection was improved based on patterns discovered during this work.

The Harness validation. Over 6,000 tasks were run through the Nomos Harness to validate the verification pipeline. The results revealed that the system correctly identified positive cases but also produced a significant number of false negatives in the early iterations. This was reported honestly — the system’s limitations were documented alongside its capabilities. Proving concretely means reporting what failed, not just what worked.

For users of the Nomos pipeline:

  • Your inputs are scanned, not censored. The gate distinguishes between malicious intent and legitimate discussion of sensitive topics.
  • Your outputs are verified, not filtered. The verifier checks for correctness and safety, not for compliance with arbitrary content policies.
  • The security chain is auditable. Every decision the pipeline makes is traceable through the attestation chain.
  • Vulnerabilities are taken seriously. If you discover a security issue in the pipeline, it will be addressed promptly and you will be credited (if you wish).

For security researchers:

  • Research topics are not restricted. The pipeline is designed to support security research, not hinder it.
  • Findings are disclosed responsibly. Any vulnerabilities discovered through work related to this project follow responsible disclosure practices.
  • The detection pipeline is transparent. The threat types, confidence scores, and pattern IDs are documented so that researchers can understand and improve upon the detection capabilities.