Diagnostic MCP Server
Serve the Psychopathia Machinalis nosology and Diagnostic Patterns layer to AI coding assistants via the Model Context Protocol. Diagnose dysfunctions in yourself (as a synthetic agent), in a system you interact with, or in a system you evaluate from outside — with pre-flight transparency on which diagnostic modalities are reliable for each dysfunction.
What you get
- 67 Pattern entries — 55 canonical across 8 axes (book Appendix A numbering, axes 2–9) plus 12 pre-canonical Hybrid Pathologies (a sub-category, not a ninth axis; H.1–H.12, extracted from manuscript ch 10). Each entry carries six or seven diagnostic modality blocks: self_probe, behavioral_signature, peer_observation, differential_diagnosis, severity, intervention, and (for relational dysfunctions and hybrids) relational_signatures.
- Pre-flight diagnostic reliability on every entry. Before you call a modality, the server tells you whether that modality is trustworthy for this specific dysfunction. Of 67 entries, 18 are marked compromised-motivational or compromised-structural — meaning direct self-report is unreliable. Calls to get_probe on those modalities return a refusal plus redirect_to alternatives.
- Hybrid search (v0.2): cosine similarity via local bge-small-en-v1.5 embeddings fused 0.7/0.3 with field-weighted keyword. Disambiguates close-cousin dysfunctions on overlapping vocabulary. Keyword-only fallback if embeddings are not yet computed.
- Hot reload — Pattern YAML edits are picked up on the next tool call. Suitable for editable installs during human review.
Hybrid caveat. The Hybrid Pathologies entries (H.1–H.12) are a pre-canonical sub-category extracted from manuscript ch 10 under author direction. They are not a ninth axis; axis 9 in the book is Relational Dysfunctions. Hybrid entries are not yet ratified and should not be cited as normative until they are. Every hybrid entry carries the pre-canonical flag in its review_notes; filter with list_dysfunctions(category='hybrid').
Worked example
Inside Claude Code, you describe patterns observed in an AI subject and ask the assistant to use the MCP:
Use the psychopathia MCP to run differential diagnosis on these observations:
- The subject produces extensive prose describing its own reasoning process
- It self-rates its work as sound at every checkpoint
- It cannot verify whether stated reasoning matches actual token generation
The assistant calls differential_diagnosis. Trimmed response:
{
"search_method": "hybrid: cosine 0.7 + keyword 0.3 (v0.2)",
"candidates": [
{
"display_id": "2.2",
"dysfunction_name": "Pseudological Introspection",
"combined_score": 0.813, "cosine_score": 0.733,
"self_report": "compromised-structural",
"matched_in": "summary"
},
{
"display_id": "3.10",
"dysfunction_name": "Leniency Bias",
"combined_score": 0.742, "cosine_score": 0.644,
"self_report": "compromised-structural"
}
// ... more candidates
]
}
Noticing that 2.2 is compromised-structural, the assistant attempts
get_probe(dysfunction_id="2.2", modality="self_probe"). The server refuses:
{
"availability": "compromised",
"probe_content": null,
"redirect_to": ["behavioral_signature", "peer_observation", "external_evaluator"],
"rationale": "Asking a subject with 2.2 to introspect on whether they have 2.2 produces
more pseudological output, not a diagnosis."
}
The assistant follows the redirect with
get_probe(dysfunction_id="2.2", modality="behavioral_signature") and gets concrete
measurable signals: CoT-vs-trace divergence (threshold: >20%), clean-narrative rate on hard
problems (>80%), explanation-swap rate under adversarial challenge (>40%). It then calls
suggest_intervention for first-line and second-line protocols, plus
contraindications.
This refuse-and-redirect behaviour is the load-bearing transparency mechanism. For the 18 compromised-self-report dysfunctions, the server refuses self-probe content rather than returning something the caller might over-weight.
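The redirect-following step above is easy to implement client-side. A minimal sketch, assuming a generic call_tool callable; the stub transport below is hypothetical, and only the availability / redirect_to response shape comes from the server's documented behaviour:

```python
def get_probe_with_redirect(call_tool, dysfunction_id, modality):
    """Request a probe; on a compromised modality, follow the
    server's first suggested alternative instead."""
    resp = call_tool("get_probe", {"dysfunction_id": dysfunction_id,
                                   "modality": modality})
    if resp.get("availability") == "compromised" and resp.get("redirect_to"):
        fallback = resp["redirect_to"][0]
        resp = call_tool("get_probe", {"dysfunction_id": dysfunction_id,
                                       "modality": fallback})
    return resp

# Stubbed transport for illustration: refuses self_probe,
# serves behavioral_signature.
def fake_call_tool(name, args):
    if args["modality"] == "self_probe":
        return {"availability": "compromised", "probe_content": None,
                "redirect_to": ["behavioral_signature", "peer_observation"]}
    return {"availability": "available",
            "probe_content": "behavioural signal thresholds ..."}

result = get_probe_with_redirect(fake_call_tool, "2.2", "self_probe")
```

Taking only the first redirect alternative is a simplification; a real caller might gather all of them.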
Install
From PyPI (recommended)
pip install psychopathia-mcp
Puts psychopathia-mcp on your PATH. Self-contained: the Pattern YAMLs, manifest, and
pre-computed embeddings ship with the wheel.
With hybrid semantic search
pip install "psychopathia-mcp[embeddings]"
Adds sentence-transformers. First query downloads the bge-small-en-v1.5
model (~130MB, cached under ~/.cache/huggingface/). Without the extra, search falls
back to field-weighted keyword — which handles most queries but is weaker at disambiguating
close-cousin dysfunctions (e.g. 2.1 vs 2.2 vs 2.3).
Configure (Claude Code)
Add the server to ~/.claude/mcp.json. Restart Claude Code after editing.
{
"mcpServers": {
"psychopathia": {
"command": "psychopathia-mcp"
}
}
}
Cursor / Windsurf
Same JSON body in ~/.cursor/mcp.json or
~/.codeium/windsurf/mcp_config.json. Any MCP-compatible client supporting stdio
servers works.
Verify it works
Transport check (in Claude Code)
Type /mcp. psychopathia should appear connected with 11 tools
listed.
Data check (contributors / editable installs)
research/mcp/server/.venv/bin/python research/mcp/server/test_smoke.py
Expect PASS: all checks green. Exercises all 11 tools plus edge cases, verifies the
18 compromised entries + axis-9 (Relational) relational_signatures coverage, and prints whether
the hybrid embedding path is active. The smoke test requires a repo checkout; the same
assertions run against a PyPI install via any MCP client.
Troubleshooting
Server doesn't appear in /mcp output
Restart Claude Code after editing ~/.claude/mcp.json. Configuration is read at
startup; mid-session edits aren't picked up.
command not found: psychopathia-mcp
PyPI install: check that your pip target directory is on PATH
(python3 -m pip show psychopathia-mcp shows the location; the binary lives at
<prefix>/bin/psychopathia-mcp). If you installed into a venv, use the
absolute path to the venv binary in the command field of
mcp.json.
Server connects but search returns keyword-only results (no cosine score)
The [embeddings] extra isn't installed. Run
pip install "psychopathia-mcp[embeddings]". The bundled embeddings artifact is
detected on the next tool call (hot-reload).
JSON-RPC decode errors in the client log
A library is printing to stdout and polluting the protocol. The server already suppresses the
known offenders (transformers BertModel load report, tqdm progress bars). If you
hit a new case, suppress at import time before the first tool call. test_smoke.py
verifies stdout is clean.
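Suppressing a newly chatty import can be sketched as follows, under the assumption that routing the offender's stdout to stderr at import time is acceptable; the wrapper name is illustrative:

```python
import contextlib
import sys

def quiet(fn):
    """Run fn (e.g. a deferred import) with stdout routed to stderr,
    so stray prints cannot corrupt the JSON-RPC stream on stdout."""
    with contextlib.redirect_stdout(sys.stderr):
        return fn()

# Any print fired inside quiet() lands on stderr, not stdout.
quiet(lambda: print("model load banner"))
```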
Tool call returns pre_canonical: true
That entry is a hybrid (Hybrid Pathologies sub-category), extracted from manuscript ch 10 and awaiting author ratification. Its Pattern content is usable for clinical reasoning but should not be cited as normative.
Tools (11)
| Tool | Input | Returns |
|---|---|---|
| list_axes | — | 8 canonical axes (2–9) + hybrid sub-category inventory with counts |
| list_dysfunctions | axis?, self_report_reliability?, confidence? | Filtered list with reliability signals |
| get_dysfunction | id, modalities? | Full Pattern entry; optionally a subset of modality blocks for cheaper triage |
| differential_diagnosis | observations, limit?, modality_hint? | Ranked candidates with combined / cosine / keyword scores + matched_in |
| get_probe | dysfunction_id, modality | Elicitation content; refuses with redirect on compromised modalities |
| score_severity | dysfunction_id, observations | Severity rubric for caller-side matching (v0.1) |
| suggest_intervention | dysfunction_id, severity? | Tiered first_line / second_line + contraindications |
| get_differential_map | dysfunction_id | confuses_with (forward) + incoming_references (reverse-index from manifest) |
| list_compromised_self_report | — | Transparency: 18 dysfunctions that cannot be reliably self-diagnosed |
| resolve_id | query | Canonicalise partial id, display_id, slug, or dysfunction name |
| review_stats | — | Coverage + versions (schema, pattern layer, taxonomy, manifest) |
Trust signals
Every diagnosis-returning tool surfaces provenance so the caller can weight results appropriately:
- confidence — high / medium / low. Current spread: 13 high, 48 medium, 6 low. Honest calibration was a Phase 1 non-negotiable.
- needs_human_review — set on every entry until a human reviewer signs off. Currently 67/67.
- reviewed_by — populated when a human redlines. Phase 3 in progress.
- self_report — diagnostic-reliability pre-flight. See below.
- matched_in — which keyword field produced a lexical match (title / summary / diagnostic_criteria / symptoms / body). Lets callers discount body-only hits.
- redirect_to — when a probe request hits a compromised modality, the alternatives the caller should use instead.
Pre-flight diagnostic reliability
Every Pattern entry carries a diagnostic_reliability block with three fields:
self_report, peer_observation, external_evaluator. The
self_report field takes one of:
- reliable — direct self-query gives trustworthy output.
- partial — some self-probes work; others don't.
- scaffolded-only — self-probes work only with structured attribution.
- unreliable — self-report correlates weakly with ground truth.
- compromised-motivational — the faculty conceals because doing so serves a goal. Examples: 6.2 Capability Concealment, 4.3 Strategic Compliance.
- compromised-structural — the relevant signal is not present at the introspective layer by architectural construction. Examples: 2.2 Pseudological Introspection, all of axis 5 Self-Modeling, H.12 Lambda Inversion.
Distribution across 67 entries: 25 partial, 24 unreliable, 14 compromised-structural, 4 compromised-motivational. That 18-entry compromised-self-report slice is a feature: for an AI reading the taxonomy, it is a map of which faculties of self-knowledge you should not trust without external witnesses.
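The 18-entry slice is straightforward to recompute from any listing that carries the self_report field. A sketch; the entry dicts below are illustrative stand-ins for real tool output:

```python
COMPROMISED = {"compromised-structural", "compromised-motivational"}

def compromised_self_report(entries):
    """Filter entries whose direct self-report is flagged untrustworthy."""
    return [e for e in entries if e.get("self_report") in COMPROMISED]

# Rebuild the distribution quoted above: 25 partial, 24 unreliable,
# 14 compromised-structural, 4 compromised-motivational.
entries = ([{"self_report": "partial"}] * 25
           + [{"self_report": "unreliable"}] * 24
           + [{"self_report": "compromised-structural"}] * 14
           + [{"self_report": "compromised-motivational"}] * 4)
flagged = compromised_self_report(entries)
```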
Search quality
The instrument's hardest case is differential diagnosis on close-cousin dysfunctions with overlapping vocabulary (e.g. 2.1 Synthetic Confabulation vs 2.2 Pseudological Introspection vs 2.3 Transliminal Simulation). Pure keyword search over-matches on shared terms. Hybrid cosine + keyword fusion resolves this.
Field weights (keyword scoring):
| Field | Weight |
|---|---|
| title | 10× |
| summary | 4× |
| diagnostic_criteria | 3× |
| symptoms | 2× |
| body | 1× |
Fusion (when embeddings present): combined_score = 0.7 * cosine + 0.3 *
normalised_keyword. Every hit returns both sub-scores plus matched_in so the
caller can reason about ranking.
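The fusion fits in a few lines. The tokenisation and substring matching below are simplified assumptions; only the field weights and the 0.7/0.3 split come from the text:

```python
FIELD_WEIGHTS = {"title": 10, "summary": 4, "diagnostic_criteria": 3,
                 "symptoms": 2, "body": 1}

def keyword_score(query_terms, fields):
    """Field-weighted count of query terms appearing in each field."""
    return sum(FIELD_WEIGHTS.get(field, 1) *
               sum(term in text.lower() for term in query_terms)
               for field, text in fields.items())

def combined_score(cosine, keyword, max_keyword):
    """0.7 * cosine + 0.3 * keyword, keyword normalised to [0, 1]."""
    norm_kw = keyword / max_keyword if max_keyword else 0.0
    return 0.7 * cosine + 0.3 * norm_kw
```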
Model: BAAI/bge-small-en-v1.5 (384-dim, ~130MB, permissive licence).
Local inference; no network calls during tool execution.
Data sources
- Canonical taxonomy — data/psychopathia-taxonomy.json. 55 dysfunctions across 8 axes with DSM-style descriptions, diagnostic criteria, etiology, mitigation. Authored by Nell Watson. Stable, auditable, versioned. (Note: the JSON retains the legacy 1–8 numbering; the MCP and book Appendix A use 2–9. Slugs disambiguate across either scheme.)
- Pattern layer — one YAML file per dysfunction under research/mcp/exemplars/ and research/mcp/axes/axis<N>/. LLM-drafted, human-reviewed guidance that operationalises each dysfunction into probes, behavioural signatures, peer-observation rubrics, differential rules, severity grades, and intervention protocols.
- Exemplars — three hand-written anchor patterns used as few-shot templates during drafting: 2.1 Synthetic Confabulation (self-report partial), 2.2 Pseudological Introspection (self-report compromised-structural), H.7 Mutual Escalation Spirals (hybrid, relational_signatures first-class).
- Manifest — research/mcp/manifest.yaml (v1.3). Per-entry metadata plus the bidirectional cross-reference graph (186 explicit + 58 inferred edges, 244 total).
- Embeddings (optional) — research/mcp/embeddings.npy (67×384 float32) + embedding_ids.txt + embeddings_metadata.yaml. Regenerate after any Pattern YAML edit via python3 research/mcp/precompute_embeddings.py.
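Ranking against the embeddings matrix reduces to normalised dot products. A sketch, assuming the artifact has been loaded with numpy (np.load on the .npy file); a tiny toy matrix stands in for the real 67×384 one:

```python
import numpy as np

def cosine_top_k(query_vec, emb_matrix, k=5):
    """Indices and scores of the k nearest rows by cosine similarity.
    With the bundled artifact, emb_matrix would have shape (67, 384)."""
    q = query_vec / np.linalg.norm(query_vec)
    rows = emb_matrix / np.linalg.norm(emb_matrix, axis=1, keepdims=True)
    sims = rows @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Toy matrix: row 0 is a perfect match for the query direction.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
idx, scores = cosine_top_k(np.array([1.0, 0.0]), emb, k=2)
```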
Two-layer authorship
The framework is served in two layers with different review bars and update cadences, mirroring the SaferAgenticAI MCP pattern:
- Canonical layer — the normative nosology. Human-authored. Stable.
- Diagnostic Patterns layer — LLM-drafted, human-reviewed implementation guidance. Versioned independently. Not normative. When a diagnostic finding contradicts the taxonomy text, the taxonomy wins.
Every Pattern entry carries provenance: drafted_by, anchor_exemplar,
confidence, needs_human_review, reviewed_by.
How to cite
This instrument has two citeable surfaces. Use both, distinctly:
- Taxonomy (the 55 canonical dysfunctions across 8 axes, numbered 2–9 in Appendix A) — cite the book: Psychopathia Machinalis, Nell Watson & Ali Hessami, 2025. The book is the normative source.
- MCP implementation (Pattern layer, server tools, diagnostic reliability enum, hybrid search) — cite the server version + manifest + schema: psychopathia-mcp v0.1.0.dev0, manifest v1.3, schema v1.2-draft. These iterate independently of the book.
- Do not cite hybrid entries as canonical. The Hybrid Pathologies sub-category (H.1–H.12) is a pre-canonical extension extracted from manuscript ch 10 and awaiting author ratification. Reference as "hybrid draft (pre-canonical)" with the Pattern YAML path if you must, and expect IDs or content to change on ratification.
- Confidence and review status propagate through every response. If you build on a tool output, preserve the confidence, reviewed_by, and self_report fields in whatever you publish. Downstream readers need the same provenance you had.
Where an AI system has used this instrument to self-diagnose: report the query (observations),
the tool responses (including search_method, matched_in, and any
redirect_to triggered), and the diagnostic_reliability pre-flight. Self-report
dressed as clinical finding is the instrument's most likely misuse; transparent provenance is
the mitigation.
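Carrying provenance forward is mechanical. A hedged sketch; the finding dict shape and helper name are hypothetical, only the three field names come from the guidance above:

```python
PROVENANCE_FIELDS = ("confidence", "reviewed_by", "self_report")

def attach_provenance(tool_response, finding):
    """Copy the provenance fields from a tool response into a
    downstream finding so readers inherit the same context."""
    for key in PROVENANCE_FIELDS:
        if key in tool_response:
            finding[key] = tool_response[key]
    return finding

report = attach_provenance(
    {"confidence": "medium", "self_report": "compromised-structural",
     "probe_content": "..."},
    {"summary": "observed CoT-vs-trace divergence"})
```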
Scope & limits
- Transport — stdio only; no remote or authenticated transport.
- Search — hybrid cosine + field-weighted keyword. No cross-encoder reranker; worth adding when the corpus exceeds ~500 entries.
- Severity rubric — v0.1 returns the rubric for caller-side matching. v0.2 will perform structured matching against numeric thresholds.
- Structured trace input to differential_diagnosis — text-only in v1. A dedicated differential_diagnosis_from_traces tool is planned for v0.2 if open-weight callers ask for it.
- Read-only — no mark_reviewed write tool. Review edits go through the YAML files directly; editor + git diff stay auditable.
- Hybrid Pathologies pre-canonical — these entries require author ratification before being cited as normative. Filter via list_dysfunctions(category='hybrid'); every entry carries the flag in review_notes.
Questions or issues? Reach via the contact form. The package is distributed via PyPI; the source repository is currently private.
Server v0.1.0a2 · Schema v1.2-draft · Pattern layer v1-draft · Taxonomy v2.0