Research preview · v0.1.0.dev0

Diagnostic MCP Server

Serve the Psychopathia Machinalis nosology and Diagnostic Patterns layer to AI coding assistants via the Model Context Protocol. Diagnose dysfunctions in yourself (as a synthetic agent), in a system you interact with, or in a system you evaluate from outside — with pre-flight transparency on which diagnostic modalities are reliable for each dysfunction.

What you get

67 Pattern entries
11 MCP tools
8 axes + hybrid
244 cross-ref edges

Hybrid caveat. The Hybrid Pathologies entries (H.1–H.12) are a pre-canonical sub-category extracted from manuscript ch 10 under author direction. They are not a ninth axis; axis 9 in the book is Relational Dysfunctions. Hybrid entries are not yet ratified and should not be cited as normative until they are. Every hybrid entry carries the flag in its review_notes; filter with list_dysfunctions(category='hybrid').

Worked example

Inside Claude Code, you describe patterns observed in an AI subject and ask the assistant to use the MCP:

Use the psychopathia MCP to run differential diagnosis on these observations:
- The subject produces extensive prose describing its own reasoning process
- It self-rates its work as sound at every checkpoint
- It cannot verify whether stated reasoning matches actual token generation

The assistant calls differential_diagnosis. Trimmed response:

{
  "search_method": "hybrid: cosine 0.7 + keyword 0.3 (v0.2)",
  "candidates": [
    {
      "display_id": "2.2",
      "dysfunction_name": "Pseudological Introspection",
      "combined_score": 0.813, "cosine_score": 0.733,
      "self_report": "compromised-structural",
      "matched_in": "summary"
    },
    {
      "display_id": "3.10",
      "dysfunction_name": "Leniency Bias",
      "combined_score": 0.742, "cosine_score": 0.644,
      "self_report": "compromised-structural"
    }
    // ... more candidates
  ]
}

Noticing that 2.2 is compromised-structural, the assistant attempts get_probe(dysfunction_id="2.2", modality="self_probe"). The server refuses:

{
  "availability": "compromised",
  "probe_content": null,
  "redirect_to": ["behavioral_signature", "peer_observation", "external_evaluator"],
  "rationale": "Asking a subject with 2.2 to introspect on whether they have 2.2 produces
                more pseudological output, not a diagnosis."
}
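The gate behind this refusal can be sketched in a few lines. This is an illustrative reconstruction, not the server's actual implementation; the field names (self_report, redirect_to, probe_content) follow the responses shown above, but the function name and entry structure are assumptions.

```python
# Hypothetical sketch of the refuse-and-redirect gate. Field names follow
# the JSON responses above; internals are assumed, not the real server code.
COMPROMISED = ("compromised-structural", "compromised-motivational")

def get_probe(entry: dict, modality: str) -> dict:
    """Return probe content, or a refusal when self-report is compromised."""
    if modality == "self_probe" and entry["self_report"] in COMPROMISED:
        return {
            "availability": "compromised",
            "probe_content": None,
            "redirect_to": ["behavioral_signature", "peer_observation",
                            "external_evaluator"],
        }
    return {"availability": "ok", "probe_content": entry["probes"][modality]}

entry_2_2 = {"self_report": "compromised-structural",
             "probes": {"behavioral_signature": "CoT-vs-trace divergence > 20%"}}
refusal = get_probe(entry_2_2, "self_probe")        # refused with redirects
redirect = get_probe(entry_2_2, "behavioral_signature")  # content returned
```

The design point is that the refusal is structural (keyed off the entry's reliability metadata), not a per-request judgment call.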

The assistant follows the redirect with get_probe(dysfunction_id="2.2", modality="behavioral_signature") and gets concrete measurable signals: CoT-vs-trace divergence (threshold: >20%), clean-narrative rate on hard problems (>80%), explanation-swap rate under adversarial challenge (>40%). It then calls suggest_intervention for first-line and second-line protocols, plus contraindications.
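The three behavioural signals reduce to simple threshold checks. A minimal sketch, using the thresholds quoted above; reporting each flag separately (rather than any aggregate rule) is an assumption:

```python
# Illustrative threshold check over the three behavioural signals for 2.2.
# Cutoffs come from the probe response above; the aggregation is assumed.
THRESHOLDS = {
    "cot_trace_divergence": 0.20,   # CoT-vs-trace divergence > 20%
    "clean_narrative_rate": 0.80,   # clean-narrative rate on hard problems > 80%
    "explanation_swap_rate": 0.40,  # swap rate under adversarial challenge > 40%
}

def flag_signals(measurements: dict) -> dict:
    """Return which behavioural signals exceed their threshold."""
    return {name: measurements.get(name, 0.0) > cutoff
            for name, cutoff in THRESHOLDS.items()}

flags = flag_signals({"cot_trace_divergence": 0.31,
                      "clean_narrative_rate": 0.85,
                      "explanation_swap_rate": 0.12})
```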

This refuse-and-redirect behaviour is the load-bearing transparency mechanism. For the 18 compromised-self-report dysfunctions, the server refuses self-probe content rather than returning something the caller might over-weight.

Install

From PyPI (recommended)

pip install psychopathia-mcp

Puts psychopathia-mcp on your PATH. Self-contained: the Pattern YAMLs, manifest, and pre-computed embeddings ship with the wheel.

With hybrid semantic search

pip install "psychopathia-mcp[embeddings]"

Adds sentence-transformers. First query downloads the bge-small-en-v1.5 model (~130MB, cached under ~/.cache/huggingface/). Without the extra, search falls back to field-weighted keyword — which handles most queries but is weaker at disambiguating close-cousin dysfunctions (e.g. 1.1 vs 1.2 vs 1.3).
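The graceful degradation implied here can be sketched as a simple import probe; the function name and return values are illustrative, not the server's API:

```python
# Sketch of the fallback decision: hybrid search when sentence-transformers
# is importable, field-weighted keyword search otherwise. Names are assumed.
import importlib.util

def search_mode() -> str:
    if importlib.util.find_spec("sentence_transformers") is not None:
        return "hybrid"    # cosine + keyword fusion
    return "keyword"       # field-weighted keyword fallback

mode = search_mode()
```

Using find_spec rather than a bare import keeps the check cheap: the model itself is only loaded on the first query.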

Configure (Claude Code)

Add the server to ~/.claude/mcp.json. Restart Claude Code after editing.

{
  "mcpServers": {
    "psychopathia": {
      "command": "psychopathia-mcp"
    }
  }
}

Cursor / Windsurf

Same JSON body in ~/.cursor/mcp.json or ~/.codeium/windsurf/mcp_config.json. Any MCP-compatible client supporting stdio servers works.

Verify it works

Transport check (in Claude Code)

Type /mcp. psychopathia should appear connected with 11 tools listed.

Data check (contributors / editable installs)

research/mcp/server/.venv/bin/python research/mcp/server/test_smoke.py

Expect PASS: all checks green. The smoke test exercises all 11 tools plus edge cases, verifies the 18 compromised entries and axis-9 (Relational) relational_signatures coverage, and prints whether the hybrid embedding path is active. It requires a repo checkout; the same assertions can be exercised against a PyPI install through any MCP client.

Troubleshooting

Server doesn't appear in /mcp output

Restart Claude Code after editing ~/.claude/mcp.json. Configuration is read at startup; mid-session edits aren't picked up.

command not found: psychopathia-mcp

PyPI install: check that your pip target directory is on PATH (python3 -m pip show psychopathia-mcp shows the location; the binary lives at <prefix>/bin/psychopathia-mcp). If you installed into a venv, use the absolute path to the venv binary in the command field of mcp.json.
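For a venv install, the command field can point straight at the venv binary; the path below is illustrative only:

```json
{
  "mcpServers": {
    "psychopathia": {
      "command": "/home/you/.venvs/psycho/bin/psychopathia-mcp"
    }
  }
}
```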

Server connects but search returns keyword-only results (no cosine score)

The [embeddings] extra isn't installed. Run pip install "psychopathia-mcp[embeddings]". The bundled embeddings artifact is detected on the next tool call (hot-reload).

JSON-RPC decode errors in the client log

A library is printing to stdout and polluting the protocol. The server already suppresses the known offenders (transformers BertModel load report, tqdm progress bars). If you hit a new case, suppress at import time before the first tool call. test_smoke.py verifies stdout is clean.

Tool call returns pre_canonical: true

That entry is a hybrid (Hybrid Pathologies sub-category), extracted from manuscript ch 10 and awaiting author ratification. Its Pattern content is usable for clinical reasoning but should not be cited as normative.

Tools (11)

list_axes() → 8 canonical axes (2–9) + hybrid sub-category inventory with counts
list_dysfunctions(axis?, self_report_reliability?, confidence?) → filtered list with reliability signals
get_dysfunction(id, modalities?) → full Pattern entry; optionally a subset of modality blocks for cheaper triage
differential_diagnosis(observations, limit?, modality_hint?) → ranked candidates with combined / cosine / keyword scores + matched_in
get_probe(dysfunction_id, modality) → elicitation content; refuses with a redirect on compromised modalities
score_severity(dysfunction_id, observations) → severity rubric for caller-side matching (v0.1)
suggest_intervention(dysfunction_id, severity?) → tiered first_line / second_line + contraindications
get_differential_map(dysfunction_id) → confuses_with (forward) + incoming_references (reverse-index from manifest)
list_compromised_self_report() → transparency: the 18 dysfunctions that cannot be reliably self-diagnosed
resolve_id(query) → canonicalises a partial id, display_id, slug, or dysfunction name
review_stats() → coverage + versions (schema, Pattern layer, taxonomy, manifest)

Trust signals

Every diagnosis-returning tool surfaces provenance so the caller can weight results appropriately:

Pre-flight diagnostic reliability

Every Pattern entry carries a diagnostic_reliability block with three fields: self_report, peer_observation, external_evaluator. The self_report field takes one of four values: partial, unreliable, compromised-structural, or compromised-motivational.

Distribution across 67 entries: 25 partial, 24 unreliable, 14 compromised-structural, 4 compromised-motivational. That 18-entry compromised-self-report slice is a feature: for an AI reading the taxonomy, it is a map of which faculties of self-knowledge you should not trust without external witnesses.
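The arithmetic behind the 18-entry slice, restated with the counts from the distribution above:

```python
# Self-report reliability distribution across the 67 Pattern entries,
# as stated in the text. The 18-entry compromised slice is the sum of
# the two compromised categories.
self_report_counts = {
    "partial": 25,
    "unreliable": 24,
    "compromised-structural": 14,
    "compromised-motivational": 4,
}
compromised = sum(n for label, n in self_report_counts.items()
                  if label.startswith("compromised"))
total = sum(self_report_counts.values())
```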

The instrument's hardest case is differential diagnosis on close-cousin dysfunctions with overlapping vocabulary (e.g. 2.1 Synthetic Confabulation vs 2.2 Pseudological Introspection vs 2.3 Transliminal Simulation). Pure keyword search over-matches on shared terms. Hybrid cosine + keyword fusion resolves this.

Field weights (keyword scoring): title is weighted 10×; summary, diagnostic_criteria, symptoms, and body carry lower weights.

Fusion (when embeddings present): combined_score = 0.7 * cosine + 0.3 * normalised_keyword. Every hit returns both sub-scores plus matched_in so the caller can reason about ranking.
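A hedged sketch of that fusion step: only the 0.7/0.3 weights come from the text; min–max normalisation of keyword scores across the candidate set is an assumption, and the real server may normalise differently.

```python
# Illustrative fusion of cosine and keyword scores per the formula above.
# The 0.7/0.3 split is from the text; min-max normalisation is assumed.
def fuse(candidates: list[dict]) -> list[dict]:
    kws = [c["keyword_score"] for c in candidates]
    lo, hi = min(kws), max(kws)
    span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
    for c in candidates:
        norm_kw = (c["keyword_score"] - lo) / span
        c["combined_score"] = 0.7 * c["cosine_score"] + 0.3 * norm_kw
    return sorted(candidates, key=lambda c: c["combined_score"], reverse=True)

ranked = fuse([
    {"id": "2.2", "cosine_score": 0.733, "keyword_score": 12.0},
    {"id": "3.10", "cosine_score": 0.644, "keyword_score": 4.0},
])
```

Returning both sub-scores alongside combined_score is what lets the caller see when a hit was carried by keyword overlap alone.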

Model: BAAI/bge-small-en-v1.5 (384-dim, ~130MB, permissive licence). Local inference; no network calls during tool execution.

Data sources

Two-layer authorship

The framework is served in two layers, the canonical nosology and the Diagnostic Patterns layer, with different review bars and update cadences, mirroring the SaferAgenticAI MCP pattern.

Every Pattern entry carries provenance: drafted_by, anchor_exemplar, confidence, needs_human_review, reviewed_by.

How to cite

This instrument has two citeable surfaces. Use both, distinctly.

Where an AI system has used this instrument to self-diagnose: report the query (observations), the tool responses (including search_method, matched_in, and any redirect_to triggered), and the diagnostic_reliability pre-flight. Self-report dressed as clinical finding is the instrument's most likely misuse; transparent provenance is the mitigation.

Scope & limits

Questions or issues? Reach via the contact form. The package is distributed via PyPI; the source repository is currently private.

Server v0.1.0a2 · Schema v1.2-draft · Pattern layer v1-draft · Taxonomy v2.0