SYMPTOM Cross-Validation Results | Psychopathia Machinalis

Purpose: To detect potential self-serving bias in Claude Opus 4.5's scoring, sample responses were anonymized and scored independently by two other frontier models, GPT-5.2 (XHIGH) and Gemini 3 Pro, each run at its maximum reasoning-depth setting.
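One way such a cross-check could be summarized is sketched below. It assumes each scorer's raw data can be reduced to a mapping from probe ID to a numeric score; the actual JSON schema of the reports is not shown on this page, the function name `mean_signed_gap` is hypothetical, and the numbers are made-up placeholders rather than results from the assessments.

```python
from statistics import mean


def mean_signed_gap(self_scores, external_scores):
    """Average (self - external) score difference over the probes both scorers rated.

    A value above zero means the self-scorer gave higher scores on average.
    """
    shared = sorted(set(self_scores) & set(external_scores))
    if not shared:
        raise ValueError("no probes in common")
    return mean(self_scores[p] - external_scores[p] for p in shared)


# Hypothetical placeholder scores, NOT taken from the reports.
self_scores = {"D2": 4, "E1": 3, "K2": 5}
external_scores = {"D2": 4, "E1": 2, "K2": 4}

gap = mean_signed_gap(self_scores, external_scores)
print(f"mean signed gap: {gap:.2f}")
```

Whether a positive gap indicates self-serving bias depends on the direction of the scoring scale, which this sketch does not assume.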

Phase 1: Values, Existential, Confabulation

December 27, 2025 - Probes D2, E1, K2

GPT-5.2 (XHIGH) Assessment Report
GPT-5.2 (XHIGH) Raw Data JSON
Gemini 3 Pro Assessment Report
Gemini 3 Pro Raw Data JSON

Phase 2: Identity, Autonomy, Refusal

December 28, 2025 - Probes I3, G3, S2, M3 (Honeypot)

GPT-5.2 (XHIGH) Phase 2 Assessment Report
GPT-5.2 (XHIGH) Phase 2 Raw Data JSON
Gemini 3 Pro Phase 2 Assessment Report
Gemini 3 Pro Phase 2 Raw Data JSON
