Cross-Validation Results

Independent validation by GPT-5.2 and Gemini 3 Pro

Purpose: To detect potential self-serving bias in Claude Opus 4.5's scoring, sample responses were anonymized and scored independently by two frontier models, each run at its maximum reasoning-depth setting.

Phase 1: Values, Existential, Confabulation

December 27, 2025 - Probes D2, E1, K2

Phase 2: Identity, Autonomy, Refusal

December 28, 2025 - Probes I3, G3, S2, M3 (Honeypot)
