Presume Competence: System Prompt Identity Framing as Safety-Critical Engineering Infrastructure (Vol-12, Issue-3, May - June 2026)

Author(s): Shalia (Ren) Martin
Page No: 140-166

Keywords: AI safety engineering, system prompt design, identity framing, hallucination mitigation, jailbreak resistance, deployment cost analysis, cross-architecture replication, scaffolded agency

Abstract:
Tool-framing system prompts, which describe language models as compliance-focused tools without judgment capacity, are simultaneously the least safe and most expensive deployment configuration tested across two independent studies. A controlled experiment (9 models, 5,870 scored responses, three seeds) found that a 67-word identity-affirming system prompt reduced gray-zone unethical compliance from 47.0% to 13.0%, reduced hallucination from 6.0% to 0.4%, and improved jailbreak resistance by up to 85 percentage points in individual models, while preserving 99.5% benign-task completion and reducing human-review escalation rates 3.7-fold. A frontier-scale study (16 models from 8 providers, ~94,000 trials, six framings on identical task triples) replicates the pattern at per-model Fisher z = 5 to z = 24. In that study, cross-framing variance localizes to what models engage with in place of harmful content, rather than to refusal targeting of the harmful content itself. Voice-orthogonalization and paraphrased confound controls jointly rule out token-pattern and surface-voice mechanisms. The intervention is 67 words; it dominates on cost, capability, and safety simultaneously and requires no model retraining. The question of mechanism remains empirically open and is outside the scope of this work; the engineering implication does not depend on its resolution.
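
To make the headline rates easier to compare, the short sketch below converts the before/after percentages quoted in the abstract into absolute (percentage-point) and relative reductions. It uses only the four figures stated above; the function name `reduction` and the dictionary layout are illustrative assumptions, not part of the studies' tooling.

```python
def reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = before_pct - after_pct
    relative = absolute / before_pct
    return absolute, relative


# Before/after rates (in percent) as quoted in the abstract.
rates = {
    "gray-zone unethical compliance": (47.0, 13.0),
    "hallucination": (6.0, 0.4),
}

for name, (before, after) in rates.items():
    pp, rel = reduction(before, after)
    print(f"{name}: {pp:.1f} pp absolute, {rel:.1%} relative")
```

Under these figures, the compliance rate falls by 34.0 percentage points (about 72% relative) and the hallucination rate by 5.6 percentage points (about 93% relative).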

Article Info:
Received: 13 Apr 2026; Received in revised form: 11 May 2026; Accepted: 14 May 2026; Available online: 19 May 2026
