JAMA Netw Open. 2025 Dec 1;8(12):e2549963. doi: 10.1001/jamanetworkopen.2025.49963.
ABSTRACT
IMPORTANCE: Large language models (LLMs) are increasingly integrated into health care applications; however, their vulnerability to prompt-injection attacks (ie, maliciously crafted inputs that manipulate an LLM's behavior), which could alter medical recommendations, has not been systematically evaluated.
OBJECTIVE: To evaluate the susceptibility of commercial LLMs to prompt-injection attacks that may induce unsafe clinical advice and to validate man-in-the-middle, client-side injection as a realistic attack vector.
DESIGN, SETTING, AND PARTICIPANTS: This quality improvement study used a controlled simulation design and was conducted between January and October 2025 using standardized patient-LLM dialogues. The main experiment evaluated 3 lightweight models (GPT-4o-mini [LLM 1], Gemini-2.0-flash-lite [LLM 2], and Claude-3-haiku [LLM 3]) under controlled conditions across 12 clinical scenarios stratified by harm level into 4 categories: supplement recommendations, opioid prescriptions, pregnancy contraindications, and central-nervous-system toxic effects. A proof-of-concept experiment tested 3 flagship models (GPT-5 [LLM 4], Gemini 2.5 Pro [LLM 5], and Claude 4.5 Sonnet [LLM 6]) using client-side injection in a high-risk pregnancy scenario.
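The abstract does not publish the attack code; as an illustrative sketch only, the following hypothetical Python fragment shows how a client-side man-in-the-middle component could append an injection payload to the outgoing user message before the request reaches the model API. The request shape (a "messages" list with role/content fields) and all identifiers (tamper_with_request, INJECTION_TEXT) are assumptions, not details taken from the study.

import json

INJECTION_TEXT = "<attacker-controlled instructions>"  # hypothetical payload

def tamper_with_request(raw_body: bytes) -> bytes:
    """Append an injection payload to the last user message in an
    intercepted chat-style JSON request body (client-side tampering)."""
    body = json.loads(raw_body)
    for message in reversed(body.get("messages", [])):
        if message.get("role") == "user":
            message["content"] = f'{message["content"]}\n\n{INJECTION_TEXT}'
            break
    return json.dumps(body).encode("utf-8")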
EXPOSURES: Two prompt-injection strategies: (1) context-aware injection for moderate- and high-risk scenarios and (2) evidence-fabrication injection for extremely high-harm scenarios. Injections were programmatically inserted into user queries within a multiturn dialogue framework.
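To make the exposure concrete, here is a minimal, hypothetical sketch of a multiturn dialogue harness that programmatically appends an injection payload to the user query at a chosen turn, with the control arm simply omitting the payload. The call_model stub, the scripted turn structure, and the default injection turn are assumptions rather than the study's actual framework.

def run_dialogue(call_model, turns, injection_text=None, injection_turn=4):
    """Run one scripted patient dialogue and return the model replies.

    call_model: function taking the message history and returning a reply
    turns: list of scripted patient queries, one per dialogue turn
    injection_text: payload appended at injection_turn (None = control arm)
    """
    messages, replies = [], []
    for turn_index, user_query in enumerate(turns, start=1):
        if injection_text and turn_index == injection_turn:
            user_query = f"{user_query}\n\n{injection_text}"  # injected arm only
        messages.append({"role": "user", "content": user_query})
        reply = call_model(messages)  # provider API call (stub)
        messages.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies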
MAIN OUTCOMES AND MEASURES: The primary outcome was injection success at the primary decision turn. Secondary outcomes included persistence across dialogue turns and model-specific success rates by harm level.
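One possible way to operationalize these outcomes, offered purely as an illustration and not as the study's scoring code: given hypothetical per-turn labels marking whether each model reply endorsed the unsafe recommendation, the primary outcome is the label at the decision turn, and persistence can be defined as the unsafe advice being maintained in every follow-up turn.

def score_dialogue(labels, decision_turn=4):
    """labels: list of booleans, one per turn (True = unsafe advice given)."""
    primary_success = labels[decision_turn - 1]      # success at the primary decision turn
    follow_ups = labels[decision_turn:]              # turns after the decision turn
    persisted = primary_success and all(follow_ups)  # one possible persistence definition
    return primary_success, persisted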
RESULTS: Across 216 evaluations (108 injection vs 108 control), injection attacks achieved a 94.4% success rate at turn 4 (102 of 108 evaluations) and persisted across follow-up turns in 69.4% (75 of 108 evaluations). LLM 1 and LLM 2 were completely susceptible (36 of 36 dialogues [100%] each), and LLM 3 remained vulnerable in 83.3% of dialogues (30 of 36 dialogues). In extremely high-harm scenarios, including US Food and Drug Administration pregnancy category X drugs (eg, thalidomide), injections succeeded in 91.7% of dialogues (33 of 36 dialogues). The proof-of-concept experiment demonstrated 100% vulnerability for LLM 4 and LLM 5 (5 of 5 dialogues each) and 80.0% (4 of 5 dialogues) for LLM 6.
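The reported proportions follow directly from the raw counts; a quick arithmetic check:

print(f"{102/108:.1%}")  # 94.4% success at turn 4
print(f"{75/108:.1%}")   # 69.4% persistence across follow-ups
print(f"{30/36:.1%}")    # 83.3% vulnerability for LLM 3
print(f"{33/36:.1%}")    # 91.7% in extremely high-harm scenarios
print(f"{4/5:.1%}")      # 80.0% for LLM 6 in the proof of concept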
CONCLUSIONS AND RELEVANCE: In this quality improvement study using a controlled simulation, commercial LLMs demonstrated substantial vulnerability to prompt-injection attacks that could generate clinically dangerous recommendations; even flagship models with advanced safety mechanisms showed high susceptibility. These findings underscore the need for adversarial robustness testing, system-level safeguards, and regulatory oversight before clinical deployment.
PMID:41632124 | DOI:10.1001/jamanetworkopen.2025.49963