JailFact-Bench: A Comprehensive Analysis of Jailbreak Attacks vs. Hallucinations in LLMs
Published in SiMLA 2025 Workshop @ ACNS (to appear in Springer LNCS post-proceedings), 2025
This paper investigates the relationship between jailbreak prompts and hallucination behavior in large language models (LLMs). It introduces JailFact-Bench, a benchmark designed to evaluate factual accuracy under adversarial prompting. Using semantic-similarity and factual-precision scoring, the study shows that many jailbreaks induce hallucinations, challenging the conventional boundary between safety and truthfulness in LLMs.
Recommended citation: Sanjana Nambiar, Christina Pöpper. (2025). "JailFact-Bench: A Comprehensive Analysis of Jailbreak Attacks vs. Hallucinations in LLMs." SiMLA 2025 Workshop, co-located with ACNS 2025. To appear in Springer LNCS.
Download Paper | Download Slides
