Synthetic Fine-Tuning as a Defense Mechanism in Large Language Model PII Attacks
Published in NeurIPS 2024 LLM Privacy Competition, 2024
This paper presents a defense against PII extraction attacks on large language models (LLMs): fine-tuning on synthetic data to reduce the likelihood that private information is retrieved without authorization. The evaluation measures attack success rate (ASR), predictive probability, and overall model utility, balancing privacy protection against model effectiveness. Key findings indicate that dynamic synthetic-data fine-tuning is a viable and robust approach to mitigating these privacy vulnerabilities in LLMs.
Recommended citation: Sanjana Nambiar, Chinmay Hegde, Niv Cohen. (2024). "Synthetic Fine-Tuning as a Defense Mechanism in Large Language Model PII Attacks." NeurIPS 2024 LLM Privacy Competition.
Download Paper
