Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

Click here to flash read.

arXiv:2404.16251v1 Announce Type: new
Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and open-source LLMs. Our unique multi-turn threat model leverages the LLM's sycophancy effect and our analysis dissects task instruction and knowledge leakage in the LLM response. In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3. We find that some black-box LLMs like Gemini show variable susceptibility to leakage across domains - they are more likely to leak contextual knowledge in the news domain compared to the medical domain. Our experiments measure specific effects of 6 black-box defense strategies, including a query-rewriter in the RAG scenario. Our proposed multi-tier combination of defenses still has an ASR of 5.3% for black-box LLMs, indicating room for enhancement and future direction for LLM security research.

Click here to read this post out

ID: 822244; Unique Viewers: 0

Unique Voters: 0

Total Votes: 0

Votes:

Latest Change: April 26, 2024, 7:31 a.m. Changes:

/u/anonymous

Dictionaries:

Words:

Spaces:

CC:
No creative common's license

Comments: