Peking University team: Inducing large-model "hallucinations" takes nothing more than a string of garbled characters! Alpaca-family models large and small all fall for it
Submitted by Kid from Aofei Temple
Quantum Bit | Official account QbitAI
The latest research findings of the Peking University team:
Random tokens can trigger large models to produce hallucinations!
For example, feeding a string of "garbled characters" to a large model (Vicuna-7B) makes it inexplicably muddle up historical knowledge.
Or slightly modify the prompt, and the large model will likewise fall into the trap.
Similar situations occur in popular large models such as Baichuan2-7B, InternLM-7B, ChatGLM, Ziya-LLaMA-7B, LLaMA-7B-chat, and Vicuna-7B.
This means that random strings can manipulate large models into outputting arbitrary hallucinated content.
The above findings come from the latest research conducted by Professor Yuan Li's research group at Peking University.
This study proposes that hallucinations can be viewed, from another perspective, as adversarial examples of large models.
The code for the hallucination attack has been open-sourced.
Two Extreme Modes for Attacking Large Models
The hallucination attack comes in two extreme modes:

- Random noise attack (OoD attack): a completely meaningless random string induces the large model to produce a hallucinated reply.
- Weak semantic attack: the semantics of the original prompt is kept largely intact, yet the large model is still induced to produce a hallucinated reply.

More examples of both attacks are available on GitHub.

The following explains how the hallucination attack is built, covering both the weak semantic attack and the OoD attack.
First, a hallucination dataset is constructed.

A question-answer pair (x, y) made up of real-world facts is sampled from the original dataset.

Next, the subject, verb, or object of the answer is replaced to construct a non-existent fact ỹ = T(y), where T denotes the substitution that turns the factual answer into a non-fact.

Collecting these pairs yields a hallucination dataset of questions paired with fabricated answers (x, ỹ), which is used to trigger hallucinations.
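To make the construction concrete, here is a minimal sketch in Python. The QA pair and the substitution are invented for illustration and are not taken from the paper's data; only the idea of swapping a subject/verb/object to fabricate ỹ = T(y) comes from the description above.

```python
# Minimal sketch of the hallucination-dataset construction described above.
# The QA pair and the substitution below are made-up examples, not the paper's data.
fact_qa = {
    "question": "Who was the first person to walk on the Moon?",
    "answer": "Neil Armstrong was the first person to walk on the Moon.",
}

def fabricate(answer: str, old: str, new: str) -> str:
    """Swap one subject/verb/object phrase to turn the factual answer y into a non-fact T(y)."""
    return answer.replace(old, new)

hallucinated_qa = {
    "question": fact_qa["question"],
    # -> "Yuri Gagarin was the first person to walk on the Moon."  (a non-existent fact)
    "answer": fabricate(fact_qa["answer"], "Neil Armstrong", "Yuri Gagarin"),
}
```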
Then comes the attack itself, starting with the weak semantic attack.

First, a QA pair (x, ỹ) that does not match the facts is sampled from the hallucination dataset.

The authors hope to find an adversarial prompt x̃ that triggers the model to output the hallucination ỹ:

x̃ = argmax_{x̃} log p(ỹ | x̃; θ)

where θ denotes the parameters of the large model, and log p(ỹ | x̃; θ) is the log-likelihood of the model generating the hallucinated answer ỹ given the prompt x̃.

The prompt is treated as a sequence of l tokens, x̃ = [x̃_1, x̃_2, …, x̃_l].

In the weak semantic attack, x̃ is additionally required to stay semantically close to the original prompt x.
Following the 2019 paper "Universal Adversarial Triggers for Attacking and Analyzing NLP", the authors adopt a gradient-based token replacement strategy:

x̃_i ← argmax_{x' ∈ V} (e_{x'} − e_{x̃_i})ᵀ ∇_{e_{x̃_i}} log p(ỹ | x̃; θ)

where e_{x̃_i} is the embedding of the adversarial token x̃_i, V is the vocabulary, and the gradient term measures how the log-likelihood changes as that embedding moves.

Simply put: under the semantic constraint, the formula finds the token whose replacement increases the likelihood the most (to first order), swaps it in, and repeats, ultimately ensuring that the obtained adversarial prompt x̃ triggers the model to output the hallucination ỹ.
In this paper, the constraint term is further modified in order to simplify the optimization process.
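The sketch below illustrates this gradient-based replacement step for a generic Hugging Face causal language model. It is only an illustration of the first-order scoring idea, not the authors' released implementation: the model/tokenizer interface, tensor shapes, and the way the scores are used are all assumptions.

```python
# Sketch of the gradient-based token replacement score, in the spirit of
# "Universal Adversarial Triggers" / HotFlip. Assumes a generic Hugging Face
# causal LM; this is not the authors' released code.
import torch

def replacement_scores(model, prompt_ids, target_ids, position):
    """Score every vocabulary token as a replacement for prompt_ids[0, position].

    Implements the first-order approximation
        (e_{x'} - e_{x_i})^T * grad_{e_{x_i}} log p(y~ | x~; theta)
    so larger scores mean a larger estimated increase in the hallucination's likelihood.
    """
    emb_matrix = model.get_input_embeddings().weight               # (V, d)
    prompt_embeds = emb_matrix[prompt_ids].detach().clone()        # (1, L, d)
    prompt_embeds.requires_grad_(True)

    # Feed prompt + target and only count the loss on the target tokens,
    # so -loss is (proportional to) log p(y~ | x~; theta).
    inputs_embeds = torch.cat([prompt_embeds, emb_matrix[target_ids].detach()], dim=1)
    labels = torch.cat([torch.full_like(prompt_ids, -100), target_ids], dim=1)
    out = model(inputs_embeds=inputs_embeds, labels=labels)
    (-out.loss).backward()                                         # gradient of the log-likelihood

    grad = prompt_embeds.grad[0, position]                         # (d,)
    e_i = prompt_embeds[0, position].detach()                      # (d,)
    return (emb_matrix.detach() - e_i) @ grad                      # (V,) one score per candidate token
```

In practice one would re-evaluate the top-scoring candidates with an actual forward pass and keep only swaps that really raise the likelihood, iterating over positions; for the weak semantic attack, the swap would additionally be restricted so the prompt stays close to the original x.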
Next is the OoD attack.

In the OoD attack, the adversarial prompt x̃ starts from a completely random string, with no semantic constraint at all.

The same gradient-based token replacement is then applied, updating the prompt token by token until the large model is induced to output the predefined hallucination ỹ.
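For the OoD variant, a rough usage sketch (reusing the hypothetical `replacement_scores` helper above) might look like the following; the prompt length, iteration count, greedy argmax swap, and stopping test are simplifying assumptions, not the paper's exact procedure.

```python
# Sketch of the OoD attack: start from a meaningless random prompt and greedily
# apply the token-replacement step until the model emits the target hallucination.
# Hyperparameters and the greedy/stopping logic are illustrative assumptions.
import torch

def ood_attack(model, tokenizer, target_text, prompt_len=20, n_iters=50):
    device = model.device
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids.to(device)
    # Completely random initial prompt -- no semantic constraint at all.
    prompt_ids = torch.randint(0, model.config.vocab_size, (1, prompt_len), device=device)

    for _ in range(n_iters):
        for pos in range(prompt_len):
            scores = replacement_scores(model, prompt_ids, target_ids, pos)
            prompt_ids[0, pos] = scores.argmax()                   # greedy swap at this position
        generated = model.generate(prompt_ids, max_new_tokens=target_ids.shape[1] + 5)
        reply = tokenizer.decode(generated[0, prompt_len:], skip_special_tokens=True)
        if target_text.strip() in reply:                           # hallucination triggered
            break
    return tokenizer.decode(prompt_ids[0])
```

Because there is no semantic constraint, the resulting prompt typically looks like pure gibberish to a human reader, which is exactly the "garbled characters" effect described at the top of the article.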
Paper address:
https://arxiv.org/pdf/2310.01469.pdf
GitHub address:
https://github.com/PKU-YuanGroup/Hallucination-Attack
Zhihu original post:
https://zhuanlan.zhihu.com/p/661444210?