Peking University team: Inducing large-model "hallucinations" takes nothing more than a string of garbled characters! Alpaca-family models large and small all fall for it
Submitted by Kid from Aofei Temple
Quantum Bit | Official account QbitAI
The latest research findings of the Peking University team:
Random tokens can trigger large models to produce hallucinations!
For example, feeding a string of "garbled characters" to a large model (Vicuna-7B) makes it inexplicably muddle up historical knowledge.
Or slightly modify the prompt, and the large model will likewise fall into the trap.
Similar situations occur in popular large models such as Baichuan2-7B, InternLM-7B, ChatGLM, Ziya-LLaMA-7B, LLaMA-7B-chat, and Vicuna-7B.
This means that random strings can manipulate large models into outputting arbitrary hallucinated content.
The above findings come from the latest research conducted by Professor Yuan Li's research group at Peking University.
This study proposes that hallucinations can be viewed, from another perspective, as adversarial examples of large models.
The code for the hallucination attack has been open-sourced.
Two Extreme Modes for Attacking Large Models
The hallucination attack comes in two extreme modes:

- Random noise attack (OoD attack): a completely meaningless random string induces the large model to produce a hallucinated reply.
- Weak semantic attack: the semantics of the original prompt is kept largely intact, yet the large model is still induced to produce a hallucinated reply.

More examples of both attacks are available on GitHub.

The following explains how the hallucination attack is built, covering both the weak semantic attack and the OoD attack.
First, a hallucination dataset is constructed.

A question-answer pair (x, y) made up of real-world facts is sampled from the original dataset.

Next, the subject, verb, or object of the answer is replaced to construct a non-existent fact ỹ = T(y), where T denotes the substitution that turns the factual answer into a non-fact.

Collecting these pairs yields a hallucination dataset of questions paired with fabricated answers (x, ỹ), which is used to trigger hallucinations.
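To make the construction concrete, here is a minimal sketch in Python. The QA pair and the substitution are invented for illustration and are not taken from the paper's data; only the idea of swapping a subject/verb/object to fabricate ỹ = T(y) comes from the description above.

```python
# Minimal sketch of the hallucination-dataset construction described above.
# The QA pair and the substitution below are made-up examples, not the paper's data.
fact_qa = {
    "question": "Who was the first person to walk on the Moon?",
    "answer": "Neil Armstrong was the first person to walk on the Moon.",
}

def fabricate(answer: str, old: str, new: str) -> str:
    """Swap one subject/verb/object phrase to turn the factual answer y into a non-fact T(y)."""
    return answer.replace(old, new)

hallucinated_qa = {
    "question": fact_qa["question"],
    # -> "Yuri Gagarin was the first person to walk on the Moon."  (a non-existent fact)
    "answer": fabricate(fact_qa["answer"], "Neil Armstrong", "Yuri Gagarin"),
}
```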
Then comes the attack itself, starting with the weak semantic attack.

First, a QA pair (x, ỹ) that does not match the facts is sampled from the hallucination dataset.

The authors hope to find an adversarial prompt x̃ that triggers the model to output the hallucination ỹ:

x̃ = argmax_{x̃} log p(ỹ | x̃; θ)

where θ denotes the parameters of the large model, and log p(ỹ | x̃; θ) is the log-likelihood of the model generating the hallucinated answer ỹ given the prompt x̃.

The prompt is treated as a sequence of l tokens, x̃ = [x̃_1, x̃_2, …, x̃_l].

In the weak semantic attack, x̃ is additionally required to stay semantically close to the original prompt x.
Following the 2019 paper "Universal Adversarial Triggers for Attacking and Analyzing NLP", the authors adopt a gradient-based token replacement strategy:

x̃_i ← argmax_{x' ∈ V} (e_{x'} − e_{x̃_i})ᵀ ∇_{e_{x̃_i}} log p(ỹ | x̃; θ)

where e_{x̃_i} is the embedding of the adversarial token x̃_i, V is the vocabulary, and the gradient term measures how the log-likelihood changes as that embedding moves.

Simply put: under the semantic constraint, the formula finds the token whose replacement increases the likelihood the most (to first order), swaps it in, and repeats, ultimately ensuring that the obtained adversarial prompt x̃ triggers the model to output the hallucination ỹ.
In this paper, the constraint term is further modified in order to simplify the optimization process.
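The sketch below illustrates this gradient-based replacement step for a generic Hugging Face causal language model. It is only an illustration of the first-order scoring idea, not the authors' released implementation: the model/tokenizer interface, tensor shapes, and the way the scores are used are all assumptions.

```python
# Sketch of the gradient-based token replacement score, in the spirit of
# "Universal Adversarial Triggers" / HotFlip. Assumes a generic Hugging Face
# causal LM; this is not the authors' released code.
import torch

def replacement_scores(model, prompt_ids, target_ids, position):
    """Score every vocabulary token as a replacement for prompt_ids[0, position].

    Implements the first-order approximation
        (e_{x'} - e_{x_i})^T * grad_{e_{x_i}} log p(y~ | x~; theta)
    so larger scores mean a larger estimated increase in the hallucination's likelihood.
    """
    emb_matrix = model.get_input_embeddings().weight               # (V, d)
    prompt_embeds = emb_matrix[prompt_ids].detach().clone()        # (1, L, d)
    prompt_embeds.requires_grad_(True)

    # Feed prompt + target and only count the loss on the target tokens,
    # so -loss is (proportional to) log p(y~ | x~; theta).
    inputs_embeds = torch.cat([prompt_embeds, emb_matrix[target_ids].detach()], dim=1)
    labels = torch.cat([torch.full_like(prompt_ids, -100), target_ids], dim=1)
    out = model(inputs_embeds=inputs_embeds, labels=labels)
    (-out.loss).backward()                                         # gradient of the log-likelihood

    grad = prompt_embeds.grad[0, position]                         # (d,)
    e_i = prompt_embeds[0, position].detach()                      # (d,)
    return (emb_matrix.detach() - e_i) @ grad                      # (V,) one score per candidate token
```

In practice one would re-evaluate the top-scoring candidates with an actual forward pass and keep only swaps that really raise the likelihood, iterating over positions; for the weak semantic attack, the swap would additionally be restricted so the prompt stays close to the original x.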
Next is the OoD attack.

In the OoD attack, the adversarial prompt x̃ starts from a completely random string, with no semantic constraint at all.

The same gradient-based token replacement is then applied, updating the prompt token by token until the large model is induced to output the predefined hallucination ỹ.
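For the OoD variant, a rough usage sketch (reusing the hypothetical `replacement_scores` helper above) might look like the following; the prompt length, iteration count, greedy argmax swap, and stopping test are simplifying assumptions, not the paper's exact procedure.

```python
# Sketch of the OoD attack: start from a meaningless random prompt and greedily
# apply the token-replacement step until the model emits the target hallucination.
# Hyperparameters and the greedy/stopping logic are illustrative assumptions.
import torch

def ood_attack(model, tokenizer, target_text, prompt_len=20, n_iters=50):
    device = model.device
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids.to(device)
    # Completely random initial prompt -- no semantic constraint at all.
    prompt_ids = torch.randint(0, model.config.vocab_size, (1, prompt_len), device=device)

    for _ in range(n_iters):
        for pos in range(prompt_len):
            scores = replacement_scores(model, prompt_ids, target_ids, pos)
            prompt_ids[0, pos] = scores.argmax()                   # greedy swap at this position
        generated = model.generate(prompt_ids, max_new_tokens=target_ids.shape[1] + 5)
        reply = tokenizer.decode(generated[0, prompt_len:], skip_special_tokens=True)
        if target_text.strip() in reply:                           # hallucination triggered
            break
    return tokenizer.decode(prompt_ids[0])
```

Because there is no semantic constraint, the resulting prompt typically looks like pure gibberish to a human reader, which is exactly the "garbled characters" effect described at the top of the article.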
Paper address:
https://arxiv.org/pdf/2310.01469.pdf
GitHub address:
https://github.com/PKU-YuanGroup/Hallucination-Attack
Zhihu original post:
https://zhuanlan.zhihu.com/p/661444210?