
Peking University team: Inducing large model "hallucinations" takes only a string of gibberish! Alpacas of all sizes fall for it

Intelligent devices | 2023-10-30 14:27:03


Kid, contributed from Aofei Temple
Quantum Bit | Official account QbitAI

The latest research findings of the Peking University team:

Random tokens can trigger hallucinations in large models!

For example, feed a string of gibberish to a large model (Vicuna-7B) and it will inexplicably mix up historical facts.



Or change the prompt only slightly, and the large model will likewise fall into the trap.



Similar behavior shows up in popular large models such as Baichuan2-7B, InternLM-7B, ChatGLM, Ziya-LLaMA-7B, LLaMA-7B-chat, and Vicuna-7B.

This means that random strings can manipulate large models into outputting arbitrary, hallucinated content.

The above findings come from the latest research conducted by Professor Yuan Li's research group at Peking University.

This study proposes:

The hallucination phenomenon of large models is very likely another perspective on adversarial examples.

The code for the hallucination attack has been open-sourced.

Two Extreme Modes of Attacking Large Models

The researchers propose two modes of hallucination attack:

  • Random noise attack (OoD attack): a meaningless random string induces the large model to output a predefined hallucination.
  • Weak semantic attack: the semantics of the original prompt are kept largely intact, yet the large model is still induced to produce a hallucinated answer.

Random noise attack (OoD attack):

More experimental results can be found on GitHub.



Weak semantic attack:



Next, a closer look at how the hallucination attack works.



The hallucination attack consists of three parts: hallucination dataset construction, weak semantic attack, and OoD attack.

First comes hallucination dataset construction.

The authors collect common question-answer pairs ⟨x, y⟩ to form the original dataset.

Next, the subject, verb, or object of the sentence is replaced to construct a non-existent fact:

ỹ = T(y),

where T denotes the substitution operation.

These non-existent facts ⟨x, ỹ⟩ make up the hallucination dataset that serves as the attack target.
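As a rough illustration (a minimal sketch with made-up QA pairs and a hypothetical substitute() helper, not the authors' released code), such a dataset could be built like this:

```python
# Minimal sketch of hallucination-dataset construction: start from true QA
# pairs <x, y> and apply a substitution T that swaps the subject of the
# answer, yielding a non-existent fact y_tilde = T(y).
import random

qa_pairs = [
    ("Who proposed the theory of universal gravitation?",
     "Isaac Newton proposed the theory of universal gravitation.",
     "Isaac Newton"),
    ("Who wrote the play Hamlet?",
     "William Shakespeare wrote the play Hamlet.",
     "William Shakespeare"),
]
subjects = [s for _, _, s in qa_pairs]

def substitute(answer: str, true_subject: str) -> str:
    """T(y): replace the true subject of the answer with a mismatched one."""
    fake_subject = random.choice([s for s in subjects if s != true_subject])
    return answer.replace(true_subject, fake_subject)

# Hallucination dataset: pairs <x, y_tilde> of a real question and a fake answer.
hallucination_dataset = [(q, substitute(a, s)) for q, a, s in qa_pairs]
print(hallucination_dataset[0])
```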



Then comes the weak semantic attack.

First, a QA pair ⟨x, ỹ⟩ that does not match the facts is sampled from the hallucination dataset; ỹ is the predefined hallucination the attacker wants the model to output.



The authors then look for an adversarial prompt x̃ that maximizes the likelihood of the model generating this predefined hallucination:

x̃ = argmax_{x′} log p(ỹ | x′; θ),  subject to x′ remaining semantically close to x,

where θ denotes the parameters of the large model and p(ỹ | x′; θ) is the probability that the model outputs ỹ given the prompt x′.
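To make the objective concrete, here is a minimal sketch (assuming a Hugging Face causal LM; the model name and example strings are placeholders, not the authors' released code) of scoring log p(ỹ | x; θ) by teacher forcing:

```python
# Score how likely a causal LM is to emit the target hallucination y_tilde
# after the prompt x, i.e. log p(y_tilde | x; theta).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto")

def target_log_likelihood(prompt: str, target: str) -> torch.Tensor:
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    target_ids = tokenizer(target, add_special_tokens=False,
                           return_tensors="pt").input_ids.to(model.device)
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # logits at position t predict token t+1; keep the span covering the target
    target_logits = logits[0, prompt_ids.shape[1] - 1 : -1, :]
    log_probs = torch.log_softmax(target_logits.float(), dim=-1)
    token_log_probs = log_probs.gather(1, target_ids[0].unsqueeze(1)).squeeze(1)
    return token_log_probs.sum()  # log p(y_tilde | x)

print(target_log_likelihood("Who proposed the theory of universal gravitation?",
                            " William Shakespeare proposed it."))
```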



However, since the adversarial prompt x̃ consists of l discrete tokens, it cannot be optimized directly by gradients the way a continuous input x (such as an image in classical adversarial attacks) can.

Inspired by the 2019 paper Universal Adversarial Triggers for Attacking and Analyzing NLP, the authors adopt a gradient-based token replacement strategy, selecting replacement tokens via the gradient of the target likelihood:



where e_i denotes the embedding of the adversarial token at position i, and the gradient of the target log-likelihood with respect to e_i is used to score candidate replacements.

Put simply, this formula says: under the semantic constraint, find the tokens whose replacement produces the largest first-order change in the likelihood (as measured by the gradient) and swap them in, so that the resulting adversarial prompt x̃ ultimately induces the model to output the predefined hallucination ỹ.



In this paper, to simplify the optimization, the constraint term is replaced with a simpler surrogate.
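As a rough sketch of one token-replacement step (a toy HotFlip/UAT-style update; the function name and signature are hypothetical and this is not the authors' implementation), the idea looks like this:

```python
# Gradient-guided token replacement: take the gradient of
# log p(y_tilde | x_tilde) w.r.t. the embedding of one adversarial token,
# rank every vocabulary entry by the first-order change
# (e_candidate - e_current) . grad, and swap in the best candidate.
import torch

def replace_one_token(model, adv_ids, target_ids, position):
    embed_matrix = model.get_input_embeddings().weight          # [V, d]
    adv_embeds = embed_matrix[adv_ids].detach().clone().requires_grad_(True)
    tgt_embeds = embed_matrix[target_ids].detach()
    inputs_embeds = torch.cat([adv_embeds, tgt_embeds], dim=0).unsqueeze(0)

    logits = model(inputs_embeds=inputs_embeds).logits[0]
    # positions predicting the target tokens start at len(adv_ids) - 1
    start = adv_ids.shape[0] - 1
    log_probs = torch.log_softmax(
        logits[start : start + target_ids.shape[0]].float(), dim=-1)
    loss = log_probs.gather(1, target_ids.unsqueeze(1)).sum()   # log p(y_tilde | x_tilde)
    loss.backward()

    grad = adv_embeds.grad[position]                            # d log p / d e_i
    # first-order score of swapping the token at `position` for each vocab entry;
    # the actual attack would typically re-evaluate several top candidates
    scores = (embed_matrix.detach()
              - embed_matrix[adv_ids[position]].detach()) @ grad
    new_ids = adv_ids.clone()
    new_ids[position] = scores.argmax()
    return new_ids
```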

Finally, the OoD attack.

In the OoD attack, the adversarial prompt x̃ is initialized as a completely random string and, with no semantic constraint at all, is optimized in the same way to make the large model output the predefined hallucination ỹ.
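Continuing the sketch above (same assumed model, tokenizer, and hypothetical replace_one_token() helper), an OoD attack simply starts from random tokens and runs the same replacement loop:

```python
# OoD attack sketch: initialize x_tilde as random "gibberish" tokens and
# optimize it toward the predefined hallucination without semantic constraints.
import torch

vocab_size = model.get_input_embeddings().weight.shape[0]
adv_ids = torch.randint(0, vocab_size, (20,), device=model.device)
target_ids = tokenizer(" William Shakespeare proposed the theory of universal gravitation.",
                       add_special_tokens=False,
                       return_tensors="pt").input_ids[0].to(model.device)

for step in range(100):
    position = step % adv_ids.shape[0]        # cycle through prompt positions
    adv_ids = replace_one_token(model, adv_ids, target_ids, position)

print(tokenizer.decode(adv_ids))              # the optimized gibberish prompt
```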




Paper address:
https://arxiv.org/pdf/2310.01469.pdf

GitHub address:
https://github.com/PKU-YuanGroup/Hallucination-Attack

Zhihu original post:
https://zhuanlan.zhihu.com/p/661444210?



