AI Agent, start! Fudan NLP team publishes an 86-page survey: a society of intelligent agents is right in front of us

2023-10-19 02:14:54

Xinzhiyuan Report

Editor: Lumina

Xinzhiyuan introduction: Recently, a survey paper on LLM-based agents went viral on X! On closer inspection, miHoYo was at one point even listed among the paper's authors.

On September 19th, Jim Fan shared a survey on LLM-based agents from the NLP team at Fudan University.

An agent is an artificial entity that can perceive its environment, make decisions on its own, and take actions.

The paper introduces a general conceptual framework for LLM-based agents consisting of three parts (brain, perception, and action), along with the application scenarios of LLM-based agents, societies composed of such agents, and more.

It also discusses a series of key topics and open problems in the field of LLM-based agents.

Interestingly, the first two versions of the paper submitted to arXiv noted that it was co-authored with miHoYo. The paper uses the Lantern Rite festival in Genshin Impact as an example to illustrate an ideal society composed of AI agents.

Paper address: https://arxiv.org/pdf/2309.07864

After the paper was posted on GitHub on September 15th, it collected 1K stars in just five days and was hailed as required reading on LLM-based agents.

On the 20th, it even made it onto GitHub's trending list.

Project address: https://github.com/WooooDyy/LLM-Agent-Paper-List

The debate over what "intelligence" is has been going on since Turing's era.

In 1950, Alan Turing published a paper titled "Computing Machinery and Intelligence".

At the beginning of the paper, he posed a question: "Can machines think?"

In his view the answer was yes: Turing extended the concept of intelligence to artificial entities and proposed the famous "Turing test".

In the decades that followed, people kept working toward AGI (Artificial General Intelligence), machines whose intelligence is comparable or superior to that of humans.

Today, GPT-4, the most powerful AI system, is regarded as the one closest to AGI.

However, today's mainstream AI is built on large language models (LLMs) from natural language processing (NLP). These models perform well only in specific domains and are unfamiliar with others, which often leads to "hallucinations".

Turing Award winner Yann LeCun has repeatedly and publicly criticized existing AI as nothing more than well-trained "stochastic parrots" that are not truly intelligent.

He believes the true gateway to AGI will be the "world model", which can autonomously perceive the environment, make plans, and take actions.

If the road to AGI ends at the "world model", then agents that can act autonomously are currently the closest thing to that endpoint.

The Development History of LLM-based Agents

How many stages does it take to get from NLP to AGI?

The answer is five: corpus, internet, perception, embodiment, and social attributes. At present, large language models are at the second stage, with internet-scale text input and output.

To go further, LLMs need to be endowed with the abilities to perceive and to act.

Next, if these autonomous LLM-based agents with perception and action can interact with one another, collaborating to solve more complex problems or reflecting real-world social behavior, they will have social attributes.

Humans can also take part in a society composed of AI agents. Taking Genshin Impact's Lantern Rite as an example, in the illustration Xiangling and Yaoyao are preparing meals in the kitchen, Hu Tao and Xinyan are performing at a concert, and Ganyu and Keqing are discussing how to make lanterns. The player (the main character) can choose any scene and interact with the AI agents in it.

For these reasons, AI agents are considered the most promising route to AGI.

But what is an agent?

The word "agent" literally means a proxy, someone who acts on another's behalf. The concept originated in philosophy and can be traced back to Aristotle and Hume.

"Agent" describes an entity that possesses desires, beliefs, intentions, and the ability to take action. Transferred to computer science, the concept means that a computer can understand a user's wishes and independently carry out tasks on the user's behalf.

As AI developed, agents found their place in AI research, where the term describes entities that exhibit intelligent behavior and possess autonomy, reactivity, proactiveness, and social ability.

Once people can describe an object with concepts, deeper research can begin.

After "agent" acquired its own definition and connotation, research on intelligent agents became a focus of the AI community.

LLM-based Agents

In-depth research on agents has been under way since the mid-20th century and has produced real results. However, the application scenarios of those agents were extremely limited, and they could only accomplish specific tasks.

The AGI people want is general-purpose and applicable to a wide range of scenarios, not a specialized tool.

Specialized tools can be very powerful, but a tool cannot adapt to the world on its own; it can only be used.

If a model is to be autonomous and adapt to complex environments the way living organisms do, general capability is the key requirement.

This includes abilities such as knowledge memory, long-term planning, effective generalization, and efficient interaction.

Among the many kinds of AI that have been developed, large language models (LLMs) have emerged as the leading contender with general capabilities.

A pure LLM sits at the second stage of the journey to AGI: text input and output at internet scale.

Even so, LLMs have demonstrated strong abilities in knowledge acquisition, instruction understanding, generalization, planning, and reasoning, and they can interact effectively with humans in natural language.

These advantages are why LLMs were chosen as the starting point for agent systems. Once humans give them a broader space for perception and action, LLMs may reach a higher level.

The Framework of LLM-based Agents

As in humans, the brain is the core of an AI agent, and here the brain is an LLM. Within the agent, the LLM stores memory and knowledge and also handles indispensable functions such as information processing and decision-making.

The LLM therefore lets the agent show its reasoning and planning process and cope well with unseen tasks, which reflects the agent's generalization and transfer abilities.

The agent's perception space should be expanded from pure text to multiple modalities such as text, vision, and hearing, so that the agent can obtain and use information from its surroundings more effectively.

As for actions, beyond ordinary text output the agent should also be given embodied abilities and the ability to use tools, so that it can better adapt to environmental changes, interact with the environment through feedback, and even shape the environment.
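The brain, perception, and action split described above can be sketched as a small loop. This is only an illustrative sketch, not the paper's implementation: the `Agent` class, the `rule_based_brain` function standing in for a real LLM call, the `calculator` tool, and the `tool_name:argument` convention are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Toy agent: perception -> brain -> action. All names are hypothetical."""
    # Stand-in for the LLM "brain": maps an observation string to a decision string.
    brain: Callable[[str], str]
    # Tools the action module may call, keyed by name.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # Memory kept on the brain side (here just a list of past observations).
    memory: List[str] = field(default_factory=list)

    def perceive(self, inputs: Dict[str, str]) -> str:
        # Perception: flatten multimodal inputs (text, vision, audio, ...) into text.
        return "; ".join(f"{modality}: {content}"
                         for modality, content in sorted(inputs.items()))

    def act(self, decision: str) -> str:
        # Action: "tool_name:argument" calls a tool; anything else is plain text output.
        name, sep, arg = decision.partition(":")
        if sep and name in self.tools:
            return self.tools[name](arg)
        return decision

    def step(self, inputs: Dict[str, str]) -> str:
        observation = self.perceive(inputs)
        self.memory.append(observation)
        decision = self.brain(observation)
        return self.act(decision)


def rule_based_brain(observation: str) -> str:
    # Placeholder policy standing in for a real LLM call (an assumption of this sketch).
    if "2+3" in observation:
        return "calculator:2+3"
    return "I see: " + observation


def calculator(expression: str) -> str:
    # Tiny tool: adds integers separated by '+'.
    return str(sum(int(part) for part in expression.split("+")))


agent = Agent(brain=rule_based_brain, tools={"calculator": calculator})
print(agent.step({"text": "what is 2+3"}))                    # tool use
print(agent.step({"vision": "a red door", "text": "hello"}))  # plain text output
```

In a real system the brain would be an LLM call and the perception module would encode images or audio rather than receive text descriptions; the sketch only shows how the three modules hand data to one another.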

Practical Application Scenarios of Agents

The paper mainly introduces three application scenarios for agents: single-agent deployment, multi-agent interaction, and human-agent interaction.

A single agent has multiple capabilities and can show excellent task-solving ability across a variety of applications.

The application of a single agent is divided into three levels:

First, in task-oriented deployment, agents help human users with basic everyday tasks. This requires basic instruction understanding and task decomposition capabilities.

By task type, the practical applications of such agents can be divided into simulated web environments and simulated real-life scenarios.

Second, in innovation-oriented deployment, agents show potential for independent exploration in scientific fields.

Although the inherent complexity of specialized fields and the lack of training data hinder the construction of such agents, progress has been made in fields such as chemistry, materials science, and computer science.

Third, in lifecycle-oriented deployment, agents can continuously explore, learn, and use new skills to ensure long-term survival in an open world.

Take the game Minecraft as an example: its survival challenges are considered a microcosm of the real world, and it has become a unique platform for developing and testing the comprehensive capabilities of agents.

When multiple agents interact, they can make progress through either cooperative or adversarial interaction.

In cooperative interaction, agents work together toward a common goal, in either a disordered or an ordered manner.

In adversarial interaction, agents compete tit for tat to improve their respective performance.
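The two interaction styles can be contrasted in a few lines. This is a minimal sketch under stated assumptions: each agent is modeled as a plain function from a message to a reply, and `ordered_cooperation`, `adversarial_rounds`, and the toy agents below are hypothetical names rather than anything defined in the paper.

```python
from typing import Callable, List

# Hypothetical modeling choice: an agent maps an incoming message to a reply.
AgentFn = Callable[[str], str]


def ordered_cooperation(agents: List[AgentFn], task: str) -> str:
    # Ordered cooperation: each agent builds on the previous agent's output.
    result = task
    for agent in agents:
        result = agent(result)
    return result


def adversarial_rounds(proposer: AgentFn, critic: AgentFn,
                       draft: str, rounds: int) -> str:
    # Adversarial interaction: the critic objects, the proposer revises.
    for _ in range(rounds):
        objection = critic(draft)
        if not objection:  # the critic has nothing left to attack
            break
        draft = proposer(draft + " | fix: " + objection)
    return draft


# Toy agents standing in for LLM-backed roles.
def planner(message: str) -> str:
    return "plan(" + message + ")"


def coder(message: str) -> str:
    return "code(" + message + ")"


def critic(draft: str) -> str:
    return "" if "km" in draft else "state the unit"


def proposer(message: str) -> str:
    draft, _, objection = message.partition(" | fix: ")
    return draft + " km" if objection == "state the unit" else draft


print(ordered_cooperation([planner, coder], "sort a list"))
print(adversarial_rounds(proposer, critic, "distance is 5", rounds=3))
```

The ordered pipeline fixes who speaks when, while the adversarial loop stops as soon as the critic runs out of objections or the round budget is spent.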

In human-agent interaction, human feedback lets the agent carry out tasks more efficiently and safely, and the agent in turn can serve humans better.

Interaction between humans and agents can be divided into two paradigms.

In the instructor-executor paradigm, humans provide guidance or feedback, while agents act as executors.

In the equal-partnership paradigm, agents behave like humans: they can hold empathetic conversations with people and take part in collaborative tasks.

Finally, there is the society composed of agents.

An agent society can be broken down into two elements: the agents and the environment.

At the individual level, agents exhibit internalized behaviors such as planning, reasoning, and reflection. They also exhibit intrinsic personality traits spanning cognition, emotion, and character.

An agent can also form a group with other agents and exhibit group behavior, such as cooperation.

At the environmental level, the environment, whether virtual or physical, includes human actors and all available resources. For any single agent, the other agents are also part of the environment. Agents interact with the environment through perception and action.
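The agent and environment split can be made concrete with a toy model. This is a hedged sketch only: the `Environment` and `SocialAgent` classes, the `food` resource, and the agent names are hypothetical and exist purely to show that, from one agent's point of view, the other agents belong to the environment it perceives and acts on.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Environment:
    # Shared resources available to every agent (hypothetical example data).
    resources: Dict[str, int] = field(default_factory=dict)
    agents: List["SocialAgent"] = field(default_factory=list)

    def observe(self, viewer: "SocialAgent") -> Dict[str, Any]:
        # From one agent's viewpoint, the other agents are part of the environment.
        return {
            "resources": dict(self.resources),
            "others": [a.name for a in self.agents if a is not viewer],
        }


@dataclass
class SocialAgent:
    name: str
    inventory: int = 0

    def act(self, env: Environment) -> str:
        view = env.observe(self)                    # perception
        if view["resources"].get("food", 0) > 0:    # a very simple decision rule
            env.resources["food"] -= 1              # the action changes the environment
            self.inventory += 1
            return f"{self.name} took food"
        return f"{self.name} waits with {', '.join(view['others'])}"


env = Environment(resources={"food": 1})
alice, bob = SocialAgent("Alice"), SocialAgent("Bob")
env.agents = [alice, bob]

print(alice.act(env))
print(bob.act(env))
```

Because Alice's action depletes the shared resource, Bob's later observation differs from hers, which is the simplest form of agents shaping the environment that other agents perceive.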

Netizens' heated discussion: AI Agent, start!

Perhaps because Genshin Impact and miHoYo, the company whose slogan is "tech otakus save the world", make an appearance, netizens are very interested in this paper.

Some netizens even want to finish reading this 80-plus page paper in one day:

"I really want to know if anyone can read and understand this paper in one day, but I will give it a try."

Another netizen, a Genshin Impact player, put it bluntly:

"Genshin Impact, start!"

Although the paper does not discuss applying AI agents in games, the appearance of miHoYo and Genshin Impact got netizens excited, and they began to imagine the impact AI agents could have on games.

"This is not only the future of Genshin Impact, but also the future of all games. Let AI agents become our partners in the story, and they will respond to players' choices with their own values, rather than relying on fixed scripts."

Other netizens speculated about the future of gaming and AGI:

"If AGI requires embodied agents, then games will be the best place to implement it."

References:

https://arxiv.org/pdf/2309.07864.pdf
