NVIDIA's AI agent connects to GPT-4 and outperforms AutoGPT! It writes its own code to dominate Minecraft with no human intervention

Intelligent devices 2023-05-27 16:03:18 Source: Network


Xinzhiyuan Report

Editors: Aeneas, Haokun

Xinzhiyuan introduction: Is the gaming industry in for a GPT-4-style shock? This agent, called Voyager, can not only train itself autonomously from game feedback but also write its own code to carry out game tasks.

After Stanford's 25-agent virtual town, AI agents have delivered another explosive piece of work.

Recently, NVIDIA AI scientist Jim Fan and colleagues integrated GPT-4 into Minecraft and proposed a new AI agent, Voyager.

Voyager's strength is that it not only outperforms AutoGPT but can also keep learning throughout the game, across every scenario it encounters.

Compared with the previous SOTA, Voyager obtains 3.3 times more unique items, travels 2.3 times farther, and unlocks key tech tree milestones up to 15.3 times faster.

Netizens were stunned: we are one step closer to artificial general intelligence (AGI).

So, will future games be played by NPCs driven by large models?

Once connected to GPT-4, Voyager needs no human involvement at all; it is entirely self-taught.

It has not only mastered basic survival skills such as mining, building houses, gathering, and hunting, but has also learned to explore the open world on its own.

It travels to different cities by itself, crosses oceans, passes pyramids, and even builds its own portal.

Self-driven, it keeps exploring this magical world and expanding its items and equipment: it equips different tiers of armor, blocks damage with a shield, and pens animals in with fences.

Paper address: https://arxiv.org/abs/2305.16291

Project address: https://voyager.minedojo.org/

Fighting an Enderman

Building a base

Mining amethyst

Mining gold

Collecting cactus

Hunting

Fishing

What is the potential of digital life? We only know that Voyager is still constantly exploring and expanding its territory in Minecraft.

Training does not require gradient descent

Previously, a major challenge in AI was building embodied agents with general capabilities that can explore an open world and develop new skills on their own.

In the past, researchers used reinforcement learning and imitation learning, but these methods often struggle with systematic exploration, interpretability, and generalization.

The emergence of large language models has brought new possibilities for building embodied agents. Because LLM-based agents can draw on the world knowledge in pre-trained models to generate consistent action plans or executable policies, they are well suited to tasks such as games and robotics.

Earlier, Stanford researchers shocked the AI community by building a virtual town in which 25 AI agents live.

Such agents have also been applied to natural language processing tasks that involve no embodiment.

However, these agents are still not lifelong learners: they cannot gradually acquire and accumulate knowledge over long time spans.

The most important significance of this work is that GPT-4 opens up a new paradigm: here, "training" is done through code execution rather than gradient descent.

Jim Fan explained that the team had this idea before BabyAGI/AutoGPT and spent a lot of time working out the best gradient-free architecture.

The "trained model" is a library of skill code that Voyager builds up iteratively, not a matrix of floating-point numbers. With this approach, the team is pushing the gradient-free architecture to its limits.

An agent trained this way has a lifelong learning ability similar to a human's.

For example, if Voyager finds itself in a desert rather than a forest, it knows that learning to collect sand and cacti is more important than learning to collect iron ore.

Moreover, it can not only pick the most suitable task given its current skill level and the state of the world, but also keep refining its skills based on feedback, store them in memory, and reuse them the next time they are needed.

So, how far are we from the emergence of silicon-based life?

Karpathy, who recently returned to OpenAI, praised the work as a "gradient-free architecture" for high-level skills. Here the LLM plays the role of the prefrontal cortex, orchestrating the lower-level Mineflayer API through code.

Karpathy recalled that around 2016, agents in Minecraft environments still performed very disappointingly. At the time, RL could only learn long-horizon tasks by exploring randomly under extremely sparse rewards, which was very frustrating.

Now this obstacle has largely been lifted. The right approach turned out to be a different one: first train an LLM to learn world knowledge, reasoning, and tool use (especially coding) from internet text, then simply hand it the problem.

Finally, he exclaimed: "If I had learned about this 'gradient-free' approach to agents in 2016, I would have been surprised."

It is a great attempt, and the entire codebase is open source. The idea of automatically generating tasks -> automatically writing code to execute them -> saving that code in a reusable library should be easy to apply to other fields.

Voyager

Unlike most other games used in AI research, Minecraft imposes no predefined end goal or fixed storyline; instead, it offers a playground of endless possibilities.

An effective lifelong learning agent should have abilities similar to those of human players:

1. Propose appropriate tasks based on its current skill level and world state. For example, if it finds itself in a desert rather than a forest, it will learn to collect sand and cacti before learning to collect iron.

2. Improve skills based on environmental feedback and commit the acquired skills to memory for reuse in similar situations (such as fighting zombies and spiders).

3. Continuously explore the world and seek out new tasks in a self-driven manner.

In order to equip Voyager with these abilities, teams from NVIDIA, California Institute of Technology, University of Texas at Austin, and Arizona State University proposed three key components:

1. An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to improve programs

2. A skill library of code for storing and retrieving complex behaviors

3. An automatic curriculum that maximizes the agent's exploration
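How the three components might fit together can be sketched as one loop. This is a minimal illustration under stated assumptions, not the team's implementation: `propose_task`, `write_program`, and `run_in_game` are hypothetical stand-ins for the GPT-4 curriculum call, the GPT-4 code-generation call, and the Minecraft environment, and the "world" is reduced to an inventory dictionary.

```python
from typing import Callable, Dict, List, Optional

Inventory = Dict[str, int]
Program = Callable[[Inventory], None]

# Toy task ladder: (task, item that must already be in the inventory).
LADDER = [("collect wood", None), ("craft planks", "wood"), ("craft sticks", "planks")]


def propose_task(inventory: Inventory, done: List[str]) -> Optional[str]:
    """Hypothetical curriculum: first unfinished task whose prerequisite is met."""
    for task, needs in LADDER:
        if task not in done and (needs is None or inventory.get(needs, 0) > 0):
            return task
    return None


def write_program(task: str) -> Program:
    """Hypothetical code generator: returns a function that obtains the task's item."""
    item = task.split()[-1]

    def program(inventory: Inventory) -> None:
        inventory[item] = inventory.get(item, 0) + 1

    return program


def run_in_game(program: Program, inventory: Inventory) -> bool:
    """Hypothetical environment: run the program and report success."""
    program(inventory)
    return True


def lifelong_loop(max_tasks: int = 10):
    inventory: Inventory = {}
    skill_library: Dict[str, Program] = {}
    done: List[str] = []
    for _ in range(max_tasks):
        task = propose_task(inventory, done)   # 1. curriculum proposes a task
        if task is None:
            break
        program = write_program(task)          # 2. code is written for the task
        if run_in_game(program, inventory):    # 3. code runs in the environment
            skill_library[task] = program      # 4. working code is kept for reuse
            done.append(task)
    return done, inventory, skill_library
```

Calling `lifelong_loop()` walks the toy ladder from wood to sticks; nothing in the loop updates model weights, only the contents of `skill_library` change.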

First, Voyager attempts to write a program that achieves a specific goal using a popular JavaScript API for Minecraft (Mineflayer).

The program may well fail on the first attempt, but game environment feedback and JavaScript execution errors (if any) help GPT-4 improve it.

Left: Environment feedback. GPT-4 realizes that it needs two more planks before it can craft sticks. Right: Execution error. GPT-4 realizes that it should craft a wooden axe instead of an "acacia" axe, because there is no acacia axe in Minecraft.

Given the agent's current state and the task, GPT-4 judges whether the program has completed the task.

In addition, if the task fails, GPT-4 offers a critique and suggests how to complete it.

Self-verification
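The refinement cycle described above (write a program, run it, read the feedback, verify, retry) can be sketched as follows. Everything here is a hypothetical stand-in: `fake_llm_write_program` replaces the GPT-4 call, `execute` replaces the game, and a "program" is just a list of action strings. The crafting numbers follow the standard Minecraft recipes (1 log gives 4 planks, 2 planks give 4 sticks).

```python
from typing import Dict, List, Optional, Tuple

Inventory = Dict[str, int]


def fake_llm_write_program(task: str, feedback: List[str]) -> List[str]:
    """Hypothetical stand-in for GPT-4: revises the plan using earlier feedback."""
    plan = ["craft stick"]
    if any("planks" in message for message in feedback):
        plan.insert(0, "craft planks")
    return plan


def execute(plan: List[str], inventory: Inventory) -> Tuple[Inventory, List[str]]:
    """Toy environment: returns the resulting inventory and feedback messages."""
    inv = dict(inventory)
    feedback: List[str] = []
    for action in plan:
        if action == "craft planks":
            if inv.get("log", 0) < 1:
                feedback.append("I cannot make planks because I need 1 more log")
                break
            inv["log"] -= 1
            inv["planks"] = inv.get("planks", 0) + 4
        elif action == "craft stick":
            if inv.get("planks", 0) < 2:
                feedback.append("I cannot make stick because I need 2 more planks")
                break
            inv["planks"] -= 2
            inv["stick"] = inv.get("stick", 0) + 4
        else:
            # Plays the role of a JavaScript execution error.
            raise ValueError(f"unknown action: {action}")
    return inv, feedback


def self_verify(item: str, inventory: Inventory) -> bool:
    """Toy self-verification: the task is done if the item is in the inventory."""
    return inventory.get(item, 0) > 0


def refine_until_done(
    item: str, inventory: Inventory, max_rounds: int = 4
) -> Tuple[Optional[List[str]], Inventory, int]:
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        plan = fake_llm_write_program(f"craft {item}", feedback)
        try:
            # Each attempt restarts from the same inventory to keep the toy simple.
            new_inventory, env_feedback = execute(plan, inventory)
        except ValueError as err:
            feedback.append(f"execution error: {err}")
            continue
        if self_verify(item, new_inventory):
            return plan, new_inventory, round_no
        feedback.extend(env_feedback)
    return None, inventory, max_rounds
```

Starting with a single log, `refine_until_done("stick", {"log": 1})` fails once, receives the "2 more planks" message, and succeeds on the second round with a plan that crafts planks first.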

Second, Voyager gradually builds a skill library by storing successful programs in a vector database. Each program is indexed by the embedding of its docstring so that it can be retrieved later.

Complex skills are synthesized by combining simple skills, which can rapidly increase Voyager's abilities over time and alleviate catastrophic forgetting.

Top: Adding a skill. Each skill is indexed by the embedding of its description and can be retrieved in similar situations in the future. Bottom: Retrieving skills. When the automatic curriculum proposes a new task, a query is run to identify the top 5 relevant skills.
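A minimal sketch of adding and retrieving skills might look like the following. It is an illustration only: the bag-of-words `embed` function is an assumption standing in for a learned text embedding model, a plain Python list stands in for the vector database, and the skill names and code strings are made-up placeholders.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    def __init__(self) -> None:
        # Each entry: (skill name, embedding of its description, program source).
        self._skills: List[Tuple[str, Counter, str]] = []

    def add(self, name: str, description: str, code: str) -> None:
        """Store a program that worked, indexed by its description embedding."""
        self._skills.append((name, embed(description), code))

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        """Return the names of the k skills most similar to the query."""
        q = embed(query)
        ranked = sorted(self._skills, key=lambda s: cosine(q, s[1]), reverse=True)
        return [name for name, _, _ in ranked[:k]]


library = SkillLibrary()
library.add("craftWoodenPickaxe", "craft a wooden pickaxe from planks and sticks",
            "// placeholder program source")
library.add("mineIronOre", "mine iron ore with a stone pickaxe",
            "// placeholder program source")
library.add("killZombie", "fight and kill a zombie with a sword",
            "// placeholder program source")
library.add("smeltIron", "smelt iron ore in a furnace",
            "// placeholder program source")

relevant = library.retrieve("craft an iron pickaxe", k=2)
```

With this toy embedding, the query about an iron pickaxe ranks the mining and pickaxe-crafting skills ahead of the zombie skill, which shares no words with it. The retrieved programs would then be placed in the prompt so that new code can call them, which is how simple skills get composed into complex ones.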

Third, the automatic curriculum proposes suitable exploration tasks based on the agent's current skill level and world state.

For example, if the agent finds itself in a desert rather than a forest, it learns to collect sand and cacti before iron.

Specifically, the curriculum is generated by GPT-4 with the goal of "discovering as many diverse things as possible".

Automatic curriculum
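One way to picture how skill level and world state feed the curriculum is as a prompt assembled from the agent's current situation. The sketch below is an assumption for illustration: the `AgentState` fields and the prompt wording are hypothetical, and only the quoted goal comes from the description above. The call to GPT-4 itself is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

GOAL = "discovering as many diverse things as possible"


@dataclass
class AgentState:
    """Hypothetical snapshot of what the curriculum gets to see."""
    biome: str
    inventory: Dict[str, int] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Assemble the text that would be sent to the language model."""
    items = ", ".join(f"{name} x{count}" for name, count in sorted(state.inventory.items()))
    lines = [
        f"Your overarching goal is {GOAL}.",
        f"Biome: {state.biome}",
        f"Inventory: {items or 'empty'}",
        f"Completed tasks: {', '.join(state.completed) or 'none'}",
        f"Failed tasks: {', '.join(state.failed) or 'none'}",
        "Propose exactly one next task that is new and achievable right now.",
    ]
    return "\n".join(lines)


prompt = build_curriculum_prompt(
    AgentState(biome="desert", inventory={"sand": 3}, completed=["collect sand"])
)
```

Because the biome and the list of completed tasks are part of the prompt, an agent standing in a desert is steered toward sand and cacti first, and tasks it has already finished are not proposed again.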

Experiments

The team systematically compared Voyager with other LLM-based agent techniques in Minecraft, such as ReAct, Reflexion, and the popular AutoGPT.

In 160 prompting iterations, Voyager discovered 63 unique items, 3.3 times more than the previous SOTA.

Driven by the automatic curriculum, Voyager traveled 2.3 times farther than the previous SOTA.

In contrast, previous methods seemed rather "lazy" and often just circled within a small area.

Map exploration rate

So how does the "trained model", that is, the skill library, perform after lifelong learning?

The team cleared the agent's items and armor, generated a new world, and tested it on tasks it had never seen before.

It can be seen that Voyager solves tasks significantly faster than other methods.

It is worth noting that the skill library built from lifelong learning not only improves the performance of Voyager, but also enhances the performance of AutoGPT.

This shows that the skill library is a general-purpose tool that can serve as a plug-and-play way to improve performance.

Zero-shot generalization

The numbers in the figure above are the average number of prompting iterations over three trials; the fewer the iterations, the more efficient the method. Voyager solved every task, whereas AutoGPT could not solve them within 50 prompting iterations.

In addition, compared with other methods, Voyager unlocks wooden tools 15.3 times faster, stone tools 8.5 times faster, and iron tools 6.4 times faster. Voyager, with its skill library, is also the only method to unlock diamond tools.

Currently, Voyager only supports text, but in the future, it can be enhanced through visual perception.

In a preliminary study by the team, a human acts like an image captioning model and provides feedback to the agent.

This allows Voyager to build complex 3D structures, such as Nether portals and houses.

The results show that Voyager outperforms all the alternatives. In addition, GPT-4 is significantly better than GPT-3.5 at code generation.

Ablation experiments

Conclusion

Voyager is the first LLM-powered embodied agent capable of lifelong learning. It uses GPT-4 to continuously explore the world, develop increasingly complex skills, and keep making new discoveries without human intervention.

Voyager shows superior performance in discovering new items, unlocking the Minecraft tech tree, traversing diverse terrain, and applying its learned skill library to unseen tasks in a newly generated world.

Voyager, which requires no tuning of model parameters, can serve as a starting point for developing general-purpose agents.

References:

https://voyager.minedojo.org/
