Large Model Evolution Slows Down, OpenAI Bets on Reasoning Ability
Despite the surge in use of AI products such as ChatGPT, improvements in the underlying models that power these technologies appear to be slowing down. OpenAI is confronting this challenge and developing new techniques to enhance its core large language models. The development of OpenAI's upcoming flagship model, Orion, reflects this status quo.
In May, OpenAI CEO Sam Altman told staff internally that he expected Orion, then in training, to significantly outperform the flagship model released a year earlier. With Orion's training only 20% complete, its intelligence and task-completion abilities were already on par with GPT-4. However, some OpenAI employees who have used or tested Orion, while acknowledging that it outperforms earlier models, say the improvement is far smaller than the leap from GPT-3 to GPT-4. Some researchers note that Orion may not consistently beat earlier models on specific tasks; one employee said that while Orion excels at language tasks, it may underperform earlier models on tasks such as coding.
This could be a problem, as Orion costs more to run in data centers than other models OpenAI has recently released. Orion's performance is testing a core assumption of the AI field, the scaling law, which holds that large language models keep improving as they are fed more data and computing power. With GPT improvements slowing, the industry appears to be shifting its focus to optimizing models after initial training, potentially producing a new kind of scaling law.
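The article never states the scaling law formally. One widely cited formalization, from Hoffmann et al.'s 2022 "Chinchilla" paper, predicts a model's pre-training loss L from its parameter count N and the number of training tokens D:

    L(N, D) = E + A / N^alpha + B / D^beta

Here E is the irreducible loss and A, B, alpha, and beta are empirically fitted constants. Under this form, loss falls smoothly but with diminishing returns as either data or compute grows, which is what "improving with more data and computing power" means in practice; the concern reported in this article is that real-world capability gains may now be flattening faster than such curves predict.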
Some tech leaders, including Mark Zuckerberg, CEO of Meta, the parent company of Facebook, believe that there is still significant room for development based on current technology to create consumer and enterprise-facing products, even in the worst-case scenario where no further technological breakthroughs occur.
OpenAI is addressing the threat posed by rival Anthropic by embedding more code-writing capabilities into its models and developing software that can simulate human computer operations to perform white-collar tasks involving browser and application operations, such as clicking, cursor movement, and text input. These products fall under the category of AI agents capable of carrying out multi-step tasks, potentially as revolutionary as the initial release of ChatGPT. Zuckerberg, Altman, and other executives in AI development have stated that they have not yet reached the limits of traditional scaling laws. As a result, companies including OpenAI are still investing billions of dollars in building data centers to improve the performance of pre-trained models as much as possible.
However, OpenAI researcher Noam Brown warned at the TED AI conference last month that developing more advanced models may become economically unsustainable: "Are we really going to spend hundreds of billions, even trillions of dollars training models? There's a point when scaling just stops working."
OpenAI still needs to complete complex safety tests before publicly releasing Orion. According to employees, Orion may be released early next year and may abandon the traditional "GPT" naming convention to highlight the new features of the model.
Data Scarcity Becomes a Bottleneck in Model Training
OpenAI employees and researchers point to an insufficient supply of high-quality text and other data as one reason GPT progress has slowed. Large language models need this data during pre-training to learn about the world and the relationships between concepts, so they can handle tasks such as writing blog posts or fixing coding errors. In recent years, large language models have relied mainly on publicly available text from websites, books, and other sources for pre-training, but developers have nearly exhausted what such data can offer.
To address this, OpenAI has assembled a foundations team, led by Nick Ryder, who previously ran pre-training, to investigate how to cope with data scarcity and how long the scaling law will continue to hold. According to OpenAI employees, Orion was trained in part on synthetic data generated by other OpenAI models, such as GPT-4 and the recently released reasoning model. But synthetic data brings its own challenges, potentially making Orion too similar to the older models in some respects.
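The article does not describe OpenAI's synthetic-data pipeline, but the basic idea, using one model's outputs as training text for the next, is simple to sketch. Below is a minimal illustration in Python; the teacher object and its complete() method are hypothetical stand-ins, and the deduplication step gestures at the risk just mentioned, that a model trained on another model's outputs can end up echoing it.

    from dataclasses import dataclass

    @dataclass
    class Example:
        prompt: str
        completion: str

    def generate_synthetic_corpus(teacher, prompts, samples_per_prompt=4):
        # Ask a stronger "teacher" model (e.g. a GPT-4-class model) to write
        # training text for the next model. `teacher.complete` is hypothetical.
        corpus = []
        for prompt in prompts:
            for _ in range(samples_per_prompt):
                corpus.append(Example(prompt, teacher.complete(prompt)))
        return corpus

    def deduplicate(corpus):
        # Crude guard against one failure mode of synthetic data: the new
        # model learning to parrot the model that generated its training text.
        seen, kept = set(), []
        for ex in corpus:
            key = ex.completion.strip().lower()
            if key not in seen:
                seen.add(key)
                kept.append(ex)
        return kept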
OpenAI researchers are also using other techniques to optimize a model's performance after training, improving how it handles specific tasks. One is reinforcement learning, in which a model learns from a large number of correctly solved problems, such as math and coding problems. In addition, researchers have human evaluators test the pre-trained models on tasks like coding or problem-solving and rate the answers; this helps researchers tune the models to respond better to requests such as writing and coding. The process, known as reinforcement learning from human feedback, also helped improve earlier AI models.
OpenAI and other AI developers typically rely on startups like Scale AI and Turing to manage thousands of contractors to handle these evaluation tasks.
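As a rough illustration of how such human ratings feed back into a model: reward models for this kind of training are commonly fitted to human preference pairs with a Bradley-Terry-style logistic loss. The toy Python below trains such a reward model over two hand-picked features; the features, data, and scale are illustrative only and say nothing about OpenAI's actual setup.

    import math

    def features(answer: str) -> list[float]:
        # Two toy features: answer length and whether it contains code.
        return [len(answer) / 100.0, 1.0 if "def " in answer else 0.0]

    def score(w: list[float], answer: str) -> float:
        # Reward = linear function of the features.
        return sum(wi * xi for wi, xi in zip(w, features(answer)))

    def train_reward_model(pairs, lr=0.1, epochs=200):
        # pairs: (human-preferred answer, rejected answer) from evaluators.
        w = [0.0, 0.0]
        for _ in range(epochs):
            for chosen, rejected in pairs:
                # P(chosen beats rejected) = sigmoid(score difference);
                # minimize -log P by gradient descent on w.
                margin = score(w, chosen) - score(w, rejected)
                p = 1.0 / (1.0 + math.exp(-margin))
                fc, fr = features(chosen), features(rejected)
                w = [wi + lr * (1.0 - p) * (c - r)
                     for wi, c, r in zip(w, fc, fr)]
        return w

    pairs = [("def add(a, b):\n    return a + b", "idk"),
             ("def greet():\n    print('hi')", "no idea")]
    w = train_reward_model(pairs)
    print(score(w, "def mul(a, b):\n    return a * b"), score(w, "dunno"))

In real RLHF the reward model is itself a neural network, and its scores then drive a policy-gradient update of the language model; this sketch shows only the preference-fitting step.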
Breaking Through the Bottleneck, Enhancing Reasoning Ability
OpenAI has also developed a reasoning model, o1, which spends more time "thinking" before it answers, an approach called test-time computation. This means o1 can improve the quality of its responses by allocating more computational resources per query, even without changes to the underlying model. According to insiders, as long as OpenAI can keep scaling up this inference-time effort, it can keep getting better reasoning results even if improvements to the underlying model slow down.
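o1's internal method is not public, but the simplest published form of test-time computation, sampling several candidate answers and keeping the most common one (often called self-consistency), shows the trade involved: more compute per query buys a better answer without touching the underlying model. The noisy_model below is a toy stand-in for a language model, not anything of OpenAI's.

    import collections
    import random

    def answer_with_more_compute(model, prompt, n_samples=8):
        # Spend n_samples model calls on one query and majority-vote the
        # result; raising n_samples raises per-query cost in exchange for
        # a more reliable answer.
        candidates = [model(prompt) for _ in range(n_samples)]
        best, _ = collections.Counter(candidates).most_common(1)[0]
        return best

    def noisy_model(prompt):
        # Toy stand-in: returns the right answer 75% of the time.
        return random.choice(["42", "42", "42", "41"])

    print(answer_with_more_compute(noisy_model, "What is 6 * 7?"))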
"It opens up a new dimension of scaling," Brown said at the TED conference, adding that researchers can improve a model's responses by raising the compute spent on a query from 1 cent to 10 cents. Altman also emphasized the importance of reasoning models, believing they can be used in conjunction with large language models. "I hope that reasoning capability will unlock breakthroughs we've been unable to achieve for years, like allowing models to contribute to scientific research and complex code writing," Altman said at an event for app developers.
In a recent interview with Garry Tan, CEO of Y Combinator, Altman said: "We basically know how to build artificial general intelligence, something that gets to human-level capabilities, and part of that is being creative with how we use the models we have." Mathematicians and scientists say o1 has helped their research, acting as a partner that offers feedback and inspiration. However, according to two informed employees, because o1 costs six times as much as non-reasoning models, it has not yet found a broad customer base.
Some investors who have invested tens of millions of dollars in AI developers are questioning whether the improvement rate of large language models is starting to plateau. Venture capitalist Ben Horowitz said in a YouTube video: "We are adding graphics processing units to train AI at the same rate, and we're not seeing commensurate improvement in intelligence." Horowitz's venture capital firm is a shareholder in OpenAI and also invests in competitors such as Mistral and Safe Superintelligence. His colleague Marc Andreessen noted in the same video: "A lot of smart people are working on how to break through the bottleneck and figure out how to enhance reasoning."
Ion Stoica, co-founder and president of enterprise software company Databricks, said large language models may have plateaued in some respects while still improving in others. Stoica, who has also built a website where app developers can evaluate different large language models, said that while AI keeps advancing at coding and at solving complex problems, progress seems to be slowing on general tasks such as analyzing text sentiment or describing medical symptoms: "On commonsense things, we seem to be seeing plateauing in these large language models. To achieve further breakthroughs, we need more factual data, and synthetic data is helping only somewhat."