
The Problem-Solving Approach of China's Highest-Valued Large-Model Company

Intelligent devices · 2023-10-29 23:46:20




If we break the model down scene by scene and build it all over again, aren't we just retreading the previous generation's path?

By | Zhu Likun
Editor | Cheng Manqi

Since September, passengers on China Southern Airlines flights have seen seatback advertisements reading "Be Efficient, Not Anxious" and "Zhipu Qingyan: A New Generation of AI Efficiency Assistant".

In just six months, China's large-model companies have gone from assembling teams and racing to release models to a new competition: getting applications deployed.

"Zhipu Qingyan" comes from Zhipu AI (hereinafter "Zhipu"). Founded in June 2019, the company was valued at only about 2 billion yuan at the end of last year; its valuation has now reached 14 billion yuan, making it China's highest-valued large-model startup.

Since the beginning of this year, Zhipu has raised 2.5 billion yuan. Ant Group, Meituan, Sequoia China, and Hillhouse are among the investors, as are Tencent and Alibaba, which have rarely invested in the same company in recent years.

Like most large-model companies that attract investor attention, Zhipu is a hybrid of research institution and enterprise: its core team comes from the Knowledge Engineering Group (KEG) laboratory at Tsinghua University. Tang Jie, director of KEG and a professor of computer science at Tsinghua, serves as an advisor to Zhipu; his papers in the field of AI have been cited more than 32,000 times. Zhang Peng, a Ph.D. from Tsinghua's Innovation Leading Engineering program and also a KEG member, serves as Zhipu's CEO.

At the same time, Zhang Peng has been meeting clients since Zhipu's founding three years ago. This year's large-model craze has brought all kinds of new demands, some of which do not fit what the technology can actually do. Zhang Peng's attitude: "The customer is God. You have to say yes first," and then talk it through.

Placing equal weight on research and commercialization is now common to all large-model startups. What distinguishes Zhipu is its particular conviction that building a strong general-purpose base model is what unlocks enormous commercial value.

Faced with the problems of model development, efficiency, and application deployment in the large-model field, Zhipu's current approach is:

  • Believe that raising the model's level of intelligence matters more than shipping profitable applications as soon as possible.
  • Do not build an "industry model" for each vertical; believe a strong general model can directly support many applications.
  • Avoid doing customization itself; work with external partners to serve customers' customized needs.
  • Do not make end-user applications the main goal; instead, round out the business ecosystem by investing in other AI technology and application companies. For example, Zhipu has invested in the AI company Lingxin Intelligence three times. A person close to Zhipu said the company plans to set up a strategic investment department in the near future and may pursue acquisitions.

This is the technology-platform approach: make standardized technical products and services the main source of profit, and let customers across industries build end products and applications on a relatively unified technical foundation. In China, it is also a path few have followed to the end.

Delivering results before the market is filled with competitors

When did the starting gun of China's large-model "race" sound?

Many people heard it after this year's Spring Festival. For Zhipu, it sounded three years earlier.

In June 2020, the Zhipu team saw that OpenAI, which they had been tracking, had released a new result, GPT-3, which greatly improved model accuracy.

Before that, OpenAI's language-model pre-training framework GPT had not shown a clear advantage; other mainstream attempts included Google's BERT framework. At the time, Zhipu was developing its own pre-training framework, GLM, which combines the bidirectional prediction ability of frameworks like BERT with the one-way, left-to-right generation of frameworks like GPT: it can both generate text from what comes before and fill in a blank using the context on both sides.
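For illustration, the blank-infilling idea behind GLM can be sketched at the data-preparation level in a few lines of Python. This is a toy sketch, not Zhipu's code; the function and special-token names are our own:

```python
def make_glm_example(tokens, span_start, span_len):
    """Build a GLM-style blank-infilling training example.

    The model sees the full context with a span replaced by [MASK]
    (attending bidirectionally over it, BERT-style), then generates
    the masked span left to right (autoregressively, GPT-style).
    """
    span = tokens[span_start:span_start + span_len]
    context = tokens[:span_start] + ["[MASK]"] + tokens[span_start + span_len:]
    target = ["[SOS]"] + span + ["[EOS]"]  # generated token by token
    return context, target

ctx, tgt = make_glm_example(["the", "cat", "sat", "on", "the", "mat"], 2, 2)
# ctx -> ["the", "cat", "[MASK]", "the", "mat"]
# tgt -> ["[SOS]", "sat", "on", "[EOS]"]
```

Training on examples like this lets one objective cover both understanding (bidirectional context) and generation (left-to-right prediction).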

After building a dense model with tens of billions of parameters on the GLM framework, Zhipu used "sparsification" to scale a model up to trillions of parameters. Sparsification can be loosely understood as copying the core of a dense model many times and activating only the part of the network each input needs, which holds down computational cost. This was Zhipu's first attempt at a very large model.
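The sparse-activation idea resembles what is now commonly called a mixture of experts; a minimal numerical sketch (assumed mechanics, not Zhipu's implementation) shows how only a few "expert" copies run per input:

```python
import numpy as np

def sparse_forward(x, experts, gate_w, top_k=2):
    """Route input x to the top-k of N 'expert' copies of a dense layer.

    Only the selected experts run, so per-input compute stays near the
    dense-layer cost even though total parameters grew N-fold.
    """
    scores = gate_w @ x                              # gating scores, shape (N,)
    top = np.argsort(scores)[-top_k:]                # indices of the top-k experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
y = sparse_forward(rng.normal(size=d), experts, gate_w)  # shape (8,)
```

With top_k=2 of 4 experts, each input pays for two matrix multiplies rather than four, while the model as a whole carries four experts' worth of parameters.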

Some early investors in large models also heard the "starting gun" earlier. Zhou Zhifeng, a partner at Qiming Venture Partners who invested in Zhipu at the end of 2021, told LatePost that the release of GPT-3 showed Qiming a new "technological singularity". "We had invested in so many natural language processing companies, and its (GPT-3's) performance was ten times better," Zhou said.

From then on, Qiming began searching for large-model investment targets in China. In March 2021, the Beijing Academy of Artificial Intelligence (Zhiyuan) released its "Wudao" large model, and Zhou Zhifeng took note of Zhipu as one of its R&D participants.

Zhipu itself was rethinking its approach at this stage. After completing the sparse trillion-parameter model, it found the results disappointing: the accuracy gains were modest and the cost too high.

The performance of the 175-billion-parameter GPT-3 gave Zhipu a reference point and confidence, and the team set out instead to build dense models with hundreds of billions of parameters. By the fourth quarter of 2021, when Qiming began its investment process, Zhipu was already choosing the architecture for a hundred-billion-parameter model.

In August 2022, Zhipu released GLM-130B, a dense model with 130 billion parameters that became the foundation for its later dialogue models and other deployments. That was three months before ChatGPT launched at the end of November the same year.

Zhou Zhifeng believes Zhipu's timing mattered enormously: by the time ChatGPT took off and many companies wanted to explore large-model applications, Zhipu was ready. Reviewing the major global technology waves of recent years, including electric vehicles, large chips (such as data-center compute chips), and large models, he notes that beyond their huge market prospects they share the "three highs": high technology barriers, high talent barriers, and high capital barriers. A first mover in such a field enjoys a halo effect, with capital, computing power, data, and talent all tilting toward it.

This year alone Zhipu has raised 2.5 billion yuan and grown to a team of more than 400, giving it relatively abundant resources to invest in technology. In choosing technical routes, Zhipu tends to try to "raise the ceiling", for example by exploring the latest multimodal large models.

A multimodal large model can process multiple types of data, such as language and vision, at once. The industry has two main approaches: one integrates visual and linguistic features within a single model framework; the other trains the language and vision models separately, then connects the two capabilities with a lightweight neural-network "bridging layer".
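The second, "bridging" approach can be sketched in a few lines: a small learned projection maps a frozen vision encoder's output into the language model's embedding space. This is an illustrative sketch with made-up dimensions, not any particular model's architecture:

```python
import numpy as np

def bridge_visual_tokens(image_feats, proj_w, proj_b):
    """Project frozen vision-encoder patch features into the language
    model's embedding space, so they can be prepended as 'visual tokens'."""
    return image_feats @ proj_w + proj_b

rng = np.random.default_rng(0)
patches, d_vis, d_lm = 16, 32, 64
image_feats = rng.normal(size=(patches, d_vis))  # from a frozen vision encoder
proj_w = rng.normal(size=(d_vis, d_lm)) * 0.02   # the bridging layer: the only trained part
proj_b = np.zeros(d_lm)

visual_tokens = bridge_visual_tokens(image_feats, proj_w, proj_b)
text_tokens = rng.normal(size=(10, d_lm))         # ordinary text embeddings
lm_input = np.concatenate([visual_tokens, text_tokens], axis=0)  # fed to the LM
```

Because only the small bridging layer is trained while both large models stay frozen, this route is fast and cheap, which is the trade-off the article describes.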

The latter, combinational approach yields results faster and has a cost advantage. Models built this way include Salesforce's BLIP-2 and Microsoft's LLaVA. In early October, the latest version of LLaVA achieved state-of-the-art performance across 11 benchmark tests.

CogVLM-17B, the multimodal large model Zhipu released in October, takes the first approach, which is also the idea behind OpenAI's multimodal model GPT-4V.

"In essence, multimodality is still a cognition problem," Zhang Peng said. The emergence of language accelerated the evolution of human intelligence, because language abstracts a cognitive layer on top of basic perceptual abilities such as vision and hearing. By analogy, in AI, modeling natural language sits above the other perceptual abilities; developing vision and other perception should be integrated into natural-language modeling rather than simply bolted on as features. This has always been Zhipu's "cognitive intelligence", "cognitive model" approach.

This attempt to integrate text and vision is slower and harder in the short term. But companies that had results ready before the craze gained more room to pursue it.

Cost reduction is the first step towards commercialization of large models

After the third quarter of this year, with the first batch of generative-AI services clearing regulatory registration and a wave of products officially launched, competition over large-model applications in China has begun.

Internet services and software are characterized by one-time production, unlimited use, and sharply falling marginal costs. With large models it is the opposite: the more people use them, the greater the cost pressure, because every call to the model consumes expensive computing power. According to reports, GitHub Copilot, the AI programming assistant Microsoft launched on top of OpenAI's models, is currently losing money. The product has more than 1.5 million users, and Microsoft charges a subscription fee of $10 per user per month, but its average monthly cost per user is $30, and the heaviest users can cost Microsoft $80 a month. The report's author joked that summarizing emails with GPT-4 is like delivering pizza in a Lamborghini.
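The arithmetic behind those reported figures is simple to write out; a trivial sketch using the numbers cited above (the function is ours, for illustration only):

```python
def monthly_margin(subscription_usd, serving_cost_usd):
    """Per-user monthly margin for a flat-fee AI product whose
    serving cost scales with usage rather than staying near zero."""
    return subscription_usd - serving_cost_usd

avg_user = monthly_margin(10, 30)    # average user: a $20 monthly loss
heavy_user = monthly_margin(10, 80)  # heaviest users: a $70 monthly loss
```

A flat subscription against a usage-proportional cost is exactly the inversion of software's traditional marginal-cost advantage that the paragraph describes.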

Accelerating applications pushes the large-model competition into a new stage: building a model is not enough; efficiency and cost must also be balanced, which tests a company's ability to "squeeze" efficiency and value out of every stage.

When training the 130-billion-parameter GLM-130B in 2021, Zhipu used roughly 1,000 A100 GPUs, not an especially large amount. To use that computing power more efficiently, Zhipu spent the first six months of development on engineering experiments, looking for more stable ways to train the model, above all to avoid mid-training "crashes", which lengthen GPU lease periods and drive up compute and time costs.

The final training run of GLM-130B took only about two months; thorough preparation kept the process smooth, with no major interruptions.

The understanding Zhipu arrived at is that the key to controlling training and inference costs lies in the balance between numerical precision and stability:

  • In training, the higher the precision of the numerical representation, the more memory and compute it consumes. That in turn demands a larger compute cluster, which raises the probability of hardware failure; a mid-training "crash" forces the whole job to restart. The training techniques and engineering methods involved are rarely covered in papers or open-source projects.
  • In inference, Zhipu says it has likewise found a balance between precision and cost. Zhang Peng explained that the parameter distribution of the GLM architecture is more concentrated than GPT's or Llama's, so it retains higher accuracy after compression and can serve inference at lower cost without sacrificing quality.
  • After GLM-130B launched in 2022, Zhipu compressed the model and adapted it to domestic hardware. Zhipu claims this lets models that once required millions of yuan worth of GPUs run on hardware costing around 100,000 yuan with almost no performance loss.
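The precision-for-cost trade mentioned above is typically realized through quantization. A minimal sketch of symmetric int8 weight quantization, a standard technique (the article does not specify which scheme Zhipu uses):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store each weight in
    1 byte instead of 4, at the cost of a small rounding error."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()  # worst-case rounding error <= scale/2
```

Memory drops 4x immediately; whether accuracy survives depends on how the weight distribution interacts with the coarser grid, which is the balance the bullets describe.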

Zhipu is pushing cost reduction and efficiency further, for example by building some of its own computing infrastructure to optimize training and inference from the hardware layer up. For any large-model company, computing power is now a scarce resource, and improving its utilization, measured as Model FLOPS Utilization (MFU), has become crucial.

According to a Google Research paper, GPT-3's initial compute efficiency was 21.3%. PaLM, the 540-billion-parameter model Google launched in 2022, reached 46.2%, and Zhipu's figure the same year exceeded 40%. Both OpenAI and Google have since pushed the number to around 50%.
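MFU has a simple definition: achieved arithmetic throughput divided by the cluster's theoretical peak. A sketch using the common ~6 FLOPs-per-parameter-per-token approximation for dense transformers; the throughput figure below is purely illustrative, not a reported number:

```python
def model_flops_utilization(tokens_per_s, n_params, n_gpus, peak_flops_per_gpu):
    """Model FLOPS Utilization: the fraction of the cluster's peak
    arithmetic throughput a training run actually achieves.

    Assumes ~6 FLOPs per parameter per token for one forward+backward
    pass of a dense transformer (a standard rule of thumb).
    """
    achieved = tokens_per_s * 6 * n_params
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak

# Hypothetical: a 130B-parameter model on 1,000 A100s
# (312 TFLOPS peak each in BF16) at an assumed 170k tokens/s.
mfu = model_flops_utilization(
    tokens_per_s=170_000, n_params=130e9,
    n_gpus=1000, peak_flops_per_gpu=312e12)
# mfu comes out a bit above 0.42, i.e. in the 40%+ range cited above
```

Raising MFU means less of the expensive GPU time is wasted on communication, stalls, and restarts, which is why the figure has become a headline metric.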

Zhou Zhifeng likens the current stage of large models to the internet wave of the mid-1990s. Training, deployment, and inference must be continuously optimized to bring costs down, just as dialing up over a 56K modem once cost a few yuan a minute and downloading one photo took ten minutes, whereas today, over high-speed fiber, the marginal cost of bandwidth and information distribution is nearly zero.

"All technologies inevitably move toward refinement and squeezing out value at some stage of their development," Zhang Peng said, and large models have reached that stage.

Build a general model instead of retreading the old customization path

Developing technology platforms and empowering businesses with AI was the vision of many companies in the previous AI wave, triggered by AlphaGo in 2016.

A technology platform provides standardized technical products and services on which customers across industries build their own end products and applications.

Until now, those companies' ambition to "empower every industry" has mostly narrowed to limited fields such as security and smart cities, where they must provide customized services themselves. Last year the leading security company Hikvision booked 65.87 billion yuan of revenue from its "main products and services" (chiefly video surveillance), while AI company SenseTime's total revenue over the same period was 3.8 billion yuan, a third of it from smart cities. In security and smart cities, hardware makers earn far more than algorithm providers like SenseTime. The technology-platform idea has yet to be realized.

Why believe that large-model companies can create huge commercial value through the technology-platform model? The core change is that large-model technology is far more general and transferable than the previous generation of AI.

Take the earlier image-recognition technology: to use it for hotel check-in, office-building gates, and road traffic, different models had to be trained on different data and deployed into different hardware environments. Every new scenario and every new customer meant investing all over again.

Many of the clients Zhipu has met this year open with, "How do you fine-tune this model?" People are still judging the deployment path by the standards of the machine-learning era, Zhang Peng believes; some customers underestimate the generality of large models, convinced that their scenarios, data, and requirements are unique and require heavy adaptation of the base model. In fact, some scenarios need little or no fine-tuning.

Confident in the generality of large models, Zhipu does not build industry models. "We set out to build a powerful base that generalizes across scenarios; if you end up breaking it down scene by scene, aren't you back on the previous generation's old path?" For customers unaware of that generality, Zhipu asks them to first try Zhipu Qingyan and see what this free consumer product can already do.

"Everyone has gradually accepted buying model licenses on an annual basis," Zhang Peng said. Zhipu's B2B revenue now comes mainly from annual model-license fees plus a one-time deployment fee, with customization kept to a minimum. He said most of the major clients he has approached recently are discussing three-year rolling licenses, since the technology is still evolving rapidly and customers need continuous model updates and service.

A person who has helped other companies build and deploy large models told LatePost that meeting big clients' customization demands brings unplanned investments of manpower and energy. Serving one manufacturing group, for example, took their engineers several days on site just to gather data from its various departments.

Zhipu's approach is to partner with technology consultancies and software-services firms, using outside forces to meet customization needs and building a delivery-and-service ecosystem. Zhipu's own internal delivery team numbers only a few dozen people. A person close to the company said that when serving different customers in the same industry, Zhipu customizes only a small number of interfaces or features; the bottom AI layer needs no customization.

Zhipu is also exploring downstream applications through investment. Since the end of last year it has invested in Lingxin Intelligence three times. Before the latest top-up in September, the two sides jointly launched CharacterGLM, a model built on the GLM framework for persona-based dialogue, which is seen as an important consumer application of large models. Time magazine reported in September that the most successful product in this field, Character.AI, has more than 3.5 million daily active users, who spend an average of two hours a day chatting with their AI characters.

Cutting costs, leaning on partners for customization, investing in application companies: Zhipu's moves address the first problems of commercializing large models. To make the technology-platform business model work, Zhipu must keep iterating on large models, that is, on intelligence itself: today's large models are not yet capable enough.

Seven years ago, Zhang Bo, a professor in Tsinghua's computer science department and an academician of the Chinese Academy of Sciences, gave a talk at the KEG lab, with Zhang Peng joining the discussion. Zhang Bo's judgment then was that the framework for the next generation of AI would be "dual-wheel drive by data and knowledge".

By analogy with human intelligence, data-driven systems resemble "fast thinking" based on intuition. The GPT-style large model is such a "black box", pre-trained on massive data: why a given input produces a given output cannot be fully explained, much like a human intuitive reaction. Knowledge-driven systems resemble "slow thinking" based on logic; expert systems and knowledge graphs largely fall in this category, pursuing interpretable machine reasoning.

Zhang Peng sees the GPT craze as the rapid advance of the data-driven school in recent years, with the knowledge-driven school lagging behind. AI's development has always alternated between schools, and the knowledge-driven approach may accelerate again: some researchers are trying to combine knowledge and logic with large models to reduce hallucinations or make models self-correcting. OpenAI chief scientist Ilya Sutskever likewise said in an interview this April that reducing hallucinations and improving reliability are the most important topics in the field over the next two years.

When a technology beyond imagination begins to commercialize, the initial shock fades quickly. A company in the fray must both withstand the test of business laws and push the technology toward maturity. Personal computers and the internet passed such tests and reshaped the world; artificial intelligence has broken through and stalled several times, and now a new group of companies, built around large-model technology, stands at this same stage.

Header image: still from "Silicon Valley"


