
Baichuan Intelligent launches the Baichuan2-192K large model, which can take in 350,000 Chinese characters at once

On October 30th, Baichuan Intelligent released the Baichuan2-192K large model. Its context window is up to 192K long and can handle approximately 350,000 Chinese characters. That is 4.4 times the capacity of Claude2, currently the best large model supporting long context windows (a 100K context window, measured at approximately 80,000 characters), and 14 times that of GPT-4 (a 32K context window, measured at approximately 25,000 characters).
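The two multiples quoted above can be checked with simple arithmetic. The short Python sketch below is only a sanity check using the measured figures as reported in this article; the constant and function names are made up for illustration.

```python
# Measured capacities as quoted in the article (approximate figures).
BAICHUAN2_192K_CAPACITY = 350_000  # 192K context window
CLAUDE2_CAPACITY = 80_000          # 100K context window
GPT4_32K_CAPACITY = 25_000         # 32K context window


def capacity_ratio(model_capacity: int, baseline_capacity: int) -> float:
    """Return how many times larger model_capacity is than baseline_capacity."""
    return model_capacity / baseline_capacity


ratio_vs_claude2 = capacity_ratio(BAICHUAN2_192K_CAPACITY, CLAUDE2_CAPACITY)
ratio_vs_gpt4 = capacity_ratio(BAICHUAN2_192K_CAPACITY, GPT4_32K_CAPACITY)

print(f"vs Claude2: {ratio_vs_claude2:.1f}x")  # 4.375, reported as 4.4
print(f"vs GPT-4:   {ratio_vs_gpt4:.1f}x")     # 14.0
```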

It is reported that on September 25th of this year, Baichuan Intelligent opened the Baichuan2 API, officially entering the enterprise market and starting commercialization. Baichuan2-192K will be provided to enterprise users through API calls and private deployment. Baichuan Intelligent has already launched an internal API test of Baichuan2-192K, open to core partners in the legal, media, finance, and other industries.


Context window length is one of the core technologies of large models. With a larger context window, a model can draw on more context to obtain richer semantic information, better capture contextual relationships, resolve ambiguity, and generate content more accurately and fluently.

Baichuan Intelligent stated that Baichuan2-192K performed strongly on 10 Chinese and English long-text Q&A and summarization evaluation sets, including DuReader, NarrativeQA, LSHT, and TriviaQA, achieving SOTA on 7 of them and significantly surpassing other long-window models.

In addition, LongEval results show that Baichuan2-192K maintains very strong performance even when the window length exceeds 100K, while the performance of other open-source or commercial models declines almost linearly as the window grows.

According to the company, Baichuan2-192K balances window length and model performance through extensive optimization of algorithms and engineering, improving both at the same time.

In terms of algorithms, Baichuan Intelligent has proposed an extrapolation scheme for RoPE and ALiBi dynamic position encoding. The scheme applies varying degrees of dynamic attention-mask interpolation to ALiBi position encodings of different lengths, preserving resolution while strengthening the model's ability to model long-sequence dependencies. On PG-19, a standard perplexity benchmark for long text, the sequence modeling ability of Baichuan2-192K continues to improve as the window length expands.
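Baichuan Intelligent has not published the details of this scheme here, so the following Python sketch is only a rough illustration of the general idea, not the company's method. It builds the standard ALiBi attention bias (a per-head linear penalty on token distance, with slopes 2^(-8h/n) for a power-of-two head count n) and then applies an assumed interpolation rule: once the sequence is longer than the training length, distances are scaled by train_len / seq_len. The function names and the scaling rule are hypothetical.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Standard ALiBi slopes for a power-of-two head count: 2^(-8h/n), h = 1..n."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def interpolation_scale(seq_len: int, train_len: int) -> float:
    """Assumed rule: no scaling within the training length, shrink distances beyond it."""
    return min(1.0, train_len / seq_len)


def alibi_bias(seq_len: int, num_heads: int, train_len: int) -> list[list[list[float]]]:
    """Return bias[head][query][key] to be added to the attention scores.

    Future positions (key > query) are masked with -inf (causal attention).
    """
    scale = interpolation_scale(seq_len, train_len)
    bias = []
    for slope in alibi_slopes(num_heads):
        head = [
            [
                -slope * scale * (query - key) if key <= query else float("-inf")
                for key in range(seq_len)
            ]
            for query in range(seq_len)
        ]
        bias.append(head)
    return bias


# Within the training length the bias is plain ALiBi.
short = alibi_bias(seq_len=4, num_heads=8, train_len=4)
# At twice the training length, every distance is halved before the penalty is applied.
long = alibi_bias(seq_len=8, num_heads=8, train_len=4)
```

With this assumed rule, the largest penalty a head sees at 8 tokens stays close to what it saw at 4 tokens during training, which is the intuition behind interpolating rather than extrapolating position information.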

In terms of engineering, Baichuan Intelligent built on its self-developed distributed training framework and integrated the advanced optimization techniques currently available, including tensor parallelism, pipeline parallelism, sequence parallelism, recomputation, and offloading, to create a comprehensive 4D parallel distributed solution. The solution automatically finds the most suitable distributed strategy for the model's specific load, greatly reducing memory usage during long-window training and inference.
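To make the idea of choosing a distributed strategy by load more concrete, here is a toy Python sketch. It is not Baichuan Intelligent's framework: the cost model, the function names, and all numbers are invented for illustration. It enumerates tensor, pipeline, and sequence parallel degrees whose product equals the device count, estimates per-device memory with a deliberately simple formula (weights split across tensor and pipeline degrees, activations split across tensor and sequence degrees), and picks the cheapest combination.

```python
from itertools import product


def candidate_strategies(num_devices: int, max_tensor_degree: int = 8) -> list[tuple[int, int, int]]:
    """All (tensor, pipeline, sequence) degrees whose product equals num_devices."""
    divisors = [d for d in range(1, num_devices + 1) if num_devices % d == 0]
    return [
        (tp, pp, sp)
        for tp, pp, sp in product(divisors, repeat=3)
        if tp * pp * sp == num_devices and tp <= max_tensor_degree
    ]


def toy_memory_gb(strategy: tuple[int, int, int], weight_gb: float,
                  activation_gb_per_1k_tokens: float, seq_len: int) -> float:
    """Hypothetical per-device memory estimate; real frameworks use far richer models."""
    tp, pp, sp = strategy
    weights = weight_gb / (tp * pp)
    activations = activation_gb_per_1k_tokens * (seq_len / 1000) / (tp * sp)
    return weights + activations


def pick_strategy(num_devices: int, weight_gb: float,
                  activation_gb_per_1k_tokens: float, seq_len: int) -> tuple[int, int, int]:
    """Return the candidate with the lowest estimated per-device memory."""
    return min(
        candidate_strategies(num_devices),
        key=lambda s: toy_memory_gb(s, weight_gb, activation_gb_per_1k_tokens, seq_len),
    )


# Arbitrary example numbers: 16 devices, 26 GB of weights, 0.5 GB of activations per 1K tokens.
short_window = pick_strategy(16, 26.0, 0.5, seq_len=4_000)
long_window = pick_strategy(16, 26.0, 0.5, seq_len=192_000)
print(short_window, long_window)
```

Under these made-up numbers, the short window favors pipeline parallelism because weights dominate memory, while the long window shifts a degree to sequence parallelism because activations dominate. That shift is the kind of load-dependent choice the paragraph above describes.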

Baichuan Intelligent's algorithmic and engineering innovations for long context windows are not only a breakthrough in large model technology but also significant for academic research. Baichuan2-192K has verified the feasibility of long context windows, opening up new research paths for improving large model performance.

Baichuan2-192K has officially entered internal testing and is open to Baichuan Intelligent's core partners through API calls. The company has reached cooperation agreements with financial media outlets and law firms to apply Baichuan2-192K's globally leading long-context capabilities to specific scenarios in media, finance, and law. It will soon be fully opened.

It is worth noting that Baichuan2-192K can process and analyze hundreds of pages of material at once. This helps in extracting and analyzing key information from long documents, and in real-world scenarios such as long-document summarization, long-document review, writing long articles or reports, and complex programming assistance.

According to the company, it can help fund managers summarize and interpret financial statements and analyze a company's risks and opportunities; help lawyers identify risks across multiple legal documents and review contracts and legal documents; help technical personnel read hundreds of pages of development documentation and answer technical questions; and help staff quickly browse large numbers of papers and summarize the latest developments. (One Orange)



