The Transformer Revolution: From Google to ChatGPT, a Transformation Driven by "Attention"
In 2017, eight machine learning researchers at Google published a groundbreaking paper titled Attention Is All You Need, introducing the Transformer AI architecture. This paper became one of the key elements driving the boom of modern artificial intelligence. The Transformer architecture has become the core foundation for nearly all mainstream generative AI models.
The Transformer architecture uses neural networks to convert chunks of input data, called "tokens," into a desired output form. Variants of this architecture are used across mainstream generative AI: language models like GPT-4 (and ChatGPT), the audio generation models behind Google NotebookLM and OpenAI's advanced speech features, video generation models like Sora, and image generation models like Midjourney.
At the TEDAI conference this October, Jakob Uszkoreit, one of the "Google Eight" as they are known, was interviewed by the media. In the interview, he shared the development history of the Transformer, Google's early exploration in large language models, and his new adventure in the field of biocomputing.
"While we were hopeful about the potential of Transformer technology, we did not fully anticipate its critical role in products like ChatGPT," Uszkoreit revealed in the interview.
The full interview is below:
Q: What was your main contribution to the paper Attention Is All You Need?
Uszkoreit: The paper's footnotes provide a detailed explanation, but my core contribution was proposing the idea that it was possible to utilize the attention mechanism, particularly self-attention, to replace the dominant recurrent mechanism (from recurrent neural networks) in sequence transduction models at that time. This alternative offered improved efficiency and was therefore more effective.
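The core of that idea, scaled dot-product self-attention, can be sketched in a few lines of NumPy. This is an illustrative, single-head simplification (the query/key/value projections are left as the identity; a real Transformer learns separate weight matrices for each), not the paper's actual implementation:

```python
import numpy as np

def self_attention(X):
    """Minimal single-head scaled dot-product self-attention.

    X: (seq_len, d_model) array of token embeddings. For simplicity,
    the query/key/value projections are the identity here; real
    Transformers learn separate W_Q, W_K, W_V matrices.
    """
    d_k = X.shape[-1]
    # Pairwise affinities between every token and every other token,
    # scaled to keep the softmax well-behaved.
    scores = X @ X.T / np.sqrt(d_k)
    # Row-wise softmax: each row becomes a distribution over tokens.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all input positions.
    return weights @ X

X = np.random.rand(4, 8)  # 4 tokens, 8-dimensional embeddings
out = self_attention(X)
print(out.shape)          # (4, 8)
```

The key contrast with the recurrent models it replaced: every output position attends to every input position in parallel, whereas an RNN must consume the sequence one step at a time, which limits how well it can be parallelized on modern hardware.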
Q: Did you know what would happen after your team published that paper? Did you foresee the industry it would create?
Uszkoreit: First, I want to emphasize that our work didn't exist in isolation; it built on foundations laid by many researchers before us. The paper wasn't an isolated event but the culmination of years of effort by our team and numerous others. So attributing all subsequent developments solely to this paper makes for a good story, but it isn't entirely accurate.
Before the paper was published, my team at Google had been researching attention models for years. It was a long and challenging path involving a lot of research, not just by our team, but also many other researchers in the field. We had high hopes for attention models, believing they could technically advance the entire field. However, when it came to whether it could actually fuel products like ChatGPT, at least on the surface, we didn't fully anticipate it.
I mean, even when we published the paper, the capabilities of large language models were already astonishing to us. We didn't directly transform these technologies into market products, partly due to a conservative approach towards developing large-scale products with potentially billions of dollars invested. Although we saw the potential of these technologies, we weren't fully convinced that they alone would be sufficient to attract users to a product. As for whether we had high hopes for the technology, the answer is yes.
Q: Since you know Google's work in developing large language models, what were your team's thoughts when ChatGPT achieved immense success in the public eye? Was there any feeling of "Oh, they did it, and we missed the opportunity"?
Uszkoreit: Indeed, we had a feeling that "this was entirely possible." But it wasn't a feeling of "Oh, too bad, they beat us to it." I'd rather say, "Wow, this could have happened sooner." As for the speed at which people adopted and applied these new technologies, I was truly surprised; it was amazing.
Q: You had already left Google by then, right?
Uszkoreit: Yes, I had left. In a way, you could say that Google wasn't the ideal place for this kind of innovation, which was one reason I decided to leave. I left not because I disliked Google, but because I felt I had to realize my vision elsewhere: launching Inceptive. My true motivation, though, wasn't just a huge business opportunity; it was a sense of moral responsibility to do something with a direct, positive impact on people's lives, like designing more effective drugs.
Q: What's interesting about ChatGPT is that I had previously used GPT-3. So, when ChatGPT came out, it wasn't a massive surprise for those familiar with the technology.
Uszkoreit: Yes, you're right. If you had used this kind of technology before, you could clearly see its evolution and make reasonable inferences. When Alec Radford and his colleagues at OpenAI developed the earliest GPT models, we were already discussing these possibilities, even though we weren't at the same company. I'm sure we could all feel the excitement, but the widespread and rapid acceptance of the ChatGPT product was still something no one truly anticipated.
Q: My feeling back then was like, "Oh, it's just GPT-3 with a chatbot interface that can maintain context within a dialogue loop." I didn't feel it was a breakthrough moment, although it was definitely captivating.
Uszkoreit: A breakthrough moment can take different forms. It wasn't a technical breakthrough, but at this level of capability, the technology demonstrated incredible practicality, which certainly qualifies as a breakthrough. We also need to realize that users often surprise us with their creativity and diverse ways of using the tools we create. We may not foresee how adept they will be at utilizing these tools and how vast the application scenarios will be. Many times, we can only learn through practice. This is why it's so crucial to maintain an experimental attitude and embrace failure, because most attempts will fail. But in some cases, it will succeed, and on rare occasions, it will achieve massive success like ChatGPT.
Q: It means taking some risks. Was Google lacking the willingness to take such risks?
Uszkoreit: That was the case at the time. But if you think deeply and look back at history, it's actually quite interesting. Take Google Translate as an example; it followed a journey similar to ChatGPT's. When we launched the first version of Google Translate, it was, at best, a joke at a party. But in a very short time, we transformed it into a truly useful tool. Along the way, it sometimes produced output that was simply awful and embarrassing. However, Google persevered, because it was the right direction to pursue. But that was around 2008, 2009, 2010.
Q: Do you remember the online translation tool "BabelFish" launched by the AltaVista search engine?
Uszkoreit: Of course.
Q: When it first came out, my brother and I were often captivated by it; we would translate text back and forth between different languages because it made the text chaotic and funny.
Uszkoreit: Yes, that kind of translation would often get more and more absurd, more and more ridiculous. (Note: After leaving Google, Uszkoreit co-founded Inceptive with others, focusing on bringing deep learning technology to the field of biochemistry. The company is developing what Uszkoreit calls "biosoftware", a method that uses AI compilers to translate specific behaviors into RNA sequences. When these RNA sequences are introduced into biological systems, they can perform the predefined functions.)
Q: What's the focus of your recent work?
Uszkoreit: In 2021, I co-founded Inceptive. Our goal is to use deep learning and high-throughput biochemical experiments to design truly programmable, more effective drugs. We firmly believe this is just the first step toward our "biosoftware." Biosoftware is somewhat similar to computer software. You first specify some behavioral rules, then use a compiler to transform those rules into software, which runs on a computer to perform the function you specified. Similarly, in biosoftware, you define a fragment of a biological program, then compile it. The key difference is that we're not using traditional engineering compilers, because the complexity of living systems is far beyond what they can handle. However, by introducing AI compilers capable of learning, we can compile, or translate, these fragments of biological programs into molecules. When these molecules are introduced into biological systems or organisms, the cells carry out the predefined functions.
Q: Is this similar to how the mRNA COVID vaccines work?
Uszkoreit: The mRNA COVID vaccine can be seen as an extremely simple example. In this case, the program instructs the cells to "manufacture this modified viral antigen," and then the cells produce the corresponding protein according to instructions. However, you can imagine that molecules can exhibit far more complex behaviors than this. To visually understand the complexity of these behaviors, you only need to consider RNA viruses. They are merely RNA molecules, but when they invade organisms, they can exhibit incredibly complex behaviors. For example, they can manipulate biological systems to