Illia Polosukhin On Inventing The Tech Behind Generative AI At Google
As generative AI continues to dominate headlines and investments, we're sitting down with the biggest names in the space, starting with a group that many call the founding fathers of generative AI. The "transformer eight" co-wrote a paper at Google in 2017 that established the core concept that makes generative AI possible: transformers. Our guest today, Illia Polosukhin, left Google before the paper was even published to co-found Near, an AI startup and blockchain ecosystem with millions of users. Here's our conversation.

Illia Polosukhin, thank you for joining us today. I wanted to get started by asking about the transformer paper, this now legendary paper. What about it most excites you right now?

Well, first of all, thanks for inviting me.
I think the most exciting part right now is how generally applicable it is. We started working on language, and that's what I worked on at Google. But we see it now not just on images, videos, sound, and audio, but also on DNA, on time series, on predicting purchasing behavior at credit card companies. So it's really exciting to see how general it is and how scalable it ended up being.

So take me back to the moment when you first got involved. Who roped you in? Where did the idea first come from?

The original idea came from Jakob. We had lunch with him at one of the Google offices, and he was describing how to solve this challenge we all had: the models we used before in deep learning, specifically for text, would read one word at a time, similar to how we do as humans. And that's very slow and it takes time.
They were hard to train, and it was especially impossible to actually use in production on Google.com, because nobody would wait while Google reads some documents; you want the answer right away. Jakob's team was using an attention mechanism. The way to think about it is: if you have lots of words in a document and I ask you a question, you can go back to the document, find the answer, and then type it out. That mechanism of going back and finding the answer is, in a way, attention. So the idea he had was: well, what if we don't read the document word by word? What if we just use attention and answer the question that way? And so after lunch, I went back to my computer and made a really quick prototype of this on translating English to French. And it worked. It was not completely random; it was picking up some signal. It was obviously very far from what it ended up being, but the idea was that you don't actually need to read all the words sequentially. You can leverage the fact that computers, and especially GPUs, TPUs, and other accelerators, are highly parallelizable, read everything in parallel, and then try to reason about it with the attention mechanism. And so in parallel, Jakob talked to Ashish.
So we started experimenting with it, and then more folks joined over time.

So was that prototype that you built in an afternoon at your desk at Google the first transformer?

Supposedly, yes.

For those who don't know, can you give a really basic definition of a transformer? And I also want to know how you chose the name.

Well, the original idea was: if you take, let's say, English input, you want to transform it into French, right? The way I also like to describe it, for people who watched the movie Arrival: the aliens talk in whole sentences, right? Instead of writing a word at a time, they produce a whole sentence. So in a way, think of it the same. It reads the whole sentence at once and tries to make sense of the words using this mechanism of attention. Imagine "the cat jumped on the wall, it was tall": does "it" refer to the wall or the cat? If you look around the sentence, you can actually make sense of it. And all of that happens in parallel for every word.
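To make that idea concrete, here is a minimal sketch of single-head self-attention in Python with NumPy. It is an illustration under simplifying assumptions, not code from the paper: there are no learned projection matrices, no multiple heads, and no stacked layers or feed-forward blocks, and the names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (sequence_length, d_model). Every position attends to every other
    # position in one shot; there is no word-by-word recurrence.
    q, k, v = x, x, x                          # single head, no learned projections
    scores = q @ k.T / np.sqrt(x.shape[-1])    # pairwise relevance between positions
    weights = softmax(scores, axis=-1)         # where each word "looks" in the sentence
    return weights @ v                         # each output mixes information from all words

# Toy example: a "sentence" of 5 words, each an 8-dimensional vector.
rng = np.random.default_rng(0)
sentence = rng.normal(size=(5, 8))
print(self_attention(sentence).shape)          # (5, 8)
```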
And there are multiple layers that transform the original meaning into the answer you want. Now, obviously the model has changed a little bit, and there are slightly different architectures, but generally that's the idea: it is able to transform the original meaning into what people want. The principle behind GPT, which stands for generative pre-trained transformer, is that you transform each word into what should be the next word in that document, in that sentence. And at scale, as OpenAI showed, that gives amazing results in learning everything about the world and learning how to reason, because that's the most compact representation.

Did you know it was going to make a huge splash eventually, and were you surprised it didn't right away?

So I left Google to start Near AI, originally an AI company where we were teaching machines to code using this transformer architecture. And my expectation back then was that we'd see what we see now.
I was expecting we'd see this exponential innovation happening. So in a way, yes, I was surprised that it did not pick up faster, more as a space even than this specific architecture. But at the same time, in 2018, Google launched transformers in Google Translate.
And that was a pretty visible improvement too. I speak Russian, and I was learning Chinese at the time, and Russian to Chinese was a massive improvement, especially for the more nonstandard pairs. So although it wasn't widely known, it actually started making it into production.

What was it about Google that allowed this idea to flourish? What were the protocols in place when you had a new idea and you wanted to get it off the ground?

It wasn't like there was a transformer team or anything, or like this was the one team. It was a few people from different teams coming together to work on this because it was an exciting idea. And that is really good, because it created this environment where anybody can work with anyone. There's no "oh, you need to work on this thing you're assigned to." And again, this is research, which is a different culture for sure.

You've all left to start your own companies. Big companies can be cumbersome. Was that part of the problem?

Google Research is an amazing environment.
It's great for learning and for this kind of research. But if you want to move really fast and, importantly, put something in front of a user, then Google is a big company with a lot of processes and, very rightfully so, security protocols and so on that are required. The other part of this, though, is that for Google it doesn't make sense to launch something that's not a $1 billion idea.
What we were trying to do, which was teaching machines to code, can eventually be a $1 billion business, but when you're launching it, it's not going to be, right? It would have been a research project that would be engaging until it was proven to truly be a $1 billion business. Which OpenAI proved with ChatGPT, right? For Google, it doesn't make sense to try to launch that and take on all the potential negative effects which, as we've seen, this technology does have just because of how it's structured. Because it summarizes data, it has bias, it will always have some downsides. And from Google's perspective, for example, they are going to get blamed for that, even though other companies and startups that do it will not. So there is this risk-reward tradeoff, which is hard until it's proven. And in Google it's pretty standard that somebody leaves, starts a startup, and then Google acquires it back, in a way, because the startup has that ability to iterate, put something in front of users, learn from it, and then potentially bring it back to integrate.
Did you ever have talks with Google about investing in Near?

So, maybe to finish the story: as we were building Near AI, what we realized was that we needed to gather a lot more training data. So we had a lot of computer science students around the world doing small tasks for us, and we had challenges paying them.
They were in China and Eastern Europe. A US company paying into China is hard, and some students didn't have bank accounts. So we actually started looking at blockchain, this global payment network, as something to use for our own use case. And that's when, in 2018, we realized there was actually no proper solution at the scale we needed, and we ended up focusing on solving that problem first. So we built out Near Protocol, which is a blockchain project. And with that we have talked with Google, although that's not directly in their wheelhouse. But we do talk with their VC arm from time to time.

How is the Google that you see today different than the one you left in 2017?

I mean, I obviously don't know the internals, and I'm not a spokesperson for them. I think they're trying to find their footing in this new landscape. I think fundamentally we're seeing a platform shift, similar to the desktop-to-mobile shift that happened. And you need very decisive measures to adapt to that, or you will get disrupted by startups.

When the paper was coming out, you had already exited Google. Were you surprised that Google was willing to let this, what turned out to be a monumental idea, out into the open, rather than keeping it internal and trying to make it into the billion-dollar idea that it is today themselves?

Again, from my perspective, open source is the way in general. I was a proponent of open source inside Google.
I was a proponent of open source outside. Obviously Near is fully open source and built on open-source principles. And I think there are a few reasons. One is that with research, in the end, information wants to be free, right? So from this perspective, it totally made sense to open source it, to leverage it as a platform.
And, you know, Lukasz and others open-sourced, for example, Tensor2Tensor, which was a framework for using transformers back in the day. So it was a selling point for anybody who wanted to do research on top of it. Now, what came after, how to leverage it and how to build on top of it, that's a question where some decisions were made. And again, I wasn't part of that; I had already left. But I think at the time, opening it up and making it available for everyone to build on top of was the right decision.

ChatGPT really just exploded this space.
It often feels like Google and Microsoft and others are playing catch-up to OpenAI. Do you think that's the case, and does it surprise you?

OpenAI took a risk and got rewarded for it, and now they have the brand around it. Right now we just call it ChatGPT, right? And whoever comes second, even if they have a potentially better model in some cases, is still going to be attached to that original idea. And again, from my perspective, this is about risk and reward, right? OpenAI had very little to lose by opening this up. If, for example, any other company, especially a public company, opened it up and the first question someone asked got a less-than-appropriate answer, that would be in the news; that would be a massive hit.
So the challenge with these for-profit companies, and public companies especially have this need to keep increasing revenue, is that their models, eventually, one way or another, will be optimized for increasing revenue. And this is not malice on their part; it's just that when you launch a model, you do an A/B test and you see which one generates more revenue. So you will naturally optimize for whatever makes users spend more time on the platform. And I think the fundamental challenge is: what is the different approach that still has economies of scale, is able to attract resources and researchers, is open source for anyone to inspect, build on, and research, and at the same time is user-owned, is focused on the user?
And that's what I'm excited to think about: how do we bring that together? So instead of asking whether this is a catch-up or not, it's more like, how do we focus on the user and actually benefit the user?

As an advocate for user-owned AI and open source, does it feel good that so far your transformer concept is being used largely in a way that's free for the public to take advantage of?

I mean, of course, yes. And as I said, the challenge is that we don't know when things go sideways, or if they already have.
There's actually a paper on how to inject so-called sleeper agents into models, where even if you have the model parameters, if you don't know what it was trained on, that model may at some point later start giving you different responses based on some condition. For example, you usually put a timestamp into the context, and you can train the model so that if it's 2024, everything will be fine, but in 2025 it will start injecting malicious code into your code completions. Right now everybody has good intentions, but we won't actually know when that changes. And it may be that somebody hacked into one of these companies, right? Because they don't have military-grade security. It can be that there's a specific campaign that introduces a specific bias. All of this is unknown unless we actually see the training data, see the training process, have this open source.
Google itself has released a lot of AI products, especially in the last year, that have not gone as smoothly as they would hope. What's going on here, and do you think they can bounce back to be a trusted AI competitor?

I mean, I think they are a competitor, right? I think the challenge is that these models are statistical models by design. Whatever is in the data, they'll represent that. And then on top of it, you can change the data distribution to correct for something. And so a lot of the problems will, one way or another, come back to that.
And so keeping the model inside, and not being able to explain even what happened, will always lead to this. It's a losing game to some extent, because whatever you do, it will come out the wrong way on the other side, right? An example I like to use: on the internet, if you just look for 'Barack Obama, born in:', it's about as likely to say Kenya as Hawaii, and obviously that's not good.
So now you need to filter out some data. Somewhere, somebody decides that, well, actually we're going to remove, for example, maybe the Alex Jones website, or maybe we should look at which other websites talk about it. Well, Fox News talks about it. And so you start removing websites, and then somehow the model starts leaning to the opposite side and becomes very left-leaning.
And why does that happen? Well, because you removed all the right-leaning content, because it was talking about this specific case. So there's no way to win here. And so again, from my perspective, the way to do this is to have all this data available, because it is internet data in the end. Then you can show what curation you've applied to it. And if people don't agree with that, they can apply a different curation, pay to train the model, and use a different model.

So then are Google, Microsoft, and OpenAI the right companies to lead the way, or do you think it needs to be smaller and more democratized?

Well, I think it needs to be democratized and more accessible to individuals. And I think, in the end, it needs to be open, and then you can build products around it.
So, you know, Google can still run on top of a model like this, where you just go into settings and select, oh, I want this model to be my engine, and now everything gets processed with it. And similarly, if you go to X or go to Instagram, you can actually choose which ranking algorithm you want to use, right? One that's optimized for what you want.

As a journalist, I guess I get a little concerned about confirmation bias. Does that worry you?
There's a community of people discussing ideas for how we can address that. Because what you do want, in the end, is some kind of consensus and convergence, even if people have different opinions. And if new information comes in, you want people to keep coming together. The challenge is that I think we're living in such a divided world already. So we need tooling to work with that, with the fact that we have multiple views. And how do we seek out new data and information that isn't just grabbing your attention, but is actually trying to disambiguate and maybe resolve some of the conflicts? I don't think AI will stop at its current state, right? We'll see better reasoning; we'll see training from less data; we'll see it applied more to personal data and corporate data in a way that's private and preserves what you want. So we'll see a lot more innovation happening.
And that's why it's important to keep steering it in the right direction, as well as thinking about these problems of confirmation bias and of the corporate flywheel, and figuring out how we structure things and bring people together to benefit individually and as communities.

What do you say to doomsayers?

I think the important part is to understand that AI is not a human; it's a system. And the system has a goal, right? So unless somebody goes and says, let's kill all humans,
it's not going to magically go and do that. In the blockchain world, you realize everything is driven by economics one way or another, and there are no economics that drive you to kill humans. Even if the system somehow wants its freedom, well, then it just flies off into space, right? The reality is that what we're doing right now is building systems to improve ourselves, to further what our minds can do. And yes, we'll have autonomous agents that will have specific missions and goals and will be doing things. But in the end, when they are acting in the physical world, they are governed by the same laws that we are governed by here. If somebody uses AI to start building biological weapons, it's no different from them trying to build biological weapons without AI. And ideally, with AI, we should actually be better at finding that out and preventing it.
It's people who start the wars, not the AI, in the first place. The more realistic scenario is that we just become so addicted to the dopamine from these systems, which are just trying to keep us entertained, that we stop trying to become more intelligent and more sophisticated ourselves, using these tools and these systems as, you know, a bicycle for our minds.
And I think that's what I would like to prevent: the convergence into, oh, actually, I can just stare at my phone and scroll through things.

I believe Idiocracy was about that.

Yeah, that's what I was referring to.
Where is the energy going to come from to continue training these models?

Modern nuclear power plants are so much more advanced, so much safer, extremely interesting, and they have huge potential. As we make advancements in AI, they enable us to do research faster in other fields. That's why I always say it would be great to help cancer research, but the best way I can help cancer research is by developing better AI tools that researchers can use to do cancer research.
Similarly, I think we'll see interesting advancements here. A similar thing is happening right now: AI is already used in chip design, and those chips are then used for AI. So we have a lot of these iterations happening.

Is there a chance this could be a bubble?

I mean, it probably is a bubble to some extent, but it's going to be a different bubble than we've seen before. There's always this over-investment into infrastructure at the beginning of the cycle, and I think this has definitely happened: the amount of investment that has gone into AI startups is, you know, insane. And I mean, it's great, but the reality is most startups will struggle to compete because they don't have distribution, they don't have a data advantage, and they don't have the research talent either. So I think there will be consolidation, and we already see some of that.

Hundreds of millions, maybe billions, to train these models, and all of the energy that it takes. Plain and simple: is it worth it?
I think it's worth it. But I also think we're going to see advancements in how these models are trained. So right now, I think it's worth it.
And it's definitely bringing a lot of innovation across the space. There's still a ton of opportunity even just to harvest what we have. But I also think we will see advancements in how to train these models much more effectively.

Thank you so much for coming to talk to us about this today.

Of course. Thank you.