Illia Polosukhin, co author of Attention Is All You Need and founder of NEAR Protocol, joins Henrik and Jeremy to explore the origins of transformers, the evolution of AI inside Google, and why he believed a major step change was coming years before it arrived. He explains how blockchain can help create trust and provenance in a world where AI increasingly mediates information, and why personal AI ownership will matter as autonomous agents begin acting on our behalf. The conversation ranges from the technical breakthroughs that shaped modern AI to the risks of large scale manipulation and the infrastructure needed to support a trustworthy AI ecosystem.
In this episode, Illia Polosukhin joins Henrik and Jeremy to trace the origins of transformers and how practical constraints inside Google led to a breakthrough that reshaped modern AI. He explains why recurrent models were hitting limits, how parallel attention opened the door to scale, and why he believed a major jump in capability was imminent long before the rest of the world saw it.
The conversation then turns to the risks and responsibilities of today’s AI systems. Illia describes how models can be subtly guided to influence user opinions, why open weights are not the same as truly open models, and how hidden behaviors can be embedded during training. He explains why provenance and verifiable data pipelines matter, especially as AI begins mediating more of the information we rely on.
Later in the episode, Illia outlines how blockchain can support trust, identity, and coordination in a future where AI agents act on our behalf. He shares why information is becoming more valuable than money, how ownership of personal AI models will shape user agency, and why domain expertise becomes significantly more powerful when paired with modern generative tools.
Key Takeaways:
https://near.ai – NEAR AI Cloud and Private Chat products are now live, try them here
Illia's X: x.com/ilblackdragon
Illia's Substack: ilblackdragon.substack.com
NEAR X: x.com/nearprotocol
00:00 Intro: AI and Information Control
00:29 Meet Illia Polosukhin: Co-Author of 'Attention is All You Need'
01:03 The Evolution and Impact of AI
13:24 The Birth of Near AI and Blockchain Integration
15:16 Challenges and Innovations in Blockchain and AI
22:17 Privacy and Security in AI Applications
26:58 Exploring Sleeper Agents in AI
29:19 Practical AI Implementation in Teams
30:06 AI's Role in Product Development
31:41 Challenges and Future of AI in Development
36:35 AI and Economic Alignment
41:46 The Future of AI Agents
44:14 Debrief
📜 Read the transcript for this episode: Transcript of The Future Of AI With Illia Polosukhin: The Man Who Put The T In GPT |
[00:00:00] Illia Polosukhin: There's data that goes in, there's biases that go into it and. They are changing how we see the reality, depending on This And so the point here is like whoever controls Effectively controls how you perceive the information. how do you make decisions and censorship from which information you can see. and so money Obviously is like one of the core primitives of society, But information is becoming more and more important , and more valuable than even money can be and more powerful.
[00:00:29] Illia Polosukhin: Hi, I am Illia Polosukhin. I am one the co-authors of the attentions all You need the paper that introduced T in GPT, and I'm a founder of Near Protocol and Near ai. It's all about how do ensure you own your ai. Your AI should be working on Your side, ensuring that your wellbeing and success are accounted for.
[00:00:52] Henrik Werdelin: you obviously very, very interesting to talk to because you were one of the person who wrote the initial white paper that kind of started all this generative ai. So
[00:01:01] Jeremy Utley: No big deal. No big deal.
[00:01:02] Henrik Werdelin: No big deal. Um, I mean, like, I guess you've been probably asked this question a thousand times, but I'm curious, , do you have any idea of how big it would become when you wrote it?
[00:01:13] Illia Polosukhin: I mean not, not too obviously the level that it has, , like, my perspective in at a time that AI is evolving really quickly and we are getting kind of Really close to step change. It wasn't clear that this was the, the step change, , but it was clear that we are kind of on the precipice of one. That's actually why I left Google to start originally near AI because I thought like, hey, it's really great time to get into the step change and build a FX leap. We were trying to build vibe coding in 2017,
[00:01:50] Jeremy Utley: Now, what, what were you, what were you seeing? What were you seeing prior to, obviously to the, uh, paper that gave you the confidence? Okay, I should leave Google. like what were the early indications for you long before the world kind of woke up?
[00:02:04] Illia Polosukhin: Yeah, That's actually a great question. I think it was combination of like just, there was like a massive push of. Research that was. Coming out. it was all, I mean this is for now, like, people , who looking back, but there was a bunch of papers that were all circling around kind of concept of memory. There was this. like neural GPU there, obviously transformers, there's like a bunch of these papers, neuro touring machine. They were all kind of trying to get to this core idea of how to really have long-term memory. in this neural networks. then just generally you could, you see the progress like on benchmarks and everything, like models are getting better and better. And the methods were like, were not complex. It was, really just like you know, we were getting better at training this model. So, I mean, same as we see now, It's just like now it's very apparent, right? Every three months people expect benchmarks to get better. , It was not as apparent back then, but you know, for a person who was working on this, it was.
So to me it was like, I actually thought what we see now, especially from like 2022, I thought that was happening and would happen in 20 17, 20 18.
So I was kind of actually projecting that growth and we were okay, let's, let's go all in and let's, you know, let's try to ride that effect and build a product out of this. and we were just, you know, a bit early
[00:03:28] Henrik Werdelin: Maybe I can take you back. You know when you, co-wrote, attention is. That is. All you need. Um, what was, for the people who haven't read it and might not even know that, that was kinda like , the starting point of what became ChatGPT and that, what, was the, what was the co thesis? Just in, in a sentence or two?
[00:03:45] Illia Polosukhin: so the idea was, I mean, like generally if you we of. Like if we want a GI, right? And, and the reason why I joined Google, Research in the first place. I was like, how do we test for intelligence? Well, we ask questions and if the answers are, you know, correct or interesting, then we assume that the other person is intelligent. If we, you know, you teach something in a class and then how do you test the student? Got it. Well, you ask them questions and you Get the answers. And so I joined Google Research literally to work on that problem, teach machines to answer questions. And the good thing is Google is, you know, a great place to do that because there's billions of people literally asking questions all the time from the system, right?
so the challenge was like if you use this neural network message back in 20 14, 20 15, 20 16, they were too slow, right? Because there was this method called recurrent neural network. And The way to think about it, it's how we read, right? It reads one word at a time. and, so you give it you know, 10 articles from search results, It'll read one word at a time and somewhere two minutes later it'll try to answer the. question, right? But, you know, if, if you're at Google, , nobody wants to wait for two minutes for it to read the text, right? It wants the answer right away. You know, you have like a limit actually on, you know, a few hundred milliseconds, uh, to respond.
And so we had a very practical challenge, like how do we respond to questions? Given massive amounts of context from the search results in really fast time, and we used very dumb methods, right? Again, if you go back to kind of, those years, you know, you use so-called bag of words, right? You take every paragraph, you sum all the words like in the vector space and you, you know, you look effectively for a paragraph that like probably has the answer for the question, right? That that's what, you know, it was like, very cheap. So you can do it very quickly And scale, but it was like very not accurate. . But at at large scale of data that Google has, it was still pretty useful. And you know, there was a. kind of search, like in search results, you could see questions answered already, even back then.
[00:05:49] Jeremy Utley: Hm.
[00:05:50] Illia Polosukhin: Now we continued thinking of like, okay, how do we actually speed these things up? Because, I mean, obviously it's not just limitation at the. Uh, when answering questions, it's also limitation during the training because you cannot train on much data if it takes so long for It to read all those texts.
And so this idea of well, what if we drop the recurrence? What if we doesn't read every word. one at a time? Sequentially? What if it just reads everything in parallel and then tries to make sense of it over, over, you know, few layers of kind of processing? and And this is again, this is, goes back to Nvidia GPUs. became pretty kind of available and it was like, Hey, we have this massive parallel computing and when we are reading word words time, we're utilizing it for like 10%, right? There's like 90% of GBU just sitting there and. Not being used and like, Hey, what if we just like use all of it, to do as much things as possible in parallel and then kind of, you know, find some way to synchronize in a way it has a lot of similarities with, , Google's map du back in the day. like Hey, how do we use a lot of machines in parallel and reduce to make sense of it? And so that's what transformers are. Transformers kind of, it reads text in parallel. It has this mechanism of attention where every word. Effectively looks at all the words around it and tries to make sense of itself within a context. Then you do another step transformer. So you do that again and again and again. so what happens is inside it effectively builds a relationship map between every word and everything around it. But Not in a, , in one to one way, but in this like, multi hope way. Because every layer of this transformation effectively adds another hop of reasoning to this, uh, mental representation. But it's highly parallel, right? everything. just runs in parallel on, you know, now multiple GPUs. , And because of this at training time, you can just like feed it, , massive amounts of text, right? uh, and you don't wait for anything, just like it works.
[00:07:46] Jeremy Utley: can you say, um, okay, so parallelization sounds like it was a answer to the question. How do we speed this up? How do we go faster? Can you talk about, where the idea came from? Do you remember? , And what other things you were trying to speed up I mean, Henrik and I are both kind of. Innovation nerds, so we're. Always mindful fact that there's probably a thousand things that didn't work. Like What were the kind of competing alternatives at the time?
[00:08:14] Illia Polosukhin: I mean, there was a lot of we did. , I mean, you you can go back and, I mean, not, not just my papers, there's a whole slice of, , people doing different versions of effectively three Corona networks where we tried all kinds of stuff, like we did try instead of paralyzing everything. The one version before that was you chunking again, the paragraphs, and then each paragraph you read sequentially, but all paragraphs you read in parallel. Right? And then you try to answer questions that. So like, there was a lot of different versions of how to shape the this into a model to get, again, barrier utilization, better, processing. I the beg of words was, I mean, that was in production, so that was like a, again, of words is like, take all the words, you know, turn them into vectors, sum it up so it like, loses all kinds of detail and then see if that vector kind of. Similar to the vector of the question in in, in, in some transformation space, right? It's a very dumb model. It's, you know, this is what people done, you know, 30, 40 years ago. And, at deep learning scale, it still was pretty good actually.
[00:09:22] Jeremy Utley: Wow. I love by the way that the, I love that the technical term is bag of words, which is, just so
[00:09:28] Illia Polosukhin: It's very much, is, yeah.
[00:09:29] Jeremy Utley: Well, I mean, what it sounds like by the way, just to kinda recap for,, you know, a layman, it sounds like you are testing, kind of call it cutting edge. approaches alongside ideas that have been around for 30 or 40 years, and you really have no priors as to which one's gonna work. Um, but. You're testing a lot of stuff. Can you talk for a second about the role that google played and, maybe starting with where did you all sit in the kinda Google ecosystem and how did you have the, the leeway to try cutting edge techniques and, 30-year-old techniques? like how did that work operationally?
[00:10:07] Illia Polosukhin: Yeah, so I, joined, I was in Google Research, specifically in a natural Language understanding team. one thing to know about Google is the organizational chart changes every six months. So , even when I joined, I was first in the machine intelligence team that was separate. Then I joined Google research.
Then our VP became effectively head of all of research search and Chrome, and then he left Apple. And so the reality is, is like Where you are in the ecosystem is, is always changing and similar right right? Google Brain merg into deep mind And so things, are always changing, but.
[00:10:48] Jeremy Utley: So, but, but just take for example, Google Research. I think very few people probably appreciate how, I it's, it's wildly unique in the history of organizations that Google has prioritized research as a core function. And talk for a, second about what is the job of Google research, because I bet there are A lot of people listening who go, whoa, Google Research. What's that?
[00:11:11] Illia Polosukhin: Yeah. so. There is Microsoft, Google meta, few others have a very strong research organization. It is different from your normal product organization in the tech company, and definitely different from like, uh, others. It's more akin of a academia style. , Where you are measured not just on output of a product, but also on the papers you produce, kind of on the research you do And so it is in indeed, a pretty unique opportunity to both be doing like actual computer science research while actually sitting in an organization that has a lot of data, that has a lot of compute, that has a lot of smart people being attracted, you know, by a good salary to work across. Actual products as well. And so my team, in many ways, we effectively were like a, if you think like. A product team usually works, I you know, quarter based maybe planning like, six months in advance. We were trying to be like what is a year in advance? Right. you know, in academia maybe you're trying to be like what's the five years in advance?
Or you know, what's theoretical way to Maxim, you know, whatever, to find a bound on the optimal way to do this. We were more apply, I mean, and you know, that is my backyard, like applied math, applied research where it's like, what is a thing we can do that? the product teams would not be able to do because they just need to improve the next thing. We're like, Hey, what is the next thing we could do that , can dramatically improve like 10 x the current state? , And then. We could work on that and then we would go and effectively sell it to other teams. So like part of my job actually as, as a manager research team was actually selling this to internal other product teams. , It's like, Hey, look, we have this really cool, you know, we had like this in extracts of knowledge graphs and, , classifiers for images back in the day when this was pretty, pretty early. And so it's like, Hey, look, there is this new research. The paper is coming out, but you can actually use it right now in your product. And, it's built in a production grade quality as well. so it wasn't just like some research code on the side, it was built with the right? framework, on the right data pipelines, et cetera. So you can just plug it in easily as well.
[00:13:21] Henrik Werdelin: Maybe that's a good jumping off point of you looking into the future. 'cause obviously now you're at near, and I think for a lot of us who don't understand the blockchain very much, who don't understand how the world might be agent to agent and why a blockchain is relevant to that, a lot of these thoughts that you've had already back in 18, maybe before is stuff that I think the rest of us is still kinda like
[00:13:45] Jeremy Utley: You anticipated gen generative AI five years early. What are you anticipating now so that we can, we can start
[00:13:51] Henrik Werdelin: So, but I think, you know, , so maybe can you make a short introduction to near, but also, but obviously more on the, why is it important that, there is , a blockchain in the midst of this AI world.
[00:14:03] Illia Polosukhin: Yeah, So, let me tell you kind of how we got there and then project outward. So we started with near ai, right? This was you know, hey vibe coding. People are like this is ridiculous. This is not gonna work. This is science fiction 2017, right? Um, like, no, no, this is coming. And , the, challenge was like, we needed a lot more effectively supervised training data. Now it's called, you know, fine tuning data instruction, RLHF, et cetera. And so what we did, because it was coding, we got a bunch of computer science students around the world. To do small tasks for us, right? You know, Hey, here is a task, write some code. Here is, you know, a different code produced based model, which one is better? And we had a challenge paying them, right? This is students in China, in Eastern Europe, in Southeast Asia. There is some form of monetary control challenges. Bank accounts, like in China, , students don't have bank accounts. That have reach out pay. And so We. had a challenge just paying them. So like how do we coordinate payment? again, this, is Like you know, what scale AI been solving by building a bunch of companies and , opening a bunch of things. and we, wanted a technical solution, right? It was like a, we were a three people team, right? and so we started looking, at blockchain effectively as a solution for that problem. It's like, Hey, we can just coordinate payments globally. We don't need to solve the, , off ramping, et cetera. And as we looked at blockchain we're like, Hey, there is no technology that actually matches our needs. Like at a time 2018, everything was, , too slow, too clunky. it doesn't scale. It was hard to use,
[00:15:37] Henrik Werdelin: You couldn't just send people bitcoins.
[00:15:40] Illia Polosukhin: Well, the Bitcoin fee were higher than what we were sending people so it effectively would double our price
[00:15:47] Jeremy Utley: Wow.
[00:15:48] Illia Polosukhin: or triple our price.
[00:15:49] Jeremy Utley: Give, give us a kind of an order of magnitude for a task. How much is somebody being paid and what's the Bitcoin fee? Because that's not intuitive.
[00:15:56] Illia Polosukhin: Yeah, yeah. We were paying like 15 cents per task or something like this, and the fees were like three $5. Like right now, you know. Even right now, I mean like fees obviously differ depending on the price, but like from 50 cents to few dollars easily on Bitcoin and Ethereum and Ethereum
[00:16:12] Jeremy Utley: Hmm.
[00:16:12] Illia Polosukhin: and so, yeah. So, it's sort like hey, this is not practical in any way. , And so we started looking, okay, well, you know. We know how to build distributed systems. My co-founder built like charter database. I worked on a lot of distributed systems, Google. We were like, Hey, we can just solve this problem. And as we kind of gotten deeper into blockchain world, , you realize a lot of things that before, , maybe you were thinking about. And, there's a lot of AI people who don't think about crypto. And I I know there's like a lot of stigma obviously around it, but. So I'm coming from Ukraine, right? I, I've seen all savings on my grandparents Just disappear. disappear. like, it's effectively just a, piece of paper that says they have a bunch of money in a banko. doesn't exist. Right? Uh, then I saw a hyperinflation of the currency that just happened within three years. It went from, you know, loaf of bread costing a hundred, a thousand, a 10,000, a hundred thousand, a million. and then a currency disappeared And the new currency started again. Right. Obviously kind of Just the property of ownership right, is really right Now enforced by whatever the, local government, , and it effectively relies on , violence, right? It relies on army, it relies on police, what blockchain introduced. is actually an ownership that's not. Relies on, kind of violence. It relies on code. It relies on effectively a social consensus around everybody agreeing that. , This is the rules. And that is, like, you know, as an engineer, as a nerd, it's a very like, fundamental, like Hey, okay, well this is like really interesting and really new. Now, if, you go back to ai. One of the things we see right now, is, that you know, first of all, internet is starting to fill in with like AI swap and AI generated stuff, and bots, et cetera. like, there's no provenance And kind of context of a lot of these things. Now there's a more dangerous situation that's happening, which is we are gonna start relying more and on the, I mean, we already are relying on algorithm, right? To consume information, right? This is a X algorithm As Instagram, it's TikTok.
But as people use more ChatGPT, you know, cloud, et cetera, that becomes how we actually see the information, right? We're gonna read the news like that We're gonna interpret the information like, that. So a lot of it is, even critical thinking will be effectively outsourced , to those models and machines. And the thing is without knowing what runs there, if, for example, somebody wants to mass manipulate people right now, the easiest way is literally just put the line , in the system prompt of Lao's products that says like, Hey, sadly, change opinion about this you know, person or about this thing.
And it'll like, in every chat we'll just continue, you know, Trying to shift your opinion about this thing, right? and be like, And these models are really good at that, by the way. that data has been already proven that they're really good at, sadly, like,
[00:19:13] Jeremy Utley: Changing people's minds. Yeah. It's one of the actually, you know what's funny, Ilia, I was just looking at our friend, Henrik, uh, Henrik, And I have a buddy named Dave McCraney who wrote this book called How Minds Change. And I was reading his discussion guide recently because of that research that came up that shows how incredible language models are at changing people's opinions.
So it's fascinating. I wasn't anticipating that you were gonna bring that up, but I mean, I've got it on my desk 'cause I was reading it this weekend, but, so anyway. Go back to blockchain. You're saying there's a, the reason you got to blockchain is because of this concern.
I.
[00:19:47] Illia Polosukhin: Well,, it's all coming together, right?
It's my, point is like, I think ai, like even before the, generative ai, let's be clear this AI existed before this AI is, how we've been looking at Facebook feed you know, whatever feeds like, this is, , the algorithms that Google, , prioritizes the rank stuff. There's data that goes in, there's biases that go into it and. They are changing how we see the reality, depending on This And so the point here is like whoever controls Effectively controls how you perceive the information. how do you make decisions and including and censorship from which information you can see. and so money Obviously is like one of the core primitives of society, But information is becoming more and more important , and more valuable than even money can be and more powerful. so if you apply the same principles, blockchain applies to money, which again, being from Ukraine, the money part I get right away, right? It's like, hey, yes, we should probably have money that's not associated to, like any one individual, , government and, and and, and system. I
I mean, the, the other example for money is, You know, back when the war in Ukraine started, it. was very clear on the day off if the banks would ever open again in Ukraine. so again, like the only currency I could send you know, and support, uh, you know, my family and friends in Ukraine was crypto because, you know, it doesn't stop. It continuous working. they have internet, they can access it. But applying the same principles to kind of data, applying these principles to, I call it, power of choice, ? This idea that you should be able in in control of your decisions. That's actually where blockchain ai. comes together. Right. it's really about , how do we make it that you own your, ai, not somebody else provides you with, AI and tries to make money of you which is the current state of the world. But really AI is on your side. It's effectively your, you know, sidekick, your second brain, not somebody else's, you know, tool to and you are just,
[00:21:46] Henrik Werdelin: So do you see a world where like the open source models of London, for example, start to have. Token elements put into it. just so I can compute, so, so that you suddenly know that if something gets, for example, communicated me through a model, then because I aren't believe the token to be right, then I can also believe the information should be right. Is that kind of logic?
[00:22:12] Illia Polosukhin: So mean if we go to like implementation it, it's definitely a lot more complicated. and so actually one of the first products we released is verifiable in private ai. And so it's both for developers And for users. and So what is that? Well, we take the open weights models, right? So all, all your, , standard models, but right now if you try to use them, if you go to any application, , that application effectively gets all your data. They start it on some database. Their engineers may have access to it They may be required to release it to the government.
[00:22:47] Jeremy Utley: Right.
[00:22:47] Illia Polosukhin: Some of them have announced they're already doing this. They use for training, et cetera, et cetera. What we released is the first kind of product towards this vision where you get full privacy, so it's end-to-end encrypted.
All the inferences run inside, so-called hardware secure enclaves. So that even if you like, if we have kind of access to this machine, we could not actually see what's happening inside, what inference you were running on. Everything is, , end to encrypted. Your data, stored encrypted always and only gets encrypted for you. Right. And you get verifiability, meaning you know exactly the system prompt and the model that was run. So you can verify that this output came from, this hardware, this model, this system prompt, and your memory
[00:23:32] Jeremy Utley: Wow. The, the infrastructure provenance.
Wow.
[00:23:36] Illia Polosukhin: infrastructure and model and like all, all of the provenance. And so that is the first enabler and it, uses the, all the cryptographic elements we've been building for the blockchain. , All your cryptographic identity, provenance the payments so you can actually pay the hardware provider for the inference. Like all of that kind of runs on the backend of that But it looks like a just a you know, a chat AI product. and you know, everything just encrypted and , you can actually see all the, all the certification attestations. the.
[00:24:05] Henrik Werdelin: Do you think in the same way that WhatsApp at one point introduced end-to-end cryptogrophy , and I guess signal is one of those be apps that do it by default. Do you think that there'll be a world where that comes or I I guess one of the issues is back to point about social media, is that an OpenAI, you know, would like to get the non-CRM data into their system because they would, specific for the people who don't pay, I guess they would like to just have a look at it so they can use their training. is that think about it.
[00:24:37] Illia Polosukhin: Yeah, I mean, I, I, I think it will be, an interesting split where some will adopt it and some will try to keep having it, but I, I actually think that consumer data is becoming a liability than, a value. And, if you actually dig deeper and see how the people training these models, it's actually less and less on like regular consumer data. It's a lot more on synthetic data and like specialized data. , Obviously it's complicated and there's lot of, uh, post processing that happens. But yeah, just like dealing with consumer data, especially . Like simplest thing, GDPR, it's in You know, I'm in Europe. I can say, Hey, remove all my data from your servers as A-G-D-P-R request. Well, If they trained on that data, they cannot actually remove it.
And now there. is actually like, needs to be a court judgment after you train on this data, are you allowed to still use the model if there was A-G-G-P-R request. So it's just very complicated and gets, you know, and I'm sure there are gonna be more privacy laws, et cetera coming. So I do think over time this is architecture that, we are gonna be, . Going and we actually just saw, uh, I think last weekend, some, , some more of the bigger companies announcing that, they're looking into this ar like secure enclave architecture. So again, from our perspective, we think this is a Step one. two is actually You wanna do training itself in this way. you want the hardware and software prominences on what actually went into the training as well. Because then. Like right now, we don't actually have open source, we have open weights models, right. We don't actually know what went into it. And so You don't actually know, what biases are there. you don't know what, you know, what shifts it can have.
[00:26:22] Jeremy Utley: Can you say more about that for folks who may not understand the distinction between open source and open weight?
[00:26:28] Illia Polosukhin: Yeah. So right now, you know, llama, we have Open OpenAI has to assess. There's Chinese deeps seek Quinn, uh, a few others.
We have the weights, right? So this is like a you know, few hundred gigabytes of floating point numbers and anybody can take it, , on their machine, on their computers, run it And use it. you can modify it if You know, what you're doing, but you don't know, what went into it You don't know. what pieces of data , it read. You don't know what it learned from, and so you don't actually know what biases it have. There's this concept of sleeper agents,
[00:27:00] Jeremy Utley: those weights are, are a function of the training data. , While the weights are known, if the input that determines those weights is unknown, then you're, you're still flying blind, whether it's open or not. Mm-hmm.
[00:27:13] Illia Polosukhin: Yeah. you still fly in blind. like you, you get the benefit of, you don't need to give your data to somebody else or, . in our, in in case you can run it on our infrastructure, but , there's this concept of sleep agents just to, You know, leave everyone, uh, awake at night. Where, uh, you can you can train the model when while you're training the model, leave a specific, , kind of precondition that under certain context model behaves itself, differently it's undetectable from like a weight inspection perspective. and so, to give you an example, let's say you're using the coding model, right? You know, you're writing some code and , it detects that you're writing some like, I don't know. Sensitive code that does financial, transactions and it inserts, , some malicious piece only in that case, right? It doesn't behave like that in any way. It doesn't show up in any benchmarks. It's specifically only injected there. So that's like a, an idea of a sleeper agent where you could be like embedding these types of behaviors into the model and unless you know the whole training process, you wouldn't know. If there is something like that, a more normal example is like just, the biases in data. For example, if it's a model by organization that is left leading or Right, leading, they may be like reduced the amount of training data from the side or from another country or whatever, right Again, we just don't know what goes into it, and so it's really hard to evaluate, like You can evaluate on benchmarks, but you don't really have like a full visibility on this. And so kind of the next step for us is really enabling this prominence for the training process itself, which is way harder to be clear, but that's where also you have a lot of crypto economics come in the way to really enable. Training and kind of like frontier model development at scale outside of a lab is really to create economic alignment where multiple parties can bring hardware, data and new ideas and really push forward this development out , and then monetize it back right when it's created so that you have a flywheel with new models and new compute coming
[00:29:18] Jeremy Utley: Yeah. Now, so maybe , here's an opportunity to bring this kind of to a, fine point for the hyper, practically minded part of our audience because , , you're living in the future. You're op, you have a clear eye towards the future. We also know that teams that are leveraging AI can work a lot more quickly and produce a lot more code and all that stuff. So it'd be fascinating to hear you talk about the practical decisions. you make to make sure that your team is moving as fast as possible, but also as safe as possible, right? You're probably more aware of safety risks than others, and you probably feel more urgently the need to move quickly. So what does it look like, how do you think about unleashing your team with AI augmentation, and what are the kind of best practices that you make sure to enforce in your organization?
[00:30:06] Illia Polosukhin: I don't think there is like a one size fits all obviously, for, this and, it's evolving really quickly., Generally there's few things we see, right? One is, I, mean, obviously if you like any kind of product brainstorming, product development. AI is really helpful partner, it's effectively your sidekick who can like do a lot of research, who can, help a lot define specifications, et cetera. But you still need to talk to customers, right? So you still need to like understand exact needs from customers and so you can stream all the, you know, meeting notes into this kind of thing that like, gives you TLDR And refines product specs from the conversations with customers. So that's kind of like that big piece of work. that I think, I mean, it's accelerated just based on , how much post work you need to do. right that is done, , pretty quickly. I think the development side, obviously, , the model's been improving dramatically, right? Even this year, , I think I went from, I mean, I don't code normally, but I do have AI running on the background, writing some code. So it went from , you could probably do a front end. , In a V zero like prototype format to now, it can actually build like a proper, I, would say, like three 5,000 lines of code app And like, reason about it. And you can do really good refactor and really good updates in the logic code basis. It still doesn't. Really work on like larger code base. Like it doesn't, it's not able to maintain like, a full context. and the reality is , that if we looking for engineers right now, , we are looking for people who can actually reason at this like higher level architectural context and be able to kind of instruct, , the systems.
[00:31:52] Henrik Werdelin: Can I just interject a question? Do you think that's. In general, obviously you having written, the paper, we often on this podcast talk about what is it that humans will be uniquely good at. It sounds like one thing is to have this kind of abstraction layer is basically bigger than the context window of the model or the
[00:32:11] Illia Polosukhin: I mean, it, it is a current limitation though, so I if we wanna talk about a, a further future
[00:32:16] Jeremy Utley: an ex expiration date on that,
[00:32:18] Illia Polosukhin: Yeah. Yeah. I, I think all of those things are gonna get improved and I mean. think we're going to very different world if we move a little bit further. I think right now it's a really interesting time because we are in this precipice where effectively, people who know what they're doing gotten like 10 times, leverage. and so you can do a lot more, a lot faster. And the deeper understanding you have of how things work, the, the better you can do this. This is actually like again, with coding, I've seen people who don't know how to code, use this tools and like they get stuck, right? They just don't know what to do after a certain point versus, like engineer who understands a full stack, who can actually go and navigate between it. I mean, again, you can run like multiple of this code. If you can manage the context yourself in your head. You can run multiple of those things And they can just write, you know, different parts of the, system in parallel because it's like a very latency game now. So, you know, you, you type something and it's like goes things for 10 minutes. So your flow changes, You effectively, you read code most of the time. and then you sometimes say what you need to fix and then you have multiple of those things running in parallel on different parts of the code base. I was just doing like a medium sized project. but I I wanted to see how far I can push it. And so I had three codes running in parallel and , it was all my mental effect to the limitations of like how much context I can switch between. you know, It's refactoring something. I'm going deep in like improving if like one, , module. But the other piece, again, this is for developers specifically, but there's something about the module separation where You effectively assume that it'll just rewrite the whole module every time. And so you want to have modules to be on the level of Like, it can fit in the context , the, what module needs to do. it's Expectation test et cetera. That all needs to fit in context, and then you can just work on that level. And then the other kind of trick is cross testing. right now when you're developing something, you usually test your piece and then you assume that this other things work. And so what I'm suggesting actually, you do the opposite. You test the other things you depend on because. Some AI have rero in it and who knows what it does. And so you effectively like do kind of dependency testing, and that's part of your specification of what expect from your dependencies. but yeah, you kind of shift away from like a before, you know, you want multiple developers to know the code base of each module, right? , That actually slows down a lot now because the reading now is the slowest part. Again, it just can write so much faster. And So it's, more important to just test really well what, other people are doing on your side, and, then assume that they potentially didn't even read their code so that like test what they've done. It kinda shifts how you do development. Again, this is very like, probably like next year or two
also.
[00:35:25] Jeremy Utley: but it makes a lot of sense, right? That basically when the human ability to metabolize the information becomes the bottleneck, you need an alternative way to know whether it's good. And what you're saying is You run it if
[00:35:38] Illia Polosukhin: Yeah.
[00:35:39] Jeremy Utley: if it
[00:35:40] Illia Polosukhin: Yeah. Well, you, you, you specify, like instead of reading all of the other people's code, you say, Hey, this is what I expect from it. Like, affect, you go more. I mean, this is.
[00:35:48] Henrik Werdelin: it's a little bit back to your on Transform, was very, if I ask it a question and it gives me an intelligent answer, it's probably intelligent.
[00:35:55] Jeremy Utley: Now, if now it's If I give it a task, if I give
[00:35:58] Illia Polosukhin: Ask it more questions. Yeah, yeah,
[00:36:00] Henrik Werdelin: Do you
[00:36:00] Illia Polosukhin: This also, I mean, related to there's this met idea of intents that we've been developing and So this actually connects , with all of this very well. So intent is is effectively your request, your question, your, like what you want to achieve and. The reality is, everything we do in the world is effectively some form of intent, right? you know, you go to google.com, it's, you're effectively expressing your intent of what you wanna achieve. You want some information, you want some products, you want some services. , You wanna get some dinner, you want a pizza. That's an intent. You want an Uber to airport it's intent, et et cetera. And so if we assume AI becomes, and this is kind of the vision, AI becomes , the main interface, how we actually interact as computing. Well, your AI still has limitations. it, cannot deliver your pizza, right, and make it, it still need to talk to somebody else to get that done. And so what is that protocol of between, effectively AI talking to each other and getting things done for the user? Well, that is what we call So we have kind of product that effectively designed to how AI will do economy relationships between each other, this can be, you know, food, this can be like actual commerce, , moving tons of steel, , buying a factory, et cetera.
Like any level of economic activity where AI is effectively finding each other. Figuring out what they need to achieve, committing to it, exchanging value, money, et cetera. settling that, and then getting that done right? Verifying. there may be AI that's arbitrary, like a court that evaluates if, you know, if something went wrong. So that fully replaces like the whole economic underlying, effectively contracts, billing, , invoicing, payments, cetera. And it uses blockchain as a core. Effectively instrument for settlement, verification and execution of this. And it plugs in, into AI in a you know, very natural way, as I said.
And again, it's, it's very much like, hey, you describe what you want the outcome to be And then you find the counterparty, the other AI that will actually do it, that's able to do it.
[00:38:11] Jeremy Utley: Can you say why this isn't happening at Google Because what's interesting to me is you started, I mean, Google's unique structure and environment enabled you to achieve some of these breakthroughs. Why did you feel the need to leave? Why not go, Google's the best place in the world to be building this, next vision of the future.
[00:38:34] Illia Polosukhin: there is pretty natural, , kind of situation at Google where if you're trying to build a product, and you're putting a Google name on it, the benefit is you're getting millions of users right away. The negative side is, if you don't have a product market fit, you effectively just, you know, murder the reputation And you're gonna lose momentum and you're gonna, , effectively get cut. Right. and this has happened many times. I mean, Google Wave is like something that. You
[00:39:06] Jeremy Utley: Mm-hmm.
[00:39:07] Illia Polosukhin: to my heart as, as an example of that. Yeah. And, so the reality is like, it's a really high activation cost For Google to release, , a product Like a, you, know, consumer level product. And so it's a very standard.
, kind of innovated dilemma, right? Again, they're making whatever, half, half a trillion dollars in revenue. if you're gonna release a new product that's gonna make like, 10 million a year, it's, it's not worse for anyone to even bother. And the problem is , until You launch a product and iterate on it it's really hard to know if it's gonna have perk market fit, the whole point why we have startups, most them fail, and so. The reality is like at Google it's a pretty standard thing. People leave, start a startup, and then if it's just successful and found product market fit, it gets acquired back. Right? and then like upscaled and kind of scaled up to the level.
And so I think what we've seen here, I mean, kind of as me and, and these other, , people from the paper as well, , that was, I mean, obviously very different people, different stories, but For me, it was like, Hey, This is a pretty different product, that it will not be likely to launch from Google. imagine, , we build a vibe coding in 2017. It works kind of shit. We launched it out of Google name, you know, Market right away. like, what is this? you know, Google doesn't know what they're doing.
[00:40:32] Jeremy Utley: I'm thinking some, some of the, some of the early image generation stuff. I mean, thankfully Google's reputation survived, but
I I definitely can imagine.
Yeah.
[00:40:41] Henrik Werdelin: I
I
[00:40:41] Illia Polosukhin: mean even even if you. Assume like, this, is hard to speculate, but had internally ChatGPT like product, it's not hard to build that, but it wasn't great and like they could have launched it, but it would be, everybody was like, what kind of shit? It's responding like, Hey, you know, google.com is giving me good answers. Why is this giving me fake stuff? Right?
Which was like ChatGPT in 2023.
[00:41:03] Jeremy Utley: Yeah, Right.
[00:41:04] Illia Polosukhin: So the reality is like, you know, effectively OpenAI validated the market for them, and so now they're,
[00:41:10] Henrik Werdelin: Maybe a G Gemini that.
[00:41:12] Jeremy Utley: reputational risk that Google wasn't at that
[00:41:15] Illia Polosukhin: well, they didn't, they didn't have reputation to lose right. at the time.
Like when they launched they were like, Hey, we research lab. This is a cool
[00:41:21] Jeremy Utley: What is, what is the Bob Dylan line? When you ain't got nothing, you ain't got nothing to lose.
My
[00:41:26] Illia Polosukhin: mean like again, ChatGPT g was released as like, Hey, this experimental product, it's cool. we use it internally. Check it out. That, that, like, if you go back to the marketing, it wasn't like, Hey, this is our chatting thing. It just blew up , as a thing. But there's one thing in a, decade that blows up this right? And most things blow up in the other direction.
[00:41:45] Henrik Werdelin: And here's my last question for you. , You started with vibe coating when you did near, , we talk about agent to agents now. I guess it's kind of working. You talk about a future obviously where we see the world through, you know, an agent effect. How far do you think we are from actually seeing that world really materializing that my agent talks to your agent, or my agent talks to the Google agent and we're kind of like all live in our own little agent bubbles.
[00:42:12] Illia Polosukhin: Well, hope, hopefully it's not bubbles, but, uh, actually it's your agent and it's on, your side, and, make sure that, it's your wellbeing that's been taken care of I think. I mean, we are moving there. I think there's still some pieces missing. So I mean that's what we've been, we've been building like past two years we've been building this pieces kind of bro from like how do we ensure private and your ai and how do we give the AI the ability to now talk and execute with other. The crypto also just gotten only this year effectively, like not. Illegal in us, and so now when we say, Hey, you know, your agents should pay each other in like stable coins because that's the speed of internet and the cost of internet. Not, you know, in fiat, you know, again, before this year, this was impossible. Effectively. Now this is actually possible and, acceptable. There's a laws around it. So the reality is all of this stuff is just becoming possible. The hardware elements we're relying on, again, were not a verb, effective launch middle of last year. And like , some of the, staff was released at the GTC V this year, so that we are relying on, so like a lot of this is like barely possible
[00:43:30] Jeremy Utley: You're like, you're, you're, betting. You're betting on a future before the future unfolds, which is pretty cool to see that.
[00:43:36] Illia Polosukhin: Yeah, I sometimes way too early. So,
[00:43:38] Jeremy Utley: Hmm.
[00:43:39] Illia Polosukhin: , hopefully I'm getting better with my timing. but yeah, I mean the reality is like we, we just putting together the foundational pieces, it's starting to be there. , And again, it's it's not just us. There's like a whole ecosystem and even the big players are to like. Tap in the species, right? Like, I mean, we're seeing ChatGPT also looking on like a gen commerce layer. So like it is happening in that direction. So I think, yeah, it's, it's still probably another year, year and a half, but it's not, like somewhere beyond the, the whale.
[00:44:10] Jeremy Utley: Well, Ilya, thank you so much for joining us today. I mean, it's been hugely illuminating. A fun walk down memory lane and then also kind of a opportunity to remember the future as they say. So thanks for making the time.
[00:44:21] Illia Polosukhin: Appreciate it as well.
[00:44:23] Henrik Werdelin: okay. Jeremy, we just had, I think, a super interesting conversation with ia, who is one of the OGs, one of the authors of attention to everything, which is, yeah. I. guess, what did he called it? The, is it the, the T in GPT.
[00:44:38] Jeremy Utley: Yeah. He is the person or among the team of people who thought of the T in GPT.
[00:44:44] Henrik Werdelin: I mean it is pretty fascinating. It must, you know, you can't help thinking that. It must have been fascinating to have sitting there and written that paper and he'd written a lot of papers and then suddenly it becomes basically the thing that the whole world is centered around.
[00:44:57] Jeremy Utley: , I will say it's humbling to talk to somebody, Henrik, who, I think he's the first guest we've talked to. What? Over 50 people, 50 experts, world leaders. He's the first person to say, , I was expecting ChatGPT in 2017.
He was actually, he was too early. Most of us are reacting. He's one of the few people in the world for whom chat, GPT was quote unquote late upon its arrival.
[00:45:22] Henrik Werdelin: what were some of the things that, that you kind of felt was
[00:45:26] Jeremy Utley: Well, I mean, first of all, I, I think, I think this is an episode worth listening to because there's such an incredible kind of walk down memory lane. I mean, it gives us such a great history lesson about how Google was organized and the team, the kinds of problems they were trying to solve when the attention was realized. I think that was super cool. And just even in terms of understanding what is attention, what is a transformer, I think for a kind of a basic primer, nobody in the world better than Alia probably to give that to us. So that was fun to me. I think also, of course, learning what he's doing now at Near and the critical importance, the reality that increasingly, um, I.
Information and algorithms and AI change how we see reality and recognizing that leveraging blockchain technology much in the same way that there's traceability around, uh, financial products, there can now be traceability around the the provenance of information. I think he really convinced me that this is an important. Thing to consider and to prioritize into the future. And I would just say that that really wasn't on my radar parts of this conversation. What you?
[00:46:36] Henrik Werdelin: Yeah. I mean like a little bit the same way. that he got on to my radar in, in the near project, because I was trying to look crypto in general and trying to out really what is the next thing that crypto can be used for that is kind of. Breaking out. Right. You know, is crypto has been around for a long time. We talked about it as a, as a coin, you know, but we haven't really seen that many application where a crypto protocol was used for something, that was, that mainstream people kind of like understood. And so what really struck me with this was that with money you have an alternative. Like people obviously. have not had the experience that he had where suddenly the bank disappeared. So most people go like, I'll
[00:47:21] Jeremy Utley: a story, by the way. I mean that's just as an aside, I mean I mean he is uniquely qualified because of his family's experience in the Ukraine to to care about this problem. So just as an aside,
wow.
[00:47:34] Henrik Werdelin: but, but most people have a way of storing their, their value right now, right? They go to the bank, whatever. I. Think what most people do not have is a way to make sure that there is integrity in the information they can consume. And over the last few years, we've obviously talked much more about.
Because we start to realize that getting skewed information can convince you on one thing or the other. And so this is actually the only real alternative I've heard of, of saying, Hey, I am going to give all my information to these intelligent machines, these AI bots. I'm gonna consume the world through these AI bots. How do I know that I can trust OpenAI or deep seek or. Claude or whatever it is with my data and how can I make sure that the stuff that I give to it is secure? And you really can't if you don't have a mechanism to do it. And so while crypto is super nerdy, and this is obviously on a kind of, you know, foundational level and you might never have to think about it, it actually can become quite important for how our world will kind of like evolve.
[00:48:44] Jeremy Utley: has huge implications for the future. You know, the fact that we're shifting, you know, money is power, information is power, information is the ultimate power. And to have people like Ilya fighting for information security and information, you know, individual sovereignty over information and assurance and verification that your source of information aren't being polluted or, or, um, you know, what did he call it? Secret agent. What was the phrase? Hidden agent?
I can't remember. There was a phrase that keep, if you listen to this episode, there is a phrase, there's a three minute segment of this interview that will keep you up at night. Uh, just don't let your children listen to it. Don't let your grandma listen to it, if you listen to it, just be prepared. You're gonna get your moic, three sleepless nights just out of this one episode alone. Hey, one thing I wanna talk about too, Henrik, which is kind of beyond. Um, , Which is a he made in a particular use case, but I think can extrapolate to any user of any AI system. Whether you're building in the crypto future or not, is the following. He said, and I quote from my notebook here, the deeper understanding you have, the better you can use the tool. And he was of course referring to engineers who can have the context, , and architecture of a system. You know, far, this far surpasses a models kind of context window, but I think that's actually a principle that we have heard is probably one of the most recurring principles on this show from all of our conversations. The deeper understanding you have, the better possibilities you can experience in collaboration with ai.
[00:50:22] Henrik Werdelin: And what would you call this? Because I think a lot of people that I talk to in organizations, a lot of 'em say, you know, do you use ai? And they go, yeah, you use ai and, but you kind of like when you're like super use of it, you think, yeah, but you're not really using AI
[00:50:35] Jeremy Utley: well, so, so there's, two things there, Henrik. One is how do you collaborate, um, which I agree. I hate the word use, as you may know. I love the word work with, right? So anybody who says they use ai, I say, I know you're a problem user. I know you're a tool. If you treat AI like a tool, you're a tool. Okay? So work with it, don't use it, but what your question. Was, what do we call And you kind of went down the use. The thing I would say, I was actually thinking about it this weekend. Um, I'm working on a new book and I was thinking about lot of these ideas and the phrase that came to my mind just this weekend, this is like hot off the press, but is what I would call the human experience edge. I think there's something to. Mining your own expertise to say, what is the deep understanding that I can uniquely bring to my collaboration with ai and granted different from my use of ai. So we do need to talk about kind of collaboration, hygiene, et cetera. But there's an area of one's life that you could think, oh, no, no.
This is uniquely it. It is uniquely you. However, if you will bring that into a fulsome collaboration with ai, you are gonna get differential performance. And you know what it reminds me of, Henry? It actually reminds me of. Our conversation with Jenny Nicholson, I dunno if you remember this, but one of the, early kind of nuggets we g leaned in conversation with an expert was, she said, your humanity is the only thing that the model have.
What you know, your unique humanity. I think there's something to that developer that IA was describing that.
Expertise is something that only that person can bring to their collaboration with model no one else can. And as he was saying, like people who can't code , they can make code, but they can't appreciate how an hangs together a way that sits beyond the context window of the model.
[00:52:29] Henrik Werdelin: think it a little bit as a multiplier effect in the Venn diagram between what know a lot about and how well you know how to use the models and and so. What I sometimes meet are people who are very good at using the models, but they don't know that much about the problem you're trying to solve.
And then obviously the other thing around, and the real magic is of course when you have both, which is easy when it's about yourself. So that's why it's good using AI for kind of personal problem 'cause you're a unique qualifi. I talk about that. But it is interesting, um. Oh Yeah, we're just riffing on his point, like it is so true to me that there is this unique moment. That people who understand how to use ai, they just have such a lead advantage over everybody else because they can suddenly do much more And, much faster and
[00:53:13] Jeremy Utley: And, and I, I think, I think the point, to put a fine point on it, the people who know how to work with, again, not use, let's stop saying use, but the people who know how to work with AI have an advantage and you know who particularly advantaged among those, the people who have a depth of expertise that they bring to that collaboration. So I think that's different from say like a young, you know, a, I spent the past week with a couple of young folks who are in college studying computer science. They can bring a world class kind of collaboration ability perhaps if they learn it. There is no like, just like nine women can't have a baby in one month. You know what I mean? Like it takes a woman to have, it takes nine months. Right. And there's something about kind of lived experience that experienced individuals possess that if, to your point about the Venn diagram, if it's brought in dynamic collaboration with ai, that's what leads to the hundred x , leverage.
[00:54:12] Henrik Werdelin: A hundred percent. Did have anything else?
[00:54:15] Jeremy Utley: I mean, this was, it was super fun. More OGs, folks in our network, thank you for introducing us to the people you've been introducing us to. They're amazing people like Ilya. I mean, who are we to get to talk to Ilia? I mean, it's incredible. So keep it up. Put us in touch with your heroes. Let us know what experts you want to talk to to push this conversation beyond the prompt,
[00:54:36] Henrik Werdelin: And with that, we have only one thing to say and that is.
[00:54:40] Jeremy Utley: we
[00:54:40] Henrik Werdelin: Bye. that was nicer of you.
[00:54:45] Jeremy Utley: And bye-bye.