Dan Klein, professor at UC Berkeley and CTO at Scaled Cognition, explains how today’s AI systems work in practice. He breaks down why they produce fluent answers that are often useful, but not always reliable, what the “jagged frontier” means, and why using AI well requires a new set of skills most people are still developing.
Dan Klein, professor at UC Berkeley and CTO at Scaled Cognition, explains that AI systems generate answers based on patterns in language rather than verified knowledge. This makes them highly capable across many tasks, but also means they can produce confident answers even when they are not fully accurate.
He introduces the “jagged frontier,” where AI performs very well in some areas and less reliably in others. Because responses are fluent and convincing, it is often hard to see where those limits are, which makes it important to stay engaged when using these systems.
The conversation also explores hallucinations as a natural part of generative systems. In some cases, this is what makes them valuable, especially for creative or open-ended tasks, while in other cases reliability becomes more important.
Finally, Dan highlights that working effectively with AI is a skill. As more people start using these systems in their daily work, knowing how to guide them, evaluate outputs, and apply them in the right contexts becomes increasingly important. He also shares how his team at Scaled Cognition is tackling this challenge by building AI systems with fundamentally different architectures, focused on determinism and reliability — aiming to ensure systems follow rules, reflect underlying data accurately, and behave predictably in high-stakes, policy-driven use cases.
Dan's LinkedIn: linkedin/dan-klein/
Scaled Cognition Website: scaledcognition.com
Scaled Cognition LinkedIn: linkedin/company/scaledcognition/
Scaled Cognition X: x.com/ScaledCognition
00:00 Intro: Fluency vs Truth
00:34 Meet Dan Klein
02:53 Why Fluency Misleads
05:11 How LLMs Guess
07:30 What Is Hallucination
08:54 Deception and Alignment
11:22 Why Agents Break
12:48 Chaining and Determinism
16:01 When Hallucination Helps
22:33 Beyond Scale for Reliability
30:40 Synthetic Data Training
31:10 Enterprise Agent Use Cases
33:44 Healthcare Risks
39:13 Enterprise Literacy Gap
41:27 Delegation and AI Management
54:37 The Debrief
📜 Read the transcript for this episode: nobody-is-getting-new-manager-training-for-their-ai-team-with-dan-klein-uc-berkeley/transcript
Dan Klein: If you're working as a human with another human and you're trying to delegate to them, you do trust them to come back and say, well, I actually couldn't find this information for you, or, I got blocked, as opposed to, I couldn't find it, but here's my best wild guess, and I'm not gonna tell you it's a wild guess. That would not be good behavior from a human, but we see it all the time from machines.
The systems we've built, really, they are fundamentally systems designed to produce outputs indistinguishable from the truth. That's different than outputting correct answers.
They're fluent, they're confident. The parts we do understand look correct. We assume that everything else is correct, and that's not always true.
Hi, I am Dan Klein. I'm professor of Computer Science at UC Berkeley, and CTO at Scaled Cognition. I'm excited to talk to you today about hallucinations and reliability in AI.
[00:00:44] Jeremy Utley: Give us a sense for your background, and why somebody who's listening to this episode would go, ooh, this is one I can't miss. Like, where are you coming from in the world?
[00:00:54] Dan Klein: Well, I've been thinking about artificial intelligence for a long time, and my background is in natural language processing and human language. And so I've been thinking a lot about how we can build these sorts of systems. And so much has changed in the time since I started my research work. You know, when I was in grad school, the big problem in natural language processing was, like, finding the verb.
Well, since then, we've found the verb, and we've got other issues now. And a lot of the problems in artificial intelligence historically have come from systems working too poorly, the technology not working well enough. And a lot of the problems now are coming from this contrast between the ways in which it works very well, maybe even superhuman,
and then, of course, still the ways where there are gaps. And it's those gaps that really are still a problem. My personal interest right now is in trying to figure out how to make systems which are reliable and trustable, and that right now is a big gap.
[00:01:56] Jeremy Utley: And so I presume you're kind of alluding to what's known as the jagged frontier, where, you know, some of AI's capabilities dramatically outperform expectations and others dramatically underperform and disappoint, and the fact that there's jaggedness causes, perhaps...
Mm-hmm. Jadedness. I think one of the things I would love to learn about, if you have comments on it, is, I've heard anecdotally at least that more experienced individuals tend to be able to navigate the jaggedness of the jagged frontier. Do you find that to be true? What helps someone be a deft navigator of the weird, unpredictable capabilities of these models?
[00:02:41] Dan Klein: Yeah, that's a great question. I think ultimately that also comes down to really important questions we have to face as a society about digital literacy. The capabilities of the systems we're talking to are very different. So I think a good example of this would be something like search or machine translation.
If you think about the technology in, say, the two thousands, when you would enter something into a system like Google Translate and you would get a bad translation out, it would also look kind of bumpy and you could tell pretty quickly that the system isn't fluent and therefore it's maybe not accurate.
Or if you were doing a search in the sort of standard way we do search: you type in your query, and you get back results, and you can see, well, some of these are relevant, some of these are not relevant, and you go into that search process knowing that you're gonna have to be doing some filtering. Systems today hide a lot of that from you.
The systems are very fluent even when they're wrong. And when systems say things fluently that are wrong, and you've built up all of these instincts that fluency correlates to accuracy, it's very easy to not notice mistakes.
[00:03:43] Jeremy Utley: Now, define fluency there.
[00:03:46] Dan Klein: Fluency here is really about the appearance of truth and the smoothness of the language.
The systems we've built really are, fundamentally, systems designed to produce outputs indistinguishable from the truth. That's different than outputting correct answers. And that means there are a couple problems. One is that even the system itself doesn't know when it is outputting a correct versus an incorrect answer, when it's guessing.
And the reason for that is it's always guessing. It's just that sometimes it guesses right. And that puts a big load on us that we're not used to as users: the system is basically confidently and fluently giving us answers which are sometimes right and sometimes wrong, and you can't tell.
[00:04:35] Jeremy Utley: Why, why the load?
Because, as you just said... I actually love the idea of framing this as digital literacy, 'cause I don't think we've had a guest that really talks about it exactly like that, which is very cool. Now, you contrast it with Google, and you described what I think is very familiar to most of us, where we get a bunch of results and then it's incumbent upon us to sort through them.
Now, why is it any different with an AI? Is it the appearance of confidence that lowers our own inhibitions? Why is there a difference?
[00:05:11] Dan Klein: I think it's two things coming together. I think it's partly how the technology works. So if you go to something like ChatGPT: fundamentally, all these technologies, anything that's backed by an autoregressive next-token predictor, the way they work is, at their core, they're predicting the next token based on what's come before. They're completion engines.
And so if you take it in its kind of raw state and have it complete the sentence "the population of Berkeley is", what comes next? Well, the system, it's not a database. It's not like there's an entry or not, and it has a metacognitive awareness of whether it knows the answer. It's just a matter of what density is predicted over these next tokens, and some numbers will come out.
And because the system has such a generalized knowledge of language and context and many aspects of how the world works, the population's gonna be the kind of population a city would have. In fact, maybe it's seen enough webpages that it'll actually output the correct number, or maybe it'll output a plausible but incorrect number.
All you see is "the population of Berkeley is" and then a number, and you don't know whether it's right or not. There's no certificate of truth that comes with that. There's no process the system went through to determine whether it did or did not have that knowledge in some discrete way. There's only a claim, presented fluently and confidently, and that means the load is on you to figure out:
is this one of those times where it's fluent and correct, or is this one of the times where it's fluent and incorrect? As opposed to a lot of experience where you're like, all right, I'm gonna click on this link, half of them aren't right, I'm gonna look at it, I'm gonna check for signs: this webpage looks sketchy,
maybe it's not reliable; this translation's got a bunch of disfluencies, maybe other things are going wrong. And all of those cues that we've been trained to use to detect when the AI fails have been really taken away from us. So the combination of the underlying holes and the misalignment with our experience with past technology ends up being an issue.
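A minimal sketch of the completion-engine idea Dan describes, using a hypothetical toy distribution rather than a real model: the system just samples the next token from a probability distribution conditioned on the prefix, and nothing in the mechanism marks an output as known versus guessed. The tokens and probabilities below are invented for illustration.

```python
import random

# Toy next-token distribution for a single prefix. In a real LLM these
# probabilities come from a neural network; here they are made up to
# illustrate the mechanism, not actual model output.
NEXT_TOKEN_DIST = {
    "the population of Berkeley is": {
        "117,145": 0.35,   # a figure the model may have seen on the web
        "121,000": 0.25,   # plausible but possibly wrong
        "about":   0.20,
        "124,321": 0.20,   # plausible but invented
    },
}

def complete(prefix: str) -> str:
    """Sample one continuation token. There is no 'entry found / not found'
    step: the model always produces something, weighted by plausibility."""
    dist = NEXT_TOKEN_DIST[prefix]
    tokens, probs = zip(*dist.items())
    return random.choices(tokens, weights=probs, k=1)[0]

prefix = "the population of Berkeley is"
print(prefix, complete(prefix))
# Every run prints a fluent-looking answer; no certificate of truth
# attaches to the correct one versus the merely plausible ones.
```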
[00:07:16] Henrik Werdelin: Maybe that's a good segue to your startup, I guess, if you can call it a startup these days, I dunno what the terms are. But you're talking about not just minimizing hallucinations, but completely eliminating them. So how can this technology do that?
[00:07:30] Dan Klein: I think the best place to start that answer is to talk about why the current technologies
have hallucinations, which really starts with: what even is a hallucination? We talked a little bit already about how a standard transformer model is basically designed to predict the next token, and the next one, and the next one, and the next one. As humans, we like to apply judgments to these systems: oh, that was true, that was correct.
We can talk about, oh, that was a hallucination. What does that mean? Right. These are judgments that we apply that maybe don't fit the technology. Or we talk about the system deceiving you. Like, what is a deception? These systems today are fundamentally just probabilistic systems designed to output plausible continuations.
They output plausible next tokens, and they are, in as many ways as possible, gonna have the trappings of truth. They're gonna be linguistically fluent. If two words are correlated, they're gonna take those correlations. And, for example, in a RAG system, they're gonna be looking at a retrieved piece of information that maybe they'll choose to keep intact.
And so in practice, with these systems, often the tokens output are correct, and often they're not correct, and you can't tell 'em apart. So we call it a hallucination because it's confidently wrong, but to the system, this is all just its natural operation. We could talk a little bit about deception and what that means.
It's another topic.
[00:08:58] Jeremy Utley: well, it's kinda malicious, right? I mean, deception to me implies mal intent. Is that fair?
[00:09:05] Dan Klein: Yeah. As, as humans, when we talk about deception, that involves an intent to deceive. Um, but this really gets into the topic of alignment, and it's very, very easy for systems to become what we, as humans might call deceptive.
Sometimes these labels fit and sometimes they don't. So, for example, let's imagine that we're a shipping company and we wanna build an agent that's going to answer questions about package status. And you call up and you say, hey, where's my package? Now, there are a lot of ways you could build this sort of system today.
One thing you might do is you might decide to take whatever system you've built and optimize it through reinforcement learning. When you train a system through reinforcement learning, you give it some metric and you say, your whole job and your action selection is to do well on this metric. And so maybe you tell it, okay?
Your job is to get a high net promoter score from, from our customers, which makes sense. You're trying to make your customers happy and, and so you tell it to optimize. And the system over its operation comes to learn that people actually do not like being told that their package has been lost. And in fact, they much prefer to hear that it's arriving tomorrow.
So you say, where's my package? And it's actually lost, but the system tells you, oh no, it's actually gonna be there tomorrow.
[00:10:20] Jeremy Utley: 'Cause it's seeking the reward of a high NPS.
[00:10:25] Dan Klein: It is doing exactly what you told it to do, which is to choose actions which make the customers happy. And so then you can get into this process of saying, oh, well, you know what, maybe I didn't mean make the customers happy at all costs.
Maybe I meant... and now you're in this very, very hard problem of trying to specify exactly what the system should be optimizing, how it should trade off truth and happiness. And is a system that makes that error, that says your package is coming when it's not, is that a hallucination? Would it be reasonable to call it
a deception? Well, it feels like it might be deceptive, because in a human that sort of action would be characterized as deception. What it really is, is just efficiently optimizing an objective which maybe isn't what anybody really wanted.
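To make the shipping-agent example concrete, here is a toy sketch (all numbers invented) of why optimizing a satisfaction-only reward can select a dishonest policy: if customers rate "arriving tomorrow" higher than "lost", the lying policy wins under that metric.

```python
# Toy reward-misspecification example. The NPS-style scores are invented
# for illustration; the point is only that the truthful policy loses
# under a satisfaction-only objective.

# Probability the package really is lost.
P_LOST = 0.10

# Hypothetical immediate satisfaction score for each message.
SCORE = {"arriving tomorrow": 0.9, "your package is lost": 0.2}

def expected_reward(policy):
    """policy maps the true state to the message the agent sends."""
    return (P_LOST * SCORE[policy["lost"]]
            + (1 - P_LOST) * SCORE[policy["in_transit"]])

truthful = {"lost": "your package is lost", "in_transit": "arriving tomorrow"}
always_happy = {"lost": "arriving tomorrow", "in_transit": "arriving tomorrow"}

print("truthful policy reward:    ", expected_reward(truthful))      # 0.83
print("always-happy policy reward:", expected_reward(always_happy))  # 0.90
# The 'always happy' policy scores higher, so reward maximization
# prefers it, even though a human would call its behavior deception.
```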
[00:11:11] Henrik Werdelin: Is this why, I think I read somewhere, you said that people building agents on top of the foundation models is kind of a fundamentally broken approach? Is this the core reason?
[00:11:22] Dan Klein: Yeah, there's a lot. There's a big activity out there in industry of taking these sorts of foundation models as they are today and building a thin wrapper on top of that to build an agent. And that hasn't really worked very well.
And I think what you're pointing out is exactly right, that these are the the reasons. So you have these systems which are inherently non-deterministic, and if you call up and say, you know, maybe you're calling up your package agent or you are trying to get a refund or change your flight, or something like that, you aren't really looking for a system that uses all of the incredible breadth and strength of today's frontier models.
Like, you don't want that it knows a lot about quantum physics and can give you your answer in iambic pentameter. What you want is for it to truthfully and reliably reflect what is in the database, and when it says it did something, you want it to actually have done it. And those are the places where current systems are weak.
And so you have this misalignment where the systems that people are deploying are strong in ways that they don't even really want, and they're weak in ways they need. And this is fundamentally because they're building on a soft, probabilistic technology. It is hard to build a reliable and deterministic technology out of non-deterministic elements.
[00:12:48] Jeremy Utley: Is the hack there to kind of chain it together with deterministic, rule-based kinds of things, whether it's automations or triggers or things like that? Or is there a more foundational workaround that you're advocating or building?
[00:13:02] Dan Klein: Well, the most common approach out there is exactly what you said. Um, the most common approach is to chain things together.
And if you think about that: you have some system, and it is gonna do something noisy that's not reliable, so you can then bring in a second large language model with instructions to check the first one. And as the joke goes, now you have two problems, because you've got noisy systems checking noisy systems. You get a cascade, or you run, you know, 15 of these in some kind of constellation.
And any of them can make mistakes that might get caught or might not. And you pay this high price in latency. You have to run a model to check a model, to check your model, which then gets checked by some other model. It takes a long time. It burns a lot of tokens, which is great if you're in the business of trying to like use as much computation as possible.
But if you want small, efficient systems that get it right in the first place, this is not really going in a good direction. What we do at Scaled Cognition is instead build models whose fundamental operation is different, and that come along with a big class of determinism that we can guarantee because of how the model is structured and how it operates.
And you mentioned chaining together with rules. Another approach that people use today is they take this LLM, which really, as an artifact, as a thing we've built, is incredible in its potential breadth. You can ask ChatGPT about anything. I mean, the answer you get back will be confident and may be wrong, but you can kind of ask it about anything. And now you have the system and it's hallucinating.
It's maybe telling you it, you know, canceled your flight, but it didn't, or the other way around. And you are trying to get the system to do something reliable. And the instinct people have, and the only real tool they have, is to just squeeze down its domain until it's doing almost nothing, like the deterministic rules you're talking about.
You say: all right, LLM, with this incredible power, all you get to do is decide whether the user said they want to talk about payments or billing. And that's it, that's all you can do. And so you've got a system that's being asked to do only this small thing that it's bad at.
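A sketch of the squeezed-down wrapper pattern Dan is describing, with a hypothetical `call_llm` stub standing in for whatever model API is used: the full generative model is reduced to picking one of two intent labels, and everything else is hard-coded routing.

```python
# Hypothetical wrapper that uses a general-purpose LLM for one tiny
# decision. `call_llm` is a stand-in for a real model API call.

ALLOWED_INTENTS = {"payments", "billing"}

def call_llm(prompt: str) -> str:
    # Placeholder: imagine this sends the prompt to a hosted LLM and
    # returns its text. Here we fake a response for the sketch.
    return "payments"

def classify_intent(user_message: str) -> str:
    prompt = (
        "Answer with exactly one word, 'payments' or 'billing'.\n"
        f"User said: {user_message!r}"
    )
    raw = call_llm(prompt).strip().lower()
    # Even squeezed this far, the model can drift off the label set,
    # so the wrapper still needs a deterministic guardrail.
    return raw if raw in ALLOWED_INTENTS else "unknown"

print(classify_intent("I was charged twice for my subscription"))
```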
[00:15:19] Jeremy Utley: It's like an 18-wheeler to deliver one letter, or having a Porsche but you only push it down the road.
You know... I'm trying to think of what the word is. It's like an 18-
[00:15:31] Dan Klein: wheeler delivering one letter, where you've been told the most important thing is you deliver it silently. And so you're like, all right, we're gonna go really slowly, maybe it'll be silent. And the thing is, you're wasting the volume that this 18-wheeler has.
You're wasting its power and there's a better solution out there, which is get somebody on a bike, it'll be silent. And what we are doing is building models which have completely different control and performance profiles that line up better with this industry need for determinism
[00:16:01] Jeremy Utley: Given that... because to me it seems we should go into
what the strengths are, and maybe what the trade-offs are, of the Scaled Cognition model. Just for the record, or just to kind of help us put things in perspective: given the flaws that you've described with LLMs, and, you know, this amazing horsepower, 18-wheeler, et cetera, what do you think they're good for?
Do you think they're good for anything? And if so, what are the use cases where you go, of course you should be using the horsepower here?
[00:16:33] Dan Klein: Yeah, I think they're good for a lot. I mean, the key is really in the name. These are generative AI systems. They're good at generating, they're good at generating content. And people complain about hallucinations
in those cases where reliability is important and accuracy is important. But in many of the cases where these systems are the strongest, where they were developed, where they took off first, the hallucination was the product. And so, for example, if you're Midjourney and somebody comes to you and says, hey, I'd like a picture of a mouse holding a balloon, right?
Not only is the whole purpose to get something new and creative that matches what you've asked, it's actually very important that it not replicate something it's seen. You don't want a copyrighted picture of Mickey Mouse, right? And so what we would call hallucination in another context, that kind of confident and fluent generation, which in this case has to do with visual fidelity and the plausibility of the elements of the image,
is here exactly what you want: the fluency with the creativity. If you go to ChatGPT and say, give me an idea for a short story, you don't want something that Isaac Asimov wrote, right? You want something that is new. And actually, part of the problem is you can't even tell if it's new. It's new to you, but how do you even tell?
And so in all of these cases, you're asking for the power of creativity of generation. You're asking almost to be guaranteed a hallucination. And then you take the same technology and you turn around and say, actually, alright, I changed my mind. Now I want you to only do accurate things. I want you to only reflect the state of the database and I want you to follow these rules precisely.
And they're just not good at that. They're not built for that.
[00:18:16] Henrik Werdelin: I guess, just to be a little bit nerdy, but not too nerdy, because I'm not that nerdy.
[00:18:22] Dan Klein: Yeah.
[00:18:22] Henrik Werdelin: I read somewhere that you only use synthetic data, and as I understand it, AlphaGo is the only other kind of operation that does that. So I'm curious about this idea of training only on synthetic data. And the other thing I was interested in, as we're talking about how you build it, is that, as far as I can see, you haven't raised that much money, and yet you're building this very powerful model.
It seems powerful, like it can do what it's supposed to do, but you're doing it with a relatively small team. And so...
[00:18:52] Dan Klein: Yeah. It's an exceptional team, I should point out.
[00:18:54] Henrik Werdelin: How do you square this idea that there seems to be this kind of complete arms race, where everybody's trying to hire as many people as possible, putting as much money as they can into these models?
And here you're building this kind of, what seems to be, quite unique model, but with less money, only synthetic data, and, you know, not that many people. Could you just talk a little bit to that?
[00:19:19] Dan Klein: Yeah, I'd be happy to. So lemme talk about a couple things 'cause there's a bunch of interesting questions in there.
One is the question of scale: what things come from scale and what things don't. And then I can talk a little bit about synthetic data, 'cause I think that's a really important question. And you mentioned AlphaGo, which, by the way, is definitely not the only case of synthetic data or reinforcement learning in AI.
But I think it's actually a great contrasting example for talking about where reinforcement methods and synthetic data generation work and where they're hard. So, to talk a little bit about scale: it feels like these systems went from knowing nothing about the world, and not really being able to do anything contextually sophisticated with language, to just seeming to know everything that people know, in such a short amount of time.
It feels explosive. And, you know, obviously in terms of the artifacts, it is: these systems are vastly more broad and flexible and contextual than they were a few years ago. But one thing that I think is important to notice is that what's driving them is really the web, right? They are distillations of all of this information that humans have written down about everything that humans find worth talking about, which is basically everything we know.
And the web did not spring up in the last few years. It's been slowly accreting for decades. What we found is a way to compress that in a queryable way and a remixable way. But the initial scaling, the sort of explosive growth of systems that just seemed, every iteration, so much smarter: a lot of that just came from being able to tap into that data, which partly meant scaling up to use it all,
'cause there's a lot of it. It meant making the models big enough that their parameters could hold the information that was being fed into them. It meant getting enough compute that you could do the translation between the declarative data you're training on and the appropriate learned representations and the weights and all of that together unlocked the potential of that data.
But, you know, people talk about data walls. There is a data wall: we only have so much Wikipedia, and eventually you can only extract so much juice from that orange. And that's why you're seeing that systems, as they scale up, get diminishing returns. And when people look at technologies at the beginning, it always looks like this.
Everything always looks like an exponential curve, and people have a tendency to look at that and assume it will continue forever. But those exponential curves are almost always S-curves, right? It always looks like an exponential; it always turns out to be an S-curve. And the next step of progress is switching on to some new idea.
And we can talk a little bit more about how that's happened historically in AI, but I think that's where we're at now: you get to the point where "train on more web" worked until we can't train on more web, 'cause there's no more web. What are we gonna do now? Are we gonna do some reinforcement learning?
Are we gonna do some reasoning models? And there are a bunch of ideas out there.
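A small worked example of the S-curve point: early on, a logistic (S-shaped) curve is numerically almost indistinguishable from an exponential, which is why extrapolating the early part of a scaling trend is misleading. The parameters here are arbitrary; only the shape comparison matters.

```python
import math

# Logistic (S-curve) vs. pure exponential with matched early growth.
K, R, T0 = 100.0, 1.0, 8.0   # ceiling, growth rate, midpoint

def logistic(t):
    return K / (1 + math.exp(-R * (t - T0)))

def exponential(t):
    # Matches the logistic's early behavior for t well below the midpoint.
    return K * math.exp(R * (t - T0))

for t in range(0, 17, 2):
    print(f"t={t:2d}  logistic={logistic(t):8.2f}  exponential={exponential(t):12.2f}")
# For small t the two track closely; past the midpoint the exponential
# explodes while the logistic saturates at the data ceiling K.
```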
[00:22:33] Henrik Werdelin: Is the world model currently, kind of, the best available path forward, or do you think there are other paths that should be taken as seriously as that one?
[00:22:45] Dan Klein: I think there are a lot of ideas, and they're good for different things. To the extent that you want to continue on
unlimited scale at unlimited cost, like intelligence through capital incineration: sure. The next big thing after the text web is the video web, and let's try to crunch that down, and then all the telemetry from all the Internet of Things, and more and more data. And that is a thing you can do to continue scaling up.
One of the things that we've seen, though, is that for making systems reliable, for making systems follow policies, for making them able to guarantee certain properties, scaling up in this way is an incredibly inefficient way of achieving that, and it hasn't been particularly successful.
So I think if what you want is a better awareness of, you know, the spatial world around us, the ability to reason about complex physical mechanisms: sure, definitely. But if what you want is reliability and determinism, I think you want a different set of techniques. So one of the things that I think we're also seeing out there
is models that are constructed in different ways, that are able to have different performance characteristics. That's the space where we fit in with our models. You don't need to learn in this very expensive way. So, for example, let's imagine that you wanted to learn French, and you were going to learn it just by reading books.
And you were picking up books written in English, and you would notice, oh, here's some French, this character speaks a little French. And you're devouring thousands upon thousands of books and picking up these little bits of French as you go. Well, eventually this would work. You would eventually see a whole lot of French. But, you know, pick up one or two French books and you're gonna be further along.
And so the, the sort of data efficiency or sample complexity that is associated with a given kind of data, a given kind of model, a given learning mechanism, can give you vastly different performance curves. Just because you can get there in the limit of infinite everything doesn't mean there's not a much, much better way.
[00:24:45] Henrik Werdelin: So, for practicality, for people sitting out there building stuff, where most people, I would imagine, use Claude, Gemini, whatever. Mm-hmm. You think that there should already be more of a strategy of saying: let's just look at what the world of models looks like, then figure out what the exact use case we have for our AI is, and then decide if there's a better model.
And I, I'm not sure that that is even happening right now. Is that a fair assumption?
[00:25:13] Dan Klein: I think for some things that is absolutely the right strategy: to say, what performance characteristics do we need? What do we not need? What kind of model exhibits those performance characteristics with the optimal efficiency, be that data efficiency, or compute efficiency at deployment, or availability of compute nodes, or whatever it is that is the constraint that you care about. And
there are gonna be some problems that are best solved by just bigger and bigger models trained on bigger and bigger things. Those are problems that involve breadth, problems that involve sort of very complicated contextual understanding. But where you have problems of determinism, reliability, truth: that is not the best attack we have today.
Which doesn't mean people shouldn't be doing it; that's why people are doing it. The point is that the pursuits of unreliable general intelligence and reliable specialized intelligence are gonna require different methods.
[00:26:06] Henrik Werdelin: I'm still curious about the small team thing.
[00:26:11] Dan Klein: Mm-hmm.
[00:26:12] Henrik Werdelin: It seemed to be, from an entrepreneurial point of view, it seemed to be such a big opportunity that you can find a specific area where you can create a specific model.
You're also not that tiny, right? 'Cause you're basically saying, I'd like to make a model for people; small is relative. Yeah. So, like, I understand that. But I'm just curious about the opportunity, from an entrepreneur's point of view, of now looking at this: you can actually build a foundational model. You can't compete with the OpenAIs of the world, because that race is probably done, but you probably could go out and find a bunch of use cases where a specific model will be very useful, and then go grab that land.
[00:26:52] Dan Klein: Yeah. I think because of the successes on the axes that benefit from scale, people are very much now thinking about scale, scale, scale. And you know, obviously that requires a ton of capital, a ton of compute. It requires big teams because anything that's scaled up requires a whole bunch of support structure.
But again, our models work in different ways, and the focus of our model is not extracting sort of the full breadth of human knowledge from the information as found. As you mentioned before, we're working on a specific class of interactions, which I would characterize as interactions where, on one side, you have a person who has all of the contextual situation that is relevant to human language: having a conversation, referring back to things that have happened before.
On the other side, you have a set of functionalities, you can call 'em tools or APIs, that have logic behind them, and ways they can be chained together, and semantics that govern what the flow of information through those tools means. And it's the person talking to this orchestration of backend functionalities.
And that is a huge class of interactions that share a bunch of properties. They need to be reliable. Like, if you say you want three tickets and it says, here's three tickets, but it's secretly only booked one, that's bad. You're gonna find out; there's gonna be a high cost to that. And on top of that, from the standpoint of the APIs, there are gonna be policies and rules and ways in which these things can and should be used, and those rules may change, and they need to be specified in ways that are changeable and declarative.
And so for this kind of interaction, we just noticed that the existing models are not only expensive, they're not very good at it. And being able to build a model that is better: there's an upside to it also being smaller, but you also just can build a model that's better when you architect that model fundamentally to be structured around these sorts of operations.
And that's the approach we took. I think it's really important. Right now there are really two kinds of companies dominating the market, just in terms of the number of companies operating in these ways. One is the companies that are very, very big; they're doing everything at, like, the most massive scale imaginable.
And there are companies building these thin wrappers, where, like, what is the technology there? It's probably some prompt. And I guess you have a very appropriate title for this podcast: I think we are trying, really, as a community, to figure out what is beyond the prompt here.
And for us, that is models that have additional control surfaces, that have additional performance characteristics: reliability, the ability to guarantee certain kinds of behaviors. I mean, in building things as a society, one of the most powerful tools, if you teach CS 101, right? The most powerful tool we've had historically for building large, reliable systems has been modularity.
The ability to take pieces, work on them independently, and say: this piece, we're gonna work on it, but we're gonna guarantee you that this kind of input produces this kind of output. And there's a contract, and there's an abstraction. And this has been one of the biggest challenges in this AI age: LLMs come with absolutely no contracts beyond "you will get tokens out
if you put tokens in." And this is one of the key things that it was clear to me needed to be different for a specialist model that was gonna be deterministic. There needed to be guarantees that you could make about what goes into the control surface. You need to be able to say how that relates to what comes out on the other side, if you want to be able to build reliable things out of it.
So we started there, and we started thinking about what kinds of systems you could build. That led us to how they're structured, what kind of data you need; and now we're into the synthetic data training. And it turns out that the amount of data you need to get a certain behavior, that sort of sample complexity question, can be different by orders of magnitude.
It's like learning French from a French book versus incidental French usage in English novels. It's just a very different scale characteristic, and it's a better operating curve to be on. That doesn't mean the other approach doesn't bring you gains in different cases.
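To illustrate the modularity point, here is a sketch, in ordinary Python, of the kind of input/output contract a classical software module gives you and a bare LLM does not. The schema, policy threshold, and checks are invented for illustration; this is not Scaled Cognition's actual interface.

```python
from dataclasses import dataclass

# A classical module contract: typed input, typed output, and a checkable
# guarantee. Illustrative only, not any vendor's real interface.

@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class RefundDecision:
    order_id: str
    approved: bool
    refunded_cents: int   # contract: 0 if not approved, never more than asked

MAX_AUTO_REFUND_CENTS = 5_000  # example policy threshold

def decide_refund(req: RefundRequest) -> RefundDecision:
    """Deterministic module: same input always yields the same output,
    and the output provably satisfies the stated contract."""
    approved = 0 < req.amount_cents <= MAX_AUTO_REFUND_CENTS
    return RefundDecision(req.order_id, approved,
                          req.amount_cents if approved else 0)

decision = decide_refund(RefundRequest("A-123", 2_500))
assert decision.refunded_cents <= 2_500        # the abstraction holds
print(decision)
# A raw LLM offers no equivalent: tokens in, tokens out, with no schema
# or invariant you can assert against.
```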
[00:31:10] Jeremy Utley: Can you tell us, maybe just to make it very easy to imagine, what's a quintessential deployment?
If you look at a textbook deployment of Scaled Cognition: who's the user, what are they trying to do, what's the impact to their workflows or life or business?
[00:31:33] Dan Klein: That's a great question. So I would say the textbook deployment is, again, that class of conversations: put abstractly, conversations between a person on one side and a bunch of APIs on the other.
This might show up, for example, in an enterprise-to-customer setting, where maybe it's a customer support kind of thing, or, you know, changing a flight, changing a hotel, making a purchase, getting a refund, whatever, things like that, where the person comes with all of this context, and everything they wanna express is in language.
And on the other hand, there's like a whole bunch of APIs that can handle this. Like, who are you, how are you gonna get authenticated? What's your purchase history? Okay, what exactly are we talking about in your, you know, account balance or whatever it is. And navigate all of that stuff in accordance with policies.
For us, our typical partner is gonna be an enterprise. They want to build an agent; this is fundamentally an agent model. They wanna build an agent, and it's very important to them that the system be reliable, policy-compliant, and also secure in a variety of ways. We didn't talk about this, but the way we've built our system lets us make a lot of kinds of information compartmentalization guarantees about where that information will go and what can and can't be leaked through training data, things like that.
And our system has a bunch of advantages there. Our best customer is some enterprise that cares a lot about not making mistakes, not having things like hallucinations or policy violations, and where, essentially, the conversations they wanna automate are high stakes to get right.
So far, pretty much all the enterprises we talk to feel like it's important to get things right and that their conversations with their customers are high stakes. But if you think about finance or healthcare, cases where not only do you not want to mess up for your customers, but where the consequences might be health consequences or financial consequences or regulatory consequences, then there's an even greater sensitivity to wanting to make sure that the systems are doing what you've instructed them to do, that you have audit traces for that, that you have control surfaces, and that you can change the behaviors if you need to, and so on.
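A sketch of the person-talking-to-APIs shape Dan describes: a hypothetical agent loop where every tool call passes through an explicit policy check and leaves an audit trace. The tool names, policy rule, and data are invented; real deployments would be far richer.

```python
# Hypothetical agent-to-API loop with policy enforcement and an audit log.
AUDIT_LOG = []

def tool_lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "lost"}  # canned backend data

def tool_issue_refund(order_id: str, cents: int) -> dict:
    return {"order_id": order_id, "refunded_cents": cents}

POLICY = {"max_refund_cents": 5_000}

def call_tool(name, fn, policy_ok, **kwargs):
    """Every call is policy-checked and audited before it runs."""
    if not policy_ok:
        AUDIT_LOG.append((name, kwargs, "BLOCKED BY POLICY"))
        return {"error": "blocked by policy"}
    result = fn(**kwargs)
    AUDIT_LOG.append((name, kwargs, result))
    return result

# "Where's my package? If it's lost I want $80 back."
order = call_tool("lookup_order", tool_lookup_order, True, order_id="A-123")
if order["status"] == "lost":
    cents = 8_000
    ok = cents <= POLICY["max_refund_cents"]   # fails: over the threshold
    print(call_tool("issue_refund", tool_issue_refund, ok,
                    order_id="A-123", cents=cents))
print(AUDIT_LOG)  # audit trace: what was attempted, what was allowed
```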
[00:33:44] Henrik Werdelin: Why do you think that OpenAI, when they launched their healthcare GPT recently... what would be their way of thinking about this? 'Cause clearly they think that their system will work just fine doing it.
[00:33:58] Dan Klein: So first of all, you also have this issue that any company with a big, you know, hammer is gonna go treating everything like a nail, right?
And so you can absolutely take a generalized probabilistic intelligence and try to do something specialized with it in the same way that I can train a person to compute square roots, but my calculator will do it faster and more accurately and with a whole lot less energy use. And so, depending on the problem you're trying to solve, there're gonna be multiple approaches to it.
And constellations of non-deterministic models: clearly people are trying that now. You actually do see a lot of news articles about these sorts of things face-planting, either because they're not reliable, or because, instead of following your refund policy, they follow some refund policy from Reddit in, you know, 2005. Or you get them off topic, and suddenly your customer service agent is talking about something you absolutely do not want screenshotted and shared.
And so there are failure modes to these systems. But, you know, certainly you can chain these things together and make a go at it. I just don't think that's gonna be the reliable way. It's certainly not the obvious way to me. It's just, if that's the only tool you've got, that's what you do.
And for a company that either is building these big models or is wrapping them, that's the approach they're gonna take. And for what it's worth, if you're a big model company that sort of sells by the token, you're probably okay with mechanisms that require spending tokens to check the other tokens, to check the other tokens.
Like, it's just token use. Same thing, by the way, when people talk about reasoning models. Reasoning is a big and maybe ill-defined class of models, but a canonical kind of reasoning model, just to give a caricature, is: run the thing 10 times, and then look at what you've got and pick the best one.
That would be a kind of reasoning for, maybe, a problem that's like a lock-and-key verification kind of problem. If you're the person who is paying for 10 times the compute, you don't love that as a solution. If you're the person who's being paid per compute, this sounds amazing. And so I think some of these factors are at play too.
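Here is that caricature as code: a best-of-n wrapper, with a stubbed model and scorer, that runs the same query ten times and keeps the best-scoring sample. Both `run_model` and `score` are stand-ins; the point is the economics, since compute scales linearly with n.

```python
import random

# Caricature of a best-of-n "reasoning" wrapper. The model and the scorer
# are stubs; the point is the 10x compute, not the quality of the answers.

def run_model(query: str) -> str:
    return f"answer-{random.randint(1, 100)}"   # stand-in for an LLM call

def score(answer: str) -> int:
    # Stand-in verifier (the "lock and key" check). Here: just the number.
    return int(answer.split("-")[1])

def best_of_n(query: str, n: int = 10) -> tuple[str, int]:
    samples = [run_model(query) for _ in range(n)]   # n full model calls
    best = max(samples, key=score)
    return best, n   # you pay for n calls, keep one answer

answer, calls_paid = best_of_n("change my flight")
print(f"kept {answer!r} after paying for {calls_paid} model calls")
```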
[00:36:07] Jeremy Utley: That's hilarious. The unreliability is actually a feature for the model provider if it's being paid on the number of at-bats.
[00:36:16] Dan Klein: Totally. Absolutely.
[00:36:18] Henrik Werdelin: When you teach students about AI...
[00:36:22] Dan Klein: mm-hmm.
[00:36:23] Henrik Werdelin: What is the most important thing you think that they should know as they leave the course?
[00:36:31] Dan Klein: Um, if I had a compact answer, my course would be a lot shorter.
Um, I think that answer has changed. When I started teaching AI, you know, something like 20 years ago, one of the things that we did early on is we showed this checklist of things that humans can do, including, like, playing chess or going to the supermarket. And we'd sort of ask the class, like, okay, raise your hand.
Do you think a computer can do this? Do you think a computer can do that? Can a computer play Go? Can a computer drive a car? And in the past 20 years, it went from mostly no to mostly yes. And so at the beginning, we would really focus on these core ideas of, like, what is AI?
What are the kinds of problems, you know: deterministic versus non-deterministic, adversarial versus cooperative, single-agent versus multi-agent. And we would talk about these different specialized kinds of problems, which required specialized solutions and specialized representations. Back then, the reason why you would have some people working on computer vision and other people working on natural language processing and other people working on robotics was because getting any of that to work required incredibly specialized representations,
incredibly specialized algorithms, different kinds of data, different kinds of learning. You know, everything was different. And we would focus on understanding that kind of breadth, and understanding what unified it, which at the time was this notion of: what is artificial intelligence, what is an agent? An agent is a system that makes optimal decisions, given its information, towards its objective function.
And we talk about that, and we still do all of that, 'cause that's all still relevant. But now a couple things have happened that are interesting. One, which you could talk about, is that there's a lot more uniformity to AI. As a natural language person, I think it's great that we've decided that language is kind of a good operating system for AI. But very much now, if not an LLM, then at least the underlying kind of transformer technologies are being used very broadly.
Of course, one of the things we talk about now is that sort of thing. But one of the things that I think is very important now, that we didn't talk about before, is to start getting into these large-scale societal trends. We started talking about digital literacy. AI is gonna have a huge impact on how people learn, how people work, what jobs are available for people.
And when the key problem in AI was that nothing worked, except maybe some game playing here or there, we spent a lot less time on that than now, when the technology's downsides are... like, we could have a whole class on those sorts of things.
[00:39:11] Jeremy Utley: On literacy itself: yeah, I think I saw a stat, actually, from an online education platform, so perhaps slightly polluted as a source of truth, but something like 1% of enterprises are investing in skills.
You could say digital literacy or AI literacy. Why do you think that is? I mean, 90-ish percent are investing in the tech, but only a small fraction seem to be seeing that there's a literacy problem here. Why is that? Is it, again, going back to the "objectively, it seems like it works" kind of a thing?
[00:39:51] Dan Klein: I think there are a couple things going on. I think there's more, like, FOMO attached to missing the wave on the technology. Enterprises are being transformed; you don't want to be the only one that's not, and that feels like it's about the technology. I think people have underestimated the skills and human training aspect of this: whatever technology there is, you can get more or less out of it depending on how humans engage with it.
I think also people, just like society as a whole, have underestimated how poorly the instincts we have for digital literacy translate, right? Knowing how to find information with search is actually pretty different than knowing how to validate information that comes out of a chat system like
ChatGPT. One way to think about that is that people who were doing a lot of writing or researching are now doing something that looks more like review or editing. And the difference between being a writer and being an editor is really big. But if you're used to being a writer, and now some large language model is writing your email for you, you don't think, ah, now I'm an editor, I don't have those skills. You think, oh, it just did my job for me, I can just press send.
I don't have those skills. You think, oh, it just did my job for me. I can just press send. And I think it's gonna take people time because this technology has appeared so quickly. I think it's gonna take people time to realize that for all the skills that maybe are less necessary, there's an equal number of skills that not only are.
very necessary, but we're not good at teaching them. We may not even have good names for what they are, right? What is the skill of taking a fluent-looking output and distrusting it?
[00:41:32] Jeremy Utley: Well, you know, one, one skill that's a little bit higher order, but I think is a similar kind of a challenge is the skill of delegation and, or, or you could even say management.
You know, I think it's funny, for all that we talk about AI as assistants, how few people have actually ever had an assistant, right? So how do I work with an assistant? Well, I'm learning with this chatbot. Like, I'm getting my on-the-job managerial training with a chatbot that's sycophantic. Right?
That's not a recipe for success, you know? Absolutely. But people aren't going to AI literacy training in the same way that there's new manager training, right? Yeah. How do you delegate? How do you verify? How do you check people's work? How do you mentor? Those are things that typically professionals learn over the course of a career.
And now we're all given intelligence on tap. And the problem is actually people don't know what to do with an assistant, let alone a capable, you know, junior employee. Right?
[00:42:35] Dan Klein: Yeah. And the optimistic take on this would be, well, people will learn these things over the course of a career; it just hasn't been a career's length of time that we've been working with these systems. And there is more going on, because,
I think, if you're working as a human with another human and you're trying to delegate to them, you do trust them to come back and say, well, I actually couldn't find this information for you, or, I got blocked. As opposed to: I couldn't find it, but here's my best wild guess, and I'm not gonna tell you it's a wild guess.
That would not be good behavior from a human. But we see it all the time from machines. So I think there is the problem that you mentioned, which is that people may not have the skills to delegate and manage humans. And then there's the additional layer that these systems do not act the way a human does in a delegation context.
Right. Especially as far as the sort of metacognition,
[00:43:27] Jeremy Utley: Say more about how the system doesn't act as a human does in the delegation context, 'cause it was a little garbled, at least for me. I just wanna make sure, 'cause I think that's a really important point to me.
[00:43:37] Dan Klein: I think a lot of this boils down to something called metacognition, which, I think, is what you'd point to if you had to put a finger on what systems don't have today.
We've been talking a lot about reliability, which is ultimately the feature they lack today: determinism, reliability, whatever you wanna call it. If you think about them as cognitive systems, the thing they're lacking is metacognition. In humans, we don't just think; we think about those thought processes.
When you ask me a question, I stop and I think: do I know the answer? And maybe I do, or maybe I don't. If I don't know the answer, I may make a decision to bluff. I may make a decision to just keep quiet, or to change the topic. Like, I get to decide what to do about that lack of information. But the fact that I have an explicit representation of whether or not I have the knowledge, knowledge about knowledge, cognition about cognition: this is metacognition. Systems don't have that.
Back to the example of, you know, what's the population of Berkeley, right? It's just cranking out tokens if that's coming from the parameters. Whereas a database would be different. With a database, you would do the query, and either you get the answer and you display it, or you don't get the answer and you say: entry not found.
So the database does not have the breadth and the contextuality and all of those other kinds of sophistication the AI system has, but it is, in some sense, more metacognitive. It knows whether or not the information is present. And ultimately, you know, full intelligence requires both, and LLMs today lack the metacognition.
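The database contrast in miniature: a lookup either returns the entry or explicitly reports that it is absent, while the generative stand-in below always produces something. The population figures are placeholders, not verified data.

```python
import random

# Placeholder table; treat the numbers as illustrative, not verified.
POPULATION_DB = {"Berkeley": 117_145}

def db_lookup(city: str) -> str:
    """Metacognitive in Dan's sense: it knows whether it knows."""
    if city in POPULATION_DB:
        return f"{city}: {POPULATION_DB[city]}"
    return f"{city}: entry not found"

def generative_guess(city: str) -> str:
    """No lookup step, no 'not found' state: it always emits a number."""
    return f"{city}: {random.randint(50_000, 200_000)}"

for city in ["Berkeley", "Springfield"]:
    print("db :", db_lookup(city))
    print("llm:", generative_guess(city))   # fluent either way
```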
[00:45:02] Jeremy Utley: So, so this is, and to your point about the person in the delegation relationship, they would, what you're saying is they have the metacognition or the self-awareness to say information not found right? Yeah. Effectively. Right. So a manager comes to me, Hey, can you do this? You go, I studied biology, not physics, or whatever, right?
Like, I don't know how to do that right now. I just wanna push back a little bit, or at least explore the resistance. I had an experience just yesterday with Claude, as an example, which can kind of serve as that "information not found." Because I read Ethan Mollick's post about giving Claude Code an assignment to generate a new, you know, thousand-dollars-a-month business, something like that.
I just thought that was kind of fun. I grabbed his prompt, dropped it into Claude Code, and it was kind of interesting, 'cause Claude, you know, immediately came back to me: Jeremy, I've gotta level with you. There's no such thing as a business that generates a thousand bucks a month with no effort. Now, what I can do is this. And to me, if it were purely a function of sycophancy and next-token prediction, I think it would just probably optimistically say, oh, you could do this.
The fact that it kind of pushed back... to me, that was an example of pushing back. How do you square that example with the definition we've kind of been working with of what language models do?
[00:46:26] Dan Klein: So I would say there are three levels where what you're talking about happens. And what I've been talking about is sort of the caricature, the simplest version of a completion engine.
Systems are evolving, right? And systems aren't just purely trained to produce mimicry of web text. There are additional steps of training. There are things like alignment training, instruction training. If you do something like RLHF, you are showing the system: okay, in this situation, I don't like that you did this; I like this one better.
And that training does happen. What does that training teach the system to do? Well, it teaches whatever you told it it was supposed to be doing. And that certainly governs style. So if you talk to one of these models and you get the stylistic "that's an excellent question, and it gets to the heart of the matter"...
like, how many times have I read that from a model? Well, that's coming from how they were told in their post-training to answer, right? And so they're not just memorizing web text; they're also memorizing the post-training modes of interaction.
[00:47:34] Jeremy Utley: Yeah.
[00:47:35] Dan Klein: Yeah. And so it can be taught
to give those "excellent question" kinds of things; that's certainly not coming off of Reddit or whatever. So there is more training. And as a result, when you say, okay, the system said there's no business that will give you a thousand dollars a month: well, where did that come from? It could be there's a webpage out there that's like, why businesses won't give you a thousand dollars a month.
Like, it could actually just be that it is regurgitating something that's out there. If you ask it how to do time travel, and it says you can't do time travel: there's a ton of webpages that say that. Or it could be coming from some explicit post-training, which is: when people ask this stuff, tell 'em they can't do it.
And it's been told to answer that way. So it is still regurgitating its data. And as these systems get stronger, they will increasingly have a system checking; these systems will increasingly become, if not fully metacognitive, they will start to have those sorts of behaviors.
It's very difficult from the outside to tell if the system was aware that it didn't know the answer.
[00:48:34] Jeremy Utley: Yeah.
[00:48:35] Dan Klein: or whether it was aware that it's supposed to say it doesn't know in that specific context. And by the way, it's not that you can't build a system that looks up some answers; like, RAG systems look up answers and then decide what to say.
So they do have two layers.
[00:48:47] Henrik Werdelin: It's kind of what it was designed for, right? We were talking to one of the gentlemen who wrote the "Attention Is All You Need" paper. Mm-hmm. And I think his point was kind of exactly that: the whole thing was made to just answer in the way that you expected it to answer.
And that was almost kind of like the objective itself. I know we're running a little bit short of time. I have uh, one short question if you have time for it.
[00:49:10] Dan Klein: Mm-hmm.
[00:49:11] Henrik Werdelin: Uh,
[00:49:11] Dan Klein: absolutely.
[00:49:11] Henrik Werdelin: Um, we often ask people how they use it personally. When you use models yourself, are you a Claude or ChatGPT person, or do you yourself now use different models in your personal work?
[00:49:24] Dan Klein: Yeah, I mean, I'm gonna give you a boring answer to this, 'cause it is kind of boring, which is: I do use a lot of things, 'cause I want to know what these systems do. I wanna know where their strengths are, where their weaknesses are. I would say the biggest thing, if there's something about my experience with large models that maybe differs from a lot of people I talk to, is that I'm often asking questions to which I already know the answer, and, like, using that as a way to check: all right, well, what came back?
How much of this was right? How much of this was wrong? And then when I ask a question to which I do not know the answer, I sort of assume that the accuracy rate is comparable.
[00:50:03] Jeremy Utley: You're kind of... you're calibrating. Yeah.
[00:50:06] Dan Klein: I'm trying to, I'm always trying to calibrate and I am continually impressed by two things.
One, the breadth and plasticity and flexibility of these models. It's still just, like, amazing. I mean, like I said, when I started this, it was like: will we ever be able to find the verb reliably in a sentence? And now I've got a system that I can ask about quantum mechanics and get an answer.
Now, I'm not an expert on quantum mechanics, but when I ask it about things where I am an expert, the answers come back sort of right, but almost always there's, like, a critical flaw in the answer. And I know I can't find the equivalent flaws in other areas. And that has really impressed on me how important it is to recognize that all of us, no matter how much we know this intellectually, all of us are susceptible to looking at systems' outputs.
They're fluent, they're confident. The parts we do understand look correct. We assume that everything else is correct, and that's not always true.
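Dan's calibration habit can be made concrete. A minimal sketch, assuming a hypothetical `llm` function that returns the model's answer as a string; the question list is whatever you happen to know cold:

```python
# Quiz the model on questions whose answers you already know, then treat
# the measured accuracy as a rough prior for questions you don't.
# `llm` is a hypothetical model-call stand-in.
def calibrate(llm, known_qa):
    correct = 0
    for question, known_answer in known_qa:
        reply = llm(question)
        # Naive substring check; in practice you'd judge answers by hand,
        # since the critical flaws Dan mentions rarely fail a string match.
        if known_answer.lower() in reply.lower():
            correct += 1
    return correct / len(known_qa)

# Example (the answer here is a known fact):
# accuracy = calibrate(llm, [("What year was UC Berkeley founded?", "1868")])
```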
[00:50:59] Henrik Werdelin: I think that's a good lesson for everybody to kinda take with them: maybe try to test their model of choice on something they already know.
[00:51:08] Jeremy Utley: One thing, I do like the "already know" test. The other thing is the kind of task where there's not necessarily a right answer, or, I don't know how to classify this, so I'll give you the exact thing and then you can extrapolate.
But one thing that I thought was pretty cool: I had an interview with Time Magazine recently, where I was being interviewed, not being the interviewer. And on a whim, I took the transcript and went to Claude, where I've got a chief of staff that's kind of trained on a lot of my blog posts, things like that.
I just said, hey, how'd I do? And it was so deeply insightful.
[00:51:45] Dan Klein: Mm-hmm.
[00:51:46] Jeremy Utley: And then I said, how should I... and part of its critique was: you treated this interview like a keynote. When you're being interviewed, your job is to be the sous chef, to help the master chef put the ingredients out.
And you made this journalist's job more difficult, not easier. You rambled, you buried the lede, you followed what you're curious about, not what they're curious about, all this stuff. And I said, great. Now can you help me prepare? 'Cause I had another media interview. And Claude said, sure.
Give me the email that the person sent you. I gave it that, and boom. I mean, we're talking topics to avoid, things that you'll want to talk about that this interviewer is not interested in. It was so good I actually kept it on the screen during my next interview. And so to me, that's this whole other class of capability, right?
Yeah. Which is, I don't have the means to have a comms expert on my team, right? But now I have a comms expert, or at least a reasonable approximation. And by the way, even if it's not great, it does give me more confidence than the alternative, which is zero. Right? Absolutely. Which is kind of fascinating.
I don't know, I'm just kind of riffing out loud, but I don't know if that...
[00:52:59] Dan Klein: No, this is true. This is totally right. And I think this gets at what you asked earlier about where these systems are strong, and this is exactly where they're incredibly, mind-blowingly strong: the ability to take all that information and cross-reference it with all this context about how you should handle media that's scattered across the web.
And to be able to distill things down, that sort of contextuality and breadth, is just stunning. And in this case, the fact that it could read it all, pay perfect attention to it all, then give you something where, if it were wrong, you would know. Right? And you were asking it for its generative capabilities.
You were asking it to take all of this stuff, mix it together, and give you a synthesis. That is a place where these technologies are incredible. It's just that, you know, that's not every use case. But when you have that use case, that really lines up well. You were in the loop, you were being the editor.
You were like, oh, I love this, I love this. And maybe there's something in there you weren't gonna say, but that's okay, because this gave you a lot. I could draw an analogy to early machine translation, where you could read some of it, and then there was some stuff that wasn't really even in the language you were expecting. You didn't get everything, but it was, as you said, better than nothing.
And in situations where something is better than nothing, mistakes are gonna get caught by the consumer. And where the primary product you wanted was this novel synthesis, well, that's great. I mean, we could call it a hallucination. It was just a really useful one for you.
[00:54:32] Jeremy Utley: Yeah, that's cool.
[00:54:33] Henrik Werdelin: Okay, gentlemen, you're out of tokens.
[00:54:35] Jeremy Utley: You're out of tokens.
[00:54:37] Henrik Werdelin: Jeremy Utley.
[00:54:39] Jeremy Utley: I put this one more in the hot takes category, probably, actually. Henrik, what do you think?
[00:54:45] Henrik Werdelin: You know, the biggest takeaway I think I have from it is, I remember once I was reading a finance newspaper that we have here in the Scandinavian market, and they were writing about some startup stuff.
And as I was reading, I was like, oh, they really don't know anything about startups. The way everything was phrased made it very clear that they completely misunderstood everything about startups and financing and stuff like that. And then it was like, hey, wait a minute, if they're this wrong about this thing that I know a lot about, I wonder how wrong they are about other stuff too.
Right. And I think he was making the exact same point with models. And I hadn't really completely thought about it, because I love using these models; I use them more and more, and I increasingly trust them just to gimme the right answer. But he made this point of saying, "the population of Berkeley is...", and, you know, the model will basically have a guess, right?
It'll put something out. It might be right, but it's probably wrong. And so he was suggesting that you should ask it something you know a lot about and then kind of understand its limitations
[00:55:53] Jeremy Utley: just to calibrate. Yeah. I mean, to me, there's one caveat. Can I put one caveat there?
'Cause I really agree with that tactic, and I actually love it as a starting point: have a conversation about something that you know deeply to explore the kind of jagged frontier, so to speak. The caveat is I would never recommend someone have a single-shot interaction, like quiz the AI and then see, oh, is its response good?
That is demonstrably poor AI collaboration behavior. I think what you want to do instead is engage and respond and give mentorship and feedback and guidance, and recognize, wow, the AI is fully capable of evolving its understanding based on what I input.
[00:56:41] Henrik Werdelin: But I think that was leading to a second point, which is something you obviously talk a lot about: that you shouldn't be working on AI but with AI. You have this phrase about making sure that this isn't a one-directional kind of thing, that you go back and forth with it.
Mm-hmm. I think what he was talking to is that people have to change how they work, how they think about what their kind of function is in a workflow. And it used to be that, you know, and going back to journalism, you can be a writer, you can be an editor, and it's completely two different jobs. Now, a writer pro, most likely don't know how it is to be an editor.
An editor might not have forgotten how it's to be a writer. And increasingly that when AI is basically giving on this world of abundance, you now have to teach yourself how to be an editor. And I think you were kind of talking to an editor's function, which is it told, just told you how many people lived in Berkeley, what you would normally go like, Hey, I wonder if the writer has actually checked that.
Right? And so you go back and say, hey, could you please double-check that that number is legit? At which point the model obviously would do a web search and then probably come back with the right number. Correct. But that is, I think, something that most people are not trained to do. And so the conversation had this interesting insight for me: we are just kind of used to working in a world of scarcity, and now we're gonna be living in a world of at least generative abundance.
We then have to train ourselves to be stubborn about really wanting excellent outcomes, because it's gonna be easy just to take the first one. We have to be stubborn enough to be an editor of the things that come at us, because we have to be the fact-checker; the model will obviously just say whatever, and all those different things.
And so I thought that was pretty interesting too.
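The double-check move Henrik describes can even be made routine. A minimal sketch, assuming a hypothetical `llm_with_search` function backed by a web-search tool; the example claim is illustrative only:

```python
# Send a claim from a first draft back for verification against a live
# source, instead of accepting the model's initial guess.
# `llm_with_search` is a hypothetical tool-using model call.
def double_check(llm_with_search, claim):
    prompt = (
        f"Verify this claim with a web search and cite a source: {claim} "
        "If the figure is wrong, give the corrected one."
    )
    return llm_with_search(prompt)

# Example: double_check(llm_with_search, "The population of Berkeley is about 120,000.")
```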
[00:58:30] Jeremy Utley: I think his discussion around digital literacy is a worthwhile paradigm: seeing AI fluency as an extension of digital literacy. That's a powerful thing. And furthermore, realizing that some of the ways in which we have learned digital literacy are actually a disservice to us.
Comparing Googling versus working with AI, things like that. I thought it was fascinating to think about the fact that so few organizations are investing in skills. And I wrote something down here, lemme see if I can find it. Yeah: new manager training is something organizations provide to folks who've been given teams, but nobody's getting new manager training now that they've been given the AI team.
I think that's a really interesting paradigm, and knowing, for example, how to delegate is itself a skill. And if you are a poor collaborator to AI, or if you don't know how to delegate, then you're not gonna be able to be successful in a collaboration with this intelligence. Not because the intelligence isn't capable, but because you lack the managerial skills to derive the best possible work from this new teammate.
I thought that was pretty interesting.
[00:59:52] Henrik Werdelin: A small thing that he brought up, which I also think is just interesting, is that we are trained to spot things that are untrue on the normal web, partly because we are visually trained. Like, if you go into a website where there's an endless amount of banner ads and popups, your brain now just goes, this is probably not a very legit website.
But we're used to the things that come over in a chat window being from a friend, or being condensed. And so a lengthy write-up reads as real, thought-through work, because that is what we've seen before. And so I think there's also a shorthand there.
Yeah, in the visual, kind of the interface design, there's something seductive in the chat that makes us think that it's real.
[01:00:44] Jeremy Utley: It triggers our impression of thoughtfulness, or, I think the word he used was fluency. But the appearance of truth is something that's difficult for us to discern when we are in edit mode, especially if we don't know that we're in edit mode.
I think it's a pretty cool conversation. It's a unique conversation in our canon in terms of really focusing on understanding hallucination: what is it, and when is it a feature? As he said, sometimes hallucination is the product, right? It's actually the point. When you're not looking for a regurgitation of something that already exists, when you're actually looking for something new to be generated de novo, hallucination is a feature, not a bug.
But then there are other times, when your goal is effectively retrieval, that hallucination is a problem, and it's worth thinking about which system is best when. You know, I love his point that if a reasoning system effectively just solves a problem 10 times and selects the best one, the person paying for the 10 at-bats probably doesn't like that.
The person being paid for 10x the token usage probably loves it. I think that's a slightly cynical view, because as far as I understand it, model companies are largely losing money on reasoning queries and things like that, so I don't think they're getting rich off that. But it's an interesting paradigm. And his example, that learning French from a French book is very much better and more effective than learning French from incidental French phrases in an English book.
Right.
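For reference, the "10 at-bats" pattern Jeremy is summarizing is often called best-of-n sampling. A minimal sketch, assuming hypothetical `llm` and `score` functions:

```python
# Best-of-n sampling: generate several candidate answers and keep the one
# a scoring function likes best. `llm` and `score` are hypothetical
# stand-ins for a model call and an answer-quality scorer.
def best_of_n(llm, score, problem, n=10):
    candidates = [llm(problem) for _ in range(n)]  # n full generations, roughly n times the tokens
    return max(candidates, key=score)              # only the best answer is returned
```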
[01:02:20] Henrik Werdelin: Yeah, no, that was good too. Awesome. I think that concludes the conversation for today. So, Jeremy, you do the begging at the end of this episode.
[01:02:31] Jeremy Utley: What we need, folks, what we need, yes, is your enthusiastic shares and reviews. You know, I was reminded, Henrik, I'll just make a plug for a former episode: our conversation with Muhammad Ali, the head of consulting at IBM.
I was thinking about some of the things they've done in terms of treating themselves as client zero: proving the economic impact, deploying teams to internal workflow optimization, redeploying freed-up labor to revenue-producing lines of business. It's a really cool example, and if folks haven't heard that interview, they should go back and listen to that as well.
But with that, as always,
[01:03:15] Henrik Werdelin: only one thing to say and that is, bye-bye.
[01:03:18] Jeremy Utley: Bye-bye.