BI 233 Tom Griffiths: The Laws of Thought

March 11, 2026 01:40:13
Brain Inspired

Show Notes

Support the show to get full episodes, full archive, and join the Discord community.

The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists.

Read more about our partnership.

Sign up for Brain Inspired email alerts to be notified every time a new Brain Inspired episode is released.


Tom Griffiths directs both the Computational Cognitive Science Lab and the Princeton Laboratory for Artificial Intelligence at Princeton University. He's been on Brain Inspired before to talk about his previous book, Algorithms to Live By: The Computer Science of Human Decisions, which he co-wrote with Brian Christian. Today he's here to talk about his new book, The Laws of Thought: The Quest for a Mathematical Theory of the Mind. In this book, Tom explains how the three pillars of logic, neural networks, and probability theory complement each other to explain cognition, arguing we are on the doorstep to settling what mathematical principles - the so-called "laws of thought" - underlie our cognition. So we discuss a little bit about a lot of things, including the concepts themselves, the people who have generated and worked on those concepts. I should also mention that Tom recorded a bunch of his interviews with people he writes about, and he's edited and polished those into a podcast called the Cognition Project, which I have enjoyed after reading the book, and I think you'd enjoy it either before or after you read the book.

Read the transcript.

0:00 - Intro
3:20 - Tom's approach
7:19 - 3 pillars of the laws of thought
28:24 - Logic and formal systems strip away meaning
39:04 - Nature of thought
50:35 - Kahneman and Tversky
1:05:12 - Enabling constraints and inductive bias
1:12:51 - Hidden layers, probability, and hidden Markov models
1:20:47 - Conscious vs nonconscious
1:23:43 - Feelings
1:31:26 - Personal

View Full Transcript

Episode Transcript

[00:00:03] Speaker A: How do you get a universal law, a principle, which is going to hold for any intelligent organism anywhere in the universe? Well, the way that you get that is by thinking about the problems that are shared by intelligent organisms everywhere in the universe and then what the ideal solutions to those problems are. I actually think we have a pretty good characterization at that abstract computational level, right? Between logic, probability, decision theory, I think those are a good characterization of what ideal agents should be doing. I think the thing we're still figuring out is that next level down of how you make systems that have those properties, how that's implemented in humans, how that's different from the way that we're implementing it in our AI systems. A lot of that got me thinking about what are the consequences of limitation and how does that shape what we do and how our minds work? And you can think about, there's a sense in which humans are engaging with disability. It's normal for us, right? The constraints that we have on our lives and on our brains and on our capacity to communicate, our. Our sort of normal mode of that is in some sense, you know, limited, right. And that shapes the kinds of things that we're able to do as human beings. [00:01:27] Speaker B: This is Brain Inspired, powered by The Transmitter. Tom Griffiths directs both the Computational Cognitive Science Lab and the Princeton Laboratory for Artificial Intelligence at Princeton University. He's been on Brain Inspired before to talk about his previous book, Algorithms to Live By: The Computer Science of Human Decisions, which he co-wrote with Brian Christian. Today he's here to talk about his new book, The Laws of Thought: The Quest for a Mathematical Theory of the Mind. In this book, Tom explains how the three pillars of logic, neural networks and probability theory complement each other to explain cognition, arguing that we are on the doorstep or in the entryway, or even in the home of settling what mathematical principles, the so called laws of thought, underlie our cognition. So we discuss a little bit about a lot of things, including the concepts themselves, the people who have generated and worked on those concepts. I should also mention that Tom recorded a bunch of the interviews that he conducted in order to write the book. He recorded a bunch of those interviews with the people that he writes about and he's edited and polished those into a podcast called the Cognition Project, which I have enjoyed after reading the book. But you might enjoy it before or after reading the book. All right, thank you to all my Patreon supporters. Go to braininspired.co to learn how to support the podcast. You can also go to the show notes at braininspired.co/podcast/233 to learn more about Tom, where I link to his book and the podcast I just described. All right, thank you for being here. Enjoy. Before we talk about your book, Tom, I saw you give a talk. In one of your talks, you opened up the talk by kind of like framing it as, oh, okay. So the way that you framed it was like psychology has always or has historically said, brains are bad, they do the wrong things, whereas computer science has always said that brains are great, we should pay attention to these algorithms and stuff. And you said that your goal in that talk at least, was to reconcile those two views. And I thought, man, that's a really good frame for just your whole approach. Would that be accurate? Is that kind of.
I mean, it's a brilliant framing for a talk, but also it's kind of the framing for the way that you go about your business. [00:04:11] Speaker A: Yeah, I think there's a really important thing that we psychologists don't necessarily recognize, which is that we try to hold humans to unrealistic standards. And so computer scientists are a little more sensitive to that because they kind of know that it's not possible to make an algorithm that's going to do everything as fast as you might want it to. And so that leaves an interesting gap for someone who thinks about minds from the perspective of computation to come in and say, well, maybe we should be thinking about this a little differently. [00:04:43] Speaker B: Okay, did you leave off opening your talks that way, or is that a normal opening for yourself? [00:04:48] Speaker A: No, that's one of the ways that I introduce what I'm doing. It's a better way of introducing what I do for psychologists than for computer scientists. Computer scientists actually increasingly want to hear about the things that AI doesn't do as well. So that's a. Another kind of framing is saying, what are the things that we've learned from studying humans that are things we can now use for understanding the strange things that our AI systems do? [00:05:17] Speaker B: I wonder, though, how many, what percentage of them start to learn a little bit about what humans actually do and think, ah, it's too hard. Let me go back to these pure algorithms. You know, we can build it another way? [00:05:29] Speaker A: Well, that's actually related to the theme of the book, because I think one of the things that's striking when you dig into the history a little bit is that the people who came up with the mathematical ideas for sort of physics and the sorts of things that we learn in our science classes in school were just as interested in using mathematics to understand how minds work, and it just turned out to be a much harder problem. [00:05:53] Speaker B: Yeah, that's true. All right, so I realized my book is over. The Laws of Thought. Thank you for my copy here. The full title is The Laws of Thought: The Quest for a Mathematical Theory of the Mind. A hell of a title was that. Did you come up with that title? I realize it's a bit of a double entendre. Right. So. So at once, here's what I think is it's sort of a nod to George Boole, but it's also sort of a more grandiose claim about where we are and how we're almost at the laws of thought. Is that accurate? [00:06:29] Speaker A: Yeah, that's a good characterization. So the actual phrase, the laws of thought was a phrase that a lot of people were using in the 19th century in the same way that we talk about the laws of nature. Right. That same group of people who wanted to mathematize things would talk about the laws of thought as a parallel set of ideas that we could use for understanding the mind. And then nowadays we associate it with George Boole. He wrote a book called An Investigation of the Laws of Thought, which is the. The use of that phrase that's kind of survived through to the 21st century, but it was this kind of 19th century idea that we would be able to kind of figure out what these fundamental principles are that was really the inspiration for using that as the title here and feeling like, yeah, maybe 200 years later we're starting to get close to having some answers. [00:07:16] Speaker B: Yeah, well, that's what the book is all about. Okay, so let's talk big picture here.
So the book weaves together three pillars, or grand stories about how we are close to those laws of thought, or if we're not there already. And maybe you can speak more to that in a moment. But it's also filled with lots of little stories within those grand stories of the people who thought the way that they thought and generated these ideas and how they interacted. So there's a rich history also throughout the. The book. But those three pillars are logic, symbolic structures in mathematics, which is sort of the old origins of AI story that famously kind of failed, but maybe is making a comeback in some respects. And we can talk about that. Neural networks, which is the new sexy story that we're all kind of familiar with. And then probability theory, which has a long history, but I don't know if it has been historically or even currently appreciated the way that maybe you think it should be appreciated. But anyway, these are kind of the three pillars that are weaved together so maybe let me just ask you first, because I read the book almost in part as a sort of. Not an apologetic, but as sort of saying, look, you really should be paying more attention to probability theory. But that might be just because of my own little perspective. But was that part of the impetus of the book? [00:08:56] Speaker A: So I think for somebody who's already steeped in a lot of this, right, like you, the probability theory part might be the part that you've seen less emphasized. And when people talk about, you know, these kinds of things about the history of AI and, you know, neural networks and so on, and I think it is obviously something that I feel strongly about and care about. That's where a lot of my research is focused, is really thinking about Bayesian models and resource rationality and these kinds of ideas about how do we better characterize what rational agents should do. But the framing for where it sort of enters into the story is really saying that there's a set of questions that we might want to be able to answer, which are sort of these why questions that probability theory, as an extension of logic, right, allowing us to think about inductive problems, gives us a tool for answering. And in the modern AI era, two of those important why questions are things like why do large language models work at all? And why is it that they require so much more data than people do? And those are questions that are quite well answered by probability theory. And so I think for those of us who are sort of interested in a science of the mind, which is explanatory in terms of these why questions, as well as thinking about the sort of how do you build something like a mind? Uh, probability theory turns out to be an important tool. Yeah. [00:10:19] Speaker B: Okay, so. So that's the end of the book almost. You almost end on addressing large language models and incorporating probability theory into understanding how they. How they work and why they work, the. The way that they work. You used a phrase there, how probability theory extends logic. Can you elaborate on that? And then also I want to pull back and. Well, first elaborate on that and then we'll pull back, perhaps. [00:10:47] Speaker A: So the way that I talk about it in the book, and this is also, I think ET Jaynes talks about probability theory in a similar kind of way. It's a very natural way of thinking about probability. Logic is the mathematics that tells us how to go from certainty to certainty.
And it does that by saying, imagine all of the possible worlds that you could be in. The sort of semantics of logic is imagine the possible worlds you could be in. And then the things that you learn, the statements that you get, tell you what subset of those possible worlds you could be in. And then of those worlds, which things are guaranteed to be true in those worlds. That's what logic is all about. Logic's about answering those kinds of questions, saying what things are sort of obligatory given the other things that you know to be true. [00:11:32] Speaker B: That's where the deduction comes in. Yeah. [00:11:35] Speaker A: So that's solving a deductive problem, right, where you want to end up with a conclusion with certainty based on things that you know with certainty. Right. And then the powerful thing about logic is that it turns out you can do that purely syntactically. Right? So if your goal is to make those sorts of inferences, then there are inference rules that allow you to do that. And that's kind of cool. And that's the thing that allows us to build computers that are able to do all sorts of things syntactically that sort of, you know, make sense for the kinds of things that we want to understand about the world. The way that probability theory extends that is by saying not just what possible worlds could you be in, but assigning each of those possible worlds a probability, right? A number between 0 and 1. And when you do that, that has the consequence that now you can make uncertain inferences because you can say, oh, I got some information which sort of increases the chance that I'm in this world a little bit. And then that has implications for the probability that something else might be true for the world that I'm in. All right? And so all of probability theory you can think of in those terms. Bayes rule is the way that you're sort of adjusting your probabilities that you're assigning to the possible worlds that you could be in. And so it really is an extension of logic. Everything that's true in logic is true in probability theory. But probability theory is much more general. It comes at a cost, though, which is that inference is no longer purely syntactic because you're not guaranteed to have the same sort of things that you can sort of just look at which things are true and which things are not to determine what other things are true. Now you're stuck in the world of having to do Bayesian inference all the time.
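To make the possible-worlds picture above concrete, here is a minimal sketch (not from the episode; the hypotheses and numbers are invented) of Bayes' rule as reweighting the probabilities assigned to possible worlds, with logic as the special case where the weights are only 0 or 1.

```python
# Minimal sketch (not from the episode): Bayes' rule as reweighting possible worlds.
# The "worlds" and numbers here are invented purely for illustration.

# Two possible worlds: the coin is fair, or the coin always lands heads.
prior = {"fair coin": 0.5, "two-headed coin": 0.5}

# Likelihood of observing "heads" in each world.
likelihood_heads = {"fair coin": 0.5, "two-headed coin": 1.0}

def update(prior, likelihood):
    """One step of Bayes' rule: reweight each world by the likelihood, then renormalize."""
    unnormalized = {w: prior[w] * likelihood[w] for w in prior}
    total = sum(unnormalized.values())
    return {w: p / total for w, p in unnormalized.items()}

posterior = update(prior, likelihood_heads)
print(posterior)  # {'fair coin': 0.333..., 'two-headed coin': 0.666...}

# Logic is the special case where the weights are only 0 or 1: observing "tails"
# would rule the two-headed world out entirely (its likelihood of tails is 0).
```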
So this idea that in cognitive science we associate with David Marr, that you can have multiple levels at which you analyze an information processing system, the most abstract being what Marr called the computational level. What's the problem the system is solving? What's the ideal solution to that problem? Then the algorithmic level, what are the actual processes and representations that are involved in approximating that solution? And then the implementation level, how do you realize that in some kind of physical system? So logic and probability theory, as these two components of rational inference, are computational level stories about what it is that minds should be doing. They're characterizations of what the optimal solution is to deductive and inductive problems, respectively. And then neural networks have just been shown to be a very powerful tool for solving that approximation problem, which shows up at the algorithmic level and telling us how it's possible to actually create systems that do something like the kinds of things that these symbolic systems or sort of processes of inductive inference tell us we should be doing. And you can see, I think, that, you know, that trifecta manifest in our large language models. Right. They're actually a nice illustration of why those three threads come together. So a large language model is a neural network, right? So it's using that sort of, you know, that that component of approximation, the problem that it's approximating, is a problem of probabilistic inference, which is learning a probability distribution by learning a sequence of conditional probabilities, where it's learning to predict each token based on the sequences of preceding tokens. And part of the reason why that's successful is that learning a probability distribution from samples from that distribution is actually a good way of solving a problem that's kind of like the problem of language learning. And then the particular data that it's trained on are data that are generated from symbolic systems, broadly construed. So language turns out to be a really useful substrate for intelligence as a consequence of all the properties of hierarchy and compositionality and recursion and so on that are part of human languages. And then it's also worth remembering that those models are trained on a lot of things like code, which is a much more structured formal system. And so they're really benefiting from a lot of the structure that comes from those kinds of symbolic systems as well. [00:16:25] Speaker B: So the neural networks are. It's interesting because you have this implementation level of the neural networks, but the algorithms they're implementing. Let's say the algorithms and the computations they're implementing, let's say it's, you know, Bayesian inference. It's almost like, so. So you're more interested in how the mind does things, whereas neural network. I'm putting that on you, and you can disagree. Whereas neural networks are more interested in how the brain does things. But there's this gap between the actual functioning of the neural network and then what we infer is like a Bayesian updating process that we can infer that it instantiates, but there's this gap between, like, the innards of how the neural network actually works and then what it's actually doing. And it's almost like the gap between brain and mind to be crude. So do. And I guess this is the black box problem writ large almost. But so do I have that right, that you're more interested in the minds aspect?
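As a concrete gloss on the training objective Griffiths describes here, the sketch below (not from the episode; the tiny conditional-probability table is invented) shows how a language model's probability for a whole sequence factors into conditional next-token probabilities, each token predicted from the tokens that precede it.

```python
import math

# Minimal sketch (not from the episode): a language model assigns a sequence probability
# by chaining conditional next-token probabilities, p(w1..wn) = prod_t p(wt | w1..w_{t-1}).
# This tiny conditional table is invented for illustration; a real LLM is a neural network
# that outputs these conditionals for any prefix.

cond_prob = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
}

def sequence_log_prob(tokens):
    """Sum the log conditional probability of each token given its prefix."""
    logp = 0.0
    for t in range(len(tokens)):
        prefix, token = tuple(tokens[:t]), tokens[t]
        logp += math.log(cond_prob[prefix][token])
    return logp

print(math.exp(sequence_log_prob(["the", "cat", "sat"])))  # approx. 0.6 * 0.5 * 0.7 = 0.21
```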
I mean, you tell the story of neural networks deeply throughout the book, but then you make a point to say that what you're really focused on is how minds work. [00:17:32] Speaker A: So, I mean, I think if we're going to look for things that are laws of thought, then the computational level is kind of the level at which we're sort of best guaranteed to find those things that look law like. Right. And in the book, I talk about the first universal law of psychology. Right. Shepard's universal law of generalization. That's my universal law. [00:17:55] Speaker B: Sorry to interrupt, but, I mean, you tell multiple stories in the book about people that were not even on my radar. And Shepard, I had no idea that it went that deep. Anyway, go ahead. So there's lots of nuggets throughout the book. [00:18:05] Speaker A: Yeah, yeah, yeah. So. So Shepard's universal law is. Is, you know, it's something that sort of follows from Bayesian inference applied to some reasonable assumptions about a problem that. That agents have to solve in their environment. Right. So. So I think if you want law like things, that abstract computational level is a good place to look for it. I think there's another set of laws, though, that you can think about as the things that operate at that, you know, algorithmic level. And people are just trying to figure out what those are for neural networks. Right. So I think that's very much an active field of investigation. We can talk about the principles people have discovered, the different kinds of learning rules that people have found, the ideas that you have about, like, you know, the trade offs between depth and width of networks and these other kinds of things. But that part I think we're still grappling with to work out. How do you actually build systems that do a good job of approximating those things that we sort of know that intelligent agents should be doing at the computational level? [00:19:00] Speaker B: What do readers so far, or even students get hung up on the most when you're trying to teach these things? [00:19:07] Speaker A: I think probability theory is much harder than everything else. [00:19:11] Speaker B: We talked about this last time. I think. I think I complained last time a few years ago, like, oh, Bayes is so hard, like, intuitively, you know, that sort of thing. [00:19:19] Speaker A: I worked hard in the book this time to hopefully present it in an intuitive way. [00:19:25] Speaker B: Actually, in the book, you're like, stick with it. It's worth it. A paragraph saying, this is why you need to read this. Just get through it. Yeah, yeah. [00:19:33] Speaker A: And there's. It's the. It's the one part of the book where. So I present things like learning rules, and those learning rules are written out in. I kind of had this notation for where it's written out as sort of equations in words. So that if you know some very basic algebra, you can kind of like make sense of what those equations are. And it was much harder to do that. I couldn't figure out a way to do that for probability theory. So you actually find, you know, the equation for Bayes rule is in the Bayes rule chapter. Because I grappled with it and then I ended up being like, you know what? There's not an easier way to do this. [00:20:08] Speaker B: Yeah, yeah. So that's what people struggle to sort of understand. But I mean, in terms of, like, what you want to convey, you know, do people doubt that probability theory is really that necessary?
Or, you know, what do people get hung up on, like, symbol, like, well, symbolic AI that failed. Why do we even need to consider it? You know, that sort of thing. Are there hangups like that? [00:20:34] Speaker A: I mean, I say this to my students when I teach my class that's on this kind of material at Princeton, which is in this class, you're going to learn a lot about the history of these ideas, you're not going to, you know, we're not going to teach a class which is just focused on the things people have been doing in the last couple of years. The reason to do that is that it's providing you with the context for the present moment and also the things that are going to hopefully be useful as we think about how to approach the challenges that we face in this moment. Right? So I think my goal in writing this book was really to give people that context, to give people, what's the language that we use for talking about these things? Where do these ideas come from? What motivated using a particular approach in the past? What were the problems that were associated with that approach? And so then as people are working with their neural networks now, and they're like, oh, okay, we need some kind of thing that'll help us deal with this particular problem that our neural networks have. You can go back in the past and you can say, oh, actually there is another kind of system that has the properties that you want. It's this kind of system. And people were using this idea 50 years ago, but maybe it's time to revisit that and do a better job of it. And there are nice concrete examples of that. Like, you know, in the 1970s, 1980s, people started to get into cognitive architectures, right? And the motivation for this was they had this idea that you could use something like production systems to make intelligent agents. And so a production system is a set of if then rules, right? If this happens, then do this. If your goal is this, then make your sub goal this, right? It's a very rich language for being able to write out, you know, a set of sort of principles that an AI system would follow. But then they realize, okay, we need to have some way of, like, deciding where to put the rules, somewhere to store them, and we need to, like, retrieve them from that place. And we need to decide which things to activate and then which ones to act on and then how to update those rules and change them as we get more information and so on. Okay, fast forward 50 years, right? [00:22:33] Speaker B: As opposed to having like this just really almost infinite nested if then structure, which is what you could build. [00:22:39] Speaker A: Yeah, yeah, that's right. Large language models are kind of the modern production system. You don't have to write all your if then rules. They're like stochastic if then rules. They say, if this context, then with this probability, you do this thing, right? Where that thing is now another sequence of text, right? And so all the problems that people have in trying to think about how do you go from a large language model as a sort of basic unit to an AI agent are the same problems that people had that motivated thinking about cognitive architectures. And people are basically reinventing cognitive architectures where they're like, okay, I'm going to have a memory file that's like this, and you put these things in the memory file. And I'm going to have, you know, some kind of, you know, agent architecture where things are passed back and forth between these parts and so on.
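For readers who have not met production systems, here is a toy sketch (not from the episode; the rules and facts are invented) of the if-then machinery described above, including a crude version of the bookkeeping questions Griffiths mentions: where rules live, how they are matched against memory, and which one fires. Real cognitive architectures such as ACT-R or Soar handle all of this far more carefully.

```python
# Minimal sketch (not from the episode) of a production system: a working memory of facts
# plus if-then rules that fire when their conditions match the contents of memory.
# The rules and facts here are invented purely for illustration.

rules = [
    # (name, condition on working memory, facts to add when the rule fires)
    ("make-subgoal", lambda wm: "goal: make tea" in wm and "water boiled" not in wm,
     {"goal: boil water"}),
    ("boil", lambda wm: "goal: boil water" in wm, {"water boiled"}),
    ("brew", lambda wm: "water boiled" in wm and "goal: make tea" in wm, {"tea made"}),
]

working_memory = {"goal: make tea"}

for _ in range(10):                      # run the match-and-fire cycle a few times
    fired = False
    for name, condition, additions in rules:
        if condition(working_memory) and not additions <= working_memory:
            working_memory |= additions  # fire the first matching rule (a crude conflict-resolution policy)
            fired = True
            break
    if not fired:                        # stop once no rule can add anything new
        break

print(working_memory)  # eventually contains 'tea made'
```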
And I think there's a lot of room for inspiration from, you know, good old fashioned AI, but also cognitive science and thinking about how you make those agents. [00:23:26] Speaker B: It really is like people use the term like reinventing things because I mean, the main difference is just the scale of the data and the size of the models. And it's almost like we've been given this new gift, then we can revisit all those old things because it sort of washes over and solves a lot of the problems. But then it's almost like, well, okay, now that this is clearly different. Now, if I apply it to those same old problems, what do I get? It's almost like that kind of exercise. [00:23:54] Speaker A: Yeah. And so you have this sort of combinatorial space of ideas, right, where you can take these things that people have used in the past and then think about, okay, what if I think about that in the context of these current models and then you suddenly get exciting new breakthroughs from that. And even the story of neural networks is a really good example of this. Right? So you have the way that I tell this story, you have Marvin Minsky, a graduate student at Princeton, writes his PhD thesis on learning and neural networks. As a PhD student, went up to Harvard and spent a summer building a neural network in the basement of the psychology department with funding that was provided by George Miller, who I guess sticks his head in every now and then is like, oh, how's that? [00:24:36] Speaker B: Right? Well, yeah, except he sticks his head in. The way that you read in the book is like, geez, there's a lot of tubes in here. [00:24:44] Speaker A: And then Minsky is like, looks at this thing and says, you know what? In order to do anything interesting, this would just have to be like so much bigger than I could possibly afford. So there's no point doing this, right? And then goes off and becomes the advocate for symbolic AI. Meanwhile, Rosenblatt, who's a psychologist, isn't really thinking about AI. He just wants to build a model of a brain. Right. And so he has this completely different motivation, makes him think about this same kind of thing, successfully defines a learning algorithm that works for these models. And then turns out that he and Minsky were at the same high school. Right. And Minsky's kind of skeptical and ticked off about this whole thing, and then writes the Perceptrons book, [00:25:29] Speaker B: which is largely credited with, like, the downfall of early neural networks. [00:25:32] Speaker A: Yeah, I mean, I actually looked at the statistics of this, and there's about as many papers published before, in the decade before and the decade after the Perceptrons book. So it was more, I think, that it blunted the hype cycle a little bit. Right, all right. And, but, but, but the, you know, this is the, the computer scientists that are coming in and squashing this idea. And so then a lot of the computer scientists move away from that idea, but then you get psychologists picking it up again, you know, with Rumelhart and McClelland in the 1980s, and then the backpropagation algorithm. So now you have this going back from psychologists back to computer science again, and then became very popular in computer science, decreased in popularity again, still a lot of enthusiasm in psychology and neuroscience.
And then coming back up in computer science as the hardware and the data sets get you to the point where you can suddenly do something that's more effective than the kinds of methods people have been pursuing in computer science. So this kind of back and forth between disciplines, I think, is actually super important to coming up with these ideas where you can have an idea that's hot in one discipline and then cools off, but then people, for a completely different reason, can be invested in trying to figure out how that idea works and how to improve upon it. And once they figure that out, then that sort of propagates back to the other discipline. And so making it sort of possible for people to pursue unpopular ideas is actually really important for working out what the consequences of those things are in the context of whatever the other ideas are of the era and the technologies that are available for supporting them. [00:27:09] Speaker B: Well, and criticism plays a role also, because I think maybe without Minsky and Papert's Perceptrons book that sort of criticized the current state, maybe it wouldn't have gotten the attention to fixing it that it eventually got. I don't know. So what you said is it's important to let people sort of explore these ideas. But in part, I think the criticism probably bolstered and sped up the eventual success, perhaps. [00:27:41] Speaker A: Yeah, maybe. And I mean, it's certainly the case that, I mean, Rumelhart read that book as part of a reading group and that sort of planted an idea in his mind. He also went back and read Rosenblatt. And so if you read a lot of Rosenblatt and then you read a lot of Rumelhart, you're like, oh, okay, here's that idea. And here's that idea. Backpropagation is the name of the algorithm that Rosenblatt used for doing multi layer learning. So he had a multi layer learning algorithm. It just didn't work very well because he didn't have the error signal right, when it sort of propagated back to the next level. But he had this idea that you need to sort of take the errors and then push them back through the network. And he just didn't really know how to do it properly. [00:28:24] Speaker B: All right, let's go back to logic for a moment though, because this in the book and sort of in real, because historically, logic sort of sets it all off almost. I mean, you start by the story of Boole, right, and his approach to the laws of thought, essentially, which, you know, he wanted to, he had aspirations to explain thought using logic, essentially. So you go into that story, and so in some sense, logic, those origins, is sort of the grand start to all things, right? Another perspective, and I'll hash this out a little bit more, is that it's the original sin that got us off on the wrong foot. Because from that perspective, what you write in the book here, I have a quote here, let's see. And this is tied into formal systems as well, which you can expound upon also. But so the reason I bring this up, we were talking about probability theory, and you make it clear in the book that one of the great benefits of probability theory and Bayes rule, Bayesian inference, is that it's medium independent. It doesn't matter whether you're doing it on books or, you know, what symbols you're doing it on, cars, trains. It is medium independent. You can do it on anything, language, that, that sort of thing. And so it's medium independent. And the same applies with, like formal systems.
Essentially any formal system you have, it doesn't matter what you're talking about. The rules are valid within whatever system you're talking about, right? So you talk about formal systems in the book, which are systems of rules and symbols. To use a popular definition, a formal system has three properties. It's a token manipulation system, it's digital, and it's medium independent. And so in that sense, formal systems and, by association, logic are sort of the origins of this medium independence story. And so this original sin kind of perspective. I'm going to read another quote from your book here because logic in some sense is the first thing that stripped away meaning. And here I'll read your sentence. So reducing semantics to syntax, reducing discovering truth to following rules, was a massive first step toward demystifying the nature of thought. If we can just write down the right set of rules, thought stops being something that just happens inside human heads and becomes something that can happen on the page, on a game board, or inside a machine. Okay, so that was very long winded. But I'm wondering how, you know, this original sin perspective versus, like the thing that started it all to get us where we are. It stripped meaning from. There's. So there's semantics and there's syntax. And going into that logic, zeros and ones strips meaning. It strips the meaning from everything. So I don't know, how do you think about that? You have to do that in mathematics. But that original sin was there from. [00:31:32] Speaker A: Yeah, I think that's a great perspective in terms of just characterizing what were the really good things and then the really bad things about that approach. Right. So the really good things were that it gave people a way to just to characterize how something like thought could work. Right. And in particular how it could be done by a machine. And that works really well for problems that fit the schema of these sorts of formal systems. Right. So it worked really well for making systems that could play chess, for making systems that could do arithmetic and sort of algebra and maybe even prove some theorems, making systems that could solve problems through search. Right. And for characterizing the abstract structure of languages. Those are the big success stories of that rules and symbols approach because they're all cases where they fit that schema of a deductive problem, where you're going from things that you know to be true to consequences of those things. Right. Where you're starting in some initial conditions and then you're following the sequence of rules to get to some endpoint. And then it's a question of like, yeah, what are the rules that you should follow in order to get there? Right. That's the schema that characterizes logical proof. Playing a game of chess, deriving a sentence from a formal grammar, all if,
So classic examples, perception, right? Not a deductive problem. You are getting, you know, photons hitting your retina from a three dimensional world outside, they're hitting this two dimensional surface. You're trying to reconstruct what it is that's in that three dimensional world. It's underdetermined by the information that's available to you. It's not a deductive problem. Learning a language, right? You hear utterances in that language. You are trying to piece together what the rules are of that language. You're not getting enough information to allow you to definitively identify what those rules are. It's not a deductive problem. And so those either led to people trying to solve edge cases like toy worlds that they could set up where things were deductive. So you could say, oh no, this is actually going to work. Right? [00:34:22] Speaker B: Yeah. The trick was to take inductive things and make them deductive, right? That's right. [00:34:29] Speaker A: Or deciding that learning wasn't really a thing. Right. Sort of. Famously, in the case of Chomsky, saying that the process of acquiring language is one which is very, very strongly constrained by some kind of genetic endowment, essentially to the point where you end up with something which looks like a deductive problem. Right? So like the principles and parameters version of universal grammar is a nice example of this. You've got some set of principles that mean you've got a small number of parameters that you need to set by hearing sentences. And then when you hear the appropriate sentences, you can be like, oh, okay, now I know that my language is pro-drop. And so I'll just flip that bit and I just need to do that enough times and then I can figure out what the grammar is, right? So you had this kind of like abuse of deduction to try and allow you to solve those inductive problems. And in fact, the real challenge of induction is you need a different kind of math for it. And it's a math which no longer allows you to make those same purely syntactic inferences because you need to be doing things like assigning probabilities to different outcomes and updating those probabilities. And that requires you to have some sort of semantics that go into how you specify the probabilities of the world. [00:35:41] Speaker B: Right. Elaborate on that. Because I said logic stripped out semantics immediately. So how did it get back in? [00:35:49] Speaker A: Yeah, so I mean, from one perspective, probability theory is a formal system, right. Because it's a thing that you can get a computer to calculate. Your computer can calculate Bayes rule for you. The tricky part is how you specify the probabilities of the different possible worlds that you might be in. Right. So all of the semantics of what you know about the world goes into that big joint probability distribution that you assign over everything, your sort of prior distribution. Right. And so in the case of perception, that's about the model that you have of what the three dimensional world is like. Right. That's your sort of prior distribution that you're updating when those photons hit your retina. And so there's a huge amount of knowledge, semantic stuff that goes into that. Right. And the inferences that you make from those photons are going to be influenced by your experience in a way that someone else might make a different kind of inference.
And in the same way, learning language there, we think about having some kind of prior distribution over languages, and that's something which is shaped by your genetic endowment. Yes. But all of the other kinds of experiences that you have and all of the other things that are in your environment that feed into the way that you're actually learning things about the language that you're speaking. [00:37:03] Speaker B: Yeah. So I hadn't thought about. So going back to like the universal grammar, Right. And that we have to have had built in grammatical rules, et cetera, I hadn't thought about. So the answer was always like, well, it was through evolution that those priors were gifted to us. But I hadn't thought about it in terms of like, that's trying to solve the. Set us up so we can do deduction. That's what evolution did. And so that's a nice way to think about it. But is that the right way that I'm thinking about it? [00:37:37] Speaker A: So that's still an overly constrained version of the problem is the way that I would say it. Right. Which is that if your learning mechanism is something like deduction, you're limiting the tools that you have for solving this problem. [00:37:51] Speaker B: Well, what I mean is like that was sort of what they were trying to do. Right. Or like Chomsky was trying to say is. [00:37:56] Speaker A: Yeah, okay, yeah. A classic example of this is indirect negative evidence. Right. So indirect negative evidence is the fact that you don't hear somebody say something actually gives you some information that, that's a thing that is not part of the language that you're speaking. Right. So if you don't hear a verb used in a particular construction, that gives you some information that that verb can't be used in that construction in your language. Right. And that's not something you can get deductively. That's purely inductive evidence. [00:38:28] Speaker B: But the absence of things is because there's an infinite number of absences that could occur, so it's difficult to learn from. [00:38:34] Speaker A: That's how language models work. Right. So that's what I was saying. Probability theory is really important for understanding why large language models can learn something like a language, because the fact that they're modeling the language as a probability distribution means that they can make use of indirect negative evidence. So the fact that when you're predicting that next token, you're producing a probability distribution which is normalized over all of the tokens that you could see means that if you don't see a token in that context, that's giving you information that you can use to update your probability distribution over the next tokens. [00:39:04] Speaker B: Okay, So I started this sort of segment by noting that it was the original sin or victory of logic that stripped the semantics out and just sticking with that theme for a moment. So it's interesting that. So when you do that, that original. Let's define. So you're defining thought as logic when you're saying, all right, well, I'm going to try to explain thought with logic, you're almost like, defining thought by that logic. Right. And when I say that was like the original sin already, you've made, like, a commitment there. And then, okay, we're going to almost operationally define thought as logic in that case. Right. And what you're saying is the semantics is saved, like when you bring back inferences like probability theory. But I wonder. 
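A small numerical sketch of the indirect negative evidence point (not from the episode; the grammars and probabilities are invented): if one hypothesis says a construction should occasionally appear and another says it never does, then repeatedly not hearing the construction shifts a Bayesian learner toward the hypothesis that forbids it, which is exactly the information a normalized predictive distribution lets a language model exploit.

```python
# Minimal sketch (not from the episode) of indirect negative evidence as Bayesian updating.
# Two hypothetical grammars: one allows a verb in a given construction, one forbids it.
# Under the permissive grammar the construction should occasionally show up, so repeatedly
# NOT hearing it is itself evidence for the restrictive grammar. All numbers are invented.

posterior = {"allows construction": 0.5, "forbids construction": 0.5}

# Probability of hearing the construction in any single utterance, under each grammar.
p_hear = {"allows construction": 0.05, "forbids construction": 0.0}

def update_on_absence(posterior):
    """Condition on one utterance that did NOT contain the construction."""
    unnorm = {g: posterior[g] * (1.0 - p_hear[g]) for g in posterior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

for _ in range(100):  # a hundred utterances, none containing the construction
    posterior = update_on_absence(posterior)

print(posterior)  # the 'forbids construction' grammar now dominates (about 0.994)
```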
Part of me wonders, you know, you have to go back and think, well, was that the right first assumption? Right. Or definition of thought? Are we losing something? I don't want to, like, harp on this too much, but it's this, like, hunch in the back of my mind, like, what are we missing when we're trying to characterize thought? Do I really need to think of thought as these mathematical processes, these algorithms, ones and zeros? What am I missing if I don't accept that that is a valid definition of thought? Or what if I accept that? Like, yeah, like within formal systems, sure. If I'm thinking about various moves on a chessboard, I can think it through like that. But the vast majority of my ongoing cognition isn't like that at all. Right. So maybe I'll just throw that at you and see how you respond. [00:40:48] Speaker A: I think that's right. I mean, I think it was a starting point. And it works fine for the things that are well characterized by deduction. It doesn't work very well for the things that are inductive inference. And it turns out that most of what brains do is maybe a little more like inductive inference. I think you can also bring in neural networks there because doing that semantically rich probabilistic inference is incredibly hard. It's computationally costly. We have to define what the relevant prior distributions are and all of these kinds of things. If you think about neural networks as really good universal function approximators that if you apply them to a problem that requires probabilistic inference and set up that problem appropriately so you have cross entropy loss and all these other kinds of things, then they are really good at approximating these sorts of problems of probabilistic inference. And you don't have to explicitly do Bayes rule in order to update beliefs and so on, because the models can sort of implicitly learn to represent a distribution and update that distribution in the way that Bayes rule tells you you should. Right. And you can either do that in a way where they're amortizing probabilistic inference and they're just like creating a representation which allows them to solve that inference problem directly. Right. Or they're representing a probability distribution. And things like in context learning are essentially models doing Bayes rule on the data that they've been provided, having kind of learned what you need to do in order to be able to do the relevant kinds of computations to execute that. Right. And so, yeah, I think that's another relevant breakthrough here is that for probabilistic modeling to go beyond relatively toy kinds of settings, you need something like neural networks to make it possible to represent the rich complex semantics of those probability distributions at scale. [00:42:40] Speaker B: Yeah. So I guess maybe at heart of my worry here, One is that I'm being too simple minded about this because this is like sort of just something that's in the back of my head always. So there's like the idea that. How do I phrase that? How do I phrase this? Okay, so the perfect solution is to do a Bayesian update. Right. But it's not tractable. So then the next step is to think, oh, okay, well, we must be approximating it. Well, how can we approximate it by some amortizing procedure or whatever. But the assumption is like, well, we must be approximating it because it is the optimal solution. So that's what the algorithms must be doing. 
We have to find the right algorithm that is the approximation of the perfect solution somehow, then tell a story about how that can be happening fuzzily in the neural networks. Right. And so you're sort of committed to this. You're not committed to, like, the optimal solution being implemented, but you're committed to the agent in this case trying to implement the optimal solution but just failing to do so, but approximating it because it knows it wants to solve it optimally or knows it wants to or has evolved to or something. Right. So that's, that's kind of my, like, worry. And all of this stuff is so beautiful. But then it's like, oh, but we could have it wrong. Is that the wrong approach to think of it that way? [00:44:13] Speaker A: You know, it could be. I mean, I think, you know, I think it actually works pretty well. [00:44:21] Speaker B: It does work really well. That's the, that's the seductive thing about it and maybe the thing, the reason why it's right, you know? [00:44:28] Speaker A: Yeah. And I'd say the things that actually make it a little more complicated than that are that we can say there's one thing that you should be doing at that most abstract computational level, which is Bayesian inference. And then it's worth saying, Bayes is how you update your beliefs. It doesn't tell you what actions to take. Right. So there's another part which I don't really talk about in the book, but is the sort of part that's related to decision theory and reinforcement learning and all of these other kinds of things, which is how do you go from belief to action? Right. And so that's important. But if we just stay in the sort of belief part, what makes it tricky, I think, from a psychologist perspective and a neuroscientist perspective, is that you should not have an expectation that there's a single algorithm that's being used to approximate probabilistic inference in all of these settings. Right. So if you look at what computer scientists and statisticians do, if you have a problem that you're going to solve by Bayesian inference, you then say, okay, I could try this algorithm, I'll try this algorithm, I'll try this algorithm. You've got Markov chain Monte Carlo, you've got particle filters, you have importance sampling, or you make your amortizing neural network. You have gradient flows, whatever kinds of tricks. [00:45:33] Speaker B: Yeah, you just said tricks. So are you going to say that the brain is a bag of tricks? [00:45:37] Speaker A: Well, I think it's more that, again, sort of this point of laws of thought, right? The level at which we can maybe sort of identify generalizable laws is that computational level. And when you get below that, everything gets much more complicated. Because what I think we should expect to see in human brains, but also in our AI systems, is many different ways that we do something that ends up looking like Bayesian inference when we analyze it in that abstract way. And so I just talked about two. Right? You can directly amortize a Bayesian computation, or you can learn a joint probability distribution such that you're able to do something like in context learning, right? So this in weights versus in context learning is already a trade off that we see in our AI models. Another thing that you can do, which is not something that's necessarily explicit in the AI models that we have, is actually, and we actually do this in my lab, is prompt the models to implement an algorithm, right?
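As one concrete item from the menu of approximation algorithms listed above, here is a minimal sketch (not from the episode; the toy coin problem is invented) of importance sampling used to approximate a Bayesian posterior when you do not want to compute it exactly.

```python
import random

# Minimal sketch (not from the episode) of importance sampling as a generic way to
# approximate Bayesian inference. Toy problem (invented): infer a coin's bias after
# seeing 8 heads in 10 flips, with a uniform prior over the bias.

random.seed(0)
heads, flips = 8, 10

def likelihood(theta):
    return theta**heads * (1 - theta)**(flips - heads)

# Draw candidate biases from the prior (uniform on [0, 1]) and weight each by its likelihood.
samples = [random.random() for _ in range(100_000)]
weights = [likelihood(theta) for theta in samples]

# Self-normalized importance sampling estimate of the posterior mean bias.
posterior_mean = sum(w * t for w, t in zip(weights, samples)) / sum(weights)
print(round(posterior_mean, 3))  # close to the exact answer (heads + 1) / (flips + 2) = 0.75
```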
And this is, I think, a thing that humans do, right, some of the time when we are just like, generating examples of possible outcomes and then updating our beliefs based on those examples. So my postdoc, Dilip Arumugam, has a nice paper where he shows that we can actually make a system for doing reinforcement learning using large language models by just taking a Bayesian reinforcement learning algorithm and then prompting the model to do the steps in the Bayesian reinforcement learning algorithm. So we're like, express a set of beliefs over possible worlds, okay? Now you get this information, update your beliefs over possible worlds, choose an action based on the beliefs that you have about possible worlds, right? And sort of running that loop. So we're doing something where we're giving a strategy to the model and the model is implementing that strategy. And there are plenty of cases. [00:47:24] Speaker B: Meta learning. Sorry, sorry. Is this related to the meta learning or is that different? [00:47:27] Speaker A: It's not. I don't talk about it in the book. It's more recent work than that. [00:47:30] Speaker B: No, I. Well, yeah, yeah, well, I knew that. But. But is that a part of the meta learning? [00:47:35] Speaker A: Is it a form of meta learning? It's not, because it's just prompting. Yeah. Okay, so this is. But. But we are. We're also doing meta learning things along those lines. [00:47:44] Speaker B: Yeah, I know. Well, I didn't know if this was adjacent to that or part of that story. [00:47:46] Speaker A: Yeah, no, but. But that's the. That I think you can think about humans as having sort of learned encoded strategies, right? That's the amortization, having something where they're just representing a probability distribution and being able to update that probability distribution as information comes in, that's like the in context learning and then having more explicit strategies that they use for solving problems. And that's like the sort of prompting case. And those are all ways that we end up using our brains to approximate the same thing that we would in each of those cases just write out as Bayes rule.
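The loop Griffiths describes, express beliefs over possible worlds, update them on new information, then choose an action, can be written out schematically as below. This is not the actual method from the paper he mentions; `ask_llm` is a hypothetical stand-in for whatever language-model interface you have, and the prompts are invented purely to show the shape of handing a Bayesian strategy to the model.

```python
# Schematic sketch (not the actual method from the paper mentioned above): prompting a
# language model to carry out the steps of a Bayesian decision loop. Everything here is
# a hypothetical stand-in, invented to show the structure of the loop.

def ask_llm(prompt: str) -> str:
    """Stand-in for a real language-model call; replace with your own interface."""
    raise NotImplementedError("hypothetical stand-in for a language-model API")

def bayesian_agent_loop(environment, n_steps=10):
    # Step 1: have the model express a set of beliefs over possible worlds.
    beliefs = ask_llm("List the possible worlds you might be in and a probability for each.")
    for _ in range(n_steps):
        # Step 2: choose an action based on the current beliefs.
        action = ask_llm(f"Given these beliefs:\n{beliefs}\nChoose the best action.")
        observation = environment.step(action)
        # Step 3: update beliefs over possible worlds given the new observation.
        beliefs = ask_llm(
            f"Beliefs before:\n{beliefs}\nYou took {action!r} and observed {observation!r}. "
            "Update the probability of each possible world accordingly."
        )
    return beliefs
```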
Right? Because they're not coming at the problems with inductive biases that are human inductive biases. They're relying on all sorts of tricks that are not the same ones that human brains use. And that's part of why they end up behaving in the weird ways that they do as well. [00:49:49] Speaker B: Well, would you consider those tricks almost heuristic? How are those tricks related to heuristics? [00:49:54] Speaker A: Well, like tricks like memorization. Right. So, like, you know, large language models have a very large capacity for memorizing text. Right. And so to the extent that that's something that you can use to solve a problem, they're going to lean on it and it's, that's not the same kind of mechanism that a human brain might be doing for solving a problem. [00:50:12] Speaker B: Yeah, well, even something just on memorization. I notice these days, especially when I do math in my head, it's almost all memorization. Just I remember my addition table and then my multiplication table. And so it's not like I'm calculating in a certain way. I'm just memorizing the answers to known rules that I've learned or whatever. Yeah. So, okay, so I said, heuristics. Let's talk a little bit more about how probability theory was saved from Kahneman and Tversky. Right. So famously like this, thinking fast and slow. That was Kahneman. Right. Was the author of Thinking Fast and Slow, I believe. Yeah, yeah. So Kahneman and Tversky, we started off this conversation by me saying that you start off talks by saying that psychologists often talk about how poor our. Our brains are. [00:51:06] Speaker A: Our. [00:51:06] Speaker B: Our minds are right. Solving things. And Kahneman and Tversky, like, sort of made part of what they did in their careers is made careers out of pointing out just how poor we are at using, like, probability theory. And so sort of before that, I. I guess probability theory was touted as a. As a potential way to explain how. Explain our cognition. Kahneman and Tversky come along and say, whoa, like, humans are terrible at this. People like you come along and Josh Tenenbaum and those sorts of folks and say, actually, they are doing it. They're just doing it really well on a different problem. That's a really roughshod version of the story that you tell in the book. Can you expound on that? Tell me what I got wrong? [00:51:54] Speaker A: Yeah. So when we first started working on Bayesian models of cognition, it was a little bit of an uphill battle because anytime you put up Bayes rule in a psychology talk, people would be like, oh, we know already that this is wrong. Because Kahneman and Tversky told us people don't do anything like Bayes. [00:52:08] Speaker B: It was full stop like that. Really? [00:52:11] Speaker A: Yeah, it was like, you know, so there was like, a lot of skepticism. And I think there's a couple of things that are relevant to this. And so it's worth saying, you know, this is a sort of through line that starts with Roger Shepard. Shepard was interested in this idea of universal laws of cognition, sort of rediscovered Bayesian inference as a way of deriving those. Right. Because he's saying, how do you get a universal law, a principle which is going to hold for any intelligent organism anywhere in the universe? Well, the way that you get that is by thinking about the problems that are shared by intelligent organisms everywhere in the universe. And then what the ideal solutions to those problems are.
And for something like generalization, where you're sort of saying, I know that this property holds for one object, what's the chance it holds for another object? The way that I express that, it's already expressed in terms of probability theory. And so then Bayes' rule tells you how to sort of work out the answer to that. And so he was sort of saying, we can use Bayes as a way of characterizing what these optimal solutions are like. And then he obviously was a colleague of Tversky's at Stanford, was very aware of that sort of critique of thinking about humans and Bayes in the same context. And he sort of said, well, maybe it's okay to think about using probability theory to explain human cognition when we're talking about these kinds of abstract problems that human minds have to solve, even if, when you ask people questions about probability, they give you sort of horribly wrong answers. Right? And so he already made that distinction in the way that he was approaching it. And then John Anderson and Nick Chater really helped to develop that out: Anderson through this idea of rational analysis, which was based on reading Shepard's paper on universal laws when he was on summer holiday on a beach, sort of fleshing this out into this more general approach of rational analysis. And then. [00:54:15] Speaker B: Sorry, can you flesh out what a rational analysis is? [00:54:16] Speaker A: So rational analysis says, let's think about exactly the problem that a human mind is solving, derive what the optimal solution to that problem is, taking into account things like what's the distribution of events in the environment that people encounter, and then use that as a tool for explaining human cognition. So this again runs into this problem where you're saying, let's make the assumption of rationality in order to explain an aspect of human behavior. And then Nick Chater sort of picked this up and continued to develop that approach, and sort of showed how it could be used to explain various kinds of things that people do wrong in a classical sense, in terms of logic and so on, when they're engaging in reasoning. If you reformulate that problem as one of probabilistic inference, then these logical errors that people make turn into things that actually make sense. And there's a nice paper by Mike Oaksford and Nick Chater that sort of demonstrates that this is true for some of these classic logical fallacies. And so that sort of sets up this idea that we can maybe make this distinction between using rational accounts like probability theory to explain human cognition versus what people do if you actually ask them to use probability theory. And so we already have this in the context of logic, saying, well, it turns out if you think about people as solving a different problem, then we can explain what people are doing in the context of logic. And then Josh and I worked on a sort of series of problems where we were saying, well, maybe if we look at the things that people are doing wrong in probabilistic inference, you can also think about those as solving a different problem. It's just a little more tricky now because they're both in the world of probabilistic inference. Right? So when people are making an error in probability judgment, maybe it's because they're solving a different problem of probabilistic inference. [00:56:10] Speaker B: So the key here, I believe, right, is that what people are trying to do is infer the generative.
You're trying to infer, like what generates the distribution of samples that you're. That you are sampling. Right. As opposed to. Go ahead. [00:56:27] Speaker A: Well, there are a few things that I think are good sort of principles to think about here. So one is probabilities are just weird. Like, the way that people talk about probabilities is not necessarily consistent with the sort of math of what probability theory is. [00:56:40] Speaker B: What do you mean? You have to elaborate on that. [00:56:42] Speaker A: Yeah. [00:56:42] Speaker B: So, [00:56:45] Speaker A: you know, if I ask you what is more probable, if I flip a coin, Heads, heads, heads, heads, heads or heads, heads, tails, heads, tails. People will tell you, heads, heads, tails, heads, tails. But it's always the case that a longer sequence has lower probability. Right. Because you have an extra event which is getting multiplied in. And so your intuitions about probability are not really about the things that probability theory is about. So when someone says, oh, the probability of this. This. A lot of the time they're not talking about probabilities. They're talking about something which is more like evidence. Or they might be talking about a relative probability. Like they're saying this thing is less likely than whatever the default is that they're comparing to. Right. So that's one thing. And then the other thing is that often thinking in terms of evidence is really thinking implicitly in terms of generative processes. Right. And that I think you can make a sort of evolutionary argument, and Shepard actually makes this argument relatively explicitly, which is that there is certain, you know, the kinds of problems that human minds have to solve. It's not ever an evolutionary problem for you to calculate the joint probability of a sequence of outcomes. Right. That's not a problem that your mind has to solve. But if you're trying to work out, oh, is the sequence of outcomes I saw a consequence of some genuine causal process or just the work of chance? That is a problem that your mind has to solve because that's part of how you figure out what the, you know, the causal structures are in your environment. Right? So a classic example of this. Okay, if we take my heads and tails case, if you ask people which of these sequences is more likely? Now I'm going to do five heads in a row. Heads, heads, heads, heads, heads, right, versus heads, heads, tails, heads, tails. People will tell you, heads, heads, tails, heads, tails is more likely. But in fact. So the question that I asked is, what is the probability of this sequence given a random generating process? And people are wrong, wrong in answering that question. But if you flip it around and you say, what is the probability that a random generating process was used in generating the sequence? And that's what you get when you apply Bayes rule here and sort of compare the random generating process to some alternative, then you can actually make sense of the judgments that people are giving you for those probabilities. So heads, heads, tails. Heads, tails gives you better evidence that someone was flipping a fair coin than heads, heads, heads, heads, heads. [00:59:04] Speaker B: Even though they're equally likely. [00:59:06] Speaker A: They're equally likely, but the posterior probabilities are different. And so you can use that to explain some of these sort of weird, anomalous things that people do when they're thinking about probabilities. 
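A small worked version of the coin example above: both sequences have exactly the same likelihood under a fair coin, but very different posterior probabilities that a fair coin produced them once you compare against an alternative generating process. The alternative used here (a coin heavily biased toward heads) and the 50/50 prior are illustrative assumptions, not numbers from the book.

```python
def likelihood(seq, p_heads):
    """P(sequence | coin with the given probability of heads)."""
    prob = 1.0
    for flip in seq:
        prob *= p_heads if flip == "H" else (1 - p_heads)
    return prob


def posterior_fair(seq, p_alt_heads=0.99, prior_fair=0.5):
    """P(fair coin | sequence) via Bayes' rule against a single alternative process."""
    like_fair = likelihood(seq, 0.5)
    like_alt = likelihood(seq, p_alt_heads)
    return like_fair * prior_fair / (like_fair * prior_fair + like_alt * (1 - prior_fair))


for seq in ["HHHHH", "HHTHT"]:
    print(seq, likelihood(seq, 0.5), round(posterior_fair(seq), 3))
# Both likelihoods are 1/32, but P(fair | HHTHT) is about 0.997 while
# P(fair | HHHHH) is about 0.032: the alternating sequence is much better
# evidence that a fair, random process was at work.
```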
[00:59:17] Speaker B: Well, you see, it's weird, anomalous, but I mean, it is more useful to stay alive. Right? Evolutionarily, in that sense, it's what you should expect. And in that sense, the, the, the error that people make is actually correct. Right? I mean, it's the. [00:59:38] Speaker A: Well, but it's, it's. [00:59:39] Speaker B: There, there's. [00:59:40] Speaker A: If you put them in a casino, they're still going to be in trouble. Right. But that's because a casino is an environment which is not your evolutionary environment. No real causal relationships in the casino, which is kind of an unusual circumstance for a historical human being, you know, just, just moving around in the world. Right, right. [00:59:59] Speaker B: Yeah. Okay. All right. Well, you should. I'll try to tell that to my grandmother. She loves the slots for whatever ungodly reason. So I asked you to elaborate. [01:00:11] Speaker A: Well, I'd say that there's two things here. So one is, first of all, this idea that some of the time you're not solving the problem that you've been asked to solve. Right. And that there might be other kinds of problems that we can still give an analysis in probabilistic terms. So that takes an important idea from rational analysis, which is that we're kind of like, as the theorists allowed to say, people might be solving a different problem from the one that they've been asked to solve by the experimenter. Right. And so that's important, I think, for understanding some of these things. And the other thing that is important for understanding some of the ways that people deviate from probabilistic inference and from decision theory more generally, is recognizing that people are doing this under cognitive constraints. And that's something that was a component of rational analysis. Anderson's version of it said assumed the minimal cognitive constraints. And then more recently, in context of this approach of resource rational analysis, we say, no, let's take those constraints really seriously and think about how it is that if you have an agent who is operating with limited computational resources or limited time, how that's going to affect the strategies that they use. And when we do that, we actually find that a lot of the things that have been held up as instances of human irrationality and the sort of dumb heuristics that people might be using are things that kind of make sense. When you think about humans being constrained in particular ways by their cognitive architectures and trying to do a good job of solving the problems that are posed by their environment, subject to the constraints of those actors. [01:01:39] Speaker B: Yes, this is another case where we got. Because here the resources are the constraints that are sort of like the dominant aspect of life. In life, a machine is not subject to those same necessarily. I mean, there are computational resource constraints for a computer, but it's subject to way less constraint than organisms. [01:02:03] Speaker A: Or different. That's right, yeah, different. And I should say we just had a book come out from Princeton University Press called the Rational Use of Cognitive Resources, which is our guide to resource rationality, where it sort of introduces this idea from. From scratch and then sort of goes through a bunch of applications of it. So folks who want to learn more about that, that's a place to go for that. [01:02:27] Speaker B: Yeah. 
[01:02:28] Speaker A: So, no, I think about this as a way of understanding the moment that we're in with respect to AI and maybe a little bit of what the future looks like, which is that human minds are fundamentally shaped by the constraints that they operate under. Right. So we live for only a few decades. We only have a couple pounds of neural tissue that we're going to learn all the things we're going to learn and do all the things we're going to do with in Those few decades and we can only share data or pool compute by making honking noises at each other like we're doing right now, or maybe by wiggling our fingers. Or we have to use our bodies to try and like communicate things to one another. And so those constraints give you something that looks like human intelligence in the sense that we have to be able to learn things quickly because we're not going to get that much data in our limited lifetime. We have to be smart about how we use our limited computational resources and have heuristics and strategies that we use for getting around that, but also be able to recognize the structure of problems and when that's shared with other problems and what goals and sub goals might be reasonable things to set and so on. And we have to come up with things like language, writing, science, starting companies, systems of law, all of these kinds of hacks that we have to allow us to pool our data and compute resources in order to achieve things that go beyond the capacities of any individual human. And that's what makes human intelligence. Human intelligence is sort of the set of all of those things. Things, if you make an AI system that's not subject to those constraints, can learn from much, much more data, can add more compute as more computers needed, and can either. You can train multiple systems on the same data. You can take a system that's been trained on some data and fine tune it on something else. You can sort of pool data and potentially compute resources in much more flexible ways. You're going to end up with something which is just a different kind of intelligence. Intelligence. And that's where we're at, right? We have these systems that can do some of the same kinds of things that we can do, but the ways in which it's doing it are a little different from us because they're not operating under the same constraints. And that's part of what makes them maybe not so intuitive and means that we sort of expect them to act like people because that's the only kind of model we have of what intelligence is like. But they end up just screwing up in ways that seem inscrutable to us. Right? Because they're just operating with a different toolbox for solving those problems. [01:05:06] Speaker B: Problems well, they screw up in many ways, but they also do many things much better. Right? Like calculators do, for example. So I guess there, I guess again, I'm like two. I have two minds about this because. So you're probably familiar with the concept of enabling constraints from people like Michael Anderson and Michael Anderson and Vicente Raja, write about this. Other people write about enabling constraints. One way to think about constraints is they're limits on what we can do or they're sort of in the way, obstacles, challenges to overcome. And what we're trying to do is do the optimal inference, do the optimal algorithm, but we have to adjust because of these constraints. Another way to think of constraints is that they're enabling that. 
Like you said, that is what human, in one species case, that is how human intelligence comes about. The constraints are actually what forms that intelligence. And so in that sense they're enabling. Right. And so to think about an AI system which has like a completely different set of constraints, I just don't. Yeah, you're not going to get to the same sort of quote unquote intelligence in an AI set, nor should you want that, perhaps. [01:06:25] Speaker A: But I think that's a really productive thing to recognize because I think when people talk about superhuman, artificial super, superhuman AI or artificial general intelligence, a lot of that is based on our understanding of what intelligence is from humans. Right. Superhuman AI. I mean, it makes it sound like there's one dimension. Right. And you have. [01:06:47] Speaker B: Yeah, yeah, it should be like extra human or something like that. [01:06:51] Speaker A: It's going to be just like us, but smarter. Right. And I think the thing that comes out of thinking in these terms is recognizing that the AI systems we build are just going to be, be different from us. They're going to be able to solve some of the same kinds of problems because they're really optimized to solve those problems. Right. And a lot of what is going on in the AI companies at the moment is curating training data to make models better at solving specific problems that are the kinds of problems that people want to solve. Right. But then the patterns of generalization that they're going to have are still going to be different from the patterns of generalization that humans have have, because they're not operating with the same inductive bias. Right. So there's two important consequences of inductive bias. So inductive bias is everything other than the training data that influences the solution that you find. Two important consequences of that. So one is how much data you need to learn something. Right. So humans can learn language from far less data than you're a large language model, because humans have an inductive bias that supports learning human languages. Large language models can learn all sorts of things that humans can't. The other is, is what kind of solution you end up finding, particularly when you have these models that are sort of very much like These sort of over parameterized models. The inductive bias that you have in a neural network is a very weak kind of inductive bias towards the initial weights of that network. A lot of the things you can do in terms of setting up different architectures and so on are about trying to create models where those inductive biases sort of align with whatever the problem is that you're solving. But you've got a system that intrinsically has a sort of pretty weak inductive bias in that space to functions. And so that means that even as you train it on massive amounts of data, it's going to do great at solving the things that are in the training data. But the patterns of generalization that it's going to get are going to look different from the kinds of patterns of generalization that a human would give. Because the inductive bias is different. Right. Because the way in which the solution that it's going to find is going to be different from the kind of solution that a human might find. And we just, I think, have to get used to the idea that we're going to see things that are somewhat inscrutable to humans in the behavior of these models. 
Because unless you really work on engineering more human like inductive biases, you're not going to get models that are going to generalize in ways that look like human generalization. And you can try and get around that by having enough training data that you never really get out of distribution. But that's always going to be a property of the model. [01:09:13] Speaker B: Yeah, I don't know if you'd want to get around it. I don't know why. So the idea of superhuman intelligence almost assumes the solution to that would be to build all of the same constraints, all the same inductive biases, into the AI system. And it just seems silly to do that almost. [01:09:30] Speaker A: Yeah, that's right. Well, I think one of the discoveries of machine learning in the last 50 years is that engineering inductive bias turns out to be harder than scale scaling. Right. That's the hard part. So scaling is an engineering problem. Constructing systems that have human like inductive biases is really difficult scientific problem of figuring out what those inductive biases are and then how to appropriately induce them in the systems that you're working with. And so we have these effective engineering tools where if you can express something in data, you can use back prop to push it into a neural network. And so that means a lot of the emphasis is on finding the right configuration of Pre training that you're able to emulate something like those human inductive biases in a way that makes it possible for the models to at least generalize in ways that make sense most of the time. [01:10:24] Speaker B: So is it a logical conclusion to state something like, there will forever be a fundamental different set of solution spaces in AI systems and in humans because they will almost inevitably have different inductive biases, almost inevitably be subject to different constraints. And unless we make that set of organization, that organization of constraints the same, there's no reason to expect that we would get. Get this linear human to superhuman, but exactly like human, but super sort of intelligence. [01:11:08] Speaker A: Yeah, I mean, that would be my expectation. Modular, the fact that we're only going to detect those differences as you start to get out of distribution. Right. So if you can just get enough training data on the problems that you care about and get enough coverage of things, then you're not going to get into the generalization regime. That is the thing that maybe reveals what the differences in inductive biases are like. [01:11:34] Speaker B: Yeah. Also, I don't know, where are you on the scale of how general human intelligence is versus specialized? [01:11:42] Speaker A: Well, I mean, I'd say specialized to the set of computational problems that are characterized by those constraints that I was talking about, Right? [01:11:49] Speaker B: Yeah, but what I'm asking you is in the space of all possible intelligences, right? Like. [01:11:55] Speaker A: No, but that's what I'm saying. The set of human computational problems is pretty tightly circumscribed and there's a subset of the things that are the things that we can imagine getting our AI systems to solve. And so, yeah, I actually think there's interesting opportunities there in terms of. I'm more interested in making AIs that are not like us, because if you really want to discover new things, being able to make brains that are, you know, have a completely different perspective on the world and the problems in it. 
And that's a way of extending what it is that our human brains are able to engage with. [01:12:34] Speaker B: Yeah, that's true. I would rather talk to an entity with a vastly different perspective on things, for example, than just to a smarter person. Person. I mean, it's great talking to smart people. Right. I do it all the time. But. Okay, so getting maybe even back to into the weeds. This might take. This might require getting back into the weeds a little bit. One of the things, many things that I found interesting in your book is you make this connection between the hidden layers of an artificial neural network with a hidden Markov model. And probabilities with respect to language that sort of. Those things seem unrelated, but you tie them together in the book. Is there a way for you to sort of summarize what all that means? What I just said? [01:13:29] Speaker A: Yeah. I mean, so I think there's two ways to connect them. So one way is just this was know trying to understand why it is that large language models are able to learn language when Chomsky had made a strong argument that statistical methods couldn't work for learning language. Right. So. So part of Chomsky's argument was the famous sentence, colorless green ideas sleep furiously. Right. And a speaker of English can recognize that as a grammatical sentence in English, even though it sounds a little funny, might be hard to come up with a semantic interpretation for it, but it was carefully constructed because Chomsky was arguing with the behaviorists and with the information theorists. So that every pair of words in that sentence would be very unlikely to have occurred next to one another in all of previous text or utterances that you might have encountered. Right. So colorless green green ideas is something you can hear more since the 1950s. But in the 1950s, Green Ideas is probably an unlikely thing to happen. [01:14:30] Speaker B: Is that when green became associated with the concept of new? [01:14:34] Speaker A: No, I mean, subsequently, I'm not sure when it was. But that date was after Chomsky was making this argument. [01:14:41] Speaker B: Sure, sure. [01:14:43] Speaker A: Ideas sleep and sleep furiously. Right? So none of those pairs of words should have occurred next to one another. And so Chomsky was like, clearly statistical models of language cannot succeed. Right. Because of this property. But the kind of statistical models of language he had in mind were Markov models. Right. Because that's the kind of thing that if you're a behaviorist, you think, oh, I'm just learning associations between words, and learning associations between words is going to be enough to allow me to identify all of the grammatical sentences. That's not going to work. You haven't got associations between colorless and green and so on. And then the information theorists were thinking in terms of things like bigram models, trigram models. And again, the way that you estimate that model is by looking at the frequency with which the pairs of words appear next to one another, another. And so that's not going to work. So in the 1950s, this was a damning argument. But then subsequently, we have much more sophisticated probabilistic models of language. Hidden Markov models are a nice example. So in a hidden Markov model, you say, I'm going to have that Markov structure where you have the Next thing that's being generated depending on the previous thing. 
But instead of applying it at the level of words, I'm going to apply it at the level of latent syntactic classes. And so then I have my sort of generative process be one which says: choose a syntactic class, and then based on that syntactic class, generate a word, and then generate the next syntactic class, right? And so just making that level of abstraction, now you have something which can explain colorless green ideas sleep furiously. That sentence is a sentence of the form adjective, adjective, noun, verb, adverb, and that sequence is actually a relatively common sequence in English, right? And so if you're learning your Markov model at the level of the syntactic classes that are involved, then that's a very reasonable high probability sequence, right? And then based on the first thing being an adjective, you generate colorless, the second thing being an adjective, you generate green, and so on. And those are also reasonably probable words given those syntactic classes. And as a consequence, you end up with that sentence having relatively high probability, at least compared to other sentences that jumble up those things. Okay? So the key trick is having some kind of latent structure be a part of the way that you're defining a probabilistic model. And then large language models you can think of as the limiting case of that. Because the problem that you have in a hidden Markov model is you learn these sort of transitions between these abstract syntactic classes, but that's still a relatively impoverished model, right? These syntactic classes are discrete, and so that limits how much information you can be keeping as you're sort of propagating these things through, and they're still Markovian, so you're sort of just generating based on the previous thing. And so if you replace discrete syntactic classes with continuous vector embeddings, so now instead of the latent variable being a discrete class, it's a point in space, then you've got a much richer sort of latent space in which you can encode the information about what you've seen previously in the sentence and then use that to predict what the next thing is going to be. And so our large language models, or these sort of neural network language models more generally, are a way of getting that latent structure, but then generalizing it beyond the sort of constraints that are imposed in something like a hidden Markov model. [01:18:20] Speaker B: And that's why, I believe you say this in the book, that's why. First of all, I didn't know that Hinton named them hidden units. And that's why. Is that right? [01:18:30] Speaker A: From hidden Markov models? [01:18:31] Speaker B: Yeah, it is right. Isn't that. [01:18:33] Speaker A: Yeah, yeah. [01:18:34] Speaker B: That's the reason he gives. [01:18:35] Speaker A: But that idea of hidden being latent, right, it's the thing that you don't get to observe. So in your neural network, you get your input and your output; you don't get to observe what's in the middle. In your hidden Markov model, you observe the words; you don't get to observe the latent states that are behind them. [01:18:49] Speaker B: And part of your explanation there, you talked about embedding it in sort of a spatial construct, right? And going back to that original law of thought. I mean, that was like the original idea, is to spatialize distributions of features as distributions in space.
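As a toy illustration of the hidden Markov model story above, here is a sketch that defines transitions over syntactic classes, emissions of words from those classes, and scores "colorless green ideas sleep furiously". All the probabilities are made-up illustrative numbers, not estimates from any corpus.

```python
# Transitions over latent syntactic classes: P(next class | current class).
transitions = {
    "START": {"ADJ": 0.4, "NOUN": 0.6},
    "ADJ":   {"ADJ": 0.3, "NOUN": 0.7},
    "NOUN":  {"VERB": 0.8, "NOUN": 0.2},
    "VERB":  {"ADV": 0.5, "END": 0.5},
    "ADV":   {"END": 1.0},
}

# Emissions: P(word | class), with tiny vocabularies for the example.
emissions = {
    "ADJ":  {"colorless": 0.1, "green": 0.2, "other": 0.7},
    "NOUN": {"ideas": 0.1, "other": 0.9},
    "VERB": {"sleep": 0.1, "other": 0.9},
    "ADV":  {"furiously": 0.1, "other": 0.9},
}

def sequence_probability(tags, words):
    """Joint probability of a tag sequence and its words under this toy HMM."""
    prob, prev = 1.0, "START"
    for tag, word in zip(tags, words):
        prob *= transitions[prev][tag] * emissions[tag].get(word, 0.0)
        prev = tag
    return prob * transitions[prev]["END"]

tags = ["ADJ", "ADJ", "NOUN", "VERB", "ADV"]
words = ["colorless", "green", "ideas", "sleep", "furiously"]
print(sequence_probability(tags, words))  # small, but far from zero, because the
# adjective-adjective-noun-verb-adverb tag sequence itself is quite plausible
```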
So a large part of the book is talking about that concept and how it's been applied to various things and how they kind of are unified together. [01:19:14] Speaker A: Yeah. I mean, it's a way of dealing with some of the problems that came up when people were applying logical approaches. Right. Is to say, okay, stop thinking about things as being discrete and start thinking about them as being continuous. And now we're not representing things as these discrete propositions. We're now representing them as points in a space. And then you now have a problem of, okay, we knew how to do computation with discrete things. How do we do computation with continuous things? And the neural networks give you a nice solution to that problem. When you think about the neural network as implementing a function that's transforming you from one point in space to another, it's also worth saying this is also a story that goes back a long way. So Leibniz, which is the place where I start the story, is like Leibniz trying to figure out what's the mathematics of the mind. Right. One of the solutions that he had was actually the. The first kind of vector embedding. So he was trying to figure out, how could you use arithmetic? And this was his fundamental error, or maybe has shown to be correct over time. But the way he was approaching this was using regular arithmetic to try and sort of do logic. And so the way he came up with for doing this was associating numbers with every proposition and then having some rules about, like, okay, if the numbers for this proposition, divide the numbers for this proposition, then they're related in some way. And so you end up with this sort of complicated thing where there's prime numbers and so on. But he had this idea of, you Associate a vector of numbers with each proposition in a way that sort of anticipates our modern vector embeddings. [01:20:47] Speaker B: You talk a little bit earlier I complained about how my subjective experience of thinking is not the same as like what I realized is actually going on under the hood. Said, but you mentioned in the book that sometimes we do conscious. We like explicitly and consciously perform Bayesian rule approximations, right. To reason to do things in reason when we're trying to, I don't know, let's say maybe make a chess move or something. Right. But that all the time we're also doing this implicitly, unconsciously. So I guess my question is why should we. This is the age old question. It's not answerable. I know, but I'm going to ask you anyway. Why should we need to do some of it consciously when we can handle it perfectly well not doing it consciously. [01:21:37] Speaker A: I would think about that in terms of if you think about the unconscious stuff as some kind of amortization, right. Then in order to amortize computation effectively, you need to be able to get a feedback signal. Signal. And so you are able to get good feedback signals for certain kinds of probabilistic inference, right? Things that happen on a relatively short timescale. Think about perception as a good example. Lots of feedback signals when you screw up your perception because you end up walking into a wall or tripping over something or not grabbing the thing you were trying to grab or something like that. Okay, so that's a good candidate for amortization if you're trying to make inferences about or decisions about what you're going to be doing a few years from now now or complex decisions about how you're going to plan for next week or other things like that. 
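Returning to the Leibniz passage above, here is a tiny sketch of the characteristic-number idea: primes for primitive concepts, products of primes for composite concepts, and divisibility as the test for whether one concept contains another. The particular concepts and the exact rules Leibniz experimented with are simplified here for illustration.

```python
# Assign a prime to each primitive concept (illustrative choices).
primitives = {"animal": 2, "rational": 3, "green": 5}

def characteristic(*concepts):
    """Characteristic number of a composite concept: the product of its primes."""
    n = 1
    for c in concepts:
        n *= primitives[c]
    return n

man = characteristic("rational", "animal")  # 6, in this toy encoding

def is_a(subject_number, predicate_number):
    """'Subject is predicate' holds when the predicate's number divides the subject's."""
    return subject_number % predicate_number == 0

print(is_a(man, primitives["animal"]))  # True: "every man is an animal" checks out
print(is_a(man, primitives["green"]))   # False: "man is green" does not follow
```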
There's a big gap between the feedback signal and the computation that you're doing. And so it's really hard to amortize those kinds of things. And so that means that it's exactly a sort of resource rational trade off. Right? That there's a trade off in how well you're going to be able to amortize things versus how well you're going to be able to do them using those more effortful conscious computations. [01:22:48] Speaker B: So it's like a limiting factor. That's the. [01:22:51] Speaker A: Yeah, but poker players are a really nice example, right. Where poker players make a lot of decisions that involve uncertainty and as a consequence have a really good amortization signal for doing these probabilistic computations in a way where as they develop expertise as poker Players, they are amortizing stuff that the rest of us probably don't have opportunities to amortize because we're not in those instantaneous feedback situations for doing probabilistic computation. [01:23:24] Speaker B: So there's like a consolidation that happens that becomes what intuition is. You would say, yeah, yeah, yeah, okay. Simple as that. So, okay, one of the things I was going to ask you, and I was going to do this maybe in some extra time, but I'll ask you now because it's sort of a. Comes out of left field. But since I mentioned intuition there, like, what is the role of feelings in all this? Right. When I feel sad, like it's. It's hard for me to explain this in a probabilistic way. It's hard to me, hard to explain it in a symbolic rules or a neural networks sort of way. Like what? Like why the hell do I feel sad? Not that I feel sad a lot, but why would I? [01:24:04] Speaker A: All right, maybe I need to write another book which talks about the decision side because I think a lot of feelings end up on. On the decision theoretic reinforcement learning side of things. And we've done analyses of these kinds of things. So in Algorithms to Live By, I talk about with Brian Christian, we talk about some of a subset of emotions. So essentially the sort of game theoretic emotions, which are love and anger, these sort of interpersonal emotions. And you can give a rational justification for them essentially because they. They're commitment devices that mean that you have better outcomes in certain kinds of interpersonal situations. So being able to fall in love makes you a better partner. Being able to be angry gets you out of situations where you could be exploited by somebody else or where you have sort of games of chicken. And I can talk more about that, but there's a whole detailed explanation of that. And algorithms live by for sadness and sort of remorse and some of these other sort of. Of more complex emotions. I think you can give a story, and it's both a story that has a rational component, but also a sort of resource rational component in that it's a consequence of constraints on our architecture. There's lots of work on model based versus model free reinforcement learning. Right. Model based is you build a model of the world, you plan the actions that you're going to take in that world, and that's how you're going to decide what actions you're going to take. Take. Model free is you're learning associations between the state of the world and the action you're going to take. Right. And that thing we were talking about about amortization is a good example of that. 
You can imagine that things might move between those two different modes, but you might also imagine that as an agent, when you're taking actions in the world, you're relying on both of those kinds of systems, right? You have that intuitive, immediate response, and then you also have your more slow, deliberative, planning-based response. And so one interesting question you can ask is, how should signals pass between those systems? Right? If you're a model-free agent that has this model-based component, or the other way around, maybe you're a model-based agent that's trapped in a model-free body, then how can your model-based agent communicate back to your model-free body that certain kinds of things are not the things that you should have done? And I think you can think about some of those kinds of moral emotions as signals about that, right? So if you feel remorse after doing something, what that's doing is lowering your utility, decreasing your reward after having taken an action, in a way that feeds into your model-free system, because you're associating what you did with this crappy state that you're feeling, right? But it's something that comes from your model-based system telling you, hey, that's not a thing that you should have done, so don't do that again, right? So I think you can give similar kinds of analyses for other sorts of emotions. We have a paper that analyzes those sort of remorse cases with a former grad student, Paul Krueger. And then Rachit Dubey, who's at UCLA now as an assistant professor, has a paper on happiness which does a similar kind of analysis that sort of shows that the components that contribute to happiness are things that help to keep you motivated as a reinforcement learning agent. The sort of hedonic treadmill is a good way to be productive, even if it makes you unhappy. [01:27:36] Speaker B: Okay. So I want to make sure before we move on here. So that is a future book. The. I don't know, the Laws of Sadness. [01:27:46] Speaker A: I hadn't thought about it, but maybe, yeah. [01:27:48] Speaker B: How did this book come about? I know this took you some time to write, so was this requested of you, or how did you decide to write this? [01:27:56] Speaker A: So this is the book I always wanted to write. And I wrote Algorithms to Live By with Brian. As you know, it was really a chance for me to learn how to do this, because I didn't feel like I had the chops to write this book. It's a hard book to write. [01:28:13] Speaker B: It's big. Not big as in tour-de-force thick. It's just big ideas and weaving them [01:28:21] Speaker A: together and telling the stories and knowing how to capture them. So, unusually for a popular science book, I did a lot of work reporting and interviewing and archival work and so on, really trying to report out the stories. And I think the canonical popular science book is someone telling you what they think about the world. And there's some of that in this book, but a lot of it is about how other people thought about things. [01:28:52] Speaker B: That's a shit ton of work, isn't it? Like, going through all the. I'm going to ask you about that later. But just the sheer amount of work it takes to put something together, that's what maybe people should appreciate more. Right. [01:29:03] Speaker A: So those were the skills that I had to build, and I didn't feel like I was able to do that as a first book.
And so working with Brian on Algorithms was a way to really figure out, okay, these are the things I need to do, and then get some experience doing those. And so, in the book, the dedication is to my teachers and my students. And Brian was really my teacher on the reporting and writing front. [01:29:33] Speaker B: Well, because he's been good at that for a while, I suppose. [01:29:35] Speaker A: Yeah. No, he's amazing. Yeah. [01:29:37] Speaker B: What have I not asked you? So I know I've asked you a lot and we've kind of skipped around a lot, but is there something that I haven't asked you that you wish I had, that you'd like to explain or talk about, from the book or otherwise? [01:29:51] Speaker A: So one thing I wanted to mention is that the interviews that I did, I've also been collecting as an oral history project. And so those interviews are available as a podcast called the Cognition Project, where people can actually go and hear them. So we're starting in the 1950s. It's going to take us a little while to get up to the 1980s and people's favorite neural network heroes. But you can go back and hear from Jerome Bruner, who kicked off the cognitive revolution, and then sort of move forward from there. [01:30:21] Speaker B: That's available right now. People can. Okay, yeah. Bruner was one that I just had not heard of. And you're, like, friends with Bruner, right? [01:30:32] Speaker A: No, I just interviewed him for that. [01:30:35] Speaker B: Okay. Okay. Well, you spent time with Bruner, so the book is chock full of that sort of stuff. But just science-wise, or like the message that you want the book to convey, is there something that we've missed thus far? [01:30:50] Speaker A: No, I think this was really the story. I think you sort of picked up on the distinctive part here, really making this argument that maybe it's worth thinking about probability as one of the things that we're going to use to help understand the moment that we're in with respect to AI, in much the same way that we've used it to understand aspects of human cognition. So I think in the book I say something like, I paraphrase Marr, right, and say something like, we're not going to understand neural networks by studying only neurons. [01:31:25] Speaker B: Yeah, I think this was in the book. I'm pretty sure it was in the book. You mentioned you had some sort of accident or malady, or you went through something where you couldn't use your hands for a spell. Do you mind sharing that? What was that? [01:31:39] Speaker A: So I've always been a pretty active person. So, like, fencing. I know, fencing. That's right. And then in my 30s, I just started accumulating joint injuries at a sort of alarming rate. And none of them were particularly unusual injuries. It's just that I had all of them. [01:32:03] Speaker B: Right, right. I had a friend who was always breaking bones, and he found out he has, like, feather or bird bones, essentially. Right. [01:32:10] Speaker A: Yeah. So it's probably a similar thing, that there's something not quite right about my connective tissue. So I was getting, like, basically any situation where my joints were out of alignment, I would then get a tendon injury or something like that, because the tendons were not strong enough to be able to support it. And so mostly these were just sort of common injuries.
And then I got a less common thing happen, which is that my ulnar nerve on my left side, so it's the nerve that's your funny bone, migrated to the inside of my arm and just started moving around because it was no longer sort of tied down properly. [01:32:50] Speaker B: Painful. Is that painful? [01:32:51] Speaker A: It was extremely painful. And so I ended up having surgery on both of my arms to put my ulnar nerves in a more stable position. And then complications from those surgeries, plus the other issues, meant that I had limited use of my hands for about five years, because the ulnar nerve innervates muscles in the hand. And then because those muscles were weakened, I got a lot of tendon injuries in my hands that I still have. But. [01:33:24] Speaker B: God, how are you doing these days? [01:33:26] Speaker A: Much better. So I have set up my life in ways that mean that I am less likely to injure myself. And I do a lot of physical therapy, and as long as I do all the exercises I'm supposed to every day and basically strengthen the muscles around joints to stabilize them rather than relying on tendons, I mostly don't get injured. But I have to do a lot of that, and I still see a physical therapist to just realign all my joints. [01:33:59] Speaker B: What does this mean for your love of fencing? [01:34:02] Speaker A: I'm not really able to fence at this point. [01:34:03] Speaker B: Yeah, because in the book you write about how that was part of your grad school application. It was like, I really like the fencing facilities here. [01:34:11] Speaker A: Yeah, that's right. No, it was a big part of my life, and it's very frustrating not to be able to do it. I do some virtual reality fencing, which is something I can still do because it's less impact. And my kids were doing some fencing and I was able to give them lessons and things like that. But. [01:34:28] Speaker B: So it's still part of your life, but you're not actively doing it. [01:34:31] Speaker A: And I mean, this also relates to some of the scientific things that I think about. So a lot of my thinking about limitations was shaped by this period, right, where this was me dealing with a certain kind of disability that meant that there weren't things that I could do, but then trying to figure out, how do I work around this, and how do I adapt, and how do I do other things that I can do. So I think I had the experience that a lot of people have in their later life, where things that I loved were no longer things that I was able to do, and I had to work out strategies for that. One of those principles is, if something was something I couldn't do, then I would find something else that I was excited to do that would replace it. Right? So it wasn't just a negative spiral of being limited. And so a lot of that got me thinking about what are the consequences of limitation, and how does that shape what we do and how our minds work. And you can think about, there's a sense in which humans are engaging with disability. It's just normal for us. Right? The constraints that we have on our lives and on our brains and on our capacity to communicate, our sort of normal mode of that is, in some sense, limited. Right. And that shapes the kinds of things that we're able to do as human beings. [01:36:08] Speaker B: All right.
Well, yeah, I hope that you continue to heal and manage. And you're overcompensating, it sounds like, for these constraints. Okay, last thing. I mean, things often go back to Aristotle. Syllogisms, right? And logic, and thinking about whether, like Leibniz, right, thinking about whether syllogisms could be captured with logic or. I probably got that wrong. But multiple thousands of years later, they still go back to that, like sort of the origins, right? So let's say just a thousand years from now, are they going to look at your book and say, well, this is where we realized, oh, is this the laws of thought? We've solved it. Scale of 1 to 10, how confident are you that these are the laws of thought? [01:36:53] Speaker A: Right. Okay. Well, the first thing is, like I said, this isn't really a book that's supposed to be a, here's a great discovery that I'm sharing with everybody. It's more, here's a tool that everybody can use, right, for understanding this moment. [01:37:07] Speaker B: It's a synthesis. [01:37:08] Speaker A: But it's not like, here's my new brilliant idea. Right? So it's not supposed to be that. [01:37:13] Speaker B: That's what the title. You could glean that from the title. Like, oh, man, this guy is going to really elevate it for us. [01:37:20] Speaker A: No, my goal was really to provide more sort of literacy and fluency in the way that we talk about these kinds of concepts, because they've become so much more important as AI is having the influence that it's having on society. So in terms of, oh, yeah, we've nailed the laws of thought: I actually think we have a pretty good characterization at that abstract computational level, right? Between logic, probability, decision theory, I think those are a good characterization of what ideal agents should be doing. I think the thing we're still figuring out is that next level down of how you make systems that have those properties, how that's implemented in humans, how that's different from the way that we're implementing it in our AI systems. There's lots of work to do to figure all of that out. It's also worth saying, in Boole's investigation of the laws of thought, the two things he talked about in that book were logic and probability theory. 200 years later, we're still. [01:38:21] Speaker B: Look what you've discovered. [01:38:23] Speaker A: We're still, I think, building on that foundation that Boole provided, but I think we just have a much better understanding of how to cash those things out, and then how they actually connect to human cognition. And a lot of the work that was done in the 20th century and through the 21st was really working that out. Boole didn't really care about empirically testing his ideas. He sort of felt like it was self-evident when he got something right. And cognitive science has really helped us to understand what the limits of those ideas are in explaining human cognition, and then what are some of the other pieces that we need in order to really think about capturing the things that are characteristic of how human minds work. [01:39:01] Speaker B: All right, Tom, thank you for talking with me. Thank you for the book and I wish you runaway success with it. Nice job with the book, so I appreciate your time with me. Thanks. [01:39:09] Speaker A: Thanks, Paul.
[01:39:17] Speaker B: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives written by journalists and scientists. If you value Brain Inspired, support it through Patreon. To access full length episodes, join our Discord community and even influence who I invite to the podcast. Go to Brain Inspired co to learn more. The music you hear is a little slow jazzy blues performed by my friend Kyle Donovan. Thank you for your support. See you next time. [01:39:58] Speaker A: Sam.
