BI 219 Xaq Pitkow: Principles and Constraints of Cognition

Brain Inspired

Aug 27 2025 | 01:47:11

Show Notes

Support the show to get full episodes, full archive, and join the Discord community.

The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists.

Read more about our partnership.

Sign up for Brain Inspired email alerts to be notified every time a new Brain Inspired episode is released.


Xaq Pitkow runs the Lab for the Algorithmic Brain at Carnegie Mellon University. The main theme of our discussion is how Xaq approaches his research into cognition by way of principles, from which his questions and models and methods spring forth. We discuss those principles, and in that light, we discuss some of his specific lines of work and ideas on the theoretical side of trying to understand and explain a slew of cognitive processes. A few of the specifics we discuss are:

0:00 - Intro
3:57 - Xaq's approach
8:28 - Inverse rational control
19:19 - Space of input-output functions
24:48 - Cognition for cognition
27:35 - Theory vs. experiment
40:32 - How does the brain compute with probabilities?
1:03:57 - Normative vs kludge
1:07:44 - Ecological neuroscience
1:20:47 - Representations
1:29:34 - Current projects
1:36:04 - Need a synaptome
1:42:20 - Across scales


Episode Transcript

[00:00:03] Speaker A: The hope is that you can find some principles that make things understandable. In a sense, the only things that are understandable are the non-kludges. Right. The only things that are understandable are the principles. And so this is a really important and, I think, underappreciated element of what we have to do is we're not trying to, in my view, we're not trying to come up with the one mechanistic explanation of something. We're trying to find a class of equivalent explanations that have some shared properties. To me, this is the biggest gaping hole in neuroscience is that we don't understand how learning works. All of the machines that we use for learning, they're doing gradient descent these days. Like that's basically what they do. And, and the brain doesn't do gradient descent. Maybe it approximates it. What are the approximations? What are the constraints? We don't know. And we don't know because we can't measure it yet. [00:01:10] Speaker B: This is Brain Inspired, powered by The Transmitter. You think you have principles. Xaq Pitkow has principles. Xaq is my guest today and he runs the LAB lab at Carnegie Mellon University. LAB here stands for Lab for the Algorithmic Brain, and an acronym for that is LAB, which stands for Lab for the Algorithmic Brain, and an acronym for that... well, you get the point. Xaq is a theoretical neuroscientist with a background in some experimental neuroscience. As we talk about, he dabbles in... I think he actually describes himself as a dabbler. He dabbles in many endeavors, but the main theme of our discussion here is how he approaches his research into cognition by way of principles from which his questions and models and methods spring forth. So we discuss those principles, and in that light we discuss some of his specific lines of work and ideas on the theoretical side of trying to understand and explain a slew of cognitive processes. A few of those specific topics that we discuss are how, when we present tasks for organisms to solve in order to understand some facet of cognition, the organisms use strategies that are suboptimal relative to the task, but nearly optimal relative to their beliefs about what they need to be doing, something Xaq calls inverse rational control. We talk about probabilistic graphical models. We talk about how brains use probabilities, or how brains may use probabilities to compute, and different ways they could use probabilities to compute. And one of his newer projects is ecological neuroscience, which he has started with multiple collaborators. And these just touch on a few of the many projects that he is running, has run, and is interested in. You can learn more about his principles and about his work in the show notes at braininspired.co/podcast/219. Thank you so much to my Patreon supporters. If you support the show, you get access to all the full episodes, the full archive. You can join the Discord community. You can access a bunch of complexity group meetings. That is a biweekly-ish kind of discussion group that we've formed around the foundational papers of complexity. Look for my David Krakauer episode a couple months ago if you want to learn more about that. Anyway, I hope you're doing well out there. I'm going to have a new studio soon. In a couple months, it won't be this tiny, tiny closet that you see before you. All right, enjoy. Xaq. Xaq, I'm going to give it a shot. And, and then you can correct me. I'm going to, in the broadest terms possible, describe what you do. And then you can.
Or at least a common theme. The broadest possible. Common theme. Right. [00:04:12] Speaker A: I like it. [00:04:13] Speaker B: Okay, so you study normative models under realistic assumptions to discover or infer cognitive functions. So the realistic assumptions being like metabolic cost, limited resources, the computational cost of cognition, rationality under suboptimality, and so on. So that was super brief. Where did I go wrong? And how would you correct me? [00:04:42] Speaker A: Oh, that's a, that's a great start. That's a major theme of what we're working on in the lab. I think those things are really fun. I've definitely been motivated by principles and different kinds of principles. Normative principles are a pretty natural one. But there's also, and you mentioned some non-normative principles in the sense of constraints that come from inside the brain. How do we end up with those constraints? Some of them are still principled, like in physics. That was how I originally got interested in this whole endeavor. [00:05:13] Speaker B: Wait, stop there. What do you mean? [00:05:16] Speaker A: I saw a talk by Bill Bialek when I was an undergrad and, and he showed how you could use physics to understand the brain. And I was like, really? Yes. That's amazing. And so that's. That was the beginning for me. And so some of those constraints, like we can see as well as like the, the dimmest light possible according to physics. That's a constraint that comes from physics. The balance in our eyes between resolution and refractive blur, diffraction, that comes from physics. Those constraints come from physics. But then there's some other legacy things that show up that don't. Maybe it doesn't have to be that way. There's, you know, some architectural structures that are there, and I'd like to understand those as well. Those are harder to get at because you don't have physics to point to. That's just the legacy of our evolutionary history and the ecological niches that we occupy. So how can we understand something of how we end up? Like, for example, one concrete example is that learning and plasticity in the brain is largely local, and that's a constraint that comes from our brain wiring. Other systems, like AI systems, are not constrained to have local learning rules. So that's a particularity of our brains and our biology and our systems. Bilateral symmetry, inherited from a long time ago. Those are things that are not necessarily optimal in some sense, but there are some influences that come in. [00:06:56] Speaker B: Go ahead. [00:06:57] Speaker A: I was going to say another major effort. So those are all kind of normative or normative-adjacent things that we work on, but we also have some other types of things that we work on. I'm known for doing a bunch of work on correlations of different sorts, like what those do for you, where they might come from. And we now have been working on some really fine scale things like the connectome of a mouse brain and some very large scale things like human language and how that's represented in the brain. So it's a pretty wide range of things. But I would say that the bread and butter, like the core out of which these other spokes emerge, is indeed these normative models. Because I really like principles. [00:07:46] Speaker B: Yeah. Okay, so let's stick with the normative models for a minute, because you were just describing, you got turned on by physics.
And yet these normative sort of principles are built on these things that have happened through evolution, the structures that we can't control. And so you have these biologically messy things that then somehow you view them, you view the algorithms and the computational processes that they're enacting built on top of them or within them or through them as like normative toward a goal. Like, how do you mesh the. How do you mesh. The normative stuff is built on non normative stuff. In other words, if that's a fair characterization. [00:08:31] Speaker A: Yeah, yeah, that's right. The machinery is non normative and it's pushed in those directions, like the directions of optimality. But whether it actually gets there is a separate question. [00:08:42] Speaker B: Well, you would say no. Right. [00:08:44] Speaker A: And almost it never really gets there. I mean, it's. I mean, in some cases it does. There's those, the most beautiful examples. But there are plenty of cases where it's not going to be exactly optimal. And then you can say, well, can we understand this anyway as, as a principle? And so one of the, one of the ways that we've tried to formulate that is that like is through something we call inverse rational control where we say that the animal is not optimal globally and it's certainly not optimal for the experiments that they're being put into, but they might be acting in a way that's self consistent and doing the best they can under their assumptions. So then you can define a set of assumptions, what it thinks it's trying to accomplish, what its goals are and then say, well, it's optimal within that. And you might be mistaken. Right. [00:09:40] Speaker B: So who might be mistaken? [00:09:41] Speaker A: The animal. The animal? Well, both. I mean the researcher might be mistaken about what's important for the animal. [00:09:46] Speaker B: Sure. [00:09:47] Speaker A: And the animal might be mistaken about what's important for the researcher. Right. In terms of like the experimental design. Like oh, this happens this often. These things are independent or these things are correlated. So whatever the animal is thinking about the structure of its little world that you've put it in the, that those assumptions may be wrong. And so we can make the hypothesis that the animal is still under those wrong assumptions trying to behave as well as it can, but it's not going to look optimal from the outside point of view because it's doing things that don't make sense according to the task. You have to find the way in which they do make sense, which we call rationalizing. Just the same way as like, why did you, you know, why did you brush your teeth before you eat your breakfast? Well now you have to come up with some rationalization of why you do that instead of the other way around. And so maybe that's a principle that relaxes the idea of optimality but doesn't lose it doesn't go into, oh, anything goes. [00:10:52] Speaker B: Well, it doesn't relax it, in fact it points directly toward it. It's just not the optimality that the task demands. Right? [00:10:59] Speaker A: Yeah, that's right. And, and the, the animal really like even in. So in some cases it might be the ecology that it is optimal under a different world. Like if you were actually in the savannah running around and gathering fruit, this is the right thing to do. And then it would be optimal in a different task, a different environment. It may also be not optimal in, in its natural environment. 
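A minimal sketch of that inverse rational control logic in Python, with a toy two-option task and entirely made-up numbers (an illustration of the idea, not the lab's actual method): simulate an agent that acts rationally under mistaken beliefs about the task, then recover those beliefs by asking which assumed world best rationalizes its choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_policy(believed_p, beta=5.0):
    """Choice probabilities of an agent acting (softly) optimally under its own beliefs."""
    v = beta * believed_p
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

# True task: reward probabilities the experimenter actually programmed.
true_p = np.array([0.6, 0.4])
# The animal's mistaken beliefs about those same probabilities.
believed_p = np.array([0.8, 0.2])

# Behavior looks suboptimal for the true task, but it is rational under the beliefs.
choices = rng.choice(2, size=500, p=softmax_policy(believed_p))

# Inverse rational control in miniature: search over candidate belief models and
# keep the one that best rationalizes the observed choices.
candidates = np.linspace(0.05, 0.95, 19)
log_lik = []
for p in candidates:
    policy = softmax_policy(np.array([p, 1.0 - p]))
    log_lik.append(np.log(policy[choices]).sum())

recovered = candidates[int(np.argmax(log_lik))]
print("recovered belief about option A:", recovered)  # close to 0.8, not the true 0.6
```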
There could be some bad assumptions there too. And so then it would be optimal in some kind of fictional environment. So it's a lot of these things in the normative models, like you're talking about probabilistic reasoning, you're talking about reinforcement learning. A lot of them boil down to constraints, not just optimality. Like if you could do absolutely anything, you, you always have some constraints. And then do you fold the constraints into the principle or the constraints, some side thing? In fact, mathematically you can write them as equivalent, but I think it helps conceptually to separate them and say, here's a constraint that we have and we're going to work within that constraint and then we'll call the rest of it optimal. [00:12:12] Speaker B: Yeah. So, okay, I'm trying to understand. So in some sense, okay, the history of neuroscience is like task based, right? A large history, right? You design a task, you reduce the, you provide a lot of constraints, you reduce the preparation for the organism, whether it's head fixing or you have two boxes and you can look under both and with different reward distributions, et cetera, like in the inverse rational control. But what you're saying is, okay, that's fine. So there's a lot of criticism on this task based thing because, well, this is not ecological, this is not what organisms were designed to do, to look at these boxes or whatever and see if there are rewards under them. But what you're saying is kind of that's okay because they're still optimizing, but they're optimizing for a different thing that evolutionarily they are more prone to do or evolutionarily selected for. And we can infer what they're actually trying to do, which is in some sense maybe suboptimal or less related to what we want them to do. But we can still study it. [00:13:22] Speaker A: Yeah, yeah, that's right. I mean, this whole task based thing is really critical for neuroscience because we want to control things, but it depends on how, like the complexity of the task is a knob that we can turn. And we've changed over time from the very simplest tasks in the beginning. Even without tasks like the animal is maybe unconscious and you just have the eyes open and you're looking at what the brain does when the animal is out and it still does stuff, starting to move towards simple tasks like you have to choose A or B. And as we've gained more data, we've been able to dial up the complexity. Now we're not yet at the point, I think, where we can do benchmark tasks in machine learning styles where you might have a robot that's going around and like loading the dishwasher or swinging from vines. I don't know if there's any robots that swing from vines yet, but probably. [00:14:19] Speaker B: But that'd be cool. [00:14:20] Speaker A: I want to see that someday soon. You know, certainly running around on rough terrain. Right. There's those are. And maybe trying to acquire certain goals or for a real natural case we would have an animal that would be just in its natural environment, like climbing trees, interacting socially with other animals of the same species, finding food, mating, running away from predators, like playing. All of those natural things are things that we, we don't have enough data for yet to make that the task of interest. So I think we're always looking for this kind of intermediate level of task which is complex enough to reveal useful and interesting structure about brain computations. But it's simple enough that we can actually characterize it. 
So we've been shifting this way and I think I, I like to push a little more in the complex direction than most. And then you need to have a more complex analysis system or framework to interpret that data. But then the goal is eventually to move really towards naturalism. So it's an interesting tension. How do you navigate that? Some of my collaborators are really trying to collect massive data. Like Andreas Tolias is building this Enigma project, where he is collecting massive data in freely moving monkeys. Like, this is the goal, and having them really do all these kinds of complicated interactions. But if you have massive data, then you can build some massive models, like we've seen with large language models, but now of other sources. Sometimes they call them foundation models or frontier models. And so then with those big models, now you have a description. It's a descriptive model. It doesn't say what you should do. [00:16:14] Speaker B: Right. [00:16:14] Speaker A: It doesn't say how it's done. It just is like, this is what happens, and then you can try to analyze that further. And we've played a lot of games with those kinds of models as well, trying to see if we had the. It's kind of a reformulation of the data in a sense. You have this massive data set that you compress into a descriptive model, but it's still the data. It's just reformatted in a much more sophisticated and potentially interrogatable way. [00:16:44] Speaker B: But are you making assumptions about what is being optimized to compress it into the descriptive model? [00:16:51] Speaker A: Usually no. I mean, implicitly in some sense. But those are weak assumptions compared to the ones that we were just talking about with normative models. These are much more data driven models. They're just big neural networks that describe input-output relationships. And then you can hope that, inside, under the hood, they somehow reflect latent variables that are relevant and interesting, but they might not, because there's all sorts of equivalent ways of computing the same thing. And so this is a really important and, I think, underappreciated element of what we have to do is we're not trying, in my view, we're not trying to come up with the one mechanistic explanation of something. We're trying to find a class of equivalent explanations that have some shared property and then understand what those properties are and how they relate computationally to the behavior of the animal and its sensory inputs. So when we do this in a big neural network model, you could say, hey, that model doesn't have anything to do with the brain. The neurons in this model are not the same as the neurons in the brain. [00:17:59] Speaker B: And yet. [00:18:00] Speaker A: And yet. Exactly. And yet we can still find some shared structure there. And, you know, that's. That's the challenge. That's the. The joy. How do we. How do we identify, how do we discover things with these new techniques? This is kind of getting to what I would classify as the field of neuro AI, which is a synthesis of brains and machines, where you're trying to use modern AI tools to understand the brain, as we're trying to use AI tools to understand everything these days, because AI is so powerful. But neuro AI has a particularly interesting spin on this, because AI came originally from neuro. Right. And so we're continually trying to give back new ideas to AI to say, hey, it wasn't just like convolutional networks that inspired AI, but here's some other detailed structures that we could use.
And sometimes you can find interesting parallels that way. There's the famous quote by Feynman that a lot of people know, which is, what I cannot build, what I cannot create, I cannot understand. [00:19:10] Speaker B: Yeah. [00:19:10] Speaker A: And so this is a test bed for our understanding of brains. If you really think you're understanding something about the brain. Oh, yeah. Make. Make something intelligent. [00:19:21] Speaker B: I just had the thought. I mean, so you just mentioned that, you know, AI came from very rudimentary neuro. Right. And so we're, us neuroscientists are constantly banging on the door, hey, listen to us. You need to do this. Right? And they just forge ahead successfully, and then we use their models to. Right. But it struck me like a different way to go. So you're saying you look at the models, you look at the innards, the inner workings, and you might find some latents, and then you can possibly relate those latent states to the way that Organisms are enacting their cognition. But you also mentioned that there are essentially an infinite number of ways to solve a single problem. I wonder if a useful exercise is to solve the problems in ways radically different than neural networks do. Although maybe that is kind of the history of neuroscience, which has, I don't want to say failed, but failed to make like, super great. Failed to solve the brain. Right. You know, and maybe like these psych. Psych. Mathematical psychological models that are fairly simple accumulator models, drift diffusion models, things like that. Maybe those are sort of what I just posited. You know, I'm just wondering, like, how far, how far away from brain like, activity can you go to solve to be within that class of. That you mentioned that class of solutions for a given optimization problem. Does that sound like a. That sounds. I don't know. I don't know how to. How you. How one would move forward with that. [00:21:02] Speaker A: Yeah, so it's a good question. It's a whole family of questions. And so exploring the family of, let's say, equivalent input output relationships is an interesting one. And there are two ways that something can be equivalent that I think are critical to differentiate. One is that they are equivalent over everything that we've tested so far. [00:21:29] Speaker B: Right. [00:21:30] Speaker A: And the other is equivalent in all ways, even ways that we haven't yet tested. [00:21:35] Speaker B: Well, there's a third category which is like the super narrow equivalent only for this very particular benchmark that we're testing. [00:21:42] Speaker A: Right, Right. Yeah. Okay, good. So now we have a spectrum, and this spectrum of how widely are we pushing this system out of the training regime? And so if we're equivalent only in this narrow one task case, then we might be brittle. Right. We'd have a brittle model and we'd say, oh, how well did we do at this, like capturing things? And then somebody comes along and says, oh, well, you know, you used gratings, test it with, you know, I don't know, random noise or natural images, and it breaks. And then you say, well, yeah, you definitely have the wrong model. [00:22:19] Speaker B: For what? The wrong model for what? [00:22:21] Speaker A: For. For the brain. Right. You're trying to fit the brain or you're trying to solve the. I mean, it could be also for machine learning. Right. Your system does fine at this one weird task, but it doesn't do fine in general, and it doesn't do fine when you deploy it in realistic conditions. 
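A toy Python illustration of that narrow-versus-broad equivalence (made-up functions and ranges, nothing from any real experiment): a model that matches the true system over everything tested so far can diverge badly as soon as the probes move outside that regime.

```python
import numpy as np

# "The brain": some true input-output function we are trying to capture.
def true_system(x):
    return np.sin(x)

# A model fit only on a narrow stimulus regime (x in [-0.5, 0.5]), where
# sin(x) is approximately x, so a linear model looks essentially perfect.
x_narrow = np.linspace(-0.5, 0.5, 100)
slope, intercept = np.polyfit(x_narrow, true_system(x_narrow), 1)

def model(x):
    return slope * x + intercept

narrow_err = np.max(np.abs(model(x_narrow) - true_system(x_narrow)))
print("max error inside the tested regime:", narrow_err)   # tiny

# Probe outside the training regime and the apparent equivalence falls apart.
x_wide = np.linspace(-np.pi, np.pi, 100)
wide_err = np.max(np.abs(model(x_wide) - true_system(x_wide)))
print("max error outside the tested regime:", wide_err)    # large
```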
So we're always looking for the test conditions that you really care about. And that's a moving target. In fact, in the beginning, people were trying to classify binary digits. Now we can, I mean, even linear models classify binary digits with like, I think it's 88% accuracy or something in MNIST, but pushing it towards higher and higher performance until that benchmark no longer seems like the right test. Because, okay, we can do that reasonably well. Let's try something that's a better test and then we move towards. [00:23:16] Speaker B: This is Goodhart's law, is that right? [00:23:18] Speaker A: I know this one. [00:23:19] Speaker B: Goodhart's law states that once a metric becomes a target, it ceases to be a good metric. [00:23:26] Speaker A: Ah, yes, that's not what I'm talking about. But that's a good one. That's a really important one. And Russ Salakhutdinov actually had maybe a corollary of that where he said whoever makes the benchmark wins. [00:23:40] Speaker B: Right, yeah. Okay, that's a good one too. So AI wins. [00:23:46] Speaker A: AI wins. But we're trying to move towards better benchmarks. And the benchmarks are constantly evolving. In fact, major advances were made by developing benchmarks. Like when Fei-Fei Li developed ImageNet, that was a huge spur for the, for the field. And so I think a lot of these large scale data collection efforts, like we have it at some big labs, those are going to push the, push the field forward, pushing neuroscience forward, because then you have targets that people can really test things on. That has not been the tradition in neuroscience for a long time and now it's becoming so. So this is, I think, a major sort of sociological distinction between machine learning and neuroscience that's really very fruitful when you import the machine learning benchmarking-style thing into other fields like neuroscience, for example. [00:24:49] Speaker B: I don't know if this is a good time to pivot and ask you about another common theme in your work, which is that, and again, correct me if I'm wrong, that organisms spend more energy or effort actually on metacognitive things like discovering. So there's a task at hand. But now they have to figure out the constraints and what those constraints and what the probabilities of those constraints are and how much energy they have to spend. So there's all these factors that go into solving a given problem. Would you say that organisms have to spend more cognitive effort on arranging and figuring out those limitations and constraints than actually the algorithm to solve the problem? [00:25:43] Speaker A: That's a good question. I don't know the answer in terms of like sheer brain power. My suspicion is that we spend a huge amount of our, of our energy at least doing primary sensory processing. But the, and the cognitive stuff is, I mean, it's lower. It's a lot lower bandwidth. The structures that we're reasoning about are much slower, they're much lower dimensional, but they are right. So you look at an image, you get, I don't know, 100 million pixels per eye, basically. Whereas the kind of cognitive variables that we have, certainly the output that we, that we have, is less than a thousand dimensions. It's just every muscle that we have, basically. [00:26:41] Speaker B: Well, and you just said a static picture, because in reality we're looking at a new static picture every, every time we move our eyes as we move through the world. And so it's that what, the hundred million every, every few milliseconds.
[00:26:56] Speaker A: Yes. Although a lot of those hundred million pixels are the same. [00:27:00] Speaker B: That's from. [00:27:01] Speaker A: From moment to moment. So you have to really, like, you have to be careful about how much, how you're characterizing the information content of these things. [00:27:11] Speaker B: And we have a fovea and extra, extra foveal. Like we have a fovea where we're actually only paying attention to a very small. Paying most attention or getting the highest fidelity of sensory input, at least in vision of a very small area of an image. Of course we're talking about vision because that's all. That's all anyone. That's the history of AI and neuroscience. It's like almost all vision. [00:27:34] Speaker A: Yeah, yeah, that's a big one. It's actually been fun for me that I've gotten to work on a few different systems over the years. Vision, some audition, some proprioception. And that's one of the joys about being a theorist is that experimentalists have to invest a huge amount in a particular system with this equipment and everything. But it's just math. I mean, you need a math. Yeah, give me a paper and pencil and. And we're. We're off to the races. [00:28:05] Speaker B: I've switched my, in my career, so I used to be an experimentalist, neurophysiologist, and I was always super jealous of theorists. We're going to take a little theory sidetrack here and, and now these days I do like, it's all like computational analysis that I do. And we're gearing up to do some, some more experiments in lab with, with mice. And I'm sort of hesitant because I'm like, oh, I just want to do the computational stuff. I was right to be jealous, I think, of theorists. I mean, if I ask theorists, and I'll ask you this too, you know, they say, well, we have like, it takes us a while too, but you don't run in the same kinds of problems. You run into computational problems. You don't run into, like, hardware problems and organism problems. And it's like, it's a huge mess. So. So good for you that you went the theory route. Do you agree with me that in some sense you not have it easier because you have to think just as hard or maybe harder, but in terms of productivity. And you can partner with any experimentalist that's willing to partner with you, and maybe you have to convince them to send you their data. That used to be a bigger deal than it is now. But how would you characterize being a theorist, knowing experimentalists that, you know, are you over there kind of laughing in your office and I can just do sit on my computer and I don't face the same challenges. [00:29:31] Speaker A: I'm loving life over here. Like, it's definitely a lot of fun to do this job, but I grappled with the same thing that you were grappling with as a graduate student. I started off like, I did some of everything. And so I did some experiments, neurophysiology experiments. I did some psychophysics. And I just got tired of when things broke. Not, you know, it. It being some weird. Like a wire was loose or the solution was old, or things that seemed really out of control and to be so meticulous that everything was pristine. It was just. It didn't suit me. And I found myself, like, if I looked back and say, where am I spending my time? I was just spending my time in front of the computer or doing some analysis rather than being in the dark room doing some vision experiments, which were interesting. And I'm really glad that I did it. 
And I think a lot of the experimentalists that I work with are also glad that I did it, because it gives you an appreciation for the difficulty and the messiness of data. So, you know, theorists who come from, let's say, some disciplines where things are more pure, you assume that you can. Physics is a little bit of hacking. It's like, okay, physics is to math as hackers are to computers. It's a complicated dance because, like, okay, there's two ways of where that interaction can go. One is an experimentalist comes to you with some data and says, hey, what's this mean? [00:31:01] Speaker B: That's rare, right? Isn't that the rare. [00:31:03] Speaker A: No, that's more common these days. Yeah, I think that. I mean, assuming that they're coming to you, that you're like, in a conversation, because they have the things that they've been working on and they've been thinking about the other way is that you have a theory you want to test, right. And then you have to like convince an experimentalist here. Would you please dedicate six months or a year of your life to testing this particular wacky idea that I had. Then when you actually go and work with an experimentalist who wants to like, you can make it a team, then you can say, let's co design this experiment. We have these ideas. I know you have constraints, experimental constraints. These things are easy, these things are hard. Let's figure out if we can find the right combination of things. And that's really fun and that's really fruitful. It also is a little bit amusing to me that a lot of PIs who are experimentalists are really operating very much as theorists in a sense, because they're not doing the experiments, they're designing the experiments. [00:32:00] Speaker B: Right. [00:32:01] Speaker A: So in that sense, when I'm collaborating with experimentalists, I'm kind of working like an experimental PI. [00:32:08] Speaker B: Wait, wait, why is this amusing? So this is kind of the way the history of neuroscience is. Someone is experimentalists who had their ideas that they wanted to test. So you set up an experiment and that idea is somewhat theoretical, Right. It's not just like, hey, let's see what happens. [00:32:23] Speaker A: So I mean like the big boss is sitting in the office and all the graduate students are the hands. [00:32:29] Speaker B: Right? [00:32:29] Speaker A: Okay, that's what I mean. But some PIs actually, you know, there's a few of them who really still like to go and do the experiment. That's true satisfaction out of, out of being there. And of course you get a lot out of it. You can see things, you know, how the systems evolve. But you know, there's. Everybody has time constraints. Those time constraints are often pretty strict. [00:32:50] Speaker B: Okay, so getting, getting back, you said you often when you're collaborating with experimentalist labs, the PI maybe is not doing the experiments like the, the lab personnel are doing the experiments. And then you end up feeling more like an experimentalist. Is that what you were saying? [00:33:06] Speaker A: Operating like the PI in an experimental group? I mean, it becomes a, you know, we're, we're co designing the experiment. And usually that means that for the PI, the experimentalist needs, needs to know what all the equipment is. You know, they're, they're making sure that things are in the right place, that the resources are available. But I like, I don't need to because I'm not building that lab, I'm not building that equipment. But I like to know it. 
I think, I mean, first of all, this the technology that we have these days is so cool. [00:33:37] Speaker B: Yeah. [00:33:38] Speaker A: And. And second, it just helps me understand the constraints of the experiment that much better. [00:33:45] Speaker B: Okay. Yeah, fair. All right. So we went on this big tangent about theory versus experimentalist. But one more thing before we, before we pivot again, it. It used to be maybe I just have this old conception that I haven't let go of. Right. So you have a PI. They have their own lab, they have their own grants, they have their own projects. Usually the things that they want to do are overflowing with respect to what they are doing. And along comes some theorist and says, hey, why don't you. Here's my idea. [00:34:18] Speaker A: Here's another one. [00:34:19] Speaker B: Here's another one. This is, you know, it's beyond what you can do right now, but I need you to order this new microscope and I need you to fit, you know, outfit, a new darkroom, you know, things like that. [00:34:31] Speaker A: So there's no work like that. [00:34:33] Speaker B: No, I know, but that's how. So how does it. Because there could be like a certain kind of tension there, right? [00:34:39] Speaker A: Yeah. I mean, a fruitful collaboration like that is going to happen where you, you know, what the person is interested in, you know, what their technical capabilities are. Every once in a while you say, like, oh, you know, it'd be cool if we could do this. And maybe the. The experimentalist is like, oh, yeah, that would be cool. Let's do it. And otherwise, you know, sometimes, most of the time it's like, yeah, that would be cool. I wish I could do it. And so then you just work within the opportunities that you have, and you try to make the most of what is usually really powerful technology. I mean, the people I've gotten to work with have incredible skills in measuring stuff. And they also come. I don't want to understate the ideas and interpretive skill that they already have. So coming, there's. There's usually complementarity. Like, they'll know a lot of things that I don't know, and they can teach me about this and, and vice versa. And so, you know, I. I have a lot of math skills and, And I, I know, like, sets of models and theories that we could bring together, and it can help synthesize some ideas, and they probably know some particular literature way better than I do, and they know what kind of signals you might find in what part of the brain and what this animal can do and what it can't do. And, and all of that is critical to making a good collaboration. [00:36:02] Speaker B: Yeah. Many, many episodes ago, I Was talking with, I think it was Nathaniel Daw. So he's on the theorist side as well. And he related to me that he and his colleagues were sort of battling whether they should start a wet lab, whether they should start an experimentalist lab so that they could apply their own theories to it. And I was like, no, no, no, you don't want to do that. [00:36:25] Speaker A: So, yeah, Vijay thought about that, tried to do that too. I mean, he did that. It was just tough to get people to test his theories. He was right. I mean, sometimes there are some easy experiments that, that we could run that would be fun to do. The most common of these is just human psychophysics. Like, here's a game. Let's just collect some data of a human playing a game that's easy in relative terms. And so, yeah, if people are inclined to do that, that would be great. 
[00:37:01] Speaker B: But would you say that as neuroscience matures a little bit, it's becoming more like physics in that physics historically has been sort of happy with. There are theorists and there are experimentalists, and they can collaborate, but they're kind of separate. Whereas in neuroscience, the past has been the experimental experimentalist is the theorist, and it's one person. And there had been this tension when I was a graduate student about 700 years ago, there is this tension between theoretical labs and experimental labs. Is that dissipating where people are more comfortable? No, it's not. [00:37:37] Speaker A: I think, I mean, I think that the division is indeed becoming stronger. I think it has to do with specialization. I mean, I think it's more than just every one of these occupations is subdividing. So even within theorists, you have people who are specializing in, let's just say, deep learning stuff and others who are going to do dynamical models and others who are going to do. [00:38:04] Speaker B: I don't. [00:38:04] Speaker A: Know, statistical physics kind of models. Like, they're, they. People pick their specialties and sometimes they work together. And that could be really cool where you have somebody who has a. A math way of thinking about things, but it only works in linear models. And then if you want to go to a nonlinear model, you need to either adopt that, that capability or work with somebody else who does deal with more complicated, let's say trained models, right? So the AI style and the math style, they're working together now in fruitful ways. And likewise, the experimentalists are coming up with teams where here this person is an expert in molecular biology and this person is an expert in neurophysiology. And so then you can both manipulate the neurons at A molecular level. And you can do these. You can record from them at a mesoscale level. And then maybe you also have somebody who is a cognitive scientist who does a lot of cognitive behavioral experiments. And so all their expertise comes together and you can have a much richer set of measurements that are related to each other and a much richer data set that you can connect in different ways and draw insight from. So I think it's actually continuing to subdivide. [00:39:18] Speaker B: Yeah. So in other words, continuing to get healthier and if we consider the history of physics healthy. [00:39:23] Speaker A: Yeah, yeah, I think so. [00:39:25] Speaker B: Did you pick a specialty? You said everyone chooses a specialty, but you're kind of all over the place. [00:39:30] Speaker A: I'm all over the place, yeah. I, I've always been a dabbler. Like, I dabble at a lot of different things. I dabble at musical instruments. I dabble in different scientific fields. [00:39:43] Speaker B: So. [00:39:44] Speaker A: But, you know, there's definitely themes that emerge, and we touched on some of those themes earlier on. There's a set of tools that I have developed. I keep acquiring new ones because, you know, it's good when things can be question driven. Like, how do you answer this question as opposed to, here's my hammer, where's some nails? [00:40:02] Speaker B: Well, you love to learn also. [00:40:04] Speaker A: I love to learn, Yeah. I really do. It's like, it's my favorite thing about this, this job is that I'm constantly learning. [00:40:11] Speaker B: It's amazing. Yeah. [00:40:12] Speaker A: And the people that you get to learn from are amazing too. Right. 
There's so much deep knowledge in, in the field and the chance to interact with all these other people who have their own ideas and creativity. That's. That's amazing. That's a wonderful experience to have. [00:40:30] Speaker B: Okay, so. All right. Shall we pivot to probabilities and brains? It's a hard, hard pivot. We were just talking about experimentalist and theorist collaborations, and you've been on this generative adversarial collaboration. And I talked to Ralf Haefner about this a couple years ago, and he, he. So this is to figure out how the brain computes with probabilities. [00:40:55] Speaker A: Yeah, that's right. [00:40:56] Speaker B: And so there are different theories about how probabilities are represented and used in networks of neurons in the brain. And they all have some evidence for them, they have some evidence against them. And it depends on how you look at the data, et cetera, et cetera. And so you have this generative adversarial collaboration, which is a bunch of people with, maybe opposing is too strong a word, with alternative views on how probabilities might be represented in the brain, who came together. And when I was talking to Ralf about this maybe two years ago, he was really appreciative of it. He was surprised at how well it had gone, how well everyone had gotten along, and how productive it had been, and how much he had learned from it. So what is the overall issue with. Why is it difficult to know how brains compute with probabilities? And then tell me a little bit about the collaboration and what you guys have come up with. [00:41:59] Speaker A: Sure. Yeah. This is a really rich topic that we could go on for hours about. One of the problems is what people call Bayesian just-so stories. So if you want to say the brain is doing some probabilistic thing, weighing sensory evidence with uncertainty, extracting latent variables, acting appropriately, you can always construct some probability distribution as your prior under which your data would be the right thing to do for probabilistic inference. [00:42:34] Speaker B: Is this the same as what we were talking about earlier, where there's a thousand different solutions to a given problem? Is it related? [00:42:41] Speaker A: Yeah, it's related to that. There are, I guess I would call these degeneracies. This is a very specific one. Right. But there are a couple different degeneracies. One simple one to characterize is, let's say that you're in the framework of reinforcement learning, where you're trying to maximize some utility, but you don't know exactly what the real world is. So now you have two things. How important is it? If the real world is in one state and you guess a different state, how bad is that error? Right. So that has some consequence. [00:43:18] Speaker B: This is with a defined utility, not. [00:43:20] Speaker A: This is. Yeah. So if you define. So there are different ways of measuring those utilities. Like, the utilities could have different consequences, different utility functions. Right. And you can also have different probabilities of those correct or incorrect responses. Right. The. The state of the world, what we care about is the product of those two. The utility times the probability. You're weighing the utility by how often it happens. And then you take your expected value. So if you're like, if you're taking a wager and you could have a 50/50 chance of $200 or $0, that's equivalent to a 100% chance of $100. Right.
You just take the average that your average return, average utility that you're going to get there. But you can imagine that there's different ways of changing the utility function and the probability that give you the exact same balance. The product is the only thing that matters. So any ways that you get that product, you could make, divide the utility by half, double the probability, you know, things, things along that line. And so you can't. So that's one degeneracy that is not possible to distinguish in any cases because there's always going to be that. So the Bayesian just so story is like, oh, we found this explanation of the data and it's Bayesian, it's optimal probabilistic inference under this prior. Well, is that a real reasonable prior? You just made that up. Maybe it was a very jagged, weird shaped probability that was just the thing necessary to get your data to work out right. So how do you resolve that? You resolve it by testing for generalization. You look for something new and you make a commitment to your model, your probability, the Bayesian prior probability, which says what things are likely to happen. So you say, I'm committing to that. And now my model of how the brain represents probabilities has to remain consistent. Like I need to still explain the data when I test a new situation that still obeys my committed model. [00:45:34] Speaker B: So let me just really strip this down. So 2 times 4 is the same as 4 times 2. You get to the same solution. But if you say the prior is 2, then you need to use that same prior, 2 instead of 4. You need to use the 2 in different domains to get the same answer. That's your commitment. [00:45:54] Speaker A: Exactly every time. So you've made a commitment to that too. And that too represents, you know, the structure of the world, the things that you, that your model assumes that are likely to happen. So in all of these cases, when people have said, oh, the evidence is favoring this particular interpretation, oh, the evidence is favoring that interpretation. They, they may be using different generative models of what variables they're trying to do probabilistic inference over. And they could be using different probabilities. So we're not comparing apples to apples. And so in order to do a fruitful comparison, you need to compare apples to apples, you need to be sharing, you need to make a commitment to a model of the way that the world works. And then you can evaluate different models of the way the brain weighs probabilities. But you can't do the second part, testing whether these different models are representing probabilities until you make a commitment to a generative model. And that language that has not been, I think, widely appreciated. So this adversarial collaboration, in the beginning it was like, well, you know, you're making models of orientation, or we're making models of additive component features in the world. So yeah, those are different generative models. And so you're going to explain things differently. Ralph actually has shown quite beautifully that you can take one of those generative models and have a sampling based model for image patches. And then if you look at it from the point of view of some of his adversaries who are looking at probabilistic population codes about the orientation of some line or stripe pattern in the input, then you get their data explained. So you can explain the same data, different data with the same model in these two different ways coming up, reconciling that is hard work. 
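To put made-up numbers on the utility-times-probability degeneracy from a moment ago (a minimal Python sketch, not anything specific to these experiments): only the product of probability and utility enters the expected value, so rescaling one and inversely rescaling the other predicts identical behavior.

```python
import numpy as np

def expected_value(probs, utils):
    """Weigh each outcome's utility by how often it happens, then sum."""
    return float(np.dot(probs, utils))

# The wager from the conversation: a 50/50 shot at $200 or $0 ...
print(expected_value([0.5, 0.5], [200.0, 0.0]))    # 100.0
# ... is equivalent to a sure $100.
print(expected_value([1.0, 0.0], [100.0, 0.0]))    # 100.0

# The degeneracy: double a utility and halve its probability (or any such
# rescaling) and the expected value, hence the predicted choice, is unchanged.
print(expected_value([0.25, 0.75], [400.0, 0.0]))  # still 100.0
```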
And finding out that what generative model commitments we've made is part of what everybody needs to do when they're describing their own models. [00:48:05] Speaker B: Wait, you said that you use the same model to explain different data sets, but in one instance you're using sequential sampling, which is one theory of how the brain uses probabilities, and another one you're using the distributed population probabilities. So those are two different models, right? Or is it actually. [00:48:22] Speaker A: Well, it's different interpretations of the same data. So you can have one underlying distribution that you look at it from this perspective when you ask what is the representation of orientation or what is the representation of the image patches? And it's the same mechanism, it's just one thing that looks different in these two different ways. [00:48:45] Speaker B: And you have to reconcile how those two different stories can end up doing the same thing. [00:48:51] Speaker A: Yeah, there's a third group. So in this adversarial collaboration, there were really three different groups that we tried to get represented. These are prominent representations of probabilities. But one is sampling, which basically means if you see something out in the world and you're trying to interpret what caused it, you roll a dice and then some fraction of the time you'll come up with one interpretation. Another fraction of the time you'll come up with a different interpretation. And that's just constantly happening by our brain. Our brain is rolling dice. It's coming up with alternative interpretations. And the frequent, the, the amount of time that you're spending with one of those interpretations is the probability. It's right. That's the sampling hypothesis. A second one is probabilistic population codes. And the third is distributed distributional codes, which are really funny, funnily similar terms for probability representations. And in some ways they have a lot more, they have a lot of similarities. Like they differ by whether you're using, whether you're representing probabilities or log probabilities directly and by directly, I mean linearly through the neural activity. And there's some arguments about which one of these is better and which one of these is worse. They are fundamentally complementary. And I think that if you find one, you're going to find the other because they're good at different computations. In doing probabilities, you have two operations that you have to do all the time. You have to multiply and you have to add. Right. The multiplication is when you have two independent things happening. Like you roll one dice and you flip a coin. The probabilities of both things happening are the product of each one separately. So that's the probability rule. And the other one is only one event happens. Like if you roll a dice, you got a five or you got a six, you didn't get both on one die. And so those, in order to compute the probabilities of that, they have to add up to one, right? So that's the sum rule and the product rule of probabilities. And you have to do that constantly. When you see a new piece of evidence coming in, if it's independent, then you're going to multiply your probabilities, and that's the way you're going to accumulate information. These different codes are good at different ones of those computations. And so it's natural to jump back and forth between them. And in fact, there's a lot of mathematical beauty in this, that they have this kind of complementarity. 
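A small Python sketch of that complementarity, with plain vectors standing in for neural "codes" (illustrative only, not any particular model): combining independent evidence, the product rule, is linear in log probabilities, while marginalizing, the sum rule, is linear in the probabilities themselves.

```python
import numpy as np

# Two independent pieces of evidence about a three-valued latent variable.
like_a = np.array([0.7, 0.2, 0.1])
like_b = np.array([0.4, 0.4, 0.2])

# Product rule: independent evidence combines by multiplication ...
posterior = like_a * like_b
posterior /= posterior.sum()

# ... which is simple addition if the activity encodes log probability
# (the flavor emphasized by probabilistic population codes).
log_code = np.log(like_a) + np.log(like_b)
assert np.allclose(np.exp(log_code) / np.exp(log_code).sum(), posterior)

# Sum rule: exclusive alternatives add up, and marginalizing a joint
# distribution is a linear operation on probabilities themselves (easy for
# codes that carry probability directly, e.g. by how often a sample lands
# there), but not a linear operation on log probabilities.
joint = np.outer(posterior, [0.5, 0.5])   # joint over (latent, irrelevant coin)
marginal = joint.sum(axis=1)
assert np.allclose(marginal, posterior)
print(posterior)
```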
But one of them prioritizes interactions. And this is the probabilistic population code. And so this is a little bit of a technical thing, but it's fundamental to the representation of structure. Not just probability, but structure. [00:51:48] Speaker B: What does that mean? What do you mean, structure? [00:51:50] Speaker A: So there's a lot of ways of characterizing structure. And structure is really critical in the way that the brain understands the world. I think it would be maybe helpful to talk more about the different kinds of structures that there are. But one that I like to work with is called probabilistic graphical models. And these represent, in one version of them, they represent causal interactions. So you can have, like right now I'm sitting on a chair. The chair is sitting on the floor. The floor is held up by some walls, the walls held up by some foundation, the foundation held up by the earth. So there's an indirect chain of causation. So I'm held up by the earth. Right. Indirectly through this long chain, through those. [00:52:32] Speaker B: Nodes in the probabilistic graphical model chair. Each one of those foundation earth. [00:52:38] Speaker A: Exactly, exactly. Each one of those is a node and the edges between them say what are the interactions? So like the foundation is not directly interacting with the chimney or the, I should say maybe not the chimney. The roof. The roof is held up indirectly by all of these other things. And so that structure, I hypothesize this is a key hypothesis that I'm really very interested in testing, that that structure is known by the brain and used by the brain in its computations. It's pretty natural. It actually provides a good way of restricting the possibilities of what computations you have to do. So not everything is possible because if everything is possible, then you have to consider all those possibilities. Here you can restrict your possibilities in a structured way, probabilistically structured. You can also do the meta level thing where you have a probability over different graphs. [00:53:35] Speaker B: Right. [00:53:36] Speaker A: In fact, there's a nice way of doing that in a meta sense where you have a dynamic graph interpreted as a graph with hyper edges. [00:53:48] Speaker B: Okay. [00:53:48] Speaker A: And it's really fun. It's beautiful. In fact, I made a 3D printed model of the, of the natural statistical shapes that emerge out of this. It looks like a rounded tetrahedron. [00:53:57] Speaker B: Is this what you're going to show me? That's in your office at work. [00:54:01] Speaker A: I can send you a picture of. [00:54:02] Speaker B: Yeah, send me a picture and I'll put it up in the video. [00:54:04] Speaker A: Yeah. So this is a little three dimensional probability distribution. And the cool thing about it is that you can have, let's say two variables, two nodes in your graph, X and Y that are directly interacting at one time and they're disconnected another time they're not directly interacting, depending on the value of a third variable. [00:54:25] Speaker B: Right. So meta variable, kind of. Yeah, yeah. [00:54:29] Speaker A: And it's mutual and there's all sorts of symmetries, but it creates this little tetrahedron where from different edges, like the front edge is saying that these are positively correlated or positively interacting. The back edge, if you're at a negative value of your third gating variable, has them kind of the other way. 
And if you connect the dots, you get a tetrahedron and this rounded tetrahedron is like that. So you can definitely get these, these changing graphs. So the graph structure, if you don't have an edge between variables such that they are not directly interacting, that is a valuable restriction. That is a valuable structure that the brain could use to simplify its computations. [00:55:14] Speaker B: Because there's no causal dependency between them. [00:55:17] Speaker A: There's no causal dependency. And so causal representation, like we know that neural networks are universal function approximators. Meaning that you can take any of those inferences that you want and do it in an unstructured way. Just like a big network. You just throw it at it and you train it forever. And you'll find the right answer in the end, because you can, you always can do that. But it might take huge resources, it might take a long time and a lot of data. That's another resource. And critically, it may not generalize because you're not using the right structure. So if you test it in a new situation, right. If I, if I now put, I don't know, like a yoga mat under my chair, it doesn't change the rest of the structure. But if I were just doing a universal function approximator to describe what's interacting with what, I would now need to start over. I would need a whole new circumstance. I can't leverage all the knowledge, structured knowledge that I already have. And so to me, that that graph structure of what is causally influencing what is going to become a really important inductive bias that I would say neuroscience has not really yet resolved. I threw in that term there. We'll come back. [00:56:30] Speaker B: I was about to bring it up actually, because. Yeah, I mean, it's. [00:56:32] Speaker A: Let me just quick finish the thought there. [00:56:35] Speaker B: No, no, we're going to stay on. I just want to. Yeah, go ahead. [00:56:37] Speaker A: Okay, so that structure of having direct and indirect connections is something which is manifested in one of those probabilistic codes and not the other. [00:56:51] Speaker B: Ah, okay. [00:56:52] Speaker A: And so that's bringing it back to this generative adversarial collaboration. But there are still fundamental questions. I mean, that's just my perspective that the natural parameters, which is this basically the non zero, the edges on the graph of what's connected are, are highlighted by one of the representations and not the other. [00:57:11] Speaker B: So, so in your representation, right, where you could, you could have the connection or not, you could turn on or off the. The connection. Duh. Okay. So you posit that the brain or our minds learn these graphical representations of the causal structure of the world, these inductive biases. Right. Which is interesting because it's built on top of an organic neural network that somehow then learns a network that's very meta. But. But then the other account. Does it also posit that the brain has to learn the structure? Yeah, it does. Okay. [00:57:45] Speaker A: So I think this is why some of the work in distributed distributional codes actually secretly manifests this same structure. So it's kind of representing the graph anyway. So it's secretly like a local transformation of a probabilistic population code, but globally it's the same as a probabilistic population code. This is pretty technical. We'd have to go through some math for it. But the idea of a hidden graph I think is pretty accessible. 
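A minimal numerical sketch of such a hidden graph, in the spirit of the chair-floor-foundation-earth chain described above (toy linear-Gaussian variables and made-up numbers, nothing from any real dataset): the missing direct edges show up as conditional independencies, and a third variable can gate an edge on and off, as with the rounded tetrahedron.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# A causal chain: each node depends directly only on its parent in the graph.
earth = rng.normal(size=n)
foundation = earth + 0.5 * rng.normal(size=n)
floor_ = foundation + 0.5 * rng.normal(size=n)
chair = floor_ + 0.5 * rng.normal(size=n)

# Chair and earth are correlated through the indirect chain of causation ...
print("corr(chair, earth):", np.corrcoef(chair, earth)[0, 1])

# ... but given the intermediate node, the dependence disappears: regress out
# "floor" and the residual correlation is near zero. That missing direct edge
# is exactly the structure a computation can exploit.
res_chair = chair - np.polyval(np.polyfit(floor_, chair, 1), floor_)
res_earth = earth - np.polyval(np.polyfit(floor_, earth, 1), floor_)
print("partial corr given floor:", np.corrcoef(res_chair, res_earth)[0, 1])

# The gating idea: a third variable can switch an interaction on or off,
# so the effective graph itself changes with context.
gate = rng.choice([-1.0, 1.0], size=n)
x = rng.normal(size=n)
y = gate * x + 0.5 * rng.normal(size=n)
print("corr(x, y) overall:", np.corrcoef(x, y)[0, 1])                              # near 0
print("corr(x, y) when gate is +1:", np.corrcoef(x[gate > 0], y[gate > 0])[0, 1])  # strong
```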
And we're trying to develop methods to discover those hidden graphs and whether information is actually flowing along those hidden graphs, not just there representing things that could be present, but what you're guessing about the world as it is right now. What you're looking at are those signals flowing along some implicit graph in your mind. Can we find that graph? Can we find how the signals are transformed from variable, from representation to representation? And here is where I think the inductive bias becomes really critical, which is that we're imagining that the brain is good at representing probabilities. And one way that it could do that is by just living in a world where that's helpful. And this means that every time you're faced with a new problem, the right way to solve that problem, if you practice it a lot, is to use probabilistic reasoning. [00:59:14] Speaker B: Bayesian. [00:59:15] Speaker A: So Bayesian reasoning. So every time, you know, you're doing an auditory discrimination, you're trying to run down to catch the ice cream truck. Like all of these different things that you might be doing, you're. For every one of them, the best solution is going to weigh evidence by its probabilities and synthesize them together in this Bayesian way. [00:59:35] Speaker B: But that's super expensive. [00:59:38] Speaker A: So the question is, is there some motif that lets us do that with less cost? Is it something which is reusable? And that's, I think, a big question. I don't know that we have that, but if we don't, let's say that you're well trained and you end up with good Bayesian solutions for all these different problems. Does that mean that we are actually Bayesian brains, that we use Bayesian brains? Well, that might just be an emergent property of a well trained network that did not have an inductive bias that favored Bayesianism to begin with. It's just the result of the training. So if you just took a generic neural network, and in fact this was done by Orhan and Ma in machine learning, they had an earlier version which I like the name of better. It was called the Inevitability of Probability. [01:00:29] Speaker B: Badass. [01:00:30] Speaker A: Yeah, they had a new title when it was eventually published, but they were saying, hey, let's just throw a generic neural network at these things and lo and behold, probabilistic representations emerged. But what they would not have, because they have no mechanism for this, is parameter sharing, such that if it learned to do Bayesian inference for this task, that it would then automatically be better at Bayesian inference in a new task. So that element is missing. Those kinds of networks don't have a propensity toward probabilistic reasoning; it emerges from the data, not from inbuilt, innate capabilities. And so that means it does not have a good inductive bias for Bayesian reasoning. And though it emerges, I would say that's not a strong case of a Bayesian brain. That's like, okay, just good training because. [01:01:23] Speaker B: It's not normative in that sense. [01:01:25] Speaker A: It's not using that principle in lots of different cases. It has to relearn that principle every single time. [01:01:32] Speaker B: Is that like more amenable to a sampling based approach? [01:01:36] Speaker A: I think the sampling is exactly the same kind of issue. You need to be able to use samples with the right probabilities, right? Like you still need to be able to take the mechanism.

So it might be. Yes, it might be that sampling is an easier thing to have parameter shared. For example, I would say maybe a single neuron representation is easier to have parameter sharing because you could, like, genetically encode it; then a population based thing might be, I don't know. And this is I think a really. [01:02:09] Speaker B: Good critical question because then it doesn't need to be emergent, it can just be hard coded. [01:02:15] Speaker A: Some elements can just be hard coded. And you could have different microcircuits that are really good at doing representations of probabilities, say a cortical column. Cortical column or different brain areas. Another way that you could do it. So here we're talking about parameter sharing over space. So you have this group of neurons that does something and then you copy somehow, which is non physical, you could copy its parameters to another group of neurons. And you can't do that by learning, but you could do that by development. Like they're both programmed to go down the same developmental path and then you end up with kind of good probabilistic reasoners at different locations in your brain. [01:02:57] Speaker B: Okay, so we've really gotten into the weeds about the probabilistic stuff. And I want to move on because I want to talk about your specific work and sort of your reflections on it, but just backing up, right? So there's this generative adversarial collaboration. Everyone has different perspectives on how probabilities might be used or computed in brains to do things. But the brain is. The capacity of the functions of the processing in the brain is so vast. Couldn't you just be using all of them depending on the context? You were just saying that the two distributional codes, population codes, sort of had this nice mathematical relationship trade off. Right. Back and forth. That would be useful in different situations. [01:03:41] Speaker A: Yeah, duality there. [01:03:42] Speaker B: Yeah, yeah, inverting. But can't it just use it all? [01:03:44] Speaker A: Sure it could. It could be that. Yeah. I mean we're looking for universals, but we may not find them. Right. It might be that different things happen at different locations. Um, what's your bet? [01:03:58] Speaker B: Do you think of the brain as a kludge? As like lots of different things just kind of working it out, but you're also a normative person. So you think of the brain as optimizing in a normative fashion. So where do you land on this? Like what whole brain sort of perspective? [01:04:13] Speaker A: Yeah, I mean, to be honest, I don't know yet. [01:04:16] Speaker B: Or is it both? It could be every. It's always both. [01:04:19] Speaker A: And it's principled. I mean, I guess that's generally the way that I go, is that there's going to be elements of both there. The hope is that you can find some principles that make things understandable. In a sense, the only things that are understandable are the non kludges. The only things that are understandable are the principles. In fact, this is a point that I like to make when trying to discover one of these graph structured algorithms in a data set. So if there are dynamics in the brain, computations that proceed along a graph, and if that graph was just every edge did its completely separate thing, then it's not even really very meaningful to talk about that as an algorithm. An algorithm.
If the brain has an algorithm, it means it's doing the same thing in different contexts. Right. [01:05:16] Speaker B: An algorithm is a defined set of steps that need to happen to accomplish something. [01:05:20] Speaker A: Yeah. In fact, the name of my, just to give some context here, the name of my lab is the Lab for the Algorithmic Brain, also abbreviated as LAB. Right? Yeah. So the first, actually I should say it's LAB for the Algorithmic Brain. And then the first LAB stands for LAB for the Algorithmic Brain. And then the first LAB there, it's. [01:05:41] Speaker B: Like Douglas Hofstadter would really like that. [01:05:43] Speaker A: He would love it. And the GNU Unix people like it too. [01:05:48] Speaker B: Okay. [01:05:49] Speaker A: Because GNU stands for GNU's Not Unix, and the GNU there is GNU's Not Unix. [01:05:56] Speaker B: Recursive. [01:05:57] Speaker A: All the way down, all the way down. So looking for those kind of structures, I feel like if we don't have a shared, repeatable series of steps, there's nothing to learn. It's just a big hack. So anything that I am going to learn is going to be from some kind of principle that is shared. You know, if every brain does something different, if every part of every brain does something different, it's going to be hard to make any kind of sense of anything. It's just. So now some people actually do believe that, and the place that they look for principles is not in the functioning of the brain per se, during, let's say, inference or operation, but rather in the learning, that there is some underlying learning rule and that you have a goal or an objective and that that's what we can understand, not the resulting emergent computations, which are just whatever happens when you learn with this data set. [01:06:58] Speaker B: Oh, it's not that that principled learning rule would result in essentially the equivalent of a shared computation. It's that it learns whatever it needs to learn. And the fundamental thing is. [01:07:10] Speaker A: Yeah, exactly. So some people kind of lean in that direction. [01:07:14] Speaker B: That's kludgy. That's the kludgy direction. Right. [01:07:18] Speaker A: It's kludgy in the final result of what computations happen, what inferences happen. But it's not kludgy in how you get there. [01:07:26] Speaker B: It's the fundamental learning principle. It's not kludgy. [01:07:29] Speaker A: Yeah, that's the idea. But you know, there's also the evolutionary history that we should account for. And I don't know to what degree some of that is kludgy or what degree some of that is like core principles. And in fact this brings me to one other major collaboration that we've just started, the Simons Collaboration for Ecological Neuroscience, which is basically saying like, okay, so ecological neuroscience, let me just give a little background on that. Ecological psychology was a field founded by Gibson and Gibson in the 60s. [01:08:07] Speaker B: You included both Gibsons. That's great. [01:08:09] Speaker A: Both Gibsons. [01:08:10] Speaker B: Most people just give the one. [01:08:12] Speaker A: Yeah, the husband and wife team, the husband was more focused on computations and the wife was more focused on development. But they're both critical there, like childhood development, that kind of thing. And they were arguing against the idea that the brain creates representations.
So these Gibsonian psychologists, they really don't like this idea of having, like, using the word representation is often anathema to them. [01:08:44] Speaker B: It's anti. [01:08:44] Speaker A: Representational, as some people say, yeah, yes, yes, exactly. I've found it's a little softer in practice than I expected when talking to Gibsonians, and softer in that they're. They will use the word representation. [01:08:58] Speaker B: Oh. [01:08:58] Speaker A: Like maybe they let it escape their lips accidentally, but. And I'm not going to name any names, but you know who you are. Um, and so the idea, the contrast would be here, you're picking up a coffee cup here, right. So you look at the scene, you have a cup, it's got some edges, it's a black object, it's got some rounded shape there, and it's got a darker patch in the middle. You're holding up that thing right there. Exactly. [01:09:27] Speaker B: Before I picked it up, I figured out all of the planned movements I needed to do. I had a complete mental model of what was going to happen. [01:09:35] Speaker A: Right, excellent. Yes, perfect. And so that way of kind of putting together objects from pieces, that representation is something that the Gibsons didn't like. They thought that that was a mistake. And instead they think that we have. There's sort of two aspects. One is direct perception, which I'm not a big fan of, but the other is that we interpret the world in terms of things we can do with the world. [01:10:01] Speaker B: Affordances. [01:10:02] Speaker A: Affordances. It's a word that Gibson, J.J. Gibson, came up with. The word affords was already there, but not as a noun. He said, your coffee cup affords picking up by the handle. And so he says that that is an affordance. You can grasp the cup, you can fill the cup, you can throw the cup, you can bop the top of the cup and make a boop sound. [01:10:27] Speaker B: I like that one. Yeah. I'm somewhat surprised that I'm hearing you say this, because my perspective is that you're more on the representational side because you're learning graphical models and there's a lot of structure and world models, et cetera. So is this something, this ecological psychology, and I guess you'll describe what ecological neuroscience is, that you are coming to appreciate or have appreciated? [01:10:56] Speaker A: So I love the idea of affordances. I think it's a powerful idea and it helps us structure things in the world in a way that is focusing on the stuff that's useful as opposed to. So this team, which we call SCENE, the Simons Collaboration for Ecological Neuroscience, we're really trying to test these ideas through neuroscience and sophisticated behavioral experiments with a whole variety of animals. We have mice and monkeys and humans and babies and bats. Baby humans. And looking for different tasks where they have some information that they can't do anything about, some signals they can't do anything about. They're not affordances. They don't afford anything. They have some things that are useful, right? You can do stuff, you can act upon them. They're controllable. And then you have some things that are rewarding. So some of the things that you can do are not necessarily experimentally rewarded. They're not part of the task. But when you think about the main themes of the big theories of neuroscience and machine learning, they're in these two categories. One is reinforcement learning.
So, like, you do everything that you can to get your goal, and you learn stuff insofar as it supports your goal. That's one extreme. And then on the other extreme, you make a generative model of everything. You try to describe the causes of all of your sensory observations as a compression of your world, even if it's not useful, even if it's not rewarding. And so we think kind of in between and a little off to the side is this other possibility that you don't learn everything, and you don't learn just the rewarding stuff. You learn stuff that you can do, and that's the affordances. So that becomes a compelling idea, because I think it allows you to use your limited data and your limited resources more efficiently for things that will generalize better if you just focus on what's useful. Right now, the world is constantly changing, and you're going to miss a bunch of things that could have been useful later that you didn't know about. And if you try to learn everything, it's a little bit like Sherlock Holmes saying, well, whether the sun goes around the Earth or the Earth goes around the sun, it makes no difference to me. So I'm going to promptly forget it. Sherlock Holmes is very practical about that because it's explaining the same data. But now the generative model would have. You'd be trying to describe whether the sun goes around the Earth or the other way around, not because it's useful for you, but just because you're trying to explain everything you can. And so you're using a lot of brain power in that model to do things that you might never use. So this gives us this affordances idea, I think, gives us an interesting potential balance between generalization to new things that you might be able to act upon that could be rewarding later, like total generalization versus. Yeah, you're not going to the generative model, which models every causal, every cause in the world. That's going to be the best generalization. But you pay for it by a lot of data that you have to accumulate. [01:14:23] Speaker B: It's less intelligent too, because you're learning a shit ton of things that you are not going to need or use. So in some sense it's less intelligent if. If intelligence is solving problems at hand. [01:14:33] Speaker A: Correct. And some ensemble of ecological problems that you actually encounter in nature. Yeah. So this becomes, I think, a useful third theme or third thread that we could explore. And we have some neuroscience experiments to test them, which are a lot of fun. Now the other element to ecological psychology was direct perception. And this basically says that we don't have steps of computation, we just directly know things that are there. [01:15:06] Speaker B: This has rubbed people the wrong way historically. Not just you. This is like the main thing that people are like, what the hell. [01:15:12] Speaker A: Yeah, exactly. One good example of this which gets back to the kludges is like Paul Cisek, who is an ecological neuroscientist, I guess, who is very much in this camp, this anti representation camp. And he's written a couple beautiful papers on the evolutionary history of brains. Just, I love this. There's a couple papers that he's written that are gorgeous, wonderful syntheses. I highly recommend them. [01:15:46] Speaker B: He's spending a lot of time on that. And like in my conversations with him, he was. I said, you know, because it's sort of off the beaten path and it's a passion project for him.
But yeah, it's beautiful work and I'm glad that he's doing it. [01:15:58] Speaker A: Yeah, I'm grateful as well. So he has an example of how you might find shelter as a lizard. So you're moving around in the world and you see a patch of light that is bright on top and dark below. And as you move in one direction, the dark part gets bigger and the bright part gets higher. That suggests an overhang. You're getting closer to an overhang which might be shelter. So now you can imagine direct perception where you just wire this particular visual pattern of a bright patch going up and a dark patch growing. You just wire that directly to move forward right under some context at least. And then there's like loops of loops that are feedback and regulating and all that. But that basic movement would be kind of a direct perception approach. You don't have to know, you don't have to make a model of the fact that there's a convex, a concave area there. You just do this particular. You connect this sensory input to that motor output and you're done. [01:17:02] Speaker B: That's probably why it's hard to swat. [01:17:03] Speaker A: A fly, because they're really good at those kind of things. They have very quick close reactions. Whether that describes, I mean, Gibson thinks that that describes sort of everything and it seems just false on its face. [01:17:19] Speaker B: Right? [01:17:19] Speaker A: That that happens in the brain. Right? It's just false. There was a whole article of like a brain and behavior journal that I actually took out of the library. Like, wow, last year it was, I know I was holding this book and it was like a real book here. [01:17:34] Speaker B: Did you get calluses? Did you develop calluses from the. [01:17:37] Speaker A: No, they got really sore because turning the pages was, you know, it's muscle I hadn't used in a while. [01:17:42] Speaker B: Right. [01:17:43] Speaker A: And there were a bunch of responses to, I think it was Shimon Ullman who was, who was critiquing direct perception in this way and a lot of luminaries giving their responses to it. And one of them was, I thought, pretty interesting from Geoff Hinton, which was like, Gibson couldn't possibly have meant that, that it's just there are no intermediate steps. I think instead what he was probably arguing against was more the good old fashioned AI sense of like first you extract edges and then you put them together into contours and then you do this thing and then you do that thing in a very computer sciencey way. [01:18:28] Speaker B: But had Hinton read the Gibson. I have not read the main text of Gibson. So I, I mean, yeah, I wouldn't know. [01:18:35] Speaker A: I mean, I don't know what Hinton had read, but I mean he was, he was trying to reconcile these things and I think his connectionism was along those lines. He was saying, yeah, it's just an emergent behavior of all these neurons working together and that's. That that could be perfectly consistent with this idea of non step by step algorithms that Gibson may have had in mind. Now from my perspective, it might be possible to take one of those connectionist architectures and actually interpret it as, hey look, here's some sequence information about contours and the information flow actually prioritizes along the contours and some other information is off the contours. But it's not like here's a contour neuron and here it's right, there's some balance here. 
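As a throwaway illustration of the lizard-and-overhang example above, here is what a direct-perception-style wiring from a visual pattern straight to an action might look like. The feature names and the zero thresholds are invented for the sketch; this is not taken from Cisek or from the episode.

```python
def shelter_reflex(dark_area_growth, bright_top_rise):
    """Direct sensorimotor wiring: no internal model of 'an overhang', just a
    mapping from a growing dark patch under a rising bright patch to moving
    forward. (Feature names and thresholds are illustrative.)"""
    if dark_area_growth > 0 and bright_top_rise > 0:
        return "move_forward"      # the pattern that usually means shelter ahead
    return "keep_exploring"

# e.g. shelter_reflex(0.3, 0.1) -> "move_forward"
```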
But saying that these neurons don't interact with those neurons in this context, that would be structure to the computation that I suspect is still going to be. [01:19:29] Speaker B: There, but you would still find single neurons. So you mentioned Horace Barlow earlier, right? And like the, what is it? The neuron hypothesis? No, the neuron doctrine. Single neuron doctrine. Single neuron doctrine, partly responsible for, you know, you would still find neurons that correlate with the contour. Right. But you wouldn't just think of it as a contour neuron. [01:19:49] Speaker A: Correct. [01:19:49] Speaker B: But you could decode like that. It probably has to do with the contour from that single neuron. But as opposed to the historical single neuron doctrine of Barlow et al., you wouldn't call that a grandmother cell, a contour cell, for example. [01:20:02] Speaker A: Yeah, it may have. Neurons will have more and less specificity. And so I'd be perfectly fine calling that kind of a contour neuron. But the real question is what is the range of generalization that it has? Right. If it generalizes over a wide variety of contexts, backgrounds, contrasts, then yeah, you might say that that neuron has kind of localized some information about contours. You can always find such information, like whatever neuron, a neuron responds selectively to whatever turns it on. [01:20:33] Speaker B: But I think by calling it a contour neuron, you hearken back to like, well, if you kill it, you don't see contours anymore or. That's right. No, your mental representation of a contour is due to that neuron. That's the sort of implication that people rail against. [01:20:45] Speaker A: Yeah, yeah, absolutely. I think, you know, people, even people who are thinking about the single neuron doctrine, weren't necessarily saying that it was only that neuron. [01:20:54] Speaker B: I agree. [01:20:55] Speaker A: Right, yeah. But, you know, I think that the evidence shows that these pieces of information are more widely distributed. And so, you know, I think that finding the right basis, the right compound, we're looking for patterns and how the patterns relate to each other. And this brings up, I think, a fundamental point that people who think about representations often neglect, which is that representations are useless by themselves. They need to be connected. So you need to think about the brain in terms of representations and transformations. [01:21:31] Speaker B: What do you mean by. So I just had a panel on to talk about representations and it kind of got sidetracked because John Krakauer related everything to mental representations. So in this case you mean the structure of the neural activity, is that what you mean? [01:21:44] Speaker A: I mean the relationship between the neural activity and the external world. And I listened to that episode that you're talking about and I talked to John. [01:21:52] Speaker B: It's just a contentious term. [01:21:54] Speaker A: It's a contentious term. I actually have another generative adversarial collaboration that I've been working on. [01:21:59] Speaker B: What makes a representation useful? [01:22:03] Speaker A: Yes, exactly, exactly. So different people use the word representation differently and, you know, it's fine, just say what you mean and then we can go on and say, well, does that thing happen? Does this thing happen?
[01:22:16] Speaker B: So in your sense, in your case, it's some relation between the neural activity and something happening in the world. [01:22:21] Speaker A: It's so. Yeah, that's. So I'm going to use the word representation right now in this joint sense of having information. So it's just the simple, typical neuroscience style. It's actually the style I started with, the meaning I started with before I was convinced otherwise. Like, if you take the view that something is a representation if it has information about the world. [01:22:49] Speaker B: Shannon information, or meaning information, intentionality information or something, I think. [01:22:55] Speaker A: It doesn't really matter. [01:22:57] Speaker B: Okay, well, that's another contentious term. [01:23:02] Speaker A: Yeah, let's just use the term Shannon information for that. [01:23:05] Speaker B: Okay, sure. [01:23:07] Speaker A: Then it is not possible to have a misrepresentation. [01:23:10] Speaker B: Oh, okay, right. [01:23:12] Speaker A: And this is something that other people have made as a point before, and I didn't appreciate it, but some philosophers actually pointed it out to me and I was like, ah, okay, so that makes sense. So now you need to use that information in a way that is consistent with some rules of transformation in the world for it to be a correct representation. Right. Like if I see an image on my left and I turn to my left, then I am correctly representing. Like if I want to move, if I want to be directed towards it, then I have a representation that is used appropriately. But if I see something on my left and I actually move to the right, then that's a misrepresentation. [01:23:52] Speaker B: Right. [01:23:52] Speaker A: If I'm wearing like prism glasses or something. And so representation by itself, it's just sitting there. It needs to have some function. It needs to do something. So we always want to be thinking about information that's there in the neurons about the world, but also how it gets transformed either to behavior or to other neurons. And that joint representation and transformation is what we should be studying. [01:24:20] Speaker B: Ah, okay. [01:24:21] Speaker A: Because you can have the same representation transformed in different ways, which would mean that it might be a misrepresentation. It's instead. Right. You can have one transformation applied to two different representations, and one of them would be useful and one of them would not be. So, for example, let's say the transformation is just adding things together. That is the correct thing to do for probabilistic inference, bringing back that other idea, if you are representing log probability: then you multiply probabilities by adding log probabilities. So that's a match between the representation and the transformation that is helpful for that particular thing. But if you were trying to use a different. If you were trying to add together probabilities directly in order to synthesize evidence, that would be a mistake. You would not get the right answer that way. So you need to have this alignment between how things are, like the patterns of neurons and how they relate to the world, and how you change those patterns over time or over space to get to new formats of things that matter for the world. [01:25:34] Speaker B: And how did we get here from ecological neuroscience? Did we? Did you? [01:25:39] Speaker A: Because anti. Representation, right.
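To make the log-probability point above concrete, here is a tiny numerical sketch; the cue likelihoods are made up, and none of it comes from the episode. Adding log probabilities implements the Bayesian product of evidence, while applying the same additive transformation to raw probabilities gives a different, non-Bayesian answer.

```python
import numpy as np

# Likelihoods of two independent cues for a binary state s (illustrative numbers).
p_cue1 = np.array([0.6, 0.4])    # p(cue1 | s = 0), p(cue1 | s = 1)
p_cue2 = np.array([0.7, 0.3])

# If neurons carry log probabilities, addition is the matching transformation:
log_post = np.log(p_cue1) + np.log(p_cue2)               # + log prior, omitted here
post = np.exp(log_post) / np.exp(log_post).sum()          # approx [0.78, 0.22], the Bayesian answer

# The same additive transformation applied to raw probabilities is a mismatch:
mismatched = (p_cue1 + p_cue2) / (p_cue1 + p_cue2).sum()  # = [0.65, 0.35], not Bayesian
```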
Like, I think you asked me, like, hey, how could you. [01:25:44] Speaker B: Like, we're talking about the middle ground where there is representational structure. Yeah, but it's not completely general and it's not completely brittle. So there's not direct perception. [01:25:55] Speaker A: Those are. Yeah, it's against direct perception. But it's something like. I think that there are some cases where, you know, you find the right combination of features and it does the right thing. You can say, oh, that's direct perception. Or you could say that, oh, this is a useful computation for this particular task. Now it becomes a matter of semantics. Right. It depends on what direct means. So I don't think it's all that important to go into tremendous detail of what Gibson might have meant by direct perception. But I think we can find that there are spatiotemporal patterns in neurons that relate to spatiotemporal patterns in the world. And that the way that those patterns evolve over time and space are the computations that we do. This is computation by dynamics. And so this is the thing that couples the patterns of input, which we can call representations if we want to, the way that we use them and behave upon them, which is the thing that lets us know whether, in Krakauer's sense, we have a mental representation of predictions about the future states of the world and are reacting in a way that's consistent with where we want to go. [01:27:08] Speaker B: Mesh this with the idea, your work that we were talking about way earlier on inverse rational control, where an animal is not necessarily optimizing for the task at hand. It is optimizing for something that's suboptimal relative to the task, but it's rational in the sense that it's optimizing for something that it has evolved to optimize for. [01:27:36] Speaker A: Yeah, you could say that if they're behaving rationally in this false world. Right. Then they're misrepresenting their sensory evidence. So they're getting one form of sensory evidence. [01:27:48] Speaker B: So that would be an error. That's what I wanted to bring it back to. Like the error in representation. You would consider that an error? [01:27:53] Speaker A: I would consider that an error in the sense of related to the external world. But it's self consistent, right? You might say, yeah, it's rational. So that's why I like to distinguish rational from optimal. [01:28:08] Speaker B: Okay. [01:28:09] Speaker A: Other people use the word rational differently. [01:28:12] Speaker B: Yeah, that's what I was going to say. Some people confound them. Right. Or they could be confounded in some people's definitions. [01:28:17] Speaker A: Yep. And so some people talk about bounded rationality, which are things where you might have some superstitions or some other approximations that you make, like in the sampling context. Ed Vul has this paper, One and Done, where you basically, there are some cases where behavior is consistent with taking a single sample of a probability distribution and just acting upon that. That's rational, but under the bound that you have extreme time constraints or some other kind of resource constraints. So I think in a lot of these, we're talking, we have a shared understanding of what some words mean. When we find some conflict, now we have to go through and say, all right, well, do we mean the same thing by these words? Is there a conflict? [01:29:04] Speaker B: But we don't want to be philosophers. We don't want it all to be about that. We want to move forward.
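As a cartoon of the "one and done" idea mentioned just above: rather than integrating over a whole posterior, a bounded-rational agent can draw a single sample and act on it. The sample list and scenario below are invented for illustration and are not from the paper or the episode.

```python
import random

def one_and_done(posterior_samples):
    """Act on a single posterior sample instead of the full distribution.
    Cheap, and close to optimal when deliberation time itself is costly."""
    return random.choice(posterior_samples)

# Illustrative posterior over where the ice cream truck is parked.
decision = one_and_done(["corner", "corner", "corner", "park"])
```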
[01:29:09] Speaker A: Yeah, exactly. So I think we do move forward. We assume that we know what each other is talking about until we don't. Then we try to resolve our. We try to reconcile the way we're using terms, and then maybe we don't. Somebody says, I don't want you to use that term that way. Okay, fine. Well, we can agree or not, and. But let's go on to the substance, right? [01:29:33] Speaker B: Yeah, exactly. Okay, Zach, so we've spent a lot of time already and we haven't even talked, really, except for the inverse rational control about some of your projects. Just to list some off, you know that some of, some of the recent work that you've done is, you know, you've studied how attention fluctuates, like the pattern of high and low attention based on the constraints of the task and the probabilities, etc. How we. How we control what we do. What is the phrase move? I have it in my notes here. Moving more to think less. You have work on the recurrent graphical probabilistic models, but So I don't. We can't go through all these. But, you know, is there something you want to highlight? Is there something you're most proud of that you are most joyous of because you are a person of many pursuits. So I want to leave it up to you to sort of highlight what you think is most fun or interesting. [01:30:33] Speaker A: It's like asking you to pick what's. [01:30:35] Speaker B: I know. What's your favorite color? Yeah. Your child. [01:30:37] Speaker A: Oh, I can say favorite color. My favorite color is green. [01:30:40] Speaker B: Okay, sure. I can say favorite child easily too. But I won't reveal it. No, I'm just kidding. I can't say it. [01:30:48] Speaker A: Yeah. Which is my favorite project lately. I'm really happy doing this ecological psychology neuroscience stuff. [01:30:58] Speaker B: I think that I was super surprised that you. I didn't know about that and I was super surprised. [01:31:04] Speaker A: We have zero publications on it. It just started in July. [01:31:06] Speaker B: Okay. [01:31:07] Speaker A: And it was a long, slow process of putting this together. And the team is big. It's 20 people in this team, but there are six theory teams within this group. It's very theory led. [01:31:21] Speaker B: How do I explain? Just. [01:31:24] Speaker A: All right, stay tuned, stay tuned. I think there will be opportunities for broadening these. Fair enough. It's going to be a 10 year project. [01:31:33] Speaker B: Holy cow. [01:31:34] Speaker A: Yeah. So this will take us a while. So that one's a lot of fun. We've already talked about it. I'm really excited about this dynamic graph thing that we talked about where I'll send you that shape, the little tetrahedron. In fact, if you make a graphical model of this where you have one circle for each variable and then a square for how those variables are interacting, you actually end up with a picture that looks like the flux capacitor from Back to the Future. These three things coming into the middle. And I, I am excited about using that motif to explain a whole bunch of things. I call it the statistical transistor. [01:32:21] Speaker B: Nice. [01:32:21] Speaker A: Because once you have that third variable gating, whether the other two are interacting, the. The expressive power of that structural graph becomes vastly larger. And so it explains some interesting properties in just in low level visual perception. I think it can explain a whole lot of structures in the way that we're changing cognitive patterns. 
Interestingly, the fundamental math equation there that you use for these three interacting variables is X times Y times Z. And remarkably, two things show up out of this. First, that is what you get in Transformers, like Transformers, the very popular machine learning architecture that underlies large language models at the moment. And this kind of multiplicative gating has ancient history; if you go back to the 60s, that's ancient in the computer science sense. People were using sigma-pi networks, where sigma is the sum and pi is the product. And so multiplying these X times Y times Z actually gives you a lot of interesting expressive power. Transformers use that in a slightly specific way, but fundamentally the unit is that you're multiplying these different elements together and using that as what they call attention, which has a lot of similarities with biological attention in terms of gating things. And the other thing that I really like about this, that I find exciting, is that that motif emerges naturally when you give neurons more capabilities than traditional neural networks. So in traditional artificial neural networks, you take one neuron, you take all of its inputs, you take a weighted sum of all those inputs, and then you pass it through a nonlinear function. That's your neuron's response. And this has been stupendously valuable. [01:34:20] Speaker B: And it gets the same one every time it passes through. As long as the weights are the same, it gets the same input. [01:34:25] Speaker A: And every other neuron is going to be the same kind of nonlinear function that just takes a different weighted sum of maybe a different set of inputs. So the weighted sum is what we call a projection of the input. If you allow it to learn a nonlinear function mapping inputs to outputs, but you learn it based on not just one weighted sum but two weighted sums, now automatically emerging generically is a product X times Y, that very operation that shows up as the critical new ingredient in transformers, and that product was something that people dabbled with in the 1960s. It emerges naturally when you give neurons the kind of power that they automatically have in biology. So in biology, you have neurons where the dendrites don't just come in and give one input to the neuron, right? You have an apical dendrite, you have a basal dendrite. There's more structure there, but at least you have that. And so if you give neurons that very simple biological structure, all these things emerge. You get attention, you get this gating, you get these statistical transistors just popping out for free, generically. So this to me is connecting the low level microcircuit, like a small neuron scale thing, that you could share across lots of neurons easily by just giving them apical and basal dendrites, and all these good computational properties emerge out of that. So that, for me, is really beautiful because it connects something very simple and low level to something very powerful, abstract and computational. [01:36:04] Speaker B: Something that you said, so I'll tie this in, but something you said at the Neuro AI Conference in Washington D.C. a few months back surprised me. But now that you're saying this, it doesn't surprise me as much in terms of what's missing. What do we need, what do we need funding for? What do we need to accomplish?
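A minimal sketch of the contrast drawn above, between a standard artificial neuron and a unit built from two weighted sums (loosely, an apical and a basal projection). The multiplicative term that appears is the same kind of gating exploited by transformer attention and by old sigma-pi networks. The weights and the particular nonlinearity are illustrative assumptions, not a claim about Xaq's models.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)                      # inputs to one unit

# Standard artificial neuron: one weighted sum through a nonlinearity.
w = rng.normal(size=8)
standard_out = np.tanh(w @ x)

# Unit with two weighted sums, loosely an 'apical' and a 'basal' projection.
w_apical = rng.normal(size=8)
w_basal = rng.normal(size=8)

# The simplest nonlinear combination of two projections is their product:
# one stream multiplicatively gates the other, the X*Y (or X*Y*Z, with a
# third gating input) motif of the "statistical transistor" and of attention.
gated_out = np.tanh((w_apical @ x) * (w_basal @ x))
```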
And you said we need a synaptome, essentially we need to learn all the connection strengths of synapses, which is such a low level detail. And there's resistance to, you know, we don't want to go all the way down to electrons, we don't want to go down to ion channels or quarks. Quarks, I was avoiding saying it because I always say quarks. But you want to go down to the synapse strength level, which, what you just said about being excited about some of these low level implementation processes, different weighted sums essentially separated into what are biological compartments, apical and basal dendrites. And we know it's way more complex than that, but even just with those two subsets, and people like Matthew Larkum do work on apical versus proximal and distal dendrites. Yeah. So that makes a little bit more sense to me, why you would want that. Then, are those two directly related? Like, is that why you're wanting that sort of synaptic measurement? Because you see these low level ways in which the higher level, let's say the attention in this case, higher level functions, you are connecting those in your pursuits. [01:37:37] Speaker A: There are connections there. But that's not what I had in mind when I said that we need the synaptome or whatever. What I really had in mind was that one of the fundamental things that our brains do is learn, but we can't measure it. [01:37:53] Speaker B: Okay. [01:37:54] Speaker A: I mean, not at scale, right. The thing that has taken us from the single neuron doctrine to the population doctrine, and let us understand at least some elements of how patterns of neurons are related to driving behavior and understanding the world, is that we can measure them at scale. We can measure now, I mean, I think the world record is a million neurons at the same time in one animal doing one thing. And that gives us much richer understanding of what is there. Right. What the brain computations are doing. And so when we talked earlier about, well, maybe the only thing that we can really understand about the brain, although I, you know, I don't agree with that claim, but those people who think about that say that the only thing we can learn is the learning rule and the objectives of the brain. We will not be able to see that learning rule operating in its natural context until we can see the synapses changing. I actually did a small project with somebody trying to infer the learning rule from just neural activity, in a simple case where the learning rule was really simple, and it required inferring synaptic weights from activity, which is really hard. In fact, in many cases it's impossible. But let's say you have causal perturbations. It's really hard to figure out what the connections are. But now you have to not just learn what the strengths are, but how they change and how that change depends on other activity. It's super hard. But if we had direct measurements of the synapses, as people like Ehud Isacoff are starting to study, like starting to measure, then we have a chance of understanding what the learning rule is doing at scale. The kind of rapid learning that I would love to measure is one shot learning. The first time that you know what a car is, the first time you see a convertible, what all of a sudden changes.
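Here is a toy illustration of why measured synapses would make the learning rule recoverable: simulate a weight that follows a simple Hebbian-plus-decay rule, then regress its changes against candidate plasticity terms. Everything in it, the rule, the rates, the candidate terms, is invented for the sketch; it is not the analysis described in the episode.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
pre, post = rng.normal(size=T), rng.normal(size=T)   # pre/postsynaptic activity

# Toy ground truth: a Hebbian term plus weight decay (stand-in for the real rule).
eta, decay = 0.05, 0.01
w = np.zeros(T)
for t in range(1, T):
    w[t] = w[t - 1] + eta * pre[t - 1] * post[t - 1] - decay * w[t - 1]

# With the synapse itself measured over time, recovering the rule is a regression
# of weight changes onto candidate plasticity terms.
dw = np.diff(w)
candidates = np.stack([pre[:-1] * post[:-1],   # Hebbian term
                       pre[:-1],               # presynaptic-only term
                       w[:-1]], axis=1)        # weight-decay term
coef, *_ = np.linalg.lstsq(candidates, dw, rcond=None)
print(coef)   # approx [0.05, 0.00, -0.01]: the rule pops out of the weight time series
```

From spike data alone, without the weight time series, the same inference is far harder, which is the gap the "synaptome" argument points at.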
Bobby Kasthuri, who was involved in some of the earliest connectomes in the modern age here, he was suggesting that we could use bird imprinting as a model system to find really rapid, huge scale synaptic changes that cause major computational consequences. So all of that requires looking at the synapses. And that's not something I've done in the past. But to me, this is the biggest gaping hole in neuroscience, that we don't understand how learning works. All of the machines that we use for learning, they're doing gradient descent these days. That's basically what they do. And the brain doesn't do gradient descent. Maybe it approximates it. What are the approximations? What are the constraints? We don't know. And we don't know because we can't measure it yet. [01:40:52] Speaker B: But why do we need to know the low level implementation details? If we know the learning algorithms, or even approximations that have the end result. [01:41:01] Speaker A: Be the same, then we could, I mean, if you can find a way that we don't need to know the low level details, that would be fine. But we do think that the synapses are the machinery that does a lot of the learning. But I'm definitely an advocate of taking a step more abstract. [01:41:21] Speaker B: Yeah, that's why I was surprised. [01:41:23] Speaker A: Yeah, I mean, I would be very happy trying to understand that. I'm not sure I have a lot of confidence that it's going to be easy to do. But let's say we have a population of neurons that are physically connected with all these synapses. And then what do you do? You abstract from that some kind of graph structured inference algorithm that's there at a more abstract level. And now, in parallel, you have the low level synaptic updates and you have the updates in that graph, that abstract graph that you extracted. [01:41:52] Speaker B: Right. [01:41:52] Speaker A: If you could just do the abstract graph updates, that would be great. Then you could understand. Maybe that's a good way of understanding the way that things work. Or maybe you need to go down to the low level synapses, I don't know. But one thing I do know is that when we measure new things, we find new insights. And so that's why I was saying this is like, I think we're right on the cusp of being able to do this and I'm sure that it's going to provide a lot of new insight. [01:42:20] Speaker B: Maybe we'll end with this thought and you can reflect on it as well, because I have to actually go do some neuroscience here in a minute. And we'll have to have you back on because we didn't even talk about your work. I mean, we talked about a lot of the principles which undergird a lot of your work. And I'll point to all your papers and stuff in your lab in the show notes. But the interesting thing, so the arguments against studying the implementation level in the past. How do I want to phrase this? I want to see what you think about this. So in modern neuroscience, we have these neural networks to work with, these probabilistic graphical models, these theoretical tools, which, when we look at, let's say, synaptic strength changes and measure them, now we can actually relate them to these theoretical entities that have been developed in the past couple decades, like better developed. Whereas in the past, you're sort of measuring synaptic strength with the assumption that they lead to learning and you can kind of measure it experimentally.
But then there's still this huge gap. Maybe we're in an era where we're able to actually go across levels and close those gaps a little bit better and understand how the low level implementation effects and details matter for the higher level properties. How would you reflect on what I just said? [01:43:50] Speaker A: I would say that would be a great cause for celebration. Ah, all right, yeah, absolutely. I, I think that. [01:43:56] Speaker B: Do you agree with it though? [01:43:57] Speaker A: Yeah, yeah, I think that we are. I mean, you know, progress is moving gradually. I think we're gaining more insight. There's some really fundamental things that we don't understand yet. There's some other things that we, you know, I think we understand some things, and, you know, whether that's a lot or a little is going to depend on the judgment. I was having a discussion with Konrad Kording the other day, and, you know, he was saying, we don't understand anything. And it's like, well, I think we understand some things. And then, you know, it was. Understand the brain. Do we understand the brain? Well, no. [01:44:31] Speaker B: Well, what does that mean, even? Yeah, yeah. [01:44:33] Speaker A: So I think we do understand some things about the brain. Right. It's a mystery in a huge number of ways, but it's not a total mystery anymore the way that it used to be. It's not like just a big ball of kind of tangled fat and string. And so we know that there are patterns. We know that there are synapses. The synapses change. The patterns have influences on our behavior. There are feedback loops. There's a lot that we don't understand. Symbols, we don't understand. Language. We don't understand many of the dynamics. We don't understand some fundamental models like memory or, you know, most of the computations. But we have some hints. I would say we're on our way, and it's been facilitated by massive data. [01:45:20] Speaker B: Yeah. [01:45:21] Speaker A: So I think this is a great time. Like, we have such an amazing confluence of factors right now that makes this a really good time. One is that we have analysis tools that have very high power from all the AI stuff, and we have incredible neurotechnology, which is giving those models something to chew on. So it's a good time to be a theorist and a good time to be partnering up with people collecting some of this epic new data. [01:45:52] Speaker B: All right, Zach. And so we carry on into the mystery of the brain and its functions with some less mysterious things along the way. So thank you for your time. I'll see you around campus, and we'll have to have you back on. I appreciate it. [01:46:06] Speaker A: This was great, Paul, thanks so much. [01:46:15] Speaker B: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights, and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives written by journalists and scientists. If you value Brain Inspired, support it through Patreon to access full length episodes, join our Discord community, and even influence who I invite to the podcast. Go to BrainInspired Co to learn more. The music you hear is a little slow jazzy blues performed by my friend Kyle Donovan. Thank you for your support. See you next time.
