BI 202 Eli Sennesh: Divide-and-Conquer to Predict

Brain Inspired

January 03, 2025 | 01:38:11

Show Notes

Support the show to get full episodes, full archive, and join the Discord community.

The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists.

Read more about our partnership.

Sign up for the “Brain Inspired” email alerts to be notified every time a new Brain Inspired episode is released.

Eli Sennesh is a postdoc at Vanderbilt University, one of my old stomping grounds, currently in the lab of Andre Bastos. Andre's lab focuses on understanding brain dynamics within cortical circuits, particularly how communication between brain areas is coordinated in perception, cognition, and behavior. So Eli is busy doing work along those lines, as you'll hear more about. But the original impetus for having him on is his recently published proposal for how predictive coding might be implemented in brains. So in that sense, this episode builds on the last episode with Rajesh Rao, where we discussed Raj's "active predictive coding" account of predictive coding. As a super brief refresher, predictive coding is the proposal that the brain is constantly predicting what's about to happen, then stuff happens, and the brain uses the mismatch between its predictions and the actual stuff that's happening to learn how to make better predictions moving forward. I refer you to the previous episode for more details. So Eli's account, along with his co-authors of course, which he calls "divide-and-conquer" predictive coding, uses a probabilistic approach in an attempt to account for how brains might implement predictive coding, and you'll learn more about that in our discussion. But we also talk quite a bit about the difference between practicing theoretical and experimental neuroscience, and Eli's experience moving into the experimental side from the theoretical side.

Read the transcript.

0:00 - Intro
3:59 - Eli's worldview
17:56 - NeuroAI is hard
24:38 - Prediction errors vs surprise
55:16 - Divide and conquer
1:13:24 - Challenges
1:18:44 - How to build AI
1:25:56 - Affect
1:31:55 - Abolish the value function


Episode Transcript

[00:00:04] Speaker A: Why do things feel like stuff? Oh, why do we engage in the behaviors we behave in, you know, not why, in the normally scientific, you know, reductionist sense, what are the mechanisms once we hold the behavior fixed, but instead, if we don't hold the behavior fixed, what are you or any other organism going to choose? And why that choice instead of something else? There's sort of this problem where in neuro we are often doing paradigms or tasks that from a pure AI point of view might be considered almost trivial, but from a biological plausibility point of view, that often makes them hard. Again, I had actually been prepared for the concept that you might walk arrogantly into experimentation with some grand theory and think this is gonna totally be right. And you do your first experiment and it's totally wrong. And in fact, that happened. [00:01:17] Speaker B: This is Brain Inspired, powered by The Transmitter. Good day to you. I am Paul. This is the Brain Inspired podcast. As you just heard, Eli Sennesh is a postdoc at Vanderbilt University. One of my old stomping grounds. Eli is currently in the lab of Andre Bastos. Andre's lab focuses on understanding brain dynamics within cortical circuits, particularly how communication between brain areas is coordinated in things like perception and cognition and behavior. So Eli is busy doing work along those lines these days, as you'll hear more about in a moment. But the original impetus for having him on this podcast is his recently published proposal for how predictive coding might be implemented in brains. So in that sense, this episode builds on the last episode with Rajesh Rao, where we discussed Raj's active predictive coding account of predictive coding. I've said predictive coding multiple times now. So as a super brief refresher, predictive coding is the proposal that the brain is constantly predicting what's about to happen, then stuff happens, and the brain uses the mismatch between its predictions and the actual stuff that's happening to then learn how to make better predictions moving forward. So I refer you to the previous episode for more gruesome details about that process. Eli's account of predictive coding and how it might be implemented in brains, along with his co-authors of course, they call it, quote, unquote, divide and conquer predictive coding, and you'll hear why in our discussion. The divide and conquer approach, among other things, uses a probabilistic approach to account for how predictive coding might be implemented in brains. But we also talk quite a bit about the difference between practicing theoretical and experimental neuroscience and Eli's experience moving into the experimental side from the theoretical side, which, well, you'll hear. It turns out everything has its own challenges, let's say. All right, show notes for this episode are at braininspired.co/podcast/202. As always, thank you for being here. Thank you for listening. Thank you to The Transmitter for your support of this podcast and thank you to the patrons who also reach out and support. Here's Eli. So, Eli, are you ready? [00:03:59] Speaker A: Yes. [00:04:02] Speaker B: We were just chatting about how you are just a few floors up from where I did my postdoc. And this is your first postdoc, right? [00:04:09] Speaker A: Yes. [00:04:10] Speaker B: Yeah. In Nashville, Tennessee. And I'm curious.
So this is kind of in some sense a follow up episode because I just had Rajesh Rao on to talk about his active predictive coding work, which updates the original predictive coding framework from 1999 that focused all on sensory. Right. And so what he did here was like basically bring in an action part of the story into the predictive coding. [00:04:42] Speaker A: It's very lucky timing. We actually just read his APC paper in Journal Club. [00:04:47] Speaker B: Oh, really? [00:04:48] Speaker A: Two days ago. [00:04:49] Speaker B: Oh, we did it a few weeks ago. Yeah, it was helpful. And so we'll get to your related work. Compare, contrast, et cetera. But first, I kind of. I know you have a computer science background, and I'm trying to understand your worldview. People ask me what my worldview, and I can't describe it because it assumes I have a worldview, but how you approach the world, because I know you have that background in computer science. And so there's kind of a computational, I don't mean dry in a bad way, but a very computational kind of algorithm centric approach that I thought, well, maybe that's kind of where he's coming from. But then I know you did some work with Lisa Feldman Barrett. It's all about feelings and how that drives so much of what, how we interact with the world. So I'm just curious what your kind of worldview in the neurosciences is. Oh, man, he's getting comfortable. [00:05:49] Speaker A: Yes. Okay. So, I mean, there is actually. I don't know if I have a worldview, but I sort of have a direction and vibe. [00:05:59] Speaker B: Okay. Oh, I like that. That's a good way to phrase it. [00:06:01] Speaker A: Like, I feel things out for what I think could be a workable scientific approach to try and address, you know, the questions I'm sort of interested in. And overall I feel like, you know, the question I'm interested in is, like, this is going to sound even sillier than saying consciousness, I'm sorry to say, but why do things feel like stuff. Oh, why do we engage in the behaviors we behave in? But why in the, you know, not why in the normally scientific, you know, reductionist sense, what are the mechanisms once we hold the behavior fixed, but instead, if we don't hold the behavior fixed, what are you going, you know, what are you or any other organism going to choose? And why that choice instead of something else? [00:06:55] Speaker B: What does that mean? Hold the. Hold the behavior fixed. [00:06:58] Speaker A: Oh, okay. So now I just get to channel Lisa straight up. You know, so often in neuroscience experiments, and I'm thinking, particularly actually, you know, some of the animal experiments we do here at Vandy, we basically, you know, head fix your animal or like, first you chair the animal head fix, then you train them to fixate on a dot on screen. Monkeys are much smarter than mice, so they can do this in exchange for juice, Right? [00:07:30] Speaker B: Yeah. You just described my entire academic career there. [00:07:35] Speaker A: Yeah, an increasing portion of mine. Right, right. And you know, then you basically have them move their eyes as the only motor output of whatever you're having them do. [00:07:50] Speaker B: So the most highly constrained lab experimental setup as you can. So that when you're asking the question, does, for example, frontal eye field do encode some sort of decision process related to the behavior, you don't have to worry about all of the other conflating factors that are involved in the other behaviors. 
[00:08:12] Speaker A: Yes. And so, you know, I'm very, I'm a strong believer that experiment is theory laden. And this means that if you're doing one of these highly constrained experiments and you have a theory about what frontal eye field is doing, great, you've controlled everything else so that you can test your theory about frontal eye field. Now back to channeling Lisa. You know, if what you're trying to investigate is not something that you can leave, you know, something that you've left unconstrained in your setup, then you can't actually test theories about it. So I would say, for instance, you know, like, can you use a head fixed monkey in a chair to strongly test, you know, these physiological theories about allostasis, interoception, these other nice keywords that I wrote about with like Lisa and Karen. Questionable. That is really how I ended up. Gosh, there's a whole story actually of how I ended up working with Lisa and Karen. [00:09:18] Speaker B: Yeah, let's hear it. If you're willing to divulge. [00:09:20] Speaker A: Yeah. So the short version is the stars aligned in a way that they never have before. Since obviously, doesn't every academic have that story. [00:09:33] Speaker B: That's right. [00:09:34] Speaker A: But you know, really it's that I had this computer science background, and then late in my master's, actually, I started getting interested in, like, cognitive science, sort of very like Brendan Lake, Josh Tenenbaum type of stuff. And so I had to, like, spend a couple years studying things to sort of go back and try and give myself the background to engage with any of this, try to change direction. And I discovered that, like, what I was really interested in was sort of the feeling and the why. And so I started trying to figure out, well, okay, who has an approach that is kind of like this to these subjects where the kind of, like this is sort of, you know, very probabilistic. You know, they were using, like, probabilistic programs to model concept learning, and this was all working very nicely for them. You know, they had that science paper in 2015. I was so impressed. You know, I started reading neuroscience and. [00:10:45] Speaker B: And you decided to continue? [00:10:49] Speaker A: Oh, no, it just gets so much worse. I'm kind of embarrassing myself here. You could probably guess now, if someone's looking for probabilistic approaches in neuroscience, mid to late 2010s, who are they running into? [00:11:04] Speaker B: Well, not many people, but you've already named some of them. But I was about to ask you, why probabilistic? Maybe we could start there. [00:11:12] Speaker A: Because I was just. I mean, at the time, I was an amateur, and I was just sort of vibing and trying to go, you know, trying to go from one thing that I felt like I could kind of understand to another thing I felt like I could kind of understand. And at least partly at the beginning. The interesting part about the Tenenbaum and Lake kind of work to me was, hey, unlike that old field of AI that I took a course in in. [00:11:40] Speaker B: Undergrad and hated it, this is symbolic style. Old field or connectionism? Old field. [00:11:47] Speaker A: Oh, I mean, I went to undergrad at UMass Amherst. So the AI class was symbolic search heuristics, all of. [00:11:56] Speaker B: Yeah, logic. [00:11:57] Speaker A: Then the machine learning classes that I didn't take at the time were like, random forests, SVMs. 
You know, I think there was some neural networks, but they've hired a lot more people doing neural networks since then, you know, and like, to be. Oh, RL. RL was absolutely huge at UMass Amherst to the point that they hosted the RL conference, you know, in Amherst this past year. And, like, I eventually realized, like, oh, those big guys, Sutton and Barto. Wait, Barto, like Andy Barto, who would just walk through the hall? Yeah, that Andy Barto. [00:12:33] Speaker B: That's an. Isn't that a super interesting thing about academia when you meet, I don't know, you know, maybe hero is the wrong word, but these sort of godheads of classic things and then, oh, they're just regular folks. [00:12:46] Speaker A: I mean, that's why I say this is so embarrassing, is like, I actually went to school in a department full of such great people. [00:12:54] Speaker B: Well, when you don't know, you don't know. [00:12:56] Speaker A: Right? And I just, like, I was honestly kind of dismissive about it because I was like, okay, this is all just, you know, heuristic search. Like, it's heuristic search. You throw a lot of, you know, processing power at it, and maybe sometimes it kind of works. But, like, this is really, like, this is not actually how a brain or a mind would work. Like, you know, this isn't the real thing. And then when I started reading those, like, Tenenbaum and Lake things, you know, they were saying, well, we fit to behavior. You know, we've done an actual experiment and checked. So we're not just defining some toy task that we can then solve computationally with reasonable ease and then go back and forth between approximations and heuristics, you know, for the rest of our careers until an AI winter hits and wipes us out. Gosh, maybe I did take something from UMass Amherst, actually. Like, maybe I took some residual post trauma from the AI. [00:14:02] Speaker B: Yeah. [00:14:02] Speaker A: But, yeah, they were. They were fitting behavior and actually fitting a wide variety or a reasonable variety of experimental tasks with human participants. And I said, okay, now there's something here. Like, now there's a real world to compare against. [00:14:21] Speaker B: So then how did that take you to Lisa? [00:14:25] Speaker A: Oh, sorry. Yes. So I was trying to prompt you for the name Friston. [00:14:32] Speaker B: Sure. Carl Friston. Just to be. [00:14:36] Speaker A: Yes. Yeah. So actually, via. See, I was working with this embedded electronics company. I still have the hoodie over there. And, like, they had an MIT postdoc. He mentioned some of the Carl Friston stuff around the same time that Andy Clark's book came out. [00:14:59] Speaker B: So that's Surfing Uncertainty. Is that the. Yeah, that one. [00:15:03] Speaker A: And that one had a lot of citations to people that I already, you know, names I already recognized. So I read it. I went absolutely wild for it. And he was sort of mentioning in the book, like, you know, there's some people who are actually applying this approach to emotion. [00:15:20] Speaker B: I see. [00:15:21] Speaker A: And even better, the people who were applying this approach to emotion, you know, Lisa and Karen, or at least Lisa and Karen locally to me. Right. I was in Boston at the time, you know, had a collaborator in this big interdisciplinary group that they had tried to form and maintain with varying success.
It really shone for a while, and I think the pandemic might have done it in a little bit, but, you know, they had a collaborator, Jan-Willem van de Meent, who actually did the computational side, probabilistic programming. So of all damn things, I wrote a cold email knowing no better way to go about this. [00:16:09] Speaker B: Sure. Yeah. [00:16:10] Speaker A: And they actually answered. [00:16:14] Speaker B: They were probably fairly thirsty for someone interested in it because it's still not that widespread, Right? [00:16:23] Speaker A: Yeah. I mean, as far as I know, like, none of this is widespread. If you take the Friston stuff too seriously, people say you're in a cult. Oh, I actually didn't join the cult until later when I met Maxwell Ramstead. He's, like, eventually convinced me of a lot of free energy stuff. [00:16:44] Speaker B: Wait, so Steve. Okay, so, yeah, so Carl is famous for the free energy principle, and he considers it a framework, not a theory, by the way. I mean, people, when pressed at least a couple years ago, he considers it a framework for thinking about the overall function of the brain instead of a theory, for what it's worth. And it has a lot of detractors and a lot of cheerleaders. And so you drank the Kool Aid? [00:17:12] Speaker A: Eventually, I sipped the Kool Aid. Didn't go all in because by the time I was being given it to sip, I had sort of been around Lisa and Karen enough that I had really absorbed, like, nope, you got to have your evo devo, your neuroanatomy. [00:17:29] Speaker B: Right. [00:17:29] Speaker A: Your mapping onto actual biology. It's the biology that really, really, really counts. [00:17:37] Speaker B: Right. [00:17:37] Speaker A: Okay, so I drank my advisor's Kool Aid instead of the cult's Kool Aid. [00:17:42] Speaker B: Okay, good for, you know, good for them, good for your advisors. But some people with your background would then, instead of embracing the biological neural plausibility, would go the other direction. Right. Back to where it feels safer. I mean, I'll just say, like, right before, you know, we spoke for two minutes before I pressed record, you were talking about how neuro AI is hard. Is it the neuro part that's hard?
[00:19:40] Speaker B: But what would that be? Do you. Can you describe what that would be. [00:19:46] Speaker A: Ideally, like really, really question based or question driven science? Something close to cognitive science. In my PhD, I used to make up a fantasy field, computational affective science. [00:20:00] Speaker B: Okay. [00:20:01] Speaker A: By cog, you know, by analogy to computational cognitive science. Now, computational cognitive science is already a fairly small subfield that often overlaps, you know, into the computer science departments because that's who will give some of them jobs. And the number of cognitive science departments at universities that do like the full six discipline hexagonal multi handshake thing is a handful. [00:20:31] Speaker B: Or less. Yeah. [00:20:32] Speaker A: Or less. Like there's psychology departments who want you to do psychology experiments. There's neuroscience departments who often want you to do neuroscience, either theory or experiment. But they're defining the discipline often quite narrowly. Like I had a culture shock when I came to Vanderbilt and found out that what they mean by computational modeling or theory is basically like biophysical or bust. [00:21:02] Speaker B: Well, that depends on who you're talking with. Right. Because you have people like Gordon Logan there also who. That's. I'm not sure if you run around, run past him much, how active he even is still. [00:21:13] Speaker A: Well, I don't run into him, but yeah, you know, at least I'm talking, let's say about my lab and a couple other labs that I interact with. [00:21:22] Speaker B: Yeah, yeah, yeah. [00:21:23] Speaker A: Like there's a real emphasis on, you know, be biophysical or don't do anything at all or be biophysical or give up theory and become an experimenter. [00:21:35] Speaker B: So how. Okay, so then where do you sit in relation to that push? Right. I'm trying to suss out your, like, your level of abstraction and what you think is important. [00:21:46] Speaker A: So my level of abstraction is that when I reached the end of my PhD, I said, okay, I formally did my PhD in a computer science department. If I'm ever going to really investigate questions, I need to go get experimental training. [00:22:02] Speaker B: Yeah, I think you told me this a while back. [00:22:04] Speaker A: Yeah, yeah, yeah. So, you know, I basically said, all right, I'm going to go get as hardcore a postdoc as I can. [00:22:12] Speaker B: And that was the biggest mistake you've made. No, just kidding. But. But is that what, is that why you're saying like that the difficulty of neuro AI is, is the joining of the two, kind of like that experimental and computational approach? [00:22:27] Speaker A: Yeah. Like it's not a mistake to go and get experimental experience, but it is a culture shock. It took me about six months to really be able to make progress on absolutely anything on the experimental side. [00:22:43] Speaker B: Why is that? Why is that? So I just, I mean, I know these things and people who do experimental work, we all kind of cry together, you know, and talk about how hard everything is and, you know. But it, but it. [00:22:55] Speaker A: Yeah, no, in my case there, you know, I'd rather not talk about it. It's sort of private to the lab stuff. [00:23:02] Speaker B: Sure. Okay. [00:23:04] Speaker A: You know, I don't. [00:23:05] Speaker B: But suffice it to say that you run into way more problems than you would imagine you might. Would that be a summary of it? 
[00:23:13] Speaker A: Yes. [00:23:13] Speaker B: Yeah. [00:23:15] Speaker A: Yes. Way, way, way more. And the thing is, I had been prepared for the. I had actually been prepared for the concept that you might walk arrogantly into experimentation with some grand theory and think this is going to totally be right and you do your first experiment and it's totally wrong. Complete null result. And in fact, that happened. But I was prepared for that. The part that I was much less prepared for is how do I even connect a theory to an experiment? So the part that I wasn't, you know, null results were sort of a thing that I like, steeled myself. You know, work it out on, you know, work it out exercising. Basically just try to sweat until you can't be frustrated anymore that your theory is wrong. Oh, well, the theory is wrong even while you're submitting a theory paper about it. [00:24:20] Speaker B: But see, that is, in the Popperian sense, that is the best kind of progress. Right. Because it's an answer. [00:24:29] Speaker A: It's an answer. Though I hate to say it, but now that I'm looking at another way of analyzing the data, it might get more complicated again. [00:24:38] Speaker B: Sure. Yeah. [00:24:40] Speaker A: So let me tell you about the actual experiment that we have in both mice and macaques. You know, we have this thing called the GLOW paradigm. Global local oddball. So, you know, first you give three identical stimuli per trial, aaa. This used to be done in auditory. Now we're doing it in, you know, we've been doing it in visual. And then, you know, the local oddball is. That fourth stimulus is B. It's something different. Well, okay, what the heck is a global oddball? You know, in our manuscripts, we describe it as more complex oddballs. Well, a global oddball is where we set up the expectation for the animal. Right. We try to intervene on the internal model and make it think there's a B coming, but then we give it an A. So let's say what we end up doing is testing. These are intermixed for the animal about 80, 20. So 80% local, 20% global. [00:25:49] Speaker B: So you're really setting up the expectation. [00:25:52] Speaker A: Yes, actually, there's like, days and days of habituation followed by 50 trials of pure local oddball at the beginning of recording. So that we're basically habituating and queuing the expectations as powerfully as we can. And so what we're trying to do is disentangle what happens if you have a predictable change versus an unpredictable repetition. And the idea is, from a neurophysiologist's point of view, is that then at the end, you're going to have a bunch of controls. Those come after the main block. So after the main block, we record a series of essentially control sequences that are going to allow us to do statistical contrasts. And the idea is to then eventually say, all right, well, if you can figure out if you can control for every other mechanism you can think of. So adaptation of the sensory neurons in V1, you know, like, this is where. [00:26:58] Speaker B: It starts to get really messy and hard also. [00:27:02] Speaker A: Well, not just messy and hard, but like, if you can control for everything you think of, and there's still some difference between global oddball, AAA, unexpected A, and just pure repetition or adaptation. Aaaa. Then, ah, now you found a signature of surprise processing. And for a long time, I have just been staring at this experimental setup going, how is that surprise processing? 
Or like, what theory have we articulated about predictive coding in the Rao and Ballard sense that says this is surprise processing, you know, rather than. I mean, you know, who says the brain is tuned to look at angled gratings, moving angled gratings on a screen that flash on and off? [00:27:53] Speaker B: Well, you. Okay, so in other words, you can't control for everything or it's not just. [00:28:01] Speaker A: That you can't control for everything, it's that, as I said, I believe, you know, experiment is theory laden. And if your theory is about the brain, you know, predicting the continuous stream of sensory input, then flashing a series of angled gratings that are optimized essentially to drive, you know, V1 to a maximum degree. Well, under predictive coding theory, that's saying you're trying to drive, you're trying to optimize prediction error. [00:28:34] Speaker B: Right. [00:28:35] Speaker A: So how do we expect to simultaneously optimize prediction error while also provoking another. [00:28:43] Speaker B: Kind of prediction error, that being a surprise? [00:28:47] Speaker A: Yeah. Or like. Well, that's the thing, our setup, you know, conflates like prediction error, surprise, you know, visual change. [00:28:58] Speaker B: Yeah, right. Because you're using that oddball. There's a visual difference in the oddball that you're using. [00:29:03] Speaker A: Oh, and I didn't get, I should have said, actually this is pretty much the standard paradigm, as it turns out, for studying predictive coding and goes back to about '09. [00:29:14] Speaker B: You said that surprise and prediction errors are often conflated. So what is the difference then between surprise and prediction error? Theoretically, perhaps, maybe if. [00:29:25] Speaker A: Yeah. So I would say you need to commit yourself to a theory in order for there to be a difference. But then the problem is if you're trying to test a particular theory, you should use the definitions from within that theory. Okay, so prediction errors within predictive coding theory, you know, they're the residual when you subtract the prediction from the data. [00:29:51] Speaker B: Yeah. What the organism expects, top down signals, then it gets some observational data, bottom up signals, and then there's a difference, a mismatch, between the prediction and the actual observed data. And that's what gets passed forward. [00:30:09] Speaker A: Exactly. And I guess I would. Well, okay, so how to relate that to surprise? You know, I would reach for my information theoretic definition because I'm a quant person. [00:30:25] Speaker B: Okay. [00:30:25] Speaker A: And say, okay, well surprisal is the negative log probability of the stimulus, you know, and essentially those would be two different quantities. You know, when I eventually wrote my own like computational modeling paper, prediction error was the gradient of surprise. So they're related but distinct. And you sort of have to use math to talk about how. But you know, I'm, I'm, I guess I'm trying to just describe the culture shock of going from, you know, sort of this environment that was, wasn't oil and water, you know, we mixed. But like there was a very quantitative side that I worked on and a very biological side. And then, you know, I come to this, like, GLOW paradigm, this experiment, and I find that, oh, the quantitative side is just removed out from under me. I have to reconstruct it entirely myself.
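To put a little math behind the distinction drawn above, here is a minimal sketch in Python, assuming a Gaussian predictive model; the distribution, function names, and numbers are illustrative, not taken from Eli's paper. Surprisal is the negative log probability of the stimulus, and the familiar prediction error, the observation minus the prediction, shows up (precision-weighted, up to sign) as the gradient of that surprisal with respect to the prediction.

```python
import numpy as np

def surprisal(x, mu, sigma=1.0):
    # Surprisal: -log p(x | mu) for a Gaussian predictive distribution.
    return 0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * ((x - mu) / sigma) ** 2

def prediction_error(x, mu, sigma=1.0):
    # The precision-weighted residual (x - mu) / sigma^2, which is
    # (up to sign) the gradient of the surprisal with respect to mu.
    return (x - mu) / sigma**2

x, mu = 2.0, 0.5               # observed stimulus vs. top-down prediction
print(surprisal(x, mu))        # how surprising the stimulus was (a scalar)
print(prediction_error(x, mu)) # the residual that would be passed forward
```

The two quantities track each other but are not the same thing, which is the point being made: conflating them only looks harmless if you never commit to a specific theory.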
[00:31:33] Speaker B: So that's what you were getting at when you were talking about how, we'll call it, neuro AI is hard. [00:31:42] Speaker A: Yeah. Actually taking. There's sort of this problem where in neuro we are often doing paradigms or tasks that from a pure AI point of view might be considered almost trivial. [00:31:57] Speaker B: Yeah. [00:31:58] Speaker A: But from a biological plausibility point of view, that often makes them hard again. And then if you're actually trying to explain neuronal data, or worse, trying to map some real theory of the brain onto neuronal data rather than just suggest that there could exist some mechanism explaining this behavior, because there's been multiple computational models of the same behaviors. I'm sort of thinking of the famous drift diffusion models of decision making. How do you know if the brain is doing a drift diffusion, you know, accumulate evidence to a threshold and then decide algorithm for decision making or, you know, a resource constrained reinforcement learning algorithm for decision making? There are experiments that have been fit with both these kinds of models. [00:33:02] Speaker B: Yeah, that's right. [00:33:04] Speaker A: How do you know? Massive, massive shock for me that there's just like, oh, wait, is everyone just pretending? [00:33:12] Speaker B: What do you mean pretending? Pretending that what they're doing is valid and what everyone else is doing is not, or what? [00:33:18] Speaker A: Well, pretending that like just taking data and, oh, you know, fitting it such that you can claim to use your theory to explain behavior, but you haven't actually tested it against substantive alternative theories rather than some kind of null hypothesis. Like, what the heck is our null hypothesis regarding behavior in the brain? [00:33:43] Speaker B: Or alternative hypotheses? It doesn't even have to be null, just a clear alternative. [00:33:49] Speaker A: Yeah, yeah. [00:33:52] Speaker B: This is something that actually, that Jeff Schall, I'll just elevate him in this regard. Like every year when I was a postdoc, there's a fundamental set of papers, one of which is like the method of alternative hypotheses where we tried to base, I think, because of these things, because it's hard. Like you mentioned drift diffusion and I was doing drift diffusion work, essentially stochastic accumulator work, which is exactly what you're saying. Does the neuron like ramp up to some threshold and then that actuates the behavior? And that's one of the things that Jeff Schall is famous for. And so, you know, the idea is to look in the brain and test it and ask it right through recordings. And of course it's not super clean because we're dealing with different kinds of stimuli in this very controlled environment. The frontal eye field. As we know now, any given brain area doesn't just have a single function. Right. So there's, you know, mixed selectivity in brain areas where they're doing overlapping populations of neurons or doing overlapping functions, things. But anyway. [00:35:07] Speaker A: Oh yeah. I mean, any talk of like, frankly, any talk of selectivity slightly makes me want to scream. And I've just been reacculturating myself to an environment where like the word degeneracy and, you know, to an environment where these things are not the assumptions anymore. [00:35:25] Speaker B: Wait, where degeneracy is not an assumption. [00:35:28] Speaker A: Where degeneracy isn't the assumption. You know, top down influences often aren't the assumption.
Like it's a very. And I'm not saying this as a negative thing in a certain way. I like it, even though I don't think I can make a career out of it. Like very. Andy Clark, quoting Quine, had this thing about desert landscapes. Like a neurophysiologist's point of view is a very desert landscape point of view. There's the things I can measure. Nothing else. Nothing else exists. I'll talk about selectivity because I think I can measure it. And if you tell me that that's actually caused by what I do rather than an observation of a causally independent system, then I will get in an argument with you because I think I'm measuring something real. [00:36:22] Speaker B: I see. So what you're describing, it's interesting that you find yourself in that world now because in some sense that's kind of the old school world, which is still very much alive and thriving, whereas there's been this recent push into much more naturalistic types of tasks and removing the constraints from the lab, you know, the lab based experimental stuff. And that's hard in a very different way. [00:36:53] Speaker A: So let me make some applause or give some applause to Andre here. Right. I think he doesn't do that kind of experiment yet because he's actually pushing something that's already very risky and innovative. He calls it Madeleine, multi-area, high-density laminar electrophysiology. [00:37:14] Speaker B: Okay. [00:37:15] Speaker A: Which basically amounts to saying, you know, let's have like not just one Neuropixels probe in one area, let's just cover the brain in Neuropixels probes. [00:37:26] Speaker B: Yeah. So Neuropixels probes are like these really high density multi electrode probes. [00:37:31] Speaker A: Yep. [00:37:31] Speaker B: So that when you put them in any given area of the brain, you're getting recordings of hundreds to sometimes thousands of neurons. [00:37:39] Speaker A: Exactly, yeah. You know, and all of our work includes the LFP, the local field potential, as well as, you know, the individual spiking signals, you know, and then we analyze both together, which, you know, I won't say who, but like someone I really, really respect a lot, I went and visited their lab, actually one of my scientific heroes, you know, I went and visited their lab at one point. [00:38:09] Speaker B: Can't say who. You can't say who? [00:38:10] Speaker A: Yeah. I'm realizing I can't even specify this little. Well, the point being, at one point I asked, you know, do you analyze the LFPs? And they said, no, we just look at the spikes, you know, and I think, you know, respect to Andre, like, he's. I didn't talk about it before because, like, it's not as native a part of my worldview. It's what I'm learning. But, you know, this is actually a very ambitious thing, you know, even for a simple experiment. We'll have like two full Neuropixels probes taking, you know, multi unit activity, individual spikes that we sort with Kilosort, you know, then LFP, which is sort of. [00:38:52] Speaker B: LFP is what people talk about as measuring when people use the term oscillations. [00:38:59] Speaker A: Sorry, yeah, no, I was saying population level signal. [00:39:03] Speaker B: Oh, there's that too. Yeah, but, but it's a different. It's a kind of a complementary signal. The other thing is spikes are definitely the outputs of neurons, whereas LFP is thought to more. More closely track that population level input. [00:39:17] Speaker A: Yeah, so then we also, you know, and then we like analyze both.
So we're often doing, you know, cross-correlation or coherence measures of like LFP to spike. [00:39:28] Speaker B: Yeah. [00:39:29] Speaker A: And this actually tells you quite a lot. And it's, you know, it's difficult, it's ambitious. And my understanding is that it's also not easy to get grants in. Like, I think Andre won his NSF CAREER just this year and that was the first grant that the lab had gotten in, I think possibly three years of operation for joint. [00:39:55] Speaker B: Specifically for joint spike LFP analysis for. [00:39:58] Speaker A: Madeleine as a whole. [00:40:00] Speaker B: Yeah. Okay. [00:40:01] Speaker A: For like this research program of, you know, let's measure in multiple areas, let's measure the LFPs and the spikes. Let's try to capture as much as we can, so to speak, as many times as we can. Let's really try to push the limits on how dense the sampling can be in electrophysiology, because of, you know, essentially the resolution issues with imaging or EEG, that you would not want to use those, you would want to use electrophysiology. [00:40:32] Speaker B: Yeah. Okay, so backing up here. So I was just at this Brain Initiative workshop and it was brought up multiple times. So the idea was to think in terms of like, well, what would we need in 10 years? What's an ambitious goal for 10 years in NeuroAI? And two people, one person suggested this and then it was echoed by another person, that what we need is to be able to record synaptic strengths. So you know, for example, in neural networks, the strength between the units is where all the parameters are. That's those billions and billions of parameters in these large language models, etc. Those are what get changed, that strength in the connections. And if there was just a way for us to measure that in the brain, then that's an ambitious goal and it's a worthwhile goal. But my immediate thought was, you know, there's that age old question, like what would you do if you could measure all of the spiking from all of the neurons? Would you even know what to do with it? And no, the question is no, we don't, because it goes back to the theory ladenness. Like you have to have, you have to come from some sort of framework or theory to then ask questions of that data. So just collecting the data is not going to get you there, right? [00:41:51] Speaker A: Yes. And I think that's where I'm just going to put my cards on the table and say, I think that's an open challenge for the field and I'm happy to be working on it. [00:42:02] Speaker B: What, what is the open challenge? Sorry? [00:42:04] Speaker A: To. To figure out how the heck you analyze your data in a properly, you know, theory driven or question driven way rather than just. I don't want to say this like it's too bad of a thing. Right. But rather than just running statistics and then saying I found an effect like. [00:42:25] Speaker B: Well that's, that's interesting because that's kind of what the AI side does in NeuroAI. It's like throwing a bunch of statistics at the data. And even Terry Sejnowski brought this up at the workshop, like what principles have we learned? What principles are there to gain from this approach. Right? Yes. [00:42:46] Speaker A: Here's where I would sort of reach back into my training with Jan-Willem as a probabilistic programmer and say, for God's sakes, we need to be writing down generative models, fitting them to data and then doing model comparison.
We need to actually have some measure of how well does something fit the data, what theory motivates it, and then compare them in a principled way. And I think that machine learning can actually help with that. And I've seen a lot of very, very productive and a flurry of new work essentially in just analyzing neural data. But then you also have to convince, here's the hard part, those things can get published in machine learning conferences. And then you have to both teach the experimenters to use them and convince them to use them and teach it to them in such a way that they don't need you as a statistician or machine learner to actually, you know, stand over their shoulder telling them how to encode every little hypothesis because you want them to use it a dozen different times. And they can't just keep you around forever as some kind of consulting machine learner. [00:44:04] Speaker B: Right. Well, you know, actually, so I'm going to. It's not name dropping because I wasn't like talking with him, but I remember Jeff Hawkins years ago at giving a keynote, I think at the annual Society for Neuroscience lecture. And I'm sure he's made this point over and over again. You know, the traditional physics approach is you have your theorists and you have your experimentalists and they're sort of happy to play together, and that's not the case necessarily in neuroscience. And that we need to get to a point where the experimentalists are happy gathering the data to feed to the theorists who then can analyze it. But that sounds awful to me too. [00:44:43] Speaker A: Right? I mean, I. So I will actually say I would much rather that experimentalists be capable and happy of analyzing their own, with analyzing their own data. And the reason is that, you know, if I say I'm going to be a theorist or a computationalist, then, you know, data analysis is something that pays the bills. Perhaps it's something that can help get a routine number of papers out the door for a machine learning person. I am actually thinking of someone, Scott Linderman over at Stanford. You'll notice that a lot of his papers are basically just machine learning based data analyses for neural data. And that's great. That's the thing that can build a career. Now personally, is that what I would want to think about as a theorist? How do we analyze data? No, no, like, you know, that is not the thing that I have, you know, a secret manuscript that I've been trying to finish for a year. You know, the thing where I have a secret manuscript that I've been trying to finish for a year is, you Know, how do we explain emotion in a quantitative way? Or affect core affect, valence and arousal in a quantitative way? By going all the way back to, you know, the urbilaterian and then picking C. Elegans as a model organism. [00:46:16] Speaker B: Yeah, good luck with that. [00:46:18] Speaker A: Exact. Yes, see, exactly like, good luck with that. [00:46:23] Speaker B: But people like you mentioned, Scott Linderman, and so he develops a lot of tools that are being used in these naturalistic kinds of tasks. Right. And that skill set seems to be what is really valuable in the academic marketplace, at least these days. Do you think I have that right? [00:46:45] Speaker A: Yes. Yeah. So I'm going to use myself as an example instead of him because I know myself better. Right. 
And I don't think I could speak for the narrative arc of his career, but I know that when I started my PhD, the starter project that I got put on was here's a new way of analyzing fMRI data in a little bit more theory driven way. And it worked. [00:47:14] Speaker B: Sorry, but what was the. Oh, you just needed to employ that method. [00:47:18] Speaker A: No, I mean, it wasn't just, oh, there was some method and we employed it. We were building something new because, you know, our collaborator on the psychology side had some data and he wanted to analyze it and the standard ways of analyzing it were inadequate to the theoretical question he wanted to ask. [00:47:37] Speaker B: Yeah. [00:47:37] Speaker A: So he wanted us to build something new. We built it, we published it. You know, that gets citations. There was a follow up. You know, I think there's now follow ups to the follow up, like by other groups. [00:47:51] Speaker B: Right. [00:47:52] Speaker A: You know that, like that stuff is. This is going to sound horrible, but I don't mean it in a bad way. That stuff is good commodity science, but it's also necessary. [00:48:06] Speaker B: I can, I can make it sound even better. [00:48:08] Speaker A: Yeah, it's like the, it's the Toyota of science. Right. Like, I drive a Toyota. I only bought a car this past year, but I drive a Toyota. Because you know what? It's practical. [00:48:20] Speaker B: Yeah, yeah. [00:48:23] Speaker A: You know, that is very practical science that you can reliably like, never run out of new reasons to do more of it and therefore never run out of publications. [00:48:34] Speaker B: Well, that's right. That's right. But this goes back to the, to the idea of. So does that contribute to progress in theory, progress in understanding principles, or is it just a very practical way to harness and say something about the data that's being generated? [00:48:55] Speaker A: I think a lot of the. I think it has the potential to do both, but by default it mostly does the second one. And that's not a criticism; that's to say, I think the field has, you know, the ingredients for a really great synthesis sort of laying around in different people's labs. And what we need is essentially like a small conference or workshop's worth of cross pollination where you can get the people with the appropriate skills all in the same room, give them the incentives to work together. And I think it's actually the incentives that are the hard part. [00:49:37] Speaker B: This idea of getting the proper, the people with the proper skill sets in the same room for a couple days. It's awesome. The proper skill set is a shifting landscape itself. Right now we have a very specific one, like people like you and Scott, whom you mentioned, and stuff like where these commodities, these tools are extremely valuable, widely used. But you know, going back to Hubel and Wiesel, right, they're on transparencies, they're like putting like just little shapes and trying to listen for the sound of neurons. Even like Jeff Schall, whom I mentioned earlier, would tell us stories about, you know, you're in lab, you'd make like a little hole out of like a wooden cutout and you'd like put a light up in there and is the neuron active or not? It's a very different world back then, very different skill set. And so I don't know how we track that. And that's a meta problem.
[00:50:32] Speaker A: Yeah, I mean, that's why I say, like, if you're going to have a division of people's jobs or departments into theorist and experimenter, then I would want the experimenters to be able to analyze their own data because then they can do that even if it's a bit quantitative and even if that's something of a moving frontier sometimes. And then the theorists, they can focus on asking questions like, well, how does the brain actually work now that we've measured it? You know, now that we're able to interpret the measurement. [00:51:05] Speaker B: Let's get back to predictive coding though. I mean, are you. So are you. No, you don't want to pin yourself into a very narrow corner. But I mean, where are you in terms of. So the idea of predictive coding, predictive processing, is that we are constantly predicting what is coming into our senses. And so we have to have sort of a model, to use the term loosely, of what we infer to be causes of things coming into our senses, infer to be a cause in the world. So we're making these predictions from our world model. Bayesian brain hypothesis is one way to say it. Free energy principle is another sort of framework implementation. So are you on board with, like, this being the function of the brain? A major function of the brain. Where does this sit? [00:51:55] Speaker A: Let's say major function of sensory cortex. [00:51:59] Speaker B: Major function of sensory cortex. [00:52:00] Speaker A: Yes. [00:52:01] Speaker B: Why sensory? [00:52:04] Speaker A: So there's. Okay, gonna lapse into neurophysiology vocabulary for a little bit. You know, sensory cortex is usually, well, laminated. Like there's laminar sensory cortex down in these low areas. And then as you move both up the hierarchy towards cognitive areas. What we think of as cognitive areas. [00:52:24] Speaker B: Yeah, nice. [00:52:24] Speaker A: And also sideways over to motor, you get different patterns of lamination. [00:52:29] Speaker B: So the cortex is a laminar structure, meaning it has a fairly repeated. Well, very repeated motif of like six layers. [00:52:38] Speaker A: Like a layer, six layers, let's say. [00:52:40] Speaker B: Yeah. [00:52:40] Speaker A: Now, the one that raw sensory stimulus comes into is layer four. And the thing is that when we talk about different lamination patterns, you know, we're talking about. I believe they're called agranular and dysgranular, and those have either much less layer 4, or they're entirely missing it. [00:53:04] Speaker B: I think that. That's right. I think agranular has no layer 4, and dysgranular maybe has a. A weaker. [00:53:11] Speaker A: Yeah, like a weaker layer four. But now, if you were asking yourself, you know, okay, so if I'm doing Bayesian computation, then my observed random variable, which is the stimulus, it has to come in somewhere. And if I'm, you know, using this hypothesis about the laminar microcircuit doing predictive coding, then where's that coming in? It's coming in, in layer four. So what is the circuit doing if it doesn't have layer four? [00:53:41] Speaker B: That's where the generative network is. Right. [00:53:45] Speaker A: Maybe, like, logically, it can't be doing variable by variable Bayesian inference. It could just store priors. But then why does it have a layer two, three? Because that's the one that computes errors and thereby updates the predictions. Know. 
So I actually really, you know, since we're following on Rajesh Rao's episode, I actually really like his hypothesis that, oh, 2, 3 is the one that handles sensory data. You know, 5, 6 is actually handling chiefly motor data. And when you compute an updated sensory prediction, you might route it through there on its way somewhere else. But then fundamentally, he would be saying, okay, now, you know, oh, and he also notes that there's thalamic projections into a cortical column that don't have to go through layer four. [00:54:43] Speaker B: Right. So the desire is to bypass layer four. Bypass layer four being a necessary part of predictive coding. Is that one way to put it? [00:54:53] Speaker A: Well, or ways of reformulating the predictive coding hypothesis so that you can still have sensory data coming in even when there isn't a layer four. [00:55:01] Speaker B: Sure. [00:55:02] Speaker A: And then you just have physiological and evolutionary questions about why are these areas agranular, dysgranular, laminar, what are the differences between them and the similarities. But you haven't totally abandoned your framework. Whereas if you're committed to layer four being where sensory observations come in, then logically the Bayesian computation can only be done in our sensory cortex. [00:55:29] Speaker B: Okay, I see. [00:55:30] Speaker A: So when I say I think I'm committed to this being, you know, an explanation of laminar sensory cortex, I'm being kind of minimalist. [00:55:40] Speaker B: Sure. Okay. Okay. But so you're on board with Raj's story about, like the incoming layer 2, 3, outgoing layer 5, and how that's, that's one way that it could biologically plausibly be implemented. But your divide and conquer predictive coding also strives to be biologically plausible. Maybe we can start with like, what is divided and what is conquered in divide and conquer predictive coding. And then, and then maybe talk about a. [00:56:10] Speaker A: Sure, yeah. [00:56:10] Speaker B: Plausibility. [00:56:12] Speaker A: Okay, so if you go look at some of the free energy papers, I think there's even one called the Graphical book, like the Graphical Brain. They tell a story about how a probabilistic graphical model has these different nodes representing different unobserved random variables, and these get mapped onto cortical areas, and then the communication between areas is a series of messages in a belief propagation algorithm that eventually gets down to primary sensory areas where the random variable is observed. Now, this kind of algorithm makes a very specific assumption that they call the mean field assumption, essentially saying we're going to approximate the posterior distribution with a product of independent representations. So we'll have one representation for the visual, one for the audio, you know, one that represents the integration of visual and audio. But they're actually all going to sort of be statistically independent in, you know, the approximate posterior, scare quotes, as implemented in the brain. And by the way, on the, you know, machine learning side, we know that this is quite a bad representation of a posterior distribution. [00:57:48] Speaker B: Why is that? [00:57:49] Speaker A: You know, essentially it can't represent correlated posteriors. [00:57:54] Speaker B: Because. Because of the independence assumption. [00:57:57] Speaker A: Yeah, like it's making a very strong independence assumption that was necessary to simplify the math in, like 2003.
Literally the first time variational inference was published was in a PhD thesis from 2003 or so. Like, you know, all my respect to people who are developing new things and make simplifying assumptions. Right. But of course, the point of science is that we always want to try and relax our simplifying assumptions and ask, can we come up with a way to essentially, can we assume that the real world is really complex and complexify our models over time so as to accommodate the real world? [00:58:39] Speaker B: Well, but then you're also dealing with Occam's razor. You're dealing with trying to figure out, well, what can we abstract away? What are the important things that we can abstract. And so when you make assumptions like that mean field assumption, you are making trade offs. It's just whether they're the right trade offs given what you're trying to answer. Right? [00:59:02] Speaker A: Yeah. And sort of what I had Learned through my PhD on the machine learning side was that if you have a complex structured graphical model, as might be used in some cognitive science task, then mean field variational inference doesn't work very well. And I thought, well, you know, if I take a theory or, sorry, if I take a hypothesized model from neuroscience and I apply it in AI and it just doesn't work very well. Is that what the brain does? No, I don't think the brain, you know, fails at things that are doable with current AI methods. Or rather, I don't think the brain fails at doing things that we've observed it to be able to do in actual behavior. You know, I think that's a case where the algorithmic model is just inadequate. So I said, okay, let's make a better one. Instead of mean field, you know, independent independence assumptions, let's instead try to break down the random variables from one another so that you maintain their correlations when you update them. [01:00:15] Speaker B: Is this the dividing part? [01:00:17] Speaker A: Yeah. Yes. [01:00:19] Speaker B: Just say it again. So what are you. You're dividing. Go ahead. [01:00:22] Speaker A: You say, you take this, you know, a probabilistic graphical model. So it's a mathematical analogy to the brain's internal model of the world. And you say this consists of a bunch of different variables that are connected to each other in various ways. You know, kind of like cortical columns. We can imagine. So this was actually how I imagined it was. You know, one cortical column, one random variable. And then when they communicate with each other, you know, those are conditional dependencies and all that. And then I said, okay, let's try to divide this so that we can update each random variable in a way that takes into account the correlations with the other random Variables that it's the conquer part, that's the divide part. [01:01:14] Speaker B: Well, yeah. Or yeah. [01:01:15] Speaker A: Then the update is like the first step of conquer. Then the real conquer is that we have all of these importance weights from the world of like Monte Carlo methods for Bayesian statistics that then let us eventually write out, you know, here's how good a fit to the joint model we have to the whole probabilistic graphical model. So we're saying we want to do local updates that maintain some kind of global coherence. And it gets called divide and conquer because. Well, frankly, Neurips is ultimately a computing conference. Yeah. And all computing people have taken an algorithms class where they talk about divide and conquer methods. 
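As a concrete illustration of the mean field limitation mentioned a few exchanges back, here is a small numerical sketch; the covariance values are invented purely for illustration. A mean-field approximation q(z1)q(z2) can match the marginal variances of a correlated Gaussian posterior, but by construction it assigns zero correlation between the variables.

```python
import numpy as np

# Suppose the true posterior over two latent variables is a correlated Gaussian.
true_cov = np.array([[1.0, 0.9],
                     [0.9, 1.0]])   # strong correlation between z1 and z2

# A mean-field approximation q(z1)q(z2) is forced to be diagonal: it can
# match each marginal variance, but its off-diagonal terms are zero.
mean_field_cov = np.diag(np.diag(true_cov))

rng = np.random.default_rng(0)
true_samples = rng.multivariate_normal(np.zeros(2), true_cov, size=5000)
mf_samples = rng.multivariate_normal(np.zeros(2), mean_field_cov, size=5000)

print(np.corrcoef(true_samples.T)[0, 1])  # ~0.9: the correlation in the target
print(np.corrcoef(mf_samples.T)[0, 1])    # ~0.0: lost under independence
```

That lost correlation is exactly what the "divide" step described above is meant to preserve when each variable gets its local update.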
[01:02:08] Speaker B: I see, I didn't realize that. So this is a well-known phrase in the algorithmic world. Yeah. [01:02:15] Speaker A: Like, if you talk to algorithmicists and say divide and conquer, they'll say, oh, okay, so you're taking some kind of huge data structure and recursively performing the same computation on each component before going on to the connected bits. [01:02:30] Speaker B: Okay, that makes a lot of sense. [01:02:32] Speaker A: You know, and it just sort of happened that, like, you needed a lot of Monte Carlo tricks to make this work. But when you do, it's very sort of intuitive why you would want to do it that way if you were then going to map your probability model onto a physical circuit structure where the different random variables are spatially separated and have to signal to each other. [01:02:56] Speaker B: So what, in the divide and conquer model, is required for it to be biologically plausible? [01:03:07] Speaker A: So the claim of biological plausibility we made is to say the computations are purely local. [01:03:16] Speaker B: Right. Local updates, instead of global ones. [01:03:19] Speaker A: Right. So people have talked about, could backpropagation be implemented in the brain? Tommaso, our senior author, has this paper on using Gaussian predictive coding as a substrate to actually implement backpropagation. And in this paper we're sort of saying, well, let's assume you can't do backpropagation. You don't have any kind of global computation graph or automatic differentiation tape in the brain. But let's assume that one cortical column can signal to another, and that if you're representing one random variable locally, then you can really do three things with it: one, sample from its distribution; two, measure the log density of its distribution, so the log probability density, where density just means that you're talking about continuous random variables and not discrete ones; and three, take the gradient of the log probability density. And if you can do those three things locally, then you have the primitives necessary for our algorithm, and you can thereby obtain global coherence out of local computations, and you don't need any backprop. [01:04:50] Speaker B: Since we were talking about that experiment versus theory metascience topic earlier, I mean, does this make clear predictions about what kinds of signals you would expect to see? [01:05:04] Speaker A: Now here's where it gets biologically implausible. These were still rate-coded neurons, right? [01:05:10] Speaker B: So yeah, so they can still cross... [01:05:12] Speaker A: Between positive and negative. [01:05:14] Speaker B: Right. So brains use spikes, among other signals like LFPs, but essentially all of modern machine learning or AI models use rate codes. And you know, there are a lot of people working on spiking neural networks also. But I assume that if you're going to implement it in, like, a spiking network, then, you know, you'd have to... I mean, it's plausible with the sampling approach, right? Because that's what spikes are all about. [01:05:42] Speaker A: Spikes are all about sampling. [01:05:43] Speaker B: Well, you can... So going back to the old debate on, like, how probability is implemented in brains, there's the sampling approach versus the approach where the spike counts map onto some probability distribution types, et cetera. [01:06:00] Speaker A: Oh yeah, with a twist. So with the twist. Yeah.
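Just to picture those three local primitives, here is a small hypothetical sketch in NumPy (my own toy, not the published model): each "column" object can only sample from its belief, report a log density, and report the gradient of that log density, and a lower column updates its belief using nothing but locally available, prediction-error-like terms, with no global backprop tape anywhere.

```python
import numpy as np

class Column:
    """Toy 'cortical column' holding one Gaussian belief over one random variable.
    It exposes the three local primitives mentioned above: sample(),
    log_density(value), and grad_log_density(value).
    (A hypothetical illustration, not the published algorithm.)"""
    def __init__(self, mean, std=1.0, rng=None):
        self.mean, self.std = mean, std
        self.rng = rng or np.random.default_rng(0)

    def sample(self):
        return self.rng.normal(self.mean, self.std)

    def log_density(self, value):
        return -0.5 * ((value - self.mean) / self.std) ** 2

    def grad_log_density(self, value):
        return -(value - self.mean) / self.std ** 2

# A 'higher' column sends a top-down prediction (a prior centered on 0.0),
# and a 'lower' column also receives an observation of 3.0. The lower column
# nudges its own belief using only locally available gradients.
higher = Column(mean=0.0)
lower = Column(mean=higher.sample())
observation = 3.0

for _ in range(500):
    bottom_up = observation - lower.mean             # prediction error from below
    top_down = higher.grad_log_density(lower.mean)   # pull from the prediction above
    lower.mean += 0.01 * (bottom_up + top_down)

print("lower column's belief:", round(lower.mean, 3))      # settles near 1.5
print("log density at the observation:", round(lower.log_density(observation), 3))
```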
So I know there's a lot of sampling approaches where you essentially say a neuron has a preferred stimulus and implements a likelihood function. And the priors are actually represented in the developmental program in the genome, not in the neurons themselves. And those eventually make the opposite prediction to predictive coding. They say when the posterior probability of what the neuron prefers is higher, the neuron will fire more. And predictive coding, actually, and the free energy principle and all of those approaches are much more information-theoretic. They say that when the stimulus is thoroughly expected, you should see much less neuronal firing. [01:06:55] Speaker B: Right, right. [01:06:56] Speaker A: And so we're in that family of theories, though we do use random sampling. My, you know, my dispute with spikes being about sampling is that, of course, if you patch-clamp a neuron, like, in vitro, then, what is it, like 96% of the variance in its spiking is explicable deterministically. There's stochasticity in the real brain, but we don't know that the single neuron is intrinsically stochastic that way. [01:07:27] Speaker B: Right. In that way we do know it's stochastic. But yeah, okay. So then going back to... you started by saying that this is where it gets into non-biologically-plausible mechanisms: that it doesn't use spiking. [01:07:44] Speaker A: Yeah. And actually, I think Blake Richards' group has recently ridden to our rescue with their paper on, what is it, brain-like learning with exponentiated gradients. [01:08:01] Speaker B: Brain-like because it uses spiking? [01:08:04] Speaker A: So in their case, brain-like because it obeys Dale's law. Oh, so they'll have inhibitory neurons, which are negative, and excitatory neurons, which are positive, and the signs will never flip. [01:08:18] Speaker B: And they still use rate, but... [01:08:20] Speaker A: Yeah, they're still using rates there. [01:08:23] Speaker B: Yeah. [01:08:24] Speaker A: And you know, how realistic do I think that is? I don't really know. Like, I mean, there's areas that could use rate codes, but there's also too many experimental findings showing that precise timing matters. [01:08:44] Speaker B: Yeah. [01:08:46] Speaker A: So what it could be, you know, and this is not an original thought to me, this is coming from a Computational Brain & Behavior paper. I can send you the name. It's from 2020. You know, it could be a prefix-free code. [01:09:05] Speaker B: A what? Sorry, say that again. [01:09:06] Speaker A: A prefix-free code. [01:09:08] Speaker B: Prefix-free. What does that mean? [01:09:10] Speaker A: So that means that once you send a certain pattern of spikes, and by certain pattern I mean, you know, the precise timing determines which code word it is. But once you've sent a certain pattern of them, then that code word is over. So prefix-free means that no code word is a prefix of another code word. So if I say ABAB, then that's either a full code word that now tells you something, or there's no full code word that starts with ABAB, except the one I'm already sending. [01:09:50] Speaker B: So what would that mean? Is that just because a rate code... go ahead, sorry. [01:09:56] Speaker A: So a rate code would say you listen over a certain period of time. [01:10:00] Speaker B: Right. Whereas the timing-oriented code... 10 spikes. [01:10:03] Speaker A: You divide by time T, yeah, that's the rate code. Whereas a timing-oriented code is like, you get a spike at time T.
Now you think, well, what comes next? Spike at delta T, delta T prime, delta T prime prime. Right. And you look for, you know, very specific timing, like with musical notes, in your lookup table. [01:10:26] Speaker B: You figure out, oh, I just received this particular message. [01:10:30] Speaker A: Yeah, I just received ta-ta. You know, it's... gosh, has someone actually tried using third-grade music class on timing codes? But yeah, a prefix-free code would then be a timing-based code where you say, once I've received a full code word, that's it, I know that I've received a full code word, I can interpret the whole message. [01:10:58] Speaker B: And you clean your cache and move on. [01:11:00] Speaker A: Yeah, clean my cache and move on. Exactly. But really, I mean, gosh, on the other hand, that doesn't... see, this is the thing that bugs me is, like, there's also evidence that, you know, dendrites, right, are sort of accumulating these precise spike timings into something more like a continuous signal that gets fed up to the cell soma. So how can it be that there's precise spike timing and there's dendrites that convert from spike timing to spike rate, more or less? [01:11:36] Speaker B: Well, I don't know that those are necessarily problems. Right. I mean, so when you were going to say, you know, is it a spike-timing code or a rate code? Because we know that some things require precise spike timing, like the interaural time differences, right, underlying how owls locate sounds, for example. Timing is very important there, but maybe timing is not as important in, I don't know, frontal cortex or prefrontal cortex or something. And it could be both. [01:12:07] Speaker A: Yeah. [01:12:08] Speaker B: Depending on what you're needing to accomplish. An organ like the brain is fairly complicated, it turns out, and it might be implementing lots of that degeneracy you were talking about. That could be the case in terms of how it computes. Maybe it's not one or the other; it just depends on what's needed. [01:12:31] Speaker A: Yeah, that's very, very possible. So actually, not only is that possible, that would go very well with, you know, some of our recent preprints that basically say predictive coding is a much more cognitive computation that can take place in frontal areas. You know, back to our GLOW paradigm. Those global oddballs seem to get detected in frontal areas, but not in lower sensory cortex. [01:12:59] Speaker B: Interesting. [01:13:00] Speaker A: Okay, so maybe the laminar cortical column, you know, is something like a big stack of universal computational primitives that don't tell us much, from just reading off the anatomy, about what it is doing. Oh, God. [01:13:16] Speaker B: Yeah. [01:13:17] Speaker A: No, if we broadcast this, the modular mind people are going to crawl out from under the rocks. We spent so much time banishing them. [01:13:29] Speaker B: That's all right. That's all right. There's room for everybody. One of the things I wanted to ask you about is... so you're mindful of what is and what isn't biologically plausible in this. You think it's important, if you're going to understand the brain (this sounds silly to say), that you implement something through a model that is biologically plausible. But you were willing to forego the spikes. So, inevitably, any project is going to have hurdles.
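The "clean your cache and move on" property is easy to sketch. Below is a hypothetical toy decoder in Python (the codebook of inter-spike intervals is invented for illustration, not taken from the paper Eli mentions): because no code word is a prefix of another, the decoder can emit a symbol and empty its buffer the instant a full code word arrives.

```python
# A toy prefix-free code over inter-spike intervals (in ms). No code word is a
# prefix of another, so the moment a buffered pattern matches, the message is
# complete and the buffer can be cleared. (Hypothetical codebook.)
CODEBOOK = {
    (10,): "A",
    (20, 10): "B",
    (20, 20, 10): "C",
}

def decode(intervals):
    symbols, buffer = [], []
    for isi in intervals:
        buffer.append(isi)
        word = tuple(buffer)
        if word in CODEBOOK:          # full code word received...
            symbols.append(CODEBOOK[word])
            buffer = []               # ...so clean the cache and move on
    return symbols

# A spike train arriving as a stream of inter-spike intervals:
print(decode([20, 10, 10, 20, 20, 10]))   # prints ['B', 'A', 'C']
```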
What hung you guys up the most in getting this thing to work and/or getting it theoried out properly? [01:14:15] Speaker A: So, two big things. You know, the first one was when I tried to write out all those weighting rules, essentially saying, like, how do you accumulate, you know, the weights from doing a dozen successive updates to a random variable over a dozen passes? And I got something that looked really complicated and eventually just exceeded the numerical precision of floating-point numbers in a computer. [01:14:45] Speaker B: Okay. [01:14:46] Speaker A: And what I eventually did was just, like, have a meeting with Hao and talk out some options. And he pointed out that one of them was essentially just cheating: forgetting the old importance weights and just saying, you know, I start with some particles, that is, I start with some samples, I do a computation step on them, now I have new samples, I'm going to do the same thing next time. I don't save any weights. And we ended up going with that because it turns out, you know, once we both proved to ourselves that this was legal to do within all the rules of the game, this just turned out to be the simpler thing that was able to work. [01:15:32] Speaker B: And so you're okay... I mean, so storing the weights over time maybe is not even as biologically plausible as throwing them away and just recurrently redoing it. [01:15:41] Speaker A: Yeah. [01:15:44] Speaker B: So there were two things that you said. [01:15:47] Speaker A: So the other one is that between the first preprint draft and the second one that represents our camera-ready, we added, like, this preconditioner that helps the optimization go in the right directions and respect the geometry of, like, the latent space, and, you know, this very mathematical, technical, itchy thing. And the thing is, without that, stuff doesn't work and you just don't perform very well on your test tasks. Now, we did manage to rig this up in such a way that it could be biologically plausible. You know, it's effectively like calculating a certain function of the prediction errors. So if the prediction errors are locally available, then this thing is locally available. And you could even, you know, nod to the free energy principle and say, ah, there's that precision of the prediction errors that these free energy guys are always on about. But really it was just motivated by getting the damn thing to work in the end. [01:16:58] Speaker B: You have to have a product, a working product. [01:17:02] Speaker A: Yeah. And you know, this is where... I forget which famous person said that. No, two famous people have said this. Okay, it's Richard Feynman and Daniel Dennett; they've both, you know, said, if you want to understand it, you've got to be able to build it. [01:17:20] Speaker B: Did Dennett say that also? I mean, Feynman, he said a version of that, didn't he? Okay, yeah. Feynman says, I do not understand what I cannot build. [01:17:30] Speaker A: No, and Daniel Dennett's is completely different. He actually said at one point, AI keeps philosophy honest. That's what I was remembering. [01:17:39] Speaker B: Well, that's interesting. [01:17:41] Speaker A: Which is a whole other can of worms. So, my mistake.
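The weight-resetting trick Eli describes (start with particles, take a computation step, keep the new samples, and never carry old importance weights forward) is, at least in spirit, what a standard particle filter does when it resamples. Here is a generic, hypothetical NumPy sketch, not the paper's algorithm: multiplying raw weights across hundreds of passes would underflow floating-point precision, while resampling each pass and then forgetting the weights keeps everything numerically sane.

```python
import numpy as np

rng = np.random.default_rng(3)

def resample(particles, log_w):
    """Multinomial resampling: draw particles in proportion to their weights,
    then forget the weights entirely (everyone starts the next pass equal)."""
    w = np.exp(log_w - log_w.max())       # stabilize before exponentiating
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Track a latent value through many successive passes of noisy observations.
n_particles, n_passes = 1000, 500
particles = rng.normal(0.0, 5.0, n_particles)
true_value = 2.0

for _ in range(n_passes):
    obs = true_value + rng.normal(0.0, 1.0)
    log_w = -0.5 * (obs - particles) ** 2    # local log-likelihood weight
    # Multiplying raw weights across 500 passes would underflow float
    # precision; resampling each pass and discarding the weights avoids that.
    particles = resample(particles, log_w)
    particles += rng.normal(0.0, 0.05, n_particles)  # small jitter keeps diversity

print("estimate:", round(float(particles.mean()), 2), "truth:", true_value)
```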
But you know, what I would say is, if you want to say that predictive coding is a thing that happens in the brain based on your experimental observations, then it should hypothetically be possible to build an algorithm that does predictive coding and actually works for some of the toy tasks that we use in AI, which are still vastly more simplified than the tasks we use in neuroscience, or rather the task of the brain. An AI image-generating network does not have saccades. Well, unless it's one of Rao's, in which case it does have saccades now. But, you know, that's very new for AI and completely trivial for neuroscience. And so I think you have to be able to build up AI to the point that it's able to do things that are trivial for neuroscience before you can really say, oh, a computational theory is viable. No, it has to do the things that are most trivial for the brain. [01:18:48] Speaker B: Well, all right, so then I have two kind of broader questions for you before we end our conversation today. And one, just going off of what you just said, and I've sort of been building up to this: do you need to understand the brain or brain processes, or implement things in a similar manner to how the brain does things, to build the best artificial intelligence? Do we need to mimic the brain? And at what level, if so? [01:19:24] Speaker A: So I think that depends on how you define... I'm sorry to be philosophical about this, but it really does depend on how you define artificial intelligence. Right? [01:19:37] Speaker B: Geez, yeah. [01:19:38] Speaker A: And I don't like to commit to a definition of that at all, because what I personally want to do is understand the brain. That is the motivation for me. I want to understand the thing that actually exists, try to draw, you know, so to speak, laws and principles from it, and then maybe I could engineer something with those, in the same way that you can, you know, engineer a steam engine with Newton's laws and thermodynamics. Right. But you do have to... you know, in my view, the interesting part is to do the fundamental science before the engineering. Now, if you are engineering first, then an intelligent task is whatever the heck you have a benchmark for. And there's this alternation between making a harder benchmark and beating the current benchmark. And in that case, do you really need the brain? Well, no, you need to understand your benchmark task. Like, there's a lot of tasks where, if you have a very deep understanding of the task itself, you don't necessarily have to understand how the brain would solve that task. [01:20:53] Speaker B: But there's all the talk of AGI, right, in the AI world. We're going to get the AGI by next Tuesday. It's going to be the Tuesday after that. No, and then it's like five years. No, no, no, it's 20 years. You know, I mean, personally, I feel like, going away from definitions again, I don't know what AGI is, but I think that humans are the wrong benchmark. It's like... what's the right analogy? All we're doing is, like, staring at ourselves in the mirror and saying, yeah, that's real intelligence. It's only because it's us. We think we're great, I guess. [01:21:27] Speaker A: Oh, there I totally agree. Because, you know, what is it, we got, you know, optimal chess playing at a superhuman level... was that maybe a decade before we got, you know, neural networks that could pass ImageNet classification at a human level? [01:21:47] Speaker B: Yeah, half a decade maybe.
I think it was 2007, you know. [01:21:50] Speaker A: And that was at the time, you know, chess was sort of the king task, where we thought if we understand how to play chess computationally, we understand cognition computationally, or we've, like, built, you know, intelligence. And then... well, I don't even have to say. And then there's a cliche for it: Moravec's paradox. Yeah, you know, like, I am very much a Moravec's paradox person, where I say, understand embodiment first, sensorimotor stuff first, feeling first, and then maybe later, in retrospect, you'll turn around and say, here's all these normative principles we derived from our empirical study, and we understand now how those tell us what intelligence is and how to build it. But the term AGI almost feels like... I admire the people, I admire the sheer ambition of the people who are... [01:22:59] Speaker B: Well said. [01:23:01] Speaker A: Okay. ...trying to do that and going to conferences like the AGI conference. And the other angle on it is, unfortunately, that I do think in the era of large language models, there's been a tendency to fool ourselves and define AGI down, so that instead of being a name for something we don't understand and have to come to understand, you know, only through working at it over time, it's become, you know, a name for something that we say has happened. [01:23:36] Speaker B: Oh, yeah, yeah, yeah. [01:23:38] Speaker A: Like the latest, you know, model from wherever is AGI. [01:23:42] Speaker B: Is it AGI? It's Kyle. Yeah, right. [01:23:44] Speaker A: And it's like, okay, but that's AGI because it talks? That's because it talks. And we know the Eliza effect. We know that if you talk and talk and talk, people will project personhood onto the words. And to be fair to people, prior to the invention of LLMs, 100% of all linguistic stimulus we ever received came from other people. Well, except maybe for, like, bad Markov chains and Eliza and that sort of thing. [01:24:13] Speaker B: Right? Yeah, yeah, Eliza. [01:24:14] Speaker A: The overwhelming supermajority. So for an optimal probabilistic reasoner, if you heard language, then the rational conclusion was that there's a person. [01:24:24] Speaker B: Well, we also know that some of us aren't that bright. For example, I think I've only ever said 'Moravec' one way, and you say 'Moravech.' Which is it? So... Moravec, apparently. [01:24:36] Speaker A: I have no idea. [01:24:37] Speaker B: Oh, really? [01:24:38] Speaker A: I'm so embarrassed now. [01:24:39] Speaker B: Oh, I'm sure I'm wrong. I'm sure I'm wrong. Anyway, that's the paradox, that it turns out that it's easy to build... [01:24:49] Speaker A: I blanked it from my memory where someone, like Mitch, corrected me on this. [01:24:53] Speaker B: Oh, I don't know. But anyway, that paradox is that the things that we think are hard to do, like chess, turn out to be easy. And the things that we think are easy to do, like walking on two legs, or like being a waiter balancing a tray while walking through a restaurant, turn out to be hard. [01:25:15] Speaker A: Don't list that as easy. Talk to a waiter before you call that easy. [01:25:19] Speaker B: Well, what I mean are the sensory and motor everyday things, the continuous sorts of behavior. [01:25:25] Speaker A: Yeah. Those are hard even for embodied human beings. And that's one of them. [01:25:29] Speaker B: Yeah. [01:25:30] Speaker A: Talk to me. Go get a friend who works in food service.
[01:25:33] Speaker B: I've been a server. I've been a waiter. So there's that. That was a poor example. See, again, I say 'Moravec,' I give bad examples. What do you do? Maybe it's not all of us. [01:25:44] Speaker A: I'm sorry, I'm not... I'm not supposed to be shaming you on your own show. [01:25:47] Speaker B: Why are you shaming me on my own show? Sorry. [01:25:51] Speaker A: Okay. You've been a waiter. Yeah. It's easy for you because you practiced. [01:25:56] Speaker B: I also have ungodly balancing talent. No, that's not true. All right. But I do have another question, because you... you are interested in, how did you phrase that earlier? Not consciousness, but feeling. Why anything feels the way it does. Right. Another way of... [01:26:12] Speaker A: Yeah. [01:26:13] Speaker B: To say it is, like, just subjective experience in general, or affect, I guess. [01:26:18] Speaker A: Affect, the affective component of it. Like, why do some things feel... the classical dimensions of core affect are valence and arousal. So why do things feel pleasant versus unpleasant? Why do things feel exciting versus relaxing, or, you could say, arousing versus sedating? [01:26:40] Speaker B: Okay, so my question then is, and I was thinking about Anil Seth, who ties predictive coding into consciousness and thinks that's going to solve consciousness, essentially. What do you think about sort of maybe that, but also predictive coding's relation, possible relation, to affect, the way you just described it? [01:27:03] Speaker A: So I have to say I think the second one, the relation to affect through interoception, homeostasis, allostasis, this stuff is a lot easier to establish than anything about consciousness. And that's why I've sort of said, like, well, I'm not going to touch consciousness with a ten-foot pole. [01:27:25] Speaker B: Right. [01:27:25] Speaker A: It's much too hard. Like, I'm not... everyone's a little bit of a philosopher, but I'm not very much of a philosopher, so I'm just not going there. As to the connection between predictive coding and consciousness, I mean, here's one of the reasons I think consciousness is so hard to think about: there's this, like, classic thought experiment about consciousness. Like, couldn't you imagine a philosophical zombie who has the same input-output mapping and the same observable behavior, possibly even the same electrophysiological readouts as a real person, but isn't conscious? [01:28:06] Speaker B: But what about the affect aspect? Then they wouldn't have affect either. Right? [01:28:11] Speaker A: Right. I mean, if they don't have consciousness, it possibly makes sense. Right. Well, that's what I would ask: does a philosophical zombie have a predictive internal model? Do they have interoception? And I ask myself, can I imagine someone who has the same internal states and control systems at a physical level but doesn't experience them at all? [01:28:43] Speaker B: The answer is no. [01:28:45] Speaker A: Yeah, the answer is just no. Because I'm like, but there's a latent variable there. There's, like, you know, representations and computations going on. There's internal states maintained over time and internal dynamics. I can't imagine how there could be no one home. And that's possible. That's, you know, like I said, I don't study consciousness because I recognize that this is very likely a limitation of my imagination rather than some kind of answer. It's just the way my intuitions work.
And so I prefer to be, at least on the engineering end, where, like, I can bang an intuition against an experiment that doesn't work and bruise it until it's softer and can be remolded into another intuition. [01:29:35] Speaker B: Now that you're doing experimental work, how do you think about the role of intuition? Sorry, I know there's another question, and I've got... I actually have to go in a minute, but do you feel that your intuition has served you better from the computational world, theoretical world, or the experimental world? Because it all comes down to that. To make any progress, you have to make a guess, and that's from intuition. [01:30:01] Speaker A: Actually, I would say I don't know a good way to put those two together right now. I'm sorry, I just don't, which dooms the intuitions from both ends. Because maybe if I was doing experiments with naturalistic behaviors, I would develop more of an intuition for how to let the experimental end drive. But with the highly constrained experiments, you know, I just don't. I get an intuition for the task and the setup. [01:30:31] Speaker B: Yeah. [01:30:32] Speaker A: And, like, the way that a particular data set or animal might behave, but not one for, you know, how do I pass from these spike trains to psychology, to, like, the mentalizing I can do about the animal? I have no bridging intuition there whatsoever. [01:30:56] Speaker B: Well, see, now that I do quote, unquote naturalistic experiments, meaning there's just a mouse running around in a box, and we measure, measure, measure, measure. And now we're trying to relate neural activity to that ongoing behavior, which is continuous. They groom slightly differently. They move their paws slightly differently. Are we going to call that the same groom as the other one? How do we define that? My intuitions about experimental neuroscience, which were forged in that controlled, constrained environment, I think, are not serving me well. So I'm trying to build new intuitions. [01:31:32] Speaker A: Yeah. If there's one thing I've learned in my life, it's really the limits of raw intuition, and how you just kind of have to bang up against experience long enough to start developing... let's end on a pun, call it posterior intuition rather than prior intuition. [01:31:49] Speaker B: Right, exactly. You have to take action and update your posterior, in the way that you phrased it. [01:31:58] Speaker A: Yeah, yeah. [01:32:00] Speaker B: Eli, did we miss anything? We went haphazard. We went quite technical there. We remained out in the forest some. Is there anything crucial that we missed that you want to end on? [01:32:13] Speaker A: Oh, actually, yes. So there's this thing I always keep in my Twitter bio: abolish the value function. If I'm doing a podcast, I should tell people what that means. [01:32:23] Speaker B: Yeah. What does that mean? That's a great way to... [01:32:26] Speaker A: Okay, so that means that at one point in grad school, Jordan Theriault (who will probably listen to this; hi, Jordan) recommended this book to me entitled More Heat than Light by Philip Mirowski. [01:32:41] Speaker B: Okay. [01:32:42] Speaker A: And Philip Mirowski is, like, a very philosophically leaning part-economist, part-historian, and he wrote this whole book about the analogy between energy, and the conservation of energy, and economic behavior. So all of this notion of, like, there's an economic agent who maximizes utility or minimizes cost, that's the value function.
Yeah, all that stuff is the value function. And what he pointed out is that, essentially, if you think like a rigorous physicist, the analogy is bunk. Economic value is not a conserved substance. People produce things that are valuable and then consume them. The amount of value is not a fixed constant number that stays the same all through all of this. [01:33:40] Speaker B: Well, thinking like a rigorous physicist, would it be called an emergent property of production, then? [01:33:47] Speaker A: I mean, like, I'm not sure what Mirowski would say there, but his point was that in order to get all the math that was imported into economics, and then, by the way, into cognitive psychology, into reinforcement learning, into optimal control, into all these things that we use in psychology and neuroscience imported from economics... to get that from physics to economics in the first place, you have to assume a conserved substance. So, a conserved quantity which represents a physical substance, on which you can then have a gradient flow, a certain kind of dynamical system. [01:34:28] Speaker B: So absent that, where do we go? What is the result of abolishing the value function? [01:34:33] Speaker A: Right. And so if that's just the wrong metaphor, then I think we need to go into a much more control-theoretic frame of mind, where some signals represent references and they can be directly compared to input signals from the bottom up by a comparator. [01:34:52] Speaker B: Yeah. [01:34:52] Speaker A: And then when I shifted my focus... [01:34:54] Speaker B: Yeah. I've come around to the control theory things as well. Yes. [01:34:58] Speaker A: So then, when I shifted my point of view from, you know, all of these decision-making tasks are about grabbing more value, imaginary gold coins like in Super Mario (neuroeconomics, yes), versus they're about measuring the distance between a desired outcome and an actual outcome. Much more perceptual control theory, where the... [01:35:26] Speaker B: Reference signal is somehow internally generated, against which you compare. Which is amenable to predictive coding. [01:35:32] Speaker A: Yeah. And then I thought, okay, well, now we've gone from substance to distance. These are completely different metaphors. Distance is the superior one, because as soon as you set up a mathematical model, you can measure the distance in the parameter space. [01:35:46] Speaker B: Right? [01:35:48] Speaker A: And by the way, that's actually the difference between reinforcement learning and active inference, you know, in all of that free energy literature: the active inference people are saying, let's specify desired outcomes as target probability distributions, then measure the relative entropy distance from one to the other, and then just try to get closer to the desired outcome distribution. That's abolishing the value function, from substance to distance. [01:36:22] Speaker B: All right, Eli, I appreciate your time. Look forward to more work coming out, and good luck with the experimental side. Be in touch. And good luck with learning more experimental research. [01:36:33] Speaker A: The latest analysis seems to draw a very different conclusion than the ones we preprinted, so we're going to have to reconcile those. [01:36:41] Speaker B: Yeah. All right. So yeah, I know you have an office mate there who also needed to get back in the office, so thanks; tell them thank you for letting me take up some of your time. So thanks for coming on. [01:36:51] Speaker A: Yep, thank you.
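For the "from substance to distance" point that closes the conversation: the active inference move Eli sketches, specifying a desired outcome as a target probability distribution and measuring the relative entropy to it, fits in a few lines. Here is a hypothetical NumPy toy (the numbers and the temperature framing are invented for illustration, not drawn from any particular active inference paper):

```python
import numpy as np

# Desired outcome as a target distribution (say, preferred body-temperature
# readings) and the agent's current predicted-outcome distribution. Both are
# univariate Gaussians here, so the relative entropy has a closed form.
def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two univariate Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

mu_target, var_target = 37.0, 0.25     # hypothetical desired outcome distribution
mu_now, var_now = 34.0, 0.25           # where the agent currently expects to end up

# "Getting closer to the desired outcome distribution" means gradient steps
# that shrink the divergence, not accumulating any value-like substance.
lr = 0.1
for step in range(10):
    grad_mu = (mu_now - mu_target) / var_target   # d KL / d mu_now (equal variances)
    mu_now -= lr * grad_mu
    print(step, round(kl_gaussian(mu_now, var_now, mu_target, var_target), 4))
```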
[01:36:59] Speaker B: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. If you value Brain Inspired, support it through Patreon to access full-length episodes, join our Discord community, and even influence who I invite to the podcast. Go to BrainInspired.co to learn more. The music you're hearing is Little Wing, performed by Kyle Donovan. Thank you for your support. See you next time.
