Episode Transcript
[00:00:02] Speaker A: Science is about finding models that explain a lot but have the minimal needs in them to explain the maximal amount of stuff. Right. So. But that's an art. It's not really a science yet. I mean, I don't think it ever is. You're just guessing at the next model.
I don't have dreams of some, you know, great, simple differential equation that the whole brain is going to follow, and that's the answer, like Maxwell's equations. I've heard some physicists tell me that's what our answers about the brain should look like. And I find that almost... I can't imagine how someone could think that's how you should think about a system like the brain. It might be. If it was true, we'd love to know it. But if we all sort of said we got into neuroscience because that's our hope, then I think a lot of us are going to be disappointed. My bet would be against that.
This is brain inspired.
[00:01:06] Speaker B: Hello, everyone, it's Paul. Today I'm happy to bring the voice and thoughts of Jim DiCarlo to you. Among other things, Jim is the head of the Department of Brain and Cognitive Sciences at MIT. We've spent a good amount of time on this podcast talking about the work coming out of his lab, which is right at the interface of AI and neuroscience. His goal is to reverse engineer visual intelligence, and the focus has been modeling the ventral visual stream, that hierarchical cascade of brain areas that's thought to underlie our ability to recognize objects. And he does this using deep learning models, although, as you'll hear, he's careful to distinguish his models from what's commonly thought of as deep learning. For example, he doesn't think that his models prove that backpropagation happens in brains. Backpropagation is just one solution for optimizing synaptic weights, and brains could use a different solution. If you've been listening since I started the podcast, you'll remember that I had Dan Yamins on way back then. Dan was a postdoc in Jim's lab, and Jim gives basically all the credit to his lab members for producing great work. More recently, Nicole Rust was on the podcast, and she was also a postdoc in Jim's lab. So it was good to finally have Jim on the podcast. We discuss some of his recent work adding recurrence to his previously feedforward models of the ventral visual stream. We also talk about his work using the models to create images that, when viewed by a subject, can control the neural activity of specific neurons and populations of neurons, so a way to stimulate neurons just by showing someone an image. We also discuss his notion of what understanding is and how it differs from other notions of understanding, and there's plenty more beyond those topics. I, of course, link to the papers that contain the research that we discuss. You can find those in the show notes at braininspired.co/podcast/75. If you value this podcast and you want to support it and hear the full versions of all the episodes and occasional separate bonus episodes, you can do that for next to nothing through Patreon. Go to braininspired.co and click the red Patreon button there. All right, thanks for listening, and enjoy the mind of Jim DiCarlo.
Jim, we've talked about you a lot, talked about your lab a lot, at least on the show. So it's good to have you on. Thanks for being here.
[00:03:35] Speaker A: Thank you for having me, Paul.
[00:03:37] Speaker B: So I'm going to get right into it. In the interest of time, you frame your work as reverse engineering visual intelligence.
What does that mean? And maybe you can say a little bit about what that means without focusing specifically on what your lab does in particular.
[00:03:54] Speaker A: Yeah, when I say that phrase, what it means to me is that we're seeking something that's going to be a description of brain function in engineering-level terms, which means that we should be able to build something similar to it in terms of both internal and external function. And I often think, from a science point of view, it means forward engineering under the constraints of the data of natural science. And so, you know, that's how it actually operationalizes: just do good engineering, but with guidance from the data, because the data aren't going to tell you what to actually forward engineer, they're just going to give you guardrails on doing that. And I refer to that overall approach as reverse engineering.
[00:04:28] Speaker B: So the reverse part is the actual biological data.
[00:04:31] Speaker A: That's exactly right. Otherwise forward engineering could just be on its own. But now we have some guardrails and so then we give it a new term. And maybe it could be called forward engineering with natural science guardrails, but that doesn't sound as punchy as. Yeah, exactly right.
[00:04:46] Speaker B: Okay, great. So your lab has really embraced deep learning as models of brain processing, specifically core object recognition in the ventral visual stream. And one of the main goals of your lab is to solve core object recognition. So I'm going to do a quick and dirty summary of the past decade or so of your work that will hopefully bring us up to the last couple years in your lab and then you can correct my story as much as you'd like. And then we'll go from there with the current projects and state of things going on. Before I do that, though, I use this phrase, core object recognition. So for clarity, would you mind just describing what core object recognition is?
[00:05:29] Speaker A: Well, that was a way of just describing how we've operationalized the problem to study so far, which is vision in the central field, the central 10 degrees, which is about two hands at arm's length, what you see in your center of gaze. And for just a duration of time which is about a natural viewing duration, which is just 200 milliseconds, a fifth of a second, sort of a blink of an eye, almost literally. And that's just to reduce the spatial challenge of the problem and the temporal challenge of the problem. And then I decided to call it core recognition so we had some word to refer to, but it's really just operationalizing the problem so that it's not all of vision, which is too hard to do all at once. Too hard to swallow. Too hard to swallow visual intelligence whole. Right.
[00:06:11] Speaker B: So, yeah, yes.
[00:06:13] Speaker A: And core sounds a little more interesting than step 1.0, which is another way to view it.
[00:06:19] Speaker B: Core sounds important. Yeah.
[00:06:21] Speaker A: Right. Yeah, that's a bit of branding, I guess, I'd say.
[00:06:24] Speaker B: I think you're pretty good at branding. Yeah. Yeah.
[00:06:26] Speaker A: Okay.
[00:06:27] Speaker B: So, okay, you worked a long time with models that, you know, map onto the ventral visual stream, that ventral visual hierarchy in brains. And you've been doing this a long time in non-human primates that have been performing what you call this core object recognition task, where you flash these objects pretty quickly and the non-human primate, or monkey, has to identify the objects, essentially. And while the monkey's performing this task, you've been recording neurons, and the number of neurons that you simultaneously record has grown and grown and grown, until these days, I think, you're putting in like three Utah probes and recording over a thousand units at a time, essentially, on average. Right. And these models that you had been working with were deep in the sense that they had multiple layers, but they weren't deep in the sense of deep learning as it's known today. And we'll get into that. But often the parameters of the models would be hand engineered using known constraints from the physiological recordings. This was painstaking work that made improvements. They may have been somewhat marginal improvements, but there was this steady march of improvement until there was a breakthrough in 2012, around that time when you started asking the network, like the monkey, to actually perform the task. And you ended up with a set of models, and one model in particular, called the HMO model, really blew away the previous models' performance in accounting for the neural activity. But you still weren't training these models using backpropagation, which is what all deep learning models use these days.
If I'm not mistaken, you found the parameters by this sort of massive search process over the space of possible parameters. But the model was still being asked to perform the task as you searched over that massive parameter space.
And so you had this really great performing model and that's, that was very satisfying.
And then AlexNet came along, which is a convolutional neural network model. And I guess that was 2012 too. Yes, it was definitely 2012. And that was like a breakthrough on the AI side, in the computer vision world. And you guys eventually tested AlexNet with your neurophysiological data and found that it performed even better than the HMO model. And since then you've been using convolutional neural networks and tweaking them to model both human and non-human ventral stream processing with great success. That was quick and dirty, so I apologize, but that takes us to just a couple years ago, I believe. And I'll ask about this past sort of era a little bit more later. But what did I miss that you would say is important?
[00:09:18] Speaker A: Well, I think you summarized some of the things that we're most well known for in this space. Of course, there was sort of a couple decades of work before that on more basic neuroscience, and I could tell you about those stories as well. But you hit some of the recent highlights. We've always been building models, but it was really getting serious in the lab in the late 2000s, starting to run models on GPUs and first doing that. And those were the fun days of just trying to start to do the things which are now commonplace.
But then, yeah, the breakthroughs happened in the early 2010s or so, 2012, '13, '14, in exactly the way you described. And there are important things we need to talk about there, like deep learning versus deep models, why HMO versus other models. And I'd love to get into those distinctions with you, because I think they're interesting and important, and in some ways our work has sometimes been miscast in those terms. And I think we should kind of discuss that.
[00:10:16] Speaker B: Let's do it now, if it seems like a natural time to do it.
[00:10:19] Speaker A: Yeah, so I see. You said we're known for deep learning, and sometimes, on a podcast, I don't easily say, oh, forget it, that's not what we do, because there's a big wave of deep learning and it's of course nice to be associated with deep learning. But just to be clear, as you said, I would say we're more associated with just high performing deep models. Deep learning is just a way to update the parameters of an architecture to do well. It's a technique. Right. And the part that's interesting to us is not so much the learning part, which we view as not really biological.
[00:10:52] Speaker B: Sure.
[00:10:53] Speaker A: People who like deep learning point to our work and say, oh, this is evidence that deep learning is running in the brain. And I often correct them and say, no, no, this is just evidence that the product of optimizing for high performance in these kinds of architectures, for these kinds of tasks, is what's running in the brain. The brain may have gotten there in a different way, through postnatal development or evolution. It's not a statement about backprop; it's about performance optimization in both cases. So I view it as sort of convergent evolution, not necessarily a copy of the exact process that's running in the brain. And I think that's an open, important question, if that makes sense.
[00:11:29] Speaker B: Well, how important do you think it is to distinguish between specifically backpropagation? I mean, there is this real push right now to sort of force backprop on the brain and figure out how it has to happen, how must it happen? And even if you're. Even if it's not happening in your models, it must happen. So we have to figure it out. I mean, is backpropagation sort of the key feature that stands out in your mind that separates modern deep learning from what you associate your work with? Perhaps.
[00:12:01] Speaker A: So again, I do not like our work being associated as: this is the thesis of backprop, this is evidence of backprop.
I would associate our work as: we have tried to help guide the field towards these being the current best models of visual processing in the brain.
How you get to those models: backprop is one way to get to them, but there are other ways to get to them, and those may be the ways that the brain gets to them. But they converge, so to speak, in that they're good approximations of what's actually going on in the neural hardware. So the distinction is more about the building of the system, not the execution or the inference running of the system, where, yes, I agree, these models are the leading models. So if I'm talking to machine learning folks, I need to remind them of this distinction. If I'm talking to neuroscientists, I first have to convince them that these kinds of models are the best models at all, because they would prefer different styles of models. So those two communities have different issues; I'm always trying to be a moderate in most of these things, and there are different issues, depending on where you come from, in how you think of these things with respect to the brain. Even the folks that pursue backprop as a model of the brain say, oh, there must be something like this running. That's just because they think it's a very efficient learning algorithm, which is true, but it doesn't mean that the brain is optimal in all senses. So it doesn't have to be the case. And they're even relaxing now and talking about ideas of producing things that are backprop-like without actually being strictly backprop. Right. And if you think about it that way, all of science is then backprop. It's like you're optimizing against something. It's all optimization. Right. And evolution is backprop under that kind of view. So unless we're going to stick to details of, you're going to take this derivative and compute this exact error gradient here, then I think it becomes very loose to say it's just optimization, and everything technically is optimization.
[00:13:59] Speaker B: Well, I mean, these are semantic issues as well, because let's say that there's something that is somewhat similar to backpropagation, like you were just describing, but that doesn't take the exact symmetric reverse gradients on the weights to improve the model. The deep learning crowd, and we don't need to get into, you know, the politics of it all, but they'll perhaps want to argue, ah, see, it is deep learning, the brain does deep learning. But is deep learning defined by backpropagation as it's done in modern deep learning, or does it encompass any kind of backpropagation? And, you know, eventually we're going to just start talking about optimization in general.
[00:14:45] Speaker A: Right, exactly. So I think we're saying the same thing. I mean, the way to succeed in careers like science, if you can sort of capture a field in a way which is non-falsifiable, then yes, you will never be wrong. But, you know, I push my own students on this. If you just say the brain is the product of some optimization, then we don't need to do any more work; we can just declare evolution and it's true and go home. We want to know exactly which model is running and exactly how it gets to be there. And now you have to make stronger commitments if you're a scientist, to say my hypothesis is there's an error gradient computed, this is where it's computed, this is how it's computed. Those are more testable versions of deep learning. But if you say, yes, broadly it's all deep learning, then it's all evolution too. Right. So I think there's slippage on those semantics going on right now, and that's maybe some politics, as you say. But the science part would be: let's carve these things up into alternative views and then see which one is true. And I think the question for the brain is still unclear. That's, I guess, why we don't know which is running yet, whether it's the strict form or the less strict form. But probably some form of optimization is running. Of course, that's evolution again.
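To make the distinction being drawn here concrete, here is a small illustrative sketch in Python, using a toy least-squares problem with made-up data (nothing from the lab): the same tiny model is fit two ways, once with exact gradients ("backprop-style") and once with a gradient-free random-perturbation search ("evolution-style"). Both routes reach a high-performing solution; matching the endpoint says nothing about which route was taken.

    # Illustrative toy only: two optimization routes to the same kind of endpoint.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 5))
    true_w = rng.normal(size=5)
    y = X @ true_w                                  # the "task" to perform

    def loss(w):
        return np.mean((X @ w - y) ** 2)

    # Route 1: gradient descent with exact gradients.
    w_gd = np.zeros(5)
    for _ in range(500):
        grad = 2 * X.T @ (X @ w_gd - y) / len(y)
        w_gd -= 0.05 * grad

    # Route 2: hill-climbing on random perturbations (no gradients at all).
    w_ev = np.zeros(5)
    for _ in range(5000):
        candidate = w_ev + 0.05 * rng.normal(size=5)
        if loss(candidate) < loss(w_ev):
            w_ev = candidate

    print("gradient-based loss:", round(loss(w_gd), 4))
    print("gradient-free loss: ", round(loss(w_ev), 4))
    # Both end up high-performing; the endpoint alone doesn't tell you which
    # route the brain took to get there.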
[00:15:57] Speaker B: Yeah, exactly.
[00:15:58] Speaker A: Yeah.
[00:15:59] Speaker B: For most of us anyway.
[00:16:00] Speaker A: Right.
[00:16:02] Speaker B: Going back to, you mentioned HMO specifically, and I think you wanted to highlight some differences. I don't know if it was just backpropagation, but maybe some differences between HMO and the convolutional neural networks that were being developed sort of in parallel in a separate world, maybe with a little bit of crosstalk. But did you want to highlight some other differences?
[00:16:22] Speaker A: Well, yeah, exactly. So HMO was a convolutional network, but its optimization was not on the weight parameters, it was just on the architectural parameters. So it was an early precursor of architectural search strategies. And so all the weights were essentially random. Right. That's why it didn't perform quite as well as a backprop optimized network. But it already got us a long way into explaining the brain data that we had, and that's why I'm careful to point out that so far it all seems to be about the performance of the network, not how you get to that performance. HMO was one way to get high performance, and it wasn't quite as high performing as a backprop-trained network; we found a slightly higher performing one and it also matched the brain better. So to us the main signal is the performance on the task, not the method by which you get to that performance.
At some point, our data will hopefully reveal that those differences can be fleshed out with the data. But right now we don't know that.
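As a rough sketch of the kind of architecture-only optimization described above, where the weights stay random, the search is over architectural parameters, and the selection signal is task performance, here is a minimal, purely illustrative Python loop. The search space, the random-projection "layers," and the toy data are invented for illustration (scores are near chance on random data); the real HMO search was a far larger optimization over convolutional architectures.

    # Hypothetical sketch of architecture-only optimization; not the lab's HMO code.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_architecture():
        """Draw architectural hyperparameters at random; weights stay untrained."""
        return {
            "n_layers": int(rng.integers(1, 4)),
            "n_filters": int(rng.choice([16, 32, 64])),
        }

    def random_weight_features(images, arch):
        """Stack of random (untrained) projections with a ReLU; crude conv stand-in."""
        x = images.reshape(len(images), -1)
        for _ in range(arch["n_layers"]):
            w = rng.normal(0, 1.0 / np.sqrt(x.shape[1]), size=(x.shape[1], arch["n_filters"]))
            x = np.maximum(0, x @ w)  # fixed random projection + rectification
        return x

    def task_performance(features, labels):
        """Score an architecture by how well a simple linear readout does on the task."""
        n = len(labels)
        onehot = np.eye(labels.max() + 1)[labels]
        tr, te = slice(0, n // 2), slice(n // 2, n)
        w, *_ = np.linalg.lstsq(features[tr], onehot[tr], rcond=None)
        pred = features[te] @ w
        return np.mean(pred.argmax(1) == labels[te])

    # Toy data standing in for labeled images of objects (random, so chance-level).
    images = rng.normal(size=(200, 8 * 8))
    labels = rng.integers(0, 4, size=200)

    best_arch, best_score = None, -np.inf
    for _ in range(20):  # the real search was far larger
        arch = sample_architecture()
        score = task_performance(random_weight_features(images, arch), labels)
        if score > best_score:
            best_arch, best_score = arch, score

    print(best_arch, round(best_score, 3))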
[00:17:23] Speaker B: It's interesting. You've made the point previously that since the really good performing models like AlexNet, in that little era when they performed well and also accounted pretty well for behavior and for the neurophysiological data, the AI world has of course been making deeper and deeper, bigger convolutional neural networks, bigger models, and accounting for brain activity has really gone by the wayside in those latest models. Whereas you guys are continuing to focus on accounting for the neurophysiological data and the performance.
[00:17:58] Speaker A: And it's not that they've gotten worse, they just haven't gotten much better than AlexNet. They've gotten a little bit better, and they've gotten better on some of the behavioral measures. But this gets into the details of all the various measures when we say it's better. It's many measures, the internals, many ways you can look at it. So they've gotten better on some things and not so much on other things. But you're right, it's sort of tapped out a little bit. The performance is now being driven by computer vision, and it's not necessarily leading to further gains on the matching to the...
[00:18:26] Speaker B: Brain, which you wouldn't expect, because the goal is not to match brain. The goal is to perform the computer vision tasks.
[00:18:34] Speaker A: The goal of the people building those models is to do computer vision. The goal of our lab is to build models that are going to better match the brain. So you're right. Ideally you'd like to do both, but yes, the goals have diverged a bit over the last five years or so.
[00:18:46] Speaker B: Yeah.
So do you think that brings us up to speed here? You want to talk about some current things going on in your lab?
[00:18:54] Speaker A: Sure, I'd love to tell you about things going on in that one.
[00:18:57] Speaker B: Yeah. Speaking of the goal of matching brain data, one of the recent trends in your lab is adding recurrence to these networks that you've been building. Up to a couple years ago, all of the networks were still feedforward, which makes sense, because core object recognition, as we know, can happen very fast, 200 milliseconds or so, and it takes about that long for a feedforward sweep to happen through the ventral visual stream. But you guys have been adding recurrence to your networks. So why add recurrence? Which is a softball question, I suppose. And what does recurrence do in your hands?
[00:19:41] Speaker A: Well, the why-add-recurrence question: there are many ways to answer that. Some people would say you should just have recurrence because the brain has recurrence. And I'd say, well, that's of course an argument. But the brain also has, you know, ion channels and glial cells and all kinds of other things that are not yet in the model. So we want to choose the things that are most likely to produce gains on the functional measures, as we call them, the behavior or the spikes. Recurrence is an architectural change that is of course present in the brain, but, as you point out, not present in most of those initial models, and likely to lead to functional changes in the response of the network. One simple example is that the neurons in the brain have time dynamics. It's not that you show an image and they just step up to a new value. They have some dynamics to them. So you need some form of recurrence or time dependence to make that interesting, or to even compare to the brain. So again, that's another argument: the functional data is something that can't be explained without a more complicated model.
But I think those are the reasons that are sort of more obvious. The more interesting thing to us is that we were noticing that the networks were getting deeper and deeper, more and more feedforward. And of course you can take a recurrent network and unroll it into a very deep feedforward network by adding a bunch of skip connections. You just redraw what counts as a neuron.
The neurons don't have a one-to-one mapping with the brain anymore. But you could imagine taking the brain's recurrent network and redrawing it as another feedforward network with a bunch of skip connections. The point is, at an algorithmic level, these are almost equivalent. Right. But when you're doing neuroscience, you're actually at the hardware level, and there they aren't equivalent. You have defined mappings: in the feedforward version, something that's one neuron in the brain is actually now multiple neurons in that unrolled system.
[00:21:31] Speaker B: Right? Multiple copies. Yeah.
[00:21:33] Speaker A: Time is now rolled out into space. Right. So you have multiple copies of that sort of neuron, or versions of that neuron at different points in time. Right. So this shows that if you don't put recurrence in, you might start diverging again if you want to model the actual brain. You could still get good algorithmic performance with those deep networks, but you won't get good matches to the brain if the brain is in a more recurrent mode. And this leads to the question of why: you might start asking, why do we even want to match the brain? This comes down to two different goals. If we just want performance, maybe deep feedforward networks are fine, but if you want to match the brain, then you can't just keep going down that road forever. And again, we noticed that the networks were getting longer and longer, deeper and deeper. And we noticed that the later time points of the neural responses, and we had a paper on this a few years ago, the later responses in IT tended to match the deeper layers of those networks, which was kind of a signal that, yeah, this looks like an unrolled version of what we're seeing in the brain again.
So those were some of the, I'd say, data-driven, but also goal-driven, figure-out-the-brain reasons why we went to recurrent models.
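The unrolling point above can be made concrete with a few lines of Python; this is an illustrative toy, not any particular published model. A recurrent layer run for a few time steps computes exactly what a deep feedforward stack of copies of that layer computes; the difference is only in how model "neurons" map onto real ones.

    # Illustrative sketch: a recurrent layer versus its unrolled feedforward twin.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 16                                # units in the (model) cortical area
    W_ff = rng.normal(0, 0.3, (n, n))     # feedforward input weights
    W_rec = rng.normal(0, 0.3, (n, n))    # within-area recurrent weights
    x = rng.normal(size=n)                # static input (e.g., output of an earlier area)

    def relu(v):
        return np.maximum(0, v)

    # Recurrent view: one set of neurons, updated over time.
    r = np.zeros(n)
    trajectory = []
    for t in range(4):
        r = relu(W_ff @ x + W_rec @ r)
        trajectory.append(r.copy())

    # Unrolled view: time redrawn as depth; "layer t" is a copy of the same neurons.
    layers = [np.zeros(n)]
    for t in range(4):
        layers.append(relu(W_ff @ x + W_rec @ layers[-1]))

    # Numerically identical; only the neuron-to-layer mapping differs.
    print(np.allclose(trajectory[-1], layers[-1]))  # True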
[00:22:48] Speaker B: So like you said, there's a host of more biologically realistic details that you could add in. I mean, recurrence is in some sense the most obvious one to add. You wouldn't want to just add metabolism into the units, for instance; that's also just a more difficult project. So recurrence is also more tractable, is it not?
[00:23:08] Speaker A: To add in more tractable than say, ion channels or.
[00:23:13] Speaker B: Sure, yeah, make them all Hodgkin-Huxley. Right.
[00:23:16] Speaker A: Well, I think you're saying it the way I think about it too. You start with the things that seem like they're going to matter. It's engineering approximation. So the first thing we did was build a feedforward model. We wouldn't start with a feedback model; how could that make sense? It couldn't even process an image. So you start with a dumb model first, you push it till it doesn't work, and then you add the next thing that you think is going to add value. And I don't know if recurrence is the next best thing. It seemed like, as you said, the most obvious next thing. And then there are various forms of recurrence, within area, across areas. And then that becomes more subtle and interesting.
And then we'd say, well, at some point maybe we've got to add spikes in. There are many things that are missing, but in what order do we have the intuition to add them? And there I'd just admit that's art and guessing. Nobody really knows the answer. They have their own preferences, their own beliefs, and we have our own guesses too. And it's really a matter of what's doable at the moment in time, and what data you're going to have to help guide you is the other thing you have to think about as you try to improve the models. Improve; when I say improve, I mean make them more brain-like. More brain-like is something we should return to, because there are different senses of improve, and our fields, between ML, AI, and neuroscience, are not always doing the same thing.
[00:24:31] Speaker B: For the purposes of our conversation, let's just always assume that that's what we mean by improve is more brain like. But let me. So now I just want to interrupt because I want to make sure I ask you this question. I mean, you mentioned adding spikes in eventually or something. I mean, really, do we really need to add spikes? Is that something that's going to make a difference? Is there something special about action potentials?
[00:24:49] Speaker A: Look, I think that's pretty low on my list at the moment. But again, I'm going to say that's a guess, and my guess is that spikes have a lot to do with energetics, and I'll just leave it at that. It could also be that the noise properties of spikes end up making the system more naturally robust in some way. So there are other reasons to think that spikes are important, but it is a fine line, as you say. We could model it down to an exact copy, and that feels... you know, science is about finding models that explain a lot but have the minimal needs in them to explain the maximal amount of stuff. Right. But that's an art. It's not really a science yet. I mean, I don't think it ever is. You're just guessing at the next model.
[00:25:31] Speaker B: Well, you made a good guess with the recurrence. You were just talking about how the feedforward networks were getting deeper and deeper. And what you found, I can just summarize quickly, is that there are images of objects that take longer to identify. And when you add recurrence to the network, what you find is that the later processing, around the time that it takes to identify these objects, actually matches the brain data better with the recurrence. And that's sort of the main finding from the recurrent addition, right?
[00:26:12] Speaker A: That's right. But then there are details of how the recurrence is implemented, and how the brain implements it; there are still lots of ways to implement that, as I said earlier. And the models are only explaining some fraction of the data. So there's still residual variance in the data, as we call it, that's not captured, which means basically we don't have the right model yet. We have models that are better, again in the brain-matching sense, but they're not perfect. And I want to point out something you said about core recognition. Remember, we said this is 200 milliseconds, and said, oh, it's motivated by vision in a glimpse. But what's interesting now is that even in that 200 milliseconds, everything you just said about some images taking longer, that's all still happening within that 200 milliseconds. So we're talking about recurrence that is not the kind where you overtly think, oh, I now see something, then I think about it, then I move my eyes or my head, the long timescale, seconds timescale. This is all sort of subconscious, sub-second recurrence structure that we're talking about here. So even in core recognition, which we thought was a simpler problem, there are complexities around time that make it still a very rich problem in that regard. You don't have to go beyond even 200 milliseconds to engage with these interesting questions.
[00:27:21] Speaker B: So that suggests a local recurrence, or at least a few-synapse sort of recurrent mechanism, happening. But we can just jump to the next thing I was going to ask you about, because, stepping away from the models, you actually inactivated part of the prefrontal cortex and found that it is actually very important for object recognition. So IT and prefrontal cortex are connected, but maybe that's more of a top-down feedback sort of recurrence than a local lateral sort of recurrence. I don't know, maybe you can just address that, because I just said it would suggest that it should have local recurrence to process it so quickly. But maybe I'm wrong.
[00:28:04] Speaker A: I would say the answer is probably all of the above. We think there's some local. A lot of our recordings are in IT cortex, and we haven't talked much about this for the listeners: the ventral stream is V1, V2, V4, IT. So IT is sort of the highest visual area, and then it connects up to prefrontal cortex, as you mentioned. So if we record at the level of IT, the temporal dynamics that we see in IT are probably due to local recurrence within IT. They're probably also due to top-down recurrence from prefrontal, as in the recent paper we had, as you mentioned, and top-down recurrence from other areas that we haven't talked about, perirhinal cortex, parahippocampal cortex, amygdala; those are other areas that would be involved. And also there's time for recurrence from IT to V4, to V2, back up to V4, to IT, sort of cross-area recurrence within the ventral stream itself. And our best guess is it's probably all of the above. It's not some simple thing. So far we've attacked prefrontal because it was easier to make an experiment there than to get at some of those other things. But we're interested in all, let's call it all three of those types of recurrence, and their relative roles in shaping up those IT representations to be even better than the ones you get out of the feedforward models alone.
[00:29:16] Speaker B: Well, there's IT representations for core object recognition, but then with the connections with all the other areas, but then you start getting into finer behavioral output, you know, amygdala, maybe put some masking on, be afraid of this object or you know, something like that. And then, you know, hippocampal memory, you know, some sort of abstract notion of the object and prefrontal cortex controlling task selection, et cetera. We don't have to go through all of them. Sorry. But then are we still talking about core object recognition or are we talking about something more cognitive at that point that we're going to have to tease out?
[00:29:52] Speaker A: Yeah, that's back to the object recognition 1.0 that we said at the beginning. Core recognition was just an operationalization, a way to get started, but it's not the only way to operationalize it. You tend to move your eyes every couple hundred milliseconds, so you could say that's a hard stop; the system has moved on. There's overt feedback. Usually when you're viewing the world, your eyes are actually moving. Other parts of your brain are redirecting your eyes, which you could call a form of recurrence. It's a feedback on your whole sensor system; it's just executed by moving the sensor array rather than by neural processing. But I guess what I'm trying to say is that everything we call visual intelligence or cognition becomes a gray area. Well, do I have cognition in 200 milliseconds, yes or no? I think yes, partly. I mean, depending what that word means.
[00:30:39] Speaker B: What does it mean?
[00:30:40] Speaker A: Right. For me, it's like if you say I can recognize your face, or something else about it, within 200 milliseconds, if that counts as cognition, then you certainly have cognition on that timescale. Do you have all of cognition? Well, of course not. So it's not a yes or no question. These are all shades of gray. And the models get bigger and bigger and more time extended. And this is back to, you know, 1.0, 2.0; we just keep extending that way and we capture more and more of visual intelligence, is the hope: bigger spatial regions, longer time extensions, models that are bigger and have more recurrent areas. That's how it's going to have to unfold. It's not: here's cognition; perception stops and cognition begins. It's not going to be that clean. That might be textbooks, but that's not how the brain looks.
[00:31:26] Speaker B: Yeah, just to linger on this idea of adding more biological detail and incrementally improving the models, is it just going to be like that from here on out? Or, I guess the question is, will it disappoint you or excite you? If eventually we do have to get down to the real nitty gritty and add so many biological details, is that going to be disappointing? Or should there be some better explanation, where we don't have to add in the biological details, to really understand the system?
[00:32:00] Speaker A: Yeah, I think that's really, it's a good starting question. It leads to a very long discussion of what does it mean to understand the system?
[00:32:08] Speaker B: And I think we'll get to that later.
[00:32:09] Speaker A: That would be something to discuss. And you're asking, would it disappoint? And I think these are related questions. It depends what the goals are. I don't see it as a disappointment. I don't think it would be disappointing if models get more complicated. And in fact, I think they're going to have to get more complicated. How much detail they have to get into to explain which phenomena is unclear, but that's okay. Such is life with complex systems. It wouldn't disappoint me.
I don't have dreams of some, you know, great, simple differential equation that the whole brain is going to follow, and that's the answer, like Maxwell's equations. I've heard some physicists tell me that's what our answers about the brain should look like. And I find that almost... I can't imagine how someone could think that's how you should think about a system like the brain. It might be. If it was true, we'd love to know it. But if we all sort of said we got into neuroscience because that's our hope, then I think a lot of us are going to be disappointed. My bet would be against that. But I know we can make progress with models that are more complicated, not just because we want them to be complicated, but because they need to be more complicated, because the brain is complicated.
[00:33:14] Speaker B: I remember going to an SFN keynote address many years ago and it was. I cannot remember who it was, he was Scottish. But he was essentially saying we should give up and let the physicists come in and do the job. So you're saying not so much, huh?
[00:33:30] Speaker A: I would say let the engineers come in and do the job, but not the physicists. And I think some of that is happening in vision; they're doing the job on computer vision. In this sense, when they're working with those models, they're doing a big part of our job for us. They're building hypotheses. And that was part of what we showed: some of the hypotheses that they built are actually not bad models of what's going on inside the brain.
[00:33:49] Speaker B: Yeah. Okay, so let's move on here. An important property of IT neurons is that they respond to particular objects even when those objects are presented in slightly different ways: different aspects, different shades of color, different situations. And speaking of backpropagation, it was originally thought, back in the days of Hebb, that the brain is doing all unsupervised learning, essentially. So the Hebbian "neurons that fire together wire together" is an unsupervised learning principle. And you guys thought that there could be a role for unsupervised learning mechanisms in IT, whereas everything else up to this point had been supervised learning, I believe, since a certain point. So maybe you could describe that work real quickly, before I then talk about understanding and some of that work.
[00:34:44] Speaker A: Right. You're actually describing the part I mentioned. There was about a decade of work before the deep network things that you referred to. And a big part of what I did in the 2000s was work on: can we see evidence of unsupervised learning in the visual system? Because one of my favorite ideas is still that the system wires itself up with a combination of architecture and some clever unsupervised learning strategies, right? A lot of people still believe that idea, even in ML. One of the Holy Grail questions is to not have a fully supervised system, but a more unsupervised system. And Yann LeCun and others have made that point very eloquently. That was a problem that drove me as well. And the work you described was work where we said, well, one way you could learn about objects and their stability, say, have neurons that respond to a face even if it's shifted left or right, or a dog if it's to the left or the right, that's called translation invariance, or its distance from you, so it's big or small, is that you tend to see those kinds of sequences naturally over time. As you move your eyes around the world, objects are translating on your retina. And if your brain makes an assumption that, well, look, things that are nearby in time are probably derived from the same things, that objects don't jump in and out of existence, then you could come up with a heuristic, which is: why don't you do some associative-type learning, related to the Hebb ideas you mentioned. Again, these are not our original ideas, I'm sort of summarizing them: you could build neurons that would naturally build up invariances if they just leverage things like temporal stability. And there have been models of that. Terry Sejnowski's group and Laurenz Wiskott and others have worked on them, under the framework of slowness models.
There are more recent versions of this in the ML community, but that was the idea we pursued, and we just recorded IT neurons and said, if this is true, we should be able to mess around with the visual statistics over time and cause the neurons to change their response properties. So take IT neurons that tend to like a dog over different positions. If we keep playing it so that the dog becomes, you know, a sailboat when the object translates across the eyes, either by eye movements or passive movement, that should cause the neuron to start to treat a dog and a sailboat in these two different positions as essentially the same thing, and respond similarly to them. That's the basic idea. And we had a series of papers that showed that yes, when you give the brain those kinds of statistical exposures, it starts to bend the IT responses slowly but significantly in that exact direction, as if it's starting to absorb those statistics. If you keep hammering the system with those kinds of statistics, even adult brains will start to reshape the IT response properties, as if that learning rule, using time, unsupervised, no instruction, is actually running. That was actually quite exciting to us, because I didn't think that was going to work at all. But we had a nice couple of papers out of it and it worked. And then you say, well, that work still stands there waiting for a model to just absorb it. And that's what we're excited about now: as these unsupervised models come along, maybe we can start to connect them to those older results, which were non-model-based. They were just sort of intuition, phenomenology measured in IT in the loose way I described, without a model to predict them quantitatively yet. I think that's an open frontier for the next five to ten years. That's going to be exciting, I think. So I hope those two lines of work will come together: a deep network that learns in an unsupervised way and explains not just the response data that you referred to, of lots of neurons firing in response to images, but also the changes as you manipulate the statistics of the environment, which we and others have measured along the visual system. A model that can capture both of those kinds of phenomena would be a model that starts to be more in the direction of: it learns more like the brain is learning, because there's no supervision, or very little supervision, there.
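Here is a minimal, hypothetical sketch of that temporal-contiguity idea in Python (invented dimensions and toy "frames," not the lab's models or data): a single linear unit is updated, with no labels, so that its responses to temporally adjacent frames of the same object stay similar. It ends up discounting the view nuisance, i.e., it becomes tolerant in roughly the way described, purely from the temporal statistics of the input.

    # Illustrative slowness / temporal-contiguity learning; all quantities made up.
    import numpy as np

    rng = np.random.default_rng(2)
    lr = 0.02

    def adjacent_frames():
        """Two frames nearby in time: same object identity, different view nuisance."""
        obj = rng.normal(size=40)                          # stable object dims
        f1 = np.concatenate([rng.normal(size=10), obj])    # first 10 dims = view/position
        f2 = np.concatenate([rng.normal(size=10), obj])    # a moment later: new view, same object
        return f1, f2

    w = rng.normal(size=50)
    w /= np.linalg.norm(w)
    print("weight on view dims before:", round(np.linalg.norm(w[:10]), 3))

    for _ in range(5000):
        a, b = adjacent_frames()
        d = a - b                          # what changed between adjacent frames
        w -= lr * (w @ d) * d              # make the response change less over time
        w /= np.linalg.norm(w)             # keep the unit from trivially going to zero

    print("weight on view dims after: ", round(np.linalg.norm(w[:10]), 3))
    # The unit's response is now tolerant to the view change, learned with no labels.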
[00:38:37] Speaker B: I mean, eventually all these pieces are just going to have to come together and it's going to be a massive system. Right? That's the way it's going to look.
[00:38:44] Speaker A: It depends. Yeah, it'll be massive. Well, I don't know what you mean. You say massive. We'd like a model that explains everything rather than one model that explains one part and another. And that's a little bit back to the, you know, do you want simple models or Maxwell's Equations, or do we want a big model that can explain everything? And I agree with you, though. You'd like something that is a little bit bigger, that can explain more things than we currently have.
[00:39:07] Speaker B: Okay, so finally, now you're driving neurons wild at will by having your models create images essentially that look extremely unnatural.
But you show this extremely unnatural image to a subject and you can control not just a single neuron, you can make a single neuron fire higher, more feverishly than it has to any other thing you've shown it. And you can also tweak whole populations of neurons. So I know you're excited about this because it affords powerful opportunities. So maybe you can describe that a little bit further. And then I want to ask you about understanding.
[00:39:51] Speaker A: Yeah, that's great. You set this up well, Paul, because you described exactly the. The results that we've had recently. And there's a sort of fun story about this. It's like I was trained to just say, we're going to build models of the system. That's what our understanding is going to look like. We have engineered models of the system. And so we were kind of doing that a little bit with our heads down for a while, like, let's build model that can explain the responses better and better. And then we started to succeed, as you mentioned, and then it sort of struck us like, wait a minute, why are we doing this again? Why are we building these models?
They're complicated. And what's our answer to why they're complicated? We're explaining the data, so we should be happy. And then at some point I realized the reason you build the model is, of course, so that you can use it to do stuff. It's not just to have a model. You want to be able to predict stuff, but then the next move beyond prediction is control.
Your control knobs are afforded by the axes of the model. In this case, the models take images, so they start with pixels and they're predicting neural responses all along the ventral stream. So the setup is: you can control pixels, you can move the pixels around to control the neural responses. Right. This is kind of a general statement of science: the models that you build are basically ways to link various things that you might be able to control or measure, in this case control, because we can control light on your eyes, to something else, a dependent variable, in this case neural responses. Either single neurons, as you mentioned, drive a neuron wild, or a bunch of neurons, set the population so one neuron is on and every other neuron is off. Right. So this is really just a test: if you really have a good model, you should be able to drive neurons wild or turn them off in these different ways. So we said, let's go ahead and give it a shot. And at least in area V4, this mid-level visual area, it worked really well, much better than the standard reference. We already had neuroscientists running around trying to do controls; they would dream up stimuli and say, we think these do this, and they would use them. That was our reference: if you use those stimuli, could we beat those? And now we're able to drive the neurons higher, or more accurately in terms of populations, than that previous work, and we showed that in the paper. But the control is also not perfect, because the model is not perfect.
That's a test of the model, but also it's an application tool because, hey, one of the goals of brain science is to do things like apply images and maybe be able to tune your brain into a way that's helpful to improve say your health or even your mood. And it leads to those kind of uses of models, right? Not just models for model sake and so called understanding, but models for actual use in some way.
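A minimal sketch of the controller-image idea in Python, with a made-up stand-in for the ventral-stream model (the real work used a deep network fit to neural data, and the drive was verified on actual neurons): treat pixels as the control knobs and do gradient ascent on one model site's predicted response.

    # Illustrative only: synthesize an image that drives one model "site" high.
    import numpy as np

    rng = np.random.default_rng(3)

    # Stand-in "ventral stream model": pixels -> hidden features -> 10 model sites.
    W1 = rng.normal(0, 0.1, (256, 64))     # pixels (16x16) -> hidden features
    W2 = rng.normal(0, 0.1, (64, 10))      # hidden features -> 10 "neural sites"

    def model_responses(pixels):
        h = np.tanh(pixels @ W1)
        return h @ W2                       # predicted responses of the 10 sites

    def response_gradient(pixels, site):
        """Gradient of one site's predicted response with respect to the pixels."""
        h = np.tanh(pixels @ W1)
        dh = (1 - h ** 2) * W2[:, site]     # backprop through tanh, chosen site only
        return W1 @ dh

    target_site = 3
    img = rng.normal(0, 0.01, size=256)     # start from near-gray noise
    for _ in range(200):
        img += 0.1 * response_gradient(img, target_site)
        img = np.clip(img, -1, 1)           # keep the image within displayable bounds

    print("driven site:", round(model_responses(img)[target_site], 2))
    print("all sites:  ", np.round(model_responses(img), 2))
    # The synthesized image drives the chosen site far above the others; population
    # control (one site on, the rest suppressed) would just change the objective.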
[00:42:25] Speaker B: For use. Yeah, imagine that.
So you envision, you know, down the line, like you said, they're not perfect right now, but you envision this in a, using this in a therapeutic fashion, potentially.
[00:42:37] Speaker A: That's how I think about it, right? We've grown up so long with brain science thinking the way you're going to fix a brain disorder is, oh, you've got to give a drug, of course you give a drug. That's how we treat everything. But there's one big class of problems that we do fix without drugs. My colleague Josh Tenenbaum here at MIT has reminded me of this. A lot of vision problems: I don't give you a drug, I just put these kind of funny lenses in front of your eyes and, oh, suddenly your behavior is improved. Right. Well, that's just optics on the front of the eyes. But that's an example of understanding the eyeball that then turns into a device on the front without actually having to go inside the head. Right. So there may be ways, and there are other areas, other people thinking about this in digital medicine more broadly, where the more you understand things, the more you may be able to inject energy properly into the system without getting inside the head with a molecule or a beam of light or whatever you want to stick in someone's skull. You might be able to make progress without doing that. And that excites me from a vision side, because that would offer new opportunities that just usually aren't thought of that way.
[00:43:38] Speaker B: I mean, obviously you take a drug and it just bathes your entire brain in it. It doesn't really target a specific neuron, for instance. But even Parkinsonian patients who are getting deep brain stimulation, that stimulation is still massive and it's driving the whole population. So, yeah, the more subtle and specific these sorts of abilities to perturb get, the more exciting that is.
So, all right, let's talk about understanding. So in the deep learning world, there's this growing divide between prediction and understanding. You have these models that are super accurate at predicting things.
And I'm going to use air quotes with understanding now, because you often use quotes when you talk about understanding. And if that's happening in deep learning, then it's scary to think how far that divide could go in brains and neuroscience. So there are a few different ways of talking about understanding. Some say that with these more recent deep learning models, we can't really understand them, again, just assume I'm using quotes around understanding, at the unit level or the population level, the internal workings. So what we should do is try to understand them at the level of the things that we control, like the algorithms, the architectures, the objective functions, and that kind of provides a first-pass, rudimentary understanding of these models. And then in the philosophy of science, there's really been an explosion recently of people writing books, there's been a handful the past few years, trying to figure out what understanding is. And those are all driven by the conception of understanding as something that discovers some simpler principle than the thing that you're trying to understand, not necessarily Maxwell's equations, for instance, but some simpler principle that can account for the host of more complex things going on that you're trying to understand. But you've kind of pushed this new operationalized definition of understanding and what it means in the modeling world. So what view of understanding is that, and, because I know you use it in quotes, what's the blowback that you're getting from using it like this?
[00:46:03] Speaker A: Yeah, this is a fun discussion. We have it all the time in the lab. I often say to the scientists I train: we're all humans, we use the word understanding, and it's kind of a cover word for not saying what your goal is, is another way to put it. As soon as someone tells you they want to understand something, you say, well, what do you mean by that? And first they might say, well, I haven't thought about it. And then eventually you're going to get them down to something like: it's either going to be a form of prediction or control. It's going to boil down to prediction or control. That's, I guess, the main thing I would say. They usually wouldn't phrase it that way, but that's what they would seek. And they would want to seek that under some simplicity constraint, as you mentioned. You'd ideally want a lot of prediction with the simplest model. Now, simplicity is hard to define, but a good model is one that's as simple as possible and predicts a lot, and not simpler, because if you make it simpler, it'll predict less. Right? You make it less simple to predict more. So there's a trade-off between those two things, and we can all choose where we want to be on that trade-off. And sometimes I think scientists want to be on the simplicity end and engineers want to be on the other end. And I'd like to think the middle is where you really want to be. I mean, if you want to fix brain disorders, I'm sure the people that are suffering from them would rather you went with a model that was a very accurate model of what's going on when they have a brain disorder, regardless of how complicated it was, and not have to say, well, no, but I needed the beautiful model because it made me aesthetically proud, or was pleasing as a piece of art. That's not going to help the person with the brain disorder. They don't care if it's complicated or not. That's a sort of scientific aesthetic that is grounded in the idea that it will tend to generalize. That's an Occam's razor argument: if you make it simpler, it will tend to generalize in some way. And so I'm going a little bit off your question, but that's how we were all trained, because physics found that you made these beautiful models and they tended to generalize in other ways. And there may be aspects of that for brain science. Imagine what generalization means to, you know, another species on another planet we haven't yet encountered. Yes, that might be true, and maybe there are things that will generalize, and notions of intelligence and so forth. But right now we have an operational problem: we've got this thing that lives between our ears in Homo sapiens that we're trying to build good models of. And it's sort of an earlier-stage problem; principles is a little too lofty, I guess, is another way that I would put it. So I think, again, the notion of understanding is a tricky one, but to me, it's all about building models that support prediction and control, and trading off between simplicity and prediction and control. But even when I say that, you should push me and say, well, Jim, what are you trying to predict? What are you trying to control?
Because I'm being vague about that, right? I said earlier, oh, I can control light on your eyeballs, but maybe I can control your genes. If I'm a molecular biologist, I have good control over genetic tools, so those are my control knobs. And what do they want to predict? Well, maybe I want to predict your behavior. I want to explain your behavior because we're the organisms that are paying the bills as taxpayers, so we're the ones that we want to help. But maybe I want to predict the responses of neurons. Well, again, the taxpayers don't care about the neurons. They care about whether they're actually feeling better. Right, so those are again the operational definitions.
But you have to choose those, and different scientists will choose different forms of those. And then everybody's science could be boiled down to a choice of what you are controlling and what you are trying to predict.
[00:49:44] Speaker B: I mean, yeah, so you see it as sort of a two-pronged possible set of definitions of understanding. One is very specific: if you understand something, you can predict it. And the other is: if you understand something, you can control it. So you don't like the idea of understanding as being, I don't know, to say ontological is heavy, but as being something separate from prediction and of course separate from control, because that's the sort of monkey elephant in the room.
So in between prediction and understanding is explanation, which kind of goes back and forth as well.
But the way you conceive of understanding, then, is this: if you understand something, you can either predict it or control it. Do you get blowback for that?
[00:50:30] Speaker A: I do. Right. And I would say, first of all, prediction and control to me are two sides of the same coin. The difference is whether you actually have control knobs. So sure, astrophysicists might have a hard time controlling certain things that they're measuring out in the universe, but they can still predict them, so I wouldn't call that not a science. But here on this planet we have a lot of control knobs we might be able to use for the brain, so we have both prediction and control. And they tend to go together: if you can do good prediction, you generally should be able to do good control. So let's couple those together. And I think the thing that you raised was, well, what about explanation, or something more vague that you called understanding? And you said something a minute ago where I lost the thread, but maybe you could say it again, Paul, because I think you tweaked my interest there.
[00:51:17] Speaker B: A separate way to understand something. Well, in the philosophy of science. Oh, I probably said ontological. So in the philosophy of science it's almost agreed upon, I would say, that there is, and again I kind of cringe to say ontological, but there is a fundamental difference between prediction and understanding. Prediction is just quantifying the output, right? You can say it's accurate or inaccurate; it's very quantifiable. And a lot of work, at least in the philosophy of science these days, is aimed at getting a better grasp on what we actually even mean when we say understanding. Right? Because there are these notions from historical science and philosophy about explanation itself, and now it's blooming into more of a focus on understanding, which brings it down more to our cognitive capacity to conceive of something, to, let's say, be able to simulate a model in our heads: if I did this to the model, this is what would happen to the output. Something like that is more akin to the philosophy of science sense of understanding, as I understand it these days. So what's your response?
[00:52:27] Speaker A: Right, so now I sort of see. Take that version of understanding, that boiled-down version, whatever it is. Let's suppose you had it, and it was declared ontologically beautiful or whatever word you want to use, and it didn't predict anything. Would you call that understanding? I don't think you could. Right? You could have something that you thought was beautiful and easily digestible by humans and all the things that you were describing, yet it failed all prediction tests. I think that's a non-starter. You would say that can't count as understanding.
Now we're on the path. Now we're going to say, well, how much can I predict? Then it's just a curve. At one end of the axis you're predicting everything perfectly well and you have a more complicated, harder-to-intuit model. At the other end you have a lower-dimensional, simpler-to-explain model that a human can grasp but that may predict nothing at all.
What you want is to be in the middle of those, which is the most intuitive model you could have that explains the most things.
The other thing we're sort of circling around here is the notion of, you know, why is the goal to explain it to other humans?
[00:53:27] Speaker B: Is it really? That's what I want to know.
[00:53:30] Speaker A: Well, I think that's the thing we should ask ourselves. I mean, I always pull out my iPhone. Do you understand your iPhone? You feel like you do, but you don't really, and actually no human on the planet does, I think, because it takes teams and teams of humans. And that's how I think models of the brain are going to look: together, as a species, we understand it. Individuals understand it well enough to use it, some individuals understand it well enough to improve parts of it, but no individual human understands the whole thing. And this is a device we built, and it's probably not as complicated as the brain. So the form of our understanding is going to be a little more like that. And what's wrong with that? We could cure brain disorders if we had an iPhone version of the brain. We just have to get used to the idea that the complexity is managed by many people together, not by each of us feeling satisfied on our own. I think that's maybe the tension that we're describing here. Many people got into science because they thought they themselves were going to have a good intuition, and maybe that's the part that feels hard for me to sell about this vision, because I like it too. I'm a human being; I like it to feel intuitive too. But if you think about your iPhone, there are many parts of it where you say, I don't care, somebody knows those details. Well, okay, the brain: somebody has to know those details too. We don't have to have intuition for every component of everything. The way to judge it is whether the darn thing works when you pull it out of your pocket, or, in the language I'm using for the brain, whether it actually controls and predicts, or helps you cure brain disorders.
[00:54:59] Speaker B: I'm going to give one more concrete example of this, because I just had Mazviita Chirimuuta on, who has written about this, and she has actually written about your work, which she admires. But her conception of understanding. So, you know, she compares, not your models specifically, because it gets a little complicated, because your models map onto the brain areas and are trying to account for brain areas, but let's say neural network models writ large these days, versus earlier canonical models of things like simple cells in V1, or normalization as a canonical computation.
So her argument would be, and I believe I can speak for her here, that the canonical models, even though they don't predict as well, so they predict pretty well in certain circumstances but not in natural circumstances, so their prediction is not perfect, because of the way they are conceived, and because they are this canonical sort of computation, they actually provide greater understanding.
And this is just a different definition of understanding. Maybe in that conception they provide greater understanding while predicting worse than, let's say, a neural network model that predicts beautifully because it can perform the task. But the network doesn't provide understanding, in the sense that it doesn't spit out the mathematical function it's using to predict those things. That mathematical function is embodied and buried in distributed units, so it doesn't provide the same level of understanding. So that's the concrete example. But I think that's orthogonal to your control-and-prediction sense of understanding.
[00:56:44] Speaker A: Yeah, again, control and prediction, let's just call that one direction, just better prediction for now, and keep those together as one. And I don't think it's entirely orthogonal. If I was drawing with my fingers, one axis is simplicity, or what a human thinks is beautiful, and that might include closed-form math equations if you want; there's an aesthetic there of what we mean by beautiful or understandable. And the other axis is prediction slash control. And you can imagine a curve of points, right? At one end you've got this really complicated, ugly thing that predicts everything. It's almost like the data itself. And at the other extreme, as I mentioned, you've got this beautiful statement to the world, like a poem, but it predicts nothing. And of course you want to be up in the corner where you're doing both, typically, and I think we all want to be there. But imagine a world where the function doesn't look like that; it looks like a kind of exponential fall-off. Now you've got a hard choice. Do you want to be in the beauty world, which is nice, it's good for museums and talking to other humans and having nice math equations? Or do you want to be in the world of actually predicting and controlling? I hope we won't have to choose, but I think that's why there's so much pushback, because people are maybe afraid that we're getting forced into that world, and that's not the world they hoped for. And imagine a smart Einstein might come around to our field and say, actually, if we rethink what we mean by simplicity, the curve will suddenly open up, and it'll become not an exponential but more of a square again, and then we'll get up into the corner. You just have to think about it in a different way, and everybody will be happy.
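(A minimal sketch of the trade-off being described here, in Python; this is an editorial toy example, not anything from the conversation or from the lab. It fits polynomial models of increasing complexity to noisy samples of a hypothetical function and compares held-out prediction error, to make the simplicity-versus-prediction curve concrete. The function, noise level, and degrees are all illustrative assumptions.)

```python
# Toy illustration of the simplicity-vs-prediction trade-off (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true system": a smooth nonlinear function observed with noise.
f = lambda x: np.sin(3 * x)
x_train = np.sort(rng.uniform(-1, 1, 15))
x_test = np.sort(rng.uniform(-1, 1, 200))
y_train = f(x_train) + 0.2 * rng.standard_normal(x_train.size)
y_test = f(x_test) + 0.2 * rng.standard_normal(x_test.size)

for degree in [1, 3, 5, 9, 13]:        # the "complexity" axis
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}  held-out MSE {mse:.3f}")
```

Typically the held-out error falls and then rises again as the degree grows: neither the simplest model nor the most complex one predicts best, which is the Occam's razor point about simplicity tending to generalize.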
[00:58:21] Speaker B: We need that. We need the tensor geometry for our sense of these axes.
[00:58:26] Speaker A: Right. So again, I think that could happen, and I'm not opposed to it. We're just busy trying to build models that are at least doing well on prediction and control in a fair sense, not just capturing the data but doing real prediction and control, knowing that, minimally, that will be fuel for those who like to take models and simplify them and make them more beautiful and find the underlying principles, if they exist. So I view our work as a step toward that larger goal, which means we can make progress along the way on things like brain disorders and the other things we mentioned earlier, even if we haven't yet made the models simple and beautiful, and give material for people to chew on with the hope that they may eventually find simpler ways of describing those models. Right? So these are not at odds. It's really a question of which thing the field is ready for first. Should it keep thinking about beauty and principles, or should it work on models that are capturing the data? And I guess the thing I would advocate for is that right now we need models that predict stuff at all, and to do that first. Let's not let the perfect be the enemy of the good. We don't need to jump right to the perfect model that's beautiful in every way and predicts everything. Let's just get some predictive models going, and then we can simplify them later. That's sort of a choice.
[00:59:44] Speaker B: That's spoken like a successful engineer modeling the ventral visual stream.
[00:59:49] Speaker A: Well, yes, that's right. I'm in a science department, but I'm actually an engineer at heart. So maybe I'm not really a scientist in this sense, and that makes me a little sad, because if science is only going to be about beauty, then, okay, we'll cede the whole field of neuroscience to engineering. That's what I fear the next decade will become, and I don't think scientists should cede the field of neuroscience to engineers. Even though I'm an engineer, I'm also a scientist, and I think scientists should be willing to embrace more complicated models and not be so afraid. This doesn't mean we're giving up on those ideas; it just means this is the next step. It's not a dead end. But I do resonate with that. Again, you point out our work is a little hard to categorize, because some people use deep networks as just a big hammer: you just predict stuff, and it's just a black box of a bunch of stuff. And then I'm with you, and anybody would say that's not understanding, because then it's just a big complicated system used purely as a prediction tool. And when I say prediction, and maybe this is very important to get across, a deep neural network model of the brain has many levels, and every level of that model had better have a mapping to our measurements within the brain.
[01:01:00] Speaker B: Yeah.
[01:01:01] Speaker A: Which can be tested. There are no black boxes allowed in our framing of these models, and I think maybe that's where we have agreement. Again, the field has often used networks to do things like prediction and control without worrying about mapping all the components onto a system like the brain, and that's not what our work is. Some people get that from our work, but we're sort of cast into the lot of, we're all doing deep networks, so we must be doing that crazy thing where we build a big black box and then just do prediction. That's not actually what's going on. We view these as neurally mechanistic approximations of what might be going on at different parts of the brain, and our job is to check how true they are and then improve them. But the models are still complicated. They're big, deep networks. The brain is a big network. How can that not be true? So I hope that makes some sense, that there's a middle ground between these.
There are extremes, but I don't want to be put into either camp. I'm not in the simple-models, physicist camp, and I'm also not in the camp of let's just build deep networks, hammer on every dataset, put up a big black box, and go home. There's a middle ground, which is the interesting space, and I think that's really exciting for our field, not just for vision. From Ramón y Cajal on, we've said, oh, it's a neural network. So if we're going to be neuroscientists, our job is to search through which neural network is actually running in the head, find the right one, and model that. That was what we were trained to do. Otherwise we shouldn't call it neuroscience; we could call it cognitive science and remove the neurons. And I'm not saying that jokingly. Neuroscientists need models with neurons in them that are connected, a.k.a. they work with neural networks. So what is there to dislike about working with big, complicated neural networks? That's the whole brain. It's the base hypothesis of the field, and I don't know how a neuroscientist can say, well, I don't like deep neural networks. What do you mean, you don't like the brain? This is the base hypothesis class. The trouble is they're mapping the term artificial neural network onto today's feedforward-only models with simple rectification functions, and of course that's not right; it's an approximation. But they can't dismiss the whole class, because the class is obviously right, and they learned that it's right on day one of neuroscience. So I think it's navigating those two views of what an artificial neural network is that our field has to come to grips with.
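(To give a concrete sense of what "every level of the model had better have a mapping within the brain" can look like in practice, here is a minimal sketch in Python. It is an editorial illustration under stated assumptions, not the lab's actual pipeline: the arrays standing in for model-layer activations and recorded responses are synthetic, and the choice of cross-validated ridge regression as the linear mapping is just one common convention.)

```python
# Sketch: score how well one model layer linearly predicts recorded responses
# to held-out images (synthetic stand-in data; illustrative only).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical activations of one model layer (500 images x 256 model units)
# and responses of 20 recorded sites to the same images.
model_layer = rng.standard_normal((500, 256))
neural_sites = model_layer[:, :50] @ rng.standard_normal((50, 20))
neural_sites += 0.5 * rng.standard_normal(neural_sites.shape)   # measurement noise

X_tr, X_te, Y_tr, Y_te = train_test_split(
    model_layer, neural_sites, test_size=0.25, random_state=0)

mapping = Ridge(alpha=1.0).fit(X_tr, Y_tr)   # linear map: model units -> recorded sites
Y_hat = mapping.predict(X_te)

# Predictivity per site: correlation between predicted and held-out responses.
predictivity = [np.corrcoef(Y_hat[:, i], Y_te[:, i])[0, 1] for i in range(Y_te.shape[1])]
print(f"median held-out predictivity: {np.median(predictivity):.2f}")
```

Repeating a test like this for each layer of the model, against each recorded brain area, is one way no layer remains a black box: every level makes a testable claim about some part of the system.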
[01:03:25] Speaker B: That's good. I'm glad I scratched that itch for you. So, optimistically, eventually the full story about, let's say the ventral visual stream will be told. Where are we in that story?
[01:03:38] Speaker A: Yeah, well, there are two parts to this question: where are we in explaining the ventral visual stream, or even just core object recognition, which again is only one part of visual intelligence? You sort of said it earlier. There are two main parts to the way I think about it. One is, what is the adult inference engine doing? What is a description of the adult ventral stream as it is, in, say, you or I? There we might be a quarter of the way, maybe halfway. We have some models that can approximate it pretty well. There are a lot of important open questions, you mentioned recurrence and others, but we're making good progress on that, that's where a lot of our work has been, and there's still a ton to do there. But there's another question, which is, how does the darn thing wire itself up? How does it go from a birth state, through what the machine learning community would call learning and what neuroscientists would call development, or evolution, or some combination thereof? On that part we've made much less progress. There's a huge open space of questions, and that's back to our earlier discussion: the learning part is not mapped onto the actual system very well. The deep learning part may or may not be right in various ways. So there we're less far along, but on the inference side, we're going to look back and say these models were good but far from perfect. Maybe I can generously say we're a quarter to halfway to where you'd like to be on that problem.
[01:05:06] Speaker B: So do you see the rest of your academic career mapped out already? Because you have these five or six full-on streams of beautiful potential work to do. I mean, you're busy.
[01:05:22] Speaker A: Yeah, and we're not going to get to all of those things. I mean, we talked about some of the things we're most excited about, like pushing some of these control experiments deeper into the brain, and obviously always improving models, thinking about some questions for human health, or maybe even some learning-related questions for education. Those are things that are starting to excite me now. But I'm also, you know, I'm 52, so maybe I'm about halfway through my career, and I'm starting to think about what that's going to look like at the end. What excites me now is all the people coming through my lab. They come away with, I think, a different way of thinking, and they come back and tell me, wow, I didn't see it until I spent time in your lab. The lab joke is sometimes that we're doing indoctrination, like maybe it's a bit of a religion, but they start to come around to some of the views we've been discussing here, and they see it as, oh, this is actually a way to make progress. And then I'm seeing them do it in other parts of vision, or beyond vision. What will make me happy is when, in 50 years, if I'm still alive, I get to hear their stories of how they took those ideas and extended them beyond core recognition, beyond vision, to the bigger problems. Because we're going to need teams of people working on this; one lab is just going to do the bit it can. I think what's going to excite me is the way I can influence others to try to do that.
[01:06:51] Speaker B: You've certainly spawned a lot of successful teams. Have you always had this way of thinking, or has your way of thinking about these things changed since early in your career?
[01:07:01] Speaker A: Yeah, I mean, in the way I think about these things, a ton of credit goes to my graduate advisor, a professor named Ken Johnson. He passed away about a decade ago.
Ken was an engineer by training, and he was the one who got me onto these problems. You have to transform the data, you have to go through a series of representations, and look, the visual system does this up through the ventral stream into this magical place called IT, and that then enables object recognition. Those were the things he would say while we were busy working on the somatosensory system. That was when I was in his lab as a graduate student, and I said, well, Ken, that's great, let's go do that. So then I figured out how to get John Maunsell to let me do a postdoc in his lab and give me some vision credentials, and John is a super, incredibly supportive mentor. Between those two, I take Ken's vision and then John's teaching of how to actually do vision science, and that's where this comes from for me: the combination of them. So a lot of what I'm doing is carrying forward the visions I got from the two of them. In that sense I owe a debt to them; this is not something I dreamed up one day as the right way to do it. And maybe I've added other tweaks here or there that we've come up with, and I credit those to the students and postdocs I've been lucky enough to have join my lab. So I'm more a steward of what's going on; I would not call myself a mastermind genius. I'm more of a community organizer trying to get some stuff done together.
[01:08:25] Speaker B: The students that come in, do they want to do engineering or do they want to do neuroscience? Do they want to inactivate prefrontal cortex, or do they want to build models? Or half and half?
[01:08:33] Speaker A: You know, what's exciting is they all want to do both now, right? And that's what's really exciting. When it's just engineering, that feels a little dry. The brain is a very motivating thing: how do billions of neurons and trillions of connections make each of us who we are, and what are all the things we could do to help humans? That's the thing that's exciting; that's what neuroscience has. But they also see, look, I can't just make measurements and inactivate brain areas. We've been doing that for decades, and you can walk around SfN with its 30,000 posters and it's not clear it's converging into a model of the system. Right? So they're drawn to this interface between models and data, and I've been trying to nurture that in my lab, where everybody does a little bit of both sides and they work together. Often the best projects, and you mentioned the control projects, come from two postdocs, one from a more ML background, the other from a neuroscience background, and they each learn from each other and get something done together that neither of them would get done on their own. That's what's exciting to me, and I think exciting to a lot of students. They want to touch both sides of that bridge: the actual brain wetware, but also the models and the ML and the AI that go into the other side. And that has practical advantages too. You could get a job at Google or Apple if you like, or you could run a lab. So it's an exciting time for students coming into the field at that intersection.
[01:10:03] Speaker B: I think you record a ton of neurons at once, and that's just going to grow. That number is going to grow and grow. But do we need it to grow? Are we recording enough simultaneous neurons, or do we need to be recording all of them? What's the right amount of neurons to record?
[01:10:18] Speaker A: Yeah, I don't think there is an answer to that. I mean, my first answer is, more.
[01:10:23] Speaker B: Is always better, but do you get more explanatory power? If you doubled the neurons that you were recording right now, would it actually benefit the match from the models to the brain?
[01:10:35] Speaker A: Not in the ways that we've been matching so far, where we're matching how well we can predict individual neurons and then talking about averages over many neurons. But the interesting form of that question, to me, becomes: if we have two models that we can't tell apart, where both seem to match the neural data, how many neurons, of what type, would you need to distinguish between those two? And notice, when I phrase it that way, that sounds like science. Right? You've got two alternative hypotheses, the hypotheses drive the experiment, and the experiment is powered enough to separate the two alternatives. But our field is only just starting to do those things. Most of the history of neuroscience is led by people's intuitions of, well, I should inactivate this or record here, and then I'll make sense of it later. It's not very model-and-experiment driven, for the most part. It's more intuition driven.
[01:11:24] Speaker B: Exploratory.
[01:11:25] Speaker A: Yeah, exploratory. That's not bad. You need to do that at the beginning, and the beginning was the last century, right? And that work should continue; I don't mean to say it shouldn't go on. But it's exciting that now the models are going to start to tell us what experiments to do; they're going to guide us. Even to answer your question, the first thing I would do is ask folks in my lab to run the simulation: if we had this many neurons, what would we be able to do? But I'd have to have a question. Is it to separate two models? Is it to control something?
Otherwise it's very hard to answer. It's sort of like asking how many humans you need for a vaccine study. Well, it depends on a lot of things: what's the safety question, and so on. There are lots of things that go into that question.
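(To make the kind of power-analysis simulation being alluded to concrete, here is a rough sketch in Python. It is an editorial toy example with made-up numbers, not an actual design from the lab: two hypothetical models make similar but not identical predictions, and we ask how often noisy recordings from a given number of neurons identify the model that actually generated the data.)

```python
# Toy model-separation power analysis (hypothetical models and noise levels).
import numpy as np

rng = np.random.default_rng(1)
n_images, n_pool, noise_sd, n_reps = 100, 2000, 2.0, 200

# Two candidate models: highly correlated but not identical predictions per neuron.
model_a = rng.standard_normal((n_images, n_pool))
model_b = 0.99 * model_a + np.sqrt(1 - 0.99 ** 2) * rng.standard_normal((n_images, n_pool))

for n_neurons in [5, 20, 100, 500]:
    correct = 0
    for _ in range(n_reps):                                  # repeated simulated experiments
        idx = rng.choice(n_pool, n_neurons, replace=False)   # which neurons we happened to record
        # Simulate the data as if model A were the true system.
        data = model_a[:, idx] + noise_sd * rng.standard_normal((n_images, n_neurons))
        err_a = np.mean((data - model_a[:, idx]) ** 2)
        err_b = np.mean((data - model_b[:, idx]) ** 2)
        correct += err_a < err_b                             # did the data favor the true model?
    print(f"{n_neurons:4d} neurons -> true model identified {100 * correct / n_reps:.0f}% of the time")
```

The answer depends on everything in the setup: how different the two models are, how noisy the recordings are, how many images are shown. That is the point being made here, that "how many neurons" only has an answer relative to a specific pair of hypotheses.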
[01:12:04] Speaker B: So you started off recording single neurons, right?
[01:12:08] Speaker A: We did, because that's how we were trained. Right?
[01:12:10] Speaker B: Yeah. Did you think that you'd be recording thousands of neurons at a time back then?
[01:12:14] Speaker A: Again, we're not at thousands; we're more at hundreds to a thousand at the moment.
[01:12:19] Speaker B: But, well, it's multi-unit. So, who knows, you know.
[01:12:21] Speaker A: Yeah, okay, we could say that, but then they're mixed together. Whereas the training was that you should get really good, clean single units.
[01:12:30] Speaker B: Does that matter that it's multi unit?
[01:12:33] Speaker A: As far as we can tell, it doesn't again add much more constraint power to the model separation.
But we keep asking it; we're not going to give up on that. We try to find clean single units and multi-units, and we re-ask the questions you're asking for almost every experiment: if we had more clean single units, what do we think? If we had more units, would it tell us anything different?
So you're asking, why should we keep recording single units? I think this becomes a model-driven question: which models are you trying to separate, and what's the best data to do that? Given the choices you have, you may be able to get thousands of multi-units versus a hundred clean single units. Those are apples and oranges, so only if you have a goal in mind can you say which is better. More is better, but that's sort of an undergraduate answer, right? That's the undergraduate criticism: how could this paper be better? Well, they could have collected more neurons. Well, that's always true. It's harder to answer in the concrete, but you need some alternative hypotheses to answer that question.
[01:13:40] Speaker B: I know it's after hours in your world, so I just have one more question for you and that is what are brains for?
[01:13:47] Speaker A: What are brains for? Yeah, first of all, my answer to that question is that brains are not, quote, for anything. The use of the word for is a subtle thing in your language: the "for" implies design. Everything we know about evolution says none of this was designed, and it's not for anything. But brains, I would say, are a very interesting part of a machine that was selected for its ability to survive and reproduce. So that's really just a statement of evolution.
[01:14:16] Speaker B: It recapitulates evolution. Yeah. Which is totally fine because it all comes back to evolution. But. Yeah, right.
[01:14:22] Speaker A: But that's also why we can't give the answer of, well, it's just evolution, it's just optimization, let's go home. We could make a textbook: how does the brain work? Well, it's for survival and reproduction. We should all go home, we're good, let's get on to other problems to be solved.
[01:14:38] Speaker B: Yeah. Well, Jim, thank you for your generous time. I'm going to let you go home. I really appreciate you spending time with me.
[01:14:44] Speaker A: Oh, it's been fun, Paul. Thanks for chatting with me.
[01:15:01] Speaker B: Brain Inspired is a production of me and you. I don't do advertisements. You can support the show through Patreon for a trifling amount and get access to the full versions of all the episodes, plus bonus episodes that focus more on the cultural side but still have science. Go to BrainInspired Co and find the red Patreon button there. To get in touch with me, email paul@braininspired.co. The music you hear is by The New Year. Find them at The New Year.
Thank you for your support. See you next time.