BI 093 Dileep George: Inference in Brain Microcircuits

Brain Inspired

December 29, 2020 | 01:06:31

Show Notes

Dileep and I discuss his theoretical account of how the thalamus and cortex work together to implement visual inference. We talked previously about his Recursive Cortical Network (RCN) approach to visual inference, which is a probabilistic graph model that can solve hard problems like CAPTCHAs, and more recently we talked about using his RCNs with cloned units to account for cognitive maps related to the hippocampus. On this episode, we walk through how RCNs can map onto thalamo-cortical circuits so a given cortical column can signal whether it believes some concept or feature is present in the world, based on bottom-up incoming sensory evidence, top-down attention, and lateral related features. We also briefly compare this bio-RCN version with Randy O’Reilly’s Deep Predictive Learning account of thalamo-cortical circuitry.

Time Stamps:

0:00 – Intro
5:18 – Levels of abstraction
7:54 – AGI vs. AHI vs. AUI
12:18 – Ideas and failures in startups
16:51 – Thalamic cortical circuitry computation 
22:07 – Recursive cortical networks
23:34 – bio-RCN
27:48 – Cortical column as binary random variable
33:37 – Clonal neuron roles
39:23 – Processing cascade
41:10 – Thalamus
47:18 – Attention as explaining away
50:51 – Comparison with O’Reilly’s predictive coding framework
55:39 – Subjective contour effect
1:01:20 – Necker cube

Episode Transcript

[00:00:03] Speaker A: It's like solving a jigsaw puzzle. Biology gives you some set of hints and your model gives you a full set of computations. And then it's a question of how do you map these computations? We had to put it in there because that's the only way to make the model work. And then when we worked back from that to see where it would fit, there was no place to fit it other than in the thalamus. This is Brain Inspired. [00:00:50] Speaker B: Hey, it's Paul once again. Today I have Dileep George on because last time he was on we ran out of time to talk about what he's back today to talk about. And that is his recent work that maps his model for visual inference onto the circuitry of the cortical column, in combination with the thalamus, to account for the function of the loops of connections between the cortex and thalamus. The model I just mentioned is what we discussed in previous episodes with Dileep, which he calls a recursive cortical network, which is a generative probabilistic graph model for inferring the best explanation for visual evidence presented to the model. In this case the RCN model, the recursive cortical network model, was adapted to account for the corticothalamic circuitry, and the way it turned out, Dileep thinks of cortical columns as little belief machines about some feature or concept about the world that you're perceiving. And the belief is informed by sensory input, by top down attention, and the context of everything else that's going on at the time or in the same scene. All that information goes into a vote on whether to believe the feature or concept that the cortical column stands for should be present in your perception. So I'm either on or I'm off: you do see an edge there or you don't, for instance. This model gives the thalamus a crucial role for explaining away evidence for other interpretations as information gets processed up the hierarchy of vision. So we discussed the model and its functioning in more detail and we compared a little bit with Randy O'Reilly's ideas about the thalamus providing a predictive learning mechanism from a few episodes ago when Randy was on. Show notes are at braininspired.co/podcast/93. If you value the podcast and can afford a couple bucks a month, consider supporting it on Patreon. Often I include extra bits of these regular episode conversations and every once in a while I'll post a bonus episode. And I'm almost to the point now where I can start using some of the funds from the Patreon support to pay for a little help with things like editing and transcripts, which would help immensely. Anyway, Dileep won't be back on for a while, but he's always fun to talk with, so I hope you enjoy the conversation. [00:03:24] Speaker C: Catherine, it's Dileep again. Dileep George, Vicarious. Yeah, can you come do your thing again? Okay. Okay. She's on her way here. Hang on. Previously on Brain Inspired. Thanks for coming on the show again, and we'll talk soon again. So I appreciate it. [00:03:45] Speaker A: Thanks, Paul. Thanks a lot for having me. And I hope I will come back again before two years. [00:03:54] Speaker C: All right, so, Dileep, here we are. Welcome back. [00:03:57] Speaker A: Yes. Great to be back. It hasn't been two years. It has been just a few weeks. [00:04:03] Speaker C: I think only a few weeks. But I know of at least two things that have happened during that time. One, how was your RV trip? How was your first ever RV trip? [00:04:14] Speaker A: Oh, that was so much fun. The kids loved it and it was.
It was just one night of RV camping, but, you know, the whole experience was fun for me. It was my first time driving a big truck, basically, and then it was just good to get out for a few days. [00:04:36] Speaker C: Oh, yeah. Especially these days. [00:04:38] Speaker A: Yeah. [00:04:39] Speaker C: Can you imagine? So you did it for one night? I did it for a year and a half. Although we had a. We towed a fifth wheel, which is like the big ones that you tow in a. We had to buy a monster truck and stuff. Do you think you could handle a year and a half with the. With the kids and everyone in there in space? [00:04:56] Speaker A: So you did a year and a half with. In an RV with kids? [00:05:01] Speaker C: Yeah, and I'm still. I mean, I'm basically bald now. So, I mean, it's. [00:05:06] Speaker A: I would love to do that, actually. [00:05:08] Speaker C: Yeah. Well, maybe you should. Maybe we should talk before you just sell all your stuff and move into an RV like we did. Anyway, glad. Glad to hear that you got out and that it was fun. The other thing that happened is that you published this nice review in Frontiers that details really your overall approach, but also is based on the recursive cortical networks that we've talked about a few times here already on the podcast. Obviously, I'll link to that review. It's a great review, by the way. I have a few questions here related to that before we really get started. So one of the things that you lay out, what we mentioned, what you mentioned and described in the last episode, is this triangle strategy that you employ for building AI. Basically, you use in parallel cognitive and neuroscience observations. And you match those in parallel with computational algorithms and principles, and you match all of that with the third node of the triangle, which are properties of the real world. And that's a little bit reminiscent of Marr's levels of analysis. And the reason I'm bringing this up is because I had a listener question about you trying to understand how the brain works. The question is, how do you define the levels of abstraction of the description of the brain? And what level are you working at in particular? So I just will throw it to you. [00:06:38] Speaker A: Yeah, the tricky part here is that you cannot define a level of physical abstraction in the brain. You cannot think that you are operating at neuron level, column level, or synapse level; you cannot make any physical cutoff like that. Because if you make a physical cutoff like that, and if your overall system needs a mechanism that is below that level, then you will be missing that out. And that's why we don't describe it in terms of a physical abstraction level, like a cortical column or anything like that in the brain. And it's more, you look as deep as you need to, but you ignore irrelevant things at any level, any physical level of abstraction. So if something at the cortical column level is irrelevant for information processing, you ignore that. So this triangulation strategy is more for figuring out what is important for information processing at any physical level of implementation versus what is not relevant for information processing. [00:07:50] Speaker C: Great. So hopefully that satisfies Jeff. Okay, last question about this review. And again, I'll point people to it. But so at Vicarious, the goal is to build AGI. And one of the things that you write about is why are we calling it AGI? Why not AHI, as in artificial human intelligence?
And then so you make that distinction between artificial human intelligence and artificial general intelligence, and then you add an artificial universal intelligence. So how do you think about these things? What is all that? What do you mean? [00:08:26] Speaker A: Yeah, so we need a term to refer to building of intelligence model after the human brain. And people have been calling it artificial general intelligence for a while, but there is some confusion. And this confusion can be because of books like Superintelligence being written where what is imagined is an arbitrarily powerful entity which would just instantaneously learn everything and take over the world and convert everything to paper clips, etc. Etc. But such things are do not exist. There are fundamental limits on what even a general intelligence will be able to do, how quickly it will be able to learn new things, how quickly it will be able to disentangle the causal mechanisms in the world so that it can make decisions and drive actions. So those fundamental limits will always put a, you know, there is a constraint on how quickly something can learn and act in the world. So those are not constrained in this mythical super intelligence setting. So that's why I call it some arbitrary artificial universal intelligence, which is, which is like perpetual machines. You know, you can imagine something like that existing, but it violates the laws of physics and it cannot exist. So that, that's the thing I put in artificial universal intelligence. You know, something that exists only in our imagination cannot be really built. So then some people, you know, since people were mischaracterizing general AGI as this aui, some people started basically saying that, oh, it should be just called human intelligence, not AGI. But the problem there is that that also doesn't solve the problem because are you going to model exactly like human intelligence? Like, are you going to put in the working memory limitations that we have just because we have some hardware limitations? So are you going to impose all the arbitrary constraints that might be there on our own intelligence? Are you not going to wire a Google search directly into the prefrontal cortex, if you could? So it's basically whenever we are going to build something based on human intelligence, but on a different substrate, it will be more general than human intelligence just because of the way we are building it. So you can think of, oh, I want to model human intelligence. Exactly, and try to build it, but you will end up building something that is more general. So that's why I like the term artificial general intelligence. It is not some arbitrarily universal intelligence and it is not exactly trying to replicate human intelligence with all its limitations. So that's why I like the term AGI as something general intelligence model after the human brain. [00:11:23] Speaker C: I like that too. And it's not subject to the same constraints physically or just capacity wise as well. So I wonder, so you kind of have a hierarchy. There's artificial human intelligence and then above that is artificial general intelligence. And the limiting fictional idea theoretical is artificial universal intelligence. So then below the human intelligence, where is that? Is that me? So is there a term for that? Computers or something? [00:11:52] Speaker A: I would call them specialized intelligences. [00:11:56] Speaker C: Oh, good. There's all the specialized stuff. That's pretty good. Okay, all right. 
So anyway, I enjoyed the review paper because it lays out, I mean, like I said, it lays out your approach and gives examples that we've talked about already on the podcast and that are in all these Papers so in a very easily digestible manner. [00:12:15] Speaker A: Glad you liked it. [00:12:17] Speaker C: So vicarious. I have just a couple things on vicarious and then we're going to get into the meat of this thing. Okay. Doing things in general and creating startups, for instance. So it seems like 99% of startup ideas begin with the wrong idea. And you were talking about how important it is to listen to customers and that's how it kind of shapes what you're building. Not only the specifications and the technical information, but the kind of things that you're making. And so I'm wondering, does it even matter how much does it matter what idea you start with given that it's going to change probably dramatically? [00:12:52] Speaker A: Well, it matters in terms of what is the skill set of people you are bringing in and what are your key people in the company. As long as whatever you are changing to fits within the skill set of what those people can build, then I think it'll be reasonably fine. It'll be very hard to pivot to something that is totally outside the scope of like that will be almost like starting a new company. [00:13:23] Speaker C: Do you think that's why a lot of startups fail? I mean most startups fail for various reasons, but do you think that might be an underlying reason that when people have to quote unquote pivot, then their team or their technical skills aren't up to par for whatever the end goal is at that point? [00:13:40] Speaker A: I mean there are multiple, multiple reasons and depends on the different stages of the company. So at early stage what I've seen are things falling apart sometimes just based on disagreements between the founders. They just, you know, try to work for a few months together, just didn't pan out. They want to go in different directions. So at the very early stage when it is two or three people in a garage kind of sitting, it can, it can fall apart because oh, just didn't gel to work working together. So that's, you know, one reason and often it is just finding that core team that will stick it through which is the hard part, you know, getting a, getting a co founder and, or a key employee number one, which becomes super important in the company. That core team clicking together can be a challenge and so that's a lot of ideas fall apart right at that stage and then it is, you know, the different gates that you have to pass through, you know, seed, CDC, etc. So and things can fall apart at the seed stage where oh you just. The idea does not get any traction at all where you talk to multiple people and Just cannot raise any. Any money. And you might bootstrap it for some time. But if it's not getting customer traction, or if it is an idea that actually requires capital to get customer traction, then it will fall apart at that stage. Then later on, what can happen is that companies can scale too quickly. Because you always want to scale because there is a perceived customer demand. But if you scale too quickly and then the customer demand doesn't materialize in time, that's. Something go wrong in later in the life cycle, then scaling too slowly sometimes can also be bad. [00:15:45] Speaker C: You make it sound so fun, man. [00:15:48] Speaker A: Because then somebody else scaled faster than you. So there are many, many things that can go wrong. 
You know, Khosla has a nice analogy for what startup life is like, right? It's basically, he's saying it's like jumping off an airplane and building a parachute on the way down, and then the goal is to basically just not hit the ground. [00:16:27] Speaker C: Note to self: Dileep does not suggest starting a startup. [00:16:31] Speaker A: No, it's fun. It's still like skydiving fun. [00:16:34] Speaker C: Yeah, that's right. There you go. There's the plus side of it. [00:16:37] Speaker A: Yeah. [00:16:37] Speaker C: Okay, well, that's great. So you've made it. You've built your own parachute. It seems like you're. Or a hang glider even. Maybe I don't know what you've built, but you've done a really nice job so far. So anyway, continued success, of course, with Vicarious. [00:16:50] Speaker A: Thank you. [00:16:51] Speaker C: But what we're really here to talk about today is the second of two recent papers, the first of which we talked about last time. So the second one is all about. Well, I'll just read the title. A detailed mathematical theory of thalamic and cortical microcircuits based on inference in a generative vision model. And I'll just kind of introduce it here and then you can correct me and we can. And then we can get into it. So the neocortex seems important for our general intelligence, however narrow that generality may be. As we were, you know, sort of just discussing. You have argued that the cortex is, you know, on the one hand, you know, the important thing to understand, but on the other hand, that, you know, it's. It's the easier bit of brain to understand because the rest of the brain is so complex and specialized, honed through eons of evolution. Okay, so cortex, just to bring everyone up to speed, is this big sheet made up of a repeated architectural motif. What's called the cortical column or microcircuit. Which is repeated throughout the cortex fairly uniformly, but with variation. But it's. The basic organization is similar across cortex. And you've argued that if we knew what it was doing, as many have, that we could apply that and then take us a big step forward, not only in understanding our own human intelligence, but in building artificial intelligence. And there are many theories of which you have, you know, worked on already, but they're based on sort of two, I would say, main approaches. One approach is to think of cortex as sort of a feed forward, bottom up series of hierarchical processing. So going from like these simple to more complex and abstract representations. So like in vision, that would be going from like lines and edges building up all the way to an image where you can identify an object, like the face of, I don't know, Abraham Lincoln, someone famous. Right. The other approach, the one to which you subscribe, is to think of cortex more as a top down inference engine which creates generative models of possible worlds to then best explain the data that is, that is coming in to our senses. Am I on point so far here? [00:19:12] Speaker A: Yes, you are exactly on point. Yes. [00:19:14] Speaker C: Okay. [00:19:15] Speaker A: Do I have to say anything at all here? [00:19:17] Speaker C: No, no, just, just. By the way, Dileep's on a treadmill. This is great. This is the first time someone's exercised during the podcast. I love it. So it's. [00:19:27] Speaker A: It's great to be standing and walking while talking. That's fun. [00:19:31] Speaker C: Yeah. [00:19:31] Speaker B: Oh, yeah. [00:19:31] Speaker C: I should get a treadmill because I do.
I use a standing desk. Okay. Anyway, all right, so. So I'm going to continue here. Most focus has been on the canonical cortical microcircuit, asking, what does that column do? What does that cortical microcircuit do? But of course, cortex doesn't act alone in the brain. It's highly interconnected with lots of other brain areas in these complicated loops between cortex and the other brain areas, one of which is the thalamus. And the role or roles of the thalamus have been debated for many a year now. It was originally thought just to be a relay from our sense organs to the rest of the cortex. And it does do that. But more recently, it's been thought that it plays a role in attention and the regulation of information flow between cortical areas and from our senses to those cortical areas. So the circuitry and the loops between the thalamus and the cortex have led some people to rethink the canonical microcircuit computation. Right. What is the canonical microcircuit actually computing, so as to move beyond just cortex and actually involve the thalamus? So I had Randy O'Reilly on recently and he has this deep predictive learning model where there's a feedforward projection to the thalamus from cortex and a feedback projection to thalamus from cortex. And the idea, and this happens in the pulvinar, at least in the visual system. These feedforward and feedback connections join together in the pulvinar and act like a predictive learning mechanism in the style of this top down predictive coding inference approach. [00:21:19] Speaker A: Yeah. [00:21:21] Speaker C: And I only say that because the closest thing to this conversation that we've talked about on the podcast is Randy's predictive learning mechanism here. Okay, so that brings us up to speed now. Sort of up to speed. So you had these recursive cortical networks that you've been working with for years. [00:21:42] Speaker A: Yeah. [00:21:42] Speaker C: And you realized that they could be implemented with networks of neurons, and you realized that you could map the computational properties and the flow of these RCNs onto the cortical column and thus the bio-RCN was born. [00:22:01] Speaker A: Yes. [00:22:01] Speaker C: And. All right, so I'm going to hand it over to you. So we've done this already multiple times, but just really broad overview. Let's just recap what recursive cortical networks are and what you've used them for in the past. [00:22:14] Speaker A: Yeah. So recursive cortical networks are a hierarchical generative model for vision. It builds a hierarchy of parts, from line segments at the bottom going all the way to object level models at the top. And these are all encoded as a probabilistic graphical model. And when a new piece of evidence comes in, like a scene of characters or a scene of objects, this model can parse that scene as the best explanation under the model. And we used it for cracking CAPTCHAs, defeating their fundamental defense. And also we use it regularly in our robotics tasks for cluttered bin picking, detecting boxes, all those things. So it's a vision model that can be used for object recognition, for foreground background segmentation, for estimating pose, for generating from the model, for occlusion reasoning. So it's a unified generative model on which the different tasks are just different queries on the model, rather than having to train specifically for the task.
[00:23:25] Speaker C: Yeah, Having to retrain the model like a deep learning network that you'd need to retrain for every task. I mean, there might be some generality between tasks, but in general you'd have to retrain It. [00:23:33] Speaker A: Yeah. [00:23:34] Speaker C: All right, good. So, like I said, you've taken those RCNs and applied them to a cortical column and developed a BioRCN. And one of the nice things about applying this to a cortical column is that because you already had the theory basically of the rcn, it makes very specific biological predictions of what needs, like what kind of connections there need to be and what kind of cells need to be involved and precise inhibitory and excitatory interactions. And the way that this works so well, maybe you can elaborate on that just a little bit. [00:24:12] Speaker A: Okay, so one thing is that, you know, when we built RCN originally, we were looking at the brain for insights. We were looking at visual cortex for insights to say what kind of structural constraints need to exist in the model. And so it is not surprising that we will be able to map it back, because we started with insights from the brain. But the insights from neuroscience are clues. For example, the idea that surfaces and edges are represented separately, but in an interacting way, that is an idea that came from neuroscience, and we triangulated it to some algorithmic principles and properties of the world. But then how exactly it gets implemented in the graphical model, that is something we develop in the context of everything else that we are building. So the mathematical model that we are building is filling in a lot of the details based on hints from neuroscience. And the good thing about the final model that is built is it is functionally complete, even though it is partial functionality. It is not doing everything that visual cortex is now doing, and not to the level of performance that the visual cortex is doing. But at least it is complete. It is doing the whole thing of parsing a scene and recognizing characters, all those things. So it's a complete functional model. So now that means it fills in a lot of the details that are not available when you look at initially at biology. So now we can go and map back these computations to the cortical lamina and columns and again, their information from anatomy and physiology. All the experimental data act as constraints in that mapping. So you know that, for example, feed forward input from the thalamus mostly lands on layer four in the cortical lamina. And if, if that falls on layer four, then if you place that computation there, you know that the next set of computations, which are dependent on the projections from layer 4 to layer 2. 3. There is only one place to put that. It's like solving a jigsaw puzzle. Biology gives you some set of hints, and your model gives you a full set of computations and then it's a question of how do you map these computations and you anchor them based on known data from biology, that those becomes your anchoring points, the corner pieces of the puzzle. And then rest of it is just get gets filled in based on those constraints imposed by these anchoring pieces of data from biology. [00:27:09] Speaker C: Good. One of the things that you do is so in the rcn, it's a probabilistic graph model. Each node in the rcn, when you break it into biology, is sort of broken into groups of neurons. 
So each node kind of represents groups of neurons that then you break down mathematically and computationally how they interact and compute and then send projections to other nodes that are made of other groups of neurons that do different computations. So we'll bring thalamus in here in just a bit, but let's start with the cortical micro column. You fashion the cortical micro column as a binary random variable. So what does that mean? [00:27:55] Speaker A: Yeah, so one pleasing thing out of this mapping is that it gives us a tool to think about cortical micro columns. So you can think of a cortical micro column as representing a feature or a concept. So it can be a cortical microcolom represents an edge or a character, a at a higher level. And then so what do those different lamina in the cortical column do? They are all talking about the same feature. You know, they are all talking about this one thing, whether it's edge or a character, but they are talking about different aspects of that thing. So it could be, how does this edge participate in lateral connections with other edges in the same level? How does this edge decompose itself into smaller edges? Or how does this edge compose itself in a hierarchy with a corner in the next level? And these different aspects, the participation of that feature in different aspects, whether it is laterally, hierarchically, sequentially, etc. All these different aspects get represented in different cortical lamina. And then finally there is this need for computing the belief, the final belief, that each cortical column needs to say, am I on or off? As part of the overall coherent thing that the visual cortex is trying to explain. Am I part of the best explanation for the scene or not? And that is the belief whether that cortical column is finally on or off. And that requires integrating information from bottom up evidence, lateral evidence, top down evidence, sequential evidence, all of them. And all of them are finally combined into a signal saying, am I on or off? And so that is represented in another lamina so this mapping gives us this conceptual tool to think about what a cortical column is doing. [00:29:56] Speaker C: Just to give a really coarse sort of cartoon of this. It could be that, you know, in one layer, let's say, you know, layer projections coming into layer four, saying, I'm an, I'm an edge, I have edge properties. And then the projections up to layer two, three is like, you're an edge, but you're near a surface and I'm going to vote on that. And then projections, it gets like the context from lateral layers. And so then you have layers one through six sort of all voting on their specific contextual votes about this one property of the edgeness. And then altogether they vote on the thing as a whole. [00:30:36] Speaker A: Got it. That's perfectly done. Yes. [00:30:39] Speaker C: Okay. Okay. So I mean, is it useful? Should we break down the different roles of the different laminae or you know, is that maybe too, too fine grained? I don't know. [00:30:52] Speaker A: No, we could. So we could at least try. So. So one thing, you know, this brings up is that when you measure it the right way, all the, all the neurons in a cortical column will tend to have the same receptive field. And this is of course observed. Right. But also you will see that the receptive fields will change based on the context. 
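To make the "column as a binary random variable" picture above concrete, here is a minimal sketch of the voting idea: a column accumulates evidence from its bottom-up, lateral, and top-down inputs and ends up with a probability of being on. This is only an illustration under my own assumptions; the function, the numbers, and the independence assumption are invented here and are not the message-passing equations from the bio-RCN paper.

```python
import math

def column_belief(bottom_up, lateral, top_down, prior=0.5):
    """Toy belief for one cortical column treated as a binary random variable.

    Each argument is a likelihood ratio P(evidence | column on) / P(evidence | column off)
    for one source of context. The sources are combined as if independent,
    and the result is the posterior probability that the column is "on".
    """
    log_odds = math.log(prior / (1.0 - prior))
    for likelihood_ratio in (bottom_up, lateral, top_down):
        log_odds += math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Weak bottom-up evidence but strong lateral and top-down context
# (think of a column sitting on an illusory contour): the column still votes "on".
belief = column_belief(bottom_up=1.2, lateral=4.0, top_down=5.0)
print(round(belief, 3), belief > 0.5)   # 0.96 True
```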
So initially, when you, if you are measuring it based on purely bottom up evidence and just power it through, you will, you will see that all the different laminae neurons in all the different laminae have the same classical receptive field. But then depending upon which lamina they are in, and depending on what contextual computation they are doing, their classical receptive field can change into something more dynamic and something that depends on extra columnar input or inputs that are not directly bottom up for that column, but based on lateral or top down inputs. So if you go lamina to laminar. So in the mapping, layer four is of course feedforward input. Layer two, three has multiple roles. One is computing the lateral connections for contour continuity and the other is pooling, just like in a complex cell, pooling information for invariant representation and then projecting to the next higher level. And then layer 5, which below layer 4 is pretty much doing the same computations as layer 2, 3, but now that layer also includes feedback from above. So in this mapping, as you mentioned before, every, every node in the probabilistic graphical model needs to have multiple copies in, in a neurobiological implementation because you need to have messages going in different directions, being represented by different set of neurons, because neurons are not bidirectional. And that's why the same computation which happens in the feed forward pathway is also kind of replicated in the feedback pathway, because one is using purely feedforward information, the other is using a combination of feedforward and feedback information. Layer 5 is lateral connections and unpooling that uses the combination of feedforward and feedback information in layer 5. Also, a sublamina of layer 5 is involved in belief computation, which uses both feedforward, feedback, lateral, all those things together. Then layer six is computing feedback messages to send to the children. So that's the rough breakdown. [00:33:36] Speaker C: All right, good. Yeah. Everybody memorize that. All right. It's nice. It's in the paper. I mean, I'm just going to start listing interesting things as we go through here, Things that are interesting to me. So, copies. So the clonally related excitatory neurons, our copies to participate in different lateral and hierarchical contexts. So when cortex is developing these clonal neurons, they all come from like the same neuron, essentially when these neurons are being created. And that's what's referred to, I believe, as these clonal neurons. And so what role do these copies or clones play within the processing? [00:34:18] Speaker A: So this is an interesting aspect that kind of falls out of the model and the mapping it produces to biology. So the model used clones in its representation for representing higher order context. And this is also related to our work on cognitive maps, where you're using different clones to represent different contexts. So that same idea is also used in rcn, and instead of temporal thing, it was in the context of lateral connections. Is this line part of what different curvatures are aligned part of, and representing that in an efficient way uses clones in that representation. 
So basically what it means is that if you think of these lateral connections as a sequence, and different curvatures are different sequences it participates in, then you can think of it as a particular feature, which is an edge in this case, having different clones for participating in these different curvatures. And that's a very efficient way to represent this higher order lateral context. So that's one place in which these clonal neurons come in. And this, if I am going to speculate forward, this might be a general property of how a cortical column represents higher order information using cloning. Basically saying you create different clones for the different higher order contexts it is participating in, whether it is lateral context based on line continuation, or whether it is sequential context based on temporal continuation, or whether it is based on surface properties. So just using different clones for different contexts might be a general property. That's one aspect. And there's also another aspect which is basically saying irrespective of what a cortical column represents, you need to have some basic computations to be done in that cortical column, which is part of inference, saying, oh, it doesn't matter whether it is representing a line segment or a character or a, you know, frequency bin, you still need to process feedforward information, combine it with feedback information, lateral information, etc. And that set of computations implies a particular connectivity. And that connectivity can be wired up front. You don't need to wait for environmental signals to come in to wire them, because those computations are irrespective of what the column represents. So that's this idea of establishing some connections a priori in a cortical column. [00:37:06] Speaker C: That don't need to be learned. They don't need to undergo any learning. It's just hardwired. [00:37:10] Speaker A: Yeah, exactly. And this is also related to the clonal neurons. There are some recent papers which we cite in our paper showing that a lot of vertical wiring in the cortical column can be established a priori and is established a priori, and maybe those synapses will still retain some plasticity, but you don't need that plasticity to be the one establishing those connections. [00:37:38] Speaker C: Yeah, maybe a lower sensitivity to change perhaps. [00:37:41] Speaker A: Correct. Right. [00:37:43] Speaker C: I mean, this is in contrast to the lateral connections between columns, which have in the model a higher learning capacity. Learning sensitivity. [00:37:53] Speaker A: Correct. Yeah. So the lateral connections are all learned. You know, you can of course, genetically project them to an area where they are more likely to make connections, but the specific connections it makes to other cortical columns are learned, because the definitions of cortical columns themselves are learned. Right. Like, you know, so whether a cortical column represents an edge or not is not something known a priori. So the lateral connections will depend on what the cortical columns themselves represent. So they have to be learned. [00:38:28] Speaker C: So, I mean, this is all about. One of the things that I love about this. I've been reading about prefrontal cortex. Oh, is it Passingham and Wise? I don't remember.
But the overarching theory of the prefrontal cortex function, in their view, is that it's all about context and planning actions, but vastly based on the context of multiple different sources of information that are coming in. And that this really fits with that and broadly just these lateral connections, because it just makes a lot of sense. I used to make fun of there was a postdoc that I used to work with and he was all about context and I used to make fun of him. He's probably much more successful than I am now. But I've come around on it and thinking, wow, it really is a fundamentally important thing to do to be able to move through the world is to integrate these different sources of information and it's all about context. So maybe we can just go through a bit of the sort of processing and then this is when we can bring the thalamus in as well. And of course this is all detailed in the paper, but you know, essentially there is a feedforward pass which you already mentioned, and it kind of goes through this sequence of features, laterals, pools, this kind of cascade. There's also a feedback pass which is goes through the reverse of those features. It cascades in the reverse direction. So it goes pools, laterals, features. And I don't know if you want to comment on the functionality of that. [00:39:59] Speaker A: Well, features are detecting co occurrences, line segments or corners, and laterals are just enforcing continuity between in the in the line representations. So contour continuity and pools are pooling for invariance so that the higher level can be more invariant. And that's the structure that is repeated in the hierarchy. And it's just that it is more formulated as a generative model so that you can sample from the model and also pay top down attention, control things with top down attention, etc. [00:40:37] Speaker C: And that's kind of the core of the RCN, right? So there's this forward pass and then going to the top, and then there's the top down attention feedback pass that then hones in on the correct answer. Is that a way to put it? [00:40:50] Speaker A: Okay, yes. It comes iteratively to the correct answer. [00:40:56] Speaker C: Yeah. Refines the feed the bottom up projections and refines it into the the top down generate into a generative fashion. So I'm just stumbling over my words here. Okay, so that's kind of the core of the rcn. And now let's bring the thalamus in because there's this corticothalamocortical pathway which you describe as explaining away where the thalamus is implementing these, explaining away these or factors maybe you could describe. [00:41:29] Speaker A: Yeah, so when you have multiple things modeled in your model or in the brain, so you can think of it as each thing you are modeling whether it is an edge or whether it is an object, it is specifying how it generates the input. So when you think of an edge, it's basically saying if I activate this edge feature, it will generate these set of pixels in the world. And if it's an A, it's basically specifying, if I poke this node, it will generate these pixels in the world. And now, of course, many of these things are interconnected. If you poke an edge and an adjacent edge, then they will overlap in the fields that they generate. Some set of pixels will be overlapping between an edge and adjacent edge when you try to generate them. 
Then when you actually do inference in the real world, you need to find that which subset of these need to be on, because some of the evidence will be overlapping between them. We have some examples of this happening in captchas where when you look locally, because of the juxtaposition between these characters, you can, you can start seeing characters in between some of these characters. For example, when you bring an R and N close to each other, it can look like an M. And so although locally the evidence for that character might be very strong in the global parsing of the scene, that evidence is just a hallucination and needs to be explained away. And so this is, you know, this is the core idea called explaining away, which is, which is, which happens in probabilistic graphical models naturally, if you formulate it the right way. And when you're parsing a scene, yes, you definitely need to explain away local evidence using the global context. And not only that, it's not a competition that happens just at the first level. This needs to happen between every level in the hierarchy. So from V1 to V2, from V2 to V4, you need to have these explaining computations, because the things that are modeled in v2 also have overlaps in v1, and you need to explain them away in a hierarchy. And these explaining computations exist in RCN because it's a computational requirement. If you want to come to global consensus, you need to explain a local evidence. It's something we had to put in there because that's the only way to make the model work. And then when we worked back from that to see where it would fit in this cortical mapping, there was no place to fit it other than in the thalamus. And not only that, it turned out to be a very, very good mapping to what the computations the thalamus is doing. And I would say based on this mapping, it kind of starting to make sense why it would be implemented in the thalamus and why it is related to other projections to the thalamus, etc. So basically, if you think about what this explaining away computation does, it's gating of feedforward information based on feedback information. That's fundamentally when you look at what is happening in a subset of neurons or in a node in rcn when you pass messages for explaining away, it's routing of bottom up evidence based on top down support. So if you have, if a node has two parents pointing to it as both are causal influence on this node being on. So a pixel can be on due to parent A or parent B. And now evidence comes in from what I'm saying, oh, this pixel should be on 0.9. That's the likelihood of this pixel being on. Now you have to make a decision locally on how to share that piece of evidence among the parents. Should you basically say oh, 0.9 goes to parent A or 0.9 goes to parent B, or is it a fraction of them? You know, half of 0.9 goes to parent A. And this is depends on how much other support does parent A or parent B have. So if somebody says parent B from elsewhere in the network, you know that parent B is the one likely to be on. Then you, you pass all the evidence to parent B and they give very little to parent A. So it's. So it's based on this top down information that you get from these parents on how much support they have from elsewhere in the network, you decide to route this bottom up information. So that is the fundamental computation that is happening in this explaining away circuit. 
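The routing Dileep just described, where a child's bottom-up likelihood is divided between candidate parents in proportion to the top-down support each parent has from the rest of the network, can be sketched roughly like this. It is a toy illustration under my own assumptions (one pixel, two possible parent causes, a simple proportional split), not the exact explaining-away messages from the paper.

```python
def route_evidence(child_likelihood, support_a, support_b):
    """Toy explaining-away step.

    A child (e.g. a pixel) is on with some bottom-up likelihood, and either
    parent A or parent B could have caused it. Top-down support for each
    parent, coming from elsewhere in the network, decides how the child's
    evidence is shared between them. Returns (evidence to A, evidence to B).
    """
    total = support_a + support_b
    if total == 0:
        return child_likelihood / 2, child_likelihood / 2  # no preference: split evenly
    return (child_likelihood * support_a / total,
            child_likelihood * support_b / total)

# The pixel is on with likelihood 0.9. If parent B is strongly supported
# elsewhere (a real 'N' versus a hallucinated 'M' between an R and an N),
# most of the 0.9 is routed to B and parent A gets explained away.
print(route_evidence(0.9, support_a=0.1, support_b=0.8))  # ~(0.1, 0.8)
print(route_evidence(0.9, support_a=0.5, support_b=0.5))  # (0.45, 0.45)

# "Hard" explaining away, i.e. top-down attention: clamp one parent off
# and the other on, and the routing becomes all-or-nothing.
print(route_evidence(0.9, support_a=0.0, support_b=1.0))  # (0.0, 0.9)
```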
And that fits very well with what Sherman and Guillory and many other people have found out about the thalamic circuits. [00:46:52] Speaker C: Like the feedback connections from layer 6 in the cortex projecting back to the thalamus. So it's almost like a center on mechanism, although it's not anatomically set up the same way. How is explaining away then related to attention? Because saying top down and that, that's like an attentional kind of mechanism. Is it attention or how does it relate? [00:47:19] Speaker A: So attention. You can think of it as a very special case of explaining away. So this explaining away is a mechanism where the parents can be kind of half on or half off. They don't have to be full, they don't have to commit to being fully on or fully off. All right? And even then this computation happens. But now suppose you, you set a parent to be off or set a parent to be on. Then that is hard, I would call it hard explaining away. And so that is attention. So basically you're saying, oh, I want the computations to happen under the assumption that the letter A is on. That would. So turning that letter. Yeah. On top down will basically change the nature of explaining the computations happening at the lower level. [00:48:08] Speaker C: So the explaining away is happening anyway. But then the attention can have an effect on top of the explaining away. [00:48:13] Speaker A: Exactly. [00:48:15] Speaker C: One of the worst times I had coming down off of acid, I was laying in bed trying to sleep and still my mind couldn't stop and I saw this cylindrical green thing made up of almost Minecraft blocks. Minecraft didn't exist back then, but it kept spinning and spinning and these blocks kept coming in and adding to it and adding to it. It was driving me insane. But this, what I'm wondering is, first of all, what's your worst experience on acid? No, what I'm wondering is if you didn't have these explaining away mechanisms, my guess is it would be like an acid trip where everything is super local and you can't sharpen anything. Right. You can't have the global features, you can't filter anything out basically. And everything is present and interacting. How does that sound? [00:49:03] Speaker A: Well, that all sounds reasonable. This is something I have to extend the theory to. [00:49:11] Speaker C: Oh yeah, because. Right. Psychedelics are back. You could maybe a therapeutic use for explaining away. [00:49:16] Speaker A: Exactly. Yeah. So I haven't dwelled much on that one. Especially this idea of feeding the top down input back into the network. Right. So basically this is where there is no sensory input, your eyes are closed and the system is running on its own. So you know, generating your top down input and effectively feeding it back into your network. And that's, you know, that is obviously some amount of mixing of top down and bottom up is happening all the time in the network. But where you cut off the bottom up totally and feed it back in. That's something we haven't explored much in detail. But I would love to, because of the connections to psychedelics and also because of the connections to some other things like schizophrenia or where we start mixing what is real versus what is what is hallucinated. Right. So that would be an interesting direction in which to take this model. We haven't done anything that. But that is a. Interesting way to look at it. [00:50:32] Speaker C: That is interesting. I'll introduce you to my friend. 
I have a friend a few states away who's growing his own mushrooms, psychedelic mushrooms. He keeps sending me these pictures. I haven't done psychedelics in a long time, but I'll hook you up with him. That's the language of the kids. But so anyway, just bringing it back a moment, because. So your story then, the story of your model here is that these feedback projections from the cortex come onto thalamus and have these explaining away mechanisms. And Randy O'Reilly's story is that the feedback projections, same feedback projections, but in this case they are, they're comparing the prediction, the generative prediction with the bottom up information. And yours is as well. But in his case it's a story about learning how off the prediction is and then that's how plasticity happens within these circuits. The temporal difference between the prediction and the bottom up gets sent and drives the learning in cortex. And I don't see a problem for both of these things to be correct. I'm not sure how you. [00:51:45] Speaker A: That's right. So I don't see why both can't be correct. And especially in our model we don't, we don't talk about the learning part at all. Right. We are, we are only talking about the inference side. You know, once the model is learned, how does inference happen? So I won't be surprised if learning involves some mechanism like O'Reilly is suggesting there, based on error between the prediction from one layer and the other, how the synapses are adapted. I won't be surprised there. And so they can both be compatible. But I do want to make a contrasting statement between what our model is doing and this generally accepted idea about predictive coding. [00:52:29] Speaker C: Yeah. [00:52:29] Speaker A: So predictive coding is, you know, it's a popular word in neuroscience and it's used everywhere. Every, every model is a predictive coding model. [00:52:39] Speaker C: That's right. [00:52:41] Speaker A: Of everything. Hippocampus is a predictive coding model. Visual cortex is a predictive coding model. So this predictive coding thing is thrown around; it's just a word that is just overused everywhere. [00:52:56] Speaker C: It's a bonanza. You could say it's a bonanza these days of predictive coding in the literature. Yeah, right. [00:53:01] Speaker A: But look at what predictive coding actually is, if you go back to the literature and look at what predictive coding entails. So this is during inference. Right. We are not talking about learning. This is during inference. It needs to subtract the top down input from the bottom up information and then only the errors between the top down prediction and the bottom up input are passed up. That's the actual computation, if you want to map that word to an actual computation. But that computation of subtracting the top down predictions from bottom up input makes many restrictive assumptions. It's basically assuming that your model is linear and your noise is Gaussian. And it's only in that setting that subtracting the top down predictions from bottom up input makes sense. And I would say this might go back to one of the first papers on predictive coding, which was from Rajesh Rao. This was a Nature Neuroscience paper. But then that idea got stuck. Basically saying, oh, this idea that you should subtract top down predictions from bottom up input is the right way to do things, and that somehow stuck. But it is not. What you want to do is combine bottom up input with top down.
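To put the point Dileep is making here in numbers, here is a small toy comparison of predictive-coding subtraction with a probabilistic combination of bottom-up and top-down signals. This is my own sketch, not an equation from the paper: the "residual" line is the linear-Gaussian subtraction view, and the "combined" line is a simple multiplicative gating that amplifies agreement and suppresses unsupported input.

```python
bottom_up = [0.9, 0.1, 0.8, 0.0]   # local likelihoods for four features
top_down  = [0.8, 0.0, 0.9, 0.1]   # top-down predictions / support for the same features

# Classic predictive coding: pass up only the residual between input and prediction.
residual = [b - t for b, t in zip(bottom_up, top_down)]

# Probabilistic combination: top-down support gates the bottom-up signal,
# keeping compatible features and killing off unsupported ones.
gated = [b * t for b, t in zip(bottom_up, top_down)]
total = sum(gated)
combined = [g / total for g in gated]   # renormalize into a distribution

print("residual:", residual)   # small wherever prediction and input agree
print("combined:", combined)   # large wherever prediction and input agree
```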
You want to say top down input, top down prediction, influences how bottom up information is sent up. But it is not a subtraction. Sometimes it can be an amplification of compatible regions. So if top down prediction agrees with bottom up input, you, you keep those things around, you pass those up. Where there is a mismatch, you can kill off the bottom up input. So it depends on the particular context in which the computation is happening, whether it is based on top down attention, whether it is just soft explaining away. So it's a richer story than just subtract top down input from bottom up evidence. And I would call it, rather than calling it predictive coding, probabilistic inference. What is happening is probabilistic inference of combining bottom up and top down information. And that can look like subtraction in some settings, it can look like amplification in some other settings. And in reality it will be a mix of both of those things. [00:55:40] Speaker C: So you have the model, and it's not just a model, it does things, it accounts for things. And you go through multiple visual phenomena that it accounts for in the paper. And I'll just let you choose, maybe you can just describe one of the phenomena that it accounts for and just a little bit about how it does so. [00:56:00] Speaker A: Yeah, so one of the phenomena that we explain and replicate in our paper is the subjective contour effect. And the subjective contour effect is where you see parts of an object in the bottom-up stimuli, but the rest of the object is filled in top down. And you actually hallucinate things that do not exist in the real world. And the famous Kanizsa triangle example is the most salient thing I can think of in this category, where what you're seeing bottom up in terms of the real evidence is just these three Pac-Man-like figures which are the corners of the triangle. But your brain actually hallucinates. You interpret the image as a triangle sitting on top of three circular discs. That is your interpretation of the image. And then your brain in its vast wisdom actually hallucinates, you know, the three edges of the triangle. And when you look at an image like that, you actually perceive a faint line, and that line is not there. [00:57:13] Speaker C: The faint line delineating the triangle, the shape of the triangle and that faint. [00:57:16] Speaker A: Line is completely created by your brain. And so this is something that fits very well with the theory, because this hallucinating something that doesn't exist is inference. It's part of inference because that hallucination is part of the best explanation that the brain is cooking up for the scene. So if you are thinking in terms of cortical columns, you can basically say what happens in the cortical column in that location where there is no bottom up evidence. That cortical column is looking at a portion of the image where there is nothing, it's just blank. And then it is hallucinating a line there. So you can think of what happens as part of the dynamics there. When bottom up input comes in, layer four neurons will be essentially silent, saying nothing, nothing to see here, blank image. But later on, when the context kicks in, the top down and the lateral context kick in, and the lateral. [00:58:18] Speaker C: Context in this case would be. So you'd have, and correct me if I'm wrong, you'd have like a receptive field for one cortical column or something in that blank spot where there is no line.
But then next to it, because it's a topographical map in the cortex, let's say next to it is the edge of one of those Pac-Man shapes. And so it's getting that context that there is this edge near me even though I'm blank. And that's sort of a feed forward pass through with the context. [00:58:46] Speaker A: Right, correct. And then it will also be compatible with the final decision arrived at the top, which is, oh, it's a triangle. So there will be top down information saying that oh, it's a triangle. That means all these columns in between should be on. [00:59:00] Speaker C: And at first it's kind of like I think it might be a triangle. Oh no. And then it goes and goes, oh, it's a triangle, it's a triangle. And then it really clamps it down. [00:59:10] Speaker A: Correct. So you can think of what will happen in that cortical column. Initially it will say, oh, blank, nothing to see here. But then as the lateral context and top down context get incorporated, suddenly neurons in this other lamina which are responsible for representing those aspects of computation of that cortical column will turn on. And then finally when you look at the belief of that column, it should basically say I am on, I'm on. And similarly the belief neurons in, in the outer edges of the Pac-Man, which is the circular parts of the Pac-Man, it should say they're off because you know, that's not part of the explanation of the triangle if you're paying attention only to the triangle. So if you, if you basically are clamping the triangle at the top, then the circular parts of the Pac-Man should turn off and you should be able to see that in the, in the cortical column. So. And that's effectively what we are doing. What we are doing is virtual neurophysiology. It's almost like we have a functioning monkey in the lab and we can show it stimuli and basically record from different layers and show this is how information settles and this is how it arrives at the final answer. And you can extend this to not just the Kanizsa triangle. The subjective contours explanation in our model mostly uses just the contour model, but you can also extend it to an illusion called the neon color spreading illusion, which is a two layered illusion. There are subjective contours in it because it is hallucinating edges. Not only that, it's also hallucinating the color inside the circle. And it's interesting that the color hallucination respects the hallucinated edges. The color doesn't bleed out of the edges. [01:01:05] Speaker C: So. [01:01:08] Speaker A: It's interesting in that way. So we can replicate the dynamics of that too. And hopefully it's also making some predictions about how that illusion also comes into effect. [01:01:21] Speaker C: This reminds me of. So Steve Grossberg has work on this sort of color spreading and respecting the boundaries of the neon thing as well. And it looks like it's along the same lines explanatorily. Yes. While you were talking, I was just thinking about like the Necker cube and these phenomena where your conscious experience of something switches right on this slow timescale. Have you thought about that and how that might fit into the model? [01:01:49] Speaker A: Yes, we have actually. In fact we have several examples like that in the lab at Vicarious. In fact, precisely the Necker cube one. We have created that one and replicated it. Just did not get it into the paper. So there's the Necker cube one, the face-vase illusion. [01:02:12] Speaker C: There are a few of them. Yeah.
Handful of them like that. [01:02:16] Speaker A: You can make them flip by just randomly perturbing the network. Or you could also even see. Because we have full access to the model. You can also see which nodes should I put up? If you, if you want to do the minimum amount of perturbation, which nodes in the network should I put up so that I flip the illusion, flip the explanation to the other thing? So, yeah, those are fascinating illusions to play with because they are all, I would say, results of doing inference on. [01:02:52] Speaker C: A model. The inference to the best explanation. Yeah, right. Dileep, this is great stuff, as always. I mean, you must be very confident in the model. I mean, let's say a healthy skeptic's confidence, perhaps. That's right. [01:03:09] Speaker A: Yeah. You don't want to be too confident. My philosophy is that you want to be the most skeptical about your. Your own model because you. I mean, only you can be, because you know what works and all the nitty gritties of what works and what doesn't work. Right. And I mean, it's. It's in the right direction, I would say. There are so many things to be worked on and improved and some fundamental problems to be fixed, and we will be tackling all of those as we go. But I would say the general direction in which we are going, I am very confident about that. [01:03:48] Speaker C: Yeah, it's the Richard Feynman quote, right? The first principle is that you must not fool yourself, and you are the easiest person to fool. So it sounds like you have that attitude. When are we going to see this thing published in Nature? It's on bioRxiv right now, right? [01:04:05] Speaker A: Yeah. We still have to do a little bit more work for cleaning it up and submit. So we haven't submitted it yet. [01:04:12] Speaker C: Oh, you haven't? Okay. [01:04:13] Speaker A: No, so we. Yeah, so I hope to submit it in the coming month. So if you have feedback, I can use it, and if any of your listeners have feedback, I can use it before submitting and maybe it will help smooth the process. [01:04:29] Speaker C: Oh, wonderful. All right, so there you go, listeners. Dileep has put the call out for feedback. So, Dileep, thanks again. So if we keep this current trajectory, I'm not sure if it follows like a power law distribution of our podcast rate, but I'm pretty sure that we should have another episode in probably about two hours from now. Might be a little bit longer than that, but it's always fun and I really appreciate you coming on and I wish you success going forward. Of course. [01:04:59] Speaker A: Thanks, Paul. It is always fun to be on this one. No, I don't intend to come back in the next two hours now. I think I have talked too much. I need to go and get some work done. That's what I should do. [01:05:12] Speaker C: Great. Happy working, my friend. Take care. [01:05:14] Speaker A: Thank you. [01:05:29] Speaker C: Brain Inspired is a production of me and you. I don't do advertisements. You can support the show through Patreon for a trifling amount and get access to the full versions of all the episodes, plus bonus episodes that focus more on the cultural side but still have science. Go to braininspired.co and find the red Patreon button there. To get in touch with me, email paul@braininspired.co. The music you hear is by The New Year. Thank you for your support. See you next time.

Other Episodes

BI 163 Ellie Pavlick: The Mind of a Language Model
March 20, 2023 | 01:21:34
Support the show to get full episodes and join the Discord community. Check out my free video series about what's missing in AI and...

BI 087 Dileep George: Cloning for Cognitive Maps
October 23, 2020 | 01:23:00
When a waiter hands me the bill, how do I know whether to pay it myself or let my date pay? On this episode,...

BI ViDA Panel Discussion: Deep RL and Dopamine
September 02, 2021 | 00:57:25