BI 082 Steve Grossberg: Adaptive Resonance Theory

August 26, 2020 02:15:38
Brain Inspired

Show Notes

Steve and I discuss his long and productive career as a theoretical neuroscientist. We cover his tried and true method of taking a large body of psychological behavioral findings, determining how they fit together and what’s paradoxical about them, developing design principles, theories, and models from that body of data, and using experimental neuroscience to inform and confirm his model predictions. We talk about his Adaptive Resonance Theory (ART), which describes how our brains are self-organizing, adaptive, and able to deal with changing environments. We also talk about his complementary computing paradigm, which describes how two systems can complement each other to create emergent properties neither system can create on its own, how the resonant states in ART support consciousness, his place in the history of both neuroscience and AI, and quite a bit more.

Topics Time stamps:

0:00 - Intro
5:48 - Skip Intro
9:42 - Beginnings
18:40 - Modeling method
44:05 - Physics vs. neuroscience
54:50 - Historical credit for Hopfield network
1:03:40 - Steve's upcoming book
1:08:24 - Being shy
1:11:21 - Stability plasticity dilemma
1:14:10 - Adaptive resonance theory
1:18:25 - ART matching rule
1:21:35 - Consciousness as resonance
1:29:15 - Complementary computing
1:38:58 - Vigilance to re-orient
1:54:58 - Deep learning vs. ART

Episode Transcript

[00:00:01] Speaker A: I always start by getting passionately interested in explaining experimental facts that seem to me to be basic, and I always think about what they mean in terms of function. And I particularly, because I have a philosophical bent of mind, like to dwell on data that seem paradoxical. And I will work to make them paradoxical because I will pit some data against other data until they create a design tension, which gives me a lot of intellectual creative pressure to try to unify them. I always felt in tune, to one degree or another, with the beauty of the world, and so part of my scientific yearning and passion is to try to understand more of that beauty in the short time I'm around. And the fact I'm still doing it at age 80, I can't tell you how lucky I feel. [00:01:21] Speaker B: This is Brain Inspired. [00:01:35] Speaker C: This is Brain Inspired. I am Paul Middlebrooks and oh man, I am struggling to figure out how to introduce this episode. I spoke at length with Stephen "Steve" Grossberg, a theoretical neuroscientist at Boston University, and I'm struggling in part because I can't do his career justice in a little introduction like this. So I'll say from the outset I strongly encourage you to engage in his body of work. And it's a large body of work and it will take you time and effort, I guarantee that, but you'll benefit immensely from it. In the show notes, I link to a few of the most recent articles which we just touch on, but are also, I think, good entry points because they summarize a lot of his current and previous work. But I also link to his website which contains, as Steve says, quite a good dose of material. So those show notes are at braininspired.co/podcast/82. More than other episodes, we really go back and forth between the science and how he thinks and does the science and its relation to deep learning and neuroscience these days. And I think that you won't want to miss any of it. 
[00:02:53] Speaker B: So we kind of start off talking [00:02:55] Speaker C: about his method of developing theories and models, and he ascribes his long running productivity to this cyclic method, which we go pretty deep on. We discuss lots of topics surrounding the theory he's best known for, I think, adaptive resonance theory, or ART, which has generated tons of models to explain psychological behavioral data in terms of how networks of neurons could produce that data. We also discuss his complementary computing paradigm, which accounts for the way that brain and cognitive systems are fundamentally incomplete alone, but when complemented, when paired with other systems, those complementary systems together compute or manifest the emergent properties of our cognition. And Steve has more recently taken on the challenge of explaining consciousness, and we scratched the surface of his suggestion that all conscious states are resonant states, as in adaptive resonance theory. And there are various types of consciousness. For example, visual consciousness is what he calls a surface shroud resonance. And there are other types, and they can all interact and blend and make up our rich conscious experience. And we talk about his place in the history of both theoretical neuroscience and developing systems of intelligent behavior that predate deep learning and are more true to known biological mechanisms. So if you need an antidote to the current hype surrounding deep learning, at least as, you know, the savior of neuroscience and how brains implement minds, you've found it here. Needless to say, I was left with many questions still for Steve. And next time I have him on, maybe when his book comes out, we'll go deeper into the theory part of adaptive resonance theory. Maybe we'll talk about how some of his models account for some specific psychological phenomena. 
And I'm curious what he thinks about more recent advances in deep learning that have begun implementing some of the things that he has long implemented. Things like attention, for example. Anyway, I apologize for the long introduction, but this is a long conversation and yet still quite introductory. But I hope and trust that you'll find it worthwhile. And I hope you'll be inspired by Steve's approach and his genuine passion and motivation for discovery. [00:05:30] Speaker B: If you value this podcast and you want to support it and hear the full versions of all the episodes and occasional separate bonus episodes, you can do that for next to nothing through Patreon. Go to braininspired.co and click the red Patreon button there. [00:05:48] Speaker D: Steve, do you have any idea how hard it is to prepare to interview you? [00:05:53] Speaker A: Well, I've never done an interview of myself, but I have been working like a dog for 50 years and have been lucky to be productive for all those years. And one anecdote comes to mind, at a conference that a colleague of mine attended that I had organized. He's a very famous productive man and he started by saying, Steve writes faster than I can read. Now, that's literally false, because before I write anything about a problem, I might have thought about it from five to 20 years until I had a sense of what the underlying principles are that lead to the data. But then once I get it, then I and my colleagues can move forward quickly toward clear goals. So, yeah, it's been 50 years of hard and productive work. And if I can add, yeah, I can add, it's very interdisciplinary, which makes it even more challenging because I try to study how our brains make our minds. And so one needs to study psychology, cognitive science, neuroscience. And since I try to understand it the way a theoretical physicist might, through principled computational theories that unify vast interdisciplinary databases, it also involves mathematics and computer science. 
And because we spin off many applications in technology and engineering, there are also engineering and technological issues to keep in mind. So it's highly interdisciplinary. [00:07:54] Speaker D: Yeah. So you've been working like a dog, writing like a dog, thinking like a dog. [00:07:58] Speaker A: Well, I'm not sure I go so far as thinking like a dog, if only because I don't have a clear empathic understanding of how a dog thinks. But I would like to think that my human prefrontal cortex gives me certain thought advantages, including language, beyond what a dog would think on a typical day. [00:08:24] Speaker D: Yeah, well, you've been really prolific, which makes it insanely difficult to prepare to speak to you. And I have not read all of your works. However, not only have you worked hard and been prolific, I had a previous guest on many episodes ago, and we were talking about Terrence Deacon, and the guest told me offline, you should have him on the show because he's a really, truly original thinker. And I kept coming back to that phrase, original thinker, as I was reading your work and learning more about you. To me, you seem like a true original, so it's, you know, it's an honor to speak with you, and I want to understand more just how you think. So let's spend the first few minutes maybe just talking about your brain and mind, and then we'll start talking a little bit deeper about some of the work that you've done, the vast majority of which we won't be able to get to. I know that you started off interested in psychology, and you started off thinking about some of the odd psychological findings that seemed hard to explain, and you sort of went down that road, and then later on, you realized, oh, I need to learn math. Is that right? [00:09:42] Speaker A: Well, it's on the right track. 
In brief, I, like so many other freshmen in college, my college at the time being Dartmouth College, took introductory psychology, and part of the textbook spent time on how we learn lists of things and how animals learn. Well, by learning lists of things, they could be anything from the alphabet to a grocery list to a song to written text. How do we learn things in a given order, and then how do we perform them in that order? So there was a large psychological literature about it, and I found it incredibly paradoxical. Because in particular, there's something called the bowed serial position effect, which says that you can learn things at the beginning and the end of a list more easily than in the middle. It's sort of like you remember how that relationship started and you definitely remember how it ended, but the middle is a muddle. And the question is, why wouldn't it just get harder and harder to learn as you got further and further in the list? Why would the end be easier? And it had to do with the fact that after you finish practicing the list on a trial, you sort of rest for a little while. And during that rest period, in a literal sense, events go backwards in time to make the end easier. And that, of course, fascinated me. You know, I was 17 years old. I was, you know, as with so many young students, I was, you know, a philosopher king. And the idea that there were data that talked about things going backward in time gripped me and I never let go. And because it's list learning in time, I was forced immediately into studying dynamics and differential equations, for example. Let me give you a simple example. Let's say I want to learn a list A, B. Well, no sweat. But it's also the case that when you practice AB, you've also been learning, to some large degree, BA. Let's call that backward learning. But what if you learn A, B, C? Well, you can learn A, B, C. That's a forward arrow in time. 
And the association from B to C is much stronger than the association from B to A. And so there was an asymmetry that preferred moving forward in time. Well, why would that be? So you just have to say ABC, and you're already in a world of profound philosophical and scientific issues. And fortunately, this was a wonderful classical database. Great psychologists like Hovland and Hull and many others had studied it parametrically. So you had something to hang on to. And doing that, I derived, in 1957, neural networks, because things could go forward, so you had a connection from A to B, and they could go backwards, so you have to have a connection from B to A, and you could learn it either way. So you have to have an association from A to B and from B to A, and they're different. And I also realized some things are going on quickly. Activating A, activating B, activating C. And those traces of activation fade through time. But superimposed on that is the learning process, which lasts longer. So I was led to derive what's called the additive model, the neural network model, with short term memory traces or activations occurring quickly in time and long term memory traces, which are where the learning occurred. And those equations are now used almost universally by every biological modeler since I discovered them in 1957. So it was a good start. [00:14:42] Speaker D: Just to make sure everyone's on the same page, when you say short term memory, what you're talking about are the activation functions. Activations, right? [00:14:51] Speaker A: That when you hear A, there's a group of cells in our brain, due to prior experience with A, that get activated for a while, but they don't stay activated forever, because if they did, it would be A for the rest of your life, you couldn't learn anything. So the activation fades out after a little while. So I call it a short term memory because it doesn't last very long. 
On the other hand, there are connections from A, there's a connection from A to B. There's a pathway where when you activate A, it sends a signal down that pathway, which is called an axon in physiology. And when you pair A with B, A gets activated, the signal starts going down the pathway, then B gets activated. And so at the end of the pathway there is a structure called a synaptic knob, where a previous value of A and a present value of B are simultaneously active. And in the region of the synaptic knob, there's what's called a long term memory trace, an adaptive weight that learns the association from A to B because it has signals from A and the present activation of B locally at this synapse. So it can form that association. [00:16:26] Speaker B: Yeah. [00:16:27] Speaker D: So the synaptic weights that get. Or the long term memory. [00:16:32] Speaker A: A lot of deep neuroscience goes into the biochemistry, anatomy, biophysics of how these synapses carry out that associative learning. [00:16:44] Speaker D: I think one of the challenges in approaching your work for me is that you invented these terms, and different terms often are used in a lot of literature these days. So you have to map your original terms that you still use correctly onto some of the newer terms that get passed around in the literature a lot these days. [00:17:07] Speaker A: Well, within the modeling community, the terms I've introduced are very widespread. But in addition, the very idea that there were short term memory traces and long term memory traces was just beginning to be discovered in clinical neuropsychology by Brenda Milner, where they were studying hippocampectomized patients who had problems with amnesia. And so this was a contribution that anticipated the neuroscience, as many of my psychologically derived models have. And there the words aren't that different. 
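The short term and long term memory traces described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the published additive model: the decay rate, axonal delay, learning rate, input timing, the threshold-linear signal function, and the function name `simulate_pair` are all hypothetical choices made for the example, and learned weights are not fed back into the activities. It shows two points from the conversation: activations fade after their input ends, and a weight driven by a delayed signal from one cell times the present activity of the other grows more in the forward direction (A to B) than in the backward direction (B to A) when A is presented before B.

```python
import numpy as np

def simulate_pair(dt=0.001, t_end=3.0, decay=3.0, delay=0.2, learn_rate=1.0):
    """Euler simulation of two cells, A and B, with A presented before B.

    STM: each activity decays passively and is driven by its own input.
    LTM: each weight integrates (delayed signal from the source cell)
         times (present activity of the target cell).
    All parameter values are illustrative assumptions.
    """
    n = int(round(t_end / dt))
    lag = int(round(delay / dt))           # axonal delay in time steps
    x = np.zeros((n + 1, 2))               # column 0: A, column 1: B
    z_ab = 0.0                             # LTM trace for A -> B
    z_ba = 0.0                             # LTM trace for B -> A
    for k in range(n):
        t = k * dt
        # A is on during [0, 0.5), B is on during [0.5, 1.0).
        inputs = np.array([1.0 if 0.0 <= t < 0.5 else 0.0,
                           1.0 if 0.5 <= t < 1.0 else 0.0])
        # STM update: passive decay plus external input.
        x[k + 1] = x[k] + dt * (-decay * x[k] + inputs)
        # Signal arriving at the synapse reflects the source's past activity.
        past = x[k - lag] if k >= lag else np.zeros(2)
        signal = np.maximum(past, 0.0)     # threshold-linear signal function
        # LTM update: past source signal gated by present target activity.
        z_ab += dt * learn_rate * signal[0] * x[k, 1]
        z_ba += dt * learn_rate * signal[1] * x[k, 0]
    return x, z_ab, z_ba

if __name__ == "__main__":
    x, z_ab, z_ba = simulate_pair()
    print("final STM activities:", x[-1])
    print("forward weight A->B:", z_ab)
    print("backward weight B->A:", z_ba)
```

In this sketch the forward weight ends up larger than the backward one only because of the assumed delay: the synapse from A to B sees a past value of A together with a present value of B, which matches the order of presentation, while the synapse from B to A sees a past value of B together with an already fading A.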
They might call it LTP for long term potentiation, or LTD for long term depression, rather than LTM, which is a process that can do both. It's called Hebbian and anti-Hebbian. And then later in the 90s, famous colleagues, experimental neuroscientists like my friend Wolf Singer, you know, found physiological evidence for that and started using the model. And that started becoming a standard concept. But it was 20 years later, so there's a real prediction. [00:18:37] Speaker D: This is a recurring theme in your career, 20, 30 years later. [00:18:41] Speaker A: Yeah, and there's a reason why I've had that good luck. And that has to do with the modeling method that I groped to introduce, with the following thought in mind: we don't today and probably never will have a complete understanding of how our brains make our minds. But we can try to deepen and broaden our understanding incrementally in a principled way, where principles and mechanisms that we discover at one cycle of derivation to explain and unify a certain body of psychological and neurobiological data can then trigger the next cycle of discovery, where what we've already learned points our minds to: what did we leave out? [00:19:38] Speaker B: Yeah, I was going to ask you [00:19:39] Speaker D: about that because, like we were just talking about, you start with behavior and psychological findings. Originally that's how you kind of got into it. But you've kept that same method all these years, I believe, where you really start with behavior. And then. Well, go ahead, if you could just. [00:19:58] Speaker A: Yeah, let me try to explain that because it's important to understand. And if it weren't for the kinds of things I'm going to try to summarize now, I wouldn't be working 63 years later. It would have hit a brick wall very soon. So as you said, I always start by studying psychological data. And there's a basic reason for that. 
That's because brain evolution needs to achieve behavioral success, because psychological data define behavioral success. And so if you want to understand the brain mechanisms that were selected by evolution, you have to start with behavior, because that is where the environment is influencing what individual or species will survive. But more than that, I studied dozens or hundreds of experiments at a time. Because if you study too few experiments where the topic is as multifaceted as our minds, there aren't enough constraints on one's thinking. And if I just know one fact, I can have 20 thoughts about it. Because I'm a theorist, I think about data. But if I'm studying a database of several hundred experiments, you have to struggle to find any interpretation that isn't just nuts. [00:21:28] Speaker D: Do you start with one and then you theorize a bit? And then you move on to the next experiment and see if that constrains. [00:21:34] Speaker A: No, there's a kind of holistic, visual emotional resonance with the database. But what I do is the key. You know, when you're looking at data, it could be so boring. You know, you're looking at a lot of curves on paper or your screen. You know, this varies with that. But the art of modeling is to think of the data as emerging from an individual, adapting autonomously in real time to a changing world, to think of the person who generated that data, the process that that person is experiencing. So said more briefly, the revolution is about understanding autonomous intelligence. Now how the hell do you do that? Well, to understand how to translate static data into a dynamic description of autonomy, it requires a speculative leap. There's no algorithm for it. And I didn't make up the phrase speculative leap, Albert Einstein did. So it's a mystery. You know, people asked Niels Bohr, one of the fathers of quantum theory, you know, how he got his ideas. He said, I don't have a clue. It is an artistic process. 
And it's like a great artist painting a great painting. You know, you can analyze it, but it's not just little dabs of color. So doing that has always led to the discovery of new design principles or organizational principles. So by studying a ton of data, you can begin to see what's behind the data. And once I see one of those principles, then I translate it into equations, into mathematics or mechanisms that can realize the principles. [00:23:46] Speaker D: Do you write out the principles in language or do you just have it in your head? [00:23:50] Speaker A: Well, you write down what the principles are in words, but ultimately you have to write equations, like the equations for short term memory and long term memory, which came out of studying verbal learning. There's also an equation for medium term memory, which is like habituation. You know, you activate a process and it gets tired, it runs down. But there are very few equations that I have discovered and others have contributed over the last half century. Short term memory, medium term memory, and long term memory are core equations. Just like in physics, there aren't many fundamental equations. There are a few more we could mention, but that's really the bottom line. It's really very parsimonious. That thousands of experiments can be simulated and explained using a small number of equations is one mark of a maturing theory. And moreover, the equations always made sense in networks. And then you mathematically analyze how the equations work. And if you don't have math, because you can't prove a theorem, which is often the case in a complex system, thank God, now we have ultra fast computers to simulate. When I started, no computer, which was a blessing in disguise, because in order for me to demonstrate anything then, since you couldn't just run it on the computer and say, see, I don't know what the hell is going on, but look, there it is, it's on the computer, I had to prove theorems, and that was scary. 
But I rose to the occasion. I proved some basic theorems. But those mathematical and computer analyses remarkably can then explain much more psychological data than went into their derivation. And that's critical. You're not just sitting on a few hypotheses. And it led to multiple predictions, scores or even hundreds of predictions over the years. And the batting average has been quite satisfying because the method is so parsimonious, is so conservative. But the biggest surprise to me, you know, you're talking about how did I get into neural networks, not just networks, is that the mechanisms I derived could naturally be interpreted in terms of the brain. And that led to a new functional understanding of even known neural facts. Because I derived the networks from psychological data, and because brain evolution needs to achieve behavioral success, I had a functional understanding, a novel one often, of what those brain data were doing for us in our day to day lives. And so the method crucially created a link between brain mechanisms and psychological functions, which is essentially unavoidable. If you're going to want to understand how your brains make your mind, you need the link between brain mechanism and psychological function. [00:27:28] Speaker D: You're talking about the importance of behavior. I think most people, when they get into neuroscience, and I am part of this group, don't think about behavior so much as higher cognitive processes. Right. We're interested in how the mind works, what consciousness is, and things like that. And it's interesting to think about deriving neural principles from behavior, which, like you just mentioned, evolution selects for the behavior. But then you think, well, you know, what is all the higher cognition stuff? Are those just byproducts? That's observing behavior? 
[00:28:03] Speaker A: Let me say a little bit more about the method, because the method itself clarifies how I can publish a paper in 2017 which claims not only to explain how we become conscious and where in our brains we become conscious, and it's not just one consciousness, we have different kinds of consciousness for seeing and hearing and knowing and feeling, but also more fundamentally, and this to me was so satisfying, it still is: why, from a functional and mechanistic point of view, was evolution driven to discover a state of consciousness in the first place? Why are we conscious? We now have a scientific proposal for that that is rigorous and explains a lot. And so it's not crazy, it could be wrong. But the weight of evidence for it is, to my mind, overwhelming. [00:29:13] Speaker D: It's also counterintuitive until you learn about it and then it seems intuitive. So we'll get to that. But go ahead with the. [00:29:19] Speaker A: Well, let me get to the issue of why brain is so hard. Mind and brain. When I talk about emergent properties, the paradigm we're in, I'll come back to it. You know, we think of science as induction. This is not induction, this is emergent properties. And that's a whole different ballgame. So what I just said doesn't yet give you a complete understanding of mind and brain. There's a cycle for incrementally deepening and broadening your understanding. So at this point you would press from bottom up from the psychological data and top down from the brain data that you could explain to focus on what you left out. It focuses your attention on a design principle that you did not include, which you may have not even thought about without the derivation. And then the task is to incorporate that design principle into the theory in a consistent way, going through the cycle again and again, each time explaining and predicting ever larger interdisciplinary databases. 
So it's a cyclic evolutionary process to understand the evolution of mind and brain. And at present there are huge psychological and neurobiological databases that have a principled, unified and quantitative explanation. More than that, I've studied things that I never dreamt, not only that I'd study, I never even dreamt existed when I was a boy. And that's the power of the method. It forces you into the next step if you're following it with integrity. [00:31:15] Speaker D: Are you conscious of the method while you're thinking through things? Are you going from step to step or is it a natural thing for you now? [00:31:23] Speaker A: Well, it's always been natural. Just like deriving the additive model from verbal learning data. You're already going through the process, because first you're thinking about the data, you're thinking about how it happens in real time all by itself. You're sort of forced into a network which turns out to be a neural network because you need short term memory and long term memory, and then it has functions, I mean not functions, it had that already. Then it has a neural interpretation because you go into the physiology literature and even if it's not there, voila, 20 years later there's hard data supporting the prediction. So that's a microcosm of the method. It's not a discrete digitized thing. It's a flow. It's a process of discovery. But I want to emphasize, having said all this, anyone who tells you we don't understand the brain in this day and age doesn't know the theoretical literature, because this incremental and rapid progress of theory construction and unification has been going on for 50 years. And we live in the world of Google. And I think 50 years is long enough for people to Google some of this or just go to my webpage. Google Stephen Grossberg. You'll get quite a dose. [00:32:48] Speaker D: I can link to it in the show notes as well. 
But not many theories, maybe none, have had such a high batting average as adaptive resonance theory. [00:32:55] Speaker A: So ART is just part of the story. It's not just ART, but none of them have, because by and large, most theorists don't follow this method. They don't start with a fundamental link between brain mechanism and psychological function. But given the intense interest in our minds today, in AI and all that, it's fair to ask why so many people do not even today realize this. And I'll touch on this as we go on. It's not a paradox to me, but it's sad to me, because there's so much information, even in terms of clinical explanations of clinical brain mechanisms of Alzheimer's, autism, medial temporal amnesia, fragile X syndrome, visual and auditory neglect, disordered sleep. And, you know, a lot of people would want to know that. And, you know, they could read my stuff. They can. I write it to the most general audience I can. Moreover, I want to emphasize that at every stage of the derivation, large scale applications are spun off to engineers and technologists who want more autonomous adaptive algorithms or agents. And hundreds of these applications have been developed. And, you know, in the old days, I might give a talk and there wouldn't be anyone from engineering or technology. But, you know, I founded the International Neural Network Society. And Gail Carpenter, who was my wife and very close colleague, played a critical role in all of this infrastructure. I founded the journal Neural Networks and all this. And there came a point in the 80s, if Gail or I gave a lecture, you know, there were a lot of people who were engineers in the audience. They'd run up to the podium asking where they could read it, how they could apply it. [00:35:02] Speaker D: I mean, well, this is when connectionism was kind of hot at one point. [00:35:06] Speaker A: Well, we were. I preceded connectionism. 
[00:35:09] Speaker D: Oh, I know, yeah, yeah, but you said the 80s, and that's one of the peaks of, well, by, well, people. [00:35:17] Speaker A: Who weren't doing back propagation. Well, first I bring it back to the 60s and 70s, because Shun-ichi Amari was one of the first people to pioneer backpropagation in the 60s. And then Paul Werbos, David Parker, people like that popularized it. Not popularized it, but brought it to its modern form in the 70s. Paul Werbos deserves a huge amount of credit because he developed the algorithm into its modern form and he did the first applications for his PhD thesis at Harvard. And I knew Paul then and he came to me to talk about his thesis. And it was only 12 years later that Rumelhart, Hinton and Williams popularized backprop, without proper citation I might add, although they all knew Paul's work. But I am getting off the thing, because you had asked me a little bit before how I think, that helps me to do this. So as I've already indicated, I always start by getting passionately interested in explaining experimental facts that seem to me to be basic. And I always think about what they mean in terms of function. And I particularly, because I have a philosophical bent of mind, like to dwell on data that seem paradoxical. And I will work to make them paradoxical because I will pit some data against other data until they create a design tension, which gives me a lot of intellectual creative pressure to try to unify them, you know, to make a synthesis of what seems to be, you know, opposite. And I call such a tension a dilemma because it is a dilemma for me. And so there's the stability plasticity dilemma from which I derived adaptive resonance theory. There's the noise saturation dilemma from which I derived the additive and shunting models for short term memory dynamics. So it's getting this pressure that forces me. [00:38:02] Speaker D: But it's the dilemma that you're like a kid in a candy store. 
Is that what really drives you, as you see the dilemma and it's a problem to solve and you get excited about it? [00:38:11] Speaker A: I get excited by what I see as the mystery latent in the fact that I can look at a fact and say, wow, how is this possible? Part of my method is to stay naive. Meaning if I think I'm going to be interested in a topic at a future time, I stay clear of it until I'm ready, because I don't want to habituate to any of it. I don't want to say oh yeah, oh yeah, I know that. I want to come into it like I don't know anything. I want to feel this is just one big mystery to me and I can't take anything for granted. So the ultimate for me is if I can derive a neural model from a thought experiment or a gedanken experiment. And you know, Einstein was very famous for his thought experiments. That's how he did relativity theory. And when you can use a thought experiment, you know you've touched on designs that are fundamental. And in this case they are designs that I believe help drive brain evolution. So the hypotheses that go into my thought experiments have always been facts that we all know from our daily lives. And you might say, isn't that unexpected? I'd say, no, we know them because they're ubiquitous environmental constraints on our brain's evolution. That's what our brains are dealing with every day, in every way. They're pressing on us continually. And so to understand one of my thought experiments, you don't have to know anything, you just have to say, yeah, sure, sure. For example, one of my deepest thought experiments, let's say to derive the additive model, would be like: if I practice AB, I can learn AB. And then if I say A, I can predict B. Those are hypotheses. Every child knows that. So for example, in my 1980 Psychological Review article where I derived the foundational concepts and circuits of adaptive resonance theory, I did it from a universal problem about how we correct errors in a world that's always changing. 
So error correction is at the heart of learning. If you don't make mistakes, you can't correct them with new knowledge. As to whether my method of thinking came naturally, I'd say that it came naturally and, or but, depending on your emphasis, I learned so much about how to do it better with lots of experience as a theorist over the years. And so how do I mean it came naturally? As I just noted, when I was a freshman at Dartmouth in '57, I studied the facts about human verbal learning and derived the additive model. You know, so I didn't know anything. I knew nothing about the brain. In fact, when I derived the networks, I knew no neuroscience. And it was only because I had pre-med friends who were telling me about all they were learning about neurons and axons and synapses and, you know, action potentials. And I said, my God, I just derived that and more because I have rigorous models and you know their properties. So it was a purely top down psychological derivation. [00:42:16] Speaker D: Has knowing about, so you derived these models without knowing about neuroscience facts, many neuroscience facts. Has learning about brains helped or hurt in those efforts? Because in some sense it's purer if you're coming just from the data through the models, right? [00:42:39] Speaker A: Well, I always start with psychological data for the reasons I said. Now I know an immense amount of facts. In fact, 30 years ago already my experimentalist friends were saying, Grossberg, how do you know so much data? And it's because the models organize and unify it in functionally meaningful ways. Not all of it, never all of it, but by compressing so much data into packages of functional meaning, you know, it's called chunking. I could chunk vast databases, not as thousands of experiments, but you know, oh, this is data about 3D figure ground perception, you know, and there might be a few hundred experiments and I can visualize them in the context of my understanding of them. 
So it's a huge compression. And that's how scientific progress has always gone. Things that used to fill books to teach you in school now are a few lines. Just like in physics, so much physics is compressed in a few equations. [00:43:56] Speaker D: Yeah. Speaking of physics, I wanted to ask you because you've written about how physicists of old were physicists and psychologists. Essentially they were interested in psychological phenomena and physical phenomena. But there was this divide because the theoretical mathematical tools were already there for physics to understand the universe, the physics of the universe, but the math and the theory were not laid out already for the psychological phenomena. And part of that is non linearities and non stationarity which you've talked about. So I don't know, could you just sort of discuss that divide and the way it's progressed? [00:44:42] Speaker A: Well, you've already hit on some of the high points, so I'll just give you a little bit of my summary. So as many people know, there were two major revolutions in physics during the early part of the 20th century. On the one hand, special and general relativity due to Albert Einstein, and then quantum theory and quantum mechanics due to Bohr, Heisenberg, Dirac and others. And both of those physical revolutions required brilliant new intuitions. I mean, Einstein's thought experiments were a work of genius to see to the heart of the matter. But then with the intuitions you could build upon known 19th century mathematics. And let me give you an illustration. So Einstein's special relativity only needed elementary math like algebra. And his general relativity used what's called Riemannian geometry, or the geometry of curved space, that had earlier been discovered by the great 19th century mathematician Bernhard Riemann in Germany. So the math was there for relativity. In quantum mechanics it used matrices. 
So you had the matrix mechanics of Werner Heisenberg and then Erwin Schrödinger had the Schrödinger equation, which was a linear partial differential equation which David Hilbert had discovered and developed in the 19th century. So that was all new intuition in old math. So once they had the insight, they had the technical tools to rapidly develop it. But as you noted, my own work needed both new intuitions and new mathematics, including, as you noted, non linearity, the whole isn't the sum of its parts; non stationarity, rapid development and learning, you're always changing, you don't look today like you did when you were three; and non local, meaning long range interactions between cells in many parts of the brain. So I had to develop new intuitions and new math for nonlinear, non stationary and non local processes. And revolutions where you have to develop both new intuitions and new math are the hardest ones to understand. And the greatest example of that is the Newtonian revolution where he had to introduce both intuitions and equations for celestial mechanics. And that's one reason it took so long. Apart from the fact they didn't have Internet and Google and everything else. It took Voltaire many years later to push the Newtonian revolution on the continent of Europe. People didn't just say, wow, what a genius, let's do it. No, that's not how it worked. So because of the fact that rapid development and learning are core constraints on mind and brain, it's a different ballgame. It's part and parcel of the problem of rapid self organization. And if you look at the physics revolutions, they didn't explain self organization of matter. From my limited understanding, I don't think they have yet. But we can't avoid it because of the nature of the human condition. Here today, gone tomorrow. [00:48:36] Speaker D: That's just the way it is. [00:48:37] Speaker B: You give the example as well of. 
[00:48:39] Speaker D: People like Helmholtz who did so much on both sides. So that's like right when this division happened is when Helmholtz was doing physics work. Yeah, yeah. [00:48:51] Speaker A: Well, Helmholtz was a very great scientist and also a transitional figure because, for example, he would be interested not only in light as a physical problem, but also in seeing, or sound as a physical problem, but also in hearing. And as Helmholtz collected data and thought about mind and brain processing, he got the hell out, because he realized he didn't have the math. And so he then spent the last part of his life doing physics. He understood he was getting into waters for which he didn't have the right resources. [00:49:46] Speaker D: And yet he laid the foundation for so many things as well. [00:49:49] Speaker A: Helmholtz did. Maxwell, Clerk Maxwell, electromagnetic theory, kinetic theory of gases, he worked in vision. Ernst Mach, he inspired Einstein's relativity. All of them, among other things, contributed to vision, how we see, which is also one of my great loves. [00:50:14] Speaker D: Did you have this historical context in mind already when you were thinking about self organization and the need for adaptation and the things that are inherently psychological processes, or is this something that you intuitively understood and then discovered the history of it later? [00:50:32] Speaker A: When I was doing my early work and so many people didn't, well, I survived because I was always first in my class. And like at Dartmouth, I used to hand in exams and term papers that my professors generously said were the best they ever received. And in fact my younger brother, he was six years younger than me, was told in an economics class where he went to college, I'm forgetting how the information traveled, by his Econ 1 professor that I'd written the best Economics 1 term paper he'd ever read and it was at a PhD level. 
[00:51:30] Speaker D: That's awful for a younger brother to hear, but it's great for you. [00:51:33] Speaker A: It was terrible. And I always did everything I could to give my younger brother love and support, but it was terrible. And one reason was my family had no money. I needed my scholarship. I needed to work hard to be sure I wouldn't lose it. And I didn't know how to stop working hard. So I was first in my class. And it's because of that that when I started doing nutty stuff, people figured, well, he's smart, you know, maybe there's something to it. Even though they didn't have a clue what it was. [00:52:13] Speaker D: Well, yeah, you've been hugely motivated and driven, it sounds like, your entire life. I mean, I know you grew up in New York, which you loved, but in a community where the competition was pretty stiff, right? [00:52:27] Speaker A: Oh, very much so. I mean, I went to what's called Public School 69 in Jackson Heights, but at that time there were very bright, highly motivated and competitive Jewish students because there was a Jewish quota in all the colleges. Maybe no more than 2% or something of students could get financial support in the colleges you might want to go to. So it was fiercely competitive. [00:53:00] Speaker B: Yeah, yeah. [00:53:02] Speaker D: And add to that your own internal drive and your brilliance and then that creates a recipe for someone who's going to do a lot of stuff, which you've done. [00:53:11] Speaker A: Yeah, well, I found a lot of beauty and harmony in science. I'm a kind of Einsteinian Jew. I never felt the need to express spirituality in groups. I always felt in tune to one degree or another with the beauty of the world. And so part of my scientific yearning and passion is to try to understand more of that beauty in the short time I'm around. And the fact I'm still doing it at age 80, I can't tell you how lucky I feel. [00:53:57] Speaker D: You're 80? [00:53:58] Speaker A: Yeah, I'm 80. 
[00:53:59] Speaker D: I didn't know that you had reached 80. Congratulations. Happy 80th. Thank you. [00:54:04] Speaker A: On New Year's Eve I was 80. [00:54:08] Speaker B: Oh, is that right? [00:54:08] Speaker D: Wow. [00:54:09] Speaker A: It's a wonderful time to be 80 when you're an adult and a terrible time to have a birthday when you're a kid because everyone's celebrating the new year and you never get a birthday party. [00:54:21] Speaker B: Oh yeah, yeah. [00:54:23] Speaker A: As an adult we always had a dinner party at our house to celebrate the new year and snuck my birthday cake in at the end. [00:54:31] Speaker D: Everyone's very happy and celebrating. Yeah, that's great. Okay, so you have done so many things and you were talking about backpropagation and Werbos and him deserving credit, that he, you know, he often actually gets cited. But you're right, it's usually Rumelhart, McClelland and Hinton and that crew that gets cited when backpropagation is brought up. And one of the things that I didn't know until I started reading your work is that you developed what's called the additive model, which people know as the Hopfield network. And I don't know if you want to go into that story so much, that you developed the Hopfield network before it was called the Hopfield network and it was later co-opted and, you know, given a new name that has since been run with. And this has happened multiple times and I assume it'll continue to happen in your career, which must be extremely frustrating to you. So I don't know if you want to just tell that story, but I'm also wondering if you feel, so science is a self correcting institution. Very slow, but it's supposed to be self correcting. And I'm wondering if you feel like sociologically, culturally, will science historically self correct and more fully recognize your original contributions. 
[00:55:55] Speaker A: Okay, well, I couldn't predict the present, so I am reluctant to try to predict the future. But this being said, my work's actually had a profound effect on the entire modeling field because, as I just indicated, almost all biological modelers use variations of laws for short term memory, medium term memory and long term memory that I introduced in the 60s and 70s, let alone many of the models that I pioneered with many colleagues. And most biological neural models are built on that foundation. But because I've been so productive, which is a blessing that I'm always thankful for, often a model that I discovered and then just had to go on to the next discovery was attributed to a later user of the model who tended to spend a lot of effort marketing it and its variations. They sat on it because they're not as creative. And those people include Larry Abbott, John Hopfield, Teuvo Kohonen, James McClelland, David Rumelhart. They've all built reputations on models that I, and sometimes me and colleagues, pioneered and developed. But if you add up the citations that include uses of my work, I mean, my citations are now like, I don't know, 77,000 and an h-index of 128, meaning that there are at least 128 citations of each of 128 of my papers. If you add up all the citations of work that has marketed my discoveries, it's well over 200,000. So I hope, as you were saying, that as people better understand the history of discovery in our field and can fluently read the literature, more people will realize the full impact of my discoveries over the past years. I hope very much for that. [00:58:07] Speaker D: I do too, but you know, people are lazy and are also busy with their own things. [00:58:13] Speaker A: So I'm very appreciative of that. And that's why I'm so grateful for the support I have had. You know, like I'm not ignored. I've been given multiple prizes for over 30 years, so I'm not feeling forgotten or ignored. 
But I do think it's true that if people were aware of the overwhelming impact of my work, even with others' names attached, you know, I'd be on Morning Joe, you know. [00:58:51] Speaker B: Yeah, right. [00:58:52] Speaker D: And maybe receiving Turing Awards and such. [00:58:55] Speaker A: Yeah, that was sad. That made me sad. [00:58:59] Speaker D: Yeah. Schmidhuber, I think, has written about this as well. Here again, yeah, the politics just are rife within science. Do you think that it has anything to do, you are recognized, but do you think that you'd be more recognized if you worked with more colleagues, if you didn't do so much of your work solo? [00:59:23] Speaker A: I've worked with over a hundred PhD students, postdocs and faculty, so I know you have. That's much more. That's why I'm wondering more than Hinton or Rumelhart or Abbott. I'm wildly collaborative. It's not that. It's the nature of the work and that I'm passionately committed to science, not self promotion. I don't put my name on stuff the way they do and then market it to death. [00:59:53] Speaker D: Yeah, I don't really know how that works. [00:59:55] Speaker A: Well, it works that way. They put their name on my stuff. That's how it works. [01:00:02] Speaker D: And how do we police that except for just repeating things? [01:00:07] Speaker A: I would have a choice of dropping everything and fighting for priority and giving up my life as a creative scientist who's been given a gift that is more precious to me than their hypocrisy. [01:00:25] Speaker D: So then you're dependent on other people to fight for you, which. [01:00:29] Speaker A: I'm dependent on people who love science enough to want to know the source. Because the fact is, with someone who just takes something they didn't discover, you're not getting the whole story. You're not getting the foundations, you're not getting the richness which can lead to much more work. That's why they sit on stuff forever. 
They're not creating, they're marketing and applying one idea. But I don't do that. I keep spilling it out and try to make it clear. And then I honor my gift by moving on. [01:01:11] Speaker D: But with your heart breaking along the way. I suppose it must. [01:01:14] Speaker A: Yes, it has been. There have been some dark times. But I want to make another comment about impact. One of the hardest things to understand about how brains make minds is that psychological facts and functions are emergent properties of large numbers of neurons interacting in networks via these nonlinear, non stationary and non local interactions. So it's not just induction. You go from very simple hypotheses to very deep understanding because you can understand how those simple components interact to generate an emergent property which maps directly onto behavior. Understanding emergence requires intuition. It's not technique, it's intuition. And for that you have to live with the data. You have to live with the concept. It's not easy. And that's why, for the people who have been marketing, the emergent properties are trivial in their work. They might have one equation or something. But to really understand our minds, the emergent properties are at the heart. And von Neumann, John von Neumann understood that, he even commented about it, that you'd have simple components and profound emergent properties. And it's happened over and over and over again. But I can tell anyone a story with no technique, which a lot of my papers try to do, to lead from simple facts and hypotheses to rather deep explanations. And you can follow every step of the story because it's like a gedanken experiment. But that requires an attention span and our society has reduced the attention span to a matter of seconds. [01:03:29] Speaker D: That's right. [01:03:30] Speaker A: So it's not because of technique or anything. 
To get at the emergent property you have to be able to do a little listening to a story. [01:03:42] Speaker D: What story do you tell in your latest book, Conscious Mind, Resonant Brain: How Each Brain Makes a Mind? [01:03:51] Speaker B: It's not available quite yet, is it? [01:03:53] Speaker A: No, in fact I was just sent galleys of chapter three yesterday and it looked lovely. So it's going into production. Well, it'll be published by Oxford University Press. It's going to production very soon. It has a preface and 17 chapters, each of which provides a self contained and intuitive introduction to many aspects about how our brains make our minds. Each topic, I mean each chapter, is on a different topic. I start with exciting examples that everyone can appreciate with no prior knowledge. And then it builds in gradual steps to keep teaching and explaining more and more exciting facts about how our minds work or fail to work when we have one of a number of mental disorders. So it's totally non technical, it's aimed at anyone who's interested. That doesn't mean that everyone might want to read to the end of each chapter, but everyone can read the beginning and can go as far as they want. [01:05:02] Speaker D: So it doesn't introduce and describe things like adaptive resonance theory? [01:05:07] Speaker A: Yes, in a very self contained way. I even include the gedanken experiment, the thought experiment, because all you have to ask is the simplest questions about how you can correct an error when no neuron in the network knows that an error has occurred. How do you get interactive intelligence when all the parts are stupid? And that's the miracle of understanding our minds. And the book also makes clear how you can apply the discoveries to lots of important outstanding problems in engineering and technology. And one of them that I really am fascinated by and wish I were younger so I could contribute to more, is the design of autonomous adaptive mobile robots. 
And there are over 600 figures, lots of pictures to make all the explanation very vivid. You know, it's a very big book and it could have been very expensive, but I got Oxford University Press to agree to let me subsidize the cost of production. So it will be selling in hardcover for no more than $35. [01:06:27] Speaker B: Oh wow. [01:06:29] Speaker A: Which is sort of, for much less, people are charging at least that amount. And I might even go beyond, because I don't want, of course I don't know if anyone will pick it up, but I want certainly students and, you know, anyone interested in mind, clinicians, all scientists who care about mind, and physicists too, because this is really a revolution that's relevant to physics. Until physics can do for self organization of matter what work that I and my colleagues have done for mind, we won't have a complete physical theory, I don't think. That's a speculation by a non expert. And so I hope anyone who knows about it and has any idea about how much I try to explain will try it. It's written for everyone. [01:07:31] Speaker D: Do you know when it's going to be available? I'm sorry if you mentioned this already. [01:07:34] Speaker A: Well, all I know is it's going into production and it might take them six months. I don't know, I don't know. But you know, I've been very heartened, you know, to get Oxford to be willing to do it. They asked quite a few people what they thought of it and, I want to motivate the reader, two of them said I'm a genius. One of them said I'm one of the most creative thinkers in any science. And one of them said I'm one of the great scientists of our time. So if they're interested in a magnum opus written for the general public by one of the great scientists of our time, please go for it. [01:08:24] Speaker D: Oh man. I know you've been shy your entire life, I know that. But you're not unconfident. So I want to ask. [01:08:36] Speaker A: That's a very interesting remark and very true. I mean, like G.H. 
Hardy was one of the greatest mathematicians in Britain in the 19th century and he more or less said, as a paraphrase, if you don't know your own worth, how can you expect anyone else to know it? And it's not so much confidence as passion, hot unbridled passion that's funneled through a method that puts huge pressure on pushing me forward. As to shy, I've always been extremely shy, which is probably why in part I became a scientist. It's something nerds do. Nerds can open new worlds through science. I also love the arts, but I'm not good enough to be an artist on the highest level. [01:09:44] Speaker D: Well, you've talked about how your shyness has been fairly crippling at certain points in your career. And you've written about that. But I'm wondering if it has served any beneficial purpose. [01:09:55] Speaker A: Oh, I think so. Because what a quiet guy like I am is doing in my chair or at my desk, that's where I feel confident. But don't get me wrong. I've always loved being with people. I've always been a sweet guy. Unfortunately, I grew up in a family of Hungarians who don't encourage lovey kissy behavior. And when I would try to hug or kiss someone, they'd say, oh, fish much? They'd make fun of me. So I learned not to do it. But as soon as I was able to extend myself as a teacher, as a friend to others, Huggy Kissy Stevie is right out there up front. [01:11:01] Speaker D: Why do you think that is about the Hungarians? [01:11:04] Speaker A: Don't ask me. Read their history. You know, I mean, what can I say? I can't even begin to say, ask a social historian. [01:11:17] Speaker D: Well, I think you just created the title for this episode. Huggy Kissy Stevie. [01:11:22] Speaker B: That sounds like a. [01:11:24] Speaker A: If that'll bring people in. Good. [01:11:29] Speaker D: So you, Steve, you've done one or two things in your career. Maybe the most known thing and the. [01:11:37] Speaker B: Most that you've worked on is adaptive. 
[01:11:39] Speaker D: Resonance theory, or ART. [01:11:42] Speaker B: And there are multiple things that I would like to discuss with you. [01:11:45] Speaker D: But let's talk about ART just for a few minutes here. And of course we won't be able to really even scratch the surface of all the variations and all of the different cognitive processes and behavior that it accounts for. But adaptive resonance theory does a lot of things, and in various instantiations there are a few driving principles that have led to the development and driven the. [01:12:12] Speaker B: Development of ART throughout the years. [01:12:15] Speaker D: And I think you've already mentioned one of them. The stability plasticity dilemma, is that the main fundamental principle behind ART that's given rise to other various things? And we should just state, maybe you should just state again what the stability plasticity. [01:12:32] Speaker A: Well, the stability plasticity dilemma is just how can you learn quickly without being forced to forget just as quickly? So I had never seen your face before, Paul. I learned it within 10 seconds. And I'm not afraid that I forgot my brother's face. So it's how you can learn quickly without being forced to catastrophically forget other things. Learning quickly, remembering for a long time in a stable way, that's the dilemma. And why it's a dilemma is, if the rate of learning is rapid, then why shouldn't the rate of forgetting have that same rate? Just like short term memory traces, it goes up fast, it goes down fast. That doesn't happen with learning. So. Well, maybe I could tell you a little bit about ART and then come back. So I think it's key that it can be derived from a thought experiment about how a system can learn to correct errors in a changing world that's filled with unexpected events. 
And it needs to do so using what I call only local operations that can be computed by computations that take place either within the cells of the network or by signals that travel along connections that exist. So everything just flows along by local interaction. Now I'm going to make a summary statement about what ART is and then I'll unpack it. And that's important because later if we get into backprop and deep learning, each of the things I'm going to say is not true about those algorithms. So ART's an explainable, self organizing and self stabilizing production system in a non stationary or changing world. So what do those words mean? So ART's explainable because its activities or short term memory traces select what I call critical feature patterns. These are the combinations of features to which we pay attention and which we select because they're going to predict effective outcomes and actions. And so we suppress the irrelevant stuff and we learn what's relevant and we embody only the relevant data in our long term memory, in our memories. And that's both in bottom up adaptive filters that go from features to recognition categories which compress those critical features, and then the categories read out learned top down expectations which also focus on the critical features. ART is self organizing because it can autonomously, all by itself, learn using arbitrary combinations of unsupervised or supervised learning. Unsupervised learning means stuff just happens in the world. And ART classifies it based on similarity relationships that it discovers in the data. You don't need an external teacher. In general, in the world, supervision in ART is the world itself. You make a prediction and it doesn't happen. ART is self stabilizing because its attention mechanism prevents catastrophic forgetting. I never worry that because I learned your face I'm going to forget 20 faces of people I know and love. 
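The summary above (bottom up filters that pick a category, learned top down expectations, learning only on a good match, and no catastrophic forgetting) can be made concrete with a toy example. The following is a minimal sketch loosely in the style of ART 1 for binary feature vectors; it is not Grossberg's own code or any published implementation. The class name `MiniART`, the `vigilance` and `alpha` parameters, and the intersection-based fast learning rule are all assumptions made for this illustration, and the match threshold (vigilance) is a concept that comes up later in the episode rather than in this passage.

```python
import numpy as np


class MiniART:
    """Toy category learner for binary feature vectors (illustrative only)."""

    def __init__(self, vigilance=0.7, alpha=1e-3):
        self.vigilance = vigilance  # assumed match threshold in [0, 1]
        self.alpha = alpha          # small tie-breaking constant (assumption)
        self.prototypes = []        # one learned top-down expectation per category

    def present(self, pattern):
        """Return the index of the category that resonates with `pattern`."""
        x = np.asarray(pattern, dtype=bool)
        # Bottom-up filter: score each category by its overlap with the input.
        scores = [np.sum(x & w) / (self.alpha + np.sum(w)) for w in self.prototypes]
        # Hypothesis testing: try categories from best to worst score.
        for j in np.argsort(scores)[::-1]:
            w = self.prototypes[j]
            # Top-down match: fraction of input features the expectation confirms.
            match = np.sum(x & w) / max(np.sum(x), 1)
            if match >= self.vigilance:
                # Resonance: learn only now, keeping just the matched features.
                self.prototypes[j] = x & w
                return int(j)
            # Mismatch: this category is reset and the next one is tested.
        # Nothing matched well enough, so commit a new category.
        self.prototypes.append(x.copy())
        return len(self.prototypes) - 1


net = MiniART(vigilance=0.7)
face_a = [1, 1, 1, 0, 0, 0]
face_b = [0, 0, 0, 1, 1, 1]
first = net.present(face_a)    # creates category 0
second = net.present(face_b)   # poor match with category 0, so category 1 is created
again = net.present(face_a)    # category 0 is still intact after learning face_b
print(first, second, again)    # prints: 0 1 0
```

Learning the second pattern leaves the first category's prototype untouched, which is the toy version of learning a new face without forgetting an old one. Raising `vigilance` in this sketch makes categories narrower, and lowering it makes them more general.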
[01:16:33] Speaker B: This is one of the huge problems with deep learning these days, which maybe we can get into later. [01:16:37] Speaker A: Oh yes, it's built into the guts of it. All the things I'm saying are problems for deep learning. Not explainable either. If it makes a prediction, you don't know why, and therefore you can never use it to make a prediction about something important, like predicting a financial decision or a medical decision. You can't do it, I mean, you'd be sued, you'd lose everything. And ART is a production system, which is terribly important because it uses hypothesis testing to discover and learn rules. And it does it with this top down matching process from the categories to the features; it focuses attention. And by doing a search through this hypothesis testing, not only can you do that, but you can interpret what's in the critical feature patterns as fuzzy if-then rules which tell you what combinations of features in what numerical ranges will make an effective prediction. It's totally explainable. You know exactly why. If you're a doctor, and Gail Carpenter has used ART for medical database prediction with hospitals, because they know what the reasons are. A doctor can look at the reason and say, hey, this is nonsense. Or he can say, hey, you know, I hadn't thought of that. Let me think more about that. So it would only be an aid. All of these things should just be an aid to a human expert. But it can give complementary information. And attention in ART is regulated by what's called the ART matching rule. [01:18:29] Speaker B: Yeah, let's focus on the matching rule and match based learning. Yeah, if you could describe it, that'd be great. [01:18:35] Speaker A: Yeah, sure. So what's called the ART matching rule regulates object attention. In ART, it's how you pay attention to the features that you would use to say oh that's an apple, oh that's a pear, oh that's a banana. 
And not irrelevant features that you'd never incorporate into your prediction. And what happens is that the top down learned expectations contain both top down learned excitatory signals, those are the critical feature patterns, balanced against inhibitory signals. So you have excitatory balanced against inhibitory. So you can sensitize or prime the cells that correspond to the critical features to get them ready for a bottom up input that may or may not match them. [01:19:32] Speaker B: Am I right in thinking that the inhibitory balance is spread among all of the bottom up? [01:19:38] Speaker A: It's much broader. So that even though where the prototype comes down to the features, they're going to be modulated and a little subliminally, subthreshold, activated, all around it is a trough of deep inhibition. Yeah, so you're just not paying attention to that irrelevant garbage. You're looking for something. So the features in a bottom up input pattern that match this top down expectation can be selected, which might be just a subset of features, and we say they're gain amplified, they're brought into a good dynamic range, and they're synchronized so they're all firing together. So it's a bound state and that leads to an attentional focus on what's important in what you're looking at. And when the bottom up and the top down signals match well enough, there's positive feedback from the selected features which send signals to the category, excitatory signals, which sends excitatory signals back down to the selected attended features. And this closes a positive feedback loop between the selected features and the chosen recognition category. And that loop, when it cycles, vroom, vroom, vroom, it creates what I call a feature category resonance. And it's that resonant state that triggers new learning in the bottom up adaptive filter from features to categories and in the top down learned expectation. And that's what helps to solve the stability plasticity dilemma. 
Because the suppressed features, the outliers, can't cause catastrophic forgetting. And you'll only learn if you have a good enough match to be in a resonant state. And moreover, I didn't realize it at the time because I didn't know all that much about consciousness, but I gradually realized it more and more. I think in my 1980 paper I already sort of realized it, but not in a systematic way, that a feature category resonance is what supports conscious recognition of attended visual objects and events. So it's recognition, it's not seeing. It's knowing, not seeing. Seeing and knowing interact, but they're not the same. [01:22:19] Speaker B: Yeah. Should we just go ahead and talk about consciousness for a minute and the relation of resonance? I think this is a really important and interesting idea. So you're going to tease this out in a moment, I hope. But your idea is that consciousness is this resonant state, so there are these resonances that give rise to it, not all of them. So all conscious states are resonant states, but not vice versa. But you have these resonances that are occurring and it actually limits what we. Let me see if I can phrase this better. It improves our behavior by limiting our access to what our brain really knows. So we see a very limited amount. So the example you often give is when there are two objects and one is in front of the other. Well, our brain can basically see through the front object and know that there's a whole object behind it. But if that's all we had access to, if we had access to all that, we wouldn't be able to reach out and grab the objects. Well, we need to be able to see that one object is in front of the other. [01:23:32] Speaker A: What you said isn't quite on track because of the way you use the word see. [01:23:39] Speaker B: Okay. [01:23:39] Speaker A: Yeah. And the feature category resonance is about knowing and we haven't talked about seeing yet. 
The study of vision has been a big part of my life for 40-some years. But I just said that a feature category resonance can support conscious recognition of visual objects. Now, I should emphatically say that I never tried to study consciousness, right? I just tried to study how our brains learn to attend, recognize, and predict objects and events as the world changes. And yeah, I would do this in many databases, over and over and over. But what then became incredibly clear was that I was explaining parametric properties about psychological data that corresponded to conscious experiences. So I was studying consciousness without knowing it. But the question was, what was the state that brought that information into conscious awareness? So, in particular, you talked about seeing a lot, but I've now classified at least six different functionally distinct kinds of resonances that support different aspects of conscious experience. And a surface shroud resonance supports conscious seeing of visual qualia. And to be able to talk about a surface shroud resonance, first you have to ask what, from a perceptual viewpoint, is a surface. And that's a deep thing because, for example, you can fill in surface color, and there is also figure ground separation. I'll give you an example. Let's say I'm drawing on a sheet of paper. I draw a vertical rectangle, and then on the left side of the rectangle, I draw a little perpendicular rectangle that touches its vertical left side. And on the right side of the rectangle, I draw a little perpendicular rectangle that touches its right side. And the pairs of sides of the two little rectangles are collinear. If you connected the dots, they would run right into each other. But you don't connect the dots. Rather you just have the vertical rectangle and the two horizontal rectangles touching its sides. Well, when you look at that, that's what you see. 
You see three rectangles, but the first one looks like it's a little closer than the two on the side. But you know that they've completed into a long rectangle that's behind the vertical one. Well, that's totally weird. That's how I got into 3D vision. How can something be behind something else when it's a picture on a piece of paper that's 2D? It's not 3D. How can you have 3D percepts from 2D images? So we know that the horizontal bar's been continued, as we say, amodally behind the vertical bar in front of it. And if you think about it, it already shows that this is not classical geometry of any kind. What is the dimension? 2D? 3D? How can you know stuff without seeing it? What does it mean to amodally complete boundaries and surfaces? There's a very thorough theory now that's explained and predicted a ton of stuff that I developed over several decades with wonderfully gifted PhD students and postdocs. Let me just then say there are surface shroud resonances for conscious seeing of visual qualia. There are stream shroud resonances for conscious hearing of auditory qualia. There are item list resonances for conscious recognition of speech and language. And there are cognitive emotional resonances for conscious feelings and recognizing what those feelings are. And all of them can synchronize with each other so we can see, hear, feel, and know things about the world in unified moments of conscious experience. Like you and I are doing right now. We've been resonating in a synchronous way all over our brains. [01:29:14] Speaker B: I wonder if we should bring in the idea of complementary computing and then talk about the what and where streams and their relation to consciousness? Because it's a fascinating idea that the what stream is conscious and the where stream is not. And I'd love to unpack that. So complementary computing is another one of the big ideas that you have had over the years? [01:29:40] Speaker A: Well, I don't even call it an idea. It's a paradigm. 
[01:29:43] Speaker B: A paradigm, okay. [01:29:44] Speaker A: It's a paradigm which embodies the nature of brain specialization. So it's a paradigm I gradually recognized when lots of my models exhibited computationally complementary properties. I can give you an example of what that means. But intuitively, they're properties like a key fitting into a lock, or like fitting puzzle pieces together. It's very yin and yang. You need both the yin and the yang to have reality. So complementary computing really characterizes the nature of brain specialization. And it shows that the brain emphatically does not process information using what people have called independent modules. Which is what often happens even today in AI, where, you know, you'd have one module for color, one module for depth, one module for brightness, one module for texture, or one module for shading. And that can't work because if you look at just about any natural scene, every region of the scene will have overlaid color, brightness, texture, shading, depth. So for you to have independent modules, you would have to first have such a smart preprocessor that it could separate all those overlaid, I like to say, multiplexed properties of the scene into their separate channels. But if you had such a smart preprocessor, you wouldn't need the separate channels. The problem would be solved by reductio ad absurdum. So it just can't be how we work. So in fact, I haven't talked about the attentional and orienting systems of ART explicitly. But the attentional system is where ART learns new recognition categories. And if there's a big enough mismatch, so something unfamiliar or unexpected happens, it activates the orienting system. And by interacting with the attentional system, the orienting system helps the attentional system to reset itself and drive hypothesis testing or search for a better matching or novel category. 
That's when you learn something new. And you can easily see that the attentional and orienting systems of ART have complementary computational properties, like bottom up versus top down, learned versus unlearned; you can go through a list of properties. But there's a much bigger example of that, which you already mentioned, on a larger organizational scale. So the ventral cortical stream, which is also called the what stream because it controls perception and recognition of what's out there, has properties that are computationally complementary to those of the dorsal cortical stream, which is called the where stream because it controls where we are in space and how we act in space. So knowing and doing, if you like, are complementary in their computations. In particular, as in ART, the what stream carries out excitatory matching, meaning when you have a good enough match, you go into resonance, and match based learning, which means that as the resonance takes hold, you're going to learn the matched data. The where stream, in contrast, carries out inhibitory matching and mismatch based learning. So you could see on the surface, even in words, they're complementary, but even in equations, they're complementary. What do I mean by that? Well, let's say I want to move to a target position in space. I want to move my hand to touch an object in space, to reach for it and grab it. Well, I compare where I want to move, the target position, to the present position of my hand, and I compute the difference between the present position and the target position. Because that difference vector tells me the direction and distance I want my hand to move to grab that object in space. And you integrate the present position, that is, where your hand is. Now your hand starts moving to the target, and you're integrating the difference vector. Integrate, integrate, integrate until you are where you want to be. And then the difference vector is zero because you are where you want to be. 
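The reaching computation just described (subtract the present position from the target position, then keep integrating that difference vector until it is zero) can be illustrated with a small numerical sketch. This is not the published VAM model: the function name `reach`, the integration rate, the tolerance, and the 2D coordinates are all assumptions chosen only to show the difference vector shrinking to zero.

```python
import numpy as np

def reach(target, present, rate=0.2, tol=1e-3, max_steps=1000):
    """Move a present position toward a target by integrating the
    difference vector (target minus present) in small steps."""
    target = np.asarray(target, dtype=float)
    present = np.asarray(present, dtype=float).copy()
    steps = 0
    while steps < max_steps:
        # Direction and distance still left to move.
        difference = target - present
        if np.linalg.norm(difference) < tol:
            break  # the difference vector is (near) zero: we have arrived
        # Integrate a fraction of the difference vector into the position.
        present += rate * difference
        steps += 1
    return present, steps

final_position, steps_taken = reach(target=[3.0, 4.0], present=[0.0, 0.0])
print(final_position, steps_taken)
```

Note that the quantity driving the movement vanishes exactly when target and present position agree, which is the sense in which a match here yields zero activity rather than amplification.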
And because it's zero when you have a match, matching is inhibitory, it's suppressive. And if, when you reach, you got there and the difference vector weren't zero, it must mean that the signals from the target position, the present position, and the difference vector weren't properly calibrated. And so you use the difference vector then as an error signal to make sure that when you get there it is zero. And that's mismatch learning, using the difference vector as an error signal. So this kind of complementary learning I call a vector associative map, or VAM. So mismatch learning enables our spatial and motor circuits to continually adapt to our changing bodies throughout life. Because it doesn't satisfy the stability plasticity dilemma. It does experience catastrophic forgetting. Because as an 80 year old man, I don't want to have the motor controller I had when I was a five year old boy. So I've got to continually update it. But it happens in a regulated way. So in our brains we have a self stabilizing ART front end for perception, recognition and emotion, with excitatory matching, match based learning, and self stabilization, that controls a continually adapting VAM network for spatial orientation and action so our body can keep up. I go to the gym, my muscles have different gains, my controls have to adapt. My arms get longer as I go through puberty, the separation of my eyes changes as I get into young adulthood. I've got to keep continually adapting. So we need to distinguish the complementary laws of knowing versus doing. [01:36:44] Speaker B: So for the knowing, for the what pathway. Because of the match based learning, that pathway, those networks, can enter into a resonant state. And this is why you say we're conscious of objects and conscious of knowing. But because of the mismatch learning in the where pathway or the doing pathway, the dorsal stream, lots of different names for it. Because of that mismatch learning, that fundamentally cannot enter into a resonant state and. 
[01:37:14] Speaker A: Therefore it has inhibitory matching. [01:37:18] Speaker B: Inhibitory matching. [01:37:19] Speaker A: When you are where you want to be, you get zero. You can't resonate on zero. [01:37:23] Speaker B: So that's why we're not conscious of the doing. And since I've been reading about this, I'm trying to wrap my head around what the experience of being conscious of dorsal stream processing would even be like. [01:37:40] Speaker A: Well, we are aware of what our arms and legs and eyes do, but that isn't within the where stream. There is a strong what where interaction going on all the time. They're not independent modules, they're complementary processing streams that are interacting all the time. And it's because of match based learning. You know, match based learning avoids catastrophic forgetting. But then you say, if you only learn what you're matching, how do you ever learn anything new? And that's why you need the orienting system, the complementary orienting system, so that when something very different happens and creates a big enough mismatch, that mismatch activates the orienting system. You know, if I could use pictures, I could show you. And it drives the search for a new category in the attentional system. So you are required by match based learning to have complementarity of an attentional and an orienting system for dealing with the expected and unexpected. And then, of course, you need to ask, well, how unexpected is unexpected? You know, if I change a little bit in a familiar object, am I going to say that's not you? You know, okay, you didn't shave today, big deal. Okay, you grew a beard, big deal. But your eyes, your nose, your head shape. And so we need to have a way of dynamically tuning how good a match is good enough to resonate based on predictive success. And this is what vigilance is, vigilance control. 
And so because we have an orienting system, we can dynamically regulate vigilance to be able to do what I call minimax learning: learn the largest, most general categories possible while minimizing predictive error. So, for example, I want to be able to both learn a frontal view of your face, and also know that everyone has a face. A frontal view of your face uses high vigilance. It's a very restrictive matching. But the fact that everyone has a face is low vigilance, a very general category. So the system always starts with the lowest possible vigilance until a predictive error occurs. And that bumps vigilance up just above the analog match value between top down and bottom up signals, which is the minimum loss of generalization you can undergo to drive a search for a new or better matching category. And a simple example is, let's say I learned to recognize a letter E and I have an E category, and now I show you an F. And because E is the closest thing in my repertoire of categories to F, I might choose the E category. And then I'll predict, in an ARTMAP algorithm, I'll say E, and then my friend will say no, F, and the mismatch of E and F will bump up the vigilance to drive a search for a new category, which will henceforth be associated with the prediction F. So you have to have a dynamic regulation of generalization, and that can break down during various mental disorders. And I know quite a bit about vigilance control. In particular, with my PhD student, Max Versace, we showed how a mismatch in the nonspecific thalamus can activate a part of the brain called the nucleus basalis of Meynert, and that can trigger signals to layer five of laminar cortical models, which releases acetylcholine. And acetylcholine can weaken after hyperpolarization currents, and that'll increase vigilance. 
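The E versus F example, where a wrong prediction bumps vigilance just above the current match value and drives a search for a new category, can be sketched as a toy classifier. This is a loose illustration in the spirit of ARTMAP match tracking, not the published algorithm: the class name `TinyARTMAP`, the four "stroke" features used to encode the letters, the choice parameter `alpha`, and the increment `epsilon` are all assumptions made for the sketch.

```python
import numpy as np

class TinyARTMAP:
    """Toy binary category learner with match tracking (illustrative only)."""

    def __init__(self, baseline_vigilance=0.0, alpha=0.001, epsilon=1e-6):
        self.baseline = baseline_vigilance  # start as general as possible
        self.alpha = alpha                  # small choice parameter
        self.epsilon = epsilon              # "just above" the match value
        self.prototypes = []
        self.labels = []

    def _choice(self, pattern, j):
        w = self.prototypes[j]
        return (pattern & w).sum() / (self.alpha + w.sum())

    def predict(self, pattern):
        pattern = np.asarray(pattern, dtype=bool)
        best = max(range(len(self.prototypes)),
                   key=lambda j: self._choice(pattern, j))
        return self.labels[best]

    def train(self, pattern, label):
        pattern = np.asarray(pattern, dtype=bool)
        vigilance = self.baseline
        order = sorted(range(len(self.prototypes)),
                       key=lambda j: -self._choice(pattern, j))
        for j in order:
            match = (pattern & self.prototypes[j]).sum() / max(pattern.sum(), 1)
            if match < vigilance:
                continue  # mismatch: reset this category, keep searching
            if self.labels[j] == label:
                # Resonance with a correct prediction: refine the prototype.
                self.prototypes[j] = pattern & self.prototypes[j]
                return j
            # Predictive error: raise vigilance just above the match value.
            vigilance = match + self.epsilon
        # No existing category is good enough: recruit a new one.
        self.prototypes.append(pattern.copy())
        self.labels.append(label)
        return len(self.prototypes) - 1

# Hypothetical stroke features: [vertical, top bar, middle bar, bottom bar]
E, F = [1, 1, 1, 1], [1, 1, 1, 0]
net = TinyARTMAP()
net.train(E, "E")
print(net.predict(F))   # only an E category exists so far
net.train(F, "F")       # the error triggers match tracking and a new category
print(net.predict(F), net.predict(E))
```

The design choice worth noticing is that vigilance is never set globally high; it rises only as far as a specific predictive error requires, which is what keeps categories as general as the data allow.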
So because of that anatomy and physiology and biophysics of acetylcholine and the basal forebrain, or nucleus basalis, it's not a surprise that that part of the brain isn't working right in Alzheimer's, in autism, and so on. And had we not gotten that far with vigilance control, which we could only get with an orienting system, which we can only get through complementary computing, I wouldn't have a word to say about these mental disorders. You don't just say, oh, today I've got to understand Alzheimer's. Oh, yeah, yeah, go see a doctor. You know, I mean, that's not how it happens. [01:43:20] Speaker D: Yeah. [01:43:21] Speaker A: And that's what I mean when I say the theory thrusts you into things you never dreamed of, because of the emergent properties, because the way the ART circuit works under variable levels of vigilance control maps onto clinical symptoms when vigilance control isn't working right, which has given. [01:43:45] Speaker B: You just more and ever growing confidence that the principles underlying ART are the fundamental principles. [01:43:54] Speaker A: Indeed, there are two kinds of confidence, at least. One, you derive it in baby steps from a thought experiment about correcting errors using environmentally familiar facts. The facts are true, the logic is impeccable, it must be there. And then there's the amazing richness of explanations and predictions. And, you know, I was, for example, in 1999, already invited to give the opening keynote address at the International Congress on Schizophrenia Research. Me, a modeler, in 1999 at the biggest clinical. I mean, and it wasn't because, you know, I woke up a few days before saying, hey, I'm going to understand schizophrenia today. Yeah, yeah, tell me another lunatic story. I was driven into it the same way I was driven into Alzheimer's and autism. And in that case, it happened that a big center in Maryland for studying schizophrenics read the paper and said, hey, our people need to know this. 
And they got me invited to this international congress to give the opening keynote. [01:45:20] Speaker B: Interesting. [01:45:21] Speaker A: Yeah. So I'm always being thrust into new areas. And that's why, let me say it this way, a huge risk for any thinker is you'll hit a brick wall. Make a fundamental wrong step, you'll hit a brick wall. Our theories are always incomplete. How do we know whether something we can't explain is because the theory is wrong or because it's incomplete? For that you need to have a profound experimental intuition. That's my best gift. Seeing to the heart of the meaning of data. Everything else is a tool. That's the gift that I treasure. [01:46:12] Speaker B: That's like a magic trick. [01:46:13] Speaker A: Only when you don't have the gift. The magician doesn't think it's magic. [01:46:20] Speaker B: Well, that's right. There's no. [01:46:21] Speaker C: That's right. [01:46:22] Speaker B: There's no such thing as real magic. So. [01:46:24] Speaker A: And so what I was leading up to was, if your intuition isn't good enough, if you throw out part of the correct answer because you thought you didn't explain it for the wrong reason, you're screwed. Then you'll never get the answer. So the fact that I'm working 63 years later means that I never hit a brick wall. And that's because of the method, the cycle of incrementally building the way I have. Because brain evolution needs to achieve behavioral success. That is such a conservative method that I haven't hit a brick wall. And I am so grateful for that. [01:47:11] Speaker B: Yeah. [01:47:11] Speaker A: But also realize it's because of the method. Anyone who can pursue the method can keep going through their whole life. And I'm going to be dead soon, so I hope some other people get better at it. [01:47:27] Speaker B: So we don't have too much time left. I would just. I'd like to bring it back to consciousness. One question about that. 
And then I want to talk a little bit about deep learning and its relationship to your work. [01:47:39] Speaker A: Sure. [01:47:40] Speaker B: So you've said, and I mentioned before, that all conscious states are resonant states, but not conversely. [01:47:46] Speaker A: I should tell you a little about what that means. [01:47:48] Speaker B: Yeah. My question is, and maybe you can tell me a little bit about what that means. But my question is, what would be the difference between conscious and non conscious resonant states? [01:48:02] Speaker A: Okay. Well, first you might not be conscious because you don't have a resonance at all. [01:48:10] Speaker D: Right. [01:48:11] Speaker A: But, for example, a surface shroud resonance supports conscious seeing of visual objects. And critical to that, every resonance that supports consciously seen or heard qualia needs to include within the resonance feature detectors that can support the distinctions of the experienced event, whether it's colors or brightnesses or sounds or fear. You know, it could be an external feature like color. You have a feature detector for it. It can be an internal feature like zero temperature that you have an internal feature for. And if the resonance doesn't include those features, then it won't support conscious qualia of seeing, hearing, feeling. Okay, so you might say, well, why, when those feature detectors resonate, does color look red? I can't explain that. That is where it stops. You can have a theory, and we have a theory, where you can parametrically explain a ton of data about what we see, when we see, in terms of our responses, our judgments about the scene, event or the self. But the actual conscious quality of seeing, the theory, an equation, doesn't go there. And an example of what I mean, let's say a surface shroud resonance triggers conscious seeing. So first, why do you need resonance for consciousness? 
And that has to do with what I call hierarchical resolution of uncertainty. What it means is, if you look down on your photosensitive retina, you have your photosensitive fovea, which is a region where you have your highest acuity. And a little over on the retina, there's a big blind hole as big as the fovea. It's a big, big hole and it's blind because that's where the brain gathers signals from all the photo detectors in the retina into the optic nerve. So it goes up to the brain. So you have a huge blind hole and you don't see that. But if, let's say, there was something in the blind hole like a red A, you would never see it. You know, you want to complete over that. So there is a lot of uncertainty and ambiguity in what comes into our senses. And the brain needs multiple processing stages to complete perceptual and affective representations. And it's only when they're as complete as they can be, and as context sensitive as they can be, and as stable as they can be, that at that stage the brain will trigger a resonant state based on that complete representation. And in seeing, that state is in prestriate cortical area V4, which then resonates with posterior parietal cortex. That is the trigger of a surface shroud resonance. And once it resonates, V4 propagates down, right down to the lateral geniculate, using the ART matching rule to select only features, only data, that are compatible with the completed representation. And then it goes up to prefrontal cortex, so the whole cortical stack is resonating. But now let's say I cut out half of your parietal cortex. I wouldn't do such a thing. But you've had a problem with your circulation, or you had a terrible accident, like the famous case of a rod going through your body. [01:52:30] Speaker B: Phineas Gage. Yeah. [01:52:31] Speaker A: Yeah. Well then you might only see one half of the world. If you draw it, you'll draw half the world. 
If you dress yourself, you'll dress half your body. [01:52:44] Speaker B: Hemi neglect is the phenomenon you're referring to. [01:52:47] Speaker A: Exactly. And so the visual system is completely intact, but you don't have a surface shroud resonance. The shroud being the attentional cover from parietal cortex that resonates with the attended part of the surface. So that's a very vivid example of it. [01:53:10] Speaker C: Yeah. [01:53:10] Speaker B: And it's just an interesting idea that it's like consciousness. So below the resonance are all these murky, uncertain details that would hinder our behavior. And consciousness by leaving out those details is helping our behavior, which is a very interesting way. [01:53:25] Speaker A: Right. You don't want to trigger action. So if you used the earlier processing stages, it could lead to actions that would destroy you. That Darwinian selection would really take you out. We need to be able to select the data. [01:53:48] Speaker B: Yeah, it's just. [01:53:49] Speaker A: And that's why, for example, parietal cortex is doing at least two things. You have top down spatial attention from parietal cortex, let's say, to V4 to support the conscious surface shroud resonance. That's top down from parietal cortex. But you also have bottom up: with an attended representation, parietal cortex is localizing a target position in space and reading out a command to either look at it or reach it, or move to it. So there's a coordinated top down attention and bottom up intention to move at that interface. [01:54:37] Speaker B: Deep learning, Steve. All the cool kids are doing it. It has a lot of media excited and a lot of neuroscientists are excited. There are, however, lots of differences between adaptive resonance theory and deep learning, some of which we've already talked about. For instance, catastrophic forgetting. That's a big problem in deep learning. 
And in fact, in some of your papers, I think you list out 17 fundamental differences. If I have that number right. [01:55:08] Speaker A: That's right, yeah. [01:55:09] Speaker B: That ART can handle. [01:55:09] Speaker A: In 1988. [01:55:11] Speaker B: In 1988, yeah. [01:55:15] Speaker A: Yeah. So why don't I do a little comparison? [01:55:21] Speaker B: Sure. [01:55:22] Speaker A: So as I just mentioned, ART is an explainable, self organizing and self stabilizing production system in a non stationary or changing world. So as you've mentioned, deep learning uses the back propagation algorithm to learn. And some people say, which I think is really a vivid way to say it, that deep learning is just back propagation on steroids. [01:55:51] Speaker B: That's right, yeah. [01:55:52] Speaker A: And so first, deep learning is just a feed forward adaptive filter. Signals come from the input to the output. It's not explainable because all of its learning is hidden in its long term memory traces. There's no attention, there are no critical features explaining the basis of a prediction. In fact, there's no fast information processing by short term memory of any kind. It's also not self organizing because it needs a teacher to give the correct answer on each trial. It's not self stabilizing because it experiences catastrophic forgetting. On any learning trial, an unpredictable amount of what you've already learned can crash and burn. It's not a production system because it doesn't do hypothesis testing. There's no attentional and orienting system. Nor can it discover rules that predict the future, which could explain to you the basis of the prediction. And it works best in a stationary world whose probabilistic structure is more or less constant over time. But we can't overlook the fact that its learning uses a non local operation. It uses weight transport to physically carry weights from one part of the network to the other. It's non local and totally non biological. 
And people, I mean, I can't tell you how disturbing it is to read an interview with Hinton talking about, you know, the biological motivation for deep learning when he knows perfectly well it's not at all biological. This is why, because of, I'll leave out the adjectives, because of a kind of marketing that is deeply unscientific, people who can't take the time to figure out what's going on in the algorithm will think, oh my gosh, you know, they figured out how the brain works, and they'll cite it blindly. It is emphatically not. It's just an algorithm. Whereas ART is a principled theory about how brains make minds that can uniquely be derived from that thought experiment about how you correct errors. And because of that, that's not a personal opinion, that is a thought experiment, the foundation of truly mature theory in any discipline. You can expect ART in some form in all future autonomous agents. All future truly autonomous adaptive intelligent algorithms and agents should use ART in some form, because that's how you get autonomy, as the thought experiment, independent of any facts, points to. [01:59:06] Speaker B: So I want to ask you a little bit more about back propagation because the argument, okay, so there's this recent big push to essentially force backpropagation into the brain, to explain learning in brains using back propagation principles. And the argument goes that, well, it's such an efficient algorithm, but it's not, well, not efficient. Such a powerful and obvious algorithm. [01:59:36] Speaker A: It's not powerful. [01:59:37] Speaker B: We'll get there. Hang on. So you can train conv nets to recognize images, you know, at a better rate than humans, using back propagation, let's say. And the argument goes that it's because it works so well over a long period of time. 
In many training examples, the brain must be doing backpropagation or something like backpropagation, because evolution surely would have selected this particular algorithm, which we have found we can use to train systems, deep networks. [02:00:12] Speaker A: But I've just talked to you about evolution, and the thought experiment doesn't lead to backprop. And you don't have non local anything in a macroscopic physical system of any kind. But also the 17 problems of backprop that I listed in 1988, and that ART had already solved, have not been overcome by deep learning. And in fact, you know, in Axios, Hinton was interviewed in 2017. He said, well, we don't need all the labeled data. We've got to throw it all away. And if you look at item three in my list of 17 items, it's that ART doesn't need all the labeled data. And then it learns slowly. It doesn't learn how to recognize anything the way we do. I learned you in one trial. Deep learning can't do that. The only way you could do that is if you use fast learning. And what fast learning means is that you zero the error on each trial. Deep learning is wildly unstable if you do that. It must just slowly, slowly change the weights, never going to zero error. And so, sure, if you go through the database at blinding speed and you show the pictures of the cats hundreds and hundreds of times each. But that's not how we learn. I learn you in one trial and I'll remember your face for a very long time. And I'll remember it without catastrophic forgetting. [02:01:50] Speaker B: Yeah, right. [02:01:51] Speaker A: It's just, it's just so full. And you know, I have an article on my webpage that I have a feeling you may have seen on explainable AI contrasting deep learning and ART. [02:02:06] Speaker B: This is the 2020 Frontiers article I'll point people to. [02:02:10] Speaker A: Yeah, yeah. My new article. And I'll have a keynote address on it that I'm giving at an international conference on c. 
September 19th, and then that'll be on my web page. But that article makes clear that efforts to make backprop work are like the epicycles that early scientists used to try to make the Ptolemaic system of the solar system work. It didn't meet a lot of the properties you needed, in that case a lot of data. So people kept adding epicycles, but that was as futile as the Ptolemaic system was. And that's why you adopted the Copernican system. My work is the Copernican system for mind. Their work is at best the Ptolemaic system. So they can do all they want. But the problems of backprop are foundational, and ART doesn't need epicycles because its foundations are correct. [02:03:11] Speaker B: Do you have relations with people like Geoff Hinton and Yoshua Bengio, Yann LeCun, those folks? [02:03:17] Speaker A: Well, I haven't seen Geoff for a long time. I used to be invited regularly to UC San Diego to lecture in the 80s, for example, and maybe 90s. The decades run into each other when you're my age. And I knew, you know, McClelland, Rumelhart, all the people there. And also Geoff Hinton, I think, was maybe a postdoc. So I met Geoff, but that was most of my contact. I haven't gone recently to the NIPS meeting, where maybe Hinton goes more. It became a clique meeting, and I was invited to do the tutorial there some years ago and was happy to do so. But on the main floor of the meeting, the organizers were joking about how they ridiculed people who don't do exactly what they do. I was shocked. That's not a scientific meeting. It was just a clique, and I don't need that. There are a lot of meetings I love to go to, including Society for Neuroscience, Cognitive Neuroscience Society, International Neural Network Society, and vision meetings, you know, experimental psychology meetings. I go to a lot of meetings where people don't make fun of others who don't fit into their narrow, brittle paradigm. 
[02:04:57] Speaker B: You'd think that would be divorced from scientific inquiry, but it sure isn't. [02:05:02] Speaker A: Well, if you get together as a clique to do a concerted marketing campaign to push your product, buyer beware. It's just like Facebook. It's unmonitored. Beware. I throw down the gauntlet to them. The gauntlet of science. Show me the data you've explained. They don't explain anything. What are the psychological databases they're explaining? What are the physiological and biophysical databases? It's not science. It's an algorithm for technology. That's good. Use it. Please use it. But to say it's science, you've got to compete on the level of who has the most explanatory and predictive theory. They don't exist in that world. [02:06:01] Speaker B: So do you think it's a mistake for. So one of the things that is touted about deep learning, and I'm lumping everything into deep learning, all of the latest AI type backpropagation models, is that people say they're great because they perform tasks, right? And there are these huge benchmark tests that they pass. And what I'm getting from you is that maybe that's the wrong way to go about it because it's divorced from psychological reality. [02:06:30] Speaker A: Look, some of the earliest benchmarks we did were against statistical methods, backpropagation, machine learning, genetic algorithms. We blew them out of the water in accuracy and/or learning speed, you know, so they don't do comparative benchmarks, they just do marketing. [02:06:57] Speaker B: You and Francis Crick both get cited with, you know, the weight transport problem, when people say it used to be thought that back propagation is not biologically plausible. See Stephen Grossberg and Francis Crick, you know, early criticisms, but now we're thinking that, you know, and then they go on to talk about backpropagation in the brain. My question though is in 20 years. [02:07:19] Speaker C: You'll be celebrating your 100th birthday. 
Will we still be using backpropagation at all? [02:07:27] Speaker A: If there are applications where backpropagation will be the best tool to use in 20 years, then use it. If not, don't use it. But be sure that, you know, if you need it for something it doesn't do, you look around to see if there's something that'll do it. And it's a purely pragmatic choice because backprop is not a theory of how brains work. So performance relative to other available algorithms is the only rational criterion for using it. And if they don't do comparative benchmarks, then they're just whistling in the wind. Then it's just marketing and self-promotion. And it's not science. If they want to say it's how the brain works, I welcome them to explain a fraction of what I've explained in a principled way. They haven't even tried. They don't even begin. [02:08:25] Speaker B: So there's a lot of excitement right now about, you know, deep learning type things, leading to principles and new theories about how brains might work. And on the other side there you could say, well, we're never going to get to, let's say, general AI without understanding how brains work. And so to go through in the theoretical and experimental neuroscience the way that you have approached the problems and that is the true path to artificial general intelligence, whatever that actually. [02:08:57] Speaker A: Well, I don't want to get too religious about this, but if by intelligence you want algorithms that have various of the capabilities that we feel are really special in humans, you'll never do it with deep learning. And the algorithms that my colleagues and I've developed over the years have for decades been used for large scale applications. Because they do give you that kind of answer. So would I like it if even more people use them? Sure, because I think they would greatly benefit from them. 
But you know, in the neural networks community, everyone knows who I am and it's just a question of whether they need what we offer for a particular application and whether they have time to do it. But you know, so many of these things are warped by the political marketing campaign. Just look at what's going on in politics in the United States, how malignant false marketing can be. It can be malignant in science too, if you don't respect the fundamental principles of truth and justice and science. And it's one thing, you know, you said I was shy. Well, one reason I was shy, I'd look out and see a world of such beauty and such mystery that I, how could I hope to understand any of it? A deeply religious, reverential feeling that I still have, even though I've done a lot of work that clarified things for me. I don't think these people look out into the world at all. Because if they looked out in the world and faced the awesome harmony of it, they couldn't talk the way they do. They're just marketers. There's no spirituality there. I'm sorry, it's a different ballgame. It's a different galaxy. Explain data, please, and I'll read it. But you know, people who might be mystified by, oh, deep learning, mathematics, mathematical equations. I'm a professor of mathematics. I read that stuff like it's child's play. There's no mystery, there's no hiding. Don't build your world on illiteracy. [02:11:48] Speaker B: Are there psychological phenomena that have just befuddled ART and you? Because it does account for so much that one struggles to think of anything it might not account for. [02:12:04] Speaker A: Well, no science is ever complete and especially like, you know, the whole issue about emergent properties. Even if you've derived a model at a certain level of sophistication, you can be surprised by its emergent properties. And I often have. If, for example, in terms of Alzheimer's and autism, I wasn't expecting that. 
I knew how vigilance worked about acetylcholine and stuff. In a search for something, I started seeing all these articles about that kind of process. And I thought, oh, my God. And then I could see how the symptoms could happen. So to say. Was I ever surprised by the implications of models I'd already derived? The answer is yes, because of emergent properties. And that, I think, is important for people to realize, you know, the classical method of science. You do induction, right? I mean, induction. You know, boom, boom, boom. No. And so part of the mystery about brain modeling, a profound mystery, is how a theorist can anticipate emergent properties of a model that he or she is deriving before it's derived. And that's happened to me multiple times. And it's really affective. I see it, I feel it, but I don't yet know it. [02:13:49] Speaker B: Right. [02:13:50] Speaker D: You know, that's the intuition, and that's. [02:13:52] Speaker A: Where the intuition is. So you know where you're heading, but you're not sure what it'll be like when you get there. [02:13:59] Speaker B: Ah, beautiful. [02:14:00] Speaker A: And that's very exciting. [02:14:02] Speaker B: All right. That's a great place to leave it. Well, Steve, I'm glad that you're part of the universe and the galaxy and our world. And I appreciate you spending the time with me and also just your vast contributions to the field. I hope that by this going out, even more people become acquainted with your work and start to use it. [02:14:22] Speaker A: Thank you very much. [02:14:37] Speaker B: Brain Inspired is a production of me and you. I don't do advertisements. You can support the show through Patreon for a trifling amount and get access to the full versions of all the episodes, plus bonus episodes that focus more on the cultural side, but still have science. Go to BrainInspired Co and find the red Patreon button there. To get in touch with me, email Paul at BrainInspired Co. 
The music you hear is by The New Year. Thank you for your support. See you next time.
