Episode Transcript
[00:00:03] Speaker A: I always probably had a bias, maybe a wrong bias if you look back, about trying to understand a new approach in terms of theories before doing, you know, applications or demonstrations. The theory that we and many others are developing is much richer than I could have expected 10 years ago. There is much more to be done. Yeah, so it's much less trivial. There are a lot of interesting aspects of it, and some are quite deep. So, yeah, I find it very exciting. You know, in terms of intelligences that are better than us, it may take longer than many people think.
Right. So cortex is what I think is probably compositional, and maybe what we can simulate in computers more easily, and other parts of the brain maybe not. This would be kind of ironic: that, you know, the simpler parts, so to speak, the more ancient parts of the brain, are the ones that would potentially be more difficult to simulate.
[00:01:31] Speaker B: This is brain inspired, powered by the transmitter. Hey everyone, thanks for being here.
I am not going to be able to do justice in an introduction to the career of my guest today, Tommaso Poggio. As far as titles, I'm just going to read from the website, because it's a lot. So Tommaso is the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences. He's an investigator at the McGovern Institute for Brain Research. He's a member of the MIT Computer Science and Artificial Intelligence Laboratory, otherwise known as CSAIL.
And he's the director of both the Center for Biological and Computational Learning at MIT and the Center for Brains, Minds and Machines. And Tommaso has been publishing since before I was born, and I am no spring chicken. On Google Scholar, the first listed work is from 1972, and that publication is in Kybernetik, I'm not sure, it's probably a German pronunciation. The title of it is Holographic Aspects of Temporal Memory and Optomotor Responses.
So suffice it to say, he has been at this for a long time. And what is this that he has been at?
It is studying the theory, the principles, of intelligence. And theory is the key word here. So Tommy is super interested in the theoretical principles underlying intelligence, and to study those things he studies both artificial intelligence and the way that brains work. So we have really cool AI; we've had really cool AI for some time. Of course it has progressed from not really cool to now really cool and really great, and of course it's getting better, but we still don't know how it works, essentially. There have been some driving theoretical principles that began artificial intelligence, but this modern rise in artificial intelligence was driven really from engineering, from building it, not from understanding it. Tommaso likens our current situation in artificial intelligence, between engineering and theory, to the time between when Volta engineered the first battery, when lots of applications were produced harnessing electricity and being able to use it, versus years later, when Maxwell's equations really brought out the theory of electromagnetism. And because of that theory, we were able to go on and develop lots of new and better things in the electromagnetic space, like computers and modern artificial intelligence. So in that case it was many years. Of course, information, as you'll hear Tommy talk about, was traveling much more slowly during that time, by horse, etc. But it took many years from when the battery was built, when the engineering component of these systems was developed, to when we actually understood why and how it worked. So he thinks that we're in a time now like that time then. And he has been, and continues to be, working on the theoretical principles that drive what we understand about how intelligence works. So in this episode today, we talk about some of the principles that he's been working on, principles that he has found to be theoretically important if you want an efficiently computable system of functions that, when put together, results in a generalizable, efficient computational system that could underlie intelligent behavior. To give names to those principles, one is sparse compositionality.
[00:05:17] Speaker A: There's.
[00:05:17] Speaker B: This is the idea that if you want to efficiently compute an intelligent behavior, it needs to be composed of many fairly simple functions that are simple in the facet that each function that composes this collection of functions, each function itself is fairly simple, meaning it takes a few variables to learn that function. And when you have that kind of system, you put them together, it theoretically guarantees that your system is going to be more generalizable. And it turns out this is a principled reason why you actually need the depth of deep networks for them to function the way that they do. So lots and lots of repeated, simple, basic functions put together sounds a little bit like the neocortex of the brain, which we also discuss whether these principles apply only to artificial intelligence and deep learning, for example, or whether they also apply to our wet brains.
So we talk about his development of these kinds of theories and the reasons why he does what he does. Mostly, I enjoyed Tommaso sharing some of his many experiences over decades of working with interesting people on interesting problems, which he continues to do. I link to his autobiography in the show notes; it's available publicly through his website.
But if you read through his autobiography, and we talk about a few of these people today, it's basically a who's who of, well, many important names throughout the modern history of studying intelligence and theory and scientific principles in general.
Anyway, I also link in the show notes to the specific paper on compositional sparsity that we discuss today, and other things, like a series of blog posts that Tommy and others are working on to communicate these ideas to the public. Those show notes are at braininspired.co, podcast 229. Like I said, during our conversation we discuss a small handful of the projects that Tommy has worked on over the many years that he's been in this business, but this is really just scratching the surface of what he has done and continues to do. I hope you enjoy this discussion. Here's Tommaso.

So, normal scientists experience these ebbs and flows of optimism and pessimism throughout their research careers, especially early on. Maybe that's the key, I'm not sure. But you know, ebbs and flows about their own ability to make progress, and ebbs and flows about the optimism in their field as a whole. So, just as an example, when you introduced learning as sort of a fourth level to the levels of analysis framework that you and David Marr developed way back when, I would imagine that you were feeling optimism that this would sort of unlock a new level, no pun intended, and sort of speed things up, if everyone realized, yes, this is what we need to focus on.
But it also seems, looking at your work and the way that you go about what you do, like you're sort of a steady-marching-forward-under-all-circumstances kind of person. So, you know, are you abnormal? Are you normal in that regard? Do you feel the ebbs and flows?
[00:08:38] Speaker A: I feel the ebbs and flows, you know. You're absolutely right. There are big ones, you know, that may take months or sometimes years, and there are small ones, kind of day-to-day optimism or pessimism. I can prove this theorem. I proved it. No, I was wrong.
These kinds of things. Now, when I introduced the fourth level, it was kind of looking back.
So it was quite a few years after I decided that learning was, you know, important.
Actually, I think my first paper on machine learning, kind of nonlinear learning, was back in '81 or so. But at the time I decided there were other problems that one would want to look at, like human vision and stereopsis, you know, how can we see in 3D, before getting into learning. So my career in learning was a bit delayed; I was doing these other things for about 10 years before coming back to learning. Yep.
[00:10:02] Speaker B: But is that, is that because learning seemed more daunting or the other problems more interesting?
[00:10:07] Speaker A: Okay, no, I think the other problems were lower-hanging fruit and learning was more daunting. Definitely. Yes.
You know, I always probably had a bias, maybe a wrong bias if you look back, about trying to understand a new approach in terms of theories before doing, you know, applications or demonstrations.
You know, this is a question of taste. Other people prefer to try something out; if it works, then perhaps develop a theory, or perhaps not at all. Like Geoffrey Hinton is more of a not at all, but.
But I was the opposite.
And sometimes that was holding back what I, you know, could have done. But that's the way my brain operates. So it was only in 1990, when I had a framework for machine learning, a theoretical one, that I started to apply learning to all kinds of problems: computer vision, computer graphics, detection of cancer in gene arrays, text classification, autonomous driving, basically everything that people do these days. Yeah, I did it with the networks of the time, which were basically shallow networks, radial basis functions and kernel methods, in the 90s.
[00:11:50] Speaker B: Well, those were more difficult because there was lower compute, smaller networks. But they're in some sense more principled, no?
[00:11:57] Speaker A: Yes, yes. So exactly.
We wrote a paper with a great collaborator, Federico Girosi, in 1990, which was about the theory of these shallow networks. It was basically a theory about kernel machines before the term kernel machines was actually invented. And then, based on that theory, I kind of felt free to apply it to problems like genetics and vision and graphics and so on, as I said.
[00:12:31] Speaker B: But it was only once you had the theory that you became free. That's your.
[00:12:37] Speaker A: Okay, yeah, that's exactly right. That's exactly right. And in a sense, I regret this, because maybe, you know, one of the lessons that I think I learned afterwards is this story about Volta.
So it's a kind of metaphor, not to be taken too literally, but you know, as they say, history does not repeat itself, but it sometimes rhymes.
Right. So this analogy between Volta and electricity is interesting in itself. I don't think many people realize that until 1800, which was only about 220 years ago.
This was the time of Napoleon.
Information traveled at the speed of a horse. Yeah. Never faster in human history.
There are these wonderful letters where people wrote to each other about the fall of Constantinople, the big event in the Christian world. I think it was 1453 or something like this.
In Vienna they wrote to each other, in Paris they wrote to each other, in Madrid they wrote to each other: did you hear, Constantinople has fallen to the Turks? So we have precise dates of when the information arrived. And it was exactly three weeks to Vienna, four to Paris, five to Madrid. Exactly a horse running 24 hours.
[00:14:19] Speaker B: In good weather.
[00:14:19] Speaker A: Yeah, in good weather, yes.
So Volta 1800 Napoleon.
Until then, electricity was just sparks and lightning.
And Volta invents the first source of continuous electricity.
Actually, I have one here. Let me just show you the battery, right?
[00:14:48] Speaker B: Or the pile.
[00:14:50] Speaker A: This is a faithful copy of the original pile.
[00:14:56] Speaker B: Oh, that's cool.
[00:14:57] Speaker A: It was given to me for the bicentennial of Alessandro Volta's invention, in the year 2000.
And at that time I had lunch with Countess Volta.
She was the great-great-granddaughter of Alessandro Volta.
[00:15:16] Speaker B: And she brought you that.
How did you get that? Did she bring that to you?
[00:15:22] Speaker A: Well, the University of Pavia made a big celebration and gave four honorary degrees, Laurea Honoris Causa. I was one of the four, and I got this one. So there was a special event for the 200 years.
And at the time they opened a museum centered around what happened in Pavia after Volta.
And so there was this battery, the original one, and then many others bigger that Volta developed.
And afterwards, it looked like a museum of a Silicon Valley of electricity.
There were tons of startups for generators, electrical motors, electrical lighting and so on.
And one of them, by the way, was Einstein and Einstein.
This was the father and uncle of Albert Einstein, who moved from Ulm, in southern Germany, to Pavia to make a startup.
And their startup eventually went broke.
Before going broke, they competed in the competition to illuminate the streets of Munich.
And the competitor was Siemens.
[00:16:57] Speaker B: Okay.
[00:16:58] Speaker A: They lost the bid to Siemens.
So you can imagine an alternative history in which, you know, instead of Siemens, there is Einstein and Einstein, and maybe we don't have relativity theory.
[00:17:12] Speaker B: I know, that's what I was thinking. Yeah. He could have been like some privileged kid that never needed to struggle to figure things out.
[00:17:19] Speaker A: Yeah.
So anyway, this is, you know, back to the main topic.
The story was that immediately after Volta invented this continuous source of electricity, for the first time people, meaning scientists, could study electricity.
Right. And so it was really an avalanche of discoveries. Electrochemistry was done in the next 15, 20 years.
Then there was the discovery of the laws of electricity, like Ohm and Ampère, and eventually Faraday, who invented the electrical generator and electrical motors. Oersted, I think, discovered the connection between electricity and magnetism.
And then it culminated, of course, in 1864, when Maxwell came up with the four equations of electromagnetism. A theory of electromagnetism.
[00:18:32] Speaker B: Yeah. He developed a theory.
[00:18:33] Speaker A: Yeah. So it took, you know, 60 years to do this. By horse time, yes, but a long time. And in the meantime, until Maxwell, people really did not know what electricity was.
Right. But this was not an obstacle, even without a theory, to developing great applications, like the electrical motors and the electrical generators and all those things.
So this is the kind of lesson. I feel in artificial intelligence, you know, we are still between Volta and Maxwell.
I don't know exactly where. That's a difficult question.
[00:19:21] Speaker B: Yeah, we're faster than horses now. But so, I've heard you tell that analogy, and I've also heard you, in a different breath, voice your concern, or the possibility, that maybe we don't need a Maxwell of AI, maybe we don't need the theory, even though that's what you're working on. So how do you reconcile those two?
I can't imagine you actually believe that. It's almost as if you're. Are you sort of admitting something you don't believe or.
[00:19:55] Speaker A: Yeah, I think I admit something I don't believe. I hope we need a theory and there will be a theory.
How complete? I don't know.
I'm almost sure it will not be four equations.
It will be more something like, you know, principles of intelligence like we have in molecular biology.
You know, we don't really have equations, but we have some basic principles. For instance, how biological information copies and reproduces itself via the double helix.
That's a beautiful principle. I imagine things like that, that are fundamental but may not give a complete theory in the sense of Maxwell's electromagnetism.
Now this is what I hope.
You know, there is always the possibility that, in a sense, machine learning, LLMs or their successors, will develop the theory instead of us, and that we may not be able to understand it.
[00:21:20] Speaker B: Oh, well, okay, okay. So it's interesting that you mentioned principles because. So I was just in conversation with Alex Meyer, who's a.
A neurophysiologist, but he has been enamored recently with integrated information theory as a potential explanation for consciousness. And the reason that he's enamored with it is that it is embedded in a mathematical formalization, which gives the possibility of developing laws, mathematical laws of consciousness in this case, and that's what's satisfying. And yes, evolution and molecular biology and DNA, those are principles, but they're not like natural laws. And somehow we, as scientists, seem most satisfied when we can, I was going to say reduce, when we can formalize relations with these natural laws. Is that the kind of thing that you're after with theories of learning and machine learning theory in general?
[00:22:28] Speaker A: I think so. They are more like principles, and I think mathematical principles. Sparsity and compositionality is.
[00:22:39] Speaker B: What we're going to discuss. But you have to prove theorems to say things concretely about them, which you're in the business of doing.
Yeah, but so is that different from a formal mathematical law? Are those principles or laws?
[00:23:01] Speaker A: Well, there are these interesting principles, like sparse compositionality, which we can speak about later. We can prove that it is a consequence of something like a function, or the ability to do a task, being computable by a Turing machine. Yeah, you can prove that. Now the question is.
So first of all, this implies that everything that is running on a computer like ChatGPT or so is compositionally sparse because it runs on a computer. But it does not necessarily imply that everything that our brain does is compositionally sparse because we don't know if we can reproduce in a machine everything that our brain does.
Most people believe it, but.
But do you?
[00:24:02] Speaker B: Well.
[00:24:05] Speaker A: Not completely, but we can discuss that later. This condition of computability is efficient computability, which simply means a computer should be able to compute it in a time that is not the age of the universe or something like that. Right. In reasonable time.
So it could be. Let's put it another way. There are physical processes, like chaotic systems, you know, like how the weather forms and develops, that are very likely not efficiently Turing computable.
[00:24:49] Speaker B: Right.
[00:24:49] Speaker A: Simply because, in order to keep a window of predictivity that is constant as you go forward in time, you would need to increase the precision of your measurements exponentially.
Right. So, you know, it's computable, but not efficiently Turing computable. And so there is a window there which, by the way, may connect to this question about consciousness.
It may be that consciousness is not Turing computable in the same sense that we cannot compute with arbitrary precision the weather, you know, say three days from now.
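To put that point in standard dynamical-systems terms (an editor's sketch, not from the episode): for a chaotic system with largest Lyapunov exponent $\lambda > 0$, a small initial measurement error grows roughly as

$$\delta(t) \approx \delta_0 \, e^{\lambda t},$$

so keeping the prediction error below a fixed $\varepsilon$ out to horizon $T$ requires initial precision $\delta_0 \lesssim \varepsilon \, e^{-\lambda T}$. The required bits of measurement precision grow linearly in $T$, and the measurement resolution exponentially, which is the sense in which the weather is computable but not efficiently so.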
[00:25:36] Speaker B: I can't imagine that it is Turing computable.
Well, you know, one of Alex's points is that what he wants is this isomorphism between a mathematical structure and properties of, like, phenomenal consciousness, the qualia, et cetera. And he distinguishes it from cognition, because cognition is a function. Everything that we do in AI, everything that we do with neural networks, it's all a function, not a mathematical isomorphism. And so there's a huge difference. There's.
[00:26:08] Speaker A: Yeah, in my point of view, you have these functions that are essentially decomposable and computable by a computer, and other functions that are too complicated to be computed in a reasonable time.
[00:26:29] Speaker B: And learning itself is a function. I mean, it's safe to say that, since you popped learning in as a fourth level. Has that been your passion ever since? I mean, you've been working on it ever since. But what I really want to ask you is how your thoughts about learning have developed over time. Is there anything that you used to think that you now don't think, or has it been that steady march forward that I see, that you actually actuate?
[00:27:02] Speaker A: Well, I always thought, I think that learning was really the door to intelligence.
I think what changed, you know, first of all, was the fact that for a long time I tried to preach to my friends in the computer science department that learning was really important.
And at MIT they started to listen only, I would say, around 2010 or so.
[00:27:47] Speaker B: Why would they not listen to that? What was their hang up?
[00:27:49] Speaker A: It's an interesting point.
If you think about it, it's kind of logical. You know, the basic paradigm, the scientific or research approach in computer science departments from their inception, 1950 or so, was programming algorithms.
[00:28:13] Speaker B: Algorithmic programming.
[00:28:14] Speaker A: Yeah, yeah. You tell a computer what to do. They may be very sophisticated things you tell it to do, but you tell the computer what to do. And your function as a researcher is to write a smart program.
Right. Okay, then this was until 2000, 2010.
Then if you look now, computer science is completely transformed. Everything is machine learning. It used to be compilers, computer languages, robotics, computer vision, natural language; they were all separate silos. Now it's all machine learning.
And it's funny that I started to say this, that machine learning was going to be the lingua franca of computer science, back in, I don't know, 1990.
But it took a long time.
It reminds me of what happened with email. I remember in the 80s we were using email at MIT, and I was a consultant at this quite interesting little company called Thinking Machines.
Huh.
Thinking Machines produced the Connection Machine supercomputer, with a million very simple processors.
Anyway, I was there as a, how do they call it, corporate fellow.
Okay. And another corporate fellow, just to give you the idea, was Richard Feynman, for instance. Your buddy.
Right.
And Steve Wolfram, another one. Oh, wow. And a few other very interesting people.
So at the time, you know, it was obvious to me that email was the way to go.
But it took another 15 years before people stopped using fax machines.
[00:30:32] Speaker B: Well, I still have to. Just last month I had to fax something, and I could not understand why.
But. Yeah, yeah, yeah, yeah.
[00:30:40] Speaker A: You know what I'm saying?
[00:30:41] Speaker B: Yeah, yeah, of course.
[00:30:43] Speaker A: I basically gave up on the hope that email was going to take over. And then it came, of course.
[00:30:50] Speaker B: Yeah, but by then you're onto Slack and other.
[00:30:53] Speaker A: Yeah, yep.
[00:30:56] Speaker B: But neural nets were around, and people in the neural network community, the PDP community, you know, they were preaching learning for a long time. There was the problem of multi-layer learning and, you know, the whole backpropagation problem: it's slow, it's not efficient.
And I know that changed in 2012, when ImageNet was solved with much lower error.
But it's not like it didn't exist.
[00:31:24] Speaker A: No, it did exist.
I was a skeptic, and in a sense I was wrong. About what? Neural networks. You know, I was essentially using shallow neural networks instead of deep ones, because basically until, I don't know, 2010 or so, roughly 2008, shallow networks were working as well as deep networks.
That's another subject to discuss: how important technologies are for ideas.
You know, we often think we develop theories and algorithms on our own, but existing technology, what is possible and easy versus what is difficult to do, really shapes a lot of the ideas and the algorithms.
[00:32:24] Speaker B: Yeah, Yann LeCun makes that point too. Throughout history there are so many examples of that.
[00:32:28] Speaker A: Yeah, exactly. Yeah. You know, I was in a self driving car by Mercedes in Stuttgart in 1999 or something like this.
[00:32:39] Speaker B: Really?
[00:32:40] Speaker A: Yeah. And it was driving self driving in narrow streets in the center of downtown Stuttgart.
Of course there was a driver with his hands very close to the steering wheel, but, you know, the trunk was full of computers. And I remember there was a three-day workshop about autonomous driving, invitees only. And the last half day was lawyers. After that workshop, the management of Daimler-Benz Mercedes decided: no autonomous driving. Let's kill it.
[00:33:27] Speaker B: Oh really?
[00:33:28] Speaker A: Yeah.
[00:33:28] Speaker B: Oh, I was gonna say two things. One, I bet Fukushima's Neocognitron was not part of the calculations in that driving car.
[00:33:38] Speaker A: Well, no, there was something like Fukushima's, because this was basically what we were doing. Yeah. You know, for instance, we had trained a system to detect pedestrians using as many as 200 examples, which is nothing these days.
And it was working, you know, from the scientific point of view, pretty well.
From the practical point of view, it was giving, I think, about three errors every 10 seconds.
[00:34:16] Speaker B: Okay, yeah, okay.
[00:34:17] Speaker A: It was very low in terms of number of frames, much less than an error per frame, but, you know, it was obviously not usable in any real sense.
[00:34:30] Speaker B: Bottom line. Tommy, how many people did you kill that day?
[00:34:36] Speaker A: No real people.
[00:34:43] Speaker B: I thought you were going to say that at the end they promised we would have autonomous self-driving cars in five years. Because every promise is in five years, right? But you said they said no. Maybe it was the lawyers, really.
[00:34:55] Speaker A: They killed the project inside Daimler anyway. Yeah. Which was a pity, because they were at the forefront at the time, but it was kind of too early.
[00:35:05] Speaker B: So that was in 1999 that you said that.
[00:35:08] Speaker A: Something like that. Yeah, maybe '97 or so.
[00:35:12] Speaker B: So, I mean, I read your autobiography, and of course I already knew a lot about a lot of your work. And I'll link to it in the show notes. But you write in your autobiography that you started working on object recognition, which is like detecting people in that case, in the early 80s, I think. And you actually doubted, so this kind of relates to the learning in neural networks thing, you doubted that the Hubel and Wiesel kind of simple and complex cells could be hierarchically composed into objects.
And you sort of admit that you were wrong.
And then you got onto object recognition in these hierarchical structures and worked on HMAX, et cetera. But how were you thinking about learning in those days, during that time?
[00:36:05] Speaker A: Yeah, learning at that time was really just one layer, learning at the output. So there was processing by these hierarchical systems.
The features were learned in a very simple way, essentially just grabbing pieces of images at random.
And the real learning, in terms of learning the weights of a classifier, was just the last layer of the network.
[00:36:41] Speaker B: I see.
[00:36:42] Speaker A: And the reason is that I did not really believe that backpropagation could be biological.
[00:36:53] Speaker B: You were right.
[00:36:55] Speaker A: And I was, in a sense, right, but I was wrong in the sense of not using it in machine learning. Right, yeah.
And so there I was, stopped by this biological constraint.
Right. But now we think we have ideas, a model that seems plausible from the neuroscience point of view. I don't know if it's correct or not, this will require experiments, but there is at least a fighting chance.
It's not really backpropagation, but it's something like a general form of gradient descent, which can be quite naturally implemented by neurons in a bit of a magical way, because of self-assembly of the connectivity. Oh yeah. So, yeah, so, you know, maybe.
Because I think that's an interesting, key problem in neuroscience that could in principle, if solved, really establish a deep connection between neuroscience and machine learning: if we could find the equivalent of backpropagation in the brain. Because then we could look at the circuits, the synaptic motifs, and say, oh, this is where this happens.
[00:38:32] Speaker B: This is an aside and I'm jumping around. But I mean, of course you just alluded to some of your own work on these self-organizing, biologically plausible plasticity kinds of networks. And there have been other suggestions of biologically plausible versions that kind of replicate what backpropagation does. There have been multiple proofs of principle, and they've had varying degrees of success in terms of emulating backpropagation. Yeah, but how?
I was reading one of your recent papers, and the language is so thick with deep learning theory jargon that I thought, oh my God. I know a little bit, like I know what a manifold is and stuff, but then you get into the technical terminology and I think, oh, I feel kind of lost.
And you're really embedded in that world. How much of your time is spent thinking in terms of machine learning versus in terms of biological learning? How much headspace is devoted to each of those, if they're separate?
[00:39:43] Speaker A: Right. I think for a long time it has been 50-50.
[00:39:48] Speaker B: Okay.
[00:39:50] Speaker A: I think in the last five years or so I've been probably tilting a bit more towards the artificial network because.
[00:40:01] Speaker B: Because the data is there to. To test and not so much.
[00:40:04] Speaker A: No, it's because I've been really puzzled by the need of a theory.
Yeah.
And so, again, now, meaning in the last couple of years, it seems to me that I found some principles.
These are by no means the most important ones, but maybe some of the principles that seem to be important for artificial machine learning.
[00:40:45] Speaker B: Let's talk about them, let's talk about them now.
So I mean, compositional sparsity is.
I know. Is that the central principle that you're focused on, right.
[00:40:54] Speaker A: It is one, yeah. You know, to me, it did solve a question that was a block.
Again, coming back to what I mentioned before, I kind of need to have at least a glimpse of theoretical understanding of what's going on.
And back then, I think we wrote a review paper, probably 2003 or so, for the American Mathematical Society, with a very famous mathematician, Steve Smale, about machine learning.
And there we described the theory, quite nice and quite complete, of basically shallow networks, kernel machines and so on.
And then I had in the discussion various paragraphs about this puzzle, why we seem to have a theory that does not need deep, you know, multiple layers.
Whereas what we know about physiology, for instance in visual cortex, seems to suggest there are multiple layers that are important.
And so I was kind of asking about this puzzle, why.
And so I was a bit stuck there before I could really apply deep networks.
And I think this sparse compositionality is the answer to this and to other similar puzzles.
[00:42:54] Speaker B: How did you.
I can imagine a scenario where you sort of discovered that through training deep networks and you look at their representations and see the properties of them.
But I can also imagine a scenario where you sort of approach it from a more theoretical, principled lens and think what features would matter. So how did. How did that come about?
[00:43:14] Speaker A: Yeah, it came out more in the second way.
It came out as an answer to a related question of why convolutional networks seem to be so much better than dense networks.
And in convolutional networks, as in visual cortex, you have units that essentially look only at a small set of inputs, not all the inputs. Yeah, right. For instance, you have a lot of photoreceptors, but each unit in the first layer only looks at a small set of them.
[00:43:58] Speaker B: A little local patch.
[00:43:59] Speaker A: Local patch, exactly.
Yeah.
And so the question came up: suppose I have a function of many variables, in this particular case say 8 variables, you know, x1, x2, x3, x4, up to x8.
But now suppose that this function has this particular structure.
So it's a function of functions of functions.
So I have a function of two variables, x1 and x2, another function of other two variables, x3 and x4.
And then you have a function that takes the outputs of those two functions, and so on. So you have essentially a binary tree, where you have eight nodes as inputs, and every other node is a function of two variables.
Okay. And the question was, this is kind of a toy version of a convolutional network.
Well, the convolution is not really the important part, the fact that the weights are the same under translation.
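To make the structure concrete, here is a minimal Python sketch (an editor's illustration, not code from the episode) of the toy function being described: 8 inputs, a binary tree of constituent functions, each depending on only 2 variables. The particular constituent function g is an arbitrary stand-in.

```python
# A compositionally sparse function of 8 variables: a binary tree of
# constituent functions, each depending on only 2 variables.

def g(a, b):
    # A hypothetical 2-variable constituent function; any smooth
    # function of two arguments would illustrate the point.
    return (a * b + a + b) / 3.0

def f8(x1, x2, x3, x4, x5, x6, x7, x8):
    # First layer: four 2-variable functions on disjoint pairs of inputs.
    h1, h2 = g(x1, x2), g(x3, x4)
    h3, h4 = g(x5, x6), g(x7, x8)
    # Second layer: two 2-variable functions of the outputs below.
    k1, k2 = g(h1, h2), g(h3, h4)
    # Root: one 2-variable function, giving a single output.
    return g(k1, k2)

print(f8(1, 2, 3, 4, 5, 6, 7, 8))
```

Every node sees only two values from the layer below, the way each first-layer unit of a convolutional network sees only a small patch of the image; reusing the same g at every node loosely mirrors the shared weights, though, as noted, the locality is the important part.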
And so it turns out that when you have a function of 8 variables in general, you have the so-called curse of dimensionality.
In other words, to approximate it, you need a number of parameters that typically can be as bad as exponential in the number of variables, which is really bad.
[00:45:48] Speaker B: Independent. Right. If they're not. If they're not highly correlated, et cetera, that's the worst case scenario. But. Yeah, yeah, right.
[00:45:54] Speaker A: Yeah, right. And you know, smoothness of the function can correct that, but basically the problem recurs. You can see that, for instance, if you have a function of a thousand variables, which is not much, because a small image, 32 by 32 pixels, has about a thousand pixels. For a function of a thousand pixels, if you assume an error in the approximation of 10%, you may need 10 to the thousand parameters. Now 10 to the thousand is a huge number, because 10 to the 80 is the number of protons in the universe.
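For the flavor of the underlying mathematics (an editor's summary of bounds of the kind proved in Poggio and colleagues' work on deep versus shallow networks; the exact constants and smoothness assumptions are in the papers): approximating a generic function of $n$ variables with smoothness $m$ to accuracy $\varepsilon$ can require on the order of

$$N_{\text{shallow}} = O\!\left(\varepsilon^{-n/m}\right)$$

units for a shallow network, while a deep network matching a binary-tree compositional structure whose constituents each take 2 variables needs only about

$$N_{\text{deep}} = O\!\left((n-1)\,\varepsilon^{-2/m}\right).$$

With $n = 1000$, $m = 1$ and $\varepsilon = 0.1$, the first is the $10^{1000}$ mentioned above, while the second is on the order of $10^5$.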
[00:46:41] Speaker B: I knew you're going with either electrons or protons or. Yeah, so one of those. It's always the metric. And that's a bad sign if it takes more than the protons in the universe.
[00:46:53] Speaker A: Yeah, right, right.
But it turns out that if the function is, as I said, a function of functions, it is kind of.
Originally we called it hierarchically local, but the better term is sparse.
It's a composition of sparse functions, meaning functions each one of which depends on a small number of variables.
[00:47:22] Speaker B: So is sparse here serving a.
Is there a precision to the term sparse? Is sparse less than 3? Or is it just a sort of directional attitude?
[00:47:35] Speaker A: It's directional, but because you get into exponential losses, I would say that sparse means less than 40 binary variables.
[00:47:49] Speaker B: Oh, okay, okay.
[00:47:50] Speaker A: Or 14 non binary ones.
[00:47:54] Speaker B: So this is obviously sparse. This is like very sparse then.
Yeah, so the sparsity. Okay, so go ahead.
So you have functions of functions. That's the compositionality part and the hierarchical part. And maybe you can differentiate hierarchical versus compositionality here.
[00:48:15] Speaker A: Not really. I think they're two words for the same thing.
I think composition is a better term, because you're composing functions. It's a function of functions of functions.
And also it's a term that, you know, comes up a lot, for instance in the compositionality of language.
The idea that you can create bigger things, bigger meanings out of simple parts.
And you know, Chomsky said that, and Humboldt also said it: essentially the capability of getting infinitely complex things out of simple parts. That's one of the powers of language.
But it turns out that this is a property of every function that can be computed.
[00:49:21] Speaker B: Necessarily.
[00:49:22] Speaker A: Yeah, necessarily.
[00:49:24] Speaker B: So is. So it seems. Okay, so I could see how this.
Where's the bottleneck here? Where's the trick? Is the. Is the trick the functions themselves?
I'm thinking of this in evolutionary terms, right? Like, how did evolution discover which functions can efficiently interact with other functions in this sparse composition? It seems like a fragile system, and we know the system is robust. So it seems like you have to get. The trick is getting the functions right and you still have to learn the functions.
[00:49:58] Speaker A: Well, okay, but it's interesting. This is an interesting, I'm not sure whether conflict or dividing line, between classical mathematics and computer science.
In classical mathematics, you define function spaces. Typically they have properties like different types of smoothness, you know, admitting a certain number of derivatives and so on.
In computer science, you build every function out of a small number of primitives.
You know, you start with AND, OR, NOT, and you build everything out of those simple things by composing.
It's a fundamental operation in computer science.
And so it's very natural for computer scientists to see that compositionality has to be the property of every function that can be computed.
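As a minimal sketch of that point (an editor's illustration, not from the episode): XOR is not one of the primitives, but it is a short composition of them.

```python
# Building a new Boolean function purely by composing AND, OR, NOT.

def AND(a, b):
    return a and b

def OR(a, b):
    return a or b

def NOT(a):
    return not a

def XOR(a, b):
    # XOR as a composition of the three primitives:
    # (a AND NOT b) OR (NOT a AND b)
    return OR(AND(a, NOT(b)), AND(NOT(a), b))

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", XOR(a, b))
```

Every computable function can be built up this way, which is why compositionality looks so natural from the computer science side.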
[00:51:10] Speaker B: I see.
[00:51:11] Speaker A: For a mathematician it is a bit more difficult, because that's not.
[00:51:17] Speaker B: I forget, we're in computer science land.
Everything's Boolean, back to Boolean, right? So.
[00:51:23] Speaker A: Yeah, but, you know. Yes, yes. That is in fact one of the essays I wrote for a collection of 28 essays that is kind of part of this new theoretical framework: do real numbers exist?
Because there is exactly this question.
In order to have Turing computability, you essentially want to describe every number, in the end, in terms of Boolean variables, perhaps a very long string of zeros and ones.
So real numbers do not really exist in computer science.
And in fact, if you look at the foundations of mathematics, the continuum hypothesis, which is at the basis of the real numbers, is not strictly needed for the fundamental mathematics. You don't lose too much if you give up real numbers. And among the real numbers, some are computable, like say the number pi or e, but many real numbers are uncomputable. So essentially they are like poetry, you know, it's not.
[00:52:52] Speaker B: Useless, in other words, absolutely useless.
[00:52:55] Speaker A: You cannot do experiments. You cannot do anything.
[00:53:00] Speaker B: Okay, well, so sorry. So then where were we? So we have a sparse set of compositional structures. And you proved. And what did you prove?
[00:53:12] Speaker A: So every function that is efficiently computable, basically computable by a Turing machine in time non-exponential in the number of variables, is compositionally sparse. In other words, it can be decomposed into a composition of functions, each one of which is sparse, meaning it depends on a small number of variables.
And you know, these decompositions are non-unique. There are many, many of them for any given function. The most extreme one you can think of would correspond to a very deep decomposition into the most simple elementary functions. It's really the composition in terms of the basic operations AND, OR, NOT.
You know, I can always translate a Turing machine program into a mathematically equivalent Boolean function.
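Stated informally (an editor's paraphrase of the result as described in this conversation; the paper linked in the show notes has the precise statement and assumptions):

$$f \text{ efficiently Turing computable} \;\Longrightarrow\; f \text{ is a composition of constituent functions } g_i, \text{ each with } O(1) \text{ inputs},$$

where the composition forms a directed acyclic graph of constituents (the binary tree from earlier is the simplest case), the decomposition is not unique, and the deepest decomposition bottoms out in AND, OR, NOT.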
[00:54:25] Speaker B: So then the idea is that learning the set of Boolean operations that would comprise any given function, in this compositional hierarchy of sparsely connected functions, is not terribly expensive from a learning perspective.
[00:54:43] Speaker A: Yes. If I have input-output data for each one of the constituent functions, each one of them is easily learnable.
The analogy is imagine a multi layer network.
Typically you have input data for the network and the output of the whole network.
Training with those may be difficult.
But if I had intermediate data, the input and the output for each one of the constituent functions, each one of the layers, I could easily learn each of the functions and then, of course, the whole function.
And this is, by the way, one of the reasons why transformers work.
It's one of them.
You know, the magic of transformers is that they're trained in an autoregressive framework.
It's not trained by giving it a sequence of words and then, you know, the last word in the book or the last sentence in the book. It's trained by giving it a sequence of words and then the next word, and then again. Right, right. And so in most cases it's almost like just being trained on one of these constituent functions.
[00:56:21] Speaker B: Oh, I see.
Okay.
[00:56:24] Speaker A: So then I can of course predict the next word, and then, using this whole sequence, predict the next word. Yeah, right.
[00:56:34] Speaker B: You predict the word and it becomes part of the corpus from which you predict the next word.
[00:56:37] Speaker A: Yeah, all right.
Which.
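A minimal sketch of the autoregressive setup being described (an editor's illustration; the token list is made up):

```python
# Autoregressive training data: every prefix of a sequence becomes one
# (context -> next token) example, so the model is only ever trained on
# the local one-step "constituent" function, never on jumping from the
# first word straight to the last.

tokens = ["the", "theory", "of", "learning", "is", "rich"]

training_pairs = [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]

for context, target in training_pairs:
    print(context, "->", target)

# At inference time the same one-step function is applied repeatedly:
# each predicted token is appended to the context and becomes part of
# the input for predicting the next one, as Speaker B notes above.
```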
[00:56:41] Speaker B: Okay, so I've kind of two.
Well, let's stick with the, let's stick with the machine learning version here.
So.
Well, let's talk about generalizability. I mean, I know that you're interested in how this relates to generalizability. What can we say about.
Given that, if you're going to do it in a deep network, or rather, the advantage of doing it in a deep network is that you can do it with these sparse compositional structures.
Yes, most machine learning tasks are very narrow, right? And there's this continual learning problem; it's hard. Once you've trained a model on a certain task, you have to relearn things to train it on another task. And so there's this generalization, which is sort of, what is it, a pot of gold at the end of the rainbow for AI, one of the pots of gold.
So what does this have to do with generalization, this kind of structure?
[00:57:42] Speaker A: So this kind of structure is important for the whole framework. The approach to machine learning that has been, you know, kind of the main one for everybody is essentially the following.
I have an unknown function.
You know, let's take an example.
ImageNet.
I want to classify images into the thousand ImageNet classes.
So I have a function that wants to map, for images 200 by 200, so about 40,000 variables, into one of the thousand classes.
But I have only a training set. I don't know the function. I have input data, the images, and outputs with the correct class. I have many such examples in my training set.
Now the framework is I want to approximate this function using a very powerful general tool.
It turns out that this principle of sparse compositionality says the very general tool you should use is a deep network. The deep is important, because every function, assuming it is computable, can be represented as a compositional function. There is mathematics underneath it, but that's the basic message.
And those results give you a guarantee. It says, if you have a network that has multiple layers, assuming that you can do the optimization.
What you should do is tune the parameters. They're like many knobs, 100,000 knobs, that I have to adjust so that my network imitates what I know about the function on the training set. You have to turn the knobs so that it will do the same as the training set, classify the training set correctly.
[01:00:24] Speaker B: Well, Frank, Frank Rosenblatt physically did turn knobs, but I know what you mean. The modern day.
[01:00:29] Speaker A: That's right, yes. And here the theorems that we have say you will not have an infinite or an exponential number of knobs, because we know the function is compositionally sparse. Your network will need a non-exponential number of parameters.
And the guarantee is very powerful: to approximate every function.
Once you have that, this means also that you have generalization.
Essentially there is a trade-off. Again, this is some mathematics, but basically it says that if you can represent a function with a relatively small number of knobs, then you will also be generalizing.
[01:01:25] Speaker B: Oh, okay.
[01:01:27] Speaker A: And if you were using a technique which has an infinite number of parameters, or a very, very large number.
There would be two problems. First of all, you cannot deal with very large numbers, you know, 10 to the thousand parameters. But second, you will not have generalization. You will only fit the data.
[01:01:51] Speaker B: Essentially overfit the data, with too many parameters, if you're using an over-parameterized model.
[01:01:56] Speaker A: Right? That's right.
This is tricky, and it kind of requires a more in-depth conversation about what over-parameterization means. Because neural networks today are over-parameterized, in the sense that they often have more parameters than training data. Yes, but the point is that without this guarantee from sparse compositionality, the number of parameters would be so much larger that it would be really impossible.
[01:02:36] Speaker B: Okay, so am I thinking about this correctly? It dawned on me: should I be thinking of functions of functions as somewhere in between good old-fashioned symbolic AI, where you have these kinds of modules that are talking to each other, and the low level, where every single neuron is like a single logic gate and it's just a composition of them all talking together? These are almost like clusters of Boolean functions.
[01:03:12] Speaker A: It's a bit like this.
I mean, the best way to think about it is probably to think back to the binary tree.
So you have inputs from the leaves and one output. So the binary tree goes up, reduces in width.
I think it's like having units in the first layer of say visual cortex looking at different patches in the image.
And then you have units in the layer above, which look at the outputs of these first-layer units.
[01:03:57] Speaker B: But the layer above has half the number, right? It's a lower dimension than.
[01:04:02] Speaker A: The layer below and so on. Yes, and so it's like having neurons that look at the neurons below and send outputs to one neuron above.
So you get more and more.
Which, by the way, is more or less the architecture in visual cortex, where you have small receptive fields in V1, and then bigger ones in V2 and V4, and still bigger in IT.
[01:04:42] Speaker B: Well, okay, I was going to ask. I mean, you said more or less, which I agree with, more or less. But I was also going to ask how you think this theoretical result matters or applies to thinking about wet brains.
[01:05:02] Speaker A: Yeah. First of all, I'm not sure. That's an open question.
[01:05:08] Speaker B: I know you care.
[01:05:10] Speaker A: No, I do care. I just mean that mathematically I can tell you that sparse compositionality has to be true for, you know, things like ChatGPT and all similar systems, because they run on computers, and everything I can simulate on a computer has to have this property.
As I told you, I don't know about the human brain.
My guess is that there are some aspects of what our brain does, like language and mathematics and other things, that seem compositional. But other things, maybe kind of the older brain, the midbrain, you know, brain structures from our fish ancestors like the basal ganglia, maybe there is less modularity, less compositionality.
Could be.
[01:06:18] Speaker B: I mean, yeah, if the basal ganglia is just a gain modulator, then you don't need compositionality, for example. Maybe not.
[01:06:24] Speaker A: Maybe you cannot really efficiently simulate it. This is a bit of science fiction; it's out there. I'm not claiming this. I personally believe it's not likely that it cannot be described by a computer program. But there is a possibility.
[01:06:43] Speaker B: The jury is out. Yeah, right.
But you probably think of it more as cortex, because, you know, all intelligent AI is about cortex.
[01:06:56] Speaker A: Right, right. So cortex is what I think is probably compositional, and maybe what we can simulate in computers more easily, and older parts of the brain maybe not. This would be kind of ironic: that, you know, the simpler parts, so to speak, the more ancient parts of the brain, are the ones that would potentially be more difficult to simulate.
[01:07:23] Speaker B: So you don't know of like any evidence across species or anything that would corroborate that this is occurring in brains, right?
[01:07:32] Speaker A: No.
[01:07:33] Speaker B: So I wanted to ask you anyway about this balance between theory and experiment. The success of physics has depended on experimenters dialoguing with theorists, that back and forth. And in this case, you're the theory person.
Do you go looking for experimental evidence? Do you try to convince someone, hey, I need this kind of data from you? Or, hey, look at my theory, is this in the brain? How would you go about doing that?
[01:08:09] Speaker A: Yes, I've always done that in my, in my career, probably less in the last few years.
But yeah, I cannot forget the excitement when I made a theoretical prediction, a pretty simple one, about behavior in the fly, and then the experiment was done and it turned out it was correct.
[01:08:37] Speaker B: Oh, my God. How did that feel? It must have felt. Oh, yeah, that's one of those things. A lot of theorists, right, they have this feeling like, oh, here's the theory, it has to be correct because it is theoretically correct. So there's such confidence in the correctness of it already. But then maybe to see it actually come to fruition is something else.
[01:09:01] Speaker A: Right, exactly. Yeah, it's funny. There are different experiences.
One, of course, is to prove a theorem. You're, you know, quite happy with yourself.
[01:09:12] Speaker B: I never felt that one. Probably never will.
[01:09:17] Speaker A: I did. I'm not a mathematician, not a good mathematician, but sometimes I proved something and it was exciting. But the excitement of having an experiment confirm the theory, that's really something else.
[01:09:38] Speaker B: Yeah.
Do you see?
So this is related. I have a lot of questions to ask you, and I'm kind of folding them in as I see an opportunity along the conversation here. But who is more in need of deep learning theory, do you think: machine learning engineers trying to build good AI, or neuroscientists trying to explain how brains work?
[01:10:06] Speaker A: Well, for sure, if you ask.
I'm pretty confident that if you ask leading researchers at, say, OpenAI, they will say: we don't need theory.
[01:10:23] Speaker B: How does that make you feel?
[01:10:28] Speaker A: You know, I've become accustomed to it, I guess.
[01:10:33] Speaker B: But you have a track record where you can say, well, you'll see in 20 years, you know.
[01:10:39] Speaker A: Yeah, well, you know, you never know whether history will repeat itself. And this is a very special case, where we're working with intelligence itself.
And so I'm always afraid of, you know, maybe theory is dead forever. Right?
I don't think it is, but.
I don't have a good argument for why theory should be important in the future, as it has been in the case of electricity, say.
My argument is a bit like Blaise Pascal, many, many years ago.
It's a kind of wager, the Pascal wager, you know. Yeah, he said it's better.
It's more rational to bet that God exists and behave accordingly.
[01:11:37] Speaker B: Because.
[01:11:40] Speaker A: If you behave as if it does not exist, and you are wrong and it does exist, the loss is infinite. Right.
Hell forever, for instance.
[01:11:52] Speaker B: But see, that was an eternal bet. Whereas you have this long track record of success and accomplishments where I would imagine you might be a little more confident than Pascal.
[01:12:07] Speaker A: Yes. But my main argument is really about what it's better to bet on. There is no point in betting that there is a superintelligence that is going to take over from us in a short number of years, like three or five. It's much better to bet that there is a pretty long time for us to collaborate with machines and, you know, improve our intelligence and what we can do together.
And there are quite a few years before AGI will take over, if ever, if it exists.
[01:12:54] Speaker B: I don't think AGI is a thing, but that's a separate conversation. But yes, but it sure makes a hell of a lot of money to claim it is.
[01:13:05] Speaker A: Yeah. Which, by the way, is a bit dangerous also from the economic point of view. Because I must say, a big surprise in my career was not so much ImageNet.
You know, this was when deep networks had such good success with classification of the ImageNet database. It was 2012.
They were, of course, 20% better than previous techniques, which is a lot. But, you know, okay, this was not so surprising.
But what really surprised me was 2017. It took me a bit more time to realize it, but with transformers and eventually ChatGPT, that was a big surprise. And I'm still really impressed by how powerful LLMs are, even if they're not like one of us.
[01:14:22] Speaker B: Oh yeah, yeah.
[01:14:23] Speaker A: They're certainly still intelligent from the point of view of the Turing test.
You know, I think this is the first time in the history of mankind where we have not only our intelligence, but also other intelligences. They're different, but it's a wonderful situation for us, to study what is common and what is different. It's a little bit like the fact that by studying the genomes of different animals, you know, of Drosophila, the genome of C. elegans, we have learned so much about our own genes and what they do.
And I think studying these different intelligences, it's probably also going to be very powerful to understand our own intelligence.
[01:15:17] Speaker B: Did you have. I feel like, early on with the large language models, I thought, oh, well, it's just another advancement. It's different, but it's going to be like when people got excited about recurrent neural networks for a while, or got excited about LSTMs.
And by the way, every time a new technology, a new model, hits the market with big promises, a large swath of neuroscience says: ah yes, now the brain is a Boltzmann machine. Ah yes, the brain is a convolutional neural network. And right now it's: ah yes, it is a large language model. What the hell? Why are we so gullible?
[01:15:58] Speaker A: Yeah, I mean, at the time of Descartes, the brain was just a hydraulic machine.
[01:16:06] Speaker B: Yeah, that's right.
[01:16:08] Speaker A: Absolutely correct. Yes.
[01:16:10] Speaker B: That speaks poorly of us.
I'm a neuroscientist. It's kind of embarrassing. But did you have that same sort of feeling? I think I'm admitting to mine. I kind of think that of every new advancement, and then some of them surprise me, like large language models. Whoa. I did not see this coming.
[01:16:32] Speaker A: Yeah, but like many others, I don't think I realized that power until ChatGPT came out.
You know, the ability to converse with something. It's the first time in our history. Right.
[01:16:51] Speaker B: But it's also so interesting how easily, personally, me, but I assume everyone, has kind of integrated it. It's not like it's this super foreign thing. It's super easy.
Yeah, that's part of the impressive part, I guess.
[01:17:06] Speaker A: And, you know, also at some intuitive level, you learn what you can get out of it, how much you can trust it, and how to manipulate it.
[01:17:20] Speaker B: Right. Well, we all have different skills in that regard too, because I think a lot of people are a little more naive and gullible. But as a tool...
[01:17:31] Speaker A: Wow.
[01:17:32] Speaker B: It's amazing. Yes. But, you know, I'm embarrassed for myself, for neuroscientists and such. But back to that question from a few minutes ago: whether it's more important for neuroscientists or machine learning people to have a deep learning theory. Do neuroscientists need that?
[01:17:53] Speaker A: Yeah, I think yes. Because, exactly like you mentioned, it makes no sense to think that our brain is a transformer.
[01:18:09] Speaker B: Absolutely no sense.
[01:18:11] Speaker A: Right. But if we understand the principles on which transformers are based, the basic ones, those same principles could be used by the brain, potentially in a quite different form.
They could; they don't need to be. But notice I say the principles rather than, you know, the particular implementation.
Right. It could be something completely different from a transformer, but exploiting, for instance, compositional sparsity in a similar way, along with the autoregressive framework.
Then, you know, I think one could at least ask a more reasonable question: whether a similar principle is used or not.
That's what I think. On the other hand.
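[Note for readers: a minimal sketch of what compositional sparsity means, since it comes up repeatedly. The claim, hedged and simplified, is that an efficiently learnable function of many variables can be written as a composition of constituent functions that each depend on only a few inputs. The binary-tree structure and the particular functions below are illustrative assumptions, not notation from Poggio's papers.]

```python
import numpy as np

# Compositional sparsity, toy version: a function of 8 variables built
# as a binary tree of constituent functions, each depending on only 2
# inputs. Each node is low-dimensional, hence cheap to approximate or
# learn; the composition still covers all 8 inputs.

def h(a, b):
    # An arbitrary smooth two-variable constituent function.
    return np.tanh(a + 2.0 * b)

def f(x):
    # f: R^8 -> R, composed entirely of 2-variable pieces.
    assert len(x) == 8
    l1 = [h(x[0], x[1]), h(x[2], x[3]), h(x[4], x[5]), h(x[6], x[7])]
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
    return h(l2[0], l2[1])

print(f(np.arange(8) / 8.0))
```

[The point of the sketch: approximating a generic 8-variable function is exponentially expensive in 8 dimensions, while each 2-variable constituent is only exponentially expensive in 2, which is what makes the composition efficiently learnable.]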
[01:19:09] Speaker B: I'm sorry to interrupt. I was going to say on the other hand, it is somewhat like the story, you know, why are you looking for your keys there? Because that's where the light's on, whatever that cartoon is or whatever.
Where does compositional sparsity sit in the levels framework? It's not an algorithm, it's a principle. Where do principles sit?
[01:19:37] Speaker A: That's an interesting question. I think it probably sits in the theory of learning.
[01:19:43] Speaker B: That's a big basket to sit in, right?
[01:19:46] Speaker A: I mean, yeah, it's a big basket. Yeah.
[01:19:52] Speaker B: Going back to...
Okay, so thinking about the usefulness of theory, and your claim, or what you believe, that we may be between Volta and Maxwell: you talked about all the interesting things that happened after the battery was invented, all the applications. But you did not talk about what happened post-Maxwell. Was that a game changer, or did we actually need Maxwell? Would things have progressed without the theory?
So do we need a theory?
[01:20:25] Speaker A: I think a lot happened after Maxwell because of Maxwell.
You know, radio, TV, radar, the Internet, the ability to fabricate electronic components.
Everything depends on theory. Not only Maxwell, of course, many other things too, but that's the point.
So yes, theory was needed to do a lot more than what had been done until then. Not only to understand what had been done, things like electrical motors and generators, and to do it better, but also a lot of other things that came afterwards.
[01:21:24] Speaker B: Okay, fair enough.
Before we leave, and we've already kind of left and stepped back in and stepped back out, but thinking about compositional sparsity: in the recent paper, it's on arXiv, you contrast these principles with alternative principles, one of which is manifold learning.
Manifold talk is ubiquitous in the neurosciences now, because everything is described as a low-dimensional manifold structure: if you go off manifold it's harder to learn, and the closer you are to the existing manifold, the more quickly you can learn, things like that.
So can you describe how compositional sparsity is different, and why you prefer it as a principle to manifold learning? If that's even the right way to phrase it.
[01:22:24] Speaker A: Yeah, I think they are two views of the same situation. You compose functions, and this can be interpreted as essentially constructing a manifold out of simpler pieces.
So, you know, it's like constructing a visual manifold by composing patches that are looked at and interpreted by early neurons, which then combine into more complex and larger manifold structures.
Now, I've not done the mathematics connecting the two, but it's pretty clear that there is almost a one-to-one mapping.
The language is different because, as I mentioned before, in classical mathematics you typically speak about structures like manifolds, which you have to glue together in some smooth way, and so on. But it's really defining different functions in different parts of the space.
And that's what sparse compositionality is. So I think they're rather equivalent.
[01:24:09] Speaker B: So sparse compositionality sort of implies, or requires, a smooth, locally Euclidean space wherever you are, like a manifold requires.
[01:24:21] Speaker A: Right. And, you know, locally this manifold will depend on certain variables in the big space, and other parts of the manifold will depend on other ones, or overlapping ones. Yes.
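[Note: a hedged sketch of the correspondence just described. Locally, a sparse compositional function depends on only a few coordinates, which is exactly the picture of a low-dimensional chart on a manifold sitting in a big ambient space. Everything below, the two patches, the embeddings, the dimensions, is made up for illustration; the smooth gluing of patches that is mentioned above is omitted.]

```python
import numpy as np

# Two views of the same object: a "manifold" in R^100 whose points are
# generated, patch by patch, from only 2 latent variables, i.e. a
# sparse function of a few coordinates in each part of the space.

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 2))  # fixed embedding for patch 1
B = rng.standard_normal((100, 2))  # fixed embedding for patch 2

def chart(t):
    # t in [0, 1]^2: local low-dimensional coordinates.
    u = np.array([np.sin(np.pi * t[0]), np.cos(np.pi * t[1])])
    M = A if t[0] < 0.5 else B     # which patch applies depends on location
    return M @ u                   # a point in R^100 with 2 degrees of freedom

p = chart(np.array([0.2, 0.7]))
print(p.shape)  # (100,): ambient dimension; intrinsic dimension is only 2
```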
[01:24:34] Speaker B: Given your fascination with learning, do you think of evolution as just really slow learning?
[01:24:40] Speaker A: Yeah, that's a very interesting question. I think there have been a couple of good attempts to look at evolution as learning. One is by Leslie Valiant, a friend and great computer scientist, who had an article looking at evolution as a learning process.
And in a sense it was, like you said, slower learning.
But I think the truth may be a bit deeper.
In learning, you essentially have the exploration of a certain set of possible solutions, or hypotheses. People speak about hypothesis spaces.
So in learning you look inside a particular hypothesis space, say a set of functions described by a kernel, like a Gaussian, and you try to learn the correct one from the data.
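[Note: to unpack that for readers, "looking inside a hypothesis space described by a Gaussian kernel" is, concretely, something like kernel ridge regression. A minimal sketch follows; the data, bandwidth, and regularizer are made up for illustration and are not from the conversation.]

```python
import numpy as np

# Kernel ridge regression: the hypothesis space is the span of a
# Gaussian (RBF) kernel centered on the data; "learning" selects one
# function from that space by solving a regularized linear system.

def gaussian_kernel(X, Z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2)), computed pairwise.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                  # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)   # noisy targets

lam = 1e-3                                            # ridge regularizer
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # fit coefficients

X_test = np.linspace(-3, 3, 5)[:, None]
y_hat = gaussian_kernel(X_test, X) @ alpha            # predictions
print(y_hat)
```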
Evolution, on the other hand, may have defined whole spaces of functions, different ones.
An alternative way to think about it: in learning, you have an architecture that you use to solve a particular problem by training.
But evolution may have suggested different types of architectures.
[01:26:38] Speaker B: The space of possible solutions is larger.
[01:26:41] Speaker A: But each one is different.
[01:26:46] Speaker B: What? Yeah. What does that mean?
[01:26:48] Speaker A: Yeah.
For instance, suppose, at the very beginning of intelligence, and this is, you know, just speculation, you have simple associative reflexes.
You know, you have a flash of light and the ability to associate some escape response to it.
At first it probably was just a reflex, hardwired in the genes.
Maybe later it became flexible, depending on the stimulus and the conditions of the environment.
But that's still a very narrow type of solution, defined essentially by a one-layer network, if you want. At some point you discover that you can do multiple layers. This expands the type of solutions and the type of problems you can learn, but maybe not everything.
And, for instance, you can learn in a supervised way, but you cannot explore and discover, like you can with reinforcement learning.
[01:28:14] Speaker B: Yeah.
[01:28:14] Speaker A: Yeah.
So, you know, evolution may have discovered these more and more sophisticated ways to be intelligent, or to learn, different ways to learn.
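[Note: a classic concrete case of the one-layer versus multi-layer expansion described above: a one-layer threshold network cannot compute XOR, while a single hidden layer makes it easy. The particular weights below are one illustrative choice, not unique.]

```python
# XOR is not linearly separable, so no one-layer threshold unit can
# compute it; two hand-wired hidden units plus an output unit suffice.

def step(z):
    # Heaviside step: 1 if z > 0, else 0.
    return 1.0 if z > 0 else 0.0

def two_layer_xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # OR-like: fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)    # AND-like: fires only if both inputs are 1
    return step(h1 - h2 - 0.5)  # OR and not AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(two_layer_xor(a, b)))  # prints the XOR truth table
```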
[01:28:30] Speaker B: Yeah, that's interesting. That makes sense as well. In the last few minutes here, maybe we can go out broadly again. I sort of took us away from the Brains, Minds and Machines project that I had asked you about.
It sounds like it's a project in the future, but you can just let me know, and I'll link to it when it's out and all that jazz.
What I want to ask you about is your outlook. Are you excited about the next, I don't know, 20 years of progress in theory, or is it trepidation?
And paired with that, I want to ask: what do you see that you think is holding the field back? I think your answer is going to be learning theory, but tell me if there's anything more specific about it, if that is indeed the answer.
[01:29:27] Speaker A: I'm excited about the future, absolutely.
There is a little bit of trepidation, of course.
[01:29:37] Speaker B: But relative to, let's say, 30 years ago, do you feel the same, given your steadfast march forward?
Are you more excited than you used to be? Do you feel more trepidation, or less? Is it different, or is it just kind of a "this is the way I've seen it all and it's going to keep moving forward" sort of outlook?
[01:29:56] Speaker A: No, I'm more excited because, of course, you know, I was also excited 30 years ago, but now the stakes are much bigger.
You know, a lot of the economy and a lot of the rest of science depends on machine learning; that did not used to be the case.
So it's just much more important to do the right things and to do them well.
And, you know, I must say I would not have dreamed of being at this point in this race for intelligence.
[01:30:47] Speaker B: Really?
[01:30:49] Speaker A: Yeah, I thought things were going to be slower. I mean, I may still be right in the end, but there was certainly progress, mostly the large language models, that really surprised me.
And so there are, you know, great, grounded stepping stones from which to do more, especially from the point of view of theory.
I also think that the theory that we and many others are developing is much richer than I could have expected 10 years ago. There is much more to be done. So it's much less trivial; it's not that neural networks just happened to work for some one or two simple reasons.
There are a lot of interesting aspects to it, and some are quite deep. So, yeah, I find it very exciting. You know, in terms of realizing intelligences that are better than us...
It may take longer than many people think.
I've always thought that.
I remember there was a kind of poll that we took at a meeting that Max Tegmark, another friend, organized in Puerto Rico a few years ago.
This was probably, let's see, around 10 years ago, and most people were saying something like AGI, or superintelligence, I'm not sure those were the words used, but something like that, in, I think, 25 years.
And I said 50 years then.
Okay, now.
[01:32:59] Speaker B: How long ago was this? A few... this was a few years ago?
[01:33:02] Speaker A: 10 years ago.
[01:33:03] Speaker B: Okay, that's a pretty big number compared to the most optimistic people, who say, okay, next five years, 10 years. So 25 was a pretty big number. But I like that you doubled it.
[01:33:13] Speaker A: My estimate was 50.
That would be 40 now.
I think I would probably keep it. You know, now I feel like it may happen faster, but it depends what you want it for. Because, look, autonomous driving is a good example.
Autonomous driving, I've been following closely. First working on it myself, and then through my great friend Amnon Shashua; Mobileye provided the first system to Tesla. And autonomous driving is here: you have autonomous taxis in San Francisco and other cities.
But it's, you know, still pretty rare to see an autonomous vehicle.
And it's still impossible to have a vehicle that could drive anywhere a human could.
And so it's probably the last 1%, you know, reliability and so on, this last 1% that may take a long time.
[01:34:48] Speaker B: Right. Yeah, it's interesting. You kind of think, okay, "when we have self-driving cars," and when you say that, you sort of picture that it is solved and they are everywhere. But that's not how it works.
[01:35:03] Speaker A: Yeah, that's right. I interviewed, mostly for fun, probably 10 years ago, at Uber in San Francisco.
[01:35:14] Speaker B: For fun. You interview?
[01:35:16] Speaker A: Yeah, I was not serious about it, but I must say, my impression was that they had no idea. They were speaking of autonomous driving, in 2015, as something to do in the next six months.
[01:35:33] Speaker B: You kind of have to have that optimism to go to work every day right in that environment, I guess.
[01:35:37] Speaker A: Yeah, yeah.
[01:35:39] Speaker B: I can tell you... so I guess I'm Generation X, late X maybe. Anyway, I grew up using VCRs, analog equipment, right? And so I'm part of that generation where people started using computers, and I'll just skip forward: I have young children, and I do worry about the pace at which these things are changing.
It is unpredictable.
Back in the days when we were carrying letters by horses, you could predict what would happen next year pretty accurately. And now I don't feel like I can predict how all of the tools that are being used are going to affect them, or just what's coming next. It is coming faster and faster. And that, to me, is a lot of trepidation, just as a parent. I know you have older children, but...
[01:36:30] Speaker A: I completely agree. It's one of the big worries. You know, I'm worried about climate change, of course, but even more about AI and education.
Yeah. Because, as you said, changes are too fast to keep up with.
We don't know what is the best way to teach.
You know, you cannot forbid a kid to use ChatGPT. You should actually encourage it. But at the same time, you want to make sure that they learn the basics of mathematics. How can we do both?
And I see that already, you know, in the university, at college: this dilemma of allowing ChatGPT while at the same time asking students to tell us when they use it, how, and so on. Because of course we don't want students to just completely give up their autonomy to ChatGPT, to rely completely on it. That would be killing the culture of our society.
[01:38:01] Speaker B: The problem is, to address that, to figure out what the solution is... by the time we figure it out, it won't be a problem anymore, because we'll be on to the next thing and it just won't even exist anymore. Yeah, I don't know.
[01:38:15] Speaker A: Yeah, I don't know either. You know, there was the great writer García Márquez, the author of One Hundred Years of Solitude.
[01:38:26] Speaker B: Yeah.
[01:38:26] Speaker A: He once said that traveling by plane kind of disrupts our sense of the world. You know, you should travel in a way, like by horse or by train, so that time does not change too quickly; otherwise the time difference throws you off, like jet lag. This fast change is a kind of big jet lag for education, you know.
[01:39:00] Speaker B: Yeah. Oh, that's good.
"The earth is round, like an orange." I remember that, among other things, from One Hundred Years of Solitude.
Tommaso, Tommy, we've gone through a lot of material here.
Is there anything that we did not discuss that you would like to include, anything to make sure we talk about before we call it?
[01:39:23] Speaker A: I think we spoke about almost everything. There is so much more to discuss with you. Very, very nice. But, you know, another time. I'll be around.
[01:39:36] Speaker B: And I would love to have you back another time. You know, we could have you back maybe when you actually launch that series or something, and I can help spread the word about it.
[01:39:44] Speaker A: Sure, that would be great.
[01:39:46] Speaker B: Okay. Well, it's been wonderful having you. Thank you for spending the time with me, and I'm glad we finally did this. I told you before, you've been on my list forever, and this is a real honor for me, to have you on.
[01:39:55] Speaker A: Thank you. For me too.
[01:40:04] Speaker B: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives written by journalists and scientists. If you value Brain Inspired, support it through Patreon. To access full length episodes, join our Discord community and even influence who I invite to the podcast. Go to BrainInspired Co to learn more. The music you hear is a little slow jazzy blues performed by my friend Kyle Donovan. Thank you for your support. See you next time.