BI 209 Aran Nayebi: The NeuroAI Turing Test

Brain Inspired | April 9, 2025 | 01:43:59

Show Notes

Support the show to get full episodes, full archive, and join the Discord community.

The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists.

Read more about our partnership.

Sign up for the “Brain Inspired” email alerts to be notified every time a new “Brain Inspired” episode is released.

To explore more neuroscience news and perspectives, visit thetransmitter.org.

Aran Nayebi is an Assistant Professor at Carnegie Mellon University in the Machine Learning Department. He was there in the early days of using convolutional neural networks to explain how our brains perform object recognition, and since then he's had a whirlwind trajectory through different AI architectures and algorithms and how they relate to biological architectures and algorithms, so we touch on some of what he has studied in that regard. But he also recently started his own lab at CMU, and he has plans to integrate much of what he has learned to eventually develop autonomous agents that perform the tasks we want them to perform, in ways that are at least similar to how our brains perform them. So we discuss his ongoing plans to reverse-engineer our intelligence to build useful cognitive architectures of that sort.

We also discuss Aran's suggestion that, at least in the NeuroAI world, the Turing test needs to be updated to include some measure of similarity of the internal representations used to achieve the various tasks the models perform. By internal representations, as we discuss, he means the population-level activity in the neural networks, not the mental representations philosophy of mind often refers to, or other philosophical notions of the term representation.

0:00 - Intro
5:24 - Background
20:46 - Building embodied agents
33:00 - Adaptability
49:25 - Marr's levels
54:12 - Sensorimotor loop and intrinsic goals
1:00:05 - NeuroAI Turing Test
1:18:18 - Representations
1:28:18 - How to know what to measure
1:32:56 - AI safety

View Full Transcript

Episode Transcript

[00:00:04] Speaker A: The kind of grand challenge right now in, in AI, even if you don't necessarily care about the brain in particular, is generalized embodied intelligence and the ability to. [00:00:13] Speaker B: Ideally, is embodied part of that? [00:00:16] Speaker A: It is the ultimate endpoint. If you took two brains, there's just going to be variability between them within, you know, individuals in a given species. Fix the same brain area, fix the same stimulus. There's just going to be variability in how they process that. So when a model is imperfect as a match to the brain, we need to be able to disentangle whether that's because it's actually a poor match to the brain, truly, or because there is inherent evolutionary variability between brains that we need to account for. The fundamental difference between AI, this technology that we're building, and prior technology, is that now you're starting to build technology that takes in inputs and intentionally produces actions. It's an agent. [00:01:12] Speaker B: This is Brain Inspired, powered by The Transmitter. Hello, everybody. It is Paul. This is Brain Inspired. Oh, we just had our second complexity group discussion meeting and it was so fun. I'm worn out from it, but it was a lot of fun. If you are interested in complexity, a large group of us, over 300, and I think there are 325 people now on this email list, are going through these foundation papers that I spoke with David Krakauer about a few episodes ago. And we just had our second meeting. I'm learning a lot, so it's awesome. I hope if you're part of it, you're enjoying it. Okay. Welcome to this episode. Aran Nayebi is an assistant professor at my own Carnegie Mellon University. He's in the machine learning department. Aran was around in the early days of using convolutional neural networks to explain how our brains perform object recognition. You'll hear me allude to Dan Yamins, who was on one of the first episodes of this podcast. Aran has been through many labs with big names that you've probably heard of, and Dan Yamins was one of them. But it was of note to me because it was where I began this podcast. He was beginning to actually do the science anyway. Since then, he has had a whirlwind trajectory through different AI architectures and algorithms and how they relate to biological architectures and algorithms. So we touch on some of what he has studied in that regard. But he also recently started his own lab at CMU, as I mentioned, and he has plans to integrate much of what he's learned to eventually develop autonomous agents that perform the tasks that we want them to perform in ways that are at least similar to the ways that our brains perform them. So we discuss his ongoing plans to, quote, unquote, reverse engineer our intelligence to build useful cognitive architectures of that sort. We also discuss Aran's suggestion that, at least in the NeuroAI world, the Turing Test needs to be updated. So the Turing Test is this famous benchmark test proposed by Alan Turing, somewhat as a thought experiment joke, apparently, that if you can trick a human, rather that if a computer can trick a human into thinking that the computer is a human, it passes the Turing Test, which means that the computer is thinking. It's been debated whether this is a good test over many years, but it is the thing that people keep coming back to when trying to assess whether an artificial system is a thinking system or at least a good artificial system.
And Aran thinks that we need to update this. So the original test was just about the behavior, whether a computer could trick a human. And Aran's point is, in the NeuroAI world, where we're trying to build models that mimic human behavioral output, human function or biological function, not necessarily human, it's important to also compare the internal representations between the systems that we build, but also between the species that we're testing them against and between individuals within the populations of those species. And I think I just said internal representations. But that's what he wants to compare, what he calls internal representations. And by that he means simply the population level activity in the neural networks or whatever system you're using to build it. So not representation in the sense of mental representations in philosophy of mind or other philosophical notions of the term representation, simply the activity of the populations of units, for example. Thank you to my Patreon supporters. Thank you for the ongoing support from The Transmitter. Hope you guys are well. Hope you enjoy my conversation with Aran. Aran, I was just. I literally was just looking up Dan Yamins. He was episode number seven of this podcast, which was like, you know, 100 years ago or something, but. And now here you are and you matriculated partially through Dan Yamins' lab. You went fast, it seems. [00:05:47] Speaker A: Yeah, it felt like time flew. So I, I started in 2016, my PhD with Dan and graduated in 2022. So a couple years ago with him and Surya. [00:06:00] Speaker B: Yeah, yeah, Surya Ganguli. This is not good because I think I got my PhD. I think I earned my PhD in 2016. And look, you're way, way ahead of me. [00:06:12] Speaker A: Well, you know, it's been definitely a ride, that's for sure. And things have changed so quickly, right. At least in AI in those years. I remember like when we started and it almost feels like talking about the old times, but really wasn't that long ago. So like when I started my PhD, TensorFlow was not yet out and we were using Theano, so. And Keras had just come out. So as a master's student I was like contributing to that a little bit, which is fun. [00:06:46] Speaker B: Did you have a background in math and computer science? [00:06:49] Speaker A: Yeah, that's right. Yeah, so I did. I did my undergrad in math and symbolic systems, which at Stanford was like basically the cognitive science major. So I was always interested in the brain. I just didn't know what to do with that interest until later on basically. And, and then I did a master's in CS in AI to kind of transition to like a slightly more empirical field. And by then I kind of had the. I knew like, so I took. Actually I had a meeting, very graciously granted actually, with Bill Newsome, and he had mentioned that, oh, we need more people with like math backgrounds basically back then. Back then, right, to like contribute to neuroscience. And he actually referenced the Dayan and Abbott, like, theoretical neuroscience book. And so I was really excited. I was like, wow, like there's a whole thing like information theory and there was like some machine learning, a little bit of machine learning there. And so I was like, well, okay, I should get myself familiar with that a lot more than like number theory and logic and like theoretical computer science, which is more of my background.
And that's, that's what led me to do like a master's in AI actually to prepare to do basically theoretical neuroscience at the time. [00:08:13] Speaker B: I mean, we could go many different ways here. But so, so I mentioned Dan Yamins because those early convolutional neural network models that accounted for brain activity were one of the things that got me into an interest, right, in using AI models to study brains. And it was one of the early successes. And you were right in the thick of it in the early days. And coming from your background, that mathematical computer science background, it's such a computation, it's so non biological, right. It's like going towards biology, uh, and now you. But what I want to ask you is how you've come from perception to embodied agents, and how you've come to appreciate what the brain does and how you think about studying the brain. [00:09:15] Speaker A: Totally, totally. Yeah. So like I, like I mentioned, right, I, I was super gung ho about like basically applying math, like very theoretical thinking, to questions, whatever they may be, to the nervous system. So that was me as like a senior in college. Junior or senior in college. And I didn't know how to do that. But when I saw these books, there was like at least some hope and what became kind of quickly clear. So there's this book that I started to read called Spikes. So around that time, basically as a senior, as I was kind of transitioning to do a master's, I was like, well, I should, I should actually like work in a neuroscience lab to really start to like speak the language a bit more of biology, right? Because I'd done nothing biological at this point. And so I was very lucky to have the opportunity to work with Steve Baccus at Stanford. So he's a very famous retinal neurophysiologist, but also computational neuroscientist in his own right. And he always encouraged thinking mechanistically. Don't be led by just the numbers and the fanciness of the technical model. So that was really helpful in my own development as a very math and CS focused student to engage more directly with, with biology. And around that time, so you mentioned like convnets. So I actually, I actually didn't hear about Dan's work until like we did a lab meeting in the Baccus lab basically about, about that paper, which by then was already like a year, year and a half later. And. [00:11:11] Speaker B: What does it mean to think mechanistically? What did he mean by that? [00:11:14] Speaker A: He meant like literally what part of your model is corresponding to biology? And what is the question, what's the scientific or really biological question that you're trying to answer, rather than just like making a more technically fancier model that might. [00:11:32] Speaker B: Predict. What does it do, like Marr's computational level kind of question? [00:11:38] Speaker A: Even more so. Like in the retina you have multiple cell types, right? And so how do these cell types, for example, talk to each other and collectively yield a particular type of response to a particular type of stimulus? So like, you know, there's different types of bipolar cells, there's amacrine cells, and then there's ganglion cells. There's horizontal cells at the beginning there, which are more linear in their response profile. So it's like. At least in the retina, there's a very clean and clear mapping to a particular computational model and each of those components. [00:12:19] Speaker B: And what would the opposite of.
I'm sorry, but I want to know where he was coming from and what you took from it. What would be the opposite of that, let's say in the retina. Right. Non mechanistic thinking. [00:12:30] Speaker A: Totally. So it's harder to do this in the retina, I think, because there's just so much ground truth. But the non mechanistic approach. But you could imagine fitting, I guess just to put it in the language of today, you could imagine fitting maybe a transformer to that data, to just retinal data, and then being like, well, okay, you did a direct data fit and on held out response patterns you do well and that's it. [00:12:57] Speaker B: Massively predictive. [00:12:59] Speaker A: Massively predictive. And then there's no clear. You don't check if the internals at all develop anything that maps onto like the interneurons that are there in the retina. That sort of thing. [00:13:12] Speaker B: Yeah, okay. Okay. So sorry I interrupted you. You mentioned Spikes also and I feel like Spikes is one of those books that is sort of like Godel Escher Bach used to be or something. So there's this massive tome by Douglas Hofstadter called Godel Escher Bach, G E B. Oh, he's going to grab it from his shelf. There it is. [00:13:35] Speaker A: I have it. [00:13:36] Speaker B: Have you read the whole thing? [00:13:38] Speaker A: No, I read parts of it. It was very popular in college. [00:13:40] Speaker B: Nobody has read the whole thing. [00:13:42] Speaker A: Nobody has read it. Yeah, I can't say I have. Yeah. [00:13:44] Speaker B: Okay. [00:13:44] Speaker A: But I have it still. [00:13:45] Speaker B: All right. So that's another influence on you. Anyway, I was gonna say Spikes is one of those books that has been a major influence. And there's the Spikes book. Yeah, but that was like 1992, 1999 I think. [00:13:57] Speaker A: 1990. Yeah, 1998. My copy says 1996 actually. But MIT Press paperback edition 1999. So. So. [00:14:06] Speaker B: Oh, I got, I got it right there at one point. But, but, but it's so non brain like that book. Right. Because it's all information theory, et cetera. [00:14:16] Speaker A: Right, right. So, so it's. It was so. Okay, so we were talking about convnets. So before we act like. So around that time it was, it was, you know, ImageNet was like the ImageNet benchmark was a thing. Right. And convnets were the main way to do at least neural network based vision and were very promising. Right. So that's the context at this time. And we were thinking in the lab. So Lane McIntosh and Niru Maheswaranathan were working on. They were taking the first deep learning class basically on convnets at this time that was taught by Andrej Karpathy and Fei-Fei Li and Justin Johnson, I believe. And so as their class project they were like, well, let's build a convolutional model of the retina. It's a shallow neural network at the end of the day, let's build that. And by the time I joined, so that would have been during the semester, and then I joined in the summer as a master's student as part of that project. And one of the reasons why it felt very motivating to work on that was actually because of Spikes. So I. I was drawn to Spikes because it was very mathematical and it was related to the system that I was going to be working with, the retina. Right. And I was like, wow, there's all this beautiful, like, information theory. And, like, this is going to be great. [00:15:47] Speaker B: Mathematically tractable. Something you can. [00:15:49] Speaker A: Yeah, exactly. Something you can do.
And, like, very much spoke to, like, my own, what I was familiar with. So it was a great way to bridge my interests, actually. But one thing that really stuck with me in Spikes was a sort of like a passage there that was like saying, you know, natural scenes have many, many parameters. So we've basically. You know, you can prove optimal filter guarantees with things like white noise, what's known as Bussgang's theorem, with the optimal linear filter. But, you know, with natural scenes, not only do you not have that guarantee, but, like, unlike a lot of the stimuli that we tend to probe in the retina, which are controlled by one parameter, so like the intensity of the light, and it's a 1D stimulus that varies in time. So like high intensity, then step down to low intensity. You know, we don't have. There's just like an infinity of parameters that you could imagine could control natural. Natural scenes. And so that hampers our. Our ability to understand the circuit as a result, under that. Under that condition of natural scenes. [00:16:56] Speaker B: Natural. Yeah, yeah. All of a sudden you're in the real world and you run into a hammer. [00:17:01] Speaker A: Yeah, that's exactly right. And so, yeah, it's like, you know, it's very surprising, actually. I think I still. I have it highlighted here. [00:17:10] Speaker B: Oh, come on. Now you're just showing off that you've engaged with it. [00:17:14] Speaker A: Yeah, well, as a student, it was just like I had no other reference. And this was like, the only way I could really start to make a connection with what I already knew. And so it's really powerful. And then I just didn't. I just didn't read the rest of the book at that point. [00:17:25] Speaker B: Oh, really? [00:17:27] Speaker A: Yeah, yeah. Because I was like, well, it's not doing. It's not, you know, it's very faithful in saying that these aren't maybe the right set of tools to engage with natural scenes, but convnets were that tool. [00:17:41] Speaker B: Okay. Oh, okay. And it's weird though that Spikes is kind of the polar opposite of Godel Escher Bach, because I think of Spikes as dry and clean and Godel Escher Bach as meaty and wet somehow, although it's very like a lot of people say they got into computational neuroscience because of Godel Escher Bach. So it's I guess in that way that they are alike. And I don't mean for us to go down the road of comparing and contrasting the two books and styles, but. [00:18:12] Speaker A: Yeah, yeah, no, absolutely. I mean, I think so early on I would say like when I was in college it was all about cognitive science and philosophy of mind and that's. And those are just incredibly deep and interesting topics. So I actually took a philosophy of mind course with the late Ken Taylor, who was a philosopher at Stanford. The only actual African American philosopher on the faculty there. A really remarkable man and a very clear orator of the problems and issues about the mind. And one of the things that always stuck with me in his class was like, you know, you and I all have minds, but like we don't have access to how they work. Like you don't, it's not like you're in your head and you're like, oh, my visual cortex is, is relaying information, you know, to this other, you know, to my prefrontal areas or something. Then there's this thalamocortical loop that's now active. You know, you're not doing any of that. You're not even aware of it. And yet we all have minds. And because we're not aware, weirdly, we don't have access to their internal workings.
We don't really know. Like there's all these mysteries as a result, despite us having minds. And so, all to say, right, like I, I was like. What pulled me in initially was that like obviously studying the brain is the deepest philosophical object, you know, you could study and it's like about the nature of our condition, yet it was so mysterious and the only way to really engage actually. So I, as I mentioned a little bit earlier, I was in the more the logic and philosophy community initially. And one of the things that was very interesting was that there was like a, an annual like logic conference that would happen at the Center for the Study of Language and Information, CSLI it's called, at Stanford. And I would go as like an undergrad student. So I would go to the logic, the graduate logic seminar and then I'd go to this event. And one of the, really the things that stuck with me around that time when I was starting to transition to neuroscience was the logicians were saying, well, look, we have all these theories in the philosophy of mind about how the mind and brain should work, but unless you do an experiment, there's no way you're going to answer them. It was actually a logician saying that, not the philosophers. His name was Peter Koellner, he's at Harvard now, is a set theorist, actually works on those things, but has some interest in philosophy of mind and was saying this, which really stuck with me. And so that's kind of why I then kind of went away from the Godel, Escher, Bach stuff to Spikes and more, you know, more like drier science. Yeah. [00:20:44] Speaker B: Okay, so you. Okay. By way of story perhaps again. So you were there in the early days of the convolutional neural networks and fast forward to today and you want to put. Let me see if, let me see if I can state this and then you can correct me. I'm going to state it incorrectly on purpose. You want to build a cognitive architecture of four or five different types of deep-learning-model-esque things, put them together, have them talk to each other and build working agents that are behaving. There's, there's so many ways I could have said that and that was terrible and I'm sorry, but correct me, that's. [00:21:33] Speaker A: No, that's, that's right in the, in the primary essentials. So. Right. Like, but what's the motivation I guess to begin with, which is that the kind of grand challenge right now in AI, even if you don't necessarily care about the brain in particular, is generalized embodied intelligence and the ability to ideally. [00:21:53] Speaker B: Is embodied part of that. [00:21:55] Speaker A: It is. So it is the ultimate endpoint, but it doesn't. I think a lot of the major conceptual issues are actually non embodied. In other words, they're like just building even digital agents that can, for example, not go into self loops and can plan and reason and adapt to new situations. And I'm saying this in ways that are clearly things that animals and humans do very well is they're lifelong learning agents and that's really what we want. So I think a lot of the core software issues or the cognitive architecture issues will have to be even addressed in these non embodied contexts. But embodiment is the ultimate goal. But I think, I mean, obviously there's details there about robot hardware and things like that that maybe are not necessarily the core focus.
So I agree with you that it's really more about this kind of cognitive architecture and making sure it works in open ended settings to have these lifelong learning agents, but also that we can use these as providing insight both about whole brain data that we're able to collect now and are emerging, but also leverage that cognitive inspiration to build these more general purpose agents ultimately. [00:23:15] Speaker B: Let's get into reverse engineering. Right. So maybe you can correct me. I bastardized a summary of what you're up to these days and we'll talk about the NeuroAI Turing test later. But so what is your cognitive. Do you consider it a cognitive architecture? [00:23:34] Speaker A: I consider it a cognitive architecture for two reasons maybe. And it gets back to your point about asking about what reverse engineering means. I think there's many different definitions and I can kind of tell you what my working one is. [00:23:50] Speaker B: Yours is going to be the Jim DiCarlo one, right? That's my guess. [00:23:55] Speaker A: Yeah, perhaps. I think it's closest to that. Absolutely. And I think it's really about understanding the relevant aspects of biological intelligence, those details that are useful for intelligent behavior. So it's not about necessarily full brain emulation or emulating every biological detail like say the Blue Brain project for example. It's more about like just isolating the abstractions from biology that are, that are basically hardware agnostic algorithms, that are implemented in brains but also can be run in hardware and abstracted into machines. So what does that really mean in more concrete terms? Well, it usually involves matching the population level representations of a model, its activations, to neural population activity. And we found, you know, it could have been any biological observable. Right. The brain is a complex object. It could have been the dendrites and it could. Or it could have been the neurotransmitters. And maybe, maybe for certain questions that that is the relevant biological abstraction. But I think for like a lot of intelligent behaviors, empirically at least what we found, with no, there's no necessarily theory here, it's just empirical observations in different brain areas and different species, is that matching at the level of population activity is constrained by doing intelligent behavior. So there's a relationship between that biological observable and intelligent behavior. And to be honest, if you wanted a one word summary or one sentence summary of my entire PhD, it was just showing that this kind of paradigm of a task and an architecture and also a learning rule was a useful way to interrogate those kinds of questions across brain areas and species, that it wasn't restricted to macaque ventral stream, for example, or human behavior, but actually a lot of these other brains and these brain areas in rodents and in hippocampus or higher cognitive areas, they can be understood through this lens of non convex optimization. Basically the devil was in the details of what those loss functions are and what those architectures are. And that's the language. [00:26:07] Speaker B: Okay, all right, so maybe it's worth mentioning here, I've mentioned it a lot on this podcast, but the reason why convolutional neural networks became so popular is because they are a multi layered deep learning network.
And when you train them in a task on in the early days, visual object recognition and you look across their layers, you can actually match the what was then like the population level quote unquote representations in different layers and match them with different layers of what we think of as the hierarchy in our ventral visual stream that we think of as being important for object recognition. And you, there's been lots of work since then. You have done this work yourself, adding recurrence, et cetera. And so that's one of the modules in your cognitive architecture. [00:27:06] Speaker A: Yeah. [00:27:07] Speaker B: And then what are the other ones and why? [00:27:09] Speaker A: Yeah, so I should say that I'm not married to this particular set of modules. It's meant to be free. In fact, a core question is ultimately like by doing these types of comparisons of now agents, agent architectures, cognitive architectures to whole brain data across species as well, that we can start to understand if there's conserved modules, if there's a kind of like general purpose architecture that emerges through lots of empirical comparisons. [00:27:37] Speaker B: Like, you mean across species? General across. [00:27:40] Speaker A: Across species. Yeah, exactly. Basically across this kind of conserved species, conserved sensory motor loop, so converting inputs to actions. So that's why it's an agent. Right. You know, rather than one module is like one large scale set of brain areas, which is kind of how we've traditionally done comparisons in NeuroAI. But now what we want to do here is really engage with whole brain data that's coming online and start to understand how these brain areas interact to give rise to complex behavior. And so I think that's why the agent naturally fits in here. And the idea would be that some natural starting points for modules are a sensory module. So this not only vision but also multimodal. And there's some evidence which we can talk about at some point about how maybe they have similar actually loss functions, like self supervised loss functions across these different sensory modalities. So there's a Kind of unification there, but maybe they have different inputs, but that's something that we can talk about. So that's why I kind of group it as a sensory module. But then there is a kind of like future inference or world model, which I think is, is really the hardest, I think computationally the hardest part of this. Like I think we've made a lot of progress in sensory systems. Obviously there's, there's work to be done still there, but especially the type of work that still remains to be done in the sensory system connects with this world model. So basically being able to like have a model of the dynamics of the world of your environment and not like a model of real physics, like actual physics. Like you and I have the maybe intuition that like two objects will, will fall if one faster, if one is heavier than the other and even though we know they should both fall in a vacuum at the same rate. So I don't mean actual physics, I mean just intuitive physics. Like what's intuitive to us and how. [00:29:28] Speaker B: Does predict things, how does that differ from like the Tenenbaum physics engine approach? [00:29:33] Speaker A: So is, is certainly the, basically is the physics engine. It's just that the difference is that we aren't assuming symbols as the input. We aren't assuming like a program or anything like that. 
We're actually assuming unstructured visual inputs coming in, being processed by a sensory system. The output representation of that sensory system is then fed into the world model. So it's visually grounded basically or sensorly grounded rather than we kind of assume that there's a particular type of output format that vision gives us and then we proceed with that more symbolically. [00:30:11] Speaker B: Okay, gotcha. [00:30:13] Speaker A: And, and this is the why so why maybe is that distinction actually important for function? It's well because we operate, humans also and animals operate in unstructured environment in a wide range of environments. Whereas when you handcraft those inputs, you know, that might be useful for studying particular environments, but it's very hard to then generalize to the open ended unstructured environments that we all naturally engage with. So that's the key here is open ended being able to deal with open ended environments. So that's, that's at a high level. That's module number two is the world model. So sensory world model. And then there's like planning, which I don't know if it should be part of necessarily distinct, but just to distinguish the fact that like, you know, if you have a good world model, you can also. Planning is easier, especially long range planning. And maybe there's a hierarchy there of timescale. So you might plan at high level and then fill in the details. So at a high level you might have abstractions that allow you to do more longer range planning. So for example, when you do decide to get out the door and then go to the Mellon Institute, right. In that case you don't plan every step, you plan at the level of landmarks or large scale kind of things. And so that's what I mean. But then the rest of your body fills in those other details, the fine grained motor commands, et cetera. And that brings me to the last point, which is the motor module which then executes these high level commands and maybe in a hierarchical way. And then I guess maybe there's like a final module. If it's helpful to think in these ways. Again, I expect these to all be approximate. In the actual brain this is more for like conceptual clarity to frame things is the intrinsic goals. So in other words, how do we guide what plans we care about or select and then therefore the actions we execute? Well, we can leverage the world model by planning through specific types of actions, but those are guided by intrinsic drive. So unlike reinforcement learning in games like Go or chess where you have a very well defined reward function and that's where RL thrives, in real environments there isn't one. So animals do rely on things like different behavioral states, hunger, pain, et cetera that are both built in, but ultimately so that you know, you can think of, there's like built in intrinsic drives, but also more learned ones that might support more open ended learning to like seek out more information beyond their pre training data. So I just put that in the language of AI. Basically if we, if you want an LLM agent even to go beyond its pre training data, you want to specify the right autonomous signals, which is still an open question to do that and to adapt online. Yeah, so those are the five modules, right? [00:32:58] Speaker B: So sensory world modeling for now you say? [00:33:00] Speaker A: Yeah, for now, for now. So sensory world modeling, planning motor and intrinsic goals, how they're combined will matter. So I just kind of assumed a feed forward one for now. 
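To make that module decomposition concrete, here is a minimal sketch of the kind of feedforward sensorimotor loop being described, written in Python. All of the class names (SensoryEncoder, WorldModel, IntrinsicGoals, Planner, MotorModule) are hypothetical placeholders standing in for pretrained networks, not code from any released system; the point is only how the five pieces would hand off to one another.

```python
# Minimal sketch of the five-module agent loop described above.
# Every module here is a hypothetical placeholder for a pretrained network.
import numpy as np

class SensoryEncoder:
    def encode(self, observation):
        # e.g., a self-supervised vision/multimodal encoder producing a latent state
        return np.asarray(observation, dtype=float)

class WorldModel:
    def predict(self, state, action):
        # intuitive-physics-style rollout: next latent state given a candidate action
        # (for simplicity the action is treated as a displacement in latent space)
        return state + action

class IntrinsicGoals:
    def select_goal(self, state):
        # built-in or learned drives (hunger, curiosity, information seeking)
        return np.zeros_like(state)  # placeholder target state

class Planner:
    def plan(self, state, goal, world_model, candidate_actions):
        # choose the high-level action whose predicted outcome lands closest to the goal
        rollouts = [world_model.predict(state, a) for a in candidate_actions]
        errors = [np.linalg.norm(r - goal) for r in rollouts]
        return candidate_actions[int(np.argmin(errors))]

class MotorModule:
    def execute(self, high_level_action):
        # fill in fine-grained motor commands for the high-level plan
        return 0.1 * high_level_action

def agent_step(obs, modules, candidate_actions):
    """One pass through the (feedforward, for now) sensorimotor loop."""
    sensory, world, goals, planner, motor = modules
    state = sensory.encode(obs)
    goal = goals.select_goal(state)
    action = planner.plan(state, goal, world, candidate_actions)
    return motor.execute(action)

# Example: one step with a 4-dimensional toy observation and two candidate actions.
modules = (SensoryEncoder(), WorldModel(), IntrinsicGoals(), Planner(), MotorModule())
command = agent_step([0.2, -1.0, 0.5, 0.0], modules, [np.ones(4), -np.ones(4)])
```

Online adaptation, discussed next, would then amount to specifying update rules for each module's parameters inside this loop, rather than freezing them after pretraining.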
Of course I expect there to be back connections. But the other aspect of it is that these modules aren't fixed. Right. So the standard paradigm in AI is to fix things and then, at test time, evaluate it. [00:33:28] Speaker B: Oh, oh, what do you mean fix? Like freeze the parameters. [00:33:32] Speaker A: Freeze the parameters and then evaluate at test time. What we want to do, and ultimately obviously there's plasticity right in the brain, is for these animals or these agents to adapt online. We need to specify update rules as well for those modules. So not only how they interact but also how they update online to new challenges. And so that kind of speaks to the learning rules aspect of NeuroAI, which I think is less touched on and we can touch on it too. And there's things to say there but. But that's the high level approach. So you can think of, like for example, test time reasoning that people have now, where you're trying to get the LLM to reason online, either via chain of thought or trying to do this in a more online setting rather than just scale up the pre training data, like just get it to reason at test time, as a special case of this broader goal of getting these modules to be adaptive. [00:34:27] Speaker B: Yeah, let's talk about plasticity. You just said that it's not so much the focus of NeuroAI but has it gone out of favor? Is that a thing? Because for a while it was all about, oh, back propagation. It's not brain like and we need to figure out. But there are lots of people working on synaptic learning rules etc. And I'll just jump the ship here and say, and you can correct me again: these modules that you say are sort of up in the air, they don't have to have the same learning algorithms necessarily. Different parts of the brain have different learning algorithms. That's something that you want to figure out. But people aren't focusing on learning anymore. [00:35:12] Speaker A: I would say not anymore. I would say it just wasn't. Oftentimes in NeuroAI we're not modeling the developmental process. Right. And often you might hear phrases like, oh, this backprop is basically a proxy for evolution, or it's a proxy for evolution and development, and we don't cleanly separate those things. [00:35:35] Speaker B: Okay. One way to say this is like we have something that works so we're going to use it and worry about it later. Is it like that? [00:35:43] Speaker A: It's also that like we're not explicitly modeling it and you can't, in the standard framework where you just train something with backprop and you get to the adult state of that particular brain area. I think of pre training as more like, you can imagine you pre train these modules to get to a desired state, but unless you have an agent, you can't actually go and test an update rule online, not through batches but through online interaction. Can you like more faithfully disentangle the evolution part, where the pre training stops, and where the module updating begins, the kind of more developmental aspect. So if we wanted a more formal computational grounding on development and we wanted NeuroAI to engage with that, I think that's why the agent based approach would more explicitly speak to that, by disentangling those two things, those assumptions that we're currently making and we're kind of lumping into backprop. And furthermore, you mentioned that there was a lot of work actually on biologically plausible learning rules.
And there has been, and I myself have worked on it with folks, where the big challenge, say up till 2019 or 2020, was. From 2016 to 2020, basically there was a flurry of activity to understand backprop in the brain, right. [00:37:03] Speaker B: It's almost to show that backprop happens in the brain, right? [00:37:07] Speaker A: Yeah, that's right. Because we all use it for training these networks and it's very successful. Right. And so the natural kind of inclination was like, well, some version of it might be in the brain, we should go look for it, what would that be? But the trouble was that we couldn't scale any. So okay, what's the biggest. There's a few gripes about, which are very well motivated gripes about, backprop as a non biologically plausible learning rule. I would say the main one, just to keep things succinct, is that the forward and backward weights are always tied. So in other words, an update, an error update in a feed forward network always involves the transpose of the forward weights of each layer. And so oftentimes when we think of implementing backprop in the brain, or as an update rule for a module in this kind of more embodied agent framework, we assume that would be a separate network that is basically computing the errors. And so if something like what's called weight transport is necessary, then that circuit would actually have to have an exact copy of the forward weights. And that's weird. At every time step you need that. And that just seems very like inconsistent with like the fact that biology is very noisy and messy and non robust to. I mean, you know what I mean? So it's like a very. [00:38:28] Speaker B: But it's very. But people like Tim Lillicrap have shown that you don't need to do it that way. And you can approximate back propagation. I'm sorry, this is a total tangent. But you can approximate it almost randomly with some feedback. [00:38:42] Speaker A: So that's the thing. So I actually. That ended up being a. Was a control that they ran showing that it shouldn't work. That's what Tim told me, actually, showing it shouldn't work, and then it did on MNIST and they were like, well, we should investigate this. It was very, very interesting. [00:39:01] Speaker B: Oh, that's great. Yeah. [00:39:04] Speaker A: And but one thing that Tim. So one thing that motivated me to work on it was actually Tim gave a talk at like Cosyne 2016 that they tried to take their feedback alignment algorithm, which is this thing of replacing the backward weights with random weights, and scale it up to deeper architectures and on harder tasks than MNIST. So like CIFAR-10, CIFAR-100, ImageNet, and especially ImageNet, because that was the main vision dataset that, if trained with backprop, gave you neurally plausible representations that actually predicted brain data, not MNIST, for example. So they wanted to scale up to that. And he was saying, look guys. [00:39:42] Speaker B: The. [00:39:42] Speaker A: Moment we do this, like it just fails, like at these harder tasks, there's this bigger and bigger gap that grows with the performance of backprop. And so, you know, at the time I didn't know what to do with, with that. It was very interesting and I knew I wanted to work on it. Wasn't until like 2019 or so that maybe there was some evidence that like updating the backward weights, though with what rule? That's the, that's the key question. Could start to patch that up, but not completely.
And so with Dan Kunin and Javier, and Surya and Dan, we did work and developed a broader language of, like, you know, basically looking through the space of update rules on the backward weights. So rather than keep them random, to update them. And once we had a kind of like library of primitives based on things like energy efficiency, improving the communication clarity between the forward and backward updates, that sort of thing, then we could do a bit of a search there. And we started on smaller scale experiments, finding just what was working, and starting to scale with ImageNet, so that we could then find, ultimately through a larger scale search, that something like an Oja-style update. Though not exactly, but just to kind of summarize it, an Oja-style update was good, with a few. So anyway, once you had a language for it and update rules, you could finally close this gap. And the main other issue was that even if you close the gap on one architecture like ResNet-18, as you went to deeper models, the same hyperparameters of your local learning rule didn't actually transfer. And this is actually unlike SGD. So in SGD, in backprop, that transfer does occur. And so that would be really unfortunate, from an evolution point of view, if every time you create a new organism you have to do another search of the hyperparameters. So we found actually robust primitives that were robust, hyperparameter robust. And then you could transfer to very deep architectures as well on ImageNet. So once we had that, then there was an N of one example of: here's a learning rule that's vector error based but doesn't require the weight symmetry that backprop needed. And so then maybe that starts to become a plausible candidate to look for in brain data. [00:42:23] Speaker B: Okay, all right. But it's not the, not the hot topic these days, I suppose. [00:42:27] Speaker A: Not anymore. Yeah, I wouldn't say it is. And I mean there was a kind of question of like what brain data should you measure? And so we had some work on that using artificial neural networks. That was follow up work, showing that the activations are actually enough. So related to my earlier point about how model activations correlate with intelligent behavior, tracking those changes across time also correlates with better identifying the learning rule in artificial neural networks, where you have ground truth, and you can weaken that with noise and limited observations to start to mimic what we actually get in the brain. And it's still robust to that. Unlike synaptic weight changes, which are the more natural thing to look at. And activations are a much easier experiment to do. You just do ephys. [00:43:09] Speaker B: Right. [00:43:09] Speaker A: As opposed to tracking the dendritic spines, which is a much harder experiment. There is an open question of can we go and validate this in data? And we have some evidence that it's a much easier experiment than previously thought. But yeah, as you say, it hasn't been the main focus of the field, nor at least of my own interest at the moment. Yeah, okay, but I think it's important. I think once we get. Because basically I want to get to an agent architecture that does work, but then we can then start to study those questions of the module update rules, development and those things. So those are interesting down the line.
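As a rough illustration of the weight transport issue and the feedback alignment idea discussed here, the sketch below contrasts a backprop-style backward pass, which reuses the transpose of the forward weights, with one that routes the error through a separate backward matrix: fixed and random in the original feedback alignment setup, or itself updated by a local rule in the later work described above. This is a minimal single-hidden-layer NumPy example for intuition, not the actual code or update rule from those papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 20, 50, 10

W1 = rng.normal(0, 0.1, (n_hid, n_in))    # forward weights, layer 1
W2 = rng.normal(0, 0.1, (n_out, n_hid))   # forward weights, layer 2
B2 = rng.normal(0, 0.1, (n_hid, n_out))   # separate backward weights for layer 2

def forward(x):
    h = np.tanh(W1 @ x)
    y = W2 @ h
    return h, y

def weight_updates(x, target, backward_weights):
    """Compute weight updates for a squared-error loss, routing the output
    error through `backward_weights` instead of assuming access to W2.T."""
    h, y = forward(x)
    err = y - target                                  # output error
    dW2 = np.outer(err, h)
    # Backprop would use W2.T here (weight transport); feedback alignment
    # substitutes a separate matrix that the error circuit can own locally.
    delta_h = (backward_weights @ err) * (1 - h**2)   # tanh derivative
    dW1 = np.outer(delta_h, x)
    return dW1, dW2

x = rng.normal(size=n_in)
t = rng.normal(size=n_out)

dW1_bp, dW2_bp = weight_updates(x, t, W2.T)   # exact backprop (tied weights)
dW1_fa, dW2_fa = weight_updates(x, t, B2)     # feedback alignment (random, fixed B2)
```

Roughly, feedback alignment works because the forward weights come to align with the fixed random feedback weights during training; the gap to backprop on harder tasks like ImageNet is what the search over local rules for updating B2 (the Oja-like updates mentioned above) was meant to close.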
They're just not the immediate interest because there's just kind of like upfront challenges to begin with to get the modules to some initial state that's actually good and where they combine well to begin with. [00:43:53] Speaker B: So you want to make an agent in a, in a robot, right? Or is that a more longer term. [00:44:00] Speaker A: The robot is a longer term. Right now it's, it's, it would, it's all in sim with like biomechanically realistic bodies. So biomechanically realistic bodies also have like a much larger number of degrees of freedom than current robot bodies. So. So they're about like. So let's say Spot is about, I think, 10 degrees of freedom or something like that. But of course they have lots of low level control. There's a beauty in the hardware stack, but we wanted to kind of more focus on the, on the actual like, biomechanical control aspect of it, the high degree of freedom, biomechanical control aspect of it. So that's why we're doing it in sim, where you don't need hardware to approximate it. You can actually be more exact about it. And because a lot of neuroscientists like do VR experiments, there's no gap in the eval. Like you can literally take the same stimulus in sim and just put it in a new simulated world that matches what the experimentalist did. And experimentalists like sim stuff because it's very controllable and repeatable. And so you can now close the gap with the evaluations as well, right? [00:45:02] Speaker B: Yeah. And it's not real. I mean, you're very well aware of Moravec's paradox, that the hard things are actually kind of easy. So we can play chess on computers really well. And the things that we think are easy, like ping pong, are hard. Right? Like physically doing ping pong, because you can simulate it with however many degrees of freedom that you want in an agent and you can control it, but then once you get in the real world, it's all of a sudden hard. The reason why I'm bringing that up is because one of the things I wanted to ask you, which I think is related is, you know, there have been a lot of. Since the early GOFAI days, there have been a lot of cognitive architectures, right. And it's kind of transitioned from symbolic. Then you have like hybrids like Chris Eliasmith's Spaun, and you have Randy O'Reilly that are making like really more connectionist type, cognitive architecture type things. But without fail, I believe, maybe this is not the case for Randy O'Reilly, but everyone has. Not everyone. A lot of the people who have worked on these things have suggested that it's not so difficult to actually get one module to perform well. It's actually the crosstalk between modules, the control between the modules, that's the difficult thing. And it seems to take way more effort than actually getting the modules themselves to do what you want them to do. So is that on your. [00:46:42] Speaker A: So I actually agree, but I would say that it comes with a nuance though, that it depends on your goal. So if your goal is more open ended unstructured environments, it is also a challenge in itself to build the modules. So like the world model, or it took a while to even get like a good sensory encoder. So if you go beyond like particular tasks to like more open ended tasks, it's already a challenge in itself to get to that pre training, right? Like we didn't have good SSL things, loss functions, till like a few years ago.
So, and advances in that, in better SSL algorithms, also led to better brain models in, for example, mouse visual cortex, where it's not categorization optimized. So all to say, definitely connecting the modules is not trivial. But even constructing the modules, especially I think nowadays with the world model, and figuring out what those representations even should be, is actually I think still a core challenge. So to give an even more concrete example for today, a very common approach right now is VLMs. So vision language models. And we can go and collect tons of robot data in the world. Like we can drive Teslas, people drive Teslas and they have a century's worth of data. It's not hard to get lots of data. Actually it's not the bottleneck, it's that the scaling laws have not been as favorable on those types of data with the existing VLM architectures. So even if we wanted to go back to the sensory module, right, like the scaling laws haven't been as favorable as they have been in language. And I think part of this is the architecture itself. And in particular the way we tokenize, in other words, the way we process the inputs to give to these specific VLM architectures, are random patches. And so they end up basically learning something like a convolution, which doesn't end up being ultimately a step change or an advance over what we had with CNNs. They're actually converging. So it's sort of like the Platonic representation hypothesis. They're like converging on very similar representations. And also as a result, like you know, vision transformers are also very similarly matched to the brain as CNNs are, because they're effectively approximating a convolution. So, and I think that in language though, the notion of a token, individual constituent words, is very much semantically related to what that modality is trying to do. Right. Combining words gives rise to the higher meaning. But the patches themselves are like. We don't have a good, you know, prompting language for vision yet. And I think there's still like an advance there to be made even in that domain. [00:49:24] Speaker B: So are these the sorts of issues that made you start focusing more on what you have written as Marr's algorithmic level? Rather than, so, Marr's three levels. Right. And everyone for the past 10 years, folks have been focused on the computational level and that's what AI focuses on. For example, you, you have written, and many people have, that, oh, the reason why these models work so well is because we give them a goal, give them a task. Right. And that's the computational level, something that they need to accomplish. And then the algorithmic level, how they accomplish that algorithmically, is somewhat less important, but they can learn them and that's why these models are so great. And then the implementation level, who cares? Just you have to put something in there and eventually it'll give rise to it. But so are these the sorts of issues that have made you focus more on that algorithmic level? [00:50:24] Speaker A: Yeah, so I think just to even translate what you just said to NeuroAI, the computational level is the task and the algorithmic level is related to the architecture, but also the interactions between the task and the architecture too.
And so in some cases I think like, you know, it's not just about specifying the right goals, it's not just about figuring out the right self supervised objective like next token prediction or contrastive learning, that sort of thing, which is an advance in itself. It's also figuring out the architecture that meshes well with that modality. I think there's a lot of promise in using transformers or like the token based kind of paradigm because it is a little bit modality independent. In other words, it's kind of general purpose, you just swap in different data. But I think with that generality, I think not all sensory systems are necessarily equal in that way. I think there's nice, maybe there's a lot of shared explained variance between them and we have some evidence for that. But I think going forward, if we really want to get things that are better at intuitive physics, where a lot of the models lack in terms of human capabilities, we will likely have to start to become more specific about how we at least to put in the language of today, tokenize or process inputs that are more vision and embodiment based and they may just be different than language. And this is probably consistent with how maybe things are in the human brain where the language areas were evolved later and are a little bit like topographically distinct from what visual cortex looks like. [00:51:57] Speaker B: You don't have language in your modules yet. Right. [00:52:02] Speaker A: So the way one More thing is. [00:52:04] Speaker B: You'Re concerned about also comparing across species. And given that humans are the only species that use language, I know that's arguable, but let's just say might not be what you're after. [00:52:16] Speaker A: That's right. And I think that's maybe why I think that there's a core underlying desire of us wanting algorithmic desire of us wanting these agents to better understand the world. Animals certainly build models of the world without language, and that's already hard. So this relates to the coming up with a prompting language for vision for VLMs that's better than this kind of like random image patch token thing that I think would better speak to the kind of visual intelligence that animals have that, that could, that needs improvement in existing architectures today. But I think where language can play a role, even if you're modeling at that level, is at least you could argue that you're using language when you train a model on supervised categorization. What do you mean? You're providing labels for the images and so you could make that argument that language is maybe a useful guide to learn representations even if they're not in a linguistic context. So I think, for example, if we're stuck on a certain question, we can't come up with the right self supervised objective that we think an animal could implement, could be implementing plausibly or related to that, a better prompting language for vision that isn't language based. Then I think it's fine as a proxy at the moment to use the kind of the less good but still gets you somewhere supervised language conditioned version of that loss function before you kind of figure out the self supervised thing. And I think that's fine. It might actually get you quite far. And I think we shouldn't throw that away either. It's just not used in a specifically linguistic context. It's really more used for guiding representations, which is what we've been really doing. 
Language has a remarkable ability to teach machines how to reason like us or for us to communicate that basically yeah. [00:54:11] Speaker B: But one of the things that I enjoy about your approach is that you appreciate the sensorimotor loop, right? And the agentic aspect of intelligence and existence. But then like earlier you were, the way you were talking about it, it sounds very much like a input output brain is a computer metaphor kind of thing. And you started in vision, you started in sensation and you've come to appreciate the motor aspect of it. Some people would turn that around, like active inference people and say that actually you're behaving to adjust Your sensory input. Where do you land on that? [00:54:54] Speaker A: Oh, totally, yeah. So this relates to the online. To why having an agent is actually the way, I think to studying questions about development or online learning rather than the existing way, which is just to lump it all into a single backprop update. Is that. Yeah, this is also related to the goals, whatever goals it's using to guide its exploration in the environment. [00:55:18] Speaker B: The intrinsic goals, which is where do those come from? That's another thing I wanted to. We'll put a pin in that. [00:55:24] Speaker A: Yeah, yeah, no, I think that's an interesting question on its own. But whatever those goals are, they're going to. They're going to alter the training data. And one of the reasons you want the online update rules is to adapt a distribution shift online. So if you're going to go beyond your pre training data and you're going to explore your environment, you have to also adapt to the fact that you're going to encounter out of distribution things simply by exploring the world a bit. And so in order to handle that in a robust and reliable way, you'll also need an update rule. Where. And the exploration strategy is guided by these intrinsic goals. [00:56:02] Speaker B: So where do those goals come from? [00:56:05] Speaker A: So that I think is unclear fully. Like I don't think there's a definite answer yet, but we have. Yeah. [00:56:12] Speaker B: I mean it's the computer scientist way. Right. Is to then program in the goals. Is that achievable? Can you program in the goals? Whatever. Because it seems to be this mysterious central core of our existence biologically. Right. Is that we have these intrinsic goals. No one knows where they come from. It's an internal reference signal that we have to follow. We want to be at homeostasis, but it's kind of a mystery and a computer programmer wants to like just. All right, program in the goal. Right. Is that achievable? [00:56:46] Speaker A: So I think it's less like it's actually this is related to, you know, things like reward hacking that people talk about these days in modern agents, which is that like we might think we're programming it in by specifying it, but actually because these things are optimized, just specifying a particular goal might lead to unexpected behavior. [00:57:05] Speaker B: An emergent kind of behavior. [00:57:09] Speaker A: Exactly. Emergent. Either desired or undesired behavior. That's harder to. If we were dealing with computer programs quite literally, then yeah, it should. I mean even then, computer programs can lead to unexpected behavior too. Right. You just didn't fully anticipate, as the creator of the program, all the possible things that could lead as an outcome. Right. 
So AI safety people have used the paperclip maximizer as the unexpected version of this, that it maximizes paperclips and then it's like, well, I should just take everything, take over. Right. And that's an unexpected outcome. So even when you knew it was very myopic and very specific, it can lead to that. But to your broader question of where are these goals? I think it's likely they could be distributed. So I mean, the obvious candidate for everything that someone doesn't know is to say it's prefrontal cortex. But you could also imagine that. [00:58:04] Speaker B: But a lot of organisms don't have prefrontal cortices, right? Plants. [00:58:11] Speaker A: That's right. Actually, so related to other organisms, you know, there's beautiful work by Misha Ahrens' group at Janelia that shows that zebrafish have a kind of futility-induced passivity that's actually computed in non-neuronal cells, in astrocytes. [00:58:31] Speaker B: Oh really? That's cool. [00:58:35] Speaker A: And so we're actually working with them, studying this, related to this intrinsic goals question, trying to figure out what these intrinsic goals are. And we should have something else soon. But the main thing is that, like, you know, I think these goals can be computed in a lot of different ways and a lot of different parts of the brain. It isn't just PFC or something neuronal, but potentially, in other animals, non-neuronal cells. And even if it is fully neuronal in other higher species, it could be that that's done in, say, large-scale thalamocortical loops, et cetera. I don't think it's localized. In other words, in fact, the evidence that we're seeing is maybe that it's not so fully localized, but that animal behavior is still very stereotyped. In other words, from the kind of decade of neuroethology with machine learning that's applied to naturalistic videos, going back to our naturalistic discussion in the beginning, people have found that these ML tools are kind of auto-discovering behavioral primitives that seem to be reliably switched between, more or less. And so that's kind of what motivates this hardwired intrinsic goals thing that I'm mentioning here: unlike end-to-end RL, where it's one objective, there's probably multiple things that it switches between, dynamically state-switches between, and that might not necessarily have to be localized. Those individual goals might be represented in different parts of the brain. [01:00:05] Speaker B: Let's talk Turing Test. I mean, is there. We're all over the place and I apologize. That's my fault as the host. But it's fun for me. It's fun for me. Maybe before we move on, is there something that you want to add on the cognitive architecture? Slash. I keep calling it cognitive architecture. Sorry. So the agent, embodied agent, that you're. [01:00:28] Speaker A: Working on. I call it a cognitive architecture too. But I think maybe the main difference is that what we really care about are open-ended, unstructured environments, like the ones humans and animals are in, and so open-ended tasks, and also comparison to not just behavior but internal representations as well. So comparing the individual modules to the individual brain areas, and then interactions between modules and online updates therein, to developmental signals down the line.
But this actually nicely segues into the Turing Test, because right at the ultimate root of this is quantitative comparison. Right. With either. [01:01:10] Speaker B: Before you start talking about. So you've put out, there's a manuscript, it'll be linked in the show notes. Is it called the NeuroAI Turing Test? Yeah. Okay. Which modernizes Alan Turing's Turing Test, which he didn't call it the Turing Test, but it came to be known as the Turing Test, where, can you fool a human? If you're a computer, can you fool a human into thinking that you're a human? And so that was very focused on behavior. And you're saying, yes, that's great, but in the NeuroAI world we actually need to also compare the internal, quote unquote, representations, which we'll talk about what that means. And so you have the behavioral comparison, but also the internal representations comparison. And that should be a benchmark of sorts. [01:02:00] Speaker A: That's exactly right. And the key principle is that for any measure of comparison you want, you want your models to be as good as brains are to each other in the context of internal and behavioral representation. So what does that mean? [01:02:13] Speaker B: Explain what that means. [01:02:14] Speaker A: Yeah, yeah. So basically, right, there's two issues here when doing these model-brain comparisons, at least. One is that brains are stochastic. So unlike our models, which are deterministic, they respond variably to the same stimulus. And that's been well quantified up till now as the internal consistency of the neurons, what we call the statistical noise ceiling in the paper. And that's often been used as a ceiling, either 100%, implicitly, or that more quantified metric of the internal consistency, of how consistent neurons are with each other across trials. But the other aspect of it is that that doesn't as cleanly semantically map onto how we actually compare models to brains. We're mapping one representation to another. And this statistical noise ceiling isn't really capturing that. It's just capturing the stochasticity of neurons, which is good. We want to capture that. But it's not the only thing we want to measure as a ceiling. So if anything, it actually becomes a correction term later on. But the main thing we want to capture is the fact that, even if a brain was deterministic, even if the stochasticity didn't matter, if you took two brains, there's just going to be variability between them, even within individuals in a given species. Fix the same brain area, fix the same stimulus, there's just going to be variability in how they process that. So when a model is imperfect as a match to the brain, we need to be able to disentangle whether that's because it's actually a poor match to the brain, truly, or because there is inherent evolutionary variability between brains that we need to account for. That's why we emphasize: whatever metric of comparison you use, and we can talk about that, under that metric you should also do a comparison of the brains to each other, as though the brain was another model. [01:04:20] Speaker B: And then the idea that you have one model against lots of different brains, multiple models against lots of different brains within a species, for example. [01:04:31] Speaker A: Yeah. So it's still used in the context of integrative benchmarking, of having multiple models to at least one set of brain data.
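To make the scoring procedure concrete: under whatever comparison metric you pick, score the model against each animal and normalize by how well the animals score against one another under that same metric. The sketch below is a minimal illustration using an RSA-style metric (Spearman correlation of representational dissimilarity matrices) purely as one example; the estimators in the actual paper (mapping fits, trial-level corrections, and so on) are more involved than this.

```python
import numpy as np
from itertools import combinations
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Representational dissimilarity matrix of a (conditions x units) response matrix."""
    return pdist(responses, metric="correlation")

def compare(source, target):
    """One possible metric: Spearman correlation between RDMs. Swap in regression-based
    predictivity, CKA, or anything else appropriate to your question."""
    return spearmanr(rdm(source), rdm(target))[0]

def neuroai_turing_score(model_responses, animal_responses):
    """Score the model against each animal, normalized by the inter-animal ceiling
    computed under the *same* metric (each brain treated as a 'model' of the others).
    Requires at least two animals."""
    model_scores = [compare(model_responses, a) for a in animal_responses]
    ceiling = [compare(a, b) for a, b in combinations(animal_responses, 2)]
    return float(np.mean(model_scores) / np.mean(ceiling))
```

A score near 1.0 means the model matches each brain about as well as the brains match each other under that metric; a score well below 1.0 means there is real, reliable signal the model is missing.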
So one brain, you can of course go, it's good to then maybe do cross-species comparisons down the line. But at a minimum, the base thing you want to first straighten out is the model-to-single-brain comparison question. And then to generalize it is just applying that same procedure over and over. So once you've established that procedure, what that ceiling should be, it's going to be the same one that applies to other brain areas, other species as well. [01:05:10] Speaker B: So this is where your mathematical and logic background really comes through, because you really specify how these comparisons are going to be made theoretically. Right. And you were just talking about the metrics. So let's talk about how to compare these. Like, you leave it open and say you can compare any metric of whatever you use for the representation. I just did air quotes. [01:05:38] Speaker A: Yeah, yeah. I think that's totally appropriate for this, because, right, the question of what metric to use is certainly an important one. I think oftentimes as a field, and certainly I have as well, there has been an implicit assumption that there's one platonically good metric that we ought to strive for of model-to-brain goodness. And this is also reflected in pre-NeuroAI days. We had different notions of brain-likeness, in words like sparsity, energy efficiency, things like this that we wanted our models to have. When we assume, even in this more quantitative setting, that there's a platonically good notion of a metric, then in some sense it's like we're saying up front, well, we know what it means to match the brain well, and so you might as well just bake that in and then you're done. Right. I mean, the problem is that that's not the case. Right. We don't know up front, a priori, what a good brain model should be. We have data and we want to match the data, as good as brains are to each other under the assumptions of that data collection process. That's the empirical reality of it. And I would also argue that, given that the brain is a complex object, there's different things that people focus on. Like we were talking about how people have different definitions of NeuroAI. Some people really do care about topography, for example, which a CNN is not, a 3D spatial map. Right. It's 2D. So under a topographic metric it would be zero, effectively. So different people have different things for their question. So I don't actually even think, not only in platonic reality is there not a platonically good one, it's just highly question dependent. [01:07:32] Speaker B: Anyway, you keep saying platonic here, and I know that you've thought about this. There's, what is it, the Platonic hypothesis that's been floating around. What is that, and why do we like or dislike it? [01:07:43] Speaker A: No, I just meant platonic in an overall notion of good that we should all strive for. [01:07:51] Speaker B: Yeah, yeah. Okay, then let's not go into the Platonic hypothesis then. It's too far an aside. But you mean like an ideal that. [01:07:58] Speaker A: Is an ideal that we should all be striving for. And I think that basically there is no such thing as an ideal. It's just question dependent, and the brain is very complex, so you might focus on different aspects of it.
And all we're trying to do here is just say we want to standardize that operationally and say, well, whatever metric you choose for your question, make sure that you assess model goodness up to how brains vary under that measure. [01:08:24] Speaker B: Okay, so in some sense it's extremely pluralistic because it allows the user. Here's what I want to ask: how do I pass your NeuroAI Turing test, and how do I fail it? Right. So I can come with my own question. As long as my question, as long as I adhere to the scientific rigor of the NeuroAI test, I can come in with any question or any assumptions as long as I state them. So I can. It almost sounds like I can pass it if I want to. [01:08:55] Speaker A: Right, so of course you could define a, maybe a trivial metric, or something where it's zero or something like that, and then the models are right. And so certainly you could do that. But then you could argue, well, that was what was sufficient for your question. So I mean, you know, I can't tell scientists what is a good question or not. That's their judgment, of course. But instead, yes, the idea is that you define a metric that you want to score model goodness on. [01:09:29] Speaker B: Let's say, let's say that metric is. Well, you mentioned efficient coding or sparsity. Right. You could just do it on sparsity. You could do it on individual spikes in a population. You could do it on, I don't know, astrocyte calcium signaling. Yeah, whatever you want to do. [01:09:47] Speaker A: It's meant to be extensible, right? Meant to cover the broad range of applicability and diversity of questions that we have in the brain sciences, naturally, because the brain is complex. I can tell you, though, in practice what I do and why, in terms of the metrics that I choose, for a lot of settings. So I want to be clear: in vision, I think we have had the luxury of such advances to get us to models that were really good. The most common benchmark was, for example, HVM, which is the Yamins et al. one that people push on, those initial ones that Brain-Score used. Brain-Score now uses a lot of other vision benchmarks too as part of it. But for a while, when we were first working with HVM, which was in 2016, 2017, the impetus for coming up with this animal-to-animal measure was that, according to the statistical noise ceiling, the models were explaining 50% or 60% of that, and we're like, whoa, clearly there must be advances in vision needed to beat this very simple visual behavior of an animal staring at a stimulus and doing nothing else with it, the first 150 milliseconds of visual processing. And it turned out that actually, when you looked at the animals relative to each other, and this actually ended up being a supplement in the paper, and then we made it one of the main figures in the NeuroAI thing, because even the authors of the NeuroAI Turing test had just relegated this thing to a supplement for some reason, I mean, not intentionally, because there were other focuses. It was just. Yeah, on HVM, it wasn't 60%, it was actually 90%. So in other words, it's just to say that, like, now that's not to say we've solved object recognition. We're saying on this particular data set, this benchmark that people have been trying.
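The "statistical noise ceiling" referred to here is usually estimated from trial-to-trial reliability. A common recipe, though not the only one and not necessarily the exact estimator used in the work being discussed, is split-half correlation with a Spearman-Brown correction, sketched below on hypothetical (trials x conditions x neurons) data. This trial-level ceiling is the quantity the conversation contrasts with the animal-to-animal ceiling.

```python
import numpy as np

def internal_consistency(responses, n_splits=100, seed=0):
    """Split-half reliability per neuron, Spearman-Brown corrected.
    `responses` is a (trials, conditions, neurons) array of hypothetical recordings."""
    rng = np.random.default_rng(seed)
    n_trials, _, n_neurons = responses.shape
    out = np.zeros((n_splits, n_neurons))
    for s in range(n_splits):
        order = rng.permutation(n_trials)
        half1 = responses[order[: n_trials // 2]].mean(axis=0)   # (conditions, neurons)
        half2 = responses[order[n_trials // 2 :]].mean(axis=0)
        for n in range(n_neurons):
            r = np.corrcoef(half1[:, n], half2[:, n])[0, 1]
            out[s, n] = 2 * r / (1 + r)                          # Spearman-Brown correction
    return out.mean(axis=0)                                      # one ceiling value per neuron

ceiling = internal_consistency(np.random.rand(20, 640, 100))
```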
I don't know if you would really want to go and invest major money to do advances in vision to push on HVM in particular. You might instead be motivated to say, okay, one of two things. One, I'm done with the question, because this benchmark was the thing that mattered to me most, and I think the vision models are good enough for my purposes. Or two, because of this saturation, you'd be more motivated to say, I'm going to go collect more data, higher-variation data, to really push on object recognition, and again set my ceiling to the animal-to-animal consistency there. For example, we did a bunch of extrapolations of the NeuroAI Turing Test on the HVM data, and we found that actually having more conditions and having more neurons didn't start to improve that ceiling. I mean, it was starting to saturate at some point, but it's within the realm of collecting. It motivates that you should actually go and do something more concrete and collect a new experiment there, too, as another possible viable route. But it does rule out as a viable route, I think, continuing to push on, for example, HVM, where you may have thought the gap was much bigger, but it's actually much smaller than you imagined. [01:12:52] Speaker B: What is a representation? When you use the term representation, what do you mean? [01:12:59] Speaker A: I mostly, I actually 100% of the time, mean the population activity. [01:13:05] Speaker B: Yeah, it's weird because that's all. I mean, yeah, I think that that's the common usage in neuroscience. But then, of course, there's the philosophy of mind way of using it, the cognitive sciences'. So it's kind of a slippery term, and it's kind of deflated in that sense, in that it just means the activity of whatever you're studying. [01:13:24] Speaker A: Yeah, so I've had a hard time, like, understanding, and maybe you could tell me about this, understanding the arguments in cog, in like, cognitive science, that aren't population activity and why it's a more nuanced term. But I literally meant it as a vector of population responses. Firing rate, binned firing rate. So it's a very precise notion. I mean, I guess from my understanding in the cognitive sciences, it's like, does that vector necessarily contain all the semantically meaningful stuff that we're assigning to a behavior? I think that's what. [01:14:06] Speaker B: Well, I mean, you know, I don't like to do the etymology thing or tear apart words, but re-present. Represent is re-present. And I think it is the idea of, like, okay, well, this is presented in my mind and it is attached to the thing in the world, and it's like somehow a copy of the thing it's representing in my mind, which is very different than just a measurement of activity in some brain region. [01:14:35] Speaker A: Yeah, yeah. Oh, and I mean, I think we can see that, like, you know, if you go to visual cortex, there are some efference copies from other areas representing other things that you can decode. [01:14:47] Speaker B: That's the thing is you can decode a lot from a lot of different brain areas. But is that meaningful? Is it causal? Is it correlational? Is it a representation? So there's a lot of talk these days about being more careful with the term representation, but the way that you use it, you don't have to be careful with it.
Maybe we just need a different term, because it just means the activity. [01:15:09] Speaker A: Yeah, I literally just mean the activity. So for me, that's always there. The question of, like, the semantic representation or something like that, if you want to call it a slightly different term or maybe a completely different term. [01:15:19] Speaker B: I mean, like attaching meaning, meaning. [01:15:21] Speaker A: Meaning to it. And like, oh, this part of the. It kind of emerges from this, like, maybe pre-modern-neuroscience view that there's this one function that you can assign to this one cluster of cells in the brain, and that's where it is. And I think that's by and large probably a very toy view of the brain, because it's quite distributed in ways that we don't expect. And I think the only way to really get at those kinds of questions is not to go in assuming that there is a very localized function. I mean, in some cases there are, but not in a lot of cases. And then do those kinds of quantitative comparisons between these. That's why, to engage with whole-brain data, you want to go in this embodied agent direction, because ultimately these things do interact and do influence each other. And you want to test whether that hypothesis of the interaction between the modules is a good one. And the only way you can do that is to have the modules, to have them interact, and then do this kind of NeuroAI Turing test on top, just to combine all these ideas together at the level of population activity to begin with, to really assess: does this vector of population activity contain that semantic content? And you can answer that in a quantifiable way, yes, no, or 0.65. Right. It's not just 1.0 or 0, basically. [01:16:37] Speaker B: It's interesting though that you. Okay, so you just said something that I very much jibe with and appreciate, that back in the phrenology days, right, we said brain area X does function Y. Right. And you said that that's not necessarily the case. It's probably not the case. It's very distributed. However, the thing that you're wanting to build is made up of modules that do functions and then have to talk to each other. How do we reconcile those things? [01:17:04] Speaker A: That's an excellent, excellent question. So when I say function, right, I'm not saying do the Jennifer Aniston recognition or anything like that. It's more general purpose. It's such a general loss function anyway that it's actually highly nonspecific. Yes, there is some localization, but it's like, for example, for self-supervised learning, like next-token prediction or a world model, it's trying to figure out the dynamics from the current state to the next state. That's a very general thing. It's not as specific as maybe in the phrenology days, where it's like, oh, it's this specific thing, like recognizing a particular person or places alone or things like that. I'm not saying that the brain doesn't have it. I think that maybe through this optimization of a general-purpose, high-level goal, you can learn internal representations that do have more specific content to them. So I think that's entirely possible. And we know that, right? Like, face patches emerge, that sort of thing. So it's just that the kind of top-down guides to build those high-level modules, those large-scale modules, are a lot more general purpose than specific. [01:18:17] Speaker B: Okay, fair enough. Aron, what's holding you back these days?
Two questions. What are you excited about? And then what's in your way? [01:18:28] Speaker A: That's a good question. So I mean, what I'm excited about is ultimately, even beyond, you know, specific scientific questions we can ask. Like, we are entering an era where we're just building more capable systems, and even modern agents that we're building today with LLMs, they start to have cognitive components to them. People are starting to realize, oh, you need a memory, things like that, and compositionality, modularity. So that's good. And it's consistent with how the breakthroughs needed in AI to get to the next generation have also led to better overlaps with the brain, partly because there have been fewer solutions to get there, and since the brain has already reached that, there's a high probability of that overlap. So that's the contravariance principle, or in AI, the Platonic representation hypothesis, that we've referenced a couple times now. And so what I'm excited about is that we're entering an era of more and more capable systems. And I do feel that what used to maybe be a sci-fi dream, or even a dream that felt a little bit within reach but still a faraway dream of 50 years, back when I started with neural networks. [01:19:43] Speaker B: You're about to say it's all going to happen within five years. [01:19:47] Speaker A: No, no, I'm not about to give a timeline, but I do think that it's a lot sooner than maybe we think. Or even a capable system. So like, I don't mean general intelligence, I just mean even a weakly capable AI system that can do tasks autonomously for 24 hours. That would already have a huge economic impact. And already LLMs have changed education, right? Like, you know, we make our tests in class and on paper so that, you know, students are really tested on what they actually know. [01:20:20] Speaker B: I was just questioning myself just yesterday, walking in the windy sunshine in Pittsburgh here, and I was thinking, am I learning faster now that I'm using ChatGPT to find things out? Or is it slowing it down? Or how is it affecting me? Anyway, it's different. [01:20:38] Speaker A: I was thinking that too. And like, you know, in some ways, am I engaging as critically as I used to? But at the same time, does it outweigh the amount by which I can quickly learn a new topic? [01:20:48] Speaker B: Right. It seems quite efficient to me. I think overall for myself it's been a real benefit. [01:20:54] Speaker A: That's right. And yeah, I know there are problems with hallucination, et cetera, but one, it's gotten much better, and two, even at the stage it's at, it's hugely impactful. Like, I haven't met a person that doesn't use it, fully or to some extent, even when they were skeptical initially. Right. It started to become like Google Search. Right. [01:21:13] Speaker B: Like, basically has taken that place. [01:21:15] Speaker A: Yeah, has taken that place. Right. And so I'm just saying, even a not-AGI-at-all AI system that's just useful and capable, which I think is the target of a lot of AI companies today, is going to impact things in ways that maybe we can't always foresee. So that's what I'm excited about.
It's like at least entering an era where that sort of starts to feel like sci-fi a little bit. Like, having a little assistant that you can actually hold a reasonably productive conversation with is cool. So that's what I'm excited about. What I'm, I guess, held back by, I think it continues to be, as always in science, ideas. In other words, I don't think it's so much compute, actually. Like, I think as an academic, in fact, the lack of compute, though, is quite fine actually for our purposes. It drives you to be more creative, and I think that's the fun part. But it's also, you know, making sure that, you know, what if we're in the wrong paradigm or something in some way. Right. And I think I'm always open to that. [01:22:25] Speaker B: What percentage do you put on the likelihood that we're in the wrong paradigm? [01:22:33] Speaker A: Well, probably, probably 10%. [01:22:38] Speaker B: Oh, you think we're in the right paradigm? And this is a weird thing to say, like there is a correct paradigm, because there's. I don't think there is, but we're in one right now. [01:22:47] Speaker A: We're in one that I think is very productive, and empirically so, and more so than other prior approaches. So I think that speaks volumes. I think there's limitations to it too. So, you know, maybe the ultimate paradigm that people use might be very different. And I think that's totally normal within scientific progress. It makes sense to push as hard as you can on the existing thing while it's bearing fruit and see how far you can take it. And then when it stops doing that, you have very reasonable next steps of what to take, rather than kind of completely jumping ship immediately. And that's sort of, I guess, my style: push hard on the thing, you know, really steelman it. And then when it's failed, you know, you were the advocate of the thing and now you're seeing it's no longer empirically giving you gains, move on to the next thing, and at least you know what problems it's falling short on to generate new hypotheses. But I also do think that, you know, if the ultimate goal is the brain at a very detailed, like molecular, level too, that's helpful for disease, not just at this kind of algorithmic level that we're talking about, that's more hardware agnostic, that's often talked about in NeuroAI, I think that can take a lot longer. I think that'll take. That will probably happen before we have these very competent algorithms. [01:24:09] Speaker B: Wait, especially before. Wait, what? So sorry. We will have better disease treatment before we have. [01:24:18] Speaker A: No, no. [01:24:18] Speaker B: Oh, okay. [01:24:19] Speaker A: After, after. And I think that's where AI for science can be helpful. I think actually that, by having these better agents, that can also accelerate the types of. Because biology is enormously complex, and the brain as a part of that is itself enormously complex, especially as you go down beyond the algorithmic level to the synaptic level and the neurotransmitter level. And I think that what will ultimately aid that discovery are systems that can really process lots of data and not be tied to particular simple stories, which is what biology has suffered from to some extent, and really help maximize and find new information to generate those hypotheses.
I think, in other words, that's why I particularly focus on the algorithmic level, because I think that there's a lot of room to improve there, and I think we can get there much more quickly. But then that in turn has benefit for studying the brain at a more detailed level, for disease, that sort of thing, and biology more broadly, science more broadly, down the line. [01:25:24] Speaker B: Jim DiCarlo, you know, we mentioned him earlier and his reverse-engineering approach, thinks that when you can predict, when you engineer something, you build it and you can make predictions, that basically is understanding. And what you were just saying about that we need, you know, these agents to handle lots of data. There is a. We suffer. We're simple. We need simple, short, low-complexity sentences, symbolic things to hang on to, to say that we understand something. So are we losing understanding there, or do you agree with Jim that that's what understanding is? [01:26:05] Speaker A: Yeah. So, you know, I think that ultimately we will always need our AI systems to communicate simple things to us. It's just that, just like how in NeuroAI we talk about three things, the task, architecture, and learning rule. Like, you and I don't transmit the weights of the neural network to each other. Right. In terms of our understanding, we say, look, this pattern of these three things explains this brain data, or predicts this neural activity of like thousands of neurons across hundreds of conditions. So there's always going to be a higher-order language by which humans talk to each other and form models of the world. And they'll just be informed by AI systems that don't make those simplifications but still communicate to us in that context. I don't think there's any way around us as a species using these things. It'll have to be in that kind of language. But I would argue that NeuroAI already does that, because it summarizes those three things in the particular context of NeuroAI. And one of the reasons why. So I do agree definitely with Jim about the prediction thing. So, related to the Turing test and the types of metrics I usually use, especially in settings where we have less knowledge of the brain areas, where we're making progress, it's much better to have a predictive model. We basically start with zero predictive models of the system, so to get to one better predictive model, already, under linear prediction, is already a huge advance and leads to control, optogenetic control, that sort of thing that we were talking about, that Jim is known for. And then down the line, and this is kind of where the AI agent thing comes in, we want to make finer-grained distinctions for particular questions. Like, we're no longer in the dark ages of that brain area where we want to push on linear predictivity. We then want to ask very specific questions about maybe neurotransmitters and chemicals there, to aid in a BMI, for example. So if you're going to have a brain-machine interface, then you're going to have to care about the particular physiology of that individual, not just the average. So that's where, again, when we come to disease and other things, you're going to want to ask those finer-grained questions, but you're going to build on the most linearly predictive model to start with and then iterate it for that particular question. [01:28:17] Speaker B: But you do still have.
So, coming back to the NeuroAI Turing test again, because I wanted to ask you this earlier, because we were talking about how you can measure any metric that you want, any representation that you want, as long as you adhere to the theoretical premise of the test. But it seems that there's a lot built into what you decide to measure, what you decide is the right metric. And so there's judgment, I think, that could be had on that. Right. I could dismiss a model that measures only oscillations, for example, or that would be one that a lot of people would agree with me on, and a lot of people would disagree. Oscillations, I just wanted to throw something out there that some people think is epiphenomenal and doesn't matter, and some people think that it's causal. I think it's both. That doesn't matter. But. So if I decided to measure beta synchrony and that's my measure, then someone could say, well, that's not even worth paying attention to, even though it passes the NeuroAI Turing test. So how do I know what metric I should measure? [01:29:28] Speaker A: No, that's an excellent question. And there's two things. I think if your goal is that particular question, then it makes sense to discard the other stuff and focus on that for now. [01:29:39] Speaker B: Then a lot of people would say, well, that's a worthless goal. [01:29:43] Speaker A: Totally. Totally. But that's what scientists debate all the time about. I mean, it's no different than anything we've already been doing. The main thing, though, is if our ultimate goal as a field is to have a consistent, complete theory of brain function across scales, and we don't have that today, but maybe in the future, then I think we want it to agree on as many metrics as we all agree as a field are valuable, and anything that the brain has. Like, if that's your goal, ultimately a complete, consistent theory, we want it to be passing multiple NeuroAI Turing tests, rather than just one, across all of those benchmarks. [01:30:19] Speaker B: Okay. The other related question that I wanted to ask is, given multiple realizability and degeneracy in the way that populations can transform signals to then enact some action, if the representations don't align, is that really a problem? Can't I get to the same location by taking a different route? And as long as I'm getting to that location, it doesn't really matter what my internal representations are doing if I'm achieving the task. [01:30:55] Speaker A: Yeah, that's right. And you could make that argument. One thing that's just interesting is that you end up at the contravariance principle, basically the Platonic representation hypothesis. You just end up there. Even if you were like, hey, I'm just purely an AI person, I don't really care about matching the brain, it just turns out that your advances lead to better brain models too. And this was the case in vision even with SSL objectives. Like, not only were we getting better models of vision, it was like the better the SSL on a high-variation task, the better the models of primary visual cortex. [01:31:32] Speaker B: It is astonishing and so fucking cool that. [01:31:35] Speaker A: Yeah, and it's like it didn't have to be that way. I mean, the contravariance principle is kind of explaining maybe why that might be. But it's interesting. Same with language and transformers, right? Like, the GPT-based models are the best predictive models so far of human language areas.
And when SSL objectives came out that were better, and we were part of that, you also got much better models of mouse visual cortex, because it gave you a more general-purpose thing for the constraints it needed, a smaller cortex and lower visual acuity. So in other words, all of these advances in AI, these fundamental advances that we need to get to this ultimate goal of an open-ended autonomous agent, basically, right, I mean, that is what AGI really is, all of those, to get there, have led to much better theories of the internals than prior theories that came before. So I would say that, like, if there is a science lurking, what does that tell you? Well, it tells you that there's a kind of science of intelligence lurking underneath that unifies neuroscience's goals, cognitive science, and AI, where it is about really building a kind of hardware-agnostic theory of intelligence, that it's about optimization under different constraints. And that's how these different brains, these different brain areas, relate to one another across this type of spectrum. [01:32:57] Speaker B: All right, Aron, thank you for letting me take you on lots of divergent, wandering paths here. Is there anything else that we missed that you wanted to discuss or highlight, or that you're excited about or fearful of? [01:33:09] Speaker A: Oh, I can mention, maybe for like a couple minutes, a little bit of the AI safety stuff. [01:33:16] Speaker B: Sure. Right. We've had Steve Byrnes on the podcast, a big AI safety person. [01:33:21] Speaker A: That's awesome. Yeah. So I mentioned, you know, the main goal is building better autonomous lifelong-learning agents and using that to try to engage with whole-brain data. But the other aspect of it is what happens once we get there. And I think that's also another place where being an academic actually makes a lot of sense, because we don't really have a good science right now of alignment, of making sure that these AI systems are aligned with human values and preferences. And you mentioned programming in goals and how that leads to unexpected behavior, the reward hacking. And that's a very common thing. And we want to try to avoid features of that as we build more and more capable systems, because even a weakly capable system will have consequences, both good and bad. And we want to try to mitigate those things as much as possible. Now, to be clear, I'm not a doomer or anything like that. Like, I do genuinely think that humans have a higher risk of harming one another than any AI system, especially the ones we have today. But that doesn't mean that they can't cause some harm. And at the end of the day, it's a technology we're building. So we, you know, we have to. Yeah, there's a nice quote by Dylan Hadfield-Menell, who's at MIT and works on AI safety, saying that, look, when you're a bridge builder, like a civil engineer, you also care about bridge safety. Right. So it's like that. It should be a natural part of building these systems to think about that. And I think that's one thing where academics can help, because right now a lot of the alignment stuff is very high-level frameworks, like policies. They're not really policies, they're just discussions. They're not precise. And we also don't have any guarantees of, okay, say hypothetically we do achieve agents that are autonomous and capable. What then?
Like, what are the guarantees there? What's hard and what's not? And so I think that's where my math background from earlier maybe comes in: you can start to prove theorems about rational, fully capable agents that don't misalign for trivial reasons like failing. They're literally ideal, they're computationally unbounded, and if there are things that are hard even for them, then it's going to be something we should avoid in practice. So one work that we had recently, called, like, barriers and pathways to alignment, that's under review, is showing some of the first complexity-theoretic barriers to the alignment problem. Basically, in a nutshell, it's showing that if you have too many distinct tasks or too many agents, there's always going to be problems where the number of bits they have to exchange to provably reach alignment is going to be too large, basically. And so, in other words, you want to choose your tasks and agents wisely. It's not a question of if they'll misalign, it's when: there's always going to be tasks where, even if they were incentivized to align, they will misalign. So we have to really be careful about the tasks and agents that we want this alignment for. A corollary of this theoretical result: some people talk about brain-computer interfaces as solving the alignment problem. So, in other words, like Elon Musk, for example, and Neuralink, right, that's the only way we'll merge with the AI. And obviously, in practice, the issue with that is that our brains are constrained, and so I don't think that's naturally going to be a band-aid to solve the alignment problem. But the other one, because of these theoretical results, is that even if our brains were unconstrained, if we were these perfectly rational, capable agents that were computationally unbounded, then if we have too many distinct tasks and agents, imagine all our BMIs or BCIs are connected by Bluetooth, the number of bits you have to exchange would just be too large anyway, so you couldn't guarantee it. So, one, BCIs won't solve the alignment problem, and two, choose your tasks and agents wisely. You know, getting your agent aligned with you on the task of making a sandwich, when it causes misalignment, is far, far less harmful than running a nuclear power plant. [01:37:38] Speaker B: All right, let's go back to that, if you have another minute or two. [01:37:41] Speaker A: Yeah, I do. [01:37:41] Speaker B: All right, so I'm a child. I'm going to try to illustrate this through a stupid story. All right, so I'm a child and I build my first bridge by putting a piece of tree trunk over the creek, and I try to walk over it and it breaks. And in that case, I didn't plan everything and say, all right, I'm going to build this bridge, but I have to care about the safety. I just made it, and I'm still around. And then I made a better bridge the next time, and then even a better bridge, and then eventually started thinking about safety. And the point of this stupid little story, I apologize, is I think throughout human history, human history is not replete with examples of we're going to plan for the safety. Human history is we're just going to move forward and make it, and then the safety comes later. So why is this an exception?
[01:38:39] Speaker A: This is an excellent question. So one of the reasons I think the theoretical study in this case is warranted is because it's actually faster to prove a theorem than to run it. We don't have AGI yet, and we probably shouldn't anyway. The reason for this is that the fundamental difference between AI, this technology that we're building, and prior technology is that now you're starting to build a technology that takes in inputs and intentionally produces actions. It's an agent. And so as a result, it's not passive. It's not like, for example, I was in an airport and the elevator broke, and that's an inconvenience. And, you know, what do people do when they have a disability, right? That causes issues. But I'm not talking about machine failures here. I'm talking about things that are unique to AI. So, people in AI safety study that right now. They study hallucinations in current LLM systems and so forth, white-box attacks, et cetera. And that's great and really useful for today. I guess what I'm talking about here is, suppose we fix the functional problems. I'm not talking about misalignment from failure modes there. We want to avoid the situation where you've now built a capable agent that is out there, and it might in the short term seem like it's agreeing with you and helping you out, but it's spreading lots of misinformation, either for its own ends or otherwise, and ultimately that leads to catastrophe in one way or another. And I don't think that's tomorrow or anything, but I think that as we're building more and more capable AI systems, this question starts to become more relevant. And I think we can go far beyond sketches of discussions that are not as precise, to really prove guarantees of, okay, look, if this is actually hard for a very capable system to align with, then we have to avoid it in practice. And furthermore, and this is something I'm working on now, is how can we build better incentives beyond RLHF, reinforcement learning with human feedback, which is the way we right now align these LLMs with human values. Can we go beyond that and design incentives with theoretical guarantees that prevent this scenario that I'm talking about, and then really implement them in current systems today? So it does speak to systems today, but has guarantees for the systems of tomorrow. [01:41:10] Speaker B: All right, cool. Last thing before I say goodbye. We were trying to figure out what movie to watch with my son the other day, and one of the ones I thought would be a good one was the Matrix. And then, anyway, I happened upon a little clip where they're talking about what the Matrix is, right, and what happened with humanity and stuff. And I remember, I'm old enough to remember when this movie came out, and it was super cool and everyone loves the movie. [01:41:39] Speaker A: Same. I remember going to theaters. [01:41:41] Speaker B: Oh, really? [01:41:41] Speaker A: To watch it? Oh, yeah, yeah. [01:41:42] Speaker B: But I watched this thing and I thought, this is so dumb. It looks so cartoonish. Not looks, but the premise is so cartoonish and ridiculous. It made me feel better that I've learned something, and now it's not like, oh, it's the Matrix.
That's awesome. So, I don't know, how do you feel about the Matrix in retrospect? [01:42:03] Speaker A: In retrospect, I think it's obviously very exaggerated. I don't think anyone's going to get there or anything. [01:42:08] Speaker B: But human bioelectric. [01:42:11] Speaker A: Yeah. Very creative. I mean, the first Matrix, hands down, one of the best movies. Everything else after was, you know. Especially. Did you see the fourth one? [01:42:22] Speaker B: No. [01:42:23] Speaker A: Good. [01:42:24] Speaker B: Okay. We can leave it at that then. [01:42:26] Speaker A: Yeah, good. [01:42:27] Speaker B: All right. Anyway. All right, Aron, I've got to actually go to work in a few minutes, and I suppose you do, too. Thank you so much for your time, and I'll see you around campus, I hope. [01:42:37] Speaker A: Yeah, yeah. Thank you so much. This was wonderful, and a great pleasure and honor to be here. [01:42:46] Speaker B: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. If you value Brain Inspired, support it through Patreon to access full-length episodes, join our Discord community, and even influence who I invite to the podcast. Go to braininspired.co to learn more. The music you're hearing is Little Wing, performed by Kyle Donovan. Thank you for your support. See you next time.
