BI 193 Kim Stachenfeld: Enhancing Neuroscience and AI

Brain Inspired

September 11, 2024 | 01:32:41

Show Notes

Support the show to get full episodes and join the Discord community.

The Transmitter is an online publication that aims to deliver useful information, insights and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. 

Read more about our partnership.

Check out this story: Monkeys build mental maps to navigate new tasks

Sign up for “Brain Inspired” email alerts to be notified every time a new “Brain Inspired” episode is released.

To explore more neuroscience news and perspectives, visit thetransmitter.org.

Kim Stachenfeld embodies the original core focus of this podcast: the exploration of the intersection between neuroscience and AI, now commonly known as Neuro-AI. That's because she walks both lines. Kim is a Senior Research Scientist at Google DeepMind, the AI company that sprang from neuroscience principles, and she also does research at the Center for Theoretical Neuroscience at Columbia University. She has been using her expertise in modeling, reinforcement learning, and cognitive maps, for example, to help understand brains and to improve AI. I've been wanting to have her on for a long time to get her broad perspective on AI and neuroscience.

We discuss the relative roles of industry and academia in pursuing various objectives related to understanding and building cognitive entities.

She's studied the hippocampus in her research on reinforcement learning and cognitive maps, so we discuss what the heck the hippocampus does, since it seems to be implicated in so many functions, and how she thinks of reinforcement learning these days.

Most recently, Kim's work at DeepMind has focused on more practical engineering questions, using deep learning models to predict things like chaotic turbulent flows and even to help design things like bridges and airplanes. We don't get into the specifics of that work, but given that I just spoke with Damian Kelty-Stephen, who thinks of brains partially as turbulent cascades, Kim and I discuss how her work on modeling turbulence has shaped her thoughts about brains.

Check out the transcript, provided by The Transmitter.

0:00 - Intro
4:31 - DeepMind's original and current vision
9:53 - AI as tools and models
12:53 - Has AI hindered neuroscience?
17:05 - DeepMind vs. academic work balance
20:47 - Is industry better suited to understand brains?
24:42 - Trajectory of DeepMind
27:41 - Kim's trajectory
33:35 - Is the brain an ML entity?
36:12 - Hippocampus
44:12 - Reinforcement learning
51:32 - What does neuroscience need more and less of?
1:02:53 - Neuroscience in a weird place?
1:06:41 - How Kim's questions have changed
1:16:31 - Intelligence and LLMs
1:25:34 - Challenges


Episode Transcript

[00:00:03] Speaker A: Not only is neuroscience inspired AI not really, like, super what's going on, like, science inspired AI is just, like, still happening and in lots of different areas. But it's what neural networks do is they have several stages of processing, and at each one, it's a re-representation of their input. So this question of what information they represent at each layer, the neural network gets to figure that out on its own. It's not like you fix it the way we did, but the question of what it does choose to represent has a big effect on what it can do downstream. Neuroscience just has this tremendous variety and diversity and eccentricity and stuff, and I just love that. [00:00:52] Speaker B: This is Brain Inspired, powered by The Transmitter. Hey, everyone, it's Paul. You may have just caught that Brain Inspired is now powered by The Transmitter. That's right. I'm excited to announce a major milestone here. Brain Inspired is now a proud partner of The Transmitter. So for those of you who haven't heard of it, The Transmitter is an online publication that provides information, insights, and tools to help neuroscientists at all career stages stay current and build connections. It's funded by the Simons Foundation, but it is editorially independent. I am delighted to join their team. So what does this mean? Well, Brain Inspired will stay the same, but I'll be contributing to and collaborating on various new projects in line with The Transmitter's mission to spread the word about neuroscience. In fact, you can find this and future episodes on their website, thetransmitter.org, where you can also easily sign up for email alerts every time a new Brain Inspired episode is released. Actually, if you visit their newsletter page, which I'll link to in the show notes, you can customize what kinds of neuroscience news and topics and columns that you'll receive in your inbox. And trust me, there's a wide variety of options there. Moving forward, I'll also point to stories that grab my attention. For example, recently I read a summary of a compelling study from Mehrdad Jazayeri's lab about how monkeys' brains build mental cognitive maps and use those maps to imagine things that they've never seen. I'll link to that in the show notes as well. So this is really an exciting new partnership between Brain Inspired and The Transmitter, and I'm grateful for their support. All right, now today's episode. Kim Stachenfeld embodies the original core focus of this podcast, the exploration of the intersection between neuroscience and AI, now commonly known as neuro-AI. That is because Kim walks both lines. She's a senior research scientist at Google DeepMind, the AI company that sprang from neuroscience principles. And she also does research at the Center for Theoretical Neuroscience at Columbia University. She has been using her expertise in modeling and reinforcement learning and cognitive maps, for example, to help understand brains and to help improve AI. So I've been wanting to have her on for a long time to get her broad perspective on AI and neuroscience. One of the things that we discuss is the relative roles of industry and academia in pursuing various objectives related to understanding and building cognitive entities. She has studied the hippocampus in her research on reinforcement learning and cognitive maps. So we discuss what the heck the hippocampus does, since it seems to be implicated in so many functions, and how she thinks of reinforcement learning these days. 
Most recently, Kim at DeepMind has focused on more practical engineering questions, using deep learning models to predict things like chaotic turbulent flows and even to help design things like bridges and airplanes. And we don't get into the specifics of that work, so I will link to it in the show notes. But given that I just spoke with Damian Kelty-Stephen, who thinks of brains partially as turbulent cascades, Kim and I discuss how her work on modeling turbulence has shaped her thoughts about brains. Okay, so that was a lot, I know, but you can find all the details in the show notes at braininspired.co/podcast/193. Again, thank you to The Transmitter. This is really exciting for me, and thank you to my Patreon supporters for your continued support. Okay, here's Kim. So I moderated a panel that you were on at Cosyne. A few. What was it, a month ago, two months ago? I can't keep track of time. And it struck me, you even mentioned at one point that you were saying something in service, not just to be defensive, I think was the quote. And it was about DeepMind. And how so? The original mission of DeepMind was to use what we know about the brains to make better AI. And that has sort of gone off the board. Right? Because these days, it's just scaling up is what AI is all about these days. What I wanted to ask you is, is it fair to say that DeepMind failed? [00:05:18] Speaker A: I mean. [00:05:20] Speaker B: Okay, all right. [00:05:23] Speaker A: I don't think so. I mean, I guess so. I'm not sure. Okay. Yeah, I guess there's a couple things. I mean, one, the original characterization of DeepMind's mission, it was like, it was something a bit more circumspect than that. It was like, solve intelligence. Here are lots of different directions that we think might be viable. Neuroscience is one direction that DeepMind had that maybe it was unique and that other groups didn't. I think, one, the neuroscience research projects were always pretty careful to pick projects that were in a particular zone of impact so that it could be useful to neuroscience and say something useful about neuroscience, and had the possibility to say something useful or relevant to machine learning. So the kinds of questions that we focused on were things that we thought had some application to a particular open problem in machine learning. Continual learning, robotics, learning rules and optimization in general. Structured credit assignment. Like, we kind of picked things that were. That we wanted to learn something about the brain and were also, like, open problems in machine learning. The other thing is that. The other thing that the neuroscientists did at DeepMind. [00:06:40] Speaker B: This is all past tense, I'm noticing. Go ahead. [00:06:43] Speaker A: I mean, a lot has changed in the last year of, like, machine learning. So the neuroscience team has also shifted their focus quite a bit. Or the neuro lab. The other thing is that neuroscientists at DeepMind were, as Matt Botvinick, who headed the neuro team for a long time, put it, bilingual. So they would work on machine learning problems, publish machine learning, Neurips papers, also work on neuroscience problems. And part of the benefit of this was thought to be a more intellectual or abstract exchange. If you're trained in neuroscience, you just have different ways of thinking about problems. And so, like, a lot of cool stuff came out of the neuroscience team that had really, like, very little to do with neuroscience. 
Like, a lot of the stuff with graph neural networks that, and using them for simulation. That was a project I worked on for a while. Pretty tenuous connection to anything to do with neuroscience, but, like, it was a pure machine learning project. There was a lot of stuff with, like, concept learning, too, that was a big focus of the neuroscience team. Not directly, like, paired with a neuroscience investigation project, but just, like, something that neuroscientists tend to think about. As you mentioned, this is, like, pretty past tense, and I think the neuroscientists at DeepMind, we've kind of largely been pivoting for a couple of reasons. One is that we're in this, like, as I said in that panel, like, the whole field of machine learning is, like, very much in a scale up mode right now. It's, like, build it bigger, build it better, and just, like, add more data, add more tpus, and try to generate it for longer. This is very much the mode, and this is for some good reasons. There's two papers that I think really encapsulate this nicely. One was a paper from OpenAI. Now, the guys who wrote it are anthropic, but it was on scaling laws and transformers, and they basically showed, you make the model bigger, you add more data, you add more compute, the model will get better, predictably. So if you are looking for a place to turn money into good machine learning models, that's a really reliable knob to turn. That's the kind of signal you want. The other paper was a paper from DeepMind showing that emergent properties start becoming apparent as you scale up a model. So this is almost like the opposite. Rather than just as you add more compute, the model will get better. This will be like. As you add more compute, the model doesn't get better until suddenly it does gets different. Yeah. Yeah. It starts doing different things. Some abilities it used to not have now emerge. So, basically, in both ways that are predictable and not predictable. As you add more compute, the models seem to get. As you add more compute, more scale, the models seem to get better. So it seems like. I mean, it is. I'm a researcher. I'm not an engineer. So, you know, not myself. [00:09:31] Speaker B: Well, you majored in chemical engineering, right? [00:09:34] Speaker A: That's true, but that doesn't actually come up that much in my day to day. [00:09:37] Speaker B: Right. [00:09:38] Speaker A: But, yeah, I guess this is just to say, like, I can see why we're in a period that is more about, like, that, where engineering and increasing scale are having a moment. Rather than, like, let's investigate new methods. Let's try to, like, search broadly and find inspiration from different areas of science and make interdisciplinary connections. Like, not only is neuroscience inspired AI not really, like, super. What's going on? Like, science inspired AI is just, like, still happening and in lots of different areas, but it's. There's a lot to be gained from scale right now. There's a lot to be gained just from making things better. So it's just not really, like, the pendulum has swung a bit. [00:10:20] Speaker B: Oh, no. The pendulum again, this came up in the panel as well, and I made one of these comments like, I stopped everything. I was like, the pendulum. [00:10:29] Speaker A: Well, it's an attractive metaphor because it makes it feel like it's coming back. [00:10:33] Speaker B: Yeah. [00:10:34] Speaker A: The good news for neuroscience, though, is there's just. There's a lot to be done there. Like, the. 
I think that's the other thing, is there's just, like, a lot of good opportunities for applying data. Like, this is a moment for data driven methods as a tool, but they're tools, right? [00:10:48] Speaker B: They're not necessarily models of the system. And I know, like, people like Jim DeCarlo and, you know, David Cisillo, there's a lot of people using them as models, but, yeah, but what I'm hearing you say is that it's all tools. [00:11:02] Speaker A: I think they're both. I think they're really both. And you can kind of interpolate in between. Like, one aspect is to just use them for tools. Say, like, here's a method for getting patterns out of data that's complicated and hard to understand. The brain and behavior is filled with complicated data that's hard to understand. If we train models, maybe we can predict it well, maybe we can summarize it well, maybe we can decompose it into some form that is a little bit more from which humans can grok patterns. That's one opportunity is just try to summarize, and that's having a moment. Like, there's a lot of methods that people have been working with. Mackenzie Mathis has awesome work, just like summarizing behavior. Ben Selevsky. That summarizing behavior would be like, that's on the tool side, and it really expands what you can do. But I think on the models of the brain side, it's also pretty useful, because even if a model, if you train a model to do something, there's many ways you can structure it, and they kind of trade off interpretability and messiness in a nice way. Like, for instance, if you train the same big, messy black box model with different optimization algorithms, you get different representational properties that emerge. If you have different objectives on a deep neural network, you can say something systematic about what's happening inside of the representations forces us to think a little bit more about what features are happening, what kinds of cognitive operations are happening implicitly in a big, messy, deep neural network. Rather than structuring something as like, this is the module that does search, and this is the module that does reinforcement learning. If you're like, we trained a big model to do all sorts of stuff, and we're still looking for those cognitive functions, but we're trying to pull them out of implicit operations rather than structuring them deliberately. I think it's probably a useful perspective and still counts as model driven research. It's just that the specific variable that we're modulating is related through a messier system. So this has been something that we've been thinking about a lot. If we're thinking about how to use neural networks to comprise models of the brain, there are certain problems that only arise when you have a neural network system or there's certain ways that you can combine structure and expressiveness and get in this sweet spot where you are getting something out of the neural network, but also not completely foregoing any ability to say something structured. [00:13:35] Speaker B: One of the things, and I'm not going to just focus on the panel that I moderated, but I was revisiting it. And one of the things I asked was whether any of the panelists thought that AI has actually hurt neuroscience in any way, and then no one answered. 
And then you were kind enough to, you couched it in a different answer that someone else asked a question, and you were kind enough to at least address the question that I asked, and you mentioned something about biological plausibility or non plausibility biologically. And I think that's interesting because, and I want you to expand on that a little bit, because I thought one of the lessons here, and this goes to what you were just talking about, was that biological plausibility doesn't matter. And so it was odd to hear you say that non biological, biological. Non biological plausibility, whatever it was, that that was in some way how AI has maybe dampened neuroscience or hurt it. Am I making sense? [00:14:42] Speaker A: It's funny, because I was like. I'm like, my research is not like a paragon of biological plausibility. [00:14:48] Speaker B: Like, I know. [00:14:49] Speaker A: You know? [00:14:50] Speaker B: Well, I know. Yeah. So. And we'll talk because, you know, turbulence. [00:14:54] Speaker A: Yeah. Yeah. I mean, I think the. I guess. I don't know. I mean, I think that there is a level of abstraction that, like, that models bring. Yeah, I say I think. But I guess this is a common framing principle. When thinking about model driven neuroscience is like, all models are wrong, some models are useful, and exactly what you're trying to say about the system and whether or not your claims are justified depends on the level of abstraction you're using. So, for instance, if I say, like, this effect appears to be driven by the statistics of data, and I think that it is indifferent to which particular learning algorithm you used, then I can say, well, we used a biologically implausible one because we wanted to get the system actually working at the level where I could reason about these kind of statistics. We couldn't do that with a biologically plausible one. So we did it that way. Somebody could say, like, well, I don't agree with your claim that this is irrelevant to the biological implementation. Details like that could make a really huge difference. And then I would have said something really misleading. And I think the reason that occurred to me in that panel is because that's just what I worry about a lot is I focus on a particular level of abstraction and try to say things that are justified at that level about how the statistics are having an effect, what kinds of computations are supported by different kinds of representations in a kind of specific implementation agnostic way. But those details could really, like, trickle up and matter, and then I would be saying things that are wrong, which, of course, nobody wants to do. So I think that that's kind of the. I don't think that it's like, I don't know, for instance, that we have been critically led astray by that. That wasn't necessarily my instinct, but it would be. That's the risk, is that, like, you work at this level and you were led astray because it didn't obey some constraints which turn out to really make a big difference. I will say, though, that, like, there's two levels of biological constraints. At some level, the implementation details, but then there's the behavioral level. And one thing is the models that relax the behavioral constraints. The advantage of them is that they have an easier time getting this behavioral level, or at least that's ideally the justifying case to use them. 
So it's not always totally fair to say they're, like, biologically unrealistic, because they are pinning to one aspect that, you know, the biological system is doing, which is doing a good job at some naturalistic behavior. So I'm not, like, the first to make that point, but I think that is like, they kind of. It's almost like they pin to different levels of biological plausibility, and no model gets both of them, rather than saying, like, we don't care about the biology at all. [00:17:45] Speaker B: So you're like, I don't know. Are you half Columbia, half DeepMind? What is the fraction? [00:17:50] Speaker A: So I am officially, yeah, basically, I'm there one day. I'm at Columbia one day a week, officially. I sometimes go up on Fridays, too, because Google is hybrid, but I'm usually there Mondays. And, yeah, my official appointment is adjunct associate professor, so it has, like, a lot of asterisks next to the professor. But, yeah, I mean, I'm up there some fraction of the time, and then I'm at DeepMind the rest of the time. So I do a little academics, little industry stuff. [00:18:20] Speaker B: The reason why I asked that is because I'm curious, a, what's more fun, and b, how much time do you spend these days thinking about neuroscience versus applications like modeling, turbulence, and design? [00:18:34] Speaker A: Yeah, I mean, they're both tremendously fun. That's a boring answer, but, yeah, they're both really fun. They're fun in different ways. I do the Columbia stuff, at least the way I have it structured, because I'm part time, is more advising on projects that's fun because there can be tremendous variety because you can work on more stuff when it's students or postdocs who are driving the projects. Those projects also tend to mostly be more neuroscience focused or theory focused. Some of the projects are more, how does the brain do this? Or here's a neural network. It's doing something kind of like the brain. Does that actually compare? Some are more actually analyzing data that's come out of other labs, and some one in particular is more, it's kind of a machine learning project, but it's a more theoretical and conceptual one. In general, the kinds of stuff that is maybe a bit easier to do in academia, not just possible to do in either place, but actually easier in academia, is things that are with smaller models and that are maybe a little bit more conceptual. Because Google just, like toy. [00:19:49] Speaker B: Like toy problems. [00:19:50] Speaker A: Yeah, kind of toy. Yeah, I think. Yeah, exactly. Things that are a bit more toy or don't at least don't require you to train a gigantic model. You can do stuff with gigantic models that somebody else trained, but, like, you don't want to necessarily be in the business of training a huge model that's not easier to do in academia, at least the kinds of projects that are fun to do at Google are largely things that benefit more from scale, that have some kind of, like, you can use a really big model, you can train a big model to do a different thing. You can really experiment with some of the more conceptual stuff at large scales. This is coming up in the context of. I've been working recently on some projects with, with memory, with retrieval, augmented generation, and there are some neuroscience versions of this. 
We think about hippocampal contributions to learning all the time as, like, what is memory adding to a process and maybe making hypotheses about when you should see different kinds of hippocampal activity, different levels of activity, different styles of activity. That's a neuroscience question, and you can often ask that with somewhat toy models, you can get abstract versions of that problem that you can look at in similar systems, whereas at DeepMind, you can play around with a large language model, retrieving from Wikipedia, and that's not impossible to look at in academia, but it's harder, and there's infrastructure, and it's nice to be able to play around with the same kind of concepts, but at a scale where it's naturalistic and complex. [00:21:28] Speaker B: I remember I was riding in the car with my father. I think I was in graduate school, actually, for neuroscience, and he asked me or suggested that perhaps industry is better suited to, quote unquote. He didn't use the term solve intelligence, but, you know, to understand brains and stuff. I mean, what is your perspective on this? And I think he made a very good point. Of course, it hurt a little bit because I was in graduate school, but he was like an IBM guy, and I think he made a pretty good point. And I'm curious where you fall on that. Should industry just solve neuro? [00:22:08] Speaker A: That's funny. I would be curious to ask you more about what your dad's pros and cons were for academia and industry. [00:22:15] Speaker B: Well, okay. I can tell you real quick that basically just that we academics are super slow and there isn't like a bottom line to get done. Right. Because it's an endless search. Right. So in industry, there's a thing that you are setting out to accomplish, and I think that's part of it. And academics are just slow. [00:22:37] Speaker A: Yeah, I mean, academics, I guess, like researchers in general, are pretty motivated. So I think the, like, if it's moving, I think it's more like an industry. Because it's not like people are pulling longer hours in industry because they're getting paid more. [00:22:57] Speaker B: No. And they're happier. They go home earlier and they're happier, it seems. [00:23:01] Speaker A: Yeah. I think the difference is like the. Because of this sort of like, goal collapse, in a way. What does that mean? That there's a financial bottom line or that there's like an incentive that's sort of aligning people, people working on the same thing and you have things that are broken into projects. I basically think industry. Yeah, I mean, I'm not an economist, so whatever. Take this with some grain of salt, but my sense of it is that industry, if there's a. If industry is well poised to work on a problem, if a problem does align with the goals of a company, it's great to have it, to have that happen. Like there is an alignment of people who are getting funding and they're going to work on the problem and they'll break it down into parts that will be followed up on tightly with different kinds of progress management criteria. I think a lot of those things work really well. The question is, of course, just, is there an alignment of objectives? I think basically when there is, then it's great to have the problem pursued in industry. If there isn't some problems are really, in some sense. There isn't necessarily. There's not an obvious financial motive to understand how the brain works, except in so much as you could design pharmaceuticals to help it or base technology on it or something. 
It's these very sideways scant views of it. So I think the benefit of academia is you have more smaller groups of independent researchers all trying their own thing. It's not like a bunch of people are going to lock in and run in the same direction with quite as much ease. But the goal that people have is just the actual can be the same goal as understanding the brain. It doesn't kind of have to align in these sideways ways. So I think that's sort of the pros and cons. When industry does it. Awesome. We shouldn't be delighted. We shouldn't. I don't worry. Like, oh, no, if industry does it, how will academia compete? I'm like, oh, there's still plenty of things to do that aren't necessarily on the path to making better technology or better pharmaceuticals. And we really, we don't have, I think, an alternative to academic research to moving in those directions. [00:25:22] Speaker B: Well, I wish my dad was around to see the course that DeepMind has taken just to circle it back, because incentives change when companies get bought, and pivot is a euphemism that we can use. Right. And then all of a sudden, you're not trying to solve intelligence, you're scaling up. I mean, I don't know how DeepMind works, but I'm curious what he would actually think about the trajectory of DeepMind. [00:25:46] Speaker A: Yeah, it's interesting. And, I mean, this is what people have said this for. I mean, this, I guess, was what happened with Bell Labs, right. Is they did basic research for a while, and then, you know, there was some reorganization that kind of pivoted. Dmind still does a ton of basic research, but the nature of it, and they do a ton of things that are just like, the goal of them is to contribute a high impact scientific result. [00:26:09] Speaker B: Always has been. [00:26:10] Speaker A: Yeah, yeah. And I think that has not, like, that hasn't stopped being true or, like, stopped being a big part of the way that they, like, brand themselves, that we brand ourselves. And I think that that is something that makes it a little bit, that adds a little bit more complexity to the story about, like, what can industry do, what can academia do than just, like, profit motive alignment? Because that is very. It's not profit irrelevant, like, doing good, doing things that are positive contributions to the world are good for a company to do. The company does benefit from that. It's not just like, if you can make a product out of it, then it's beneficial to your company, and otherwise it isn't. So it does make it really. It makes it a little bit more complicated. And, yeah, I think DeepMind still kind of does do a lot of these things, but you do have to appreciate that a company has different goals. I mean, grant funding can be fickle in its ways, too. It's not like anyone is totally immune to this, but, yeah, it definitely. Everyone kind of warns about industry research. Like, as things about the company change, the research will change, too. And that's true of grants, and that's true as well. But the timescale is potentially shorter, and universities have been around for a long time. [00:27:33] Speaker B: Would you rather have a beer with an academic or an industry person? [00:27:36] Speaker A: Oh, it depends. There's huge variability. [00:27:38] Speaker B: Come on, you can't wiggle out of that one, of course. [00:27:42] Speaker A: No, I've had excellent beers with industry and academic people. I don't know. 
One thing I will say I really like about neuroscience departments is that there is a lot of variety and discipline. Like, when I was in graduate school, my roommate, Diana Liao, she studied Marmoset, how, like, marmosets talk to each other. That was really cool. None of my colleagues at DeepMind study how marmosets talk to each other. That's like, neuroscience just has this tremendous variety and diversity and, like, eccentricity and stuff, and I just love that. So that is a point in favor of having a beer with at least neuroscientists. But it kind of gets away from the more controversial question I think you wanted me to answer, just like it was objectively more interesting. [00:28:35] Speaker B: Yes, let's be objective about it. So, I'm curious. You know, I don't know how your interests have changed, but I know that your projects have changed over time. Right. So you've. I alluded to some of the work that you've done, modeling simulations for turbulence, for design, are those things that you pick? Are they mandated? How have your interests? And you're still doing lots of, you know, interesting cognitive map, reinforcement learning neuroscience stuff, but, you know, how would you characterize your own shifting interests? [00:29:08] Speaker A: Yeah, I think the. So when I was. What I started off working on in graduate school was, like, computational models of hippocampal contributions to learning. [00:29:18] Speaker B: So basically, oh, you were right into hippocampus immediately. Pretty much. [00:29:23] Speaker A: I think that was actually built on, like, a rotation project I did with Matt Botvinek and Sam Gershman. So really, like, right off the bat. So, in the kinds of computational models I was working on then that we were working on for that project were things that more had to do with tabular reinforcement learning and linear algebra, which is to say, they were models where we could set up the math analytically and compute exactly what we were doing. And we weren't training neural networks to do stuff. The motivation was neural networky. It was like that. We want to understand how a certain representation in the brain supports the downstream computation that's happening. So if you represent the world that you're. There's different choices you have for how you're going to represent your experience, and the way you represent it will support different kinds of computations down the line. For instance, a kind of simple example would be, if you want to tell the difference, if you have a downstream task that requires you to tell the difference between different colors, you need to have a representation that is not grayscale, that has color in it. So our work with the hippocampus was about, like, how representations in hippocampus seem to account for what's going to happen next, that they form representation such that different states that are going to the same place end up with similar representations. And if you group things by what outcome they predict, that makes it easier to do certain computations down the line, computations that implicitly need to know something about what's going to happen next. So it's kind of a neural networky motivation, because what neural networks do is they have several stages of processing, and at each one, it's a re representation of their input. So this question of what information they represent at each layer, the neural network gets to figure that out on its own. 
It's not like you fix it the way we did, but the question of what it does choose to represent has a big effect on what it can do downstream. So the motivation was this, like, representation learning question, but the method wasn't. After I graduated, I was, you know, I was at DeepMind. I wanted to do something with neural networks. I wanted to, like, learn what those were all about also. I kind of wanted to. Yeah, I don't know. I had these kind of concepts that I feel like I was working on in certain ways that were tractable and let you really, like, chew them up and understand every aspect of them. But I wanted to see what would happen if we stuck neural networks on them and made them work that way. So at that point, I. [00:31:59] Speaker B: Wait, wait, let me. I'm sorry to interrupt you, but what percentage of neuroscientists do you think, had that same thought, like, well, we have this thing now. I just want to stick a neural network on it and see what happens. [00:32:12] Speaker A: I bet a little bit. I'm not sure. Yeah, it's definitely an aesthetic preference, and I think some trial and error is required. See if you like it or not. I definitely have talked to some researchers who say they sometimes they work, do some work with neural networks, some works without, and say they just prefer working with them because they feel like, they feel like it's working at scale, and some say they like working without it because they feel like they don't know what's going on. And they want to got into science to think about stuff. And empirically observing is often what you're. I mean, there's exceptions to this, but, like, a lot of neural network research becomes a little bit empirical and observational. So, yeah, I think a lot also just. I think it's. I think it's just a great. It's. It's a very good skill to learn. So I think a lot just want to play with a neural network because it makes them feel safer in their long term prospects, which. Logical. I certainly felt that way at DeepMind. I was like, looks a lot of people, a lot more people doing neural networks here than linear algebra. Like, it's. [00:33:09] Speaker B: Oh, interesting. Yeah, sorry, I interrupted your. [00:33:13] Speaker A: No, no, it's a great question. Yeah. So, I mean, it was kind of the closest path to looking at the same kinds of principles we were thinking about, like, how the brain represents relations between things, how the brain makes predictions about what's coming next, was to work on Pete Batalia's group, where they were using graph neural networks, which are a neural network architecture that reasons about relations between things and using them to make predictions about how a physical system will unfold. So, basically, like, if you have a physical system with a bunch of interacting entities, like things bumping into each other, or fluid particles bouncing into each other, that's a relational system. Interactions between those particles determine what's going to happen. It's also a predictive problem. You're trying to see what's going to happen next. So that's how I got there. It was sort of like the kind of nearest neighbor research manifestation of these relational and predictive ideas, but in a machine learning form, in a machine learning application. [00:34:16] Speaker B: So do you view the brain as a machine learning entity, broadly speaking? [00:34:25] Speaker A: I mean, I guess if. Yeah, I guess at some literal level, I guess I view the brain as a learning entity and not a machine. I guess by definition. 
But machine learning, I think, is kind of the. The extent to which the brain is a thing that learns stuff. Machines that learn stuff are a good batch of machine related analogies for that, if that makes sense. [00:34:49] Speaker B: Yeah, yeah, yeah, sure. I'll accept that, basically. [00:34:51] Speaker A: Yeah. I think machine learning is a. Is a really great batch of tools for thinking about. For thinking about how learning works, and. Yeah, I think there's just so many concepts in common between understanding how machine learning works, understanding how the brain works, building better machine learning models, understanding why different brains are the way they are. Yeah, I think. [00:35:16] Speaker B: But you see the brain as a learning entity. [00:35:19] Speaker A: Yeah, primarily, yeah. I mean, that's at least the aspect that I study and find most interesting. I think the whole nature versus nurture. I don't have a super creative opinion on that. But I guess the extent to which the brain is a learning thing is probably the extent to which machine learning is a good batch of models for it. [00:35:42] Speaker B: Okay. [00:35:42] Speaker A: Yeah. [00:35:43] Speaker B: So, yeah, you have worked a lot on the hippocampus, and there was a guest speaker that came. I'm at Carnegie Mellon university, and there's a guest speaker that I had lunch with, and he does work with hippocampus stuff. And I realized maybe I hadn't thought of it before, but the hippocampus has been sort of the darling of neuroscience now since place cells. Probably. Right, since that became popular, would you say? That's right? [00:36:08] Speaker A: Yeah, I would. I mean, I'm around back then, but I definitely got that sense. I was warned when I started hippocampus stuff, people were like, it's a crowded field. Good luck in there. [00:36:19] Speaker B: Oh, really? [00:36:20] Speaker A: Yeah. [00:36:20] Speaker B: I thought visual neuroscience was a crowded field, but I guess hippocampus took it. [00:36:25] Speaker A: Yeah, I mean, hippocampus is crowded. Vision is crowded, RL is crowded. I don't know, a lot of the good. [00:36:30] Speaker B: Oh, yeah, these days. RL, do you feel that it. That it's crowded? Because I'm tired of seeing algorithms, new algorithms. I'm like, oh, I gotta learn another one. [00:36:39] Speaker A: I work on hippocampal contributions to RL, so I guess there's some part of me that just thinks objectively, RL and hippocampus are the most interesting and could never be too crowded. But there's a but? Yeah, I mean, they're definitely. They're definitely real popular. [00:36:52] Speaker B: What does hippocampus do? [00:36:54] Speaker A: What does hippocampus do? Seems to help with memory, maybe some kind of structure learning. [00:37:00] Speaker B: You're good. You're good at wiggling out of questions like this. But, you know, I mean, so there's the learning aspect. [00:37:06] Speaker A: I'm not trying to. I'm just also trying not to say things that are wrong. [00:37:09] Speaker B: Well, yeah, when you're someone like me, you get used to that real quickly, and I'm fine being wrong. But. But, you know, I mean, it's memory, it's learning. It's spatial cognition. It's a cognitive map. Is it all of these things? I mean, is it. Is it. Do we need to say hippocampus does function x? [00:37:28] Speaker A: Yeah, I mean, it's definitely. It does. A lot of it seems like it is implicated in a large number of things, and, yeah, I think there's. People joke. 
Like, people make fun of hippocampus researchers and say that they think that the cortex is just to keep the hippocampus warm. [00:37:47] Speaker B: I haven't heard that one. That's pretty good. [00:37:49] Speaker A: Yeah, I mean, it's obviously not doing everything in the world, I think. I mean, there seems like hippocampus is unique in certain ways that do justify its supposed ubiquity. It gets projections from all over the brain. It seems like it's processing lots of different kinds of information. So it's not going to be specialized on a particular sensory modality. It seems like there are some things about it that are different from other areas. In particular, it seems like it's capable of really rapid learning. I think the complementary learning systems idea makes a kind of specific, but non specific prediction about hippocampus. [00:38:32] Speaker B: Let me just say what that complementary learning systems is real quick. So the idea is that you learn quickly and rapidly in the hippocampus, and then it sort of transfers over time. And what is the word I'm looking for? Consolidates. Right? Is that right? [00:38:47] Speaker A: Consolidate, yeah. [00:38:49] Speaker B: In the cortex over time. So hippocampus keeps sending this learned information to cortex, and cortex, over time, consolidates the information. So there are two kind of learning systems. Did I say that right? [00:38:59] Speaker A: Yeah. I mean, at least. Yeah, that's my take. And that decomposition of roles leaves a lot of stuff for hippocampus to do. It's saying, like, what it's specialized in is rapid acquisition and memory for specifics. And that should relate to all these other things, rapidly learning about spatial environments and where things are stored in them, rapidly acquiring new episodic memories. Remembering. Preserving specific aspects of memories from which you don't want to generalize. Like, maybe you. The classic example is, like, the difference between where you usually park your car would be more a cortex job, and where you parked your car today would be a more hippocampus job. It's something specific. You might change this over time, but you don't want to forget that information. It's useful to preserve. And then also the ability to form new memories should interact with the statistical memories you've formed, the kind of structural memories you've formed in a pretty deep way, because a new thing you want to learn is maybe a reconfiguration of old statistics. If I today put a coffee cup on a bench, I have a concept of bench, I have a concept of coffee cup. Those are probably statistically learned properties, but the specific conjunction of them is something that's new. So that hippocampus rapid learning should interface with slower learning isn't necessarily a contradiction, but it does make it easy to. To just kind of have this explosion of roles where hippocampus is doing all of these things, when actually we should really be thinking about it more in terms of how hippocampus is interfacing with other areas that are all partially doing these things. [00:40:53] Speaker B: I feel like. So you mentioned reinforcement learning, and you've done a lot of work in reinforcement learning. And I mentioned the idea of cognitive maps a moment ago, and I mentioned how hippocampus has been the darling, and all of these things are wrapped together, right. 
And since then, there's just been an explosion of algorithms, reinforcement learning, you've worked with successor representation learning, et cetera, model based, model free. What is your perspective? I mean, it's out of control, isn't it? [00:41:25] Speaker A: I think it's nice. I think that the. Yeah, it'd be good to get everyone on the same page, I guess. What does that mean? You want model comparison? I guess this is something that you want to see what one model does that another model doesn't do necessarily. So I think a lot of the. Yeah, so maybe a risk of, like, proliferation of different models is if you aren't comparing the models on the same data or if you aren't comparing models at all, you're just saying, look, this model captures this picture in a paper. This other model captures this other picture that was in a paper, and you don't necessarily compare them all together. So one thing that people are doing this with hippocampus a little bit, there's some toolboxes that people have been developing. I know Ratnabox is one of them. And then by Tom George at UCL, Clementine Dominie has one that I'm blanking on the name of, but that's also at UCL, I mean, basically these things that are trying to take lots of models and try to make the same predictions of them. Brain score is a nice example of this in the visual world. And there's issues with making things too score based and too benchmark based, but it has benefits that all people are talking about the same data and not just saying, this model captures this one aspect, this other model captures this other aspect. Assuming that the brain could just put both together and have it work just as well, or not necessarily arbitrating when the models are mutually exclusive. Yeah, I mean, that's just an important thing to have. On the other hand, I think. Good. Yeah. I mean, I almost see that massive proliferation as a success of the RL account, rather than a warning that if you make a model that's somewhat right, that's doing kind of a simplistic RL thing, then a bunch of other models will come that make that, do more nuanced versions of that or more particular versions of that. A multiplexed RL, distributional RL action prediction error, regularized RL, like, you know, all these kinds of things that add nuance to the picture. If the original RL thing was totally wrong, this would be, this would be a really bad use of effort. But if the original art thing is a bit right, but not capturing everything, that's kind of, that kind of worked pretty well. It's like you have a sort, of course, simpler picture, and that got you into the vicinity where people can bubble around making different models and seeing which aspects you can capture and then try to consolidate them. So if. [00:44:11] Speaker B: Yeah, yeah, I think I remember you saying in one of your talks, maybe I don't remember exactly the way you phrased it, but you used to think that reinforcement learning, or at least model free reinforcement learning, was like, hey, you did something good, but now you think it's like, oh, this is the worst way to. How did you phrase it? [00:44:30] Speaker A: I don't remember. Yeah, I know. I know what you're talking about. So I use an example when I'm giving RL lectures on what makes reinforcement learning different from other types of learning. And it's commonly. [00:44:45] Speaker B: Yeah, it's cruel. I think cruel is the word. [00:44:47] Speaker A: I used the word cruel. I think reinforcement learning is a little bit cruel. 
I think of it as almost very passive aggressive. And the example I have of this is if you were trying to learn biology, one way you might try to learn biology is read a lot of textbooks and try to find patterns and link things across different bits of the text and like, group all of the things that have to do with the mitochondria and recognize, like, I don't know, causal structures from mitochondria producing energy to enzymes that use that energy to build stuff or whatever. Like you, I try to identify structure, and that's what unsupervised learning is all about, is it's a large and kind of amorphous field, and it's all about trying to learn patterns and structure. Supervised learning is much more limited in its scope. That is the setting where you have a. There exists a correct answer, there's a right answer, and you are trying to find that answer. You can set it up as classification problems or regression problems. You have a picture of something, a human labeled it a picture of a dog, and you are trying to train a network to say, this is a dog. If it didn't get. If it gets that it was a cat, the answer it gets is no, this was a dog. It doesn't just get like an answer, like ten points or six points or minus eleven points without any context for how many points are possible or whether there was more points available with some other answer. That latter one is what reinforcement learning does. Try to learn biology by taking tests and then just getting like a score of 34 and not knowing if it was out of 34 or 8 million or, you know, what exactly. Which answers you got wrong and which answers you got right. It's just kind of a. It's a very, like, restrained signal. So it's nice for setting up the problem of autonomous learning. If you can learn from signals like that, then you are pretty good at figuring stuff out on your own. You don't need the handholding of supervised learning, but it, I think, also highlights the challenges of it. Like you, the credit assignment problem is really hard. You have to figure out which things contributed to your points in which things didn't. The exploration problem is really hard. You have to figure out how many points were available to you, were you to have taken different options. So, yeah, I think of reinforcement learning as somewhat cruel rather than this. Like, oh, you did a good job, you get a treat, which I think I used to conceptualize it as a very, like, gentle, warm way to nurture an agent. [00:47:15] Speaker B: So why do, why do we, why do our brains employ such cruel, passive aggressive algorithms? [00:47:21] Speaker A: Well, it's very, it's very flexible. So you can, you can learn lots of different types of things, and it's much more autonomous. Like you, you don't need information to be labeled for you if you don't have labeled information, you don't really have a choice. So the other thing is like, there's a lot of ways, there's ways to make reinforcement learning easier. And a lot of the research that I've worked on has been about how to do that, how hippocampus might specifically help by doing that, by combining unsupervised learning with reinforcement learning, basically by learning some structure so that you can strain the reinforcement learning problem to a narrower set of possibilities or give it some hints about what kinds of things might be driving the number of points. 
Basically read a textbook and then take a test and see how many points you get rather than just taking the test and hoping that like a billion tests will eventually teach you biology. [00:48:20] Speaker B: Where is model based reinforcement learning these days? My sense is that. So I travel in particular bubbles like we all do, but my sense is that the tide turned to everything is model based reinforcement learning at the pendulum, let's say. And that pendulum has kind of swung back and said, well, maybe very little is you actually don't need model based to do a lot of these things. And so maybe we're sort of overstepping our bounds in thinking of everything as model based. Is that accurate at all? [00:48:52] Speaker A: Well, it's interesting. I mean, a lot of the huge tide in machine learning recently has been this, the rise of self supervised learning. To the extent that it's sidelined reinforcement learning to a large degree. I think there used to be a lot more appetite for, like, let's try to learn as much as we can with reinforcement learning, and we'll rely on self supervised learning when we need a bit of a crutch. Now, self supervised or unsupervised learning, they mean really similar things. Basically, it turns out that gets you very, very far. Like, if you just train a model to do next step prediction, you're like, on words. That's how language models are trained. You're very close to being able to not just predict the next word, but make the next word be something that you actually want it to be. And so I think that is actually very, it's a little bit different from some ways of conceptualizing model based learning, but it's very similar to others. It's basically like you're using a predictive model to start off the process and then you coax it to do exactly what you want, which is very much the, I think, key concept of model based learning, most model based reinforcement learning, or at least like most that I was kind of familiar with or the cartoon I had in my mind of model based reinforcement learning is tree search. So you have a model that can do simulate different things, you simulate different outcomes, and you see what happens, and then you pick the action that leads to the best possible things. But model based reinforcement learning can be structured in other ways, too. It could be, for instance, learning concepts about the world, learning some model of these things are all the same and are going to behave the same way. And then when you learn about their value with model free reinforcement learning, you've done some sort of vaguely model based organization of them that supports it. That's sort of what the successor representation does. It more that way that you represent things in terms of the predictive structure, and then the model free reinforcement learning that happens on top of that automatically takes into account some features that a model based model would have told it about. Yeah, there's other ways that people kind of use these self supervised models in conjunction with reinforcement learning. So it's kind of. It's a little bit of a different take, but I think it's still the same concepts shuffled around in a different way. [00:51:23] Speaker B: Shifting gears. I was going to ask you. I was going to sort of segue into cognitive maps. I'm not sure. Yeah, let's. I'll just shift gears, actually, and ask you. And one of the questions that I sent you was, what do you think neuroscience needs more and less of? 
And I mean, forgive me, like someone who works, you know, at DeepMind, right, in industry, I think, has a unique perspective, perhaps on these things. And so I'm genuinely curious, you know, from your perspective, if you think that, you know, you have a perspective that might be unique relative to a normal academic like myself. Right. Do you, can you see neuroscience sort of from the outside? I know you're still inside it also, but you're also outside it. And do you have a take on this? [00:52:12] Speaker A: Yeah, it's funny. I really. I struggled with that question, and I. [00:52:18] Speaker B: It's an unfair question. [00:52:20] Speaker A: Well, it's. [00:52:21] Speaker B: I don't know. [00:52:21] Speaker A: It's a. I feel like it's a very fair question. I mean, like the. At some level, like, every part of picking a research problem is trying to figure out what the neuroscience world needs more of that you are equipped to do. Yeah, I mean, and I also. I guess the kinds of things that I wrote down that I tried to brainstorm about this were, wow, you put effort into it, which is extra embarrassing. [00:52:52] Speaker B: Because I don't know if I'm so thankful. [00:52:55] Speaker A: Yeah. I think that the thing that I thought maybe I might have a bit of some unique insight on, or that the DeepMind perspective relates to, is the extent to which we can build pretty sophisticated models of really complicated processes. The basic revolution in machine learning recently, or the. The thing that we have increasingly been showing our ability to do is that we can learn really complicated functions with enough data. The sky's kind of the limit to how complicated a thing you can learn. So learning predictive models is definitely something we can do. I think there's a lot of appetite for this, for building predictive models of neural data, of behavior, building foundation models that integrate lots of different data. And the question that arises from this is, like, what do we do with that? If we distill a black box system into another black box system, what do we do? Was that useful? What is that useful for? And I think, even on its own, that is a useful thing to know. How much systematicity there might be in the data, how much is predictable, how much a model changes when you add new data, what kind of counts as surprising based on the data you've had at the time? So, like, even these kind of black box predictive models, I think, could be really useful. There's some other ways I think, that we can make them more useful if we combine the deep learning model with a model that has a little bit more structure, a little bit more interpretability. This has been a really big theme for the neuro lab at DeepMind recently. Folks like Maria Eckstein and Kevin Miller have done some nice work on this recently, and I've been thinking a little bit about how to relate the learned physics dynamic stuff to this if we were to apply the same kind of techniques to biological data. And I think, basically, while the revolution in machine learning was very much about building good predictive models, we haven't done quite as much work on building predictive models that are also constrained to be interpretable or constrained to interface with existing models that we've built, that we know how to say structured things about. But it's not like that's not really something that's fundamentally ruled out. I don't think we have to consider these models 100% black box. 
And I think thinking about how to combine really complex models with some structure, either by including it as an architectural prior that you know something about, or, like, interfacing the model with knowledge, or interfacing the large black box model with some prior that takes into account structure that could be hypothesized, is a really useful direction for neuroscientists to be thinking about, particularly neuroscientists who have an interest in building large scale models and access to doing that. So trying to come up with something that's in between the super complex, predictive, we're just going to build it all and pretend that that was enough just to imitate the system, and this more like, we only care about exactly what we can build in a very tractable way. If we can interface between those levels, I think that would be really powerful, and it would also open up the ability to compare models to data in a much richer way. If we can say, this is the behavior that should be happening at the level of this video of an animal, because we've combined the model with something that actually generates low level behavior, that would be a much stricter way of comparing to neural data and behavioral data than it would be to just say, like, this shows, on average, an increase, or we see this gross statistical effect on average, but it's not capturing every specific eccentricity of the behavior. So, yeah, I think that integration between structure and data is the big thing. [00:56:59] Speaker B: Okay, so is it fair to give the crude summary that neuroscience needs more interpretability on top of the big models? [00:57:09] Speaker A: Yeah, broadly speaking. Or maybe that the dichotomy between interpretability and data driven is maybe a little. It's too harsh a dichotomy. We shouldn't be fighting over whether we want models that are predictive or models that are interpretable. Like, we all want both. And we should just try to see, when we can, how we can think of this more as a Pareto frontier, and, like, that has a more complex trade off. [00:57:42] Speaker B: What about. Did you take any notes on the question of what neuroscience needs less of? I hope you didn't work too hard on these things. [00:57:49] Speaker A: By the way, I just jotted down some notes. I wrote, specifically, I don't know if I really have an opinion on that. Honestly, I couldn't think of anything creative here. Yeah, I don't know. I mean, I think we. Like, one thing I notice in my own research is that, like, when I am targeting a journal paper versus a neurips paper, I have a different. Like, it is easier to do something a little bit bite size or more incremental, or, I don't know, with neurips. [00:58:26] Speaker B: Or with neurips. [00:58:27] Speaker A: Yeah, it's easier to kind of be like, you know, we capture this one thing. Trust me, it's wrong. Whereas when I'm thinking about writing journal articles, it's like, we really need to convince people broadly in a really different way. The goal of a journal paper, I think, is to tell a story. Yeah, tell a story and talk to. I feel like it's much more about talking to neuroscientists and saying, I think this is right. Do you believe me or. I think this has something to offer. Can I convince you that that's true? Whereas with. So I wouldn't say we need, like, fewer neurips papers, but I think, like, if we just started only doing neurips papers and not writing in that more journalistic style, it might be a bit of a loss.
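The combination she describes a little earlier, a structured, interpretable component interfaced with a flexible learned one, can be illustrated with a deliberately small sketch. This is not the DeepMind neuro lab's method or anyone's published model; it just shows the general pattern, with hypothetical data and a random-feature regression standing in for a neural network. The interpretable part keeps named parameters you can report; the flexible part is only asked to account for what the structured part misses.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a linear trend the structured model can name, plus a smooth
# nonlinearity it cannot, plus noise. Nothing here comes from real neural or behavioral data.
x = rng.uniform(-2, 2, size=(500, 1))
y = 1.5 * x[:, 0] + np.sin(3 * x[:, 0]) + 0.1 * rng.normal(size=500)

# 1) Structured, interpretable component: ordinary least squares with a named slope and intercept.
X = np.hstack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]
structured_pred = slope * x[:, 0] + intercept

# 2) Flexible "black box" component, fit only to the residual the structured model leaves behind.
#    (Random-feature ridge regression is a stand-in for a neural network here.)
features = np.tanh(x @ rng.normal(size=(1, 64)) + rng.normal(size=64))
residual = y - structured_pred
coef = np.linalg.solve(features.T @ features + 1e-2 * np.eye(64), features.T @ residual)
hybrid_pred = structured_pred + features @ coef

print("interpretable slope, intercept:", round(float(slope), 2), round(float(intercept), 2))
print("structured-only MSE:", round(float(np.mean((y - structured_pred) ** 2)), 3))
print("hybrid MSE:", round(float(np.mean((y - hybrid_pred) ** 2)), 3))

The hybrid prediction fits better than the structured model alone, while the slope and intercept remain things you can interpret and argue about, which is the flavor of trade-off she is pointing at.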
On the other hand, the journal review process is, like, way overbearing and unnecessary. And I don't like for profit journals. And everything about open reviews, I have a lot of respect for. So there's pros and cons to both models. But I think that specific aspect of writing with an idea of impacting the field, rather than writing with the idea of scoring a win and getting a neurips paper in is probably more scientifically minded. [00:59:37] Speaker B: But there's something nice about the honesty of the, let's say the neurips. We'll just call it the neurips method. Right? Because you don't have to change the world. You can just do something that's incremental. You don't have to claim that you're solving something that no one has ever. You know, there's just a lot of overclaiming in the storytelling in journal papers. Like when I, when I think about writing a journal paper, I think, all right, how can I trick them into thinking that my stuff is important? Right? [01:00:04] Speaker A: That's funny. I might just have such a different model of, like, journals versus. Because I think that Nurp's paper is over. Like, I think of them as the worst for the over. You know what? Maybe it's just safer to say that they overclaim in different ways. I think the thing I find frustrating about, and I think that's actually probably pretty accurate, there's an incentive to over claim about how broadly your model captures results in a neuroscience paper. There's an attempt to over claim about how novel your method is with a machine learning paper. So a lot of, I think the most frustrating thing about machine learning papers is that there's just this incentive towards what I call a categorical geometry. You want to say here, we made a model. It is completely different from all of the other models. It is like itself, just like not. There's no frame of reference in other models. It's a unique point on the landscape. It's new and we made it, and it's a new method, and it's innovative and it's novel. I promise you. The method is new. We compare to other models, which themselves are their own little points on the landscape, and our model does better than them. And you're not really looking like, well, does your model have the same thing as this other model at some level and a different thing at this other level? How is this similar to other stuff in other fields? Like, I assume math, but I'm not a mathematician and like neuroscience, I think there's. You get more points for being integrative. You get more impact if you can say how things relate to other things, rather than just, like, saying, this is, like, maximally unrelated to other things. And so I think that makes it really, like, extra work to understand principles about what's working, what isn't working in machine learning. [01:01:44] Speaker B: So you would rather have a beer with a neuroscientist than a machine learning researcher? [01:01:49] Speaker A: You asked. Academic or industry? [01:01:51] Speaker B: I know, but I'm changing the question. [01:01:54] Speaker A: But it sounds like either. But I. Yeah, I mean, I definitely. Because, well, the thing is, at some level. Okay, at some level, I'd rather have a beer with the machine learning person, because the only way to really figure out how a model is related to another model is to have a beer with them, because it's not going to be in the paper. [01:02:10] Speaker B: Oh, wow. A little backhanded compliment, perhaps. 
[01:02:15] Speaker A: Yeah, I'm not sure, but yeah, no, I think that they're both super interesting fields. If I could have picked a single one, I would have done it. But, yeah, I think it is true that the kind of behind the scenes talking does fill in a lot with machine learning papers, because you kind of learn how they thought of the idea in the first place. The same is true of neuroscience. At some level, though. You talk to people and they're like, yeah, this data, we feel really sure about this data was technically significant, but, like, we feel a little shakier about it. Like, there's always a lot to be learned from actually talking to people. [01:02:54] Speaker B: Neuroscience feels like it's in a. So in one respect, it's in an awesome place, right? Because we have more compute, more data, more models, more everything. But it feels like we don't quite yet know what to do with all of the stuff. And there's a lot of exploratory work. Right. Like what I'm doing right now, it's like, well, we're going to throw all the tools at it that are around because we don't really know what to do with the data. Does that ring true to you? [01:03:26] Speaker A: Yeah, that does ring true to me. [01:03:28] Speaker B: Okay. [01:03:29] Speaker A: Yeah. [01:03:30] Speaker B: What does that mean? Like, how do we get beyond a weird spot? Just a bunch of brilliant people that make the right advances. [01:03:38] Speaker A: I mean, I think of it, at the very least, I think that's a feature of neuroscience that's useful for recruitment because it makes it an exciting time to be in neuroscience. You're not just, like, figuring out. You're not just kind of answering specific questions. You're figuring out what are the right questions to ask and what's the right way to describe them. [01:03:55] Speaker B: Yeah, that's the hard stuff. [01:03:56] Speaker A: Yeah. Yeah. [01:03:58] Speaker B: And the fun stuff. [01:03:59] Speaker A: Yeah. And a lot of people don't like that. They're like, I feel like I spent five years doing a PhD, and I have no idea if the even the question I answered will be relevant in 20 years, let alone the specific answer to it. And I think that, yeah, that's a personality difference, I guess, about risk profiles and what kinds of thoughts you want to have. But I think the fact that it's philosophical is at some level, we're not really sure how to even be saying things is really interesting. And, yeah, I mean, the aspect of the black box machine learning, predictive models versus more structured models, is a really nice example of that. Is that a philosophical choice about which direction to go? Are there ways to combine the best of both? Like, what would that look like? That kind of just maybe there's more structured ways to do it. Maybe there is a right yes or no answer. At some level, it feels like it's a bit of an empirical question. Try different stuff and see what works. And what works is itself a really hard thing to evaluate. Like, what are you going for? If you're just going for prediction, then, of course the models that just do prediction are going to win. If you're going for insight, that is pretty hard to optimize for, and it's really hard to take gradients through it. So, yeah, I think there's a lot of exactly how to operationalize what it means to understand something and make progress on a question. Have a model that makes progress is really tricky. The RL models you mentioned of, like, we have this RL model, and then there was an explosion of things. 
You could argue that that was a win for the model and that it opened up a lot more research. And on a longer time scale, I guess we'll know if that research was useful or not, but I'm not sure how we'll actually be able to use that to restructure the models we're asking. [01:05:58] Speaker B: Yeah. Yeah. You just made me realize that, like, it happens too frequently on my commute to work, like, on the train. I'll just be sitting there, and then I'll think, oh, God, what is my question? You know, you need to get. Because that's the. That's the important thing, right? It's like, to have the. Have the good questions. And I was like, oh, do I have a question? Oh, no. [01:06:20] Speaker A: You know, so, yeah, no, I don't know. I mean, I feel that way. I feel that way a lot. I put off for a really long time writing my bio for my Columbia page. In fact, I think I still have not written my bio for my Columbia page, because I was like, oh, my God. A one summary sentence of my research. That sounds like a job for tomorrow, Kim, that is really hard. [01:06:41] Speaker B: Well, how have your. This is, again, a very difficult question, I'm sure, but I. Can you articulate how your questions have changed the nature of your questions, if not the content of the questions, over the course of your career? [01:06:56] Speaker A: Yeah, I think one big one. So I think I originally was kind of motivated. One of the things that shaped some aspect of my research trajectory was in college, I took this philosophy course with Dan Dennett that was on AI. It was called language in mind. And one of the thought experiments he raised was this idea of a robot firefighter and what it would take to design a robot firefighter. And he kind of just jotted down all of these competing things that the robot would have to figure out how to reason about. At the same time, it would have to figure out how to search a building to make these complicated decisions about whether it should be looking for people or if it should be taking the people it found out of the building, like, these high level goal questions, while at the same time, also, like, putting one robot foot in front of the other and actually moving in a direction that there's this, like, we end up parsing this or, like, triaging this hierarchy of decisions, and that allows us to function in the world. And there's, like, some kind of organization to behavior. And finding it is an interesting problem is kind of the way to study the brain. And a lot of the projects I thought about and things I worked on relied on these explicit hierarchies, like, how do you. What should the high level and the low level be? I think one thing I have. [01:08:31] Speaker B: Wait, wait. So what you're saying is you came into it because of that, or you were influenced by that into thinking in terms of hierarchies? Is that what you're. Okay? [01:08:38] Speaker A: Yeah, yeah, that's a way of putting it, yeah. I was very thinking explicitly about hierarchies, and I think one thing I sort of moved on was whether was exactly like, when and where to think about hierarchies being useful or necessary, that a lot of times if you train a model that is a big model that is trained to do something like next step prediction, it can do some of those things implicitly. You don't necessarily need to hard code those things or the places you thought you would need to hard code them. You don't necessarily. It turned out, I felt like I had all these projects where I was like, this obviously will need hierarchy. 
And then it was like, well, actually just a neural network doing simple stuff, or trained in a simple way, is a pretty tough baseline to beat. So it's not necessarily like that. Hierarchy isn't useful. I think it just often is emergent or kind of implicit in the system. And I think a lot of the ways I shifted to thinking about hierarchy were instead of just thinking about how to put it into a system, which I still do and I think is still useful in certain settings for getting better generalization and stuff, it's not like that's not useful, but thinking about when it's useful. I think I took for granted that it was obviously useful a lot of the time. And now I think of it much more as something that you need to think a lot more carefully about: when the data tells you enough to generalize, when you actually need hierarchy to reason more efficiently or to make a broader inference. The other thing is, I kind of shifted to thinking like, well, how does this hierarchy, if this, what appears to be a hierarchical ability, is emergent, how can we understand how that is unfolding in a neural network? Like, really shifting to it as an evaluation rather than a method. If the behavior is happening, can we understand it? So I think thinking a little bit more like how things. Yeah, thinking more implicitly about how structure and generalization emerges was maybe a little bit of a soft trend in my research. [01:10:56] Speaker B: Has modern AI shifted the way that you've thought about that as well? I know that you're impressed with large language models, foundation models. I've heard you say that, but of course, everybody is, right? But I think I remember you saying that that has sort of shifted the way that you think about intelligence or brains, perhaps. [01:11:13] Speaker A: Yeah, I mean, I think there. I mean, I'm definitely. Yeah, I think large language models are good. That's not the world's hottest take. The fact that they can get as far as they can with next step prediction is pretty fantastic to me. It reminds me of a. Maybe it should have been more predictable, because it reminds me of this thing that was observed a really long time ago with H.M. That supposedly, if you just kind of had a conversation with him, he didn't seem that. Oh, sorry. I should say. Yeah, for people. [01:11:47] Speaker B: I was going to do it. That's okay. You want me to do it? H.M. was the most famous patient in history in the neuroscience world because he. I don't know how he lost his hippocampus. Was it a stroke? [01:11:58] Speaker A: No, surgery. They took it out. [01:12:00] Speaker B: Oh, that's right. Epilepsy. They removed it. They removed a large swath of his hippocampus and some surrounding tissue as well. And he had all these. He had retrograde amnesia. Is that what it's called? Yeah. So he couldn't remember anything moving forward? [01:12:14] Speaker A: Yeah. [01:12:14] Speaker B: Anterograde? I don't remember. [01:12:17] Speaker A: I should definitely know this. It was the one where he could remember stuff before the surgery up to, like, a few days before the surgery, but he couldn't form new memories. [01:12:28] Speaker B: Neither of us know what it's called. That's embarrassing, isn't it? [01:12:31] Speaker A: Yeah. We have both of our hippocampi, as far as I know. [01:12:33] Speaker B: I know. We also have computers in front of us. [01:12:36] Speaker A: I'm worried your listeners will hear me typing if I look it up. [01:12:38] Speaker B: I'm gonna. Let's just fix this. Let's fix this immediately.
Here. Surgically... anterograde. [01:12:46] Speaker A: Cool. Okay. I'm glad we checked. All fixed. The anterograde, I totally suspected. Yeah. So he had. He had both of his hippocampi surgically removed and a little bit of OFC, too, I think, and lost the ability to form new episodic memories. Basically, he lost the ability to form most memories besides, like, slowly acquired complex motor control tasks. But supposedly, if you had a conversation with him, he didn't seem that different. You would think that the ability to form new memories would manifest pretty immediately in a conversation. Instead, it would be like, if you were just having a conversation with him about the weather, what was going on, or something that didn't require accessing old memories, he would sound pretty normal. It was really, like, specifically tasks that require things we think of as hippocampal functions where the deficits would show up. [01:13:52] Speaker B: Why is that surprising to you, though? [01:13:54] Speaker A: Well, the reason it is, I mean, maybe it shouldn't be surprising, I guess. I just think that memory informs so much of our ability to have conversations that when you're talking about, when you're talking about new things, it feels like you're searching your memory for the thing to say next. If you say something that reminds me of something that I heard, and then we talk about that for a while, it feels like a very, like a process where you keep diving in, scooping up memories and sticking them into the conversation. And maybe his conversations were, like, a little bit boring or something and didn't do that, or maybe, I don't know, that's, I guess, harder to write a paper about. I think the reason I found it surprising is just because it feels like memory informs what you're going to say next in a very recognizable way. And maybe that's just the wrong instinct, but it seems like actually you can continue in a conversation saying pretty complex things without obviously seeming that different unless you're asked pretty specific questions. So that's, I think, kind of how I now retroactively interpret this surprise about predictive models: that they're going to do a pretty good job predicting what's going to happen next in a sentence, and only on really specific questions about factuality, or things that specifically go against the statistics of your experience, or specific reasoning questions, will you see something that's kind of obviously wrong. You can get pretty far with just, like, next step prediction. So that, I think, is maybe kind of a neuroscience thing that should have made this less surprising. It was surprising to me, but maybe shouldn't have been, given the historical context. [01:15:45] Speaker B: So you think it's all about prediction? [01:15:47] Speaker A: I think prediction can get you really far. I kind of think, yeah, I think prediction can get you really far. Yeah. It's hard to rule out other things. There's certainly, like, there's things that are not just prediction. I think especially, like, in hippocampus, we had a model of how hippocampus represents predictive structure. There's a lot of retrospective structure that it represents, too. If you came from different places but are going the same place next, hippocampus will be different based on where you came from. It represents information about your past as well. Keeping information around about the past and the future seems like it's important. But, yeah, I think prediction can get you really far.
[01:16:31] Speaker B: Do you think you have a better understanding of now as opposed to, I don't know, whatever, five, six years ago of what intelligence is? I think I have a worse understanding. [01:16:46] Speaker A: Yeah. [01:16:46] Speaker B: Than I did. Really? [01:16:48] Speaker A: Yeah. I guess I feel like this I kind of had in mind. Yeah, I guess I thought of it as something maybe a little bit more structured, and now I'm not. I'm maybe less sure. I mean, I guess maybe a large enough neural network trained on complex enough data. The fact that it could eventually imitate something that looked basically like language comprehension maybe shouldn't have been surprising, because we just know that neural networks are universal function approximators. They can get any function, even a really complicated one. I think I probably had different intuitions based on just nothing in particular, but my own intuitions about what would be required to make that work efficiently and what we would have capacity for. So I think I kind of maybe felt like more structure would be required to make as much progress as has been made. [01:17:55] Speaker B: What do you mean, structure? I'm trying to. [01:17:57] Speaker A: Yeah, I mean, like, explicit decompositions, like, okay, built in category. This is another category. Their operation is constrained in some explicit way. Yeah, I think that I might have felt like. I think I had an instinct for us needing some aspect of that. Yeah, I mean, I think one thing that's really interesting about transformers as the model that is, like, currently reigning supreme is they are a very. They're very beautiful architecture. Like, they have this deep symmetry built into them that processes sequences in a really different way from how people previously thought sequences should be processed efficiently, which is to say, they just look at everything in their recent context, and instead of saying things that happened a long time ago are going to get processed differently from the model, they're going to have been processed more times. Because you have to keep applying a recurrent neural network that keeps updating itself based on new information. You just say, like, we're just going to have all of the information we have in recent history, and we're going to label it by itself, its position in the sequence. But every point in that sequence, every entity in that sequence is going to go through exactly the same processing pipeline. You're going to have the same weights that look at every token in that sequence, and only the label on that token that tells you where in the sequence it is, tells you information about the sequential structure. So you could basically scramble up the sequence, and as long as you kept the labels the same, the model would do the exact same thing. And that, I think, is really cool. It's a really fundamentally relational structure at a very low level. It's saying we're going to process all relationships between entities in our sequence with the same kind of general purpose relational operation. And you can tell us if you are nearby in a sequence or far away in a sequence, you can tell us what features you have, and as layers of processing happen, you can update these tokens with whatever the model has kind of processed about them, but it's still doing the same relational mechanism. 
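Her point about transformers, that the same weights touch every token and order enters only through position labels, can be checked in a few lines. Below is a minimal single-head, unmasked self-attention sketch in Python with NumPy; the dimensions, random weights, and the identity-matrix stand-in for positional labels are all hypothetical choices for illustration. Shuffling the sequence while keeping each token's label attached gives exactly the same outputs, just reordered.

import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                                   # sequence length and width (arbitrary for the demo)
tokens = rng.normal(size=(T, d))
pos = np.eye(T, d)                            # a toy positional label for each slot in the sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(x):
    # The same Wq, Wk, Wv are applied to every token; order is only carried by the labels inside x.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

out = self_attention(tokens + pos)

# Scramble the sequence, but let each token keep its own positional label.
perm = rng.permutation(T)
out_scrambled = self_attention(tokens[perm] + pos[perm])

# Each token's output is unchanged; only the row order moved with the permutation.
print(np.allclose(out[perm], out_scrambled))  # True

This is the relational symmetry she describes: the block computes the same function of token-plus-label pairs no matter where they sit in the input array. A causal mask, which real language models add, constrains which pairs interact but not the shared relational operation itself.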
It has a deep connection to these models, graphnets, that we work with a lot for physics, that, like, fundamentally represent predictions about a system in terms of relations between its entities. So I think there's been a lot of work on relational reasoning as a powerful mechanism for computation. And I kind of hypothesize, although I'm not entirely sure how I test this, that that operation really is a powerful one, that it is partly, like, giving the model some kind of computational object that it can kind of implicitly sort of arrange relational operations into. [01:20:45] Speaker B: But what does that. Does that tell us anything about brains? Sorry, this is a naive question, but, you know, we don't get labeled sequence information through our senses, right? So is it a. Yeah, well, the. [01:20:59] Speaker A: The brain might go out of its way to label them. So that's. That's, I think, one hypothesis that's related to cognitive maps, which you mentioned. I think one. One hypothesis about place cells and time cells in hippocampus is that they are. They're learning to label incoming information with a spatio temporal tag, which then could. Could be like a useful attribute to have on some kind of information. If you want to index it by space and time, that could be really useful. I think one thing that's useful about transformers is they structure sequence prediction. They structure memory as an attention, as a problem of attending over your past. I think that's a cool way to think about how you might use memories, rather than just recalling the thing that is the most similar to your current situation, loading that into memory and then reasoning accordingly. It's a much more flexible and directed process. It's saying, like, you can choose which things in your past you're going to attend to and use to inform your current decision. It's really like a more. It's kind of a more flexible take on something like the temporal context model, where where you are in a sequence informs how you label and retrieve entities that you remember. But instead of having, like, just an exponentially decaying factor, you have a weight on it that you can learn, something that you can, like, more flexibly modulate, to relate your past to your present. So I think this model class is useful for expressing questions in that form, like, how could we attend to our memories in a way that's a little bit like what we now know how to train, but is a little bit different and richer than what previous models are doing, while still relating to those models? There's a lot of questions about biological plausibility, but that's true for all the neural networks that we currently reason about. [01:22:57] Speaker B: Who cares about biological plausibility, though, really? I mean, if it's working, it's working, right? [01:23:02] Speaker A: Yeah, I guess if it's. I mean, it depends if it's working to, like, explain how the brain works or if it's working to kind of. [01:23:08] Speaker B: Yeah. [01:23:09] Speaker A: To be good at machine learning. That's a different question. [01:23:11] Speaker B: Right, right. [01:23:12] Speaker A: Yeah. If it's explaining stuff you couldn't explain otherwise, like, I think that kind of. That kind of counts as working. But. Yeah, I guess the key biological implausibility of transformers is, like, how would you keep all those tokens around? [01:23:27] Speaker B: Yeah, yeah, I don't. I know that I don't keep. I'm almost. I'm close to H.M. in that respect.
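Her contrast between a fixed recency weighting, as in temporal-context-style models, and learned, content-dependent attention over stored memories can be sketched concretely. The stored items, the query, and the projection matrices below are hypothetical, and the weights are untrained random stand-ins for learned ones, so this only illustrates the shape of the two retrieval rules, not a fitted model of memory.

import numpy as np

rng = np.random.default_rng(2)
n_mem, d = 10, 4
memories = rng.normal(size=(n_mem, d))            # stored past items, oldest first (hypothetical)
query = memories[3] + 0.1 * rng.normal(size=d)    # the current situation happens to resemble item 3

# Retrieval rule 1: a fixed exponential recency decay, regardless of content.
decay = 0.7 ** np.arange(n_mem)[::-1]
recency_weights = decay / decay.sum()
recency_readout = recency_weights @ memories

# Retrieval rule 2: attention-style weights from projections (random stand-ins for learned ones here),
# so which memory dominates can depend on content, not just on how recent it is.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
scores = (query @ Wq) @ (memories @ Wk).T / np.sqrt(d)
attn_weights = np.exp(scores - scores.max())
attn_weights /= attn_weights.sum()
attended_readout = attn_weights @ memories

print("recency weights:  ", np.round(recency_weights, 2))
print("attention weights:", np.round(attn_weights, 2))

With trained projections, the attention weights could be made to pick out the relevant memory rather than the most recent one, which is the added flexibility she is describing; the fixed recency rule has no knob for that.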
[01:23:33] Speaker A: I don't have. Yeah, I'm not exceptional for my memory either. [01:23:37] Speaker B: How long are transformers going to be the thing? [01:23:39] Speaker A: I don't know. State space models are getting some attention. I think they have a property in common with transformers, which is that you can train them really efficiently by parallelizing over hardware. I think the key innovation with transformers was. Yes, in some sense, it's less efficient, because you're just representing all of your memory rather than re representing it in the way an LSTM does with a shared batch of parameters. However, you can parallelize the operations that you're training a matrix on. So even though you have way more parameters, you can efficiently train them on way more data. So it's actually kind of nice. And state space models too, I think, have a rearrangement of RNNs that allows parallelization to happen a lot more easily. They just, like, at least my. Yeah, that's. That's essentially my understanding. So I think the architectures that lend themselves to parallelization, that's a really big advantage if you've got lots of data and lots of hardware. I don't know if they'll be the. I don't know how long they'll be the chosen model. I mean, once they. Training a big model is really expensive. So once you train a big model, there's a real switch cost. [01:24:54] Speaker B: Oh, yeah. That's not stopping anyone. Yeah, I guess there's a lot of money in it. [01:25:01] Speaker A: Yeah, I think that's the thing, is there's a lot of money in getting a good one, but it's really expensive to train a big one. So I think the burden of proof to say you should switch your model, or you should train a second model just to see if it's as good, is a little bit high. But, like, yeah, definitely there's a lot of upside. So there's. If you're a big tech company and you've got money to burn on really good AI, there's probably incentives to do it. [01:25:34] Speaker B: All right, Kim, I won't keep you too much longer. I have basically two more questions to ask you. One is, is there something that, like, you are currently struggling with, just beyond your reach? Is there something that is gnawing at you? [01:25:51] Speaker A: I mean, definitely, if we. The relationship between predictive models and structured models, that's gnawing at me. That's the thing. I feel like there's opportunities there. So I'm kind of like, yeah, I guess something that really annoys you and something that's, like, good research motivation has a lot of overlap, but that feels like there should be something there, and I find that really exciting. Yeah, I think that's probably been the major thing that is gnawing at me at the interface of neuroscience and AI. [01:26:28] Speaker B: I mentioned your work on turbulence a few times, and it struck me, actually. So I just had someone on who thinks of the brain as a system of cascade turbulence. And so he, like, uses multifractality and principles from turbulence to think of our cognition. And I thought, well, here's a stupid question I could ask Kim: did your work on the learned simulators for turbulence, did that shape at all how you think about brain processing? [01:27:02] Speaker A: Yeah, I'm so glad you asked. That also is a thing that is gnawing at me. So, one is this idea that the same kind of models that we train to capture physics systems should also be able to capture biological systems, too, and neural data.
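To make the parallelization point from the transformer and state space model exchange above concrete: the property that matters is that a linear recurrence can be rewritten as a computation over the whole sequence at once, instead of a step-by-step loop. The scalar toy below is a hypothetical illustration in Python with NumPy, not an actual S4- or Mamba-style implementation; it just shows that the sequential and the parallel-friendly formulations give identical results, which is what lets such models train efficiently on hardware.

import numpy as np

rng = np.random.default_rng(3)
T = 8
a = 0.9                      # scalar state-transition coefficient (hypothetical)
x = rng.normal(size=T)

# Sequential, RNN-style computation: each step waits for the previous one.
h_seq = np.zeros(T)
h = 0.0
for t in range(T):
    h = a * h + x[t]
    h_seq[t] = h

# The same linear recurrence written as one weighted sum over the whole input:
# h_t = sum over k <= t of a**(t - k) * x_k, which can be evaluated in parallel (scan / convolution style).
t_idx, k_idx = np.meshgrid(np.arange(T), np.arange(T), indexing="ij")
kernel = np.where(k_idx <= t_idx, a ** (t_idx - k_idx), 0.0)
h_par = kernel @ x

print(np.allclose(h_seq, h_par))  # True: linearity is what allows the parallel layout

A general nonlinear RNN does not admit this rewrite, which is her point about why architectures that keep the recurrence linear, or drop it in favor of attention, are the ones that scale well with lots of data and hardware.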
And where that's a good model and where that can show us stuff is also just something that's been gnawing at me. And it's the same kind of question. It's like, if we build a predictive model with this predictive with this kind of structure built into it, when is that structure the right fit for different kinds of problems in neuroscience and also in biology? I think one structure that's built into the graphnets and the convolution nets that we used for fluid modeling is that they have this repetition just the way convolutional neural networks, what they do is they learn about a little patch of an image and then they repeat that pattern, that filter, for extracting features everywhere in the image. And if, for instance, you learn something that's kind of an eyeball detector, then it'll go through your image, and wherever there's something that looks kind of eye like, it'll pass that on to the next layer of processing. Graph neural networks do the same kind of thing, but instead of looking at patches of an image, they look at patches of a network, like a little neighborhood, and then extract rules about, like, you know, maybe if there's a bunch of particles that are crashing into each other in that window, they learn that there's going to be a collision kinds of feature extractions like that. And this seems like a really cool model for fitting data in the brain. If you want to say, if you want to try to fit a model of how the brain's connective structure is giving rise to different computations, or how the structure of different brain areas relates to different dynamics, this seems like it could be a useful model to apply. The main kind of risk of it is that the way that they share and repeat their pattern, understanding when that isn't or is a good fit for neural data is kind of complicated because the brain isn't actually constrained to literally repeat itself at every point. However, a model that does repeat itself can be a useful way for saying, we're going to take the same kind of model and apply it to different, different motifs across the brain and say, like, and try to capture within the same model different circuits that happen to be arising in different ways. Basically, say we're going to take a model that reasons about local graph structures in general, and then says, which one applies here or can reason about what would happen if you, like, set up the graph structure in a different way because it's learning general principles about the local structure, rather than letting every part of the brain learn its own eccentric structure. This idea of learning a general model of connectivity and then trying to apply it everywhere and make the data tell you how it needs to differentiate itself in order for the model to make the right predictions seems like it could be really powerful. I don't know if that made complete sense, but that is something. [01:30:10] Speaker B: Well, I was thinking that's the, like, cortical columns fit that, right? [01:30:14] Speaker A: Yeah, yeah, exactly. Cortical columns are. Yeah. And I think one of the things these models do really nicely is you can train them on a small system and then generalize to a large system, so you can train the models on, like, a small window of fluid interactions and then generalize it to a much larger one, because you're just learning these little local operations, and you can arrange them into different global patterns without breaking anything, without taking the model out of distribution. 
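The weight-sharing idea she describes for graph networks, learn one local rule and reuse it at every neighborhood so a model trained on a small system can be run on a much bigger one, fits in a few lines. The weights, features, and chain-shaped graphs below are hypothetical stand-ins in Python with NumPy, used only to show that nothing in the update depends on the overall size of the graph.

import numpy as np

rng = np.random.default_rng(4)
d = 3
W_self, W_msg = rng.normal(size=(d, d)), rng.normal(size=(d, d))   # one shared local rule (hypothetical weights)

def message_passing_step(node_feats, edges):
    # Every node applies the SAME learned rule to itself and to messages from its neighbors.
    agg = np.zeros_like(node_feats)
    for src, dst in edges:                      # sum incoming messages per node
        agg[dst] += node_feats[src] @ W_msg
    return np.tanh(node_feats @ W_self + agg)

# The identical weights run on a 4-node chain...
small_feats = rng.normal(size=(4, d))
small_edges = [(i, i + 1) for i in range(3)] + [(i + 1, i) for i in range(3)]
out_small = message_passing_step(small_feats, small_edges)

# ...and, unchanged, on a 100-node chain: the rule only ever sees local neighborhoods.
big_feats = rng.normal(size=(100, d))
big_edges = [(i, i + 1) for i in range(99)] + [(i + 1, i) for i in range(99)]
out_big = message_passing_step(big_feats, big_edges)

print(out_small.shape, out_big.shape)           # (4, 3) (100, 3)

That locality is what she means by training on a small window of fluid and generalizing to a larger one without taking the model out of distribution, and it is the same property the cortical-column analogy in the next exchange is reaching for.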
[01:30:37] Speaker B: So it's scale free in that respect. [01:30:39] Speaker A: Exactly. And that's, like, this ability to train on something simple but still work on something complicated, or, like, scale to something complicated, is, like, the hypothesized property of these cortical columns: that you find a structure where a little of them is good and more of them is better. It's the scaling property that people hope transformers have, or have shown that transformers have. So I think that would be a really cool thing to capture. Yeah. If any of your listeners want to talk about that. [01:31:09] Speaker B: Yeah, that was inspiring. That's a great place to end it. Kim, thank you so much for taking time out of your busy schedule. It's great to hear that you're having fun doing what you're doing, and so I wish you continued fun and keep producing great work. So thank you. [01:31:23] Speaker A: Thank you. This was really fun. [01:31:30] Speaker B: Brain Inspired is powered by The Transmitter, an online publication that aims to deliver useful information, insights, and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. If you value Brain Inspired, support it through Patreon to access full length episodes, join our Discord community, and even influence who I invite to the podcast. Go to braininspired.co to learn more. You're hearing music by The New Year. Thank you for your support. See you next time. [01:32:04] Speaker A: The star of a bandaid. [01:32:12] Speaker B: Led me into the snow the covers up the. [01:32:21] Speaker A: Path that take me where I go.
