Episode Transcript
[00:00:04] Speaker A: The type of thing that you want, in an algorithmic way of thinking, is to look for invariants in the data, right? What are the things that don't change? And this is a very powerful invariant, right. And so if that's true, then, I mean, this should be the thing that everyone should be modeling, because this is the core thing that any algorithm should capture.
[00:00:25] Speaker B: We might not know what dopamine does and what dopamine is, but we almost definitely know that dopamine is not pleasure. So one takeaway, dopamine is not the pleasure signal. And maximizing your dopamine is not necessarily a good thing.
There are, like, specific predictions where TDRL fails to make predictions about what happened.
[00:00:51] Speaker C: But why would that be? Why would that be controversy as opposed to progress?
[00:00:56] Speaker A: It's a very good question. So why is that controversial?
[00:01:01] Speaker C: This is Brain Inspired, powered by The Transmitter. Hello good people. I'm Paul. Vijay Namboodiri runs his lab at the University of California, San Francisco, and Ali Mohebi is an assistant professor at the University of Wisconsin-Madison. Ali has been on the podcast before, a few times, and he is interested in how neuromodulators like dopamine affect our cognition. And it was Ali who pointed me to Vijay because of some recent work that Vijay has done reassessing how dopamine might function differently than what has become the classic story of dopamine's function as it pertains to learning. The classic story is roughly that dopamine is related to reward prediction errors. That is, dopamine is modulated when you expect a reward and don't get it, and/or when you don't expect a reward but do get it. Vijay calls this a prospective account of dopamine function, since it requires an animal to look into the future to expect a reward. Vijay has shown, however, that a retrospective account of dopamine might better explain lots of known behavioral data. This retrospective account links dopamine to how we understand causes and effects in our ongoing behaviors. So in this episode, Vijay gives us a deep history lesson about dopamine, also his newer story, why it has caused a bit of controversy, and how all of this came to be. Coincidentally, I happened to be looking at The Transmitter the other day, after I recorded this episode, actually, and lo and behold, there is an article titled "Reconstructing dopamine's link to reward." Vijay is featured in the article among a handful of other thoughtful researchers who share their work and ideas about this very topic. So go check that article out for more views on how the field is reconsidering how dopamine works. I link to that, and all of Vijay and Ali's information as well, in the show notes at braininspired.co/podcast/194. Okay, hope you enjoy our conversation.
So Vijay and Ali, thanks for coming on. So I've had Ali on before and Ali pointed me, Vijay, to your work and said that you'd be an interesting person to talk to. And he pointed me to a specific paper and so I thought it'd be fun to have Ali on as well because he's a dopamine expert. You're a dopamine expert. I don't know anything about dopamine, so I thought it'd be fun to have kind of a conversation here, having said that. So welcome and thanks for being on the podcast, first of all.
[00:03:40] Speaker A: Of course, thank you.
[00:03:41] Speaker C: Having said that, I put the word out to my Patreon supporters in my discord community last night, so it was kind of late, there wasn't much time, and all I said was I was going to have a couple dopamine experts on and to send me any dopamine questions you might have. And the question I got reads like this. I'd really like to hear a view on how the causal association story contrasts with the reinforcement learning story.
[00:04:06] Speaker A: Wow.
[00:04:06] Speaker C: I understand it's been quite controversial in the literature and so it would be great to hear from experts. And then he sent me a link to your paper.
[00:04:13] Speaker A: I see. Wow.
[00:04:16] Speaker B: There you go.
[00:04:16] Speaker A: BJ, that is a very well informed Patreon community.
[00:04:21] Speaker C: They're on top of it. Yeah, they're on top of it.
So I do want to talk about that. But I also want to talk about dopamine, the dopamine story writ large and just historically and how it's evolved. Yep, absolutely. So let me just, I'll start off by reading. I did a little ChatGPT research here. I'm going to read ten functions, ten theories about the function of dopamine, and I'm not going to describe them.
Reward prediction error, incentive salience, motivational drive, learning and habit formation, attention and cognitive flexibility, motor control, aversive learning and punishment prediction, effort-based decision making, dopamine as a generalized neuromodulator, and finally, social and emotional regulation. That's too many functions, isn't it?
[00:05:13] Speaker A: Yeah, absolutely.
[00:05:16] Speaker C: But it's telling that the first one listed was maybe the first real. Oh, I don't know. Well, maybe you can correct me on this. The first real success in the dopamine literature was temporal difference learning, reward prediction error. And that's kind of the classic story of dopamine, right?
[00:05:34] Speaker A: Yeah, absolutely. 100%.
[00:05:35] Speaker C: What is that? What is that story? If one of you could just.
[00:05:38] Speaker B: If I may interject, I would say that that's not where it started, right?
[00:05:42] Speaker A: That is correct.
[00:05:44] Speaker B: I mean, what happens if you. So, I think, I mean, we always learn function from malfunction, right? So with dopamine, what happens if you deplete a person, a patient, of dopamine? That's the parkinsonian case, right. So I think the motor control aspect is where it all started. Right.
[00:06:05] Speaker C: So I should back up and say, so the big success story in my little area, the thing that this podcast focuses on, how neuroscience and AI can coexist and inform each other, was the temporal difference learning.
[00:06:20] Speaker B: And then, like, I think early in the eighties, when people started doing electrophysiology, sending electrodes down into monkey brains and looking at dopamine cells, they were expecting to see, like, motor related signals. So early Schultz studies were all trying to look for motor related signals, and then they would see a monkey sitting there, reaching out to grab a piece of apple. There's some dopamine activity when they get there. But surprisingly, it was always after the movements. Right. So if a signal is related to movement, why is it happening after that? Right. And then, like, in time, these stories of TDRL were developed. I'll let Vijay talk more about the history of that, but I just wanted to say that that is not where dopamine's story started.
[00:07:07] Speaker C: Okay. Yeah, no, I appreciate that. And, Vijay, like, you don't have to go into, like, technical detail or anything, but just broad strokes.
[00:07:13] Speaker A: Yeah, I mean, I think, actually, one more thing to add to Ali's point about the history is that, like, even before dopamine became TDRL, in addition to the movement stuff, there was also the idea that dopamine's related to pleasure. Right. Like, it's a rewarding molecule.
And so that was actually maybe even one of the OG theories before. I mean, actually, in popular press, like, lots of people that I talked to, it is still that theory. Right.
And so, I mean, the great thing about dopamine, I think, as a case study, in terms of how science progresses, is that.
[00:07:50] Speaker C: Talk about this.
[00:07:50] Speaker A: Yeah. There's been sort of multiple ideas of what dopamine does right from the get go. Right, right. From when it started. There's always been multiple views. And I think that's a sign of a healthy scientific discussion. Right. That there are different things going on.
I do agree that, for right now, I think definitely the most well established view is the temporal difference reinforcement learning view. And the discovery by Schultz was seminal. That kick-started, in some sense, this whole field of at least what dopamine function is as it relates to very rapid dopamine activity. There's this parallel literature that was developing as well, where people were looking at long timescale disruptions of dopamine signaling, à la Parkinson's disease, which is like the longest timescale one, but then also pharmacological inactivation experiments, where that story is slightly different from the standard story, the learning story. Right.
[00:08:52] Speaker C: But can they coexist? So you start off by saying something to the effect of there are many different views, and then you just started talking about the timescale and, oh, I mean, maybe they're all correct, right? Like, does dopamine need to have. Oh, no. What you start off by saying is that it's a sign of healthy science, right? But. And yet I'm not sure how you feel about your dopamine theory, of course. But then people who have their own dopamine theories, they need to be staunch supporters of their own theory, and everyone else is wrong. Right?
[00:09:23] Speaker A: Yeah, yeah, yeah. I don't think that that's the right way to look at things. I mean, no, of course. There's one line, I think in one of the reviews, that says, you know, even for dopamine, it is too much to be doing all these things. Right.
So that's, I think, a great line. Right? I mean, I think in many ways, some of these theories are complementary, but in many ways they're also not complementary. Right. So it can't be that they're all simultaneously correct. It can't be that, like, the single molecule does everything, and even if it's involved in everything, maybe the way that it's involved in everything is probably not exactly the same sort of algorithmic function. But at a broad strokes level, I think it is a sign of a healthy debate that there are different views. The next step that we want is to really figure out exactly what the things are that are not possible to coexist in these different views, and what are the things that are possible to coexist.
[00:10:24] Speaker C: Yeah, just to contrast it with a brain area. So I think it's great that it has a neutral name, dopamine. It doesn't say pleasure amine or surprise amine. Right. Whereas in, like, the cortex, for example, there's a brain area named motor cortex, and that means it does motor activity. Right. And that's not all it does, but because we've named it that, it's really hard to think of it in any other way.
[00:10:52] Speaker A: Yeah, absolutely.
[00:10:53] Speaker C: Dopamine is neutral.
[00:10:54] Speaker A: Yeah. Yeah. No, I mean, I think that is good. There's a funny one along those lines. We've named our field neuroscience. Right? It's named after one cell type in the brain.
[00:11:06] Speaker C: Right. Do we need to switch it to brain science?
[00:11:08] Speaker A: Brain science, right. That's probably the more welcoming view of all the different cell types. Right.
[00:11:14] Speaker C: Well, maybe we can do brain science, and the rest of the field can do neuroscience.
[00:11:18] Speaker B: And then brain and behavior. Brain is nothing without behavior.
[00:11:22] Speaker C: Oh, my gosh. All right.
[00:11:23] Speaker A: All right.
[00:11:23] Speaker C: And then we have to go into society and the universe.
[00:11:26] Speaker A: Yeah. We're going right to all the key controversies.
[00:11:31] Speaker C: All right, well, let's focus.
[00:11:33] Speaker A: We'll go back to the temporal difference reinforcement learning stuff. So, yeah. To back up in the history. So, TD was really, like. I mean, I do believe that it was really, like, one of the big successes of this field. Right. Cause it is one where, at least in neuroscience up until then. I mean, there were aspects of this going on in other fields, but, like, TD was one of the poster children for where we have these computational theories that were developed in different fields, that have clear algorithmic function and can be very effective, and then you find that something like that actually exists in the brain. I mean, that's exactly what every theorist's dream is.
It is essentially what kickstarted this whole way of thinking, of thinking about things in an algorithmic way.
Easily the most successful theory in that sense.
And so, yeah, so before we get to, like, what the current state of the field is and whatnot, just to explain what TD is. The core idea actually predates TD. Right. The core idea comes from psychology. It's the question of how do animals learn? What do they actually learn about? Is there some sort of algorithmic principle whereby they learn associations in their world? Right.
Associations could be many things. You could learn that one cue, one word, is associated with something else, and so on and so forth. So it's a very general set of things. But in animal models, it's actually very difficult to assay things without any sort of behavioral output. You typically only study associations related to rewards or punishments, because you can actually get these very well-defined motor outputs, so you can clearly tell that the animal actually is responding and sensing something, first of all, and then responding to something else based on the fact that it predicts something else.
So then the set of associations that we study in general is limited in that sense, just because of the way that we assay them in animals.
Within that field, the biggest success is the Rescorla-Wagner theory in 1972. The idea being that up until then, the core idea was that the way that animals learn and people learn associations is when two things happen relatively close together in time.
Then you learn that those two things are associated.
And then people pre Rescorla-Wagner found some really interesting examples where that's not true, where you can have stimuli that are very close together in time with a reward or a punishment, but you still don't learn that that is actually associated with the thing.
And then the core insight there from Rescorla-Wagner was that actually a simple way to describe all these different phenomena is that you learn based on error. And that was the first big revolution in this field in the question of how animals learn.
The core idea of that is very simple. You just have some prediction of what will happen. You have some prediction of rewards, whether they'll happen, or punishments, whether they'll happen. And then you actually see what happens, and then you look for the difference. And if there's a difference, that means that your prediction was wrong, and you can use that difference to actually improve your prediction the next time you see the same thing.
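(A minimal sketch, in Python, of the error-driven update being described here; the function name, learning rate, and numbers are illustrative, not taken from the episode or any specific paper.)

```python
# Illustrative Rescorla-Wagner-style update: learn a cue's value from the
# difference between what happened and what was predicted.
def error_driven_update(value, outcome, learning_rate=0.1):
    prediction_error = outcome - value          # what happened minus what was predicted
    return value + learning_rate * prediction_error

cue_value = 0.0
for trial in range(30):                         # repeated cue -> reward pairings
    cue_value = error_driven_update(cue_value, outcome=1.0)
print(round(cue_value, 3))                      # approaches 1.0 as the error shrinks toward zero
```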
[00:15:03] Speaker C: The way that you're saying it, it sounds like you're aware of all this, but it might be more accurate to say that circuits in your brain take care of this for you, and you don't have to be aware of predicting it. It's just happening in the brain as an algorithm, right?
[00:15:15] Speaker A: Yes, that is 100% true. Right? I mean, how much of this is actually conscious versus not?
It's unclear. Most of it is probably unconscious. Right. I think the way that we all talk about our favorite theories, we talk about, like, intuitions, right? And the way the best way to tap into intuitions is to think about conscious things that we sort of at least have reason through.
But, yes, I mean, all of this could be completely subconscious.
And so the idea was that error, prediction error specifically, and specifically reward prediction error as far as rewards go, is the key quantity that allows you to learn better predictions of whether things will actually be followed by rewards. And so that was all done in a trial based way, if you will. The idea was that every time you have a trial, you get, let's say, one cue associated with a reward, and then you don't think about time in any other sense. It's just basically: on this trial, you had a cue and you had a reward, or you didn't have a reward, and then collectively, across a bunch of trials, do you learn to associate the things or not? But obviously, that is not how animals live. Animals don't magically know that right now this is a trial, and now this is a trial. That was obviously a limiting factor, and temporal difference reinforcement learning actually started to solve that, for the first time, where it actually started incorporating time within a trial. And so that was one of the key advances in TDRL when it comes to the neuroscience aspect of things. I mean, there's a whole bunch of TDRL stuff related to computer science that I'm not going to get into.
[00:16:56] Speaker C: Right.
But, yeah, I was going to say that it also drove reinforcement learning algorithms in AI.
[00:17:05] Speaker A: Yes, absolutely. Still is.
[00:17:07] Speaker C: Still does.
[00:17:08] Speaker A: Still does, yeah.
And so the core thing, as far as neuroscience and animal learning goes, is that TDRL allows you to have some way that you can define the progress of time within a trial. So the idea is you could have, for instance, on a single trial, one cue followed by another cue followed by reward. And now this allows you to actually keep track of the sequence of things within a single trial, which is actually kind of hard to do with the Rescorla-Wagner type of theory, because that was all just at the same moment: there are a bunch of things happening versus not happening. So it clumped the entire trial into a single thing, where the sequential effects were hard to actually model within the Rescorla-Wagner framework.
The key advance in TDRL is that this allowed you to actually form predictions with fairly good time resolution.
[00:18:10] Speaker C: And it's, I mean, correct me if I'm wrong, but it's attractive because it's a fairly simple idea.
[00:18:17] Speaker A: Exactly.
[00:18:18] Speaker C: And so that's another reason Occam's razor sort of approach. It's appealing in its simplicity.
[00:18:24] Speaker A: Exactly. Yeah. I mean, it's super elegant. Right. I mean, the core idea of TDRL is extremely elegant, the basics of it. Right. And so, I mean, as far as a didactic version of TDRL goes, the one that I teach students, it's a beautiful theory because it just has the core elements that you need to actually explain the phenomena that you want to explain.
[00:18:51] Speaker C: I didn't realize that you were teaching students the history of this stuff, too, because then you have a real left turn probably midway through the semester with your own work.
[00:19:01] Speaker A: Well, actually, I don't teach students my own work yet, because I actually do just one lecture on this stuff for the core systems neuroscience class here for the grad students. And so, like, at that resolution, I feel like they need to know TD and Rescorla-Wagner way more than they need to know my work.
[00:19:22] Speaker C: I see.
[00:19:23] Speaker A: Because the context of that is way more important. Right. I mean, because that is the way of thinking. For students, I'm trying to get people who have not thought about anything related to this, from all backgrounds in neuroscience, to actually start at least thinking about the problems in neuroscience in this.
[00:19:39] Speaker C: Sort of way, in sort of an algorithmic way.
[00:19:41] Speaker A: In an algorithmic way. And so, for that, I think our sort of work comes later. Okay. And so it's not quite there yet.
[00:19:49] Speaker C: It comes on the brain.
[00:19:50] Speaker B: Having said that, I will be teaching Vijay's work next semester. I'll do that. You don't have to teach.
[00:19:56] Speaker C: So, yeah, I will be talking. Are you teaching a full course, then?
[00:20:00] Speaker B: Yes, it's a full course on neurobiology of learning and decision making. So causal association and dopamine.
[00:20:06] Speaker C: And where will Vijay's work come in? Have you got the syllabus ready yet?
[00:20:12] Speaker B: Yeah, I mean, I have, like, 85% ready. Okay, I'll send it to you. Yeah.
[00:20:17] Speaker C: I realize we're burying the lede, but that's okay.
We're building tension.
Well, because. So you just very nicely told that historical temporal difference learning story and. And why it's attractive and how it worked and how, you know, ushered in this kind of algorithmic thinking.
But then. So then why are there so many other theories? Why are there so many various theories that all have some supporting evidence, etcetera?
[00:20:53] Speaker A: It's a good question. So, actually, before I get into, like, our own stuff, I want to sort of still stick with a historical perspective. So while the TDRL stuff was becoming very successful, right, like, with Schultz's seminal work, you start recording dopamine neurons, or putative dopamine neurons in Schultz's case, and you find that the activity of these neurons at a very fast temporal resolution actually seems to evolve very nicely with this prediction error sort of idea.
And that's very powerful. So that stuff was going in parallel.
You also had all this data on dopamine pharmacological inactivation type things, where you actually inactivate dopamine on longer time scales, and then you look for whether dopamine is important for learning, or whether dopamine is important for other things.
In general, the story there was much more complicated.
The story from the TDRL side was very simple. Now, dopamine is really about learning.
We found the magical signal that is useful for learning: prediction error. So that story became elegant and very nice and narrow.
And that's what you want, like, powerful theories to be. Right? Like simple and broad in scope, in terms of explanatory power.
[00:22:11] Speaker C: But at the same time, you do an experiment.
[00:22:13] Speaker A: Yeah.
And people. So, like, one of the things, one of the mysteries in the field that is still not resolved, and not even solved in our own work, I guess, is the idea that there are forms of learning that people find that dopamine is not important for, as best as we can tell. Right.
With, like, pharmacological studies. So the classic example of this one is sign tracking versus goal tracking. Right. And you can show. Are you aware of this?
[00:22:49] Speaker C: No. What's sign tracking?
[00:22:52] Speaker A: Yeah. So.
So this is all work from rodent studies.
What you find is basically that if you do just a simple pavlovian conditioning. So the idea is you have a rat. Let's say there's a lever. It doesn't have to press the lever.
And then there's, let's say, a port where you get the reward. And let's say the lever is the cue. So you use the lever as the cue. So the lever just comes out before the reward is available.
You don't have to press the lever at all, and then reward comes later.
So when you take normal rats and make them do this task, you find that, broadly speaking, there are two classes of animals.
Both class of animals learn the association between the lever and the reward.
But one class of animals actually, once they learn that the lever predicts the reward, the moment the lever comes out, they'll go out and hang out by the reward port.
So those are called goal tracking animals because they know that the rewards are about to come, and they're right there by the reward to actually collect it.
[00:24:06] Speaker C: So by sign, you kind of mean cue. It's a synonym for cue.
[00:24:10] Speaker A: Yes, exactly. And so the alternative is the sign trackers. Those animals, when the lever comes out, they go to the lever and they start messing with the lever, press the lever, bite the lever, all sorts of things. They don't go by the reward, and then when the reward actually gets delivered, they'll walk over and collect the reward.
So it's this weird dichotomy, but both of them are learning. They're both learning Pavlovian associations in that the cue predicts the reward.
[00:24:41] Speaker C: They both know that, except the sign tracking animals are actually thinking they're learning some sort of operant conditioning, I suppose, because they're not passive.
[00:24:52] Speaker A: Yeah, they're not passive, but it's not clear that they think that they need to press the lever. Sure. Yeah, but, yeah, either way. And then some of the behaviors that they produce are not just, you know, like the typical operant lever-pressing tasks, where they just go and press the lever. Right. And they don't do anything else. But here the sign trackers are different, because they actually go at the lever. They do even, like, consummatory type things, as if the lever itself is food, as if the lever itself is a reward.
It's a weird thing.
And what you find is that dopamine is actually not needed for that type of behavior, for the sign tracking. Without dopamine, you cannot get goal tracking, but sign tracking seems to be independent.
[00:25:41] Speaker C: So wait, so you deplete dopamine and then all of a sudden all you have are sign trackers? And if you do not deplete dopamine, then you have some sign trackers and some goal trackers.
[00:25:51] Speaker A: Yes, exactly. Yeah.
And so that's weird. Right? And so then, so the idea was basically that you have this whole type of behavior, behavioral learning, where animals actually don't seem to need dopamine.
And so that was sort of a parallel literature, essentially, in that, like, what exactly is dopamine useful for? Wasn't actually clear.
[00:26:18] Speaker C: But they're learning something.
[00:26:20] Speaker A: They're learning something. Exactly. You know, all of this was sort of fodder for the other theory, that dopamine is actually important not for learning per se, but for ascribing value to cues. Right. Like incentive motivation.
[00:26:39] Speaker C: Yeah, that goes hand in hand, I was going to say, with motivation.
[00:26:41] Speaker A: With motivation, exactly.
[00:26:43] Speaker C: Goals.
[00:26:43] Speaker A: Yeah, yeah. Which makes me think that it's the other way. So now I'm blanking.
[00:26:50] Speaker B: It's sign tracking, perhaps. Right. So I would say that it's sign tracking. You're right.
[00:26:54] Speaker A: Yeah, yeah.
[00:26:54] Speaker B: I mean, if we are to recreate the theory, I mean, that makes total sense for dopamine to be needed for sign tracking.
[00:27:02] Speaker A: Yes.
[00:27:03] Speaker B: With the incentive value idea that you are attributing value to this cue. Right?
[00:27:10] Speaker A: Absolutely.
[00:27:10] Speaker B: As Kent Berridge would say, it's like a magnet. So, like, approach behavior is an important thing in motivation research, at least in rodent or animal work: when you're motivated, you are more likely to approach a reward, or a rewarding cue or something. And that becomes important in drug addiction research as well. So it's not the reward itself that you're attributing value to, it's the sign, it's the cue that you get attracted to. And that's why, when you're re-experiencing an environment where you had a high, where some drug related incident happened, you will recreate all these.
[00:28:00] Speaker C: So you chew on the crack pipe instead of loading it.
[00:28:03] Speaker B: You may do that, because there's value in that cue itself, not the high. And intrinsically dopamine is not related. And it makes sense what Vijay mentioned earlier, that dopamine is not the pleasure signal. People still, if you go ask in the street, people will say dopamine is a pleasure signal, oh hell.
[00:28:21] Speaker C: If you ask any, like, popular neuroscience communicator, they would say that, you know.
[00:28:25] Speaker B: Yeah, they would say that a cold shower also has some effects on dopamine.
[00:28:32] Speaker C: But yeah, you mentioned, you mentioned Berridge. There is Berridge, the, like, the liking versus wanting, right?
[00:28:40] Speaker B: So yeah, that is exactly the idea. So liking would be the pleasure, right? I mean, when you like something, it is the pleasure part, but then wanting is the approach, is the motivation part that's more related to dopamine.
[00:28:53] Speaker A: Yes.
[00:28:54] Speaker C: Okay.
[00:28:55] Speaker A: I did mix that up. I said everything correctly except for the claim that dopamine's not important for sign tracking. It's actually crucial for sign tracking, not for goal tracking.
[00:29:04] Speaker C: Oh, okay. Okay.
[00:29:05] Speaker A: Yeah. Which is exactly what I was leading up to, that it is important for incentive salience.
Yeah, I just mixed them up.
[00:29:12] Speaker C: So what is the incentive salience? What were you gonna say about that?
[00:29:14] Speaker A: So the idea about incentive salience is basically. So it was the second hit on your ChatGPT list, right?
I think so, yeah. So the idea for incentive salience was exactly what Ali was saying: that basically dopamine is not important for learning per se.
It's actually important for ascribing sort of motivational properties to cues.
[00:29:37] Speaker C: Oh, this is related to your work that we'll come to perhaps.
[00:29:41] Speaker A: Perhaps, yeah, they're all related. They all have related components. Right. And so the idea was basically that like the animals that actually go and chew on the lever and stuff, that seems to be dopamine dependent, but the animals that just go and hang out by the reward port where they know exactly what's going on, that seems to be dopamine independent.
So this was the controversy in the field at the time, where you had all the Schultz stuff developing, where there was more and more evidence that rapid timescale dopamine activity is all related to learning, crucial for learning. But then here is one type of learning, in fact one that you would argue is in a simpler form to understand, that doesn't require dopamine, or doesn't, apparently.
[00:30:36] Speaker C: And actually you're learning. If you're learning, you're learning the wrong thing.
[00:30:39] Speaker A: Yes.
[00:30:40] Speaker C: Yeah.
[00:30:41] Speaker A: Yeah. So now what is right versus wrong is unclear because, like, it's not operant, right? Because the animals are free to do whatever they want. Okay.
[00:30:48] Speaker C: You're learning the thing that is going to decrease your survival and therefore decrease your evolutionary lineage. So one could say, if evolution is normative, you could say that's the wrong thing.
[00:31:02] Speaker A: Yeah.
[00:31:03] Speaker C: Leap there. I know.
[00:31:04] Speaker A: Yeah. Yeah. That could be an interesting discussion.
But actually. So let's say we skip that discussion because there are complex thoughts.
[00:31:13] Speaker C: I just mean. Okay, how about a simpler thing is you're going to get satiated less quickly if you are.
[00:31:20] Speaker A: Basically, if you're hanging out by the.
[00:31:23] Speaker C: Reward, you have the dopamine, you're going to go chew on the lever. That means you're going to get the reward a second later, 500 milliseconds later.
[00:31:31] Speaker A: Yes.
The argument, I think, is that. Now, I need to go back to the data, but I think that, as best as I remember, these animals were not slow at picking out the reward. I mean, they were slightly slower, but not that much slower.
And then the complex thing that I was going to say is that the argument is that if you were to go by the lever and chew on the lever and all that, maybe that is exactly what increases your survival, in that you know exactly what the cues are that are important.
[00:32:02] Speaker C: But it's more important just to see the lever and make a beeline to the reward. All right, so anyway, I'm nitpicking.
[00:32:07] Speaker A: Yeah, no, I agree. But I'm just saying that there are different views that people. I've heard people say. Yeah, okay. Yeah. So, okay, cool. So that was a controversy that was happening, you know, while the Schultz stuff was developing. Right.
[00:32:20] Speaker C: And a lesser known story.
[00:32:23] Speaker A: Yeah, a lesser known story in, I guess, the recording systems neuroscience community, but in the dopamine community.
Yeah. Well known.
[00:32:31] Speaker C: That's the thing.
[00:32:32] Speaker B: And specifically.
[00:32:33] Speaker C: Sorry.
[00:32:33] Speaker B: In the drug addiction field. Right. Because a big part of, like, NIDA is National Institute of Dopamine. Right. So a lot of funding.
So, like, addiction is. Or like, dopamine research has been an important part of addiction research. And I would say that incentive value has been very dominant and influential in that field, but maybe not as much in the learning and reinforcement learning and AI fields.
[00:33:02] Speaker C: Yeah, and I was going to also say, I mean, Vijay, one of the things that you point to is work by people like Randy Gallistel. And one of the things that you are deeply familiar with, as is apparent in your work, is all of this literature on learning and different modes of learning, which I think, as Randy Gallistel has pointed out to me, is super important and a super rich history to draw from to test these different things. There are so many different ways you can test learning, and different facets of learning, that will key into whether your story is more or less correct. So that's an important thing that I have missed out on. And so I need to revisit all that literature, but I don't know where to begin because it's so vast.
[00:33:51] Speaker A: Yeah, no, I agree. So, I mean, Randy's stuff, I would say, genuinely is lesser known when it shouldn't be. Right? I mean, everyone should be knowing this. Everyone interested in learning should know all the literature that Randy has talked about. I mean, and collaborators of Randy worked on experimentally. So that literature is like, absolutely crucial. I mean, it forms some of the bedrock in sort of our own thinking of like how, you know, our own work evolved a lot from Randy's sort of core ideas and the listing of problems.
[00:34:23] Speaker C: Okay. Yeah, I mean, one of the, one of the nice things about that, I keep pointing to Randy, but that's just because I had him on the podcast. But about that, that kind of research is, the tenor of it is you say, well, look, we can show that animals behave in this way and it just doesn't work with your algorithm and it doesn't work with the account that you're giving of a circuit level mechanism. So therefore. And so you have to account for those sorts of behavioral findings.
[00:34:46] Speaker A: Yeah, absolutely. And so that has been a tricky one. Right. I mean, so explaining. So just to back up, in case listeners aren't familiar. So Randy has, and this sort of can lead into our own work. Randy has shown experimentally, not Randy, but like Peter Balsam, others, John Gibbon, etcetera. And then Randy has sort of summarized them in a very influential paper, and then he's talked about this for a while now and has published a lot of papers. And so they showed that, on the one hand, there's this competing sort of idea, right? There's the temporal contiguity idea, that if cue and reward are separated from each other, you learn less about them.
And then there's a different literature that shows that if you space things out more, then you learn more from each experience. Just at a qualitative level, how these two things interact was not clear at all. Some of the old literature actually suggests some extremely interesting sort of relationships between both of those aspects of learning.
So what they found was that, so actually to back up. In TDRL, contiguity, like temporal contiguity, even though we say that it moved beyond temporal contiguity by saying that prediction errors are the thing that drive learning, right, temporal contiguity is still a factor in the learning, in that if you separate cue and reward more, there's more temporal discounting, and so you actually end up learning less; it's harder to learn. Right.
And so temporal contiguity is still a factor. But the interesting thing in the other literature, a vast literature, as you're saying, shows that if you actually increase the cue-reward delay, and I'm saying cue-reward, but a lot of the stuff was actually done with punishments and stuff. But I'm just going to stick with cue-reward just for simplicity.
So if you increase the cue-reward delay, it's not clear, it's not actually apparent in the data that it always increases the time to learn or the number of trials to learn.
You can have the cue-reward delay be long, but if you increase the intertrial interval also in a proportional manner, you find that the number of trials it actually takes to learn is actually conserved. It's invariant.
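(One compact way to write the invariant being described, as a summary of the claim rather than a formula from the episode: if T is the cue-reward delay and I is the intertrial interval, trials to acquisition behave roughly as N ≈ f(I/T), a function of the ratio alone, so scaling both T and I by the same factor, say doubling both, leaves N unchanged.)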
Yeah. Very counterintuitive and very profound. Right? I mean, the type of thing that you want, in an algorithmic way of thinking, is to look for invariants in the data, right? What are the things that don't change? And this is a very powerful invariant, right? And so if that's true, then, I mean, this should be the thing that everyone should be modeling, because this is the core thing that any algorithm should capture. Right.
But it turns out that, I mean, a lot of the original work that summarized this was a meta-analysis in the early eighties, from work in the seventies, and then Randy's paper popularized it in the early two thousands. But it's largely been ignored in the neuroscience community. Right?
[00:38:00] Speaker C: I mean, very few neuroscientists, it's terribly inconvenient.
[00:38:04] Speaker A: Yes, exactly. It is, it is terribly inconvenient, and especially to the, to the dominant model, like the TDRL model, it is inconvenient. And so I think that's sort of the reason why maybe it's been ignored. And that, I think, is a fatal flaw. And, I mean, that's for sure. I can say pretty confidently that we should rectify that as a field. Right. And so the difficulty is that in a standard TDRL sort of view of how learning works, these sort of invariants don't naturally come about.
And so you can maybe bake that in to the system, into the algorithm. But that's not like it's a natural thing. It's not a natural thing. You're just adding in that thing as a constraint. And so that's not really satisfactory. And people have taken cracks at it and found that there are ways to take the error based idea and try to capture this, but not in an easy way that captures all the other things with dopamine. So this is, I think, somewhat unsolved problem on the TDRL side of things, right?
I say somewhat unsolved in that there's been some attempts, but it's largely not been shown that you can actually capture the standard dopamine things while also capturing this in the same framework.
[00:39:16] Speaker C: You're telling the story as if this is the history of your thinking about it almost.
And then this led you to the conclusion that you needed something new, something else, something to account for it. What I want to know is where your idea, which is a fairly simple inversion of TDRL, came from. How did you come up with it? What led you to it? Maybe you could just really briefly say what the idea is, and then we can unpack it more, because I want to know how you came up with the idea, because it's so simple, man.
[00:39:48] Speaker A: Yeah, it is a very simple idea. Right. The core essence of it is extremely simple. So to explain what the idea is. So the core idea of TDRL is that the way that you learn predictions is by learning through prediction errors, right? So you actually have, you make a prediction, and then you look at what actually happened, and there's an error, and then use that error to actually update the prediction.
[00:40:09] Speaker C: So there's a bell, and then usually you get a cookie after the bell, and then sometimes there's a bell and you don't get a cookie, and you learn, you know, then you have a different signal. The first time you get a cookie after the bell, you weren't expecting it, and the dopamine says, whoa, I was not predicting that. And there's a big error because you got the reward.
[00:40:27] Speaker A: Exactly.
[00:40:27] Speaker C: And so these signals are modulated based on your predictions and the error that is generated.
[00:40:33] Speaker A: Exactly. And so the critical thing here is that everything that you need in this algorithm is forward looking, if you will, in that you're always looking for future relationships, right?
Exactly what predictions are.
[00:40:46] Speaker C: And you call that prospective.
[00:40:48] Speaker A: Prospective, right, exactly.
And the alternative view, just to state it simply. And then we can go to the history and how this came about. The alternative view is that actually the way that there's a different way that you can learn associations, and it's simply by looking backwards.
So imagine that you got the ice cream or something, the outcome, the cookie. Then you look back to see what might have been the thing that caused this. And so if you consistently find that something precedes it, then you know that those two things are associated. And it turns out that you can show, mathematically, with simple Bayes' rule type things, that if you know the backward associations, you know the forward associations too.
You can compute it very easily.
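(A minimal sketch in Python of that inversion; the probabilities are made-up numbers for illustration, not data from the paper.)

```python
# Invert a backward (retrospective) association into a forward (prospective)
# prediction with Bayes' rule: P(reward | cue) = P(cue | reward) * P(reward) / P(cue).
p_reward = 0.25               # base rate of reward
p_cue = 0.30                  # base rate of the cue
p_cue_given_reward = 0.90     # learned by looking back from rewards: the cue preceded 90% of them

p_reward_given_cue = p_cue_given_reward * p_reward / p_cue
print(round(p_reward_given_cue, 2))   # 0.75: the forward prediction falls out of the backward one
```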
[00:41:31] Speaker C: So here's my guess. Here's my guess is that you were just thinking about Bayes rule and then you thought of the prospective story as part of some conditional probability, and then you're like, I could just reverse this with Bayes rule. Am I right? Am I right?
[00:41:47] Speaker A: That definitely was one of the steps.
[00:41:50] Speaker C: Ah, one of the steps. I'll take it.
[00:41:52] Speaker A: Yes. So it's very close, I think. So. Now going to like how this evolved.
And actually, maybe before going there, just to say, you know, why is this so intuitive in some sense, right? Like, the version of the story that I give in talks to give people intuition.
It's also one that I tap into consciousness for, just because that's the easiest way for people to actually tap into the intuition. But all of this may be subconscious, but the core intuition is this. Let's imagine that one day you just feel sick, you feel nauseated, and you just like, stomach pain, etcetera, you feel like you've been food poisoned or something. You ate something that you just didn't like, and what do you do?
I will say pretty much everyone looks back in their memory to think about what is the thing that they might have eaten that would have caused this. And let's say that you happen to eat at a new restaurant that day.
I bet that you'll probably not go back to that restaurant again for a little while, because you will associate eating at that restaurant with getting sick. Now, from that example, there are two things that are clear. One thing is the way that you associated eating at that restaurant with getting sick is backward. It's fundamentally backwards. Right. When you were eating at the restaurant, you weren't thinking, am I going to get sick? Am I going to get sick?
It's just that when you got sick, you were actively looking back in memory to think about where you ate, and then you realize that there's a possible explanation.
[00:43:27] Speaker C: Right.
You just made me realize, I believe my wife is slowly poisoning me, but go ahead.
[00:43:34] Speaker A: Uh oh.
[00:43:35] Speaker C: I couldn't resist. I'm sorry.
[00:43:36] Speaker A: I'm sorry.
Yeah. And so the other thing that you realize from there is that once you've associated eating at the restaurant with illness, then you can invert that retrodiction to a prediction. Right. You intuitively do that. The thing that makes you decide not to go there is not the retrodiction. It's a prediction, that if you see the restaurant, you will probably imagine getting sick, and that's why you don't want to go there. Right. So you've intuitively converted a retrodiction to a prediction. So intuitively it seems like this process is happening, right? You learn a retrodiction and then you convert that implicitly to a prediction. The question is, how does it happen? And Bayes' rule is the answer.
[00:44:24] Speaker C: But why has this not been thought of years and years ago or proposed? Or has it in some form?
[00:44:31] Speaker A: Yeah, so this was the thing that, like, I. So once, you know, this sort of worked out, I'll get into that a bit more. But once I worked up the full theory on the retrodiction to prediction conversion, I was just puzzled by exactly this thing. I mean, it seems such a simple idea. Why has this not been thought about? And so the answer is, after working this out, and I'll tell you, there are some key differences in the way that we worked it out from the prior attempts at it, I did realize that people have thought of similar things. So, in fact, one of the original things that kickstarted the whole field is Kamin's blocking experiments, which preceded Rescorla-Wagner, and Kamin's explanation for blocking is something similar to this, that you basically look back. He called it backward scanning.
And the idea is that you look back to find associations.
[00:45:27] Speaker C: What is blocking? Let's just say what blocking is. Sorry, I know, because it's kind of a, it's just a technical term.
[00:45:32] Speaker A: Yeah, absolutely. And I mean, this is something that I sort of alluded to at the start. So this is actually one of the most influential results in the early days in psychology, which actually moved the field from temporal contiguity as the key thing to learning from errors. Right? So the idea is this. Imagine that you first teach an animal that one cue predicts a reward. A lot of this was done with punishment, but again, I'll stick with rewards. Cue predicts reward. You've already learned this. And then in the second phase, what you do is, while you present this cue, you also present another cue, and then give the same reward.
Now, this new cue is obviously temporally contiguous with this reward.
So if it's just temporal contiguity that you're learning, if that is the thing that allows you to learn, you should learn the new cue-reward relationship as well.
But it turns out that in general, animals don't, at least they don't show behavioral evidence that they do.
Then why is that? And that's called blocking, in that the first cue actually blocked the ability of the second cue to be learned, to acquire a new association.
The idea there was that Rescorla and Wagner showed that if this is all driven by error, prediction error, then this works.
If this cue already predicts reward, then in the second phase, when this cue is paired with reward, even though there's something else present, the reward has already been predicted. There's no prediction error here, so you don't learn about the second cue. But the original explanation that the person who discovered this actually gave was not that. It was actually that you're looking back for causes. So it is a retrospective, backwards explanation.
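(As an aside, a toy Python illustration of how blocking falls out of that summed error-driven rule; the numbers and variable names are my own, purely illustrative.)

```python
alpha = 0.2
v_a, v_b = 0.0, 0.0                    # learned values of cue A and the added cue B

# Phase 1: cue A alone is paired with reward, so A comes to predict it fully.
for _ in range(50):
    error = 1.0 - v_a
    v_a += alpha * error

# Phase 2: A and B together, same reward; the error is computed against the summed prediction.
for _ in range(50):
    error = 1.0 - (v_a + v_b)
    v_a += alpha * error
    v_b += alpha * error

print(round(v_a, 2), round(v_b, 2))    # roughly 1.0 and 0.0: B is "blocked" because no error is left
```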
[00:47:23] Speaker C: And then that's similar to your idea in that, well, so that first cue is predictive. If you're looking back, that first cue is predictive 100% of the time.
[00:47:32] Speaker A: Exactly.
[00:47:33] Speaker C: And then the second cue is just predictive a smaller percentage of the time, because it was introduced later.
[00:47:37] Speaker A: It was introduced later, exactly. Yeah. And you don't need to attribute causal significance to that thing. There are always distractors that happen in your world. So you don't need to assign causality to everything that precedes something. Right. You want to know that it consistently precedes it. Yeah, yeah. And so this idea, so that was maybe the original form of the backward retrospective view.
Now, the critical thing that was missing there was this inversion from the retrospective to the prospective, right?
In that view, this was proposed as a way of learning associations, but it was not mathematically formalized as a way that you could actually take this retrodiction and then convert it to a prediction, which is finally the thing that you want to learn, using Bayes or any other form. Right? Sure.
So that was missing in that explanation.
Now, the other time that this retrospective thing was actually in the literature, was actually Randy's work.
So Randy, as we touched upon briefly, has this idea that all of this is just based on a temporal map, or a cognitive map, if you will, of time.
And so if you just store the exact time of when everything happens in memory, and your memory is perfect, let's say, then you can arbitrarily go back and think about whether things precede something or things are followed by something. And so then the idea is that the retrospective makes as much sense as the prospective in that sort of a computation.
[00:49:13] Speaker C: I see.
[00:49:14] Speaker A: And so then you can have, and they could, in fact, be related, though not directly in the way that he writes it. It's not just a direct Bayes inversion, because it's a slightly different framework of the prospective versus retrospective, but either way. So that was one other place where the retrospective had made its appearance before we published it.
[00:49:35] Speaker C: But did you discover this after you had your ideas? Yes.
[00:49:40] Speaker A: Which is funny, because Randy's work is sort of been like one of the core sort of drivers for me in terms of my at least thinking about the philosophical problems of the learning. And Randy just has published so many papers that I know.
[00:49:55] Speaker C: How could you be expected to. I mean, it takes a career just to follow someone else's career.
[00:50:00] Speaker A: Exactly. Yeah. And so, like, it was just like, yeah, I just didn't know about this. I mean, there's a very specific paper where Randy talks about the retrospective thing, and I just didn't know about it.
[00:50:10] Speaker C: Ollie, has that happened to you frequently, where you have an idea, you start working on it, and then you realize it's been done seven times before in different ways?
[00:50:20] Speaker B: My ideas are sewers.
[00:50:24] Speaker C: Of course. Of course.
[00:50:26] Speaker A: Yeah. So here, too, then, with the retrospective thing, this prospective conversion of a retrodiction to a prediction didn't exist in this way either. So that part is new, I think. So the way that we proposed it, I think that was the first time that that came about. Right.
[00:50:42] Speaker C: Because you need that. Because you do need to behave. Moving forward.
[00:50:46] Speaker A: Exactly. Moving forward. Exactly.
And also, just given that we are on the historical sense of this idea. So, you know, there are two threads to how I realized this. One thread is that, when I first learned reinforcement learning and TDRL, I had this vague feeling that something was missing, and that was just the time component of it. It took me about 13 years to verbalize what I thought was missing. It took a long time. I knew that it was something related to time, but exactly why was hard, and we can get into it later.
That was one thread, where I had some dissatisfaction with the standard view of TDRL.
But the retrospective versus prospective thing was almost independent of that thread, in that it came about because I'd actually collected some data in my postdoc where I found some weird patterns of responses in orbitofrontal cortex neurons that project to VTA. And that didn't make any sense to me. And the only way that I could sort of post hoc, retrospectively rationalize the results was to think about this retrospective framework. So then the first time in my published work that I mentioned retrospective is actually in the discussion of that paper.
And because that paper was not designed to test this idea in any way, it only makes an appearance in the discussion, because it was a loosely formed idea at the end of that paper to just sort of roughly, qualitatively explain the patterns of findings that I found.
And then, so I had that idea that maybe this retrospective thing could work. The problem with the way that I was just describing it is that all of this is the Rescorla-Wagner type equivalent. It's a trial based view, in that you're just thinking about: does the cue precede the reward? We're just looking at whether, on a given trial, the cue precedes the reward versus not. But the core advance of TD is to go from that view to a time based view, where you have time differences. And that actually required additional work, to show that the same retrodiction to prediction conversion would actually hold in the standard way that people do TDRL, because.
[00:53:08] Speaker C: Yeah, like the Bayes equation has nothing to do with time.
[00:53:13] Speaker A: With time, exactly. It's just a conditional probability thing. Right. And conditional probability alone does not get you to these long run sums of, like, value type things that you define in TDRL, which we can get into as well. Yeah.
[00:53:26] Speaker C: And it turns out time is fairly important in life.
[00:53:29] Speaker A: Yes, exactly.
And it's tricky to think about how time plays a role in this sort of stuff, like in learning.
And so now that we've maybe sort of built up the intuition for these things, I can start to now poke holes in the standard way that people assume things work in TDRL. And it's when I realized these things that I was able to verbalize what I thought were the problems. The question is, how can you then try to figure out a solution for this? And that's where I made the connection between the retrospective thing and then the problem with the time stuff.
[00:54:06] Speaker C: Nice.
[00:54:09] Speaker A: The view is this. Standard TDRL, the idea as it's set out in the core didactic way, is a very simple idea.
You just have a cue and a reward here, and you make a prediction here, and you make a prediction that value is zero or something at the next moment, let's say. And then you break up time into these small components, time bins. And in this time bin you're predicting that nothing will happen, but then you actually get reward. And when you get reward, you actually are surprised. So you get a prediction error. You ascribe that prediction error to something that came before it. And in the standard didactic view, you actually ascribe it to the thing that immediately preceded it, let's say a time step. And so you assign value to that time step, and then the next time, the state before that gets value from the next state, and so on and so forth, and eventually you assign value to the cue.
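(A minimal tabular TD(0) sketch in Python of that didactic picture, with time bins between cue and reward; the bin count, learning rate, and discount are arbitrary choices of mine, not values from the episode.)

```python
gamma, alpha = 1.0, 0.3
n_bins = 5                            # time bins from the cue (t=0) to the reward (after the last bin)
values = [0.0] * n_bins               # predicted value at each time bin since the cue

for episode in range(200):            # repeated cue -> delay -> reward trials
    for t in range(n_bins):
        reward = 1.0 if t == n_bins - 1 else 0.0                # reward arrives only at the end
        next_value = values[t + 1] if t + 1 < n_bins else 0.0
        td_error = reward + gamma * next_value - values[t]      # the "dopamine-like" prediction error
        values[t] += alpha * td_error

print([round(v, 2) for v in values])  # every bin approaches 1.0: value has crept back to the cue
```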
[00:55:06] Speaker C: Right.
That reward prediction error is a dopamine hit.
[00:55:10] Speaker A: Exactly.
[00:55:11] Speaker C: And since just to bring it back to dopamine.
[00:55:13] Speaker A: Exactly. So that's the standard TDRLV. So now if you think about how this works in reality.
Well, let's see. So the thing that you're assuming happens here is that you keep track of time from the cue. Right? Now, in the simplest way, that's obviously a simplification, that you keep track of time perfectly bin by bin, and that, obviously, everyone knows is a simplification. So no surprises there.
But still, the critical assumption is that you are keeping track of time from the cue. Now, this raises a problem.
Think of how many cues there are in our environment.
There are so many; there's no way that that's not effectively infinite. And so are you keeping track of time from every single cue that you experience?
And are you doing this in parallel? So, like, you know how much time it's been since that cue, and then this much time since that cue, and all of those things are happening in parallel. So for every cue, you need to have a separate clock for how long it's been since that cue has happened.
[00:56:17] Speaker C: You should look up Mark Howard's work. He might say that you can do that using a Laplace transform, but no need to revisit that right now.
[00:56:25] Speaker A: Yeah, very familiar with Mark's work. Yeah. And Mark's stuff has relationships to the learning side as well, which is a different angle to this whole thing. But I don't think that he solves this particular problem in the standard TD way. In the standard TD way, right, we're still talking standard TD, for each cue you're assuming that you keep track of time from there, right? And then when you ascribe value for that reward prediction error, you are ascribing value to that cue in terms of how long it's been since that cue happened, right? So for each cue you assign it differently. And the other critical assumption is that all of TDRL depends on states as the input to this algorithm, right? So basically the idea is that TDRL is an algorithm that operates on some inputs and gives you some output. The output is the value; the input is the state, the state of the world.
This, in standard computer science settings, all makes very simple sense. And states are something that you know a priori; in an engineering sort of setting, you know exactly what you're trying to learn, so it's very easy to define all this. Now, when you're trying to take that architecture and ascribe it to animal behavior, you have to make a bunch of assumptions.
So basically, the state inputs that you give to this algorithmic black box, if you will, I mean, it's not a black box, obviously, we know every component of it, but the inputs that go into it are what, exactly? What are the state inputs? So essentially what you need is a state input such that for every moment or every time step, something tells you this is the current state of the world.
Because TDRL fundamentally is about temporal differences. What that means is there's one time step and another time step, and you're calculating the difference in predictions between these time steps. And to do that, you need to know what state you're in, what is the thing that allows you to even make a prediction at this moment? That is the, so you have to.
[00:58:31] Speaker C: Have predefined states, essentially.
[00:58:33] Speaker A: Exactly. And now think about that sort of thing with this problem, the standard sort of setting, let's say trace conditioning. So a cue happened, there's a lot of time delay where nothing else is happening, no external input is being given to the animal, and then a reward comes.
Then you need to have a state defined for every moment in time.
And how do you define a state for every moment in time? The state that you define for every moment in time is basically the time since the cue. That's the only thing that defines the state of that moment, even though that moment is just a delay period like any other delay period. There's nothing else external that tells you what this moment of time is.
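A rough sketch of the state assumption being questioned here, assuming the common "one bin per unit of time since each cue" style of representation (the cue list and bin count are invented for illustration; this is not anyone's published state space):

# One-hot "time since cue" state: effectively one running clock per cue.
CUES = ["tone", "light", "odor"]     # in the real world this list is unbounded
N_BINS = 20                          # resolution of each cue's clock

def state_vector(time_since_cue):
    """Build the state from each cue's clock; time_since_cue maps cue -> bins elapsed."""
    state = []
    for cue in CUES:
        bins = [0] * N_BINS
        t = time_since_cue.get(cue)
        if t is not None and t < N_BINS:
            bins[t] = 1              # mark the current bin of this cue's clock
        state.extend(bins)
    return state

# e.g. 3 cues x 20 bins = 60 dimensions, all of which must be tracked on every time step
print(len(state_vector({"tone": 5})))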
[00:59:15] Speaker C: Yeah.
[00:59:15] Speaker A: Right. And so now you have to, it's.
[00:59:19] Speaker C: Sort of like you have to arbitrarily keep track of nothingness.
[00:59:24] Speaker A: Keep. Keep track of nothingness from every possible previous thing. Right, yeah.
[00:59:28] Speaker C: Going back to that for just a second. Like, I sort of thought about this just right after I said it, you know, it's infinite. It is infinite in principle. However, there are salience differences in cues.
[00:59:40] Speaker A: Right, absolutely, yes.
[00:59:42] Speaker C: Like not all objects are as shiny as others.
[00:59:44] Speaker A: Absolutely. Anyway, yeah, absolutely.
[00:59:46] Speaker C: So you can narrow it. If you add an attention component to it, or bottom up, let's say, and or top down, that really narrows the search space.
[00:59:54] Speaker A: The search space down. Yes.
There are still problems with that view, but, yeah, I mean, each of these can be reasoned through, and then there are some problems that come up and some solutions, and then you keep on going.
[01:00:05] Speaker C: Right.
[01:00:06] Speaker A: But this could be a whole day.
[01:00:09] Speaker B: May I just add that the salience also has to be learned. Right, yes.
[01:00:14] Speaker A: Yeah.
[01:00:14] Speaker C: Unless it's like a hawk coming down if you're a mouse. Yeah, something.
[01:00:18] Speaker A: Exactly, yeah, yeah. Like a loud thud of a door. Like, you don't necessarily have to learn it, but even so. Yeah, I mean, I won't go down that argument just yet.
[01:00:27] Speaker C: Yeah. Sorry. Yeah. Really an aside.
[01:00:29] Speaker A: No, it's absolutely a very good point. Right. I mean, these are some of the good points that you need to start to consider as you go down this path. Right. It's just that each of them will have its own set of issues. So just sticking to the main thread. So essentially you are trying to keep track of time from everything. Now, the important thing is for you to ascribe value for this reward to the state that preceded it. The state that preceded it should be a repeatably identifiable thing, at least in the standard view. So if it's a repeatably identifiable thing, then that means that whatever the neural state is in the brain at the time that just preceded that reward should recur: the next time that the cue got presented at the same delay, you should get the same brain state. And that seems just hard. Right. Like, to get exactly the same type of thing repeated reliably, trial after trial, and keeping track of time that precisely, that seems hard. Now, people will say that you can get TD-type things to actually work without it. And that's a whole other set of discussions too. But I'm talking about the standard didactic view.
[01:01:44] Speaker C: Yeah, but even like in a natural living condition.
Even if you have like the exact same cue, the exact same stimulus, the context is never ever going to be the same.
[01:01:56] Speaker A: Exactly.
[01:01:56] Speaker C: There would never be perfectly repeatable stuff.
[01:01:59] Speaker A: Exactly. And so, and remember, you also have to keep track of time since every cue, right. Because you're trying to, and obviously the time since every cue is not going to be repeatably the same.
[01:02:09] Speaker C: Hard. It's hard.
[01:02:10] Speaker A: It's hard. It's a very hard problem.
And so if you're trying to learn this this way, then it is difficult.
And it turns out that if you are trying to do this in a prospective way, when you get any cue, you're trying to learn what follows that cue. You kind of necessarily have to do this sort of thing, keeping track of the passage of time step by time step. Because what you need to do is, whenever that thing is happening, you need to give more weight to the cue, in that you've ascribed more predictive power to the cue. But when it's not happening, you need to downweight the prediction for the cue. So because you just don't know when future things are going to happen, you sort of necessarily have to do this time step by time step for everything, going to infinity, essentially.
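To make the cost concrete, here is a toy sketch of that prospective bookkeeping, assuming you simply maintain one running clock and one prediction table per cue (the names and numbers are made up; this is only an illustration of why every time step forces work, not anyone's actual model):

alpha = 0.05
predictions = {}    # predictions[cue][delay_bin] ~ belief that reward follows that cue at that delay

def prospective_step(active_clocks, reward_now, n_bins=50):
    """Called on every single time step. active_clocks maps each tracked cue
    to the time (in bins) since it occurred."""
    for cue, t in active_clocks.items():
        if t >= n_bins:
            continue                              # this cue's clock ran past the horizon
        bins = predictions.setdefault(cue, [0.0] * n_bins)
        target = 1.0 if reward_now else 0.0
        # upweight if the reward arrived at this delay, downweight if it did not
        bins[t] += alpha * (target - bins[t])

# The cost of every time step scales with the number of cues currently being timed,
# and all of those clocks have to be maintained in parallel, reward or no reward.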
[01:03:00] Speaker C: And your insight is that if something happens that is awesome, or that you want to remember or that you want to repeat, then it makes more sense to start looking back in time to see what was paired with that. With high correlation.
[01:03:15] Speaker A: Exactly.
[01:03:16] Speaker C: However, then you need to look at an infinite number of things in the past.
[01:03:21] Speaker A: Exactly.
[01:03:22] Speaker C: This one's zero, zero, one.
[01:03:24] Speaker A: Yes. So the problem doesn't entirely go away; different aspects of it remain. So, in some sense, the core insight of our work is that there are maybe two steps to learning.
The first step is to know that something is possibly related to something else.
[01:03:46] Speaker C: Okay. Like a threshold.
[01:03:47] Speaker A: Yeah, exactly. Crosses a threshold. You're just simply trying to make connections. Does this connection exist? If you believe that that connection exists, then you can go in and try to understand the properties of that connection, like what is the temporal delay there, what is the associated probability of reward, that kind of stuff. The critical thing to know first is just whether there's a connection. And if you're trying to do that, then you do have to keep track of the different cues in memory. Now here's where the retrospective view actually allows you to solve the salience problem in some sense, in that you do need to keep all these cues in memory.
But you could, when you're doing these backward sweeps, and this is not part of our algorithm, but you could in principle add this, do backward sweeps with varying levels of thresholds for the salience if you wanted. You only look at the most salient things as candidates to ascribe to. And if you don't find anything that's related that way, then you bring down the salience threshold a little bit, and then you look back again and search for things that might be preceding, in a salience-dependent way.
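A small sketch of what such an event-triggered backward sweep could look like, with the salience thresholding layered on top (as noted above, the salience loop is not part of their algorithm; the event log, window, and thresholds here are invented purely for illustration):

event_log = []   # (time, name, salience) for everything experienced, held in memory

def note_event(time, name, salience):
    event_log.append((time, name, salience))

def backward_sweep(reward_time, window=60.0, thresholds=(0.9, 0.5, 0.1)):
    """Triggered only when something meaningful (e.g. a reward) happens:
    scan memory backwards, considering the most salient candidates first."""
    for thr in thresholds:                      # relax the salience filter step by step
        candidates = [(t, name) for (t, name, sal) in event_log
                      if sal >= thr and 0 < reward_time - t <= window]
        if candidates:
            return max(candidates, key=lambda e: e[0])   # most recent qualifying event
    return None

note_event(0.0, "light", salience=0.3)
note_event(2.0, "tone", salience=0.95)
print(backward_sweep(reward_time=5.0))   # (2.0, 'tone'), found at the strictest threshold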
[01:04:54] Speaker C: It still sounds hard, but.
[01:04:55] Speaker A: Still sounds hard. But the point is that I think something of that sort must happen. I mean, not necessarily the retrospective part, but something along the lines of filtering by salience and storing things in memory must happen.
And the core advantage of going this way is, one, we're using the fact that we have memory and we're storing things in memory, and that allows you to actually look back for associations. And the key critical thing that this gets you in the base framework is that now all the associative components, the associative learning where you're learning associations, is only triggered when events come, right, rather than updating your associations every time step, for every possible thing and every possible outcome too, which is hard. Right? We know that we simplify and talk about just reward learning, but we know that animals have different predictions for different types of rewards.
We also already know that animals learn cue-to-cue relationships. So in reality, this is much more complex. Right. And so for all of those associations, you would have to do time-step-by-time-step updates going in the forward direction.
[01:06:07] Speaker C: I don't think even Ali can do that. I'm not sure. But Ali might be the only person, he's talented. I know that.
[01:06:15] Speaker A: Yes. The only animal on Earth.
[01:06:18] Speaker C: He is an animal. I give him that.
[01:06:25] Speaker A: The advantage going in the backward direction is that you can do this in an event based way.
Now, you don't need to do this time step by time step; you just need to do it when something very meaningful happens. You just update backwards and check the timestamps: are things preceding this in a.
[01:06:42] Speaker C: Reliable way that's at least a few orders of magnitude easier?
[01:06:45] Speaker A: Exactly. Exactly.
That was the core insight. This doesn't solve the core problem of how you learn the time delays and stuff, because I've sort of ignored that part here. So here the core aspect is you're still just looking for things, whether they precede meaningful things. Right. And you, of course, account for the time delays in the memory of what you store.
But this algorithm alone doesn't tell you, well, here is a number for the delay of the cue-reward association. So that we pushed aside and said, that's a second step to the learning. You actually have to learn that. But it's much easier to learn that if you already know that this particular cue is very important and this particular reward is very important, and that association is where you're actually trying to measure the time delay.
[01:07:33] Speaker C: This is where something like replay might come in very handy. If you're just replaying offline over and over and over, that's sort of an auto-learning system, if you will. Because then you can match, I mean, if you're literally replaying from the event.
[01:07:50] Speaker A: Exactly.
[01:07:50] Speaker C: Makes it a little more feasible.
[01:07:52] Speaker A: Exactly. And replay might be a way that you could get this retrospective type thing to work.
[01:07:57] Speaker C: Right. Well, that's what I meant, is like that reverse replay, I guess, is what you.
[01:08:01] Speaker A: Exactly. Yeah.
Yeah. So anyways, the core idea is basically just this: the core advantage is that if you do this in an event-triggered way, when you know something meaningful has happened, you look back, and then the number of computations that you have to do might be fewer, at least in terms of the associative computation. You still have to keep track of some other things, and those you need to do time step by time step. But those things are not associative. Those things, specifically, are the overall rates of different events. So you still need to keep track of overall rates. How often do cues happen in my life, this particular cue? How often does this reward happen in my life, this particular reward?
[01:08:40] Speaker C: And then you also have to do the inverse so that you can do the prospective.
[01:08:44] Speaker A: Exactly.
[01:08:45] Speaker C: Yeah.
[01:08:45] Speaker A: And that prospective, now the advantage is you don't need to compute the prospective every time step. You can just compute the prospective when you need it.
[01:08:53] Speaker C: Right. When it, when you find the thing that you think is most causally related to the event.
[01:08:58] Speaker A: Exactly. Exactly.
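Putting those last few pieces together, here is a toy sketch of that division of labor, assuming simple running counts stand in for the overall rates and Bayes' rule recovers the prospective prediction only on demand (the function names and bookkeeping details are invented for illustration; this is not the published model):

from collections import defaultdict

dt = 0.1                              # duration of one time step
elapsed = 0.0                         # total experienced time
event_count = defaultdict(int)        # how often each cue / reward type has occurred
preceded_count = defaultdict(int)     # how often a given cue preceded a given reward

def tick():
    """Every time step: only the clock advances; no associative update happens here."""
    global elapsed
    elapsed += dt

def on_cue(cue):
    event_count[cue] += 1             # base-rate bookkeeping only

def on_reward(reward, preceding_cues):
    """Event-triggered: the retrospective (cue-preceded-reward) statistics are
    updated only now, using whatever the backward sweep turned up."""
    event_count[reward] += 1
    for cue in preceding_cues:
        preceded_count[(cue, reward)] += 1

def prospective(cue, reward):
    """Computed on demand, not every step: P(reward | cue) via Bayes' rule,
    using crude per-time-step base rates as the priors."""
    if event_count[cue] == 0 or event_count[reward] == 0:
        return 0.0
    p_cue_given_reward = preceded_count[(cue, reward)] / event_count[reward]
    p_reward = event_count[reward] * dt / max(elapsed, dt)
    p_cue = event_count[cue] * dt / max(elapsed, dt)
    return p_cue_given_reward * p_reward / max(p_cue, 1e-9)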
[01:09:00] Speaker C: Please correct me, because I, I'm sure I'm saying lots of things incorrectly.
[01:09:04] Speaker A: No, no, it's all correct. Yeah. And so that's sort of the core insight, and that's how this sort of view came in. Right. Now, none of this involves dopamine yet, right? I mean, this is all independent of dopamine, if you will. This is algorithms. This is all at the algorithmic level; we're not even talking about which things you should learn from, et cetera. This is just simply: if you wanted to learn all pairwise relationships between every possible thing in the world, which is probably hard, then you could do it this way.
[01:09:39] Speaker C: Not only could you do it this way, this is a better way to do it. Exactly.
[01:09:42] Speaker A: This is the better way to do it, just because of how it keeps track of time.
Again, to give credit to the other side, that's not to say that there's no possible way whereby you could do it in a prospective way. It's just that if you do it in the prospective way, you will necessarily have to do this time-binning stuff and compute that.
[01:10:01] Speaker C: So let's pause here then, and maybe I'll ask Ali first if you are aware of, or versed in the quote, unquote, controversy of this. Like, did this, in your view, make like a big, was there a big fuss with Vijay's papers? You know, the recent.
[01:10:22] Speaker B: No, I mean, the first time I heard it, I just. No purchase.
[01:10:22] Speaker C: I meant not from you, but from, like, the field. Right. You pointed me to it as, you know, cheerleading it. So I know there was not a controversy from you, but are you aware of this controversy in the literature, or have you not. Yeah, I mean, there's never anything with dopamine where there's not controversy in the literature. Right. Yes, but this in particular is very specific; what it's saying is, hey, you've all had it backwards, so maybe.
[01:11:00] Speaker A: I want to just add one sentence before we get to that. So one thing that I didn't describe in the algorithmic thing is just what exactly dopamine, I think, is doing.
[01:11:08] Speaker C: And let's, let's go ahead and do.
[01:11:10] Speaker A: That just very quickly, just before the controversy, because then.
[01:11:14] Speaker C: Oh, that. That is the controversy.
[01:11:16] Speaker A: Yeah, exactly.
[01:11:16] Speaker C: Because it's.
[01:11:17] Speaker A: Yeah.
It's that, in addition to this retrospective thing, you also need to do one more thing. This retrospective thing, at the level I just described, is something where you learn every possible association, and maybe you don't want to do every possible thing, just as we intuitively described; you only want to do this for meaningful outcomes.
The core idea with the dopamine was that there's this additional step to this retrospective thing that filters out and tells you what are the meaningful things.
[01:11:49] Speaker C: The dopamine tells you what's meaningful.
[01:11:51] Speaker A: Exactly.
And so that's the additional step. And the controversy is more related to that part and less so related to the retrospective perspective.
[01:12:03] Speaker C: Okay. But a reward prediction. So an error is inherently meaningful. So what is the controversy? Why is it controversial?
[01:12:14] Speaker A: So there's a lot of, I mean, you know, like, there's a lot of similarities between this sort of idea of what is meaningful and what an RPE is.
[01:12:21] Speaker C: And I'll just interject and also say that we're using human language for these terms.
[01:12:26] Speaker A: Of course.
[01:12:26] Speaker C: Yeah, definitions are slippery and it gets ridiculous.
[01:12:30] Speaker A: Exactly. And we're also simplifying it to a big extent. Right. And so, yes, before addressing the controversy, the quick thing to say is that it just turns out, exactly as you intuited, that this meaningfulness thing and the reward prediction error thing sound somewhat similar. I mean, they have a lot in common. And that's exactly the core reason why we decided to look into this: it turns out that when you mathematically formalize this, there's a lot of similarities between this meaningfulness-type thing and an RPE. And so the idea was, well, if you've not done experiments that try to look at where the differences are coming from, then you wouldn't have known which one is actually there. And then our argument was that maybe all of this evidence for RPE might also be consistent with this other thing, the meaningfulness.
[01:13:24] Speaker C: You should do experiments that test that, perhaps.
[01:13:28] Speaker A: Maybe. Right.
Yeah.
[01:13:32] Speaker B: If I may ask, and not a softball question. So what defines the cue in your interpretation? Because, look, we are under a barrage of sensory information. I'm an animal walking around. I'm getting, like, continuous visual inputs, auditory. Right. So how do you keep track of those events? I'm continuously seeing things. A tree.
[01:13:57] Speaker A: Yes.
[01:13:57] Speaker B: I don't know. Right.
[01:13:58] Speaker A: So, absolutely.
[01:14:00] Speaker B: In an experimental setup, it's easy to define that, to have, like, a set of four cues and then try to see which one is meaningful. But in the real world, how would you.
[01:14:09] Speaker A: This is a hard problem. Right. And this is a hard problem that I think basically all these theories sort of just shove away and say some other smart region takes care of it. And so my version of that is to say that a cue is something defined by the higher-order sensory regions that define what the objects are, sensory objects. So, let's say, area IT for visual stuff defines what cues are: visual cues are those things that are identified by those neurons, objects, sensory objects, if you will. And there's learning associated with that, and there's filtering associated with that, and there are complex operations, like you take sensory input and then, you know, Ali turning this way versus Ali turning that way is still Ali. And that's a hard problem to solve. But that problem is typically studied by sensory neuroscientists, and they have come up with reasonable solutions for it. Right. And have shown that there are neurons in the brain that can do that. Yeah.
[01:15:07] Speaker B: The correct answer is that the most important part of the brain, which is the downstream structure.
[01:15:12] Speaker A: No, upstream here. Yeah, upstream structure. It's sensory. Yeah. Sensory structures will take care of it. I mean, if you want the full solution to this whole problem, that's basically saying, well, how does the brain work? I mean, it's essentially impossible to describe it. I think the way that we start to formalize this, we at least start to now get into all the things that we're assuming. Right.
[01:15:38] Speaker B: Yeah, yeah. I'm asking because I'm very interested in that because I think that would relate to attention as well.
[01:15:44] Speaker A: 100%.
[01:15:45] Speaker B: And how, like, the dopamine system is now involved in attention and gating of incoming information.
[01:15:51] Speaker A: Exactly.
[01:15:51] Speaker B: Which I think, I mean, we will not get into that, Paul. But one of the most important things about dopamine is its role in attention.
You actually missed that in your ten commandments.
[01:16:05] Speaker A: Yeah.
[01:16:10] Speaker C: Well, I also intentionally didn't include prospective or, sorry, retrospective.
Yeah. So. But it'll be on the list.
[01:16:18] Speaker A: Yeah.
[01:16:18] Speaker B: I mean, this is. I'm going on a tangent, but I think, I mean, this is another important thing about, like.
[01:16:23] Speaker C: Yeah, yeah. So let me just interject and say one of the things that I have appreciated about your work also, is that the way that you've approached it, at least. Well, the way that you approach it in the literature, which is super helpful, I think, in terms of thinking about how to tackle a problem in general, is that you list out some of the assumptions of TD learning and then point out how they're wrong. So that's a powerful way to build your own argument and say, like, here are the holes and here's how we can fill those holes.
[01:16:49] Speaker A: Yeah, I mean, you know, I wouldn't use the word wrong per se. I would just say hard, implausible, or sort of incomplete. Yeah, exactly. Yeah.
[01:16:59] Speaker C: And good save there, by the way.
[01:17:04] Speaker A: Yeah. So, so anyways, so we get back to the controversy stuff. Ali, you want to take the controversy bit?
[01:17:10] Speaker C: But. Okay, so I was just asking the beginning, is the controversy due to the fact that so many people's careers or their reputations are at stake and they're just feeling not hurt by it, but defensive perhaps, because a lot of controversy begins by powerful people feeling defensive.
[01:17:30] Speaker A: I mean, you know, like, I know some of these powerful people, and I will say that, like, I, I don't think that that's it alone at the very least. I mean, maybe there's a component of it, right. But I, I wouldn't say that that's really the main driver. I think the main driver is that extraordinary claims require extraordinary evidence and the idea that.
[01:17:53] Speaker C: And papers in science.
[01:17:55] Speaker A: Yeah, and papers in science. Yeah, exactly. And so the dopamine story, the TD story is 25 years old at the time that we had our paper published, 1997 to 2022.
And so, like something that has lived on for 25 years with thousands and thousands of papers supporting the idea is.
[01:18:19] Speaker C: Bound to be wrong.
[01:18:23] Speaker A: It's not going to be overturned with one paper.
[01:18:28] Speaker B: But I think what was beautiful about this work, that it's not my work, so I can just say nice things about it, was that it looked at, like you mentioned, Paul looked at predictions of TDRL and some of them were not explained by actual data. Right. That we will get into it and then came up with an idea, this retrospective learning, that could explain both TDRL predictions and places where it would fail. Right. So I think that was the beauty of it. And that is, I think, the main controversy maybe.
[01:19:04] Speaker C: How is that different?
[01:19:05] Speaker B: Specific predictions, predictions that Vijay may go through, some of them. I mean, there are, like, I don't know, 13 in that paper, 14, something. A huge number. Right. But there are predictions where TDRL fails to predict what happened.
[01:19:23] Speaker C: But why would that be controversy as opposed to progress?
[01:19:29] Speaker A: It's a very good question. So why is that controversial? So here we get into the heart of TDRL and what TDRL is, right?
TDRL, the way that I described it, I said there's a box, and that's the box in which the algorithm lives, and there are some inputs that go into it and there's some output. Right. Now, which of these is TDRL? Right. TDRL is really the box where the computations are happening, but the things that go into it are not TDRL per se.
[01:20:00] Speaker C: Oh, okay.
[01:20:01] Speaker A: Right.
[01:20:01] Speaker C: So now you're doing TDRL.
[01:20:04] Speaker A: Exactly.
[01:20:05] Speaker C: Okay.
[01:20:06] Speaker A: I mean, the argument is if you're doing TDRL, that's not a single thing because the input that goes into the algorithm can be many different things.
[01:20:15] Speaker C: That's your argument.
[01:20:16] Speaker A: No, no, no. It's the controversy argument. It's the alternative argument. So it's the arguments from the TDRL folks as to why this is not, this should not make us toss out.
[01:20:26] Speaker C: TDRL because it doesn't matter what's going into the box, it's what the box is doing.
[01:20:32] Speaker A: Yes. And so because of that, the problem that comes down to it. So if I'm supporting TDRL, right, this is the counterargument that I have. It's like you have convincingly demonstrated that this TDRL box with a specific set of inputs is wrong and those predictions are wrong.
But how do you know that this TDRL box with a different set of inputs is wrong?
And this is the problem, and that's where it becomes almost philosophical, and it's a good thing to get into. So the argument is basically this: when it comes to these inputs, the inputs are states, right? I mean, that's the formal term that I use.
The way that you define states is inherently ambiguous when it comes to time delays, because there's no objective thing in the world that tells you exactly what state you're in. So you can define that in many different ways.
Now, people have defined them concretely in many different ways, and we did look at the concrete things that people did define it as, and showed that none of those concrete things actually fit with the data. Right. So I think that, from our perspective, we argue, well, at least we've looked at the published concrete things, and those don't fit the data. Now, the counterargument is, well, that's not to say you've done an exhaustive job of looking for things that could possibly be the different state inputs. You could have had different state inputs where you could have looked for this, and you could have also assumed different parameters of the TDRL algorithm and looked at how sensitive the predictions are to those. And so that obviously is another aspect. There are the defined free parameters within the box, and there are many different free parameters within the box. And then there are the undefined things that are outside the box, which are the inputs, which technically are infinite dimensional, because they could be anything, basically.
[01:22:35] Speaker C: Yeah. But my guess is you would say that these are valid arguments.
[01:22:38] Speaker A: Yeah, I think that's a valid argument. To actually argue against TDRL as a family, you would need to show that any possible inputs with this algorithm would not fit. And my question to that is, well, how do you ever show that?
[01:22:55] Speaker C: Right.
[01:22:56] Speaker A: Right.
[01:22:57] Speaker C: Well, you need to keep track of all prior possible inputs as you move forward in time, without knowing what the reward is. But every possible input cue. Sorry, I'm trying to bring it back to.
[01:23:12] Speaker A: And there may be different ways of keeping it. You can store things in memory in many different ways.
So which way of storing things in memory should you be considering? And so, our argument. So this is where I think the TDRL field has a little bit of a philosophical problem, and that relates to falsifiability as a concept in science, and to the extent to which that is a core concept that we still stick by, right, for scientific progress, for theories. In my view, TDRL is a framework.
It's not a theory. It's not a theory. I mean, if you were to ask folks that really defend it, and I have done this, to give me a list of predictions: here are exactly the predictions where, if you were to find these, well, that's it, right? There's no way that TDRL can rescue it.
There's no clear answer to that question. So, like, how can you prove this theory wrong? Give me a way whereby you're 100% certain that if this result were found, this theory is wrong. And I don't know the answer yet. And so until you know that, you can't fully falsify a framework. Right.
And that's a valid point. But to me, then, the flip side to that is, well, if you cannot falsify a theory, how much does that aid in our understanding anyways?
Don't you want to have a theory that is very concrete where you can falsify it?
If you can't do that, then that's a framework, and it's useful. I'm not saying that it's useless. It's extremely useful as a way to think through things once you get the results and to come up with explanations for them. But those are all post hoc explanations, and.
[01:24:59] Speaker C: It's not the way that science should be done as Popper envisioned it, or.
[01:25:03] Speaker A: At least, it's definitely not the Popperian view. Right. And so this is where we get into the philosophy of science debate. Right. And so that's where I think a lot of the controversy in this field actually comes down to this particular problem.
[01:25:16] Speaker C: That's a lot of it. What's actually pretty beautiful about that is that you arrived at that through your algorithmic approach, you know? Yeah.
[01:25:26] Speaker A: I mean, it came just from looking at the assumptions. Right. Looking at the assumptions and which ones can be defended versus not. And that's when you realize that, well, some of the assumptions actually are so flexible that you have so many degrees of freedom there. So then how exactly do you test.
[01:25:46] Speaker C: See, I think, however, that this same argument, look at the assumptions, can be applied to almost any topic in at least neuroscience and biology. Like earlier, when you said that you just had an intuition that something was missing about TDRL, and I was thinking, and then it took you 13 years to be able to vocalize what that intuition is. Well, I've had an intuition that there's something very wrong with almost all of neuroscience, and that means it's going to take me, like, a hundred years to even get close to vocalizing it, you know?
[01:26:17] Speaker A: Yeah.
[01:26:18] Speaker C: A more tractable intuition. So I'm envious of that.
[01:26:23] Speaker A: Yeah. I mean, you know, this also goes back to some of the Randy stuff, maybe, like, with memory. Randy's obviously been arguing that there's a lot that's wrong with neuroscience and all of how we think about memory is wrong, et cetera. But to actually be fair to folks on the opposing side, on the TDRL side, there's one more aspect of this that is controversial. The second aspect is this: the retrospective-to-prospective conversion is fully within the view of the standard TDRL prospective view, in that it's all long-run sums of events and discounted sums, so it's all the same view, and that part is less controversial. I think the controversial bit is that the way that we handle the definition of what is meaningful is not something you can write out as the solution to a problem that you define, where this is the problem you're trying to solve and this is exactly the solution. Right. It's more of an intuitive type of approach, to say, well, there's a set of core intuitions that whatever meaningfulness is should abide by, and here's a way to mathematically formalize that. Right.
[01:27:37] Speaker C: Well, you have to operationally define meaning then. And then we're autumn, you know that. Because if not, then we are back into the unfalsifiability problem.
[01:27:45] Speaker A: Exactly. And so we have taken a very concrete way of defining it.
The problem with that is that that is not.
I mean, there are elements of this that have a lot of intuitive appeal, and there's an intuitive derivation for it, but there's not a problem statement where we can say, well, if you want to maximize this objective criterion, this is exactly the way that you would define it.
[01:28:08] Speaker C: I see.
[01:28:09] Speaker A: And that's a valid concern.
[01:28:10] Speaker C: And there's not a formal theoretical solution.
[01:28:15] Speaker A: Normative solution.
Yeah, normative solution. That's a fully valid concern.
In that sense. I think of this as the first step towards going in that direction.
And we're working on this, and who knows if we will end up coming up with something that is a fully normative sort of thing. And my suspicion is that if we work through that, the eventual solution that we come up with is not going to be exactly like the ANCCR prediction. It would have a lot of the intuitive features of it.
And so that's why a lot of the things that we test are those intuitive aspects of things, not exactly how much up or down. You know, we're not looking for dopamine being 20% or 10% up from what it should be. We're looking at: dopamine should be higher versus lower, it should be positive versus negative. So that's the level of predictions that we're testing at. And, you know, there's a completely different story from this retrospective view that is beyond the Jeong et al. paper. We're finding other things that seem to come directly from the retrospective view, and those happen to be true. Right. I mean, and those are experiments that we did explicitly to falsify our model. Right. And it turns out that it wasn't falsified, and so those things existed. And so, at least so far in our attempts, we have looked at some really wacky predictions of this retrospective view, and those wacky predictions seem to generally.
[01:29:48] Speaker C: Hold up, so far have not been falsified. So when you tweet about something like that, do you often use that raised hands emoji? Well, we're still not wrong.
[01:30:00] Speaker A: Yeah. No, it's a tricky one. Right. It's a tricky one because this is not a common approach in neuroscience.
[01:30:06] Speaker C: And so what's not, like, trying to falsify your own?
[01:30:09] Speaker A: Trying to falsify your own.
[01:30:11] Speaker C: Well, it's supposed to be.
[01:30:12] Speaker A: It's supposed to be, but unfortunately it's not. And so it's a tricky one because when we say that, like, we end up finding results that were consistent with our, with our prediction, wacky they may be. Yeah.
The way that at least some members of the audience, of the neuroscientific audience take it as, well, you just looked to prove your theory, right, and you are claiming this as proof for your theory, and.
[01:30:45] Speaker C: There's no way to convince them otherwise. Right.
[01:30:46] Speaker A: Yeah. Like, how do you convince them otherwise? I mean, I can't give you, like, the full thread of the thinking that I had.
[01:30:53] Speaker C: Yeah. You need to prospectively write out all your future thoughts.
[01:30:57] Speaker A: Exactly. Yes.
[01:30:57] Speaker C: Then you can.
[01:30:58] Speaker A: Yeah, exactly. And so, I mean, what I'll say is that we haven't discussed them, but some of the things that we have found are very inspired by Randy's stuff. Not exactly Randy's stuff, there are key differences from Randy's stuff too, but very inspired by it. But some of the things that we find: if my job were to try to find evidence supporting this theory, let me just say that those would not be the things that I'd go and try to test, because my core intuitions, starting out, were that there's no way some of those predictions could be true. Right. Like, if I wanted to just find some evidence consistent with the framework, it's way easier to look at things where, you know, this thing could fit and also many other things could fit, so that at least I could say, well, that result is consistent with our stuff. It's not uniquely consistent with our stuff, but at least it's consistent with our stuff. If I wanted to just build out evidence, that would be the approach that I would take. Right. In a purely strategic way. If you want to.
If you want to test these things, I feel like I would be hard pressed to come up with a way to look at the experiments that we have done without at least an expectation that they would falsify our results.
Right. There's. Yeah.
[01:32:19] Speaker B: May I say that.
[01:32:20] Speaker C: Oh, sorry.
[01:32:21] Speaker B: The fact that we are just having this conversation, I mean, I'm biased, but it would speak to the maturity of the dopamine learning field, because in the majority of neuroscience, we don't even have these frameworks. Here, I think, we have a framework, and now we can start questioning some of the assumptions. So I think we've come a long way. And so.
[01:32:43] Speaker A: Yeah, just 100%.
[01:32:45] Speaker C: But it's, in general, a pretty friendly community. Right. The dopamine field, not without controversy.
[01:32:51] Speaker A: I mean, there's definitely controversy. Right. But I think people talk respectfully to each other. That's all you can hope for. Right.
[01:32:59] Speaker C: Well, thank you so much for spending so much time with me, and for the careful elucidation of the history. I didn't really know that we were going to get such a history lesson, and I learned something. I'm not sure I know what dopamine does, but I think this is, like, super valuable for people who do think they know what dopamine does.
[01:33:17] Speaker A: Yeah.
[01:33:17] Speaker C: It's not going to change popular science outlets, who will still say, oh, I got a dopamine hit with that piece of cheesecake because it made me happy, the happiness. Yeah. But at least, hopefully, it'll reach some people, and they'll realize that, man, this stuff is hard, and there are lots of different ways of approaching it. And I really appreciate the simplicity and the elegance of the solution that you have come up with, Vijay.
[01:33:43] Speaker A: Thank you.
[01:33:43] Speaker C: I appreciate it.
[01:33:45] Speaker A: Yeah. There's one thing that maybe I'll add, if you still have time, very quickly. So one thing that I wanted to say is there's both a positive message and a negative message, I guess, in this trajectory, in that all of the debate about dopamine that we've been having, right, the big controversies in the field, are not about all the details that Ali was talking about, all the variability across different regions, the external regulation of dopamine release, et cetera. Right. I mean, it's about the simple thing: this one-dimensional signal, what does it represent? Right. And we're throwing away a bunch of stuff, and even that simple one-dimensional question we haven't yet settled as a field. Right. So to me, the negative side of that is that it shows you that neuroscience as a field still has a lot of maturing to do.
It's very young. And the fact that maybe one of the most investigated questions, where we're talking about literally a one-dimensional signal, is still not settled actually says that other things will probably also need to have these moments. Now, the positive side of this is that this, I think, is the sign of a field that is actually starting to get to the point where you're starting to have these debates.
And so I think that, in general, we need this across all of neuroscience, across all the different topics of neuroscience, and in that sense the dopamine field is leading.
[01:35:11] Speaker C: So Vijay is going to end with celebrating our incompetence. I like it.
[01:35:18] Speaker A: The main point that I'm making is that it's a very exciting time to be a neuroscientist, in that even though we've collected a lot of data, I think as a conceptual field there's still so much to be done.
[01:35:32] Speaker C: I agree.
[01:35:35] Speaker B: My last line would be that we might not know what dopamine does and what dopamine is, but we almost definitely know that dopamine is not pleasure. So one takeaway: dopamine is not the pleasure signal. And maximizing your dopamine is not necessarily a good thing. So.
[01:35:54] Speaker A: I do second that.
[01:35:56] Speaker C: All right, so thanks guys, for coming on again, and good luck to you both.
[01:36:00] Speaker A: Yeah, thank you so much, Paul. That was great.
[01:36:03] Speaker B: Thanks for having me. Yeah, it was lovely. Thank you.
[01:36:11] Speaker C: Brain Inspired is powered by the Transmitter, an online publication that aims to deliver useful information, insights, and tools to build bridges across neuroscience and advance research. Visit thetransmitter.org to explore the latest neuroscience news and perspectives, written by journalists and scientists. If you value Brain Inspired, support it through Patreon to access full-length episodes, join our Discord community, and even influence who I invite to the podcast. Go to braininspired.co to learn more. You're hearing music by The New Year. Find them at thenewyear.net. Thank you for your support. See you next time. The stare of a boundless blank page led me into the snow that covers up the path.
[01:37:04] Speaker A: Me where I.