# SERC Symposium 2023: Vivek Farias

This work is more on the conceptual side than anything else, but it's motivated by something that's really coming down the pike, so I'd like to give you a sense of what that is, and then hopefully we can discuss these ideas.

At a high level, one model of fairness in algorithmic decision making is the picture at the bottom right: you have data, you build some algorithm on top of the data, the algorithm takes a decision, and in some normative way we judge that decision as good or bad. Often, when we have problems with the decisions algorithms make, we can trace those problems all the way back to the data itself. I'm highlighting just one example from precision oncology, but this is well known in the space: the datasets tend to be largely focused on certain ethnic groups, and as a result the algorithms built on them exhibit disparities in efficacy. We could think about fixing this by fixing the algorithm, but that's not fixing the root cause. The root cause is the data, and what you need to do instead is collect more data. There's a large community of people working on online learning, where, as opposed to the dataset being static, there is a loop: your data is constantly being updated, and you're actively deciding what data to acquire and what data you don't need.

I'm particularly interested in applications of this framework to modern clinical trials. One thing I believe is going to become de facto standard in the next ten years is a move from the current way of doing clinical trials to what are today called platform trials. Broadly speaking, for those familiar with the area, a platform trial replaces a fixed-design experiment with something like a multi-armed bandit: you have a whole bunch of treatments, and you explore and exploit across your population. I'm being very fast and loose here. Why is this move going to happen? Because as drug companies think about the drugs they want to put out, the indications and target populations are getting slimmer and slimmer. For instance, take a blockbuster drug like Merck's Keytruda: Merck is exploring, through a platform trial, various uses of Keytruda in combination with a whole bunch of other therapies to target very fine slices of the population, and a typical fixed trial design won't work there. So this kind of online learning is happening in lots of domains, clinical trials among them.

What's the upside of all this? If I'm updating the dataset continuously, I can collect the right data, and hopefully I avoid the problem from the first slide, where we over-index on certain groups and under-index on others.
The downside is that I'm effectively also experimenting, and a natural question to ask is: who bears the cost of that experimentation? Think of it from a clinical-trial perspective. One reason it has taken so long for platform trials to become a thing is that nobody wants to be at the whims of an algorithm's experimentation; you don't want to be the unit that got the coin flip. So what I want to do in this work, which as I said is more conceptual than anything, is formalize what fairness means in this context, and put out a proposal for a systematic way of thinking about who bears the cost of exploration. That's the work in a nutshell.

This slide is a cartoon of a platform trial. (For whatever reason the animation died, but that's okay.) Think of it as follows. A patient gets enrolled in the trial, say the first person over there. That patient has a bunch of information, their medical history and so on; call that x1. The platform trial is running a quote-unquote bandit, some experimental algorithm, and it chooses, say, a dosage of warfarin for this particular patient. You then see what happens with the patient. (There are some issues there, because sometimes it takes time to observe the outcome, but let's put that aside.) Then your second patient comes along, patient two, with information x2, you decide what to give them, and the game continues. The efficacy of what you're doing is a function of the patient, that is, a function of x, and also a function of what you choose to do with that patient. At a high level, this is what's called a contextual bandit problem. Just for simplicity, everything I'm going to talk about will be for a linear contextual bandit, but there's nothing special about that.

Looking at this picture a little more, you'll notice the crosses and the check marks. Say that after the fact, after you've run this for a long time, you know what the right thing to do was, and it turns out that in hindsight you did the wrong thing for patients one and four and the right thing for patients two, three, and five. I've also colored the patients: two are orange and two are blue. Say the orange patients and the blue patients belong to certain groups, racial groups for instance. The orange patients might look at this and come back and say: wait a second, we bore the brunt of all this experimentation, and everybody else benefited from it. That doesn't quite feel right.

Let's take this a little further and put some numbers to it. There's a publicly available warfarin dataset you can play with, so what we did is simulate what a platform trial would look like on that data.
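The enroll-observe-treat loop just described can be sketched as a minimal linear contextual bandit simulation. To be clear, this is an illustrative toy on synthetic data, not the actual warfarin dataset or the speaker's code; the dimensions, the epsilon-greedy rule, and the noise level are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: d-dimensional patient features, K treatment arms, T patients.
# True mean reward for a patient with features x under arm k is x @ theta_true[k].
d, K, T = 5, 3, 2000
theta_true = rng.normal(size=(K, d))

# Per-arm ridge-regression statistics (A = X'X + I, b = X'y).
A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))

eps = 0.05  # exploration rate
total_regret = 0.0
for t in range(T):
    x = rng.normal(size=d)  # patient covariates x_t
    theta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
    # Epsilon-greedy: mostly exploit the estimated best arm for this patient.
    if rng.random() < eps:
        k = rng.integers(K)
    else:
        k = int(np.argmax(theta_hat @ x))
    reward = x @ theta_true[k] + rng.normal(scale=0.1)
    A[k] += np.outer(x, x)  # update the chosen arm's statistics
    b[k] += reward * x
    # Regret: best achievable mean reward minus what the chosen arm delivers.
    total_regret += np.max(theta_true @ x) - x @ theta_true[k]

print("per-round regret:", total_regret / T)  # shrinks as estimates sharpen
```

The regret being tallied here is exactly the quantity the talk's "mistakes" counts stand in for: the gap between what was done and what, in hindsight, should have been done for each patient.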
Since we know the ground truth, we can play out what would actually happen. If you normalize, in an appropriate way, the number of mistakes made for two racial groups, group A and group B (the dataset has more groups, but I'll restrict attention to these two), then on their own, each group would see a normalized mistake rate of about 75. Don't worry about the absolute scale of the numbers; just compare them to each other. When both groups are part of the same experiment, each benefits from learning from the other, and the number of mistakes for each group goes down: for group A from about 75 to about 6, and for group B from about 78 to about 72. So both groups benefited, but the extent of the benefit is very different: group A's reduction in regret was about 70 units, group B's about 6. (All my animations are broken for some reason, so I apologize for the jumpiness.)

To build a picture of this, normalize each group's on-its-own regret to zero and look at the improvement: group A improved by 68.9, group B by 5.9. The natural question is: is this fair? A reasonable answer is that it depends on what else was possible. If the only other thing possible was, say, the red dot, you'd say this is great, the best we can get for both groups. But it may not be just the red dot. Conceptually, if I look across the space of algorithms, the set of achievable utility gains for these two groups might look like anything in this red set. (I've drawn it as a nice-looking set, but it doesn't have to be.) So we've reduced the problem to picking a point in the set of achievable regret reductions across groups, and once I have that, I can do all sorts of things. This ties to one of the talks this morning about utilities across groups; it's the same story here.

Now imagine for a second that I can somehow enumerate this set. What I want is to pick a canonical point within it. There are many ways of picking a canonical point; for the purposes of this discussion I'm going to pick one, the Nash solution, but you could pick something else. Roughly speaking, the Nash solution picks the point at which it's impossible to improve one group by, say, a relative ten percent without hurting another group by more than a relative ten percent. That's roughly what goes on at the Nash point.
Let's take a step back. The game here was: we've got all these groups in the platform trial, say a bunch of different races, and we want to ask who bears the cost of exploration. The way I'm proposing we think about that is: across the space of all algorithms, consider what's achievable in terms of regret reduction across the groups, and then fall back on a solution concept such as the Nash solution to pick a point in that set in a principled fashion.

This is the last slide with any math on it, and it's not really heavy math. If I wanted a fair algorithm, I'd simply pick an algorithm whose outcome, within the achievable set, maximizes a certain objective: in this case, the sum of the logs of the utility gains. And, this is important, a typical optimal algorithm would instead maximize the sum of the utilities, which is equivalent to a regret-optimal algorithm. So let's pause. There's a vast literature on online learning, and regret optimality in that literature, viewed through this lens, is simply maximizing the sum of the utility gains. What I'm saying is that this may or may not be fair, and in the warfarin example it doesn't look promising, so perhaps what we need is a better allocation of utilities, a better split of the burden of exploration across groups.
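To make the two objectives concrete, here is a toy computation over a hypothetical achievable set: the convex hull of three candidate policies' utility-gain vectors. The numbers are made up for illustration (only the lopsided first point echoes the warfarin figures), so this is not the actual feasible set from the talk. Maximizing the sum of gains, the regret-optimal criterion, picks the lopsided vertex; maximizing the sum of logs, the Nash criterion, picks a far more balanced point.

```python
import numpy as np

# Hypothetical utility-gain vectors (u_A, u_B) for three candidate policies.
policies = np.array([
    [68.9, 5.9],   # big gain for group A, almost none for group B
    [30.0, 30.0],
    [5.0, 50.0],
])

best_nash = best_sum = None
grid = np.linspace(0.0, 1.0, 201)
for w1 in grid:                      # brute-force search over mixtures
    for w2 in grid:
        if w1 + w2 > 1.0:
            continue
        w = np.array([w1, w2, 1.0 - w1 - w2])
        u = w @ policies             # achievable gain vector for this mixture
        if u.min() <= 0:
            continue
        nash_obj = np.log(u).sum()   # Nash criterion: sum of log gains
        sum_obj = u.sum()            # regret-optimal criterion: sum of gains
        if best_nash is None or nash_obj > best_nash[0]:
            best_nash = (nash_obj, u)
        if best_sum is None or sum_obj > best_sum[0]:
            best_sum = (sum_obj, u)

print("sum-optimal point:", best_sum[1])   # the lopsided vertex
print("Nash point:", best_nash[1])         # both groups gain substantially
```

The sum objective is linear, so it always lands on a vertex, here the one that free-rides on group B; the log-sum objective trades a modest amount of total utility for a point where neither group's gain collapses toward zero.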
Here are the questions I want to ask; I'm not going to spend a lot of time on the answers, and if you're interested you can look them up. Question one: can you even solve that optimization? It turns out the answer is a resounding yes, and at some level the key ideas were worked out 25 years ago, so it's not that hard. A more interesting question, to me personally, is whether this is even a problem: am I making up a problem here? If you set up a typical platform trial the way they're run today and do something regret-optimal, is that naturally going to be unfair, or do we luck out and it's just fine? And then you can ask whether there are trade-offs between fairness and efficiency.

Let's focus on that second question: how fair are regret-optimal policies? Suppose I set up a platform trial, ignore issues of fairness, and just run it. Could we see outcomes that don't look so good? The rough answer, and there's plenty of empirical evidence as well as very pretty theoretical evidence of this, is that under any regret-optimal policy there is always going to be a group that gets essentially no improvement from being part of this collective learning process. Effectively speaking, there is always a group that gets free-ridden on. This result was very surprising to me, but it turns out to be true, and the intuition behind it is actually really simple. What we're saying is: if you just run a regret-optimal algorithm, an algorithm that looks good from an online-learning perspective, it's likely going to look terrible from a fairness perspective.
Now, what is the intuition for this? It's really simple and very cute. Imagine you have two groups, A and B, and three treatments, theta_1, theta_2, and theta_3. Group A only has access to theta_1 and theta_2; group B has access to theta_2 and theta_3. Suppose furthermore that we know theta_1 and theta_3 with certainty, so the only thing that's unknown, the only thing we're experimenting with, is theta_2. This is very realistic: think of theta_2 as a new drug being thrown into the platform trial, so conceptually it's meaningful. So who is going to end up experimenting on theta_2? At the end of the day, it's not shocking who it will be: if A's best alternative, theta_1, is worse than B's best alternative, theta_3, why would the algorithm take a risk with group B? It would rather take the risk with group A. And that is the net of what drives this sort of unfairness.

So, point number one: if you just do something regret-optimal in a platform trial, you're going to get really unfair outcomes, and if somebody reviewed the results of the trial after the fact, they might well find that one group bore the brunt of all the experimentation. That's not good. Another question you could ask is: can we even find fair policies? The short answer is yes, and what I want to show you is how well such a policy might actually do. Back to the warfarin picture (and now my animations are working): this is what happened earlier, group A improved by a lot and group B did not. So can fair exploration balance out the improvement? It turns out you can compute this Nash solution exactly, and what it does, at the end of the day, is balance out the utility gain across the two groups. The other thing worth noting is the total utility across the groups. You'd imagine there's a price to pay for being fair: if I'm not doing something regret-optimal, there should be a hit to efficiency, a hit to regret. What we see here is that the hit to regret is actually not that large, at least in this particular picture.
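Going back to the two-group, three-treatment example: the intuition can be spelled out as a one-line opportunity-cost calculation. The numbers below are hypothetical; the point is only that a regret-minimizing scheduler routes exploratory pulls of the unknown treatment theta_2 to whichever group has the worse known fallback, since that group forgoes less per pull.

```python
# Hypothetical known mean rewards of the certain treatments, and the prior
# mean of the new, unknown treatment theta_2 available to both groups.
theta1 = 0.5            # group A's best known alternative
theta3 = 0.8            # group B's best known alternative
theta2_prior_mean = 0.4

# Expected per-pull opportunity cost of assigning an exploratory pull of
# theta_2 to each group: fallback value minus the prior mean of theta_2.
cost_A = theta1 - theta2_prior_mean
cost_B = theta3 - theta2_prior_mean

# A regret-minimizing scheduler experiments on the cheaper group.
explorer = "A" if cost_A < cost_B else "B"
print(explorer)  # -> A
```

Swapping the values of theta_1 and theta_3 flips the answer: exploration always lands on the group with the weaker outside option, which is exactly the structural free-riding the result describes.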
That leads to a third question: are these fair, efficient policies actually fair? The short answer, and this builds on work I was involved in, is that as long as you don't have too many groups, you can get uniform bounds on how fair they are.

So, in summary, here are the key things I want you to take away. One: online learning, this process of continuously collecting new data, is becoming increasingly common in high-stakes decisions. My focus was on platform trials, a space I know something about, and a big question being asked there, as folks try to get more and more platform trials approved, is who bears the cost of exploration. We saw two big points. First, if you stick with the current way of doing these things, which is basically regret-optimal policies, those policies can be quite unfair, and there is a structural reason why this unfairness exists; it's not some random empirical fact. Second, on the positive side, fair solutions are actually reasonably efficient, and you can actually compute them. Let me quit here with ten seconds left. Thank you.

[Applause]

Audience question: Thank you, very nice talk. When I heard about this work on incentivizing fair exploration across different groups, I wondered whether the decision should rest not with the algorithm designer but with the people themselves: whether they want to engage in giving their data, or enroll in a clinical trial, and be paid for that, with pay reflecting group differences. If I know my data is very unique, why shouldn't I get paid to participate in an experiment?

Vivek Farias: Yeah, totally. I don't think these are mutually exclusive. You can have a market-driven view of this, which I think is what you're describing, and there are probably settings where the market-driven view is the right view. The setting I have in mind is fundamentally one where there is a centralized decision maker; in this case, the body that's approving the trial design. It's up to them to figure out how they want to do this, so there is a centralized decision, and once that centralized decision maker is in place, you can fall back on some of the things I was talking about. But I don't think they're mutually exclusive; it depends on context. Thank you.

*2023-05-23 08:41*