How AI and RNA Tech is Transforming Drug Discovery | Inside Atomic AI

CRAIG Why   don't you go ahead and start by introducing  yourself, then we'll get into some questions? RAPHAEL Yeah. So briefly,   I'm Raphael Townshend, founder and CEO at Atomic  AI. My background is really coming more from the   AI space. Originally, I did my undergraduate at  UC Berkeley in electrical engineering and computer  

science. I started my PhD working in  computer vision but fairly quickly, actually,   transitioned- this was at Stanford, into working  on structural biology applications specifically.   So taking a lot of the tools that had made such  a difference in computer vision with things like   self-driving cars or in natural language  processing, with things like ChatGPT, seeing   if we could apply them now to the field of biology  and especially the structural biology space that I   was talking about. It's really about understanding  the shapes of molecules, kind of understanding the   shapes so you can better understand what they do.  So I started working in that space. A seven-year   PhD later- of banging my head against those kinds  of problems, I really started getting some good   success; including, I guess, what would be the  founding work of Atomic, which was basically this   highly accurate predictor of the three-dimensional  structure of RNA molecules known as AlphaFold for   RNA, which was a big breakthrough in the space  recently. So that work ended up featuring in the  

cover of Science in late 2021. From there, I really started Atomic to continue leveraging and developing those kinds of technologies to really enable this next generation of RNA drug discovery. CRAIG And when you say AlphaFold for RNA, were you following DeepMind's research and applying it to RNA? Or is it just analogous to what DeepMind was doing with protein folding? RAPHAEL Yeah. So I actually

worked on the DeepMind team a few years back  so I'm well familiar with their work as well.   While there are certainly similarities, there's  also a need to redesign a lot of the algorithms,   the AI models from the ground up. So RNA, while  at a high level is a similar molecule to proteins,   there's a lot of intricacies that require  custom AI models built. For example,   RNA is a much more flexible molecule  than proteins. So really understanding   those dynamics of it is a key piece that you  need to build into these kinds of algorithms. CRAIG Yeah. So  

the algorithms that you worked on that became Atomic AI came out of your work at Stanford, is that what you said? Or came out of your work at Google DeepMind? RAPHAEL This is out of my work at Stanford, actually. Before even working at DeepMind I had already developed the core of these algorithms that worked quite well on RNA specifically. It was very cool to be making those breakthroughs and seeing the AlphaFold, for proteins, breakthroughs happening very shortly thereafter. It was a very interesting time overall, is perhaps one fun way to put it. The cool thing on the AI side for RNA is that the amount of RNA data that was publicly available was much smaller than the amount available for proteins. So you needed to design some very, sort of bespoke algorithms to work on the limited data. The original Science paper, for example, was trained on just 18 RNAs

total. So a very small number, right? But you could see you could do surprisingly well given that small number. The problem was definitely not solved. I don't want to claim that it was solved at that time. That plugs into a lot of what we're doing at Atomic these days, which is really building upon that advance. But it really showed the power of these custom kinds of algorithms. CRAIG Yeah.

For people that don't follow biotech, because  a lot of the listeners are AI people; can you   explain very briefly what RNA is, why it's  important in drug discovery- everyone knows   that the COVID vaccine was an RNA vaccine  but they don't necessarily understand what   that means, and why understanding the shapes of  molecules would be important for RNA therapies? RAPHAEL Two great questions,   Craig. I would say first of all, RNA, maybe if  you remember the most basic biology lessons, the   central dogma in some ways; we have DNA encodes  the information, which then goes to RNA, which   then goes to proteins. For a long time, people  thought of proteins as the workhorses of the cell,   the things that did everything and RNA is just a  messenger that codes for the right proteins. Now,   it turns out that view is not quite right. In  fact, there's this vast world of RNA that's   kind of doing its own set of functions beyond just  coding for proteins. In fact, there's this really   nice hypothesis, the RNA world hypothesis, about  how all of life was first RNA-based, then DNA and   proteins came afterward. An interesting other stat  that's kind of fun to think about is, if you look  

at the human genome, about 1% to 2% of it codes for protein. So about 1% to 2% of the human genome becomes proteins but about 80% of the genome becomes RNA at one point or another. So there's this huge world of RNA that never even becomes proteins that we're just barely scratching the surface of. So with all that context in mind, we've got proteins- seen as the workhorse of the cell. If you look at most drugs on the market today, they're all going after proteins. It's been seen as the main target. That's been great in many dimensions but there's been a number of diseases

that have been essentially undruggable at the  protein level. We've been trying to drug it for   40 years and we just can't go after some of these  proteins. And these are really high-value things.   There's this one protein called c-MYC that's  involved in 75% of human cancers that we've just   failed to get any drugs to. So the idea is, now  you've got your DNA, goes to RNA, goes to protein;  

you go one step earlier in the process and you  go after the RNA that codes for that protein,   for the c-MYC protein. In this way, you're sort  of going after these diseases that were previously   undruggable by increasing the attack surface and  giving yourself new ways of striking at them. So   that kind of tells you a little bit about the  RNA and why you care about RNA drug discovery,   specifically. There's a lot more complexity,  I'm simplifying at some level but that's one  

of the key messages there. The other piece is why you care about the shapes of the molecules, and it's very linked to RNA in some ways, but it also has this much broader kind of potential too; it's fundamentally about the fact that the shape of a molecule kind of determines what it does. Structure determines function, is what people say. And it's almost too obvious in some ways. The shape of a bike is important to what it does. If the wheels of the bike were in the wrong place it wouldn't do a very good job moving you around. Very similarly, the shape of a molecule is really key to what it does. So you need to make sure it's all in the right place to perform its

function adequately. So sometimes when there's  a disease you can understand almost by looking   at the structure, looking at the physics driving  these things and understand from first principles,   what's going wrong. Then you can intelligently  design medicines to solve those issues. This is   a process known as rational design, which stands  in contrast to the traditional way of doing it;   which is more throwing things at a wall and  seeing what sticks, which is more known as   phenotypic screening- would be the technical term  there. And it's kind of seen as one of the future   directions of drug discovery and it's playing  an especially critical role in the RNA space   because there's a lot of things that we need  to really understand from first principles to   design well. So I'll stop there for a  second but hopefully that makes sense,   both on the RNA- why that matters to a huge degree  and why the shapes of molecules matter as well. CRAIG  Yeah. Given that context, can you just describe,  quickly, what the RNA vaccine for COVID was doing  

and how understanding the shape of the RNA molecule played into that therapy? RAPHAEL Of course. The mRNA vaccines were one of the first big breakthroughs in this RNA drug discovery landscape, I think everyone got to see that firsthand. Fundamentally, what the RNA side of things is doing is it's coding for a specific protein that is part of the coronavirus, the spike protein. So if you remember, you've seen all these graphics, you've got the virus particle and you've got the spike protein sticking out of it. The RNA is basically coding for that spike protein but not the rest of the virus. So then that gets into

your cells, your cells produce a lot of the spike protein, and then your immune system essentially learns to recognize that and says, [Oh, that's a foreign thing that we've seen], and can train itself right off the bat to find anything that presents that spike protein in the future. So then when the real virus shows up, it's already trained, your immune system is already trained to knock that down. Now, the part where a lot of the structural piece comes in is actually for making the next generation of those mRNA vaccines. In particular, one big issue that's been presented with the RNA vaccines is the need for cold chain storage. The Moderna or Pfizer vaccines need to be stored at very low temperatures to be transported around the world. For example, getting that to

low-income countries is a challenging proposition.  So what you'd like to do is you'd like to make   those vaccines more stable, the RNA molecules  more stable. In that case, what you're doing   is you're trying to find the shapes that are well  folded, that are more resistant to falling apart,   basically more stable overall. So in that way,  you're rationally designing the next generation  

to be a better version of the first gen. So  there's this nice interplay where there's all   sorts of different properties of these RNAs that  you can optimize through that kind of approach. CRAIG Yeah. Since I have you explaining   this stuff, you introduce an RNA molecule and  it then goes to the DNA? Explain the mechanism. RAPHAEL Of course. Biology is fascinating at some level  

and there's a whole bunch of hidden complexity in there. Fundamentally in this case, actually, the RNA is not becoming DNA at some point, it's just staying as RNA. And then your cell's machinery, there's these other molecules known as ribosomes that can translate that RNA into proteins for you. It’s kind of cool; the ribosomes themselves are actually mostly RNA molecules. You can kind of see how it bootstraps itself. You have these RNA molecules that are responsible for turning RNA

molecules into proteins. That's kind of one of  the reasons people think that RNA might be the   first source of life. So anyways, you've got  these RNA molecules and then these ribosomes   come and translate them into proteins, into the  spike proteins. Eventually, the RNA molecules  

get degraded and thrown out of your cells but before that point it's produced enough of the spike proteins for your immune system to recognize them. And sometimes you need a couple of doses. That's why there were a couple of doses of some of the COVID vaccines; it's because not enough of the spike protein gets produced the first time so you need a second dose to produce some more of it. CRAIG Okay. So with Atomic AI, you're designing RNA molecules or you're understanding the shape of existing RNA molecules? Yeah, talk me through that. RAPHAEL Yeah. Another great question, frankly, because there's kind of these two broad categories of RNA technologies- is how I think about them. There's

the mRNA vaccine kind of category where it's RNA-based: the medicine itself is the RNA. For example, you're injecting some RNA into your body and it's producing it. But then there's the other category, which is RNA-targeted, where you're targeting the RNA that's already in your body. Then

finding the medicine that needs to hit that, because in your body, as I said before, 80% of the human genome becomes RNA at some point or another. So there's a lot of that sitting around. So while you can actually apply this fundamental technology to both of these, the RNA-based medicines as well as the RNA-targeted ones (and what I was describing about making the vaccines more stable is applying it to the RNA-based piece), we can also apply our technology to the RNA-targeted piece to understand the shapes of the RNA molecules already in your body to then go and target those selectively. That's actually the initial focus of Atomic AI; it's understanding the shapes of the RNAs already in your body, folding all of those, predicting the shapes of all of those molecules, and then targeting those. But hopefully this paints the picture of how this technology can be very broadly applicable, because even here you can see how you could apply it to these two fairly different RNA technologies but in both cases, it can make a huge difference in drug discovery.

CRAIG Yeah.   And on the COVID vaccine, the vaccine is  introducing RNA molecules into the body. RAPHAEL Exactly. CRAIG In the targeting RNA that   exists already in the body, you're targeting  it to turn certain RNA molecules on and off,   or to modify the behavior? What  are you doing with the targeting? RAPHAEL Yeah. There's a huge   range of things you could do there. The thing that  we're fundamentally focused on is, as you say,   turning it on or off. The easiest thing you could  do in some ways is- a protein that you want to go  

after, you want to decrease the amount of that protein, so you go and hit the RNA instead. And you're like, [Let's go and destroy that RNA], and then there'll be less of the protein and then you'll have gone after this undruggable disease that you couldn't hit at the protein level. CRAIG And why wouldn't you be able to hit at the protein level? I mean, is it understanding the shape of the protein or finding molecules? As I understand it, the importance of understanding the shape of a protein is, if you want to find a molecule that will fit into a pocket in the protein, for example- to prevent the protein from binding to other things. Is it that sort of thing, that you're looking for a molecule that will fit into the RNA molecule and stop it from functioning? What exactly are you doing at the shape level? RAPHAEL Yeah. I think you're exactly right; shape is the

key in both the protein and the RNA level. The big  reason some of these proteins are undruggable is   because they're disordered. They're not actually  adopting any single shape, they're just completely   floppy. So they don't present any pockets for  you to hit in the first place. Like this c-MYC   protein that I was talking about, that’s involved  in 75% of human cancers, it's just disordered.  

There's no pockets for you to go after. So the  idea is, instead, you go after the RNA and you   try and hit the shapes at that level instead and  design things. It turns out that understanding   the shapes at the RNA level is really critical  as well, to get molecules that are selective,   basically hitting just that RNA and not a bunch  of other things, and functional- that do the thing   that you need it to do. In this case, degrade the  RNA or prevent it from making more of a protein. CRAIG Yeah.   And how many RNA molecules exist in the body  and how do you know which ones to go after? RAPHAEL There's far more RNA molecules than there are   proteins. If you're saying the number of distinct  kinds of RNA molecules, you'll remember that stat  

I was talking about, how like 80% of the human genome becomes RNA but only like 2% becomes proteins. So there's just this vast number of RNAs and some of them don't even code for proteins, they just do other things in your body. We're just trying to understand, still, a lot of the biology behind those. If you just think about the ones that code for proteins for a second, the mRNAs, the messenger RNAs- that's what that stands for- the ones that code for the proteins, then there's hundreds of thousands of those, of distinct kinds of mRNAs there. So that's just one small piece of the RNA world but it already presents this huge possibility of potential targets to go after.

CRAIG Yeah.   And in the case of the COVID vaccine,  they identified the mRNA that created the— RAPHAEL The spike protein,   exactly. You're exactly right. But then there's  other kinds of RNA. There's non-coding RNA;   RNA that doesn't code for proteins are known as  ncRNA. Or there's rRNA, which is ribosomal RNA.  

There's all these different categories,  and kind of fun- people just stick little   letters in front of RNA or after the word  RNA to indicate these different categories. CRAIG Right. Is Atomic AI's mission   to define the shape— as has been the case with  AlphaFold- they just came out with AlphaFold 3, RAPHAEL Yeah, very exciting, CRAIG to define the   shape of all of the existing mRNA molecules?  Or, are you selecting certain mRNA molecules   that you know are related to certain diseases  and just focusing on understanding their shape? RAPHAEL I mean, the long-term dream   is really about building a map of every RNA that  exists as well as enabling the design of new RNAs   that we've never seen before. You've sort of seen  what AlphaFold can do in the protein landscape;  

people are using it for all sorts of things.  The idea is to bring that revolution to the   RNA space as well. One simple way I think  about Atomic, it's like, we're trying to   combine that big RNA breakthrough with the  COVID vaccines. And there's been a number of   other RNA technologies that have come to fruition  in the last few years, with the AI breakthroughs,   like the AlphaFolds of the world. So there's that  dream in the long term. Our immediate focus is   on going after very specific RNAs and showing the  potential on a few test cases first. Showing like,   [Hey, you can do this here to get to this  point and design these new drugs that are   very exciting.] That then paints the map of  what you can do in a much broader landscape.

CRAIG Yeah. And we'll talk about the AI behind it in a minute but what are the targets that you're focused on right now? I mean, you mentioned this c-MYC protein. Is that one that you're working on? RAPHAEL So we haven't disclosed our targets, our specific ones, at this point, but I would say they very much fall in that category. There's a number of these undruggable protein targets that you're trying to hit at the RNA level instead. And a number of these targets are in the cancer space- is a big area that we're looking at. I can continue using c-MYC as an example actually but it's a stand-in for

many other possible proteins you could use.  That's essentially known as a transcription   factor. It regulates how much of every other  protein is made. So you get too much c-MYC,   you get too much of every other protein, you  get uncontrolled replication of your cells,   you get cancer; so you really want to decrease  the amount of it. There's many proteins like   this so you really just want to decrease the  amount of it to control cancer spread. So that's   one big area of focus. The second big area is  neuroscience, diseases like neurodegenerative   diseases basically. So think about things  like Alzheimer's or Parkinson's in that case.

CRAIG Right. Where,   again, you're attacking the RNA that builds  the proteins that are causing the disease. RAPHAEL Exactly CRAIG  At the therapeutic level, if you understand  the shape of those RNA molecules and you   create a molecule that interrupts that  mRNA molecule, do you then just inject   the therapeutic into the bloodstream and  it finds the mRNA? How does that work? RAPHAEL It's a great question.   So the technology– what you could use to target  the RNA, there's a whole bunch of different ways   you could do that, different modalities, is what  people call those. The one that we're focused on   today is small molecules. Small molecule is like  20 atoms, whatever; they're small basically. And  

it's very easy for them to get around in your body, which is really the key thing there; delivery is easy. It's really classic, many drugs on the market are small-molecule drugs, most of them target proteins but we're doing RNA-targeted small molecules basically. The nice thing about these is you can oftentimes just take them orally, you can just take them as a pill. That makes it really easy and then they can get across your body in different ways. So you've seen a lot of these new technologies like the mRNA vaccines or others where you need an injection or sometimes you even need a surgery or something like that to get them. The beauty of this kind of approach

is that you can then bring that back to just  being a pill that you can take again and you're   going after these diseases that you don't really  have other nice ways of hitting them otherwise. CRAIG Yeah. We spoke about   Insilico and I've interviewed Alex Z.; I won't  try and remember how to pronounce his last name,   on the program but they're looking at the  universe of molecules, small molecules,   and trying to narrow the search space based,  as I understand it, on the properties of those   molecules before putting them into trials. So  you raise your potential success rate at the   trial level? Are you talking to them at all? It  seems like this would be a more precise way of   narrowing the search space if you understood the  shapes of the molecules. Or, are they doing that? RAPHAEL Yeah, I mean,   I haven't talked to Alex recently or anything. I  mean, there's this huge space of exciting groups  

pushing these kinds of approaches, I would say. I think they are very much using AlphaFold-like approaches as well. I believe I've seen that work but at the protein level primarily because AlphaFold for proteins already exists and has been handed out to the world, and everyone can use that for what they have. I think that there's this recognition across the field that, [Oh, understanding the shapes of these molecules is really powerful and can let us do a lot of things.] From our standpoint, we're trying to make the same thing happen in the RNA space, again. I can say, I don't know about Insilico Medicine specifically, but I can tell you firsthand, there's a lot of groups that are quite interested in understanding the shapes of RNA molecules to make this dream of rational design happen there as well. I definitely think

there's a recognition across the field that these  AlphaFold type approaches, these rational design   type of approaches are really the next wave, the  next generation of these kinds of approaches. CRAIG Yeah. So now, the AI behind Atomic,   I interviewed Oriol Vinyals on the program about  AlphaFold. It was maybe a year or two ago. To my  

understanding, it was an extension of their  work in AlphaZero, where it's a combination   of search and reinforcement learning to come  up with, sort of, candidates. Then they had a   second system that ranked the candidates and then  would narrow that further for testing. Is that   essentially what you're doing with RNA? Why don't  you walk us through the system that you've built? RAPHAEL Of course. So some of the initial   systems were very much along those lines, like  three or four years ago, I would say, you create   a bunch of candidates and then you rank them  using the scoring functions. I think since then  

we've actually dramatically overhauled a lot of what we've built, and to be fair, a lot of the field has been moving in this direction more generally. And we've been building these big transformer-based models, first of all, these things that have made such a big difference in the power of technologies like ChatGPT. Then we've used those to directly generate the structures of these molecules through these generative AI approaches, might be one way to think about that. So in this case, you can just take an RNA sequence and it directly can produce a three-dimensional structure or even a set of 3D structures if you think that it's dynamic and might adopt different shapes over time. So that piece has been very interesting. Part of what we've needed to do to enable that is,

these transformer-based models are very data-hungry. You couldn't really do the 18 data point thing we did before, where you only trained on 18 RNA structures to build this kind of thing. But we recognized that it was critical that we move to those kinds of architectures for the long run to really crack a lot of this problem. The other piece of Atomic, I've been talking a lot about the algorithms, is we also have our own in-house wet labs that we use to generate our own data at a very large scale to train these AI RNA models. In some ways we have these top-tier AI folks working but then we're also building the right data that's purpose-built for these kinds of models. And this is really a broader trend that I'm quite excited about,

which is this integrated lab, sort of computation  approach, where you have this iterative cycle   where you're generating data that then makes  the AI better that can then feed into more data,   that's more targeted to further improve the AI  to get that, like, virtuous cycle of improvement. CRAIG So can we   take the example of an undruggable protein-  and you understand the mRNA that generates,   I guess- is that the right word?  that protein, or builds that protein,   or has the instruction set to create the protein  and you want to interrupt that mRNA molecule. The   first step then is to understand the shape of the  RNA molecule. So at that point, what do you do?

RAPHAEL It's actually quite interesting because in theory, you don't need to know the shape to start trying to go after it. You could just take your RNA and start throwing molecules at it and see what happens. In fact, that's what the first generation of companies in this space did about a decade ago, at this point. The issue there is that the molecules were generally not potent, first of all. They didn't do what they needed to do. They would stick to the RNA but they wouldn't destroy it or decrease the amount of protein etc. On top of that, they're oftentimes not selective. So they bind to that RNA but they also bind to

many other RNAs at the same time. So there's  issues of potency and selectivity. That's really   where the shape starts coming into play. Because  you want to find the unique 3D shapes, the unique   pocket where you could get a molecule to stick  there and then it will. You could optimize it   to bind to that location, not others. On top of  that, if there's a well-defined shape somewhere;   nature doesn't waste effort generally. So those  things are generally the pieces that are actually  

functional and doing something interesting, versus some generic piece of RNA might not actually be doing any interesting function. So shape then becomes critical to get over these barriers of function and selectivity. To give you an example of what this does then, is you could find a shape in the RNA molecule that's responsible for keeping it stable, I mentioned stability before, and you can destabilize that structure with a small molecule. It's kind of like it binds in there and it makes it less structured, basically. Then that lets enzymes, other proteins that are responsible for chopping up this RNA molecule, have an easier time doing so. So you're destabilizing it, it's more prone to getting chopped up, and that decreases the amount of it, decreasing the amount of protein- cures cancer.

CRAIG But how do you discover the shape? I mean, that's essentially what you're doing, right? RAPHAEL Exactly. This actually gets at the reason why folks are so excited about things like AlphaFold in general; because the traditional way that you find these shapes is through these very expensive, very slow experimental techniques. They have names like X-ray crystallography, or cryo-electron microscopy, etc. And we could get into the technical details of how they work but really, the key thing to remember is that these things can take months or years to solve a single structure. I have a good friend of mine, he spent his entire

PhD solving a single protein structure using these  techniques. And these machines, the cheap machines   in some ways, cost millions of dollars. So if  you can take this process that takes months or   years to get a single shape and then use these AI  approaches to bring it down to minutes or seconds   instead, that's a big deal. People really care  about that, basically. So what we're doing is  

we can, instead of relying on these expensive  techniques and trying to run those over time,   which over the past couple of decades have solved  maybe a thousand RNA structures total, depending   on how you count. That's actually generous, maybe  it's a couple hundred; you can instead take these   AI approaches and just map out everything at once  and find these interesting structures, you're   finding the pieces that are folding into nice  pockets that are targetable in the first place.   So one way that I talked about what we're doing is  we're essentially taking the space of all the RNAs   in your body and identifying which parts of it are  the most structured and targetable through drugs. CRAIG Okay,   how do you do that? Is there a  library of mRNA shapes already? RAPHAEL You asked me   how you train those kinds of models in the first  place; there's one thing. So there's an existing   library of shapes that have been solved through  these expensive experimental techniques, as I   was talking about before. Maybe you have a couple  hundred of those. So that's the starting point,  

that's your gold standard. Now, as I was just saying, these AI approaches are very data-hungry in general. And you can get some initial bang for your buck through being clever on the algorithm design; that was the initial Science paper. But eventually you run into the bitter lesson of, you just need more data and more compute to really get over some of these things. So this is actually where RNA itself ends up being quite an interesting molecule because you can design experiments that are very high throughput, that give you lots of measurements, even in parallel, for RNA specifically. That's because you can connect it to DNA sequencing and the cost of DNA sequencing has fallen off a cliff over the last couple of decades. So we

can design these experiments that can measure  tens of millions of RNAs in a single shot. So   let me explain how one of these techniques works  just to paint the picture for a second. So you   have your RNA molecule and you expose it to a  chemical and that chemical will go and nip the   RNA at different points, it'll kind of damage it,  basically. Then you can convert that pretty easily   back to DNA. Then you can run it through your  DNA sequencer and the parts that got damaged,   basically, will show up as mutations in the DNA  sequencer. It'll be errors in the DNA sequencer.  

So now you can very easily pick up where these chemicals damaged the RNA molecule. The key is that the places where it gets damaged are very linked to the shape of that molecule. One very simple way of thinking about it is that the outer parts of the RNA are going to get more damaged than the inner parts. So now you've got these measurements that are telling you something about the structure of these RNA molecules. And because it's linked to DNA sequencing, you could just run this in parallel on a huge number. On top of that, it's actually really easy to make at least shorter RNA molecules at a very large scale. This process, known as oligo synthesis, lets

you just do it synthetically and create, like, millions of them at once. Versus for proteins, you can't actually sequence those. You can't run those through DNA sequencers very easily. On top of that, making them is harder; you kind of have to have the cells produce the proteins for you. So in some ways, the fact that we're operating at the RNA level makes this job easier because you can make and measure the RNA molecules at a much higher throughput than you ever could for proteins.
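
As a rough illustration of the probing readout described above, here is a minimal Python sketch that turns sequencing mismatches into a per-position damage profile. It is not Atomic AI's pipeline: the function names, the toy reads, and the subtraction of an untreated control sample are all assumptions made for the example, and the reads are assumed to be already aligned to the reference with no insertions or deletions.

```python
def mutation_rates(reference, reads):
    """Fraction of reads that mismatch the reference at each position.

    Assumes every read is pre-aligned and the same length as the
    reference; a '-' in a read means that position was not covered.
    """
    mismatches = [0] * len(reference)
    coverage = [0] * len(reference)
    for read in reads:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base == "-":
                continue  # position not covered by this read
            coverage[i] += 1
            if read_base != ref_base:
                mismatches[i] += 1
    return [m / c if c else 0.0 for m, c in zip(mismatches, coverage)]


def reactivity(treated_rates, control_rates):
    """Background-subtracted mismatch rate, floored at zero.

    Subtracting an untreated control is an assumption of this sketch,
    not something stated in the interview.
    """
    return [max(t - c, 0.0) for t, c in zip(treated_rates, control_rates)]


# Hypothetical 12-nucleotide RNA and toy reads (made-up data).
reference = "GGACUUCGGUCC"
treated_reads = [
    "GGACUACGGUCC",  # mismatch at index 5
    "GGACUUGGGUCC",  # mismatch at index 6
    "GGACUACGGUCC",  # mismatch at index 5
    "GGACUUCGGUCC",  # no mismatch
]
control_reads = [
    "GGACUUCGGUCC",
    "GGACUUCGGUCC",
    "GGACUACGGUCC",  # background mismatch at index 5
    "GGACUUCGGUCC",
]

profile = reactivity(
    mutation_rates(reference, treated_reads),
    mutation_rates(reference, control_reads),
)
for position, value in enumerate(profile):
    print(position, reference[position], round(value, 2))
```

Positions with higher values are the ones the chemical damaged more often, which in the simple picture above corresponds to the more exposed parts of the RNA.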

CRAIG And can you then model the RNA shape in a visualization the way that you can with AlphaFold? RAPHAEL Exactly. So you end up with a shape of an RNA molecule. I can even show you some or something like that. How I would think about it is- an RNA is just made of atoms at the end of the day, like a protein or anything else and it's just the atoms in 3D space. It's like, okay, you've got a carbon over here, you've got a nitrogen over here and they're bonded together, right? So you can look at those and you can sort of see how it's structured. You can even start simulating them, you could just run the laws of physics on that. That's another process that's super interesting and can tell you how it flops around over time.
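
To make the "atoms in 3D space" picture concrete, here is a small Python sketch that stores atoms as element plus coordinates and guesses which pairs are bonded from their distance. The coordinates are invented, the `Atom` class and `find_bonds` function are hypothetical names, and the bonding rule (distance within the sum of covalent radii plus a 0.4 Å tolerance) is a common heuristic assumed here, not a method described in the interview.

```python
import math
from dataclasses import dataclass

# Approximate single-bond covalent radii in angstroms.
COVALENT_RADIUS = {"C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.4  # heuristic slack in angstroms (assumed value)


@dataclass
class Atom:
    element: str
    xyz: tuple  # (x, y, z) position in angstroms


def find_bonds(atoms):
    """Return index pairs of atoms close enough to be treated as bonded."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            cutoff = (
                COVALENT_RADIUS[atoms[i].element]
                + COVALENT_RADIUS[atoms[j].element]
                + TOLERANCE
            )
            if math.dist(atoms[i].xyz, atoms[j].xyz) <= cutoff:
                bonds.append((i, j))
    return bonds


# Made-up coordinates: a carbon next to a nitrogen, and a distant oxygen.
atoms = [
    Atom("C", (0.0, 0.0, 0.0)),
    Atom("N", (1.34, 0.0, 0.0)),
    Atom("O", (3.5, 0.0, 0.0)),
]
bonds = find_bonds(atoms)
print(bonds)
```

A physics-based simulation of the kind mentioned above would start from the same representation and then move the coordinates over time according to the forces between atoms.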

CRAIG And then   to create the small molecule to bind  to the RNA, once you understand the   shape of the RNA molecule, do you search  through some search space for an existing   molecule that has a corresponding shape? Or,  do you then synthesize a unique molecule? RAPHAEL Yeah. I mean, there's a   number of different ways you could do it. One of  the common ways is really this process known as   docking basically, where you've got the shape and  you're just trying different molecules and saying,   [Do they fit? Does it interact physically  well with the other molecule?] So you could   search for these very large spaces of possible  molecules, including things that have never   been synthesized before and then say, [This  one looks good. Let's go and make this one],  

then actually test it in the lab. The idea is,  you're kind of searching through this massive   space and then narrowing it down. That's one  common way. There's a few different ways you   could use these things but fundamentally,  you've got the shape, you understand what   you're trying to go after, and then you're trying  to find molecules that interact with that shape. CRAIG Which do you do, which does Atomic do? RAPHAEL So we do a lot of this docking type of approach,   as I mentioned, fitting the molecules in there.  Then we combine that with more traditional kind   of screening methods as well, which once you've  identified where the shape is, you could just   isolate that shape and then you can throw a bunch  of molecules in a lab setting at it as well. So  

generally, we apply both of those techniques together, the joint computational experimental piece. You can even combine those together because if you've run it in the lab, you can then feed that back into your AI and do a better job of docking, for example, the next time around. CRAIG Yeah. You were talking about transformer models. So this isn't search and reinforcement learning, you're generating molecule shapes. RAPHAEL Exactly. CRAIG And then if you find one that computationally seems to fit, then you synthesize it and test it? RAPHAEL Yeah. You get to some RNAs,

you generate their shapes, you screen small molecules against it computationally; if something looks promising, you go and synthesize that, you test that first in cells and then in animals and you keep pushing towards the clinic on that front. CRAIG Yeah. Then eventually into human trials, I presume? RAPHAEL Exactly. CRAIG Where are you in that whole process? Where is Atomic today? RAPHAEL It's actually an exciting time for Atomic because we're just starting to test in animals for the first time here. I would say, it's still what I call the preclinical place. We're definitely early stage in many ways but this is really the first time that we're going beyond cells. We've seen

our technology work well within cells and  now we're trying to get the next layer up,   the next higher-level organism in some ways,  and really putting a lot of the platform to   the test. And we're anticipating getting a lot of  that initial data in the not-too-distant future.   Personally, it's a cool moment because I've  been working in the space for ten-plus years   at this point, and it's like, okay, the dream  is starting to become a reality in some ways. CRAIG Yeah. I haven't looked   at AlphaFold 3 yet. I just saw the announcements.  What's different with AlphaFold 3? And, are you   adopting whatever changes they made in AlphaFold–  I've forgotten, did you call it AlphaFold RNA, or? RAPHAEL Our core model is known as   ATOM-1. It's an RNA foundation model is what we  call that. I should plug the name a little bit,  

I suppose. I think we're pretty excited about AlphaFold 3 overall. I think this space as a whole has been moving really quickly. There's always these new advances coming from different groups. And you always make sure you read through the papers and understand and integrate the pieces that are useful from these different advances. I think the key that's happened with AlphaFold 3 is that they've expanded their modeling from just proteins to much broader sets of molecules. That includes

RNA as well, to be clear. And they've seen some  pretty good success in at least some of these new   molecules in doing this kind of modeling. For  example, they've seen pretty good success at   modeling protein-small molecule interactions,  a little bit more like what Insilico Medicine   might do, of modeling protein-small molecule  interactions. They've done a pretty good job   at DNA or things like that. However, RNA is one  of the areas that still has room for improvement,   at least based on their studies, because they  still don't have state-of-the-art as compared   to more traditional methods there. Fundamentally,  that comes down to the fact that there isn't that   much RNA data out there that's public. This is  one of the key bets of Atomic. You can do very  

well in certain areas where there's a lot of data but in others, you really need to invest carefully in collecting the right kind of data as well. So I think there's a lot of very useful components that have come out of the AlphaFold 3 kinds of approaches and it's a fun paper to read overall. I know a lot of the team members there quite well and I'm pretty excited for them. It's really taking those pieces, combining it with the data that we have already that we spent the last three years collecting in-house to really try and create these continued breakthroughs in the RNA space. CRAIG Yeah. How many RNA molecules have you modeled successfully, to the point that you can synthesize molecules that will fit or bind with them whether or not there's therapeutic interaction? How many have you done so far? RAPHAEL Yeah, it's a super interesting question.

I would say, there's different levels of  validation you can do. You can make the molecules   and then you could test them in certain ways. One  answer to that is tens of millions or hundreds of   millions for the highest level validation where  we made these things, and we've tested them,   found their structures, etc. But then, the  number of things that we're testing in animals,   which is several steps later down the process,  we're just getting our first one there. So you  

can think of it as a funnel that's starting from this huge number and where each step of validation is getting smaller and smaller. But, it's the tip of the spear there. One stat that I like to give is, there's a level of accuracy you need to do this kind of rational design approach. It's like, you want your structure to be this close to correct for you to be able to model the interactions with a small molecule that could bind there. If your pocket is completely wrong, it's not going to really help. What we've seen over the last couple of years is that on average, the structures that we're making are sufficiently accurate now to be able to do that rational design kind of approach. I'm not saying that they all can do it but the original Science paper that we put out, maybe it was like 5% or 10%. Don't quote me on that, I don't remember the exact number but it was a small

number basically, a relatively small amount there.  But since then, over 50% of those structures are   now sufficiently accurate to do this kind  of approach on, which personally is the Holy   Grail in some ways, as far as I'm concerned. It’s  like, [Wow, we can actually use this reliably.] CRAIG Yeah. And once you get the process down,  

is it a matter of just running this iterative loop until you reach a level of accuracy? RAPHAEL Exactly. You need the right level of accuracy. And I don't want to overly simplify either, you can definitely turn the crank and eventually, you'll get there. The one thing that's interesting and difficult in biotech is that biology is complicated and there's a million and one things that could go wrong in various ways. You get a molecule that's very potent and selective, for example, but if it doesn't circulate well through your body, then it's not going to do much anyway.

So there's a lot of pieces that you need to put together. We're not replacing the entire drug discovery and development process wholesale here but we're really honing in on some of the key aspects, some of the critical bottlenecks that have hit the field, and making those better. We're not replacing animal testing, as an example, we're not replacing the clinical trials themselves but we are getting to faster, better molecules to run through those things. CRAIG Yeah. And your focus right now is on narrowing to molecules that you can test or is it on the more general, as you were saying, the dream is to eventually model all RNA molecules? Or, are they happening in tandem? RAPHAEL It's really in tandem, in some ways. How I think about it is, you got the long-term bets, then the midterm kind of things, and the short-term pieces that we're doing. Especially for a science-heavy company like Atomic, you need that balance because

you want to be looking at that near term of, [Let's actually show this thing can deliver on the promise in some cases but then also enable a much broader space at the same time.] So we continue to build what I described as RNA foundation models, collecting these very large data sets to build accurate models of RNA structure, of RNA function, etc; enable RNA design, that's an active area of research here at Atomic. On the other hand, we're also advancing our first programs into testing in animals, narrowing the search space, as you say, to find those molecules and testing them. And I think this is a little bit of my own philosophy as well, which is, you want to be applying the technology that you have or putting it to the test as much as possible. I'm a big believer

in building the thing that is useful, that will actually move the needle, as opposed to designing it in isolation and then figuring out how to apply it. Because I think that by applying it and looking at where it's useful versus not, then you can guide further efforts in that direction and really build the foundation models that are useful and that are really going to make a difference. CRAIG Yeah. But the activity, you've got this animal trial; if it's successful, does that move into human trials? I understand the long-term goal but is the company at this point focused on generating data that then can train better models? Or is it– I guess I asked this already, is it on coming up with therapeutics? RAPHAEL Yeah. The boring answer is really both, frankly, is what I'm getting at in some ways. And these two things are very linked, in some ways, actually. I think you're getting at a very important point, though, because of these first

trials that we're doing, where we're testing in  animals, for example, that's going to generate a   small amount of data. And you can only feed that  back in but oftentimes, the kind of data that's   really critically useful for training these big  AI models looks fairly different than the data you   get out of any given drug discovery program. Maybe  concretely, the way I could paint this for you is   that we have a team that's dedicated to generating  data for the AI models specifically. And we have   another team that's dedicated to pushing  forward the drug discovery programs. There's   a lot of cross-interaction, cross-pollination  between those two but there are specific folks   with very specific mandates at Atomic along  both those lines of what you just described. CRAIG Yeah. And remind me when did you form Atomic?

RAPHAEL Yeah. It's been three years   that we've been going now. We're like 25 people  today, half AI scientists, software engineers,   etc, half biologists, medicinal chemists side  of things. I've been saying this “both” thing  

a lot here and it's fundamentally we're trying  to build this interdisciplinary organization   that has that expertise across these different  spans. We're also co-located in the Bay Area to   enable that interplay and interchange of ideas.  So I fundamentally believe that a lot of the   key innovation for a place like Atomic happens  in that white space between established fields   in some ways. It's like, you want to build  that new field there at the intersection. CRAIG Yeah,   I was just in DC talking to the director  of the National Science Foundation. How  

much are you depending on government grants? This sounds like something the NSF or other government organizations would want to fund. How much are you depending on venture capital? RAPHAEL Yeah. So at this stage, we're mostly venture capital-based, however, you're completely right. There's a major government angle to this whole thing. In fact, it's funny that you asked the question on government; I literally met Secretary of State Antony Blinken on Monday, talking about AI applications for biotech specifically and making this argument that some of these large data sets you need to collect to build these foundation models are, like, really key. To build the best foundation models you need the best data sets that require

sustained investment, long-term investment, the kind that the US government is uniquely suited to provide. So you're actually hitting on a very interesting point there, that I think is very much an area of excitement for Atomic and this AI for biotech field in general. CRAIG Yeah. I didn't realize that ARPA, is that how you pronounce it? RAPHAEL Yeah, ARPA. CRAIG Now there's ARPA-H. RAPHAEL Right, ARPA-H. CRAIG -who's focused specifically on healthcare applications. I also didn't realize until this conference that there is a national security commission on biotech now. Are you involved with them at all? RAPHAEL I haven't been involved to this date with some of those discussions; I have been involved with others. I

would say that the way that I would describe  it is, there's this realization across the   government in some ways that biotech is a  major area of innovation that needs focused   investment. And I think that the US wants  to continue being the leader on the global   stage there specifically. So it's thinking  through carefully how to back these kinds   of things and how to potentially increase that  investment in there. So I think there's a lot  

of interesting conversations, I mentioned  the one I had on Monday on this front,   as an example. Especially in light of these new  AI applications, you've had these executive orders   get signed from Biden, the White House, really  increasing the focus on these areas overall. CRAIG Yeah. Do   you have any trouble with funding or is there  plenty of money for initiatives like yours? RAPHAEL I would say AI continues   to be a very strong space for investment as a  whole. I think the biotech market specifically   has been going through a bit of a rough patch  since the pandemic. However, AI for biotech is   the bright spot across that landscape, would  be one way that I think about it. Because,  

it's really showing, [Hey, we've seen what  ChatGPT can do, we've seen what AlphaFold   can do.] There's a lot of excitement in some  ways and it's almost like a reverse hype thing,   is one way to think about it. The people that are  closest to it are oftentimes the ones that are   the most excited about the whole thing, which is  kind of cool to see. Especially on the AlphaFold  

side of things. ChatGPT, now everyone knows about that and is excited about that already; I don't want to say that that hasn't gotten into the public consciousness at this point. CRAIG Yeah. To that point– and he's on the commission, I can't remember his name; he's the founder of Ginkgo Bioworks.

RAPHAEL Very nice. CRAIG Yeah, he was saying that he thinks within two or three years, and maybe it'll be Atomic AI, there's going to be a ChatGPT moment for biotech. RAPHAEL I would very much agree with that. I think that there's going to be some really exciting developments in the next few years.

CRAIG Yeah. Is   there anything I haven't asked that I should ask? RAPHAEL  I think this was really good. I covered most of  the points that I wanted to hit here; trying to   integrate the wet with the dry lab together  to make that cycle, these RNA breakthroughs,   why we should care about RNA specifically- is  key, Atomic is really trying to bring RNA and AI   together to usher in that new generation. I think  we covered it actually, great set of questions. CRAIG I do have one   question. What kind of compute demands do you  have? Is there plenty of compute for what you   need? Again, I had this conversation  with a guy named Brian Spears at the   Lawrence Livermore National Laboratory, who's  doing work on RNA shape discovery but they've got   the most powerful computers in the world at their  fingertips. How are you handling compute needs? RAPHAEL Yeah. I think we always need  

more compute, is really the honest answer. We've bought a bunch of GPUs that we have on-premise, basically. So we always have access to those, and that lets us do a baseline of work basically, over there. And these are H100s, basically, which are like the top-of-the-line GPUs that you really need for this kind of work. On top of that, we burst up to the cloud; when we're doing a

big training job, then we'd go to AWS, GCP, or what have you and train there. Now, the issue is always getting quota on these things, getting enough allocation on the cloud providers to run these kinds of things. In theory they have a lot but you can't always get access to it if there's a lot of folks training at the same time. So we're always hunting for more compute, is maybe the simple way of putting it. I used to work at the Department of Energy a little bit during my PhD, using the Summit supercomputer, which is, I don't know, some ridiculous amount of H100s; and it'd be very nice to just have that at our fingertips at some level, like 27,000 H100s or whatever it was.

2024-08-13 09:09
