How AI and RNA Tech is Transforming Drug Discovery | Inside Atomic AI
CRAIG Why don't you go ahead and start by introducing yourself, then we'll get into some questions? RAPHAEL Yeah. So briefly, I'm Raphael Townshend, founder and CEO at Atomic AI. My background is really coming more from the AI space. Originally, I did my undergraduate at UC Berkeley in electrical engineering and computer
science. I started my PhD working in computer vision but fairly quickly, actually, transitioned- this was at Stanford, into working on structural biology applications specifically. So taking a lot of the tools that had made such a difference in computer vision with things like self-driving cars or in natural language processing, with things like ChatGPT, seeing if we could apply them now to the field of biology and especially the structural biology space that I was talking about. It's really about understanding the shapes of molecules, kind of understanding the shapes so you can better understand what they do. So I started working in that space. A seven-year PhD later- of banging my head against those kinds of problems, I really started getting some good success; including, I guess, what would be the founding work of Atomic, which was basically this highly accurate predictor of the three-dimensional structure of RNA molecules known as AlphaFold for RNA, which was a big breakthrough in the space recently. So that work ended up featuring in the
cover of Science in late 2021. From there, I really started Atomic to continue leveraging and developing those kinds of technologies to really enable this next generation of RNA drug discovery. CRAIG And when you say "AlphaFold for RNA," were you following DeepMind's research and applying it to RNA? Or is it just analogous to what DeepMind was doing with protein folding? RAPHAEL Yeah. So I actually
worked on the DeepMind team a few years back so I'm well familiar with their work as well. While there are certainly similarities, there's also a need to redesign a lot of the algorithms, the AI models from the ground up. So RNA, while at a high level is a similar molecule to proteins, there's a lot of intricacies that require custom AI models built. For example, RNA is a much more flexible molecule than proteins. So really understanding those dynamics of it is a key piece that you need to build into these kinds of algorithms. CRAIG Yeah. So
the algorithms that you worked on that became Atomic AI, came out of your work at Stanford, is that what you said? Or came out of your work at Google DeepMind? RAPHAEL This is out of my work at Stanford, actually. Before even working at DeepMind I had already developed the core of these algorithms that worked quite well on RNA specifically. It was very cool to be making those breakthroughs and seeing the AlphaFold, for proteins, breakthroughs happening very shortly thereafter. It was a very interesting time overall, is perhaps one fun way to put it. The cool thing on the AI side for RNA is that the amount of RNA data that was publicly available is much smaller than the amount available for proteins. So you needed to design some very, sort of, bespoke algorithms to work on the limited data. The original Science paper, for example, was trained on just 18 RNAs
total. So a very small number, right? But you could see you could do surprisingly well given that small number. The problem was definitely not solved. I don't want to claim that it was solved at that time. That plugs into a lot of what we're doing at Atomic these days, which is really building upon that advance. But, it really showed the power of these custom kinds of algorithms. CRAIG Yeah.
For people that don't follow biotech, because a lot of the listeners are AI people; can you explain very briefly what RNA is, why it's important in drug discovery- everyone knows that the COVID vaccine was an RNA vaccine but they don't necessarily understand what that means, and why understanding the shapes of molecules would be important for RNA therapies? RAPHAEL Two great questions, Craig. I would say first of all, RNA, maybe if you remember the most basic biology lessons, the central dogma in some ways; we have DNA encodes the information, which then goes to RNA, which then goes to proteins. For a long time, people thought of proteins as the workhorses of the cell, the things that did everything and RNA is just a messenger that codes for the right proteins. Now, it turns out that view is not quite right. In fact, there's this vast world of RNA that's kind of doing its own set of functions beyond just coding for proteins. In fact, there's this really nice hypothesis, the RNA world hypothesis, about how all of life was first RNA-based, then DNA and proteins came afterward. An interesting other stat that's kind of fun to think about is, if you look
at the human genome, about 1% to 2% of it codes for protein. So about 1% to 2% of the human genome becomes proteins but about 80% of the genome becomes RNA at one point or another. So there's about this huge world of RNA that never even becomes proteins that we're just barely scratching the surface of. So with all that context in mind, we've got proteins- seen as the workhorse of the cell. If you look at most drugs on the market today, they're all going after proteins. It's been seen as the main target. That's been great in many dimensions but there's been a number of diseases
that have been essentially undruggable at the protein level. We've been trying to drug it for 40 years and we just can't go after some of these proteins. And these are really high-value things. There's this one protein called c-MYC that's involved in 75% of human cancers that we've just failed to get any drugs to. So the idea is, now you've got your DNA, goes to RNA, goes to protein;
you go one step earlier in the process and you go after the RNA that codes for that protein, for the c-MYC protein. In this way, you're sort of going after these diseases that were previously undruggable by increasing the attack surface and giving yourself new ways of striking at them. So that kind of tells you a little bit about the RNA and why you care about RNA drug discovery, specifically. There's a lot more complexity, I'm simplifying at some level but that's one
of the key messages there. The other piece of why you care about the shapes of the molecules, and it's very linked to RNA in some ways, but it also has this much broader kind of potential too; it's fundamentally about the fact that the shape of a molecule kind of determines what it does. Structure determines function, is what people say. And it's almost too obvious in some ways. The shape of a bike is important to what it does. If the wheels of the bike were in the wrong place it wouldn't do a very good job moving you around. Very similarly, the shape of a molecule is really key to what it does. So you need to make sure it's all in the right place to perform its
function adequately. So sometimes when there's a disease you can understand almost by looking at the structure, looking at the physics driving these things and understand from first principles, what's going wrong. Then you can intelligently design medicines to solve those issues. This is a process known as rational design, which stands in contrast to the traditional way of doing it; which is more throwing things at a wall and seeing what sticks, which is more known as phenotypic screening- would be the technical term there. And it's kind of seen as one of the future directions of drug discovery and it's playing an especially critical role in the RNA space because there's a lot of things that we need to really understand from first principles to design well. So I'll stop there for a second but hopefully that makes sense, both on the RNA- why that matters to a huge degree and why the shapes of molecules matter as well. CRAIG Yeah. Given that context, can you just describe, quickly, what the RNA vaccine for COVID was doing
and how understanding the shape of the RNA molecule played into that therapy? RAPHAEL Of course. The mRNA vaccines were one of the first big breakthroughs in this RNA drug discovery landscape, I think everyone got to see that firsthand. Fundamentally, what the RNA side of things is doing is it's coding for a specific protein that is part of the coronavirus, the spike protein. So if you remember, you've seen all these graphics, you've got the virus particle and you've got the spike protein sticking out of it. The RNA is basically coding for that spike protein but not the rest of the virus. So then that gets into
your cells, your cells produce a lot of the spike protein, and then your immune system essentially learns to recognize that and says, [Oh, that's a foreign thing that we've seen], and can train itself right off the bat to find anything that presents that spike protein in the future. So then when the real virus shows up, it's already trained, your immune system is already trained to knock that down. Now, the part where a lot of the structural piece comes in is actually for making the next generation of those mRNA vaccines. In particular, one big issue that's been presented with the RNA vaccines is the need for cold chain storage. The Moderna or Pfizer vaccines need to be stored at very low temperatures to be transported around the world. For example, getting that to
low-income countries is a challenging proposition. So what you'd like to do is you'd like to make those vaccines more stable, the RNA molecules more stable. In that case, what you're doing is you're trying to find the shapes that are well folded, that are more resistant to falling apart, basically more stable overall. So in that way, you're rationally designing the next generation
to be a better version of the first gen. So there's this nice interplay where there's all sorts of different properties of these RNAs that you can optimize through that kind of approach. CRAIG Yeah. Since I have you explaining this stuff, you introduce an RNA molecule and it then goes to the DNA? Explain the mechanism. RAPHAEL Of course. Biology is fascinating at some level
and there's a whole bunch of hidden complexity in there. Fundamentally in this case, actually, the RNA is not becoming DNA at some point, it's just staying as RNA. And then in your cell's machinery, there's these other molecules known as ribosomes that can translate that RNA into proteins for you. It’s kind of cool; the ribosomes themselves are actually mostly RNA molecules. You can kind of see how it bootstraps itself. You have these RNA molecules that are responsible for turning RNA
molecules into proteins. That's kind of one of the reasons people think that RNA might be the first source of life. So anyways, you've got these RNA molecules and then these ribosomes come and translate them into proteins, into the spike proteins. Eventually, the RNA molecules
get degraded and thrown out of your cells but before that point they've produced enough of the spike proteins for your immune system to recognize them. And sometimes you need a couple of doses. That's why there were a couple of doses of some of the COVID vaccines; it's because not enough of the spike protein gets produced the first time so you need a second dose to produce some more of it. CRAIG Okay. So with Atomic AI, you're designing RNA molecules or you're understanding the shape of existing RNA molecules? Yeah, talk me through that. RAPHAEL Yeah. Another great question, frankly, because there's kind of these two broad categories of RNA technologies- is how I think about them. There's
the mRNA vaccine kind of category where it's RNA-based, the medicine itself is the RNA. For example, you're injecting some RNA into your body and it's producing a protein. But then there's the other category, which is RNA-targeted, where you're targeting the RNA that's already in your body. Then
finding the medicine that needs to hit that, because in your body, as I said before, 80% of the human genome becomes RNA at some point or another. So there's a lot of that sitting around. So while you can actually apply this fundamental technology to both of these, the RNA-based medicines as well as the RNA-targeted ones– and what I was describing about making the vaccines more stable is applying it to the RNA-based piece, we can also apply our technology to the RNA-targeted piece to understand the shapes of the RNA molecules already in your body to then go and target those selectively. That's actually the initial focus of Atomic AI; it's understanding the shapes of the RNAs already in your body, folding all of those, predicting the shapes of all of those molecules, and then targeting those. But hopefully this paints the picture of how this technology can be very broadly applicable, because even here you can see how you could apply it to these two fairly different RNA technologies but in both cases, it can make a huge difference in drug discovery.
CRAIG Yeah. And on the COVID vaccine, the vaccine is introducing RNA molecules into the body. RAPHAEL Exactly. CRAIG In the targeting RNA that exists already in the body, you're targeting it to turn certain RNA molecules on and off, or to modify the behavior? What are you doing with the targeting? RAPHAEL Yeah. There's a huge range of things you could do there. The thing that we're fundamentally focused on is, as you say, turning it on or off. The easiest thing you could do in some ways is- a protein that you want to go
after, you want to decrease the amount of that protein, so you go and hit the RNA instead. And you're like, [Let's go and destroy that RNA], and then there'll be less of the protein and then you'll have gone after this undruggable disease that you couldn't hit at the protein level. CRAIG And why wouldn't you be able to hit at the protein level? I mean, is it understanding the shape of the protein or finding molecules? As I understand it, the importance of understanding the shape of a protein is, if you want to find a molecule that will fit into a pocket in the protein, for example- to prevent the protein from binding to other things. Is it that sort of thing that– you're looking for a molecule that will fit into the RNA molecule and stop it from functioning? What exactly are you doing at the shape level? RAPHAEL Yeah. I think you're exactly right; shape is the
key in both the protein and the RNA level. The big reason some of these proteins are undruggable is because they're disordered. They're not actually adopting any single shape, they're just completely floppy. So they don't present any pockets for you to hit in the first place. Like this c-MYC protein that I was talking about, that’s involved in 75% of human cancers, it's just disordered.
There's no pockets for you to go after. So the idea is, instead, you go after the RNA and you try and hit the shapes at that level instead and design things. It turns out that understanding the shapes at the RNA level is really critical as well, to get molecules that are selective, basically hitting just that RNA and not a bunch of other things, and functional- that do the thing that you need it to do. In this case, degrade the RNA or prevent it from making more of a protein. CRAIG Yeah. And how many RNA molecules exist in the body and how do you know which ones to go after? RAPHAEL There's far more RNA molecules than there are proteins. If you're saying the number of distinct kinds of RNA molecules, you'll remember that stat
I was talking about, how there's like 80% of the human genome becomes RNA but only like 2% becomes proteins. So there's just this vast number of RNAs and some of them don't even code for proteins, they just do other things in your body. We're just trying to understand, still, a lot of the biology behind those. If you just think about the ones that code for proteins for a second, the mRNAs, the messenger RNAs- that's what that stands for- the ones that code for the proteins, then there's hundreds of 1000s of those, of distinct kinds of mRNAs there. So that's just one small piece of the RNA world but it already presents this huge possibility of potential targets to go after.
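As a rough illustration of the coding path described above (DNA to messenger RNA to protein), here is a minimal Python sketch. It is a toy under stated assumptions: the codon table holds only a handful of the 64 real codons, the input sequence is made up, and the function names `transcribe` and `translate` are hypothetical helpers, not anything Atomic AI uses.

```python
# Toy walk through DNA -> mRNA -> protein.
# Assumption: only a few codons are listed; the real genetic code has 64.
CODON_TABLE = {
    "AUG": "M",  # start codon (methionine)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = codon missing from the toy table
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCGCTAAATAA")  # made-up example sequence
protein = translate(mrna)
print(mrna, protein)
```

The non-coding RNAs mentioned in the conversation never go through the `translate` step at all, which is part of why they need to be studied on their own terms.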
CRAIG Yeah. And in the case of the COVID vaccine, they identified the mRNA that created the— RAPHAEL The spike protein, exactly. You're exactly right. But then there's other kinds of RNA. There's non-coding RNA; RNA that doesn't code for proteins are known as ncRNA. Or there's rRNA, which is ribosomal RNA.
There's all these different categories, and kind of fun- people just stick little letters in front of RNA or after the word RNA to indicate these different categories. CRAIG Right. Is Atomic AI's mission to define the shape— as has been the case with AlphaFold- they just came out with AlphaFold 3, RAPHAEL Yeah, very exciting, CRAIG to define the shape of all of the existing mRNA molecules? Or, are you selecting certain mRNA molecules that you know are related to certain diseases and just focusing on understanding their shape? RAPHAEL I mean, the long-term dream is really about building a map of every RNA that exists as well as enabling the design of new RNAs that we've never seen before. You've sort of seen what AlphaFold can do in the protein landscape;
people are using it for all sorts of things. The idea is to bring that revolution to the RNA space as well. One simple way I think about Atomic, it's like, we're trying to combine that big RNA breakthrough with the COVID vaccines (and there's been a number of other RNA technologies that have come to fruition in the last few years) with the AI breakthroughs, like the AlphaFolds of the world. So there's that dream in the long term. Our immediate focus is on going after very specific RNAs and showing the potential on a few test cases first. Showing like, [Hey, you can do this here to get to this point and design these new drugs that are very exciting.] That then paints the map of what you can do in a much broader landscape.
CRAIG Yeah. And we'll talk about the AI behind it in a minute but what are the targets that you're focused on right now? I mean, you mentioned this c-MYC protein. Is that one that you're working on? RAPHAEL So we haven't disclosed our targets, our specific ones, at this point, but I would say they very much fall in that category. There's a number of these undruggable protein targets that you're trying to hit at the RNA level instead. And a number of these targets are in the cancer space, which is a big area that we're looking at. I can continue using c-MYC as an example actually but it's a stand-in for
many other possible proteins you could use. That's essentially known as a transcription factor. It regulates how much of every other protein is made. So you get too much c-MYC, you get too much of every other protein, you get uncontrolled replication of your cells, you get cancer; so you really want to decrease the amount of it. There's many proteins like this so you really just want to decrease the amount of it to control cancer spread. So that's one big area of focus. The second big area is neuroscience, diseases like neurodegenerative diseases basically. So think about things like Alzheimer's or Parkinson's in that case.
CRAIG Right. Where, again, you're attacking the RNA that builds the proteins that are causing the disease. RAPHAEL Exactly CRAIG At the therapeutic level, if you understand the shape of those RNA molecules and you create a molecule that interrupts that mRNA molecule, do you then just inject the therapeutic into the bloodstream and it finds the mRNA? How does that work? RAPHAEL It's a great question. So the technology– what you could use to target the RNA, there's a whole bunch of different ways you could do that, different modalities, is what people call those. The one that we're focused on today is small molecules. Small molecule is like 20 atoms, whatever; they're small basically. And
it's very easy for them to get around in your body, which is really the key thing there; delivery is easy. It's really classic; many drugs on the market are small-molecule drugs. Most of them target proteins but we're doing RNA-targeted small molecules basically. The nice thing about these is you can oftentimes just take them orally, you can just take them as a pill. That makes it really easy and then they can get across your body in different ways. So you've seen a lot of these new technologies like the mRNA vaccines or others where you need an injection or sometimes you even need a surgery or something like that to get them. The beauty of this kind of approach
is that you can then bring that back to just being a pill that you can take again and you're going after these diseases that you don't really have other nice ways of hitting them otherwise. CRAIG Yeah. We spoke about Insilico and I've interviewed Alex Z.; I won't try and remember how to pronounce his last name, on the program but they're looking at the universe of molecules, small molecules, and trying to narrow the search space based, as I understand it, on the properties of those molecules before putting them into trials. So you raise your potential success rate at the trial level? Are you talking to them at all? It seems like this would be a more precise way of narrowing the search space if you understood the shapes of the molecules. Or, are they doing that? RAPHAEL Yeah, I mean, I haven't talked to Alex recently or anything. I mean, there's this huge space of exciting groups
pushing these kinds of approaches, I would say. I think they are very much using AlphaFold-like approaches as well. I believe I've seen that work but at the protein level primarily because AlphaFold for proteins already exists and has been handed out to the world, and everyone can use that for what they have. I think that there's this recognition across the field that, [Oh, understanding the shapes of these molecules is really powerful and can let us do a lot of things.] From our standpoint, we're trying to make the same thing happen in the RNA space, again. I can say, I don't know about Insilico Medicine specifically, but I can tell you firsthand, there's a lot of groups that are quite interested in understanding the shapes of RNA molecules to make this dream of rational design happen there as well. I definitely think
there's a recognition across the field that these AlphaFold type approaches, these rational design type of approaches are really the next wave, the next generation of these kinds of approaches. CRAIG Yeah. So now, the AI behind Atomic, I interviewed Oriol Vinyals on the program about AlphaFold. It was maybe a year or two ago. To my
understanding, it was an extension of their work in AlphaZero, where it's a combination of search and reinforcement learning to come up with, sort of, candidates. Then they had a second system that ranked the candidates and then would narrow that further for testing. Is that essentially what you're doing with RNA? Why don't you walk us through the system that you've built? RAPHAEL Of course. So some of the initial systems were very much along those lines, like three or four years ago, I would say, you create a bunch of candidates and then you rank them using the scoring functions. I think since then
we've actually dramatically overhauled a lot we've built, and to be fair, a lot of the field has been moving in this direction more generally. And we've been building these big transformer-based models, first of all, these things that have made such a big difference in the power of technologies like ChatGPT. Then we've used those to directly generate the structures of these molecules through these generative AI approaches, might be one way to think about that. So in this case, you can just take an RNA sequence and it directly can produce a three-dimensional structure or even a set of 3D structures if you think that it's dynamic and might adopt different shapes over time. So that piece has been very interesting. Part of what we've needed to do to enable that is,
these transformer-based models are very data-hungry. You couldn't really do the 18 data point thing we did before, where you only trained on 18 RNA structures to build this kind of thing. But, we recognized that it was critical that we move to those kinds of architectures for the long run to really crack a lot of this problem. The other piece of Atomic, I've been talking a lot about the algorithms, is we also have our own in-house wet labs that we use to generate our own data at a very large scale to train these AI RNA models. In some ways we have these top-tier AI folks working but then we're also building the right data that's purpose-built for these kinds of models. And this is really a broader trend that I'm quite excited about,
which is this integrated lab, sort of computation approach, where you have this iterative cycle where you're generating data that then makes the AI better that can then feed into more data, that's more targeted to further improve the AI to get that, like, virtuous cycle of improvement. CRAIG So can we take the example of an undruggable protein- and you understand the mRNA that generates, I guess- is that the right word? that protein, or builds that protein, or has the instruction set to create the protein and you want to interrupt that mRNA molecule. The first step then is to understand the shape of the RNA molecule. So at that point, what do you do?
RAPHAEL It's actually quite interesting because in theory, you don't need to know the shape to start trying to go after it. You could just take your RNA and start throwing molecules at it and see what happens. In fact, that's what the first generation of companies in this space did about a decade ago, at this point. The issue there is that the molecules were generally not potent, first of all. They didn't do what they needed to do. They would stick to the RNA but they wouldn't destroy it or decrease the amount of protein etc. On top of that, they're oftentimes not selective. So they bind to that RNA but they also bind to
many other RNAs at the same time. So there's issues of potency and selectivity. That's really where the shape starts coming into play. Because you want to find the unique 3D shapes, the unique pocket where you could get a molecule to stick there and then it will. You could optimize it to bind to that location, not others. On top of that, if there's a well-defined shape somewhere; nature doesn't waste effort generally. So those things are generally the pieces that are actually
functional and doing something interesting, versus some generic piece of RNA that might not actually be doing any interesting function. So shape then becomes critical to get over these barriers of function and selectivity. To give you an example of what this does then, you could find a shape in the RNA molecule that's responsible for keeping it stable, I mentioned stability before, and you can destabilize that structure with a small molecule. It's kind of like it binds in there and it makes it less structured, basically. Then that lets enzymes, other proteins that are responsible for chopping up this RNA molecule, have an easier time doing so. So you're destabilizing it, it's more prone to getting chopped up, and that decreases the amount of it, decreasing the amount of protein- cures cancer.
CRAIG But how do you discover the shape? I mean, that's essentially what you're doing, right? RAPHAEL Exactly. This actually gets at the reason why folks are so excited about things like AlphaFold in general; because the traditional way that you find these shapes is through these very expensive, very slow experimental techniques. They have names like X-ray crystallography, or cryo-electron microscopy, etc. And we could get into the technical details of how they work but really, the key to remember is that these things can take months or years to solve a single structure. I have a good friend of mine, he spent his entire
PhD solving a single protein structure using these techniques. And these machines, the cheap machines in some ways, cost millions of dollars. So if you can take this process that takes months or years to get a single shape and then use these AI approaches to bring it down to minutes or seconds instead, that's a big deal. People really care about that, basically. So what we're doing is
we can, instead of relying on these expensive techniques and trying to run those over time, which over the past couple of decades have solved maybe a thousand RNA structures total, depending on how you count. That's actually generous, maybe it's a couple hundred; you can instead take these AI approaches and just map out everything at once and find these interesting structures, you're finding the pieces that are folding into nice pockets that are targetable in the first place. So one way that I talk about what we're doing is we're essentially taking the space of all the RNAs in your body and identifying which parts of it are the most structured and targetable through drugs. CRAIG Okay, how do you do that? Is there a library of mRNA shapes already? RAPHAEL You asked me how you train those kinds of models in the first place; there's one thing. So there's an existing library of shapes that have been solved through these expensive experimental techniques, as I was talking about before. Maybe you have a couple hundred of those. So that's the starting point,
that's your gold standard. Now, as I was just saying, these AI approaches are very data-hungry in general. And you can get some initial bang for your buck through being clever on the algorithm design; that was the initial Science paper. But, eventually you run into the bitter lesson of, you just need more data and more compute to really get over some of these things. So this is actually where RNA itself ends up being quite an interesting molecule because you can design experiments that are very high throughput, that give you lots of measurements, even in parallel, for RNA specifically. That's because you can connect it to DNA sequencing and the cost of DNA sequencing has fallen off a cliff over the last couple of decades. So we
can design these experiments that can measure tens of millions of RNAs in a single shot. So let me explain how one of these techniques works just to paint the picture for a second. So you have your RNA molecule and you expose it to a chemical and that chemical will go and nip the RNA at different points, it'll kind of damage it, basically. Then you can convert that pretty easily back to DNA. Then you can run it through your DNA sequencer and the parts that got damaged, basically, will show up as mutations in the DNA sequencer. It'll be errors in the DNA sequencer.
So now you can very easily pick up where these chemicals damaged the RNA molecule. The key is that the places where it gets damaged are very linked to the shape of that molecule. One very simple way of thinking about it is that the outer parts of the RNA are going to get more damaged than the inner parts. So now you've got these measurements that are telling you something about the structure of these RNA molecules. And because it's linked to DNA sequencing, you could just run this in parallel on a huge number. On top of that, it's actually really easy to make at least shorter RNA molecules at a very large scale. This process known as oligo synthesis, lets
you just do it synthetically and create, like, millions of them at once. Versus for proteins, you can’t actually sequence those. You can't run those through DNA sequencers very easily. On top of that, making them is harder, you kind of have to have the cells produce the proteins for you. So in some ways, the fact that we're operating at the RNA level makes this job easier because you can make and measure the RNA molecules at a much higher throughput than you ever could for proteins.
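The probing readout described above (chemical damage shows up as mutations once the RNA is converted to DNA and sequenced) can be sketched in a few lines of Python. This is a simplified illustration, not Atomic AI's pipeline: it assumes the reads are already aligned to the reference with no insertions or deletions, and it assumes an untreated control sample for background subtraction, which the conversation does not mention. The function names and the tiny read sets are made up.

```python
from collections import Counter


def mutation_rates(reference: str, reads: list[str]) -> list[float]:
    """Per-position fraction of reads that differ from the reference.

    Assumption: every read is pre-aligned and the same length as the reference,
    and `reads` is non-empty.
    """
    mismatches = Counter()
    for read in reads:
        for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
            if ref_base != read_base:
                mismatches[pos] += 1
    return [mismatches[pos] / len(reads) for pos in range(len(reference))]


def reactivity(treated: list[float], untreated: list[float]) -> list[float]:
    """Background-subtracted mutation rate per position, floored at zero."""
    return [max(t - u, 0.0) for t, u in zip(treated, untreated)]


reference = "GGAC"
treated_reads = ["GGAC", "GGUC", "GGUC", "GAAC"]    # sample exposed to the chemical
untreated_reads = ["GGAC", "GGAC", "GGAC", "GAAC"]  # hypothetical control sample

profile = reactivity(
    mutation_rates(reference, treated_reads),
    mutation_rates(reference, untreated_reads),
)
print(profile)  # [0.0, 0.0, 0.5, 0.0]
```

In this toy profile, position 2 stands out as more exposed, while the mismatch at position 1 appears equally in both samples and is treated as background. A per-position profile like this is the kind of measurement that can be collected for many RNAs in one sequencing run.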
CRAIG And can you then model the RNA shape in a visualization the way that you can with AlphaFold? RAPHAEL Exactly. So you end up with a shape of an RNA molecule. I can even show you some or something like that. How I would think about it is- an RNA is just made of atoms at the end of the day, like a protein or anything else and it's just the atoms in 3D space. It's like, okay, you've got a carbon over here, you've got a nitrogen over here and they're bonded together, right? So you can look at those and you can sort of see how it's structured. You can even start simulating them, you could just run the laws of physics on that. That's another process that's super interesting and can tell you how it flops around over time.
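To make the "atoms in 3D space" picture concrete, here is a small Python sketch of one way a structure could be held in memory and how bonded pairs could be guessed from distances. Everything in it is an assumption for illustration: the `Atom` class, the coordinates, and the 1.9 angstrom cutoff are made up, and real structure files carry explicit chemistry rather than relying on a single distance threshold.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: tuple[float, float, float]  # coordinates in angstroms


def infer_bonds(atoms: list[Atom], cutoff: float = 1.9) -> list[tuple[int, int]]:
    """Return index pairs of atoms closer than `cutoff` (a crude heuristic)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(atoms), 2)
        if dist(a.xyz, b.xyz) < cutoff
    ]


# Made-up coordinates: a carbon and a nitrogen close together, an oxygen far away.
atoms = [
    Atom("C", (0.0, 0.0, 0.0)),
    Atom("N", (1.4, 0.0, 0.0)),
    Atom("O", (5.0, 0.0, 0.0)),
]
bonds = infer_bonds(atoms)
print(bonds)  # [(0, 1)]
```

A physics simulation of the kind mentioned above would start from the same ingredients, atom positions plus bonds, and then update the positions step by step under a force model.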
CRAIG And then to create the small molecule to bind to the RNA, once you understand the shape of the RNA molecule, do you search through some search space for an existing molecule that has a corresponding shape? Or, do you then synthesize a unique molecule? RAPHAEL Yeah. I mean, there's a number of different ways you could do it. One of the common ways is really this process known as docking basically, where you've got the shape and you're just trying different molecules and saying, [Do they fit? Does it interact physically well with the other molecule?] So you could search through these very large spaces of possible molecules, including things that have never been synthesized before and then say, [This one looks good. Let's go and make this one],
then actually test it in the lab. The idea is, you're kind of searching through this massive space and then narrowing it down. That's one common way. There's a few different ways you could use these things but fundamentally, you've got the shape, you understand what you're trying to go after, and then you're trying to find molecules that interact with that shape. CRAIG Which do you do, which does Atomic do? RAPHAEL So we do a lot of this docking type of approach, as I mentioned, fitting the molecules in there. Then we combine that with more traditional kind of screening methods as well, which once you've identified where the shape is, you could just isolate that shape and then you can throw a bunch of molecules in a lab setting at it as well. So
generally, we apply both of those techniques together, the joint computational experimental piece. You can even combine those together because if you've run it in the lab, you can then feed that back into your AI and do a better job of docking, for example, the next time around. CRAIG Yeah. You were talking about transformer models. So this isn't search and reinforcement learning, you're generating molecule shapes. RAPHAEL Exactly CRAIG And then if you find one that computationally seems to fit, then you synthesize it and test it? RAPHAEL Yeah. You get to some RNAs,
you generate their shapes, you screen small molecules against it computationally; if something looks promising, you go and synthesize that, you test that first in cells and then in animals and you keep pushing towards the clinic on that front. CRAIG Yeah. Then eventually into human trials, I presume? RAPHAEL Exactly CRAIG Where are you in that whole process? Where is Atomic today? RAPHAEL It's actually an exciting time for Atomic because we're just starting to test in animals for the first time here. I would say, it's still what I call the preclinical place. We're definitely early stage in many ways but this is really the first time that we're going beyond cells. We've seen
our technology work well within cells and now we're trying to get the next layer up, the next higher-level organism in some ways, and really putting a lot of the platform to the test. And we're anticipating getting a lot of that initial data in the not-too-distant future. Personally, it's a cool moment because I've been working in the space for ten-plus years at this point, and it's like, okay, the dream is starting to become a reality in some ways. CRAIG Yeah. I haven't looked at AlphaFold 3 yet. I just saw the announcements. What's different with AlphaFold 3? And, are you adopting whatever changes they made in AlphaFold– I've forgotten, did you call it AlphaFold RNA, or? RAPHAEL Our core model is known as ATOM-1. It's an RNA foundation model is what we call that. I should plug the name a little bit,
I suppose. I think we're pretty excited about AlphaFold 3 overall. I think this space as a whole has been moving really quickly. There's always these new advances coming from different groups. And you always make sure you read through the papers and understand and integrate the pieces that are useful from these different advances. I think the key that's happened with AlphaFold 3 is that they've expanded their modeling from just proteins to much broader sets of molecules. That includes
RNA as well, to be clear. And they've seen some pretty good success in at least some of these new molecules in doing this kind of modeling. For example, they've seen pretty good success at modeling protein-small molecule interactions, a little bit more like what Insilico Medicine might do, of modeling protein-small molecule interactions. They've done a pretty good job at DNA or things like that. However, RNA is one of the areas that still has room for improvement, at least based on their studies, because they still don't have state-of-the-art as compared to more traditional methods there. Fundamentally, that comes down to the fact that there isn't that much RNA data out there that's public. This is one of the key bets of Atomic. You can do very
well in certain areas where there's a lot of data but in others, you really need to invest carefully in collecting the right kind of data as well. So I think there's a lot of very useful components that have come out of the AlphaFold 3 kinds of approaches and it's a fun paper to read overall. I know a lot of the team members there quite well and I'm pretty excited for them. It's really taking those pieces, combining it with the data that we have already that we spent the last three years collecting in-house to really try and create these continued breakthroughs in the RNA space. CRAIG Yeah. How many RNA molecules have you modeled successfully, to the point that you can synthesize molecules that will fit or bind with them whether or not there's therapeutic interaction? How many have you done so far? RAPHAEL Yeah, it's a super interesting question.
I would say, there's different levels of validation you can do. You can make the molecules and then you could test them in certain ways. One answer to that is tens of millions or hundreds of millions for the highest level validation where we made these things, and we've tested them, found their structures, etc. But then, the number of things that we're testing in animals, which is several steps later down the process, we're just getting our first one there. So you
can think of it as a funnel that’s starting from this huge number and where each step of validation is getting smaller and smaller. But, it's the tip of the spear there. One stat that I like to give is, there's a level of accuracy you need to do this kind of rational design approach. It's like, you want your structure to be this close to correct for you to be able to model the interactions with a small molecule that could bind there. If your pocket is completely wrong, it's not going to really help. What we've seen over the last couple of years is that on average, the structures that we're making are sufficiently accurate now to be able to do that rational design kind of approach. I'm not saying that they all can do it but the original Science paper that we put out, maybe it was like 5% or 10%. Don't quote me on that, I don't remember the exact number but it was a small
number basically, a relatively small amount there. But since then, over 50% of those structures are now sufficiently accurate to do this kind of approach on, which personally is the Holy Grail in some ways, as far as I'm concerned. It’s like, [Wow, we can actually use this reliably.] CRAIG Yeah. And once you get the process down,
is it a matter of just running this iterative loop until you reach a level of accuracy? RAPHAEL Exactly. You need the right level of accuracy. And I don't want to overly simplify either, you can definitely turn the crank and eventually, you'll get there. The one thing that's interesting and difficult in biotech is that biology is complicated and there's 1,000,001 things that could go wrong in various ways. You get a molecule that's very potent and selective, for example, but if it doesn't circulate well through your body, then it's not going to do much anyway.
So there's a lot of pieces that you need to put together. We're not replacing the entire drug discovery and development process wholesale here but we're really honing in on some of the key aspects, some of the critical bottlenecks that have hit the field and making those better. We're not replacing animal testing, as an example, we're not replacing the clinical trials themselves but we are getting to faster, better molecules to run through those things. CRAIG Yeah. And your focus right now is on narrowing to molecules that you can test or is it on the more general, as you were saying, the dream is to eventually model all RNA molecules? Or, are they happening in tandem? RAPHAEL It's really in tandem, in some ways. How I think about it is, you got the long-term bets, then the midterm kind of things, and the short-term pieces that we're doing. Especially for a science-heavy company like Atomic, you need that balance because
you want to be looking at that near term of, [Let's actually show this thing can deliver on the promise in some cases but then also enable a much broader space at the same time.] So we continue to build what I described as RNA foundation models, collecting these very large data sets to build accurate models of RNA structure, of RNA function, etc; enable RNA design, that's an active area of research here at Atomic. On the other hand, we're also advancing our first programs into testing in animals narrowing the search space, as you say, to find those molecules and testing them. And I think this is a little bit of my own philosophy as well, which is, you want to be applying the technology that you have or putting it to the test as much as possible. I'm a big believer
in building the thing that is useful, that will actually move the needle, as opposed to designing it in isolation and then figuring out how to apply it. Because I think that by applying it and looking at where it's useful versus not, then you can guide further efforts in that direction and really build the foundation models that are useful and that are really going to make a difference. CRAIG Yeah. But the activity, you've got this animal trial; if it's successful, does that move into human trials? I understand the long-term goal but is the company at this point focused on generating data that then can train better models? Or is it– I guess I asked this already, is it on coming up with therapeutics? RAPHAEL Yeah. The boring answer is really both, frankly, is what I'm getting at in some ways. And these two things are very linked, in some ways, actually. I think you're getting at a very important point, though, because of these first
trials that we're doing, where we're testing in animals, for example, that's going to generate a small amount of data. And you can feed that back in, but oftentimes, the kind of data that's really critically useful for training these big AI models looks fairly different than the data you get out of any given drug discovery program. Maybe concretely, the way I could paint this for you is that we have a team that's dedicated to generating data for the AI models specifically. And we have another team that's dedicated to pushing forward the drug discovery programs. There's a lot of cross-interaction, cross-pollination between those two but there are specific folks with very specific mandates at Atomic along both those lines of what you just described. CRAIG Yeah. And remind me when did you form Atomic?
RAPHAEL Yeah. It's been three years that we've been going now. We're like 25 people today, half AI scientists, software engineers, etc, half biologists, medicinal chemists side of things. I've been saying this “both” thing
a lot here and it's fundamentally we're trying to build this interdisciplinary organization that has that expertise across these different spans. We're also co-located in the Bay Area to enable that interplay and interchange of ideas. So I fundamentally believe that a lot of the key innovation for a place like Atomic happens in that white space between established fields in some ways. It's like, you want to build that new field there at the intersection. CRAIG Yeah, I was just in DC talking to the director of the National Science Foundation. How
much are you depending on government grants? This sounds like something the NSF or other government organizations would want to fund. How much are you depending on venture capital? RAPHAEL Yeah. So at this stage, we're mostly venture capital-based, however, you're completely right. There's a major government angle to this whole thing. In fact, it's funny that you asked the question on government; I literally met Secretary of State Antony Blinken on Monday, talking about AI applications for biotech specifically and making this argument that some of these large data sets you need to collect to build these foundation models are like really key. To build the best foundation models you need the best data sets that require
sustained investment, long-term investment, the kind that the US government is uniquely suited to provide. So you're actually hitting on a very interesting point there, that I think is very much an area of excitement for Atomic and this AI for biotech field in general. CRAIG Yeah. I didn't realize that ARPA, is that how you pronounce it? RAPHAEL Yeah, ARPA CRAIG Now there’s ARPA-H RAPHAEL Right, ARPA-H CRAIG -who’s focused specifically on healthcare applications. I also didn't realize until this conference that there is a national security commission on biotech now. Are you involved with them at all? RAPHAEL I haven't been involved to date with some of those discussions; I have been involved with others. I
would say that the way that I would describe it is, there's this realization across the government in some ways that biotech is a major area of innovation that needs focused investment. And I think that the US wants to continue being the leader on the global stage there specifically. So it's thinking through carefully how to back these kinds of things and how to potentially increase that investment in there. So I think there's a lot
of interesting conversations, I mentioned the one I had on Monday on this front, as an example. Especially in light of these new AI applications, you've had these executive orders get signed from Biden, the White House, really increasing the focus on these areas overall. CRAIG Yeah. Do you have any trouble with funding or is there plenty of money for initiatives like yours? RAPHAEL I would say AI continues to be a very strong space for investment as a whole. I think the biotech market specifically has been going through a bit of a rough patch since the pandemic. However, AI for biotech is the bright spot across that landscape, would be one way that I think about it. Because,
it's really showing, [Hey, we've seen what ChatGPT can do, we've seen what AlphaFold can do.] There's a lot of excitement in some ways and it's almost like a reverse hype thing, is one way to think about it. The people that are closest to it are oftentimes the ones that are the most excited about the whole thing, which is kind of cool to see. Especially on the AlphaFold
side of things. ChatGPT, now everyone knows about that and is excited about that already; I don't want to say that that hasn’t gotten into the public consciousness at this point. CRAIG Yeah. To that point– and he's on the commission, I can't remember his name; he’s the founder of Ginkgo Bioworks.
RAPHAEL Very nice CRAIG Yeah, he was saying that he thinks within two or three years, and maybe it'll be Atomic AI, there's going to be a ChatGPT moment for biotech. RAPHAEL I would very much agree with that. I think that there's going to be some really exciting developments in the next few years.
CRAIG Yeah. Is there anything I haven't asked that I should ask? RAPHAEL I think this was really good. I covered most of the points that I wanted to hit here; trying to integrate the wet with the dry lab together to make that cycle, these RNA breakthroughs, why we should care about RNA specifically- is key, Atomic is really trying to bring RNA and AI together to usher in that new generation. I think we covered it actually, great set of questions. CRAIG I do have one question. What kind of compute demands do you have? Is there plenty of compute for what you need? Again, I had this conversation with a guy named Brian Spears at the Lawrence Livermore National Laboratory, who's doing work on RNA shape discovery but they've got the most powerful computers in the world at their fingertips. How are you handling compute needs? RAPHAEL Yeah. I think we always need
more compute, is really the honest answer. We've bought a bunch of GPUs that we have on-premise, basically. So we always have access to those, and that lets us do a baseline of work basically, over there. And these are H100s, basically, which are like the top-of-the-line GPUs that you really need for this kind of work. On top of that, we burst up to the cloud; when we're doing a
big training job, then we'd go to AWS, GCP, or what have you and train there. Now, the issue is always getting quota on these things, getting enough allocation on the cloud providers to run these kinds of things. In theory they have a lot but you can’t always get access to it if there's a lot of folks training at the same time. So we're always hunting for more computers, is maybe the simple way of putting it. I used to work at the Department of Energy a little bit during my PhD, using the Summit supercomputer, which is, I don't know, some ridiculous amount of H100s; and it'd be very nice to just have that at our fingertips at some level, like 27,000 H100s or whatever it was.
2024-08-13 09:09