Transforming Your Business with Cloud TPUs (Cloud Next '18)
Hello everyone, thank you very much for coming. This is my first time speaking at a movie theater; I think your seats are fully automated, so recline and get comfortable. I'd also like to welcome everyone who's listening online. I'm here to help you transform your business with Cloud TPUs. So this talk is about supercomputers for machine learning: multiple generations of custom ASICs, boards, systems, and whole supercomputers that we've designed here at Google and are making available through the cloud. I've got lots of interesting things to say about our recent product announcements, some new performance and cost information, and more. But before we get to that, I just want to set the context for those of you who are new to the field or new to hardware acceleration, especially in the cloud. People have been calling the end of Moore's law for decades now, so I'm a little nervous about this, but I think it's finally over: sequential processing is finally plateauing, so you can't just wait for new processors to come along and take your sequential program and make it magically run faster. And we're reaching this plateau, after this amazing wild ride, just at the moment that businesses have to manage more data than ever before. Now, you're all familiar with these different kinds of data. In the images and video category, you might have billions of images of user-generated content, or live video streams for video calls, and short videos that people are sharing with each other, or sharing to show products or real estate or other spaces. You've got product imagery for retail and e-commerce (you'll hear a little bit more about that later), geospatial imagery in ever higher resolution, medical imagery, navigational imagery for autonomous vehicles, robotics, manufacturing for quality control, and more. In the text, speech, and audio domain, there are messaging systems with people sending uncountably
many messages every day, product reviews, forum comments that you need to moderate and rank and route, new voice interfaces that are taking the world by storm, and support requests that can come in many modalities, whether in person, text, voice calls, even video. You get lots of call center audio accumulating, and you want to get business insights out of it. And then music recommendation is just an entire world unto itself. These are just a sample of the types of data that are accumulating across all of your businesses, and that could be very valuable for you to process with this cutting edge of machine learning and these cutting-edge supercomputers. The good news is that there's been tremendous progress working with unstructured data, especially over the past few years. As you can see here, looking at image recognition, in this case on the standard ImageNet benchmark, there have been dramatic increases in top-1 accuracy in just five years, and the accuracy continues to increase, especially with some of these new techniques like learning to learn, or AutoML. And it's not just image recognition; this is not just about ImageNet. These new deep neural networks, and a broader class of computationally intensive new machine learning algorithms, provide a unified approach across many different application domains, ranging from imagery to speech to text to scenes. This is a huge change from when I was in grad school. During my PhD in computer vision, the techniques weren't very similar between, say, computer vision and natural language processing, whereas now you can use a single computational framework, whether that's TensorFlow or another framework of your choice, to express many different model architectures that are all built from the same building blocks. And those are the kinds of building blocks that we can now accelerate with this new custom hardware. So how do we manage these profound increases
in compute requirements for the most valuable new machine learning applications? Well, as you can see here, going back to the image recognition challenge on ImageNet, the vertical axis is the accuracy of these models, a selection of interesting models from the past few years, and the horizontal axis is computational cost. As you can see, to get to those highest levels of accuracy, you tend to need more and more computation, both to process each image and also to train these models, even on a data set of fixed size like ImageNet. In some applications you can get away with a lower accuracy, and that's fine, but in other applications, when you think about autonomous driving or medical imaging or others, every additional percentage point of accuracy is literally saving lives. So there's real urgency to get at this space that's currently bounded by the computation we can afford to apply. OpenAI recently published a blog post called "AI and Compute" where they did an analysis similar to this one, but on a log scale: the vertical axis is computational cost on a log scale, plotted over time, and they've highlighted several models that are especially well known, that have provided new capabilities over the past few years, starting with AlexNet, which kicked off this most recent wave of excitement about machine learning. Then you see that even better image classification models, as I showed on the previous slide, require even more compute. But again, it's not just restricted to image recognition: we have speech recognition models like Deep Speech 2 in here, we have machine translation models requiring even more compute to deliver these amazing breakthrough results in neural machine translation, we have the real-time game playing that OpenAI is working on, with more coming this August, and then also world
champion Go-playing programs up here, demanding tremendous amounts of compute for the self-play to learn the game and learn the strategies all on its own. If you step back and look at this picture, what you really see is that to get at these valuable new capabilities, you need to throw as much compute as you can at the problem, and that's what's been driving all this progress over time. OpenAI called out a couple of important insights that I'd like to highlight here. I won't read all of these out; I recommend that you look at the blog post in its entirety. But I'd say that, first of all, a simple rule seems to hold across a wide variety of domains: more compute seems
to lead predictably to better performance, as they say in the blog post. And it's complementary to algorithmic advances. It's great to try new algorithms and new ideas, and as I'll say later, there's still lots of progress to be made that way, but that's complementary to also having an enormous amount of compute to drive accuracy as far as you can go with the best ideas that you've uncovered. And the line right here is the most interesting insight: the number to track is not the speed of your single accelerator or the capacity of your biggest data center, but something in between. It's the amount of compute that you can productively apply to train a single model. This is what the folks at OpenAI believe is most likely to correlate with how powerful the very best models in the field are. And it's really this that's driving us not just to build new chips or new single-machine systems, but these connected supercomputers that we call TPU pods, because those enable us to apply the largest possible amount of compute to train a single model, and they can also be subdivided to train many models at once. So how does this overview relate to your business? There are two perspectives that I think you might find valuable: you can come at this from the point of view of the people that you have in your organization, or that you're hoping to hire, or to acquire and bring in with their companies; and you can also think about it from the perspective of data. I'll go through these in turn. First of all, people. If you look around your organization and you have people who call themselves machine learning engineers or machine learning researchers, it's obvious that they need an enormous amount of compute. They're typically just budget limited: no matter how much compute you have, with all these models that are so compute hungry and these enormous data sets, they'll rapidly find a way to use the compute that they have available, and they'll ask you for more. You're probably already hearing them ask for as much compute as they can get. Also data scientists: we and other teams across the industry are working hard to make it easier and easier to apply these techniques, so even if you're not ready to write a model in raw TensorFlow,
that's okay, because there are more and more open source reference models available that deliver high performance, can easily be adapted, and can fit into a more standard data science workflow, even for folks who don't have a PhD in machine learning; that's not required. Furthermore, over in product development, it's an open question how to take all these new capabilities that are just now being unlocked with machine learning and use them to create new kinds of experiences and services; I think we're just in the early stages of that. At Google I/O this year we showed a demo of Google Duplex, which is a new way of interacting with businesses through the Assistant. That's just one example of what I believe is going to be a whole forest of new opportunities for new products to take these capabilities and integrate them in a natural way into our lives. Once you have those products and they're out in production, your compute needs scale with your number of users. If you have something that's hosted server-side and you have a billion users, you've got to handle billions of requests coming in simultaneously, and that requires a lot of compute. And in between research and production, and even once a model is live, it's really important to do continuous integration and testing. Machine learning models look a lot different from the software that you're probably used to testing: subtle changes in your input data, or in any layer of your compute stack all the way down, can lead to changes in the accuracy of your model. So it's important, and what we do at Google is retrain these models all the time on every new version of our software and test for regressions, passing as much of that as we can along to you through our open source reference models. I'll say more about that later, but you'll need to do this too for the products that you develop and launch.
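To make that retrain-and-test loop concrete, here's a minimal sketch of an accuracy regression gate for a retrained model. This is my own illustration of the idea, not Google's internal tooling, and the baseline and threshold numbers are arbitrary assumptions.

```python
# A minimal accuracy-regression gate for a retrained model (illustrative).
# In a real CI pipeline, the candidate accuracy would come from evaluating
# the retrained model on a fixed held-out set.

BASELINE_ACCURACY = 0.934  # accuracy of the model currently in production (assumed)
MAX_REGRESSION = 0.005     # fail if we lose more than half a percentage point (assumed)

def passes_regression_gate(candidate_accuracy,
                           baseline=BASELINE_ACCURACY,
                           max_regression=MAX_REGRESSION):
    """Return True if the retrained model is allowed to ship."""
    return candidate_accuracy >= baseline - max_regression

# A candidate at 93.1% is within tolerance; one at 92.0% is not.
```

The point is simply that every retrain, whether triggered by new data or a new software version, runs through the same automated check before anything reaches production.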
And I listed enthusiastic interns here because I think it's important to make the compute available not just to the people with the most seniority or the most obvious titles in your organization, like the ones I've listed before. This whole field is moving so fast, and there are so many students being trained both formally and informally, that it's possible that someone who's just joined your company as an intern, who's sufficiently motivated, could deliver an enormous breakthrough if they have access to this level of compute, can work with the data that you have, and can talk to your product teams. So I think it's really important to make this new large-scale compute widely available, both inside your organizations and outside, and that's what we're trying to help all of you do. In the past, the key determinant of progress was typically waiting for your code to compile, right? There's that famous xkcd cartoon about this. Now, in this new world of working with enormous datasets, training these gigantic, accurate ML models, and then serving them, the key determinant of your team's progress is how long they have to wait for machine learning models to train, and for production models this can currently take weeks or even months. Imagine if your code took months to compile. It's really important to pull that in to hours, minutes, even seconds, to make your teams more productive: let them try multiple things at once, get results, and iterate, instead of just waiting for days or weeks or longer. Now, looking at this from the data perspective: the kinds of ideal data sets that I described earlier for cutting-edge ML typically have hundreds of thousands, millions, or even billions of data points in them. And in most, but not all, of these cutting-edge domains, it's important to have high-quality labels, or at least the ability to create them. So typically
with user-generated content, photos, videos, you might have tags that your users are already applying. In other cases, when you're collecting geospatial imagery, you can build a team to label it or take some sort of augmented approach to labeling it, and there are many labeling services coming online now. But there are also all these sensors going out, whether in mobile phones or autonomous vehicles or new kinds of robots or factories or elsewhere, and those are all streaming in data that can be valuable for cutting-edge machine learning. Simulations, too, can be a valuable way to generate data, whether game environments or robotic simulations or other types of simulators. And I call out explicitly here at the bottom: agents and robots interacting with an environment. You're currently seeing this mostly on the research side, but I think it won't be long before you start seeing it in real production systems. But
these are just starting points for you to look around your business at all the data that you have or could easily collect, and to look for data that's related to the prediction problems that matter. Once you find a large data set there, that's a good starting point for figuring out how to take this technology and have an immediate impact on your business. So suppose you have the people and you have the data; what do you do next? Well, I recommend that you access the fastest hardware that you can, ideally in the cloud. And you might say: what's the advantage of going to the cloud? I could just buy some accelerators and plug them in here; we already have a data center, and that's where our data is. Why should I change? Well, the advantages, especially of Google Cloud over on-prem hosting of your own accelerators, are that you get a lot of flexibility. When you're planning for your own data center, there are enormous lead times and capital expenses; it's very complicated to set things up, and these new accelerators are power dense, they're networking dense, and they change a lot of things about the way you need to plan your data center. We announced at Google I/O that our version 3 TPUs are actually liquid cooled, so if you think that's a signal of where the field is going, do you want to bring liquid cooling into your data center? It's complicated. Whereas in the cloud, you can just mix and match in software the components that you need. You can provision quickly and scale up as your needs change, or scale back down once a burst of traffic has passed. There's also built-in security. I think one theme that you've heard throughout this conference is that Google Cloud is the leader in security, and we have powerful tools for data access, user control, and more. I encourage you to consult the other sessions to see how deep Google's commitment is to security in the cloud. And
as I mentioned at the beginning, reducing capital expenses is really fantastic: on-demand pricing and preemptible pricing let you pay only for what you need and scale up and down. So now let's talk about these TPUs, and these TPUs are only in Google Cloud. Our version one is actually not in the cloud; it's been in our data centers since 2015. It was our first foray into the space, when Jeff Dean worried that if every Android user talked to their phone for just three minutes a day, the speech recognition algorithm we were running at the time would have required us to double our number of data centers. That wasn't going to work, and so there was a crash project to develop an accelerator that you now interact with every time you run a search, or use Google Photos, or Street View; a bunch of Google's other services are running either on this or its successors. Now, here is the Cloud TPU. This is an example of Google's second-generation TPUs in a system that's widely available today, no signup required, no waiting. It's now generally available in the United States, in Europe, and in Asia, and you can find out more by going to g.co/cloudtpu. The nice thing about these TPUs is that they aren't just single-machine systems. Like I mentioned before, they're designed to be connected together into these enormous supercomputers that we call TPU pods, and the exciting thing here, which has been mentioned obliquely but I want to make very clear, is that our Cloud TPU pod with these v2 TPUs is now available in alpha, and you'll hear a little bit more later about how people have already been using it to accelerate their businesses. Furthermore, as you may have heard in the keynotes, v3 is also in alpha, so if you're interested in trying out the next generation of performance, contact cloud sales or cloud support and get access to these amazing machines.
Also, we revealed that these v3 TPUs, just like the v2, are designed to be connected together into even larger liquid-cooled supercomputers, which we showed at Google I/O. What I'm trying to show you here is that we're deeply committed to making relentless progress in the performance, cost, and scale of these kinds of accelerators, for both inference and training, and to passing those benefits along to you: to businesses, universities, individuals, and startups all around the world. As you can see, we've gone from 92 teraops, inference only, to 180 teraflops for training and inference with the Cloud TPU v2, and then with v3 we're now going, with the single device that you see pictured here, up to 420 teraflops. We've also doubled the memory, and this is really important: if batch sizes have been a limitation for you, or you want to go to larger models that are difficult to fit on a single device today, now suddenly your device has 420 teraflops and 128 gigabytes of RAM, extremely high bandwidth RAM actually. That's in alpha; check it out. Looking at the pods: the pod that's available today offers 11.5 petaflops, 4 terabytes of RAM, and a mesh network that makes for really efficient communication without any code changes on your part. And the TPU v3 pod that we announced at I/O goes above a hundred petaflops, with 32 terabytes of RAM. This is not just sticking a lot more of the same machines together; it's a whole new chip architecture, in addition to this larger-scale system, that really delivers these results. Furthermore, I want to call out that as of yesterday, here at Cloud Next, Cloud
TPU pricing has gotten even better. We've introduced promotional pricing across all of our regions that takes the price for renting a Cloud TPU in us-central1, for example, down to four dollars and fifty cents an hour. And the amazing thing is, if you can tolerate checkpointing and occasionally having to recover your work if it gets interrupted, which is an easy thing to do in machine learning, you can use preemptible TPUs for just a dollar thirty-five an hour. This is an unprecedented step to make machine learning more affordable for everyone, and we hope it helps you both get started with our new platform comfortably and, for those of you in large organizations, scale up way beyond anything you're able to contemplate now.
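To make that pricing tradeoff concrete, here's a rough back-of-the-envelope sketch, my own illustration rather than anything from the talk, comparing on-demand and preemptible training cost. The hourly prices are the ones just mentioned; the 10% overhead for checkpointing and redoing work lost to preemptions is an assumed figure.

```python
# Rough cost comparison for on-demand vs. preemptible Cloud TPU training.
# Hourly prices are the us-central1 figures from the talk; the 10%
# checkpoint-and-recovery overhead is an illustrative assumption.

ON_DEMAND_PER_HOUR = 4.50    # promotional on-demand price
PREEMPTIBLE_PER_HOUR = 1.35  # preemptible price

def training_cost(hours, preemptible=False, recovery_overhead=0.10):
    """Estimated dollars for `hours` of useful training compute.

    Preemptible runs pay for extra wall-clock time spent checkpointing
    and re-doing work lost to preemptions (`recovery_overhead`).
    """
    if preemptible:
        return hours * (1 + recovery_overhead) * PREEMPTIBLE_PER_HOUR
    return hours * ON_DEMAND_PER_HOUR

# A 10-hour job: about $45 on demand vs. roughly $15 preemptible,
# a ~3x saving even after paying for recovery overhead.
```

The break-even point depends on how often you actually get preempted, but with checkpoint/restore built into your training loop the preemptible price is usually the better deal for long jobs.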
So what's the user experience for these Cloud TPUs? Today, the best way to program Cloud TPUs is with TensorFlow. TensorFlow is a wide and vast open source framework for machine learning; it's the most popular in the world, and on GitHub it's one of the top five projects overall. To program TPUs for the highest performance, you'll typically use a subset of the TensorFlow APIs: tf.data to manage your input processing, layers, estimators, and then you go through the XLA compiler, over there in the lower right. And by the way, these APIs are not specific to TPUs. If you implement your models this way, our intention over time is for it to require less and less change for you to just flip a configuration flag and run your code on TPU, or on GPU, or on CPU. We're really trying to get to the point where you can write your machine learning models once, transparently take advantage of whatever hardware you have available, and deploy on whatever hardware you want out at the edge, whether that's in a phone, in a vehicle, or in some other edge application. Focusing on layers and estimators for a second, though: here's some sample code for a typical computer vision model, unmodified for TPU, and then here's an example of the types of modifications you have to make today (and again, we're trying to reduce these over time); just a couple of lines have changed, around the optimizer and the estimator. I want to emphasize, though, that you're not limited to TensorFlow for programming TPUs. A lot of people think: what if I use some other framework, or what if I want the flexibility to try multiple ways of interacting with these accelerators? That's just fine. In fact, as of TensorFlow 1.10, we
announced here at Cloud Next that we have initial Keras integration with Cloud TPUs. This gives you even more flexibility and convenience, and it gives you an incremental path for taking the Keras code that you have today, running it on the TPU, and then gradually optimizing the performance, maybe of the input pipeline or other parts, to unlock more and more of the performance that these Cloud TPUs can deliver. If you're choosing between one or the other, the way to think about it is: on the left, TPUEstimator gives you peak performance. It scales to pods with zero code changes, which is truly amazing; if you've ever tried to write a distributed system, this is a fantastic convenience. It's robust, and you can easily export models for inference on CPUs, GPUs, or TPUs. Keras has a complementary set of advantages: easy development, non-static shapes, NumPy input, and experimental tf.data integration. So I definitely recommend trying them both and seeing which one better suits your use case. Cloud TPUs are integrated tightly with TensorBoard, which lets you track what your model is doing, and we have this fantastic profiler tab that lets you take captured TPU profiles and examine them in enormous detail. People have given us tremendous positive feedback on how valuable this is for understanding what their models are doing on this new hardware; it's something that people using TPUs today really love. For example, we even have a frame here that calls out in red (I don't know if you can read it) that your program is not input-bound, so that's a great time to optimize the mapping of your model onto the hardware. In other cases it might say you are input-bound: don't worry about the model, optimize your input pipeline, because you're not able to feed data fast enough to keep up with these accelerators. I'd
also like to emphasize that Cloud TPUs are different in another way that gives you more flexibility: they're network-attached, rather than being directly attached to your virtual machines. What that means is that you, the customer, here on the left (this particular customer is here in the room), are connected to an arbitrary Compute Engine VM in the center there. That VM can be whatever you want, large or small, and it can actually be very small, because all the heavy compute is happening behind the scenes on the Cloud TPU; it can be an n1-standard-2, since you don't need a big VM to drive it. Then, communicating over the network with this Cloud TPU, ordinary open-source TensorFlow is able to drive your computation, reading in data from GCS or elsewhere. Now we've got Cloud Bigtable integrations, gRPC integrations, and other ways of getting data into these devices. But
also, that Cloud TPU doesn't have to be just a single device; it can be a whole slice of a pod. This extra network step means you don't have to fight with drivers anymore: you just use the machine images that we provide, and you can scale up and down easily from an infrastructure point of view as well as from a code point of view. Let me say a little bit more about scaling up on these pods. From a samples-per-second point of view, these pods are designed to scale, and so you see perfect linear scaling behavior, in this case with ResNet a while back, in how many images were processed per second as we scale up the number of devices connected with our special, superfast network. I just want to emphasize again: this means distributed training of machine learning models can finally be easy. There are fantastic contributions outside of TensorFlow, like Horovod and other projects, and if you have a bespoke cluster with InfiniBand and all the expertise to operate it, you can achieve fantastic results, but this is, I think, the easiest way to achieve this level of scale in any public cloud today. Furthermore, you might wonder: if I make the batch size that large, is my model still going to converge? The answer is, it depends, but the research is looking very promising; there's very active work on this right now. What you're seeing here, on the time axis: the orange dot is a single device training a model to a certain accuracy, and then you can see that as we've tuned the hyperparameters and gone to a larger and larger slice of the pod, you can still hit that same accuracy, but in much less time. I'm thrilled that eBay, who's been using these Cloud TPU pods, has already been able to take advantage of this benefit for their visual search models, achieving enormous speedups on much larger data sets than ImageNet, and one
of our colleagues from eBay is here in the audience today. The challenge is visual search, which is an enormous part of the eBay experience: customers all over the world are listing more than 1 billion products and need to be able to find what they want when they want it. But even with a training image set that's a subset of that, 55 million images, training a model with on-premise hardware was taking months, and it was impossible to iterate and try out new things as quickly as the computer vision team wanted to. So to scale further, eBay switched to these Cloud TPU pods that are now available to everybody in alpha, with the goal of delivering a new image classification model optimized for TPUs and then finding the right hyperparameter settings for their extremely large data set. I'm happy to report that the results have been fantastic, and the progress is continuing. In some cases it's hard to do an apples-to-apples comparison between something that was trained over months and months and something that's trained in just a few days, but in some cases the speedup was almost a hundred x over the hardware system they had been using previously. And when you pick the best model of the ones that were tested, the accuracy boost was 10%, which is enormous for a production model. What this means is that eBay is now closer to continuous machine learning, training multiple models each week to keep product search results fresh, pulling in new advances from cutting-edge research that's being published in the open all the time. We think this is a really promising step toward reinventing the way that eBay's customers visually shop and experience the site. So let me say a little bit more about performance, since I know that's something that's interesting to all of you who are pushing up against the limits of whatever setup you currently have. It's important to measure performance carefully. Often you'll see
isolated statistics thrown around on the internet out of context: this many samples per second, or this many times faster, or what have you. It's really ideal, though not always possible, to focus on real-world data, on time to accuracy, and on cost, in an apples-to-apples comparison across two different systems. When I say real-world data, what I mean is that often you'll see results achieved with synthetic data, which just means random numbers fed through the device to see how fast it can run. That's like lifting a car off the pavement, flooring the accelerator, and measuring how fast the wheels spin: it's cool, it makes a loud noise, but what you really care about is the whole-system performance of a real data set flowing into your system without any bottlenecks and being processed to get the results that you want, or to serve the models that matter to your customers. Also,
you'll often hear people report results without saying anything about convergence to the expected accuracy, but ultimately that's what matters to you in your business. If you're building a pedestrian detector for autonomous driving, it really is important how accurate that pedestrian detection model is, and even a loss of a percentage point is something that you take very seriously and want to avoid. Finally, we're trying to move away from chip-to-chip or system-to-system comparisons, because they're going to be increasingly meaningless given this Cambrian explosion of new hardware architectures to deal with the end of Moore's law. Instead, we're trying to look at the total cost of one system versus another in the public clouds that are accessible to everybody. We're also investing a lot of energy in making ML benchmarks reproducible via open source implementations, so you don't have to trust my word for it: you can take anything that you're about to see, run it yourselves on Cloud TPUs, and see if you measure the same results. The best and most carefully designed contest we've seen so far has been organized by some Stanford researchers. It's called DAWNBench, and they really focused on the metrics I just mentioned, time to accuracy and total cost, especially on ImageNet but also on CIFAR-10 and a question-answering data set. I'm happy to report that Cloud TPUs came in number one in DAWNBench for ImageNet training cost, hitting the target accuracy in just 7.5 hours for under $50. But the field is moving so fast (this was earlier this year) that already, with the new promotional pricing, that comes down to 34 dollars, and with preemptibles it's just over $10, which is kind of amazing. But wait, there's more. If we look at the ResNet results here, which at that time were achieving something like three thousand two hundred fifty images a second, and hitting
the accuracy that you expect in just under nine hours, that cost just under $60. Now that's forty dollars running the exact same code, or almost under $13 at preemptible prices. The pod, as I promised, scales up linearly, so a half pod was able to get the job done not in hours but in 30 minutes, and actually that includes a lot of overhead that doesn't really make sense at this scale, because you end up having to checkpoint every few seconds as you're getting through the DAWNBench challenge, which you wouldn't do in real life. If you eliminate the checkpoint overhead, that's just twenty-three point nine minutes to get to this level of accuracy. That's really what I mean about enabling teams to iterate faster than they ever could before. [Audience question] Good question; talk to sales. Pricing is not announced for alpha, but I highly encourage you to talk to our sales team to get a sense of what this will cost. Furthermore, I'd like to mention that at DAWNBench there was a third-place entry that was very clever, from a team called fast.ai. They had an entry at $72 at the time that used two clever tricks: one was progressively scaling up the image size during training, because it turns out you don't need the high resolution at the very beginning, and they also used a more aggressive learning rate schedule. After the contest, we reproduced these algorithmic changes on the Cloud TPU and open-sourced the changes, and that takes ResNet-50 training cost on Cloud TPU from the new $40 figure down to $17, which is affordable even with regular on-demand pricing. With preemptible pricing the cost is now down to just $5. So we're obviously trying to drive this as close to zero as we can, because we know that you need to train these models over and over again to make progress, and you want to train a lot of them in parallel to do a gigantic hyperparameter search or neural architecture search on every
new product that you're going to release. So, to experience this performance, like I mentioned before, Cloud TPUs are now GA and widely available. Start
with our quickstart or our tutorials, use one of our reference models, and hopefully this will be helpful to you. So where do we go from DAWNBench? Well, there's a new benchmark contest in town that's even more comprehensive and challenging. It's organized across many different institutions, companies, universities, and you can find more information at MLPerf, that's mlperf.org. I think this is going to be really exciting, with the results for version 0.5 of the benchmark coming late this October. So how do you start using these Cloud TPUs or TPU Pods today? Well, I recommend that you start with one of our reference models, and I want to emphasize that we have a wide range of these reference models that are great starting points. Even if you're heavily invested in some model in-house, it's useful to step back and say, wait, what is the underlying task here, whether it's object detection or speech recognition or language processing or something else, and see if we have a reference model that corresponds to that task. Because then, just as a baseline, you can take your data, connect it to our reference model, and here's a whole range of our reference models, and immediately get a result in just a few days, maybe even an afternoon, before you think about the more intensive process of taking your existing model, porting it over to a new system, and optimizing it. So as you can see here, we've got image recognition, object detection, machine translation, language modeling, speech recognition, and even image generation, some of this cutting-edge stuff over on the right, and we're adding new models all the time. I'll highlight a few of these and provide some new performance numbers and sort of a mini case study on object detection. So you've heard a lot about image classification. I'd like to just call out that our AmoebaNet-D model that won DAWNBench is actually
an architecture that was discovered through neural architecture search on the TPU. So it's a model that was designed from scratch on the TPU, made available open source for all of you, and it achieves fantastic accuracy on ImageNet and also on larger datasets. Object detection is really interesting, because here you're processing larger images and trying to localize many different objects within them. Back at Google I/O in May, I showed this result, but I've updated the prices here for our new promotional pricing. You're processing 896 by 896 images, and at that time we achieved 37 average precision on COCO, which is pretty good, in just six hours, and now that would cost you only $28, or just $9 with preemptibles. But with TensorFlow 1.9, we've made even greater advances in efficiency, and that means if you're willing to let this run just a little bit longer, you can get to 37.7 AP on COCO, so that's up from 37, which is pretty cool, and it's not that much more expensive. But now, this is just without any code changes, going from the TPU v2 to the TPU v3, because remember, this is a family of processors. We want to make it as easy as possible for you to scale up, whether it's from one device up to a pod or from one generation to the next. So here, RetinaNet with TensorFlow 1.9 can achieve that same 37.7 average precision on COCO in just 4.1 hours, so it's 1.5x faster with no code changes, and we've barely begun to optimize for the new platform. What's even more interesting than that is what happens when you have additional memory available, because the memory doubles from v2 to v3. So suddenly, without a lot of engineering effort, you can process 1024 by 1024 images, which I know is something that's really interesting to a lot of autonomous driving companies or medical imaging companies that want to process the largest images they can. Suddenly
you're up to 38.4 average precision, remember, up from 37 originally, in just 5.2 hours. And now that you're able to iterate at this larger scale and higher speed, you try out some data augmentation techniques, and that gets you to 41 average precision in just 10 hours, letting it run a little longer. So this is a great concrete example of the general principle I introduced before: the faster hardware, and in this case also the larger memory, gives
you the opportunity for more experimentation and even higher accuracy for the models that matter most to your business. Furthermore, I'm thrilled to announce that image segmentation is coming soon. We've got the DeepLab v3 model coming, Mask R-CNN, which I know is a popular model out there, and more, for those of you who don't just want bounding boxes but want to know, pixel by pixel, where each object is in an image. I'll just emphasize that TPUs are not only about computer vision. Here's machine translation with Transformer, achieving a near state-of-the-art BLEU score on WMT'14 English-German: 6.2 hours, under $30, under $10 with preemptibles. It's fantastic. Language modeling is more compute-intensive to train, on the Language Model 1 Billion Word dataset, but this is the kind of thing that lets you do things like Smart Reply or Smart Compose. That's helpful for messaging apps and any kind of conversational interaction where you want to save people time by predicting what's next. And this is an example of one of several models that's in Tensor2Tensor; there's a link down here at the bottom, the 2 is a numeral and it's all one word. Tensor2Tensor is built on top of TensorFlow; it's great for research, mixing and matching datasets, model architectures, training processes, and you'll see a few others like this one also there. So, speech recognition: here we've got an ASR Transformer giving you a great word error rate in just 13 hours, a 7.9 word error rate, and you can get down to 7.3 if you're willing to let it train a little longer, and again, with preemptible pricing, just under 20 bucks. Finally, well, almost finally, question answering. This was the winner of the DAWNBench question-answering contest, and the
team has continued to make improvements, such that this now trains in just eight minutes. I've written estimated training cost here, because this is now so fast that we may have to rethink the other overheads to see how to measure this cost accurately; it's too small to even register. And that question-answering model now has an open-source implementation available, which is great. Also at the cutting edge of research: image generation. Bits per dimension is the key metric here, and lower is actually better, so this 2.98 is close to the 2.92 state-of-the-art, for just $138, or $44 with preemptible pricing. You can generate these images from scratch that look weirdly like real life, but not quite, so I think this is a really interesting research domain. So please consider starting with a reference model. Even if you love the model that you've already built, even if you've sunk an enormous amount of engineering effort into it, it's really valuable to just lay down a quick baseline, with a public dataset or with one of your in-house datasets, using these reference models to then guide your engineering choices, whether it's porting your existing model, maybe merging it with some reference code, or starting with a reference model fresh and then building from there. These reference models are high-performance, they're open source, they're cutting edge; some of the latest and greatest research, like AmoebaNet, is immediately pushed into this pool of reference models that are open source. We're doing a lot of continuous testing for performance and accuracy to save you as much of that trouble as we can, because we know it's complicated and expensive, and these reference models get you up and running quickly, training on your own data. But I want to emphasize, you're not limited to reference models. For
example, here's another case study of architecture search that a group at Stanford did, where they actually did parallel runs using hundreds of Cloud TPUs simultaneously. Each blue dot in this image is a run training an ImageNet-scale convolutional network on a TPU, so this is a lot of computation here. They were searching for a model; there's an arXiv link in the lower right. Task-Driven Convolutional
Recurrent Models of the Visual System is the paper, and they were looking for a model that was a better fit for the types of dynamics that you record if you put electrodes in the brain and actually listen to the primate visual cortex as it's seeing. So here's a diagram of the sort of space that they were searching, and it turns out that across this population of many different models, they found that these red connections were selected for during the search versus the others. And then, when they went back and tried to fit these models to some of the signals that they were actually recording in primates, they found that these convolutional RNNs were a much better fit for the neural signals in V4 and IT than the ordinary convolutional feed-forward models that you typically see in the literature today. So I think this is an interesting new research direction, and it's a group that was able to do this from scratch on their own, using hundreds of Cloud TPUs. So you can not only implement your own models, you can search for models automatically using Cloud TPUs. I wanted to share a little bit more information about the hardware, to give you a flavor of what makes these accelerators different from other types of processors that are available in the market. So the Cloud TPU that I mentioned before consists of this device here, which under those heat sinks has four separate processors, and overall 64 gigabytes of high-bandwidth memory, with a very high memory bandwidth among the processors. And it's connected to a host server, like you see here, with PCIe v3. So let's zoom in on just one of these chips in the Cloud TPU device. Well, that chip has a layout like this. It's got two cores, and it's
Kind of scale your unit it's got a vector unit and most importantly, it's, got this matrix, unit which is this gigantic systolic, array that's, an old idea and hardware architecture but its newly relevant, now that so many of these cutting-edge models use, dense linear algebra it's, really the perfect fit for, the systolic array. One. Interesting thing about this, is all the gray is that. While it's doing float32 accumulate. Like you're used to it's, actually using a different, filing point format behind the scenes with no code changes for you necessary, although you can take, further advantage of be float16 if you want so. It's using be float16 and the multiplies and, let. Me tell you a little bit more about what that means because I think this is one of the unique advantages of, the cloud CPU so.
Float32, IEEE floating point, has this enormous range, and those 32 bits are divided as you see here, with the exponent and then the mantissa, lots of bits for the mantissa to track these fine differences between numbers. That's important for high-performance computing, where you're tracking fine details in a simulation, but it turns out it doesn't matter that much in machine learning. So some vendors have gone to IEEE float16, which you can see here, which reduces both the exponent and the mantissa, and what that means is you end up reducing the range. And now, if you're programming with float16, you have to completely rewrite your model in a different way on a model-by-model basis. It's very time-consuming and subtle, because you have to keep all of your activations in the right ranges, and you might have to do loss rescaling. We're trying in TensorFlow to make this more convenient, but it's really hard. So we actually use a different floating-point format in the TPU that we call bfloat16, for the Brain team, which is where we started this, and there we preserve the 8 bits of exponent and just chop off the mantissa. It turns out this is almost a drop-in replacement for float32 across this huge range of machine learning models that we see. You only have to be careful in a few small situations; maybe Adam has a parameter that you've got to be careful to explicitly cast to float32 because there are so many nines after the decimal point. But in general, it's much easier to work with bfloat16 than with float16, and then that, you know, reduces your memory usage and avoids memory bottlenecks. It's fantastic. And if you want to see how bfloat16 works, a lot of our reference models are already using it transparently, so you can see some examples in the GitHub tensorflow/tpu repository.
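To see what "preserve the 8 bits of exponent and chop off the mantissa" means in practice, here's a small stdlib-only sketch. The helper name and the round-to-nearest-even detail are my own illustration, not the TPU's exact hardware behavior; the idea is simply that bfloat16 is the top 16 bits of a float32.

```python
import struct

def to_bfloat16(x):
    """Emulate bfloat16 rounding: keep float32's 8-bit exponent and the
    top 7 mantissa bits, rounding the discarded bits to nearest
    (ties to even), then expand back to a Python float."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack('>f', struct.pack('>I', bits))[0]

# Range is preserved: 1e38 survives, where IEEE float16 (max ~6.5e4)
# would overflow to infinity.
print(to_bfloat16(1e38))   # still roughly 1e38, finite
# Precision is coarse, about 2-3 significant decimal digits:
print(to_bfloat16(1.001))  # rounds back to 1.0
```

That coarse precision is why the occasional explicit float32 cast (the Adam-parameter case mentioned above) is still needed, while the float32-sized exponent is why no loss rescaling is.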
Just to illustrate briefly how this matrix unit actually does its magic, here's a 3x3 example, but imagine this is 128 by 128, which is what we really have on the device. What's happening as the data streams in is that we're getting a lot of great reuse of these intermediate results within the systolic array, in this staggered fashion that's producing the output that we want. So as long as your problem has some dense linear algebra core that can be mapped efficiently to these systolic arrays, which we found is possible across the wide range of applications that I called out earlier, this is going to be a great fit for your machine learning, both training and also inference. Stepping back for just a second, we've covered a lot. I just want to emphasize: Cloud TPU v2 is now GA, it's widely available, it's in the U.S., it's in Asia, it's in Europe. We've got regular pricing, we've got preemptible pricing. It's an incredibly affordable way to start with one of our state-of-the-art reference models and explore this frontier if you haven't yet. Or if you're one of these folks, part of or leading a team of these cutting-edge machine learning engineers and researchers, they'll thank you for having access to orders of magnitude more compute than you could possibly have afforded before. We've also gone through how to measure performance carefully, and looked at some of these applications across all these different domains, not just image recognition but also speech, language,
image generation. And one other thing I'd like to highlight here is that I've focused on using Cloud TPUs in Google Compute Engine, but that's just the lowest level of the stack, where we're giving you maximum flexibility and control. We're also working to integrate Cloud TPUs with all of the rest of Google Cloud: on the storage side, which I haven't written here, with GCS, but also with Bigtable and with our gRPC services; and then here with Cloud ML Engine, if you want to use a higher-level managed service to keep track of all your training jobs; and Kubernetes, GKE. There's a really interesting talk tomorrow on Minigo, training an AlphaGo Zero-like model on hundreds of Cloud TPUs using GKE. It's going to be a fantastic talk, so check it out, or look for the recording afterwards. So we're really introducing a new platform for machine learning. It works well with TensorFlow but also with Keras, and we're looking at other frameworks as well, and we hope it'll be valuable to you as you're developing cutting-edge new machine-learning-enabled applications for your customers all around the world. So thanks very much.