Minigo: Building a Go AI with Kubernetes and TensorFlow (Cloud Next '18)

All right. Hi, everyone, thanks for joining me. I hope you're all having a good week. This is going to be a talk about Minigo, which is kind of the story of a 20% project that has gotten a little out of control. Goodness, there's a lot of you out there, so thanks, all, for coming.

Let's start with the question of what Minigo is. Minigo is an open source, independent implementation of the AlphaGo Zero algorithm. Can I get a quick show of hands of everybody who's heard of AlphaGo? That is a lot of hands, that's great. The AlphaGo Zero algorithm was published in Nature, and AlphaGo was the first Go AI to beat a professional human player.

In this talk we're going to cover what Minigo is, what Go itself is, and a quick little introduction to machine learning. Because this is a cloud convention, I don't want to assume a machine learning background, so we're going to do a whirlwind tour of machine learning and how it applies to Minigo. But what we're really here to talk about is how we were able to bring this to scale on Google Cloud Platform. Then I'm going to do a demo and talk a little bit about the things that we learned, and hopefully this will be useful.

To start: what is Minigo, and where can you find it? It's on GitHub, under the TensorFlow organization, at github.com/tensorflow/minigo. Feel free to check it out and send pull requests. Again, this is not an official Google project. I need to emphasize right here at the beginning that Minigo is not AlphaGo. We are not affiliated or associated with DeepMind, and I've coded this entirely from their published work. I don't have access to their source code, and I haven't used their source code to verify it.

So why did I do this at all? If DeepMind has created AlphaGo, why did I need to make Minigo? Well, for starters, I'm a Go player. In addition to being a software engineer at Google, I also volunteer for the American Go Association, where I serve on the board of directors and have worked with the association for almost a dozen years now. I've been playing Go for a substantial portion of my life, and I love it. And AlphaGo is amazing. Imagine that an alien intelligence has been created and it happens to speak your language: that's how it felt for Go players. We were really excited about AlphaGo, but of course AlphaGo retired, so I felt like maybe I should take a shot at trying to recreate it. There are a lot of things that I think we as Go players can do with and learn from AlphaGo, so there's a rich vein here to explore and to mine.

Our goals for Minigo were to reproduce the results and provide a clear, legible example of the program. If you do check it out on GitHub, and I hope that you do, it's only about 2,000 lines of Python, which is pretty cool for a world-changing algorithm. I also hope that this can showcase what the power of Google Cloud can do, because this is basically a couple of engineers in their 20% time and their spare time trying to implement this, and the only way we were going to be able to do that was by using the longest lever arm we could find. "Leverage" as a verb kind of means a different thing, but quite literally here I mean it as a force multiplier for what is a very small software engineering team.

So where did Minigo come from? AlphaGo has been described by DeepMind in three papers. The first paper, "Mastering the game of Go with deep neural networks and tree search," describes AlphaGo and was published in Nature. My friend Brian Lee wrote something called MuGo, or "micro Go," which was an implementation of that first paper. It was a skeleton implementation: it implemented one half of the network but not the other half, and was sort of a proof of concept.

After AlphaGo, their second paper introduced AlphaGo Zero. It describes an algorithm that started from nothing, from random noise, and then went on to teach itself how to play Go. So, mixing the AlphaGo Zero paper with my friend Brian's implementation of MuGo, we get Minigo. If you're curious about why Minigo's logo is a happy-looking robot falling off of a ladder, all will become clear: I will explain why he looks completely at ease with falling off a ladder.

All right, let's talk about what the game of Go is. How many people here have played Go? Wow, that's a lot of people. How many people like to play Go? Yes, that's exactly what I like to hear. I love Go, and I've been playing it for a long time. So we're going to do a quick demo of what Go is. This is what it looks like at the beginning. People take turns putting stones down on the board, trying to surround territory. Capturing looks like this: when stones are completely surrounded, they get taken off the board. That also works on larger chains of stones, so you can see that groups that are connected orthogonally share their fate; they hang or stand together, as it were.

The winner is decided as you try to divide up the board: whoever has more territory wins. That's it. It's like you're drawing lines on a map and carving it up: I get this, you get that. You can see in the diagram on the right that White has surrounded all the triangle points and Black has surrounded all the square points. This means that Go is not really an absolutist game where you have to capture the enemy king or completely destroy your enemy. It's more like you're negotiating an agreement where you just want to get a little bit more than the other person.
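
As a rough illustration of the capture rule described above (orthogonally connected stones form a chain, and a chain with no empty neighboring points is removed), here is a minimal sketch in Python. It is not Minigo's code: the `Board` representation and the function names are invented for this example, and it deliberately ignores ko and suicide rules.

```python
from typing import Dict, Set, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]  # occupied points only, mapped to "B" or "W"


def neighbors(p: Point, size: int):
    """Yield the orthogonal neighbors of p that lie on the board."""
    r, c = p
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def group_and_liberties(board: Board, start: Point, size: int):
    """Flood-fill the chain containing `start`; return (chain, empty neighbors)."""
    color = board[start]
    group, liberties, frontier = {start}, set(), [start]
    while frontier:
        p = frontier.pop()
        for n in neighbors(p, size):
            if n not in board:
                liberties.add(n)
            elif board[n] == color and n not in group:
                group.add(n)
                frontier.append(n)
    return group, liberties


def play(board: Board, move: Point, color: str, size: int) -> Set[Point]:
    """Place a stone, then remove opposing chains that have no liberties left.

    Returns the set of captured points. Ko and suicide are not checked.
    """
    board[move] = color
    captured: Set[Point] = set()
    for n in neighbors(move, size):
        if n in board and board[n] != color:
            group, liberties = group_and_liberties(board, n, size)
            if not liberties:
                captured |= group
    for p in captured:
        del board[p]
    return captured


# A two-stone white chain with a single liberty left at (2, 2).
board = {(1, 1): "W", (1, 2): "W",
         (0, 1): "B", (0, 2): "B", (1, 0): "B", (1, 3): "B", (2, 1): "B"}
captured = play(board, (2, 2), "B", size=5)
```

The two white stones share their fate here: filling the last liberty removes both at once, which is the "hang or stand together" behavior from the demo.
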
This is an example of what a whole game might look like. This is Minigo in action: you can see it sketching out territory, starting in the corners. And this pattern that you see happening right there is called a ladder. Ladders are an interesting example of why Go is hard. This is a fairly straightforward pattern; you can see it develop again. It's a really obvious pattern, one that toddlers can probably follow and predict.

But it's an example where Go has a very long horizon effect: the result of that ladder could decide the outcome of the game, and it may require looking 80 to 90 moves ahead. So why is that hard? A ladder is a great example where, even with a branching factor of only two moves at each step, you're already looking at two to the 80th possible positions. Go's very high branching factor makes things really difficult. The games are really long. The end condition is really hard to describe: in chess, when the king is checkmated, the game is over, and everybody can see and agree that the game has ended. In Go, the game is only over when both players agree that there's nothing left on the board worth contesting. That's pretty difficult if you're trying to teach a computer when the game is over; in fact, just knowing that it is time to score the board is a really hard problem. And then lastly, and this is possibly the most important part, it's really hard in Go to determine who's winning in the middle of the game.

So this problem of who's winning, who's going to win, is one of the hardest problems we have, alongside dealing with the branching factor, the fact that there are so many possible moves at every point. Those are the really difficult problems that we need to solve.

So let's take a step back and talk about how we're going to approach this with machine learning. This is five slides about machine learning; obviously this is not an exhaustive explanation. I should also mention, and this is probably a good time, that none of us on the Minigo team have PhDs, so there are probably people in this audience who understand this better than I do. But bear with me; I hope that this will be enlightening for folks who have maybe not done any machine learning before.
So, real quick: inference for neural networks. The basic idea is that we put in an input and get out an output, and that thing in the middle is the model that we talk about. We don't really want to worry too much about what it is, except that we need to know a couple of things about it. The first is that it's a bunch of math that is differentiable, or close enough to differentiable. The second is that it's really slow, as in on the order of milliseconds to evaluate. Why does it matter that it takes milliseconds? That seems fast. Well, because you may need to do thousands of those before you can decide on a move to play. So we're going to consider neural networks for inference, inference meaning this forward path where we start at our input and get out our decision. That's what inference is; it's also called a forward pass. We need to know that it's slow and that it's differentiable.

All right, so how do we create that model, that thing in the middle? What we're going to do is try to quantify the error, where we have inputs that we know and outputs that we should have gotten.

We're going to take a look at the difference between what the model gave us and what we should have gotten, and try to change the model. The model is differentiable, so we know which direction we need to push the different values in that model to make the error smaller. That's basically machine learning in a nutshell, or at least the stochastic gradient descent form of machine learning. It's a pile of linear algebra, and we're going to try to tweak it. We can repeat this process, making the error slightly smaller each time, until we run out of data. That model in the middle could be a function of many millions of variables, so we're essentially trying to minimize the loss of million-variable functions. That's pretty complicated, but luckily there are pretty good abstractions for doing all this stuff.

So let's keep going. What is inference for Minigo, and how are we going to use it to solve this problem? Inference for Minigo means two questions: what move should we play, and who do we think is going to win? The board that you see on the left is our input. We ask Minigo for a probability distribution over where it thinks the next move is going to be, and we also ask it for a number that expresses who it thinks is going to win. Minigo uses negative 1 if White is going to win and positive 1 if Black is going to win; 0 means too close to call, perfectly balanced; and anywhere in between, from negative 1 to 1, quantifies who it thinks is going to win.

Given those two outputs, what move we think is going to be played and who we think is going to win, we do something called Monte Carlo tree search. My friend Brian Lee, who I mentioned wrote MuGo, did a great talk at PyCon 2018 called "A Deep Dive into Monte Carlo Tree Search," with code. Our code is also on GitHub. That Monte Carlo tree search deep dive is really great, super easy to read, and highly recommended.
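
To make the two outputs concrete, a probability distribution over moves plus a single number between negative 1 and 1, here is a minimal sketch of a forward pass. Everything in it is an assumption made for illustration: the `ToyPolicyValueNet` class is a stand-in with one random linear layer per output rather than a real network, the extra "pass" move follows the AlphaGo Zero paper, and squashing the value with `tanh` is one common way to keep it in that range.

```python
import math
import random

N = 9  # the small 9x9 board


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicyValueNet:
    """Stand-in for the real model: one random linear layer per output head."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        n_inputs = N * N
        n_moves = N * N + 1  # every board point plus "pass" (an assumption)
        self.w_policy = [[rng.gauss(0, 0.1) for _ in range(n_inputs)]
                         for _ in range(n_moves)]
        self.w_value = [rng.gauss(0, 0.1) for _ in range(n_inputs)]

    def infer(self, board):
        """board: flat list of N*N numbers, +1 black, -1 white, 0 empty."""
        logits = [sum(w * x for w, x in zip(row, board)) for row in self.w_policy]
        policy = softmax(logits)  # where the next move is likely to be
        value = math.tanh(sum(w * x for w, x in zip(self.w_value, board)))
        return policy, value      # value: +1 black wins, -1 white wins


net = ToyPolicyValueNet()
board = [0] * (N * N)
board[40] = 1  # a single black stone in the center
policy, value = net.infer(board)
```

A real network replaces the two linear layers with many more, but the shape of the answer is the same: one probability per candidate move, and one scalar for who is ahead.
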
By doing this self-play with Monte Carlo tree search, we're able to quantify our error. We asked Minigo what move to play and who was ahead, and now we can define our error as the sum of the differences between the original estimate for what move to play and the moves that were actually explored by tree search. We use the estimate of who's going to win to decide whether we continue to explore a move: we'll look at this move, then we'll look at the next most likely one under it, and depending on whether it's better or worse, maybe we won't explore that parent move any further. Does that make sense? Kind of, maybe? Can we just acknowledge that this can happen, and we can dig into the details at some later point.

All right, so that's Monte Carlo tree search: which move, and who's ahead. Those two components are called the policy and the value outputs of the neural network. Policy is: given a board state, where do I think the next play is going to be? Value is: how do I value this board?

Let's do the quick recap. That was our fifth slide, so this is our summary slide; this is how I get six slides. This is the reinforcement learning recap. The lightning version of reinforcement learning is: use our data to make models, then use our models to make more data. Now we never run out of data, and we can keep doing this as much as we want. What's the model? The model is that pile of math that we had in the middle. We want to be able to measure the loss by comparing its output with answers that we know, or with answers that we think we know, and we train a model by minimizing that error. Minigo uses the policy and value to do tree search to try to refine its original estimates. Does that make sense? Maybe? Somebody? Yeah, no? OK, well, we're just going to move on.
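
The error described above can be written down in a few lines. The sketch below follows the form given in the AlphaGo Zero paper, a cross-entropy between the network's policy and the tree search's visit distribution plus a squared error on the value, and leaves out the paper's weight regularization term. Treating this as exactly what Minigo computes is an assumption, and the function names are invented for the example.

```python
import math


def visits_to_probs(visit_counts):
    """Normalize tree-search visit counts into a target distribution."""
    total = sum(visit_counts)
    return [n / total for n in visit_counts]


def policy_value_loss(policy, value, search_probs, outcome, eps=1e-12):
    """Error between the network's first guess and what search and play revealed.

    policy       -- network move probabilities
    value        -- network estimate in [-1, 1]
    search_probs -- where tree search actually spent its readouts
    outcome      -- final game result, +1 (black won) or -1 (white won)
    """
    policy_loss = -sum(t * math.log(p + eps) for t, p in zip(search_probs, policy))
    value_loss = (outcome - value) ** 2
    return policy_loss + value_loss


# Three candidate moves: the network liked the first, search preferred the second.
search_probs = visits_to_probs([10, 80, 10])
loss_before = policy_value_loss([0.5, 0.3, 0.2], 0.0, search_probs, outcome=1.0)
loss_after = policy_value_loss(search_probs, 1.0, search_probs, outcome=1.0)
```

Training pushes the model from something like `loss_before` toward `loss_after`: the policy moves toward what search found, and the value moves toward who actually won.
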
The reinforcement learning loop is pretty simple. It's this nice virtuous cycle that we can set up, where self-play makes better data, we use that in training to make better models, and they reinforce each other. The problem is that I've made one of those arrows substantially larger, and that's not an accident. Doing tree search involves doing many of these inferences, or forward passes. We want to do as many readouts as we need to ensure that we're improving our policy, and that means doing maybe hundreds or thousands of inferences for each move. If we can train on each move, that means we're doing hundreds or thousands of inferences per training data point that we make. So at a minimum we're going to have a ratio of hundreds to one in terms of inferences versus training steps.
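
The imbalance between the two halves of the loop can be seen with a toy version that only counts work. The numbers below (50 moves per game, 400 readouts per move) are made-up illustrative values, not figures from the talk, and the functions are placeholders that play and train nothing.

```python
def self_play_game(moves_per_game, readouts_per_move, counters):
    """Pretend to play one game, counting one inference per tree-search readout."""
    examples = []
    for move_number in range(moves_per_game):
        counters["inferences"] += readouts_per_move
        examples.append(move_number)  # one training example per move played
    return examples


def train(examples, counters):
    """Pretend to train, counting one training example per move."""
    counters["training_examples"] += len(examples)


def reinforcement_loop(generations, games_per_generation,
                       moves_per_game=50, readouts_per_move=400):
    """Self-play makes data, training makes a new model, and the cycle repeats."""
    counters = {"inferences": 0, "training_examples": 0}
    model_version = 0
    for _ in range(generations):
        data = []
        for _ in range(games_per_generation):
            data.extend(self_play_game(moves_per_game, readouts_per_move, counters))
        train(data, counters)
        model_version += 1
    return model_version, counters


version, counters = reinforcement_loop(generations=3, games_per_generation=2)
ratio = counters["inferences"] / counters["training_examples"]
```

However many games or generations are run, the ratio stays at the number of readouts per move, which is why self-play, not training, is the part that has to scale.
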

Does that set up the problem? In order to train adequately, I need to do this for millions of games, and I know that a game will take on the order of minutes, even when I'm using GPUs or TPUs. So my question becomes: how do I scale this?

Happily, this is a problem that is, to borrow someone's phrase, embarrassingly parallel. Having two copies of Minigo play each other does not involve any other copies of Minigo, which means that I can spin up as many of these as I can, assuming I have a good way to shard this all out, or scale this all out. And that brings us to the part of the talk that I think we're all here for, which is running this on Google Cloud.

We used Kubernetes to do this, but we started by making the smallest possible units of work that we could. Our training job was pretty straightforward, and we made our worker job use Cloud Storage as a way to coordinate what we needed to do. This worked really well. Cloud Storage, as it turns out, is incredibly flexible, and you can abuse it in all sorts of interesting ways. We were able to use Cloud Storage as a sort of tracer bullet while we were deciding whether we needed to write a centralized server, or whether the workers needed to talk to a server at all, and Cloud Storage basically scaled to the point that we didn't really bother. It stood up wonderfully, and we've been able to use it well beyond the limits that we thought we could.

Then we turned this into containers. We used Docker to build containers for that self-play worker, and we used Kubernetes to scale this out to thousands of nodes. A worker could start up, fetch the latest model, write out its data, shut down, and start up again. Since each time we're fetching the latest version of the model, we're always able to coordinate and make sure that we're generating new data with the latest and best model that we have.
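
The worker cycle described above (start, fetch the latest model, play, write out data, stop) needs no central server, only a shared place to read models from and write games to. The sketch below uses a local directory as a stand-in for a Cloud Storage bucket so that it runs anywhere; the file naming scheme, the `.model` suffix, and all function names are assumptions made for this example rather than Minigo's actual layout.

```python
import json
import tempfile
from pathlib import Path


def latest_model(model_dir: Path) -> Path:
    """Pick the newest model, relying on zero-padded, sortable file names."""
    models = sorted(model_dir.glob("*.model"))
    if not models:
        raise FileNotFoundError("no models published yet")
    return models[-1]


def play_one_game(model_path: Path) -> dict:
    """Placeholder for real self-play; only records which model was used."""
    return {"model": model_path.name, "moves": []}


def run_worker_once(model_dir: Path, data_dir: Path, worker_id: str, game_id: int) -> Path:
    """One unit of work: fetch the latest model, play, write the result, exit."""
    model = latest_model(model_dir)
    record = play_one_game(model)
    out_dir = data_dir / model.stem  # group games by the model that produced them
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{worker_id}-{game_id}.json"
    out_path.write_text(json.dumps(record))
    return out_path


with tempfile.TemporaryDirectory() as root:
    models, data = Path(root, "models"), Path(root, "data")
    models.mkdir()
    (models / "000001.model").write_text("older weights")
    (models / "000002.model").write_text("newer weights")
    written = run_worker_once(models, data, worker_id="worker-a", game_id=0)
    used_model = json.loads(written.read_text())["model"]
```

Because every run starts by listing the model directory, publishing a new model is enough to redirect every worker on its next restart, and workers never need to know about each other.
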
What we originally did was write this for a smaller board size. The original diagrams that I showed you at the beginning were Go being played on a 19-by-19 board. That is the full-size game of Go. Go is also played on a 9-by-9 board, and the 9-by-9 board is a lot simpler: the model for it is about 250 times faster. So we were able to verify correctness using that smaller model on a smaller board. I'm using "correctness" very loosely, because it turns out machine learning is very good at finding patterns in your data, including patterns that you didn't mean to put there, and it's very good at covering up for mistakes that you make. We saw something get better on our smaller board that had really glaring, horrible bugs that we wouldn't discover until later. But more on that.

After we had verified our correctness on the smaller board, we realized that, holy cow, the full-size model is going to be 250 times slower; we need to add accelerators. We were able to hook up GPUs with a minimum of fuss. We were running a cluster of about 2,100 GPUs with no problem at all. Using GPUs on Kubernetes is fantastic: it means we don't have to worry about drivers, and we didn't have to make any code changes. What ran on our workstation ran perfectly on Kubernetes Engine. This was a pretty great success, but it was still a little slow. We had some benchmark performance numbers that we were trying to hit: the numbers in the AlphaGo paper detailed that they were able to perform 1,600 inferences in 0.4 seconds.

We were about 40 times slower than that, which is OK for a solution that was in Python and for our accelerators, which were not TPUs. But 40 times slower meant 40 times slower. They were able to play five million games in three days, and if we're 40 times slower, then I'm looking at this taking three months. That's a little challenging; I needed to find a way to do this faster.

Before we get to that, let's take a little step back. As I was setting this up on GPUs and realizing it was going to take months to run, I really wanted to find ways to verify that everything was working. With containers it was really easy for me to make variations on the jobs I was running and to run other sorts of evaluation matches. It was really easy to use the Kubernetes Engine API to spin up jobs that were variants, so I could test different versions of models and make sure that I was making progress. This was a really important thing that I'm going to come back to later. And when I say measuring performance, I mean measuring performance on my task, which is: are my models actually getting better at playing Go? That's a real question.

So early on I knew that this was going to take three months. At the point that Cloud TPUs were being developed, I said, hey, this might work really well if I could try running this on Cloud TPUs, and the Cloud TPU team was pretty enthusiastic and said, yeah, sure, go right ahead. But they were so much faster that it meant really rewriting my pipeline. For reference, the 2,000 GPUs that I was using were a few generations old, so I'm deliberately avoiding making any sort of direct numerical comparison. But suffice it to say that what was previously fine in Python was now no longer fine. Using a Cloud TPU, there was no way that the code was going to be fast enough.
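
The back-of-the-envelope arithmetic behind those numbers can be spelled out. This sketch assumes the 40x figure applies uniformly to inference throughput and that wall-clock time scales linearly with it, which is a simplification; under that assumption the projection comes out at roughly four months, the same order of magnitude as the "three months" quoted in the talk.

```python
REFERENCE_INFERENCES = 1600  # inferences per move quoted from the paper
REFERENCE_SECONDS = 0.4      # time the reference system took for them
SLOWDOWN = 40                # how much slower the Python + GPU setup was
REFERENCE_DAYS = 3           # reference time to play five million games

reference_rate = REFERENCE_INFERENCES / REFERENCE_SECONDS  # inferences per second
our_rate = reference_rate / SLOWDOWN                       # inferences per second
seconds_per_move = REFERENCE_INFERENCES / our_rate         # at the same readout count
projected_days = REFERENCE_DAYS * SLOWDOWN                 # naive linear scaling

print(f"reference: {reference_rate:.0f} inferences/s")
print(f"ours:      {our_rate:.0f} inferences/s, {seconds_per_move:.0f} s per move")
print(f"projected: {projected_days} days for the same number of games")
```
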
That meant that if I was going to be able to use these TPUs effectively, I would need to seriously rethink how this pipeline was going to run. When you're planning on something taking three months and now you're looking at maybe a week or two, you have some very different constraints on how long you can take to pre-process your input data, how long you can take to lazily push out the results of the new models, all that sort of thing. So what we had to do was rewrite for TPUs.

This code is Monte Carlo tree search; this is the short pseudocode for it. I'd like to draw your attention to the line that says neural net evaluate, leaf dot game state, because suddenly that whole thing needed to go in parallel, and it needed to go a lot faster than it could. The engine that would do this rapidly was the part that we needed to rewrite. So that's what it looks like in a single-threaded version; this is pretty close to what the Python code actually looks like in Minigo today. Rewriting it for TPUs involved breaking this out into a multi-threaded version. A friend of mine volunteered to do the C++ rewrite, and he's probably not going to like me telling this story, but he was able to write this complete multi-threaded implementation.
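
The slide's pseudocode is not part of this transcript, so the following is a stand-in: a minimal single-threaded Monte Carlo tree search with the neural network call isolated on one line. It is a sketch, not Minigo's implementation. The toy `Nim` game, the `uniform_evaluate` stub that plays the role of the network, the exploration constant, and the convention of storing values from the point of view of the player who moved into a node are all assumptions made so the example can run on its own.

```python
import math


class Node:
    def __init__(self, state, prior=1.0, parent=None):
        self.state, self.prior, self.parent = state, prior, parent
        self.children = {}      # move -> Node
        self.visits = 0
        self.total_value = 0.0  # from the view of the player who moved into this node

    def q(self):
        return self.total_value / self.visits if self.visits else 0.0


def select_leaf(node, c_puct=1.5):
    """Walk down the tree, balancing average value against prior and visit count."""
    while node.children:
        parent = node
        node = max(parent.children.values(),
                   key=lambda ch: ch.q() + c_puct * ch.prior
                   * math.sqrt(parent.visits) / (1 + ch.visits))
    return node


def mcts(root_state, game, evaluate, readouts):
    root = Node(root_state)
    for _ in range(readouts):
        leaf = select_leaf(root)
        if game.is_over(leaf.state):
            value = game.result_for_player_to_move(leaf.state)
        else:
            priors, value = evaluate(leaf.state)  # the slow neural network call
            for move, p in priors.items():
                leaf.children[move] = Node(game.play(leaf.state, move), p, leaf)
        node = leaf
        while node is not None:  # back the result up to the root
            value = -value       # switch to the view of the player who moved here
            node.visits += 1
            node.total_value += value
            node = node.parent
    return root


class Nim:
    """Toy game: take 1 or 2 stones per turn; whoever takes the last stone wins."""
    legal_moves = staticmethod(lambda pile: [m for m in (1, 2) if m <= pile])
    play = staticmethod(lambda pile, move: pile - move)
    is_over = staticmethod(lambda pile: pile == 0)
    result_for_player_to_move = staticmethod(lambda pile: -1.0)  # opponent took the last stone


def uniform_evaluate(pile):
    """Stub network: uniform policy over legal moves, value 0 (no opinion)."""
    moves = Nim.legal_moves(pile)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


root = mcts(4, Nim, uniform_evaluate, readouts=200)
best_move = max(root.children, key=lambda m: root.children[m].visits)
```

With four stones on the table, search settles on taking one stone, which leaves the opponent in a losing position. Each readout blocks on a single `evaluate` call, which is the line that has to be batched and parallelized once the accelerator is much faster than the code feeding it.
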

No problem, no bugs: a full rewrite into multi-threaded code, with no bugs, no race conditions, no problems like that. But he had forgotten to actually increment a loop, and was left with a brutal bug that took days to debug. He was like, "I'm so bad," and I'm like, you just did a complete rewrite into multi-threaded code that had no bugs the first time; you can't be hard on yourself about that.

Anyway, this is what it looked like. We ended up building a container out of our C++ port that would spin up a Python engine, and we would make RPC calls when our queue had enough nodes in it to run them all through the model. This did great. It also meant that our C++ code no longer depended on TensorFlow, so we didn't have any trouble building TensorFlow, which made our builds a lot quicker, which is pretty cool.

Once we had this engine, we could change to using Deployments on Kubernetes Engine. Previously, when we were using GPUs, we used the batch API. Quick show of hands: who has used Kubernetes before in this room? Oh, that's great. The batch API worked really well with two thousand nodes. We hit some interesting limits around tracking completions, and what we ended up doing was not bothering. We ended up writing our job with something like a hundred thousand completions, and what that would actually do is throw away completions after about a thousand, which meant that it would just run constantly, which was what we wanted. Writing it against the batch API worked really well because it would always retry; it handled the retry logic for us.

That all worked really well, but now with Cloud TPUs we didn't want to pay the price of having the containers go down and come up again, so we wrote them to be long-running jobs, as Deployments, and this all worked really well.
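
For readers who have not used the batch API, the kind of Job described above might look roughly like the manifest below, built here as a Python dictionary and printed as JSON. The field names (`parallelism`, `completions`, `restartPolicy`, and the `nvidia.com/gpu` resource limit) are standard Kubernetes fields; the job name, container name, and image path are hypothetical placeholders, and the values simply echo the round numbers from the talk rather than the project's real configuration.

```python
import json


def selfplay_job_manifest(name, image, parallelism, completions, gpus_per_pod=1):
    """Build a batch/v1 Job that keeps many GPU self-play pods running."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": parallelism,  # pods running at the same time
            "completions": completions,  # set very high so the job keeps going
            "template": {
                "spec": {
                    "restartPolicy": "OnFailure",  # let Kubernetes handle retries
                    "containers": [{
                        "name": "selfplay",
                        "image": image,
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                    }],
                },
            },
        },
    }


manifest = selfplay_job_manifest(
    name="minigo-selfplay",                          # hypothetical name
    image="gcr.io/example-project/selfplay:latest",  # hypothetical image
    parallelism=2000,
    completions=100000,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and passed to `kubectl apply -f`. A long-running Deployment replaces `completions` and `parallelism` with a replica count, which is the change described for the TPU workers.
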
We didn't need to track completions anymore because we were writing these as long-running jobs, so this all basically worked out of the box.

One of the advantages of Minigo is that we were able to test this the way an outside user would. Minigo is entirely public, it's all available on GitHub, which meant that throughout this entire project I was following the exact same path that external customers would follow. Although I'm a Googler, I was going through the exact same process that anybody in this room would go through, and that was a pretty great experience, honestly. I hit way fewer problems, and a lot of this was while these products were pre-alpha or getting ready for alpha. The experience was honestly amazing. Cloud TPUs on Kubernetes Engine basically worked out of the box. Ditto for the GPUs: like I said, the driver problems were not even a thing that I had to worry about. So if you're thinking about doing ML or accelerated computing on Kubernetes Engine, I basically only have good things to say about it.

So that's pretty great. Let's do a quick demo of Minigo. I think there's a way for me to make this play, but I don't know what it is. There we go, I like it. This is Minigo thinking about this particular game. This is that small-size board, and I think this was actually running CPU-only on a laptop; it may even have been a Chromebook. So this is running completely unaccelerated on that smaller size of Go that I was telling you about. Oops, and it doesn't loop, which is unfortunate. But you can see the variations that it's considering in the upper right, and what it thinks is the most likely move on the left side. This is a pretty great tool if you're a Go player: you can really dig in and see what it's thinking about, what it likes, what it doesn't like. Pretty great.

All right, so, results.
Talking about the conclusions and the lessons learned, I want to talk a little bit about the trends we've seen here. Moving from GPUs to TPUs, and in this case that was from older GPUs to the newest TPUs, has been eye-opening. The ability to do more work with basically the same amount of electricity means that we as users have our incentives aligned with cloud providers to use the newest and fastest hardware. Basically, there's no reason that we want people to use the older versions of the hardware.

The newer versions of the hardware will do more work with the same wattage, which means that we really want you to use the newest thing that you can. As long as this holds, there's going to be this pressure for us to make our code go as fast as possible, and this is fantastic, in particular when you're doing research. If you're trying to dial in these solutions, you need to be able to experiment as quickly as you can, and you need to be able to try things that might otherwise have been too expensive to try. Having this push toward the best price per performance, where the best price per performance means using the fastest thing rather than a slower thing, means that we can really dial in the models we're trying to build and iterate faster on how we test the various ideas we have for developing models.

What that means is that you should plan for your pipeline to get faster. Again, we had originally estimated that this would take us a couple of months, and now we're looking at getting it done in a week or two. That changed some of our core assumptions about how long we'd have for a lot of the different steps in that pipeline, and it seems like that's only going to continue, so you definitely want to plan accordingly. That is the Edge TPU, the little USB gum stick, on which we're going to make Minigo run, which I'm really excited about.

In relation to this idea that your pipeline is going to be faster, we really want to try to make our models more like cattle and less like pets. Are you familiar with the "cattle, not pets" metaphor from containers? Some folks may be getting the idea. The idea is that containers let you reproduce the exact build of the software you're running, so you can say that my server is interchangeable.
I can get more of them, I don't need to give it a name, it's not special, and it's not a big deal if it dies or goes down or breaks and I have to rebuild it. In the same way, we'd like our neural network models to be like that as well. If it takes you three months to make one, you get kind of possessive; you really don't want to see anything happen to it. But if you can make a new one in a day or two, now you have the freedom and ability to experiment and try things, and it means that you can keep those changes much better isolated. With our early runs on GPUs that were taking months, it was very tempting to constantly monitor the dashboard, tweak some knobs, turn some dials here and there, and worry and fret over it. But it also meant that at the end of that process, how could I reproduce exactly that sequence of things that happened? Really hard. Having it all go faster means that it's all a lot more repeatable, and that's pretty great.

Another thing that's pretty interesting here is that these reinforcement learning systems really resemble distributed systems. I imagine a lot of folks here work in the cloud and are familiar with the idea that individually well-behaved systems stop being well-behaved in tandem. That's interesting for a number of reasons. I think there are two ways that we go about analyzing why our systems are doing the wrong thing. One is from the code up, where we can look at the code and understand exactly what will happen. The other is from the behaviors that we see down, where we have to think about what's happening and generate a hypothesis.
You say, OK, if the load balancer is doing this, then we would expect that, and I can test it this way. Maybe you can't solve that just by looking at the load balancer's code and the server's code; you have to generate a hypothesis and test it, and those are two very distinct steps. I think of it as inductive versus deductive. In a very similar way, developing neural networks has the same problem: you can't directly point at a line of code for why a model makes the decision it does.

Instead, you created code that made that model, which means more of the top-down type of debugging. You still need to be able to look at specific code and try to figure out what's going on, but in this way these reinforcement learning systems really resemble a lot of distributed systems, and debugging is a process of generating hypotheses, figuring out how to test a hypothesis in isolation, and verifying: OK, I did this, I expected this, this other thing happened, so what does that mean for my intuition about how these systems are working?

All right, let's wrap things up a bit. There are two major parts to this: the machine learning part and the software engineering part. I'm trying to focus in this talk mostly on the "how" part. How did we get to 90 petaflops? How did we monitor health? How do we get the performance that we need? How do we make it work? I haven't touched basically any of the machine learning parts. Why is the network structured this way? Why are these constants like this? Those "why" questions are pretty tough, but we can't really answer the whys until we have the hows nailed down. If you have a neural network or machine learning problem that you're trying to figure out, the answers to those questions are obviously going to be domain-specific. But when you ask, how am I going to deploy and test these applications, how am I going to test my hypotheses, the answer there has been Cloud Platform, and that's worked fantastically.

So let's talk a little bit about the things that did work really well. I've included a little Stackdriver dashboard that we used. Stackdriver was fantastic: we could just throw things to standard output and regex through them without any trouble. I mentioned Cloud Storage also scaling way beyond the level that we taxed it at.
beyond the level that we talked about. Our early clusters were putting out maybe 50,000 games a day with a full-size model. With TPUs we're putting out over a million games a day, and still having to download and handle them all, and Cloud Storage has just been a champ. At some point we're thinking about doing integration with Cloud Bigtable, but that's still TBD, and in the meantime Cloud Storage has just been... we just abused it. It's really wonderful. Kubernetes Engine and Cloud GPUs, as I mentioned, basically worked out of the box. If you're interested in using them, it's a pretty great experience, so definitely sign up if that's what you need. TensorBoard 1.9, the latest release of TensorBoard, will let you serve your TensorBoard dashboards right out of Cloud Storage, which is great. If you are trying to use Cloud TPUs, the profiling tools are easy to use, and they're useful to dig into to find out how you can get the most performance out of those TPUs. So all of these parts of using the cloud stack went really well.

As for those whys, I love this quote here from Alex Irpan: reinforcement learning is really hard. He has this quote: "If it turns out random, I have no idea if it's something I did or if I just got unlucky." Right, we have some of these algorithms in the state of the art in machine learning right now where maybe a correctly working version, with a different random seed, will not converge on a solution. And that's kind of astonishing, that here's a successful result that's been published that only works 30% of the time. Obviously AlphaGo Zero seems to be much more robust, but it kind of points out the challenges in testing these hypotheses, and the need to be able to iterate and isolate and sort of formulate hypotheses and see if they work.

This is kind of an amazing quote that I really like to think about. For me personally, I've been following
in the footsteps of published papers, where I know what is possible because DeepMind had published their papers. It would be much harder for me, just trying to imagine if I was in their shoes, where I get a model that turns out pretty good but not great, and I say to myself: well, can it be better, or is this the best I can do with this approach? There's a very big difference between exploring the unknown and reproducing the known. In my case, if I hit a ceiling, I can compare that ceiling to the ceiling that they described and say, okay, well, I clearly have a bug. Whereas for somebody doing new research for the first time, it's going to be much harder for you to say: is this the ceiling, or do I have bugs? You really need to be diligent and controlled, and make sure that you are able to isolate the parts of your system and make sure that each of them is doing what you think it does. Does that kind of make sense? Great, awesome.

And lastly, I kind of want to talk about why I'm really excited, as a Go player, that all of these things are finally taking shape. I started Minigo after the AlphaGo paper was published, which I think was November of last year. Since then, Facebook has announced an open-source version, and they've just released a model. Tencent and other Chinese companies have been working on models, which they've released with various degrees of openness. It's been very exciting to have these. An open source project called Leela Zero has also been started, where they've been trying to crowdsource all of the GPU compute needed. It's been really excellent to have all of these different folks try to reproduce the paper with varying amounts of success, and it's really wonderful as a Go player to have access to these, essentially, oracles.

Go players like to think of playing a game of Go as having a conversation, talking with someone. We have a great proverb that playing a game of Go with someone is like living with them for a year. In that case, we have this new thing that is saying new, creative ideas to us that we haven't really understood before. So if you are interested in learning more about the game of Go, definitely check it out online. There are a lot more resources, and hopefully it's going to be a lot easier to learn now that we have ways to understand it. We do try to hang stories on the moves that we play, and it's going to be a little bit easier for us to do that and understand that as we have better tools to dig into.

So thank you all very much. A big thank you to the folks who have helped and donated their time to work on Minigo: that's Tom, Seth, Brian, and Josh. They all have been instrumental in making Minigo possible.

2018-08-01 17:30
