ISC 2023 - Energy Efficiency and Next Gen Cooling
[Music] Welcome to theCUBE's coverage of ISC High Performance 2023, where we're covering all things HPC: machine learning, AI, high-performance analytics, quantum computing, and more. One of the most important topics in the HPC community is next-generation cooling and energy efficiency, and we're joined here by David Hardy, PowerEdge cooling product manager at Dell; Tim Shedd, engineering technologist, Office of the CTIO; and Mohan Kumar, an Intel Fellow. Gentlemen, thanks for joining me today. So the big topic is power and cooling: how do you get more power to all these CPUs, GPUs, and processors, to get the power that's needed, and at the same time stay sustainable? We'll start with Dell.

Well, I'll start. As a product manager for PowerEdge, one of the biggest challenges is bringing sufficient power into these systems to support these high-performance processors, both CPUs and GPUs. Luckily, it's more than worth it: the performance gains relative to the increased power make it a no-brainer to go with the next-generation systems. The other piece of the equation, from an efficiency perspective, is how do we cool it? Luckily, generationally we keep improving how much we can air cool, and we've got liquid cooling options that make everything run very efficiently. So again, it's more power consumption to deliver this high level of performance, but we can do it more efficiently this generation compared to the past.

What's the innovation behind this next generation? If you had to put your finger on it, what are the key aspects?

Well, actually, it's not just one thing. It's a bunch of incremental improvements in a variety of areas, be it power delivery or the designs of the system so that we can more efficiently move air through there. It's the way that we bring cooling to the chips. It's smartly controlled fans, so that we're only moving as much air as needed at any given time, reacting dynamically to a workload inside a
system. It's a lot of refinement; it's continuous improvement, generation over generation, that adds up to big differences at the system level.

Well, and what are we talking about in terms of power that we're going to be seeing, that can maintain the cooling and also the sustainability requirements? There's a lot of green action going on, sustainability goals. This is a big part of this new metric.

Absolutely. So the processors, as David mentioned: right now our processors consume on the high end about 350 watts, and the GPUs can consume close to 1,000 watts as we look into the future. You need to have an efficient solution at all levels to cool these solutions. And when we talk about cooling solutions especially, it's not about whether you can cool with air or whether you can cool with liquid. The question is what works technically, economically, and sustainably with any given solution; that's what you're looking for. So this is the right solution for the right problem. At some point, you can always come up with an air cooling solution, but the problem is the power of the cooling solution is going to put a dent in your pocketbook, and that's the point where you cut over into technologies like liquid cooling, what we call cold plate, or immersion cooling, various things. So we are always driven by what we call TCO, total cost of ownership: what is the optimal solution for your total cost of ownership? If it is cold plate, it's cold plate; if it's immersion cooling, it's immersion cooling. And we have the spectrum covered, so we can hit all points as we increase the power of the platform and increase the efficiency of the platform. One additional point here, since you brought up sustainability: even if there were no power issue to deal with, a lot of folks are looking at liquid cooling solutions simply because it's more sustainable, because in
general, liquids, take water for example, are an order of magnitude more efficient at conducting heat away compared to air. That gives you the efficiency that then translates to reduced power, which contributes to your sustainability value.

How do the liquid cooling solutions today compare to previous generations?

So, liquid cooling has an interesting history. I believe the first patent was somewhere in the '50s, for cooling capacitors on street transformers. Then it moved from there to the supercomputers, which were in liquid nitrogen baths in the good old days, in the '70s. In technology it's a very interesting phenomenon: we keep reinventing things. It's that the domain shifts over, basically. What used to be in the domain of capacitors, or the supercomputers, is now moving to mainstream servers. So what we are doing now is taking those principles that have worked effectively elsewhere and applying the same thing to cooling chips and server platforms.

Yeah, we hear a lot of people talking about direct liquid cooling compared to other cooling solutions, especially in the racks. What are the advantages of direct liquid cooling? Can you quantify that or give commentary?

I'll jump in. For direct liquid cooling, what we're doing is trying to match the heat load to the cooling system. If you go back to engineering and thermodynamics, that's the best way to be efficient: you don't want to overpower your cooling if you don't need it; you want to match that well. So when you are placing a cold plate, which is just a little box, typically with a copper base, and you're running water through that, you're putting that really effective cooling right on the heat source. Everything else in the chassis can typically be cooled pretty efficiently with
relatively low-powered fans, and so you're able to significantly decrease the total energy required to cool, while enabling chip powers well past 1,000 watts. We don't see a real limitation right now from the silicon vendors as far as the roadmap and being able to use DLC to cool it.

What role is the industry playing on standards? Is there lock-in? Is it open? There's been discussion around being worried about lock-in from a particular cooling solution or provider. What role do industry standards play in the cooling area? It seems super valuable, especially when you have racks exceeding some of the numbers you guys are quoting, when you have more GPUs and CPUs.

Yeah, again, I'll offer this: at this time there aren't a lot of standards that exist. Every system tends to be kind of a one-off design, from the chip to the facility water. But there are efforts going on, from the Open Compute Project, through, in the United States, ASHRAE, the American Society of Heating, Refrigerating and Air-Conditioning Engineers, to AHRI, the Air-Conditioning, Heating, and Refrigeration Institute. They're all involved now in developing liquid-cooled equipment standards that will open up the ecosystem and make it a lot easier for the components to be interchanged. The idea is to open up the ecosystem for innovation and for more competition, which we anticipate will also make the technology more affordable. So we're migrating towards standards, but we're not there yet. It's definitely really important to be able to enable the type of scaling that we see is necessary to support the compute innovations that are coming.

So, hitting the levels now, you can support the heat. Now what are some of the reuse benefits? There have been discussions around being positioned to solve some of these challenges and lower the T-cases. What are some of the most effective solutions
out there? How do we do this efficiently?

So I think you've covered a couple of points there, so let me talk about reuse first. One of the reasons immersion cooling especially is very interesting to folks is that it allows you to have an outlet temperature that's much higher, and you can put that higher outlet water temperature to use. Today, what we do is essentially pay for removing the heat and then pay for rejecting the heat; you pay twice. What they want to get to is that once that heat has been removed from the platform, you take that heat and make it do a useful thing. If you're in a building, maybe heat up the building. Or, in mid-latitudes in America, they're using it for greenhouses: essentially they're pumping the heat into a greenhouse where it's maybe 30 or 40 degrees Fahrenheit, so you can grow vegetables there. And in other countries, where they have hot water loops that supply the homes, they're using the data center essentially to supply hot water into the homes. So instead of paying to reject heat and causing an environmental impact, the data center is actually benefiting society through the data center business, which is an amazing transformation to happen, and that's real efficiency.

Talk about leverage there. I mean, that benefits society: green, and turning it into societal benefits. What about the other challenges around effective solutions, around higher TDPs and lower T-cases?

Yeah, so one of the benefits of going down this path is that it allows us to go for a higher TDP, thermal design power, which means we can deliver higher performance. And if we can deliver higher performance in a smaller footprint, then your overall volumetric space in which you're delivering the performance basically goes down. So it's a
lot more sustainable solution for you to have. And then having a more efficient cooling solution means you can go for a lower T-case, because you have the ability to reject that heat, and that plays, again, into higher performance that you can deliver to the customer.

Dave, let's bring you in here. You're the PowerEdge cooling product manager; you've got to make it all work with the products.

I'm just going to say, I'm sitting here with these technologists; they're experts in this field, and I'm sure they have excellent vision into the future on how the different technologies work and how they scale. I work with the customers today that are trying to take these great ideas, but how do they map back into some of the constraints that our customers face today, when they may have had a data center built 20 years ago? So we do work with our customers to make sure that they can transition to take advantage of these latest liquid-cooled solutions, and there are a variety of liquid-cooled solutions. As a standard we offer direct liquid cooling, but we also support immersion cooling and other solutions. You can do it at the system level; you can do it at the rack level. There are a lot of ways to apply liquid cooling, so we work with our customers to try to figure out what works best for their constraints, what works best for their budgets, what works best for their timelines. And we're really at the beginning right now of deploying processors that are stressing that air cooling threshold. So a lot of customers are still air cooling; they're going to continue to leverage that equipment in their data center, and they see their next step as when they're going to have to start considering liquid cooling. Others are already there: they're comfortable with it, it's running well, it's accomplishing the goals, and they're developing a skill set and
how to manage it. So every customer runs at their own pace, and it's important that Dell and Intel, as we work with customers, help our customers at the pace that's comfortable for them.

Tim mentioned, and Mohan also had a comment on the other side: Tim mentioned the power per rack exceeding the numbers, and Mohan talked about the future of powering new use cases, where there are benefits that come out of the heat reuse and water cooling. Customers have a lot of racks; they could have old racks. This is where performance per rack, power per rack comes in. What's the innovation around the racks, whether they have old racks or new racks? How are people stacking up their data centers? Because we're seeing more and more data centers being deployed, not only for the hyperscalers but for everybody. We've got edge coming around the corner; you're going to have a lot of footprint challenges with the intelligent edge coming. So this is going to be a real power and cooling challenge as you get more density.

Yeah, I mean, that's a great question. I would say, very roughly, if you were to break this down: for customers who have existing data centers and are used to working with certain rack footprints and power distribution schemes, it looks like incremental improvements, and we try to make the latest technologies digestible in bites that work. For customers that are starting with a greenfield, a clean sheet of paper, there are a lot of opportunities to be creative, starting with getting power distribution oriented around density from the very beginning: high-power, high-voltage power distribution, tall racks, plumbing water into the data center for every rack position from the start so that you're future-proof. But again, it gets back to customers moving at their own pace; we have to give them options.

Yeah, one of the things I want to ask you on the product side, because Dell's well known for
modularity, interchangeability, increased innovation every year, lower cost. I mean, come on, that's the Dell formula. What are the benefits for the end customer in this area? Because this is a very important area: they've got to get more power, and there are sustainability targets they want to meet too. What are the key benefits to the customer?

Well, I'll jump in. One of the innovations that we are driving is in partnership with members of the Open Compute Project: we are actively supporting DC-MHS, along with Intel. It's the Data Center Modular Hardware System. This is, at its core, allowing that flexibility, allowing OEMs like Dell to incorporate the latest and greatest silicon into a standard format that then slides into a rack with disaggregated power. That's where, instead of having vertical PDUs in the rear of the rack, we now have power supplies and power shelves that are spread throughout the rack, and then we distribute DC power. This offers a lot of advantages, both in terms of more space in the compute platforms for doing compute, and in terms of efficiency and sustainability, because now we can have these optimized power supplies that are available to provide power to everything in the rack. That also goes to the cooling, because now we can have these manifolds in the back of the rack that the servers can just slide right into. It all comes back to standards, of course, and the ability to slide compute nodes in and out easily, but this provides a promise of modularity, affordability, and interoperability for the future. Now, that's not a today statement, but it's certainly publicly known that we're working together with Intel and others to develop these sorts of modular and efficient systems.

And by the way, the Open Compute organization is a phenomenal group. We covered their inaugural event many years ago with theCUBE, and they've had a great track record. So
congratulations. It's super important for the industry, this area of sustainability and network efficiency. Another question, for Intel: you've got the Intel-Dell relationship, well documented, successful over many, many years and generations. Question for you: what is Intel doing to increase performance while being mindful of the cooling challenges around sustainability, and how are you working with OEMs such as Dell to create efficient cooling solutions for these new hyper-powered processors?

Thank you, John. So first of all, we have offerings that target these markets. We have SKUs that are targeted towards immersion; we have optimized heatsinks that target liquid-cooling-based solutions; and above all, we have this "no watt left behind" approach to solving the performance problems. We want to maximize the performance at the optimal power footprint for you. In every generation we try to make our processors more power efficient. We have built-in accelerators that give you 10 to 15x energy-performance improvement compared to the alternative. And we have the right solution for the right problem: we have not just processors; we have GPUs and AI solutions. So, putting it all together, we try to cover all the bases there. As far as our partnership with Dell, a few of them were talked about earlier. We work with them closely, directly, as our OEM partner, and also with them in public forums like OCP, the Green Grid, ASHRAE, and so on, to make sure the right standards are in place for us to take advantage of liquid cooling, immersion cooling, and sustainability efforts like DC-MHS that John referred to. So we have an approach to essentially provide them the solutions, and we partner with them closely, because these types of solutions are not a one-person problem. We definitely need the OEM partner: they
need us and we need them. So we work tightly together to give the customer a solution that they can utilize, as opposed to giving them ingredient pieces that they have to put together.

Yeah, and the needle has been moved on the sustainability side. You guys are doing a great job. PowerEdge, great name; I've always loved that name. More power, saves power, next generation: you've got a great product there, David, and thanks for coming on. Tim, I'll give you the final word, since you're the engineering technologist at the Office of the CTIO, which stands for Chief Technology Innovation Office. What is going on there? What are you most excited about right now as you look out? You've got the standards bodies coming together; you've got real momentum accelerating into efficiency and energy savings; sustainability is looking good. People are all on point, not just mailing it in; there's some real action. What are you most excited about?

Yeah, that's actually really true, and that has been one of the most encouraging and exciting things that I've seen recently in Dell: as we are developing these new high-performance platforms, employing the OCP ORv3 racks and DC-MHS, also front and center is sustainability. How are we accounting for that? How are we best taking advantage of the power-saving features that Intel is providing us? How are we taking advantage of the power-saving features in these new power supplies, in even network cards, and so on? It's been really exciting to be a part of this and to see how we can enable the compute solutions of the future in a way that our customers can really benefit from, while also being completely cutting-edge, high-powered, and sustainable.

Well, gentlemen, thank you for your time. We did a full interview without talking about AI, so we can't leave it there; we have to bring it up as a final question. We did a survey of our CUBE alumni
network, a technical network of infrastructure, cloud, and on-premises friends, and we had about 50 people. We asked, "Are you using AI?" Most of them said they're going in for low-hanging fruit around automation, cost optimization, network optimization, these low-hanging-fruit use cases. So, a final question for each of you: are you seeing AI coming in to help with some of the undifferentiated heavy lifting in the area of getting more efficiency? I just thought I'd throw that out there as a lightning round. Anyone want to take a shot at that?

I'll start with that, simply because the focus for my team is to enable people to use AI, more than how we're applying it to our product planning today. There are a lot of good new accelerator-based solutions to support customers in every industry in leveraging AI, and the efficiency factors in cooling these high-power systems are my maniacal focus.

Well, they eat the power; they want more power. GPUs, I mean, you can't get enough GPUs. Large multimodal models, foundation models.

Yeah, so on AI, it's a twofold answer. I would agree with David that our job is to enable these solutions that AI can play in. But we also see that AI has a tremendous role to play. As a platform and a server system we have capabilities, and then there is the data center, which has cooling and systems that typically tend to operate independently. There is a way now for an AI to come in and essentially say, "Oh, I see how these systems are being cooled, and on the basis of this cooling I can change the temperatures, I can move the needles on the various things, and I can improve your performance; I can give you better sustainability." So there are various knobs that it can turn, and to that extent machine learning can be applied in those spaces. It's an exciting place to be.

Yeah, it's exciting. OK, Tim, take a shot at that. Any AI comments on the
internet?

All right, I'll support that. I would say, if you've heard recent comments from our chairman, Michael Dell, basically if you're not using AI you're leaving ideas and performance on the table. So we are aggressively supporting our customers in the space, while exploring how we can best use this to improve our products at a potentially faster pace and provide the sorts of efficiency gains and sustainability gains that Mohan just outlined.

Absolutely. It's part of the game now. You guys are right in the center of the action; you're on both sides. You're enabling more AI to be smarter, faster, cheaper, and at the same time you can use it to drive efficiency on the sustainability and energy side. So, super, super cool, pun intended. Thanks for coming on theCUBE. Appreciate it, gentlemen; thanks for your time.

Thank you very much.

This is coverage of ISC High Performance 2023. We're covering all things HPC, machine learning, AI, high-performance analytics, and computing. I'm John Furrier, your host. Thanks for watching. [Music]
2023-05-29 21:06