Shreya Shah, Dell Technologies & Dave King, Denvr Dataworks | SC23

Hey everyone, welcome to theCUBE, the leader in live tech coverage, covering Supercomputing 2023 from the Mile High City, Denver, Colorado. I'm Lisa Martin with Dave Nicholson. Dave, this is my first Supercomputing. You were here last year. I'm so excited to learn about what's going on in HPC, quantum, AI. We're going to have some great conversations the next three days.

Looking forward to it.

Looking forward to it. We've got an alumni back with us. We have Shreya Shah, the portfolio manager at Dell Technologies, and another Dave, we've got a two-Dave quota set here: Dave King, the co-founder of Denvr Dataworks. Welcome, both of you. Thank you so much for joining us.

Thank you. Thank you.

Shreya, talk to us about the state of the market today. To say it's dynamic is a massive understatement. What are some of the things that you guys are seeing? And then we'll get into the partnership with Denvr.

Absolutely. One of the biggest trends that we're seeing in the industry right now is that the power needs are going up significantly, and this is primarily because of TDPs, or thermally dissipated power, for silicon. That is exploding, right? When we say silicon, we're talking about CPUs, we're talking about GPUs. And what we're seeing across our customer set is that we have folks that are sitting at single-digit kilowatt per rack power, and there's a spectrum, and at the high end of the spectrum we've got folks that can support north of 100 kilowatts per rack. And so as we think about being able to harness the power of AI, and as those computational needs grow, we're seeing that there is a deficit in the demand versus the supply from a data center aspect: cooling, power, computational needs. And so in the next couple of years we're seeing customers quickly trying to pivot their infrastructure to be able to realize and be able to productize, be able to support these in their data centers, whether it's existing or new colos, whatever that may be, right? And so this data
center journey or transformation, call it evolution or revolution, wherever you may be, that's already in effect, and I'm kind of thinking about it as data center.next. And so how do Dell and Denvr Dataworks come together to help answer and solve some of these problems that are emerging so quickly is what we're really excited about.

Dave, give us a little bit of a backstory on Denvr Dataworks. Ironically, D-E-N-V-R, we're in Denver, so I thought I've got to bring out that irony right there.

Denver and Dave and Dave, right?

But you're based in Alberta, correct?

We are. We're a Calgary, Alberta based company in western Canada.

Mission, vision: what was the catalyst to launch the company back in 2017?

Yeah, so five years ago, as we founded the company, we realized this collision course was setting up. The compute's getting hotter, it's getting faster, and for AI you need clusters of computers, right? So the single server isn't a thing anymore. To actually train AI models, you need hundreds of servers in some cases. So as you put that pressure on the infrastructure to output hundreds of very hot computers, we started to look at the infrastructure and figure out where could we go to make a difference.

And Shreya, talk about why Dell has chosen to partner with Denvr Dataworks to really tackle some of the challenges and address the trends that you're seeing.

Well, it comes down to partnership. Again, if we look at our customer set, there's a huge breadth of who we need to support. We want to provide flexibility. We want to provide diversity: diversity in terms of silicon choice, diversity in terms of your interconnect, diversity in terms of being able to meet the customer in their journey wherever they may be. And so when we think about partnership and that last mile, or how do we actually make everything work end to end, this is where we're very excited to work with Denvr Dataworks.

You mentioned that you're not talking about just
single servers doing these things. You mentioned interconnect. There are a lot of folks who are talking about this as the connectivity-centric era, right alongside the importance of the GPU, CPU, et cetera. So you're setting up very, very complex environments. You are partnered with Dell, building world-class technology. That technology could be deployed in someone's data center. What's the pitch for relying on your expertise instead of having people rack and stack it in their own data center? I assume Dell is fine either way. We're hearing from a lot of CIOs and CTOs that they just don't have the time to do that, right? Call it the FOMO, fear of missing out, factor; call it the desire for time to market. But talk to us about the value proposition that you're bringing, because in theory I could cobble it together on my own, right?

Yeah, but it's hard, right? It's complex. So what Denvr's done is we've focused on a hybrid approach where we can bring the data center to your data. One of the CIO's challenges is also security, right? Sovereignty of their data. If I have healthcare data or something, do I want to send it up to the cloud or somewhere? I'm probably not even allowed to. So what we've done is we've built modular data centers that are fast to deploy, and we offer them as a service, so they don't have to get into the large capital expenditures and get into worrying about where their data is going, because we're bringing the cloud to them. So in a modular way we bring things forward. And there's an environmental angle: we've built our data centers so they don't use water, for example. So there's something completely different that we're moving towards. And in the meantime we're also deploying extremely large clusters with Dell and Dell Professional Services, because it's tough to put these things together. We're deploying large clusters in Houston, for example, with extremely large language models being trained there on traditional data
center clusters, all put together with the shared expertise. Over time we'll bring those to you instead of you coming to the data center.

So, oh, go ahead.

No, no. So would you consider these multi-tenant or single-tenant environments, or a combination of the two?

It's a combination, yes. We offer a cloud as a service, right? So it can be a multi-tenant environment where you're running on an NVIDIA SuperPOD, for example, and there could be others sharing it with you, or it could be your own. So it's very flexible. We build it in a flexible way so you can come and go, and to make it work with all the pressure that there is on the application set, we follow all the best practices and standards, right? So this is an enterprise-grade solution, not typically something that most companies have the ability to build themselves. It's hard.

Shreya, share the impetus or the catalyst for Dell engaging with Denvr Dataworks. You talked about meeting customers where they are in this journey, this transformation, this revolution. Was that customer-driven?

I think there's a lot of factors, right? One of the things, and I go back to this data center transformation, one of the capabilities that Dave and team have kind of brought to light is immersion cooling. If we look at the journey that customers typically take, you have your free air cooling that you start with, a much simpler way to deploy. Then you have your LAAC, or liquid assisted air cooling, which is self-contained cooling within the node. You have open LAAC, which is self-contained cooling within the CDU, the central distribution unit, the cooling distribution unit. And then you have your rear door heat exchanger, where you're starting to do the plumbing for your facility water, and that's where it gets very, very complicated. And then you get into your liquid cooling, hybrid and air, complete liquid cooling, and then you have
your immersion cooling. And so that spectrum and the breadth that we have to provide to the customer, with the MDC capability and with the immersion cooling capabilities, we're very, very excited that we can service the customers not just at the low end, the mid-end, but also at the high end.

That breadth is impressive.

Yes, significant.

So we always do our due diligence before coming in and talking to folks. In this context, you have a PhD and her job title is immersion.

Yes.

And I will freely admit that I looked at that and I thought, huh, I didn't know what that meant, and now I do.

Yeah, so Amy Short is fantastic, right? And she is in fact a PhD. She's a chemist as much as she is a technician and a technical expert. And so Amy runs our immersion plans, because we absolutely submerge Dell servers, right? They're swimming in the hot tub, and we cool them that way without the use of water. You mentioned some rack densities in your opening piece. Today we've been running for 15 months with zero hardware failures in immersion at production scale, and we can run sort of 150 kilowatts a rack easily, and our technology can cool another 30 or 40% above that for the next generations of whatever is going to be launched next year by NVIDIA and AMD and Intel and Dell, right? We're ready for it from an environmental and a cooling perspective.

From an environmental perspective, a cooling perspective, share with us some of the main customer pain points or challenges you guys together are taking off the table.

Do you want to go first? I can go first. Sure. I mean, a lot of the data centers can't handle the heat, right? So literally you can only put one or two of these brand new servers in a rack. The Dell XE9680 is one of our favorite products. We've been buying hand over fist from Dell this year, as many as we could get, actually. And so you can only put one or two of those in a rack. Think of a seven-foot-high rack and there's only two servers in it, because the building can't
handle any more heat density than that per square foot. And so you're running out of refrigeration power, because air just doesn't move enough. So you can't put three or four or five or ten. In our racks, we put 16 side by side.

Wow.

Because of our capability. So you've gone from sort of two in the space of this table to 16 in the space of this table. And so that's a problem, because to run a large language model you may need a thousand servers. So now you need a football-field-sized building with 700 or 800 racks for all the networking connectivity and the servers. So we take a football field and we compress it down into something that's under 900 square feet. And so there's some pain points there for a few things, right? It's expensive to operate a football-field-sized building, like the auditorium we're in today, and try to keep things cool. And so we think we found a better way.

Anything that you would add in terms of the challenges that you're really knocking off the table?

Yeah, I think with this generative AI craze, everybody wants to harness the power of AI, like I said, but that's actually in conflict with some of our carbon emissions, the reduction and the goals that we have. So how do we bring those together? I think there's a lot of innovation in this space that will help us get there. One of the things that we've talked about, and I believe we talked about this previously as well, is the heat capture. How do you optimize that, and then how do you reuse that, so that you have that circular loop, so that you can sort of minimize your footprint as you're going up in your computational need?

You want to be able to grow some strawberries as well.

Actually, greenhouses are one of those first, I'd say, looks into how can you recycle that energy, how do you reuse that energy. But the sky's the limit here. How can you power your, or how can you cool your industrial systems
and your buildings and houses and pools and whatever that may be?

Well, Dave mentioned hot tubs earlier. I imagine that the natural solution would be to just have, like, a spa on the outside wall of the data center.

It would be perfect. A little less secure environment, but, you know.

Exactly. Who knows, right? We'll see what the future holds.

On the surface, think of how inefficient it is to heat a hot tub with GPUs. However, with the extra latent heat... But so it's a very serious consideration when you talk about heat dissipation, and it's not as simple, if I'm hearing this right, as: well, I have a data center, I've got a raised floor, I have power, I have cooling, maybe I'll just rack this stuff up. It's not that simple, if I'm hearing correctly.

That's exactly right, because it's a classic engineering and physics problem. Air can only move so much heat.

So does that then mean that, you know, we've seen this mix of IT infrastructure being deployed on premises and in the cloud or off premises, however you want to define that. Do you think that AI, and sort of the wind behind the AI sails, is going to drive more people, because of dynamics like this, to do things as a service? Are you seeing at Dell as a service as the default method for folks seeking to do modeling to start out, or are you seeing it hybrid still? I know you can be the arms dealer and provide to partners hardware that you build, and that could be a mix of directly to customers and through partners, but what are you seeing in that regard?

I don't think the answer is either-or. It's an and, and it's a hybrid, right? Depending on your workload and where you are in that journey. Because training doesn't necessarily have to be the largest model size. You could be doing some fine-tuning, or you could be doing some training with much smaller models. And depending on some of the things that they
brought up, security, HIPAA for example, right, that will force you to consider a hybrid approach. And I think going forward that's really not going to change. And so how do you make them work together is going to be the key to success.

That makes sense. Let's talk about sustainability. We can't have conversations about power and cooling and heat dissipation without really understanding how you can enable organizations to... What's your vision for sustainable AI in the future? Question to both of you.

Yeah, I think from a Denvr point of view we focus on a few things. One of them is water use, right? Outside of the carbon neutral and net zero and all the other slogans, data centers use a lot of water, right? And you can loop it around a few times and do other things, but at the same time, when you pull water from an aquifer or a river, it's no longer in the aquifer or the river, right? It's now gone to a different place. So we've really focused on that as one of the key things, and then energy efficiency and land efficiency, right? It's a little different in North America, or a little easier here with more land, but if you imagine Europe and massive million-square-foot data centers, the density of people and land and the environmental conscience there, this is a real challenge for them to join the AI wave and this technology set, because it's hard to build football-field-sized data centers that meet the new environmental regulations everyone's focused on. So for us it's a whole package, right? It's a combination of things, including right down to land use. What's the most efficient use? Should we put a bunch of servers all over the place, or can we compress that in some way, right? I don't know that Denvr has all the answers for that, but at least we're trying, right? We're taking a shot at it.

Yeah, and what you're doing isn't happening, obviously, in a vacuum. Dell is doing work to optimize hardware, you're optimizing at the data center layer, but if you
look at it holistically, the hope is that, yes, we're using resources, we're using energy, that's going to drive carbon up. However, if the insights gained by doing the work with AI prove to be what we believe they will be, the efficiencies gained are going to be able to drive carbon down in other areas that you may never personally be aware of, right?

Right.

So you've got data scientists and teams working on things, and what you know is: wow, my data center is nearly on fire, but since it's not on fire we're doing a great job. But what you don't know, maybe, is that they are in fact doing material science that could not have been done otherwise, that is transforming the way that other production things are being done. So I like to be an optimist in that regard. Feel free to join me.

Well, sure, right? Because there's the Terminator view of AI, right? Is that a problem, or is AI a beneficial thing? And there's AI for good, right? And we're on your side, the optimism side. We want to make sure that it's also good for the planet and its resources, right? So we need to take a stab at that as well. So now you get a compounding effect, right?

Do you guys have a favorite customer example that you think really articulates the value of what you're delivering together to customers? Even mentioning by industry works.

Yeah, I think we're very active together in a few places. The first is the large language model groups, right? Everyone's jumping on that, and you see all of the top 20 startups that everyone's well aware of in the large language model space, so we're working very closely together with some of those. But interestingly, we're seeing a lot of collaboration between Denvr and Dell in enterprise, right? So enterprise, whether it's a digital twin of a factory or it's optimizing operations and those sorts of things, the technology is starting to come there. It's early days for the enterprise, but obviously Dell is, if not the world leader, certainly one of the world
leaders in that space, in the enterprise, by market share or any other measure. And so we're thrilled to partner with Dell, because we think there's a second wave here that you'll see beyond the research institutes and the large language model billion-dollar fundings we're seeing. We're going to see regular rank-and-file enterprises show up, and we're starting to see that. We're seeing a lot of activity from cities, interestingly, right? Optimizing transportation, or the sewer system, or citizen access with large language models. Call into the helpline, and if you can only speak Portuguese and the operator's speaking English, you can now talk, because of large language models in the middle. So we're seeing customer service applications like that that we're working collaboratively on all the time.

I think, to add to that, one of the things that I want to also call out is that we've all been in the AI craze, but let's not forget HPC, high performance computing, and modeling and simulation. That has been around for so long. It's not going anywhere.

Oh, Shreya, that was so 2022. Didn't you see the llama walking around outside?

But in all seriousness, AI has been brought up on the backbone of HPC, is what I say, from a data center aspect. And you'll still be doing inferencing, and you'll still be doing training on just the CPUs, as an example, right? And so this breadth of customers and the workloads, it's really not going anywhere. AI has taken off on a trajectory of its own. So how do you service all these different customers with different needs is where we're positioned really well to go and attack the market.

Yeah, we see often, in conversations about AI, if you look just under the covers you realize this is really machine learning, it's not really strictly AI. And then if you look specifically at the projects that we see CIOs and CTOs involved with, you can apply that sort of 80/20 rule, at least in the enterprise, where 80% of it is sort of
garden-variety optimization of processes that are the lifeblood of running a business. So if you're a CIO responsible for keeping the lights on and innovating, a lot of your resources are going towards this keeping-the-lights-on, running-the-business activity. It's never going to make the cover of the Wall Street Journal that you drove efficiency by 133% in some process. But a lot of what is called AI, that turns out to be machine learning, is on the optimization side. The headlines are always going to be the really sexy, cool things.

That's right.

And especially the kinds of things that you see here. I mean, you walk by, we've got the NSA down the block here, NASA, all of these institutions of higher learning. But where the rubber meets the road in the enterprise, where Dell has so much experience, that's what we're seeing. We're seeing a lot of optimization, and there's nothing wrong with that at all.

Yeah. So with all this momentum, excitement, you're moving fast and furiously. What are some of the things that are next that we can expect to see from Dell and Denvr together?

I think one of the big things that we hope to see more often is open standards: the cooling that we talked about, Open Rack, ORv3 as an example, giving customers the flexibility and the choice. We're going to continue working on that, we're going to continue partnering on that, and you'll see a lot more from both Denvr and Dell together.

Well, we will be keeping our eyes on this space. Shreya, Dave, thank you so much for joining Dave and me on the program, sharing your insights, what you're doing together to really enable sustainable AI and enable a lot of optimization for enterprises across the world. We will definitely keep watching this space.

Thank you. Thank you. Thanks for having us.

Our pleasure. For our guests, I'm Dave Nicholson. I'm Lisa Martin. You're watching theCUBE, live from the Mile High City, Denver, Colorado, at SC23. We'll be back after
a short break. [Music]
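
The rack-density figures quoted in the conversation (one or two air-cooled servers per rack versus 16 in an immersion rack, a thousand servers for a large language model, and 150 kilowatts a rack with another 30 or 40% of headroom) can be checked with some quick arithmetic. The Python sketch below is illustrative only: the function and constant names are made up for this example, it assumes racks are filled to the quoted density, and it counts server racks alone, whereas the 700 or 800 racks mentioned in the interview also include networking.

```python
import math


def racks_needed(servers: int, servers_per_rack: int) -> int:
    """Round up, since a partly filled rack still takes a rack position."""
    return math.ceil(servers / servers_per_rack)


# Figures quoted in the interview.
SERVERS_FOR_LLM = 1000      # "you may need a thousand servers"
AIR_COOLED_PER_RACK = 2     # "there's only two servers in it"
IMMERSION_PER_RACK = 16     # "we put 16 side by side"
IMMERSION_RACK_KW = 150     # "150 kilowatts a rack easily"

air_racks = racks_needed(SERVERS_FOR_LLM, AIR_COOLED_PER_RACK)
immersion_racks = racks_needed(SERVERS_FOR_LLM, IMMERSION_PER_RACK)

# "another 30 or 40% above that", kept in integer kilowatts.
headroom_low_kw = IMMERSION_RACK_KW * 130 // 100
headroom_high_kw = IMMERSION_RACK_KW * 140 // 100

print(f"Air-cooled server racks: {air_racks}")        # 500
print(f"Immersion server racks:  {immersion_racks}")  # 63
print(f"Cooling headroom: {headroom_low_kw} to {headroom_high_kw} kW per rack")
```

On these assumptions the same thousand servers drop from 500 server racks to 63, roughly the eightfold reduction implied by going from two to 16 per rack, and the stated headroom works out to about 195 to 210 kilowatts per rack.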

2023-11-23 14:49
