AMD Presents: Advancing AI


Hey, good morning. Good morning, everyone. Welcome to all of you who are joining us here in Silicon Valley and to everyone who's joining us online from around the world. It has been just an incredibly exciting year with all of the new products and all the innovation that has come across our business and our industry. But today, it's all about AI. We have a lot of new AI solutions to launch today and news to share with you, so let's go ahead and get started.

Now, I know we've all felt this this year. I mean, it's been just an amazing year. I mean, if you think about it, a year ago, OpenAI unveiled ChatGPT. And it's really sparked a revolution that has totally reshaped the technology landscape.

In this just short amount of time, AI hasn't just progressed. It's actually exploded. The year has shown us that AI isn't just kind of a cool new thing. It's actually the future of computing. And at AMD, when we think about it, we actually view AI as the single most transformational technology over the last 50 years. Maybe the only thing that has been close has been the introduction of the internet.

But what's different about AI is that the adoption rate is just much, much faster. So although so much has happened, the truth is right now, we're just at the very beginning of the AI era. And we can see how it's so capable of touching every aspect of our lives. So if you guys just take a step back and just look, I mean, AI is already being used everywhere. Think about improving healthcare, accelerating climate research, enabling personal assistants for all of us and for greater business productivity, things like industrial robotics, security, and providing lots of new tools for content creators.

Now the key to all of this is generative AI. It requires a significant investment in new infrastructure. And that's to enable training and all of the inference that's needed. And that market is just huge. Now a year ago when we were thinking about AI, we were super excited. And we estimated the data center AI accelerator market would grow approximately 50% annually over the next few years, from something like $30 billion in 2023 to more than $150 billion in 2027.

And that felt like a big number. However, as we look at everything that's happened in the last 12 months and the rate and pace of adoption that we're seeing across the industry, across our customers, across the world, it's really clear that the demand is just growing much, much faster. So if you look at now to enable AI infrastructure, of course it starts with the cloud, but it goes into the enterprise. We believe we'll see plenty of AI throughout the embedded markets and into personal computing.

We're now expecting that the data center accelerator TAM will grow more than 70% annually over the next four years to over $400 billion in 2027. So does that sound exciting for us as an industry? I have to say for someone like me who's been in the industry for a while, this pace of innovation is faster than anything I've ever seen before. And for us at AMD, we are so well positioned to power that end-to-end infrastructure that defines this new AI era.
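The growth figures quoted above are compound annual rates, and the arithmetic can be checked with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the 2023 to 2027 span is taken as four compounding years, and the starting value for the revised forecast is not stated in this passage, so the sketch only derives what the quoted 70% rate and $400 billion end point would imply.

```python
def project(start_billion: float, annual_growth: float, years: int) -> float:
    """Compound a starting market size forward at a fixed annual growth rate."""
    return start_billion * (1.0 + annual_growth) ** years


def implied_cagr(start_billion: float, end_billion: float, years: int) -> float:
    """Annual growth rate that takes start to end over the given number of years."""
    return (end_billion / start_billion) ** (1.0 / years) - 1.0


def implied_start(end_billion: float, annual_growth: float, years: int) -> float:
    """Starting size implied by an end size and a fixed annual growth rate."""
    return end_billion / (1.0 + annual_growth) ** years


if __name__ == "__main__":
    years = 2027 - 2023  # assumption: four compounding periods

    # Earlier estimate: about $30B in 2023 growing roughly 50% per year.
    print(f"30B at 50%/yr for {years} years: {project(30, 0.50, years):.1f}B")

    # Growth rate implied by going from $30B to $150B over the same span.
    print(f"Implied CAGR 30B -> 150B: {implied_cagr(30, 150, years):.1%}")

    # Revised estimate: more than 70% per year to over $400B in 2027.
    print(f"Start implied by 400B at 70%/yr: {implied_start(400, 0.70, years):.1f}B")
```

Read this way, the earlier estimate is internally consistent (roughly $150 billion after four years at 50%), and the revised forecast corresponds to a starting market somewhat larger than $30 billion.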

So from massive cloud server installations, to the on-prem enterprise clusters we're going to talk about, to the next generation of AI in embedded and PCs, our AI strategy is really centered around three big strategic priorities. First, we must deliver a broad portfolio of very performant, energy-efficient GPUs, CPUs, and adaptive computing solutions for AI training and inference. And we believe, frankly, that you're going to need all of these pieces for AI. Second, it's really about expanding our open, proven, and very developer-friendly software platform to ensure that leading AI frameworks, libraries, and models are all fully enabled for AMD hardware and that it's really easy for people to use.

And then third, it's really about partnership. You're going to see a lot of partners today. That's who we are as a company. It's about expanding the co-innovation work and working with all parts of the ecosystem, including cloud providers, OEMs, software developers.

You're going to hear from some real AI leaders in the industry to really accelerate how we work together and get that widespread deployment of our solutions across the board. So we have so much to share with you today. I'd like to get started. And of course, let's start with the cloud. Generative AI is the most demanding data center workload ever. It requires tens of thousands of accelerators to train and refine models with billions of parameters.

And that same infrastructure is also needed to answer the millions of queries from everyone around the world to these smart models. And it's very simple. The more compute you have, the more capable the model, the faster the answers are generated.

And the GPU is at the center of this generative AI world. And right now, I think we all know it, everyone I've talked to says it, the availability and capability of GPU compute is the single most important driver of AI adoption. Do you guys agree with that? So that's why I'm so excited today to launch our Instinct MI300X. It's the highest performance accelerator in the world for generative AI.

MI300X is actually built on our new CDNA 3 data center architecture. And it's optimized for performance and power efficiency. CDNA 3 has a lot of new features. It combines a new compute engine. It supports sparsity, the latest data formats, including FP8. It has industry-leading memory capacity and bandwidth.

And we're going to talk a lot about memory today. And it's built on the most advanced process technologies and 3D packaging. So if you compare it to our previous generation, which frankly was also very good, CDNA 3 actually delivers more than three times higher performance for key AI data types, like FP16 and BF16, and a nearly seven times increase in INT8 performance. So if you look underneath it, how do we get MI300X? It's actually 153 billion transistors, 153 billion.

It's across a dozen 5-nanometer and 6-nanometer chiplets. It uses the most advanced packaging in the world. And if you take a look at how we put it together, it's actually pretty amazing. We start with four IO dies in the base layer. And what we have on the IO dies are 256 megabytes of Infinity Cache and all of the next-gen IO that you need.

Things like 128-channel HBM3 interfaces, PCIe Gen 5 support, our fourth-gen Infinity Fabric that connects multiple MI300Xs so that we get 896 gigabytes per second. And then we stack eight CDNA 3 accelerator chiplets, or XCDs, on top of the IO dies. And that's where we deliver 1.3 petaflops of FP16 and 2.6 petaflops of FP8 performance. And then we connect these 304 compute units with dense through-silicon vias, or TSVs, and that supports up to 17 terabytes per second of bandwidth. And of course, to take advantage of all of this compute, we connect eight stacks of HBM3 for a total of 192 gigabytes of memory at 5.3 terabytes per second of bandwidth. That's a lot of stuff on that.
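The headline numbers in this walkthrough can be broken down per chiplet and per memory stack with simple division. The sketch below does only that. The class and function names are hypothetical, the totals are the ones quoted in the talk, and the per-unit values assume the totals are split evenly, which the talk does not state explicitly. The last entry, peak FP16 FLOPs per byte of memory bandwidth, is a standard roofline-style ratio added here for illustration and is not a figure from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MI300XSpec:
    """Totals as quoted in the talk (names here are illustrative)."""
    xcds: int = 8                     # CDNA 3 accelerator chiplets
    compute_units: int = 304
    hbm_stacks: int = 8
    hbm_capacity_gb: float = 192.0
    hbm_bandwidth_tbps: float = 5.3   # terabytes per second
    fp16_pflops: float = 1.3
    fp8_pflops: float = 2.6


def per_unit(spec: MI300XSpec) -> dict:
    """Derive per-chiplet and per-stack figures, assuming an even split."""
    return {
        "compute_units_per_xcd": spec.compute_units / spec.xcds,
        "gb_per_hbm_stack": spec.hbm_capacity_gb / spec.hbm_stacks,
        "tbps_per_hbm_stack": spec.hbm_bandwidth_tbps / spec.hbm_stacks,
        "fp8_over_fp16": spec.fp8_pflops / spec.fp16_pflops,
        # Peak FP16 FLOPs available per byte read from HBM (illustrative ratio).
        "fp16_flops_per_hbm_byte": (spec.fp16_pflops * 1e15)
        / (spec.hbm_bandwidth_tbps * 1e12),
    }


if __name__ == "__main__":
    for name, value in per_unit(MI300XSpec()).items():
        print(f"{name}: {value:.4g}")
```

Under the even-split assumption this gives 38 compute units per XCD and 24 GB per HBM3 stack, and it shows that the quoted FP8 peak is exactly twice the FP16 peak.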

I have to say, it's truly the most advanced product we've ever built, and it is the most advanced AI accelerator in the industry. Now let's talk about some of the performance and why it's so great. For generative AI, memory capacity and bandwidth are really important for performance.

If you look at MI300X, we made a very conscious decision to add more flexibility, more memory capacity, and more bandwidth, and what that translates to is 2.4 times more memory capacity and 1.6 times more memory bandwidth than the competition. Now when you run things like lower precision data types that are widely used in LLMs, the new CDNA 3 compute units and memory density actually enable MI300X to deliver 1.3 times more teraflops of FP8 and FP16 performance than the competition. Now these are good numbers, but what's more important is how things look in real world inference workloads.
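To make the memory-capacity point concrete, here is a rough sketch of how much accelerator memory a model's weights occupy at different precisions. It is a simplification under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses decimal gigabytes, and the function names are hypothetical. The 192 GB capacity is the MI300X figure from the talk, and the comparison capacity is simply derived from the quoted 2.4 times ratio rather than taken from any named product. The 70-billion-parameter size is used because the talk refers to a model of that size.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp8": 1}


def weight_gb(params_billion: float, dtype: str) -> float:
    """Weight footprint in decimal GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, capacity_gb: float) -> bool:
    """True if the weights alone fit in the given memory capacity."""
    return weight_gb(params_billion, dtype) <= capacity_gb


if __name__ == "__main__":
    mi300x_gb = 192.0
    comparison_gb = mi300x_gb / 2.4  # derived from the quoted 2.4x capacity ratio

    for dtype in ("fp16", "fp8"):
        size = weight_gb(70, dtype)  # a 70-billion-parameter model
        print(
            f"70B in {dtype}: {size:.0f} GB | "
            f"fits in {mi300x_gb:.0f} GB: {fits(70, dtype, mi300x_gb)} | "
            f"fits in {comparison_gb:.0f} GB: {fits(70, dtype, comparison_gb)}"
        )
```

In FP16 the weights of a 70-billion-parameter model come to about 140 GB, which fits within 192 GB on one accelerator but would have to be split across devices with the smaller derived capacity. Real deployments need additional headroom beyond the weights.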

So let's start with some of the most common kernels used by the latest AI models. LLMs use attention algorithms to generate precise results. So for something like FlashAttention-2 kernels, MI300X actually delivers up to 1.2 times better performance than the competition.

And if you look at something like the Llama 2 70B LLM, and we're going to use this a lot throughout the show, MI300X again delivers up to 1.2 times more performance. And what this means is the performance at the kernel level actually directly translates into faster results when running LLMs on a single MI300X accelerator. But we also know, we talked about these models getting so large, so what's really important is how that AI performance scales when you go to the platform level and beyond. So let's take a look at how MI300X scales. Let's start first with training.

Training is really hard. People talk about how hard training is. When you look at something like the 30 billion parameter model from Databricks, MPT LLM, it's a pretty good example of something that is used by multiple enterprises for a lot of different things. And you can see here that the training performance for MI300X is actually equal to the competition. And that means it's actually a very, very competitive training platform today.

But when you turn to the inference performance of MI300X, this is where our performance really shines. We're showing some data here, measured data on two widely used models, Bloom 176B. It's the world's largest open multi-language AI model.

It generates text in 46 languages. And Llama 2 70B, which is also very popular, as I said, for enterprise customers. And what we see in this case is a single server with eight MI300X accelerators is substantially faster than the competition, 1.4 to 1.6X. So these are pretty big numbers here. And what this performance does is it just directly translates into a better user experience. You guys have used it.

When you ask the model something, you'd like it to come back faster, especially as the responses get more complicated. So that gives you a view of the performance of MI300X. Now, as excited as we are about the performance, we are even more excited about the work we're doing with our partners. So let me turn to our first guest, very, very special. Microsoft is truly a visionary leader in AI. We've been so fortunate to have a deep partnership with Microsoft for many, many years across all aspects of our business.

And the work we're doing today in AI is truly taking that partnership to the next level. So here to tell us more about that is Microsoft's Chief Technology Officer, Kevin Scott. Kevin, it is so great to see you. Thank you so much for being here with us.

It's a real pleasure to be here with you all today. We've done so much work together on EPYC and Instinct over the years. Can you just tell our audience a little bit about that partnership? Yeah, I think Microsoft and AMD have a very special partnership. And as you mentioned, it has been one that we've enjoyed for a really long time.

It started with the PC. It continued then with a bunch of custom silicon work that we've done together over the years on Xbox. It's extended through the work that we've done with you all on EPYC for the high-performance computing workloads that we have in our cloud.

And like the thing that I've been spending a bunch of time with you all on the past couple of years, like actually a little bit longer even, is on AI compute, which I think everybody now understands how important it is to driving progress on this new platform that we're trying to deliver to the world. I have to say we talk pretty often. We do. But Kevin, what I admire so much is just your vision, Satya's vision about where AI is going in the industry.

So can you just give us a perspective of where are we on this journey? Yeah, so we have been with a huge amount of intensity over the past five years or so, been trying to prepare for the moment that I think we brought the world into over the past year. So it is almost a year to the day since the launch of ChatGPT, which I think is perhaps most people's first contact with this new wave of generative AI. But the thing that allowed Microsoft and OpenAI to do this was just a deep amount of infrastructure work that we've been investing in for a very long while. And one of the things that we realized fairly early in our journey is just how important compute was going to be and just how important it is to think about the sort of full systems optimization.

So the work that we've been doing with you all has been not just about figuring out what the silicon architecture looks like, but that's been a very important thing and making sure that we together are building things that are going to intercept where the actual platform is going to be years in advance, but also just doing all of that software work that needs to be done to make this thing usable by all the developers of the world. I think that's really key. I think sometimes people don't understand, they think about AI as this year, but the truth is we've been building the foundation for so many years. Kevin, I want to take this moment to really acknowledge that Microsoft has been so instrumental in our AI journey. The work we've done over the last several generations, the software work that we're doing, the platform work that we're doing, we're super excited for this moment. Now I know you guys just had Ignite recently and Satya previewed some of the stuff you're doing with 300X, but can you share that with our audience? We're super enthusiastic about 300X.

Satya announced that the MI300X VMs were going to be available in Azure. It's really, really exciting right now seeing the bring up of GPT-4 on MI300X, seeing the performance of Llama 2, getting it rolled into production. The thing that I'm excited about here today is we will have the MI300X VMs in preview available today. I completely agree with you. The thing that's so exciting about AI is every day we discover something new and we're learning that together.

Kevin, we're so honored to be Microsoft's partner in AI. Thank you for all the work that your teams have done, that we've done together. We look forward to a lot more progress. Likewise. Thank you very much.

All right, so look. We certainly do learn a tremendous amount every day and we're always pushing the envelope. Let me talk to you a little bit about how we bring more people into our ecosystem. When I talk about the Instinct platform, you have to understand our goal has really been to enable as many customers as possible to deploy Instinct as fast and as simply as possible. To do this, we really adopted industry standards. We built the Instinct platform based on an industry standard OCP server design. I'd actually like to show you what that means because I don't know if everyone understands.

Let's bring her out. Her or him? Let me show you the most powerful gen AI computer in the world. Those of you who follow our shows know that I'm usually holding up a chip, but we've shown you the MI300X chip already, so we thought it would be important to show you just what it means to do generative AI at a system level. What you see here is eight MI300X GPUs and they're connected by our high-performance Infinity Fabric in an OCP-compliant design.

What makes that special? This board actually drops right into any OCP-compliant design, which is the majority of AI systems today. We did this for a very deliberate reason. We want to make this as easy as possible for customers to adopt so you can take out your other board and put in the MI300X Instinct platform. If you take a look at the specifications, we actually support all of the same connectivity and networking capabilities of our competition, so PCIe Gen 5, support for 400 gig Ethernet, that 896 gigabytes per second of total system bandwidth, but all of that is with 2.4 times more memory and 1.3 times more compute per server than the competition.

That's really why we call it the most powerful gen AI system in the world. Now, I've talked about some of the performance in AI workloads, but I want to give you just a little bit more color on that. When you look at deploying servers at scale, it's not just about performance.

Our customers are also trying to optimize power, space, CapEx and OpEx, and that's where you see some really nice benefits of our platform. When you compare our Instinct platform to the competition, I've already shown you that we deliver comparable training performance and significantly higher inference performance, but in addition, what that memory capacity and bandwidth gives us is that customers can actually either run more models, if you're running multiple models on a given server, or you can run larger models on that same server. In the case where you're running multiple different models on a single server, the Instinct platform can run twice as many models for both training and inference than the competition.

On the other side, if what you're doing is trying to run very large models, you'd like to fit them on as few GPUs as possible. With the FP16 data format, you can run twice the number of LLMs on a single MI300X server compared to our competition. This directly translates into lower CapEx, and especially if you don't have enough GPUs, this is really, really helpful. So, to talk more about MI300X and how we're bringing it to market, let me bring our next guest to the stage. Oracle Cloud and AMD have been engaged for many, many years in bringing great computing solutions to the cloud. Here to tell us more about our work together is Karan Batta, Senior Vice President at Oracle Cloud Infrastructure.

Hey, Karan. Hi, Lisa. Thank you so much for being here. Thank you for your partnership.

Can you tell us a little bit about the work that we're doing together? Yeah, thank you. Excited to be here today. Oracle and AMD have been working together for a long, long time, right, since the inception of OCI back in 2017. And so, we've launched every generation of EPYC as part of our bare metal compute platform, and it's been so successful, with customers like Red Bull as an example. And we've expanded that across the board for all of our portfolio of PaaS services like Kubernetes, VMware, et cetera. And then we are also collaborating on Pensando DPUs, where we offload a lot of that logic so that customers can get much better performance, flexibility.

And then, you know, earlier this year, we also announced that we're partnering with you guys on Exadata, which is a big deal, right? So, we're super excited about our partnership with AMD, and then what's to come with 300X? Yeah. We really appreciate OCI has really been a leading customer as we talk about how do we bring new technology into Oracle Cloud. Now, you're spending a lot of time on AI as well. Tell us a little bit about your strategy for AI and how we fit into that strategy. Absolutely.

You know, we're spending a lot of time on AI, obviously. Everyone is. We are. Everybody is.

It's the new thing. You know, we're doing that across the stack, from infrastructure all the way up to applications. Oracle is an applications company as well.

And so, we're doing that across the stack, but from an infrastructure standpoint, we're investing a lot of effort into our core compute stack, our networking stack. We announced clustered networking. And what I'm really excited to announce is that we're going to be supporting MI300X as part of that bare-metal compute stack. We are super thrilled about that partnership. We love the fact that you're going to have 300X. I know your customers and our customers are talking to us every day about it.

Tell us a little bit about what customers are saying. Yeah, we've been working with a lot of customers. Obviously, we've been collaborating a lot at the engineering level as well with AMD. And you know, customers are seeing incredible results already from the previous generation. And so, I think that will actually carry through with the 300X.

And so much so that we're also excited to actually support MI300X as part of our generative AI service that's going to be coming up live very soon as well. So, we're very, very excited about that. We're working with some of our early customer adopters like Naveen from Databricks Mosaic.

So, we're very excited about the possibility. We're also very excited about the fact that the ROCm ecosystem is going to help us continue that effort moving forward. So, we're very pumped. That's wonderful. Karan, thank you so much.

Thank your teams. We're so excited about the work we're doing together and look forward to a lot more. Thank you, Lisa.

Thank you. Now, as important as the hardware is, software actually is what drives adoption. And we have made significant investments in our software capabilities and our overall ecosystem. So, let me now welcome to the stage AMD President Victor Peng to talk about our software and ecosystem progress.

Thank you, Lisa. Thank you. And good morning, everyone. You know, last June at the AI event in San Francisco, I said that the ROCm software stack was open, proven, and ready.

And today, I'm really excited to tell you about the tremendous progress we've made in delivering powerful new features as well as the high performance on ROCm. And how the ecosystem partners have been significantly expanding the support for Instinct GPUs and the entire product portfolio. Today, there are multiple tens of thousands of AI models that run right out of the box on Instinct. And more developers are running on the MI250, and soon they'll be running on the MI300.

So we've expanded deployments in the data center, at the edge, in client, embedded applications of our GPUs, CPUs, FPGAs, and adaptive SoCs, really end to end. And we're executing on that strategy of building a unified AI software stack so any model, including generative AI, can run seamlessly across an entire product portfolio. Now, today, I'm going to focus on ROCm and the expanded ecosystem support for our Instinct GPUs. We architected ROCm to be modular and open source to enable very broad user accessibility and rapid contribution by the open source community and AI community.

Open source and ecosystem are really integral to our software strategy, and in fact, really open is integral to our overall strategy. This contrasts with CUDA, which is proprietary and closed. Now, the open source community, everybody knows, moves at the speed of light in deploying and proliferating new algorithms, models, tools, and performance enhancements.

And we are definitely seeing the benefits of that in the tremendous ecosystem momentum that we've established. To further accelerate developer adoption, we recently announced that we're going to be supporting ROCm on our Radeon GPUs. This makes AI development on AMD GPUs more accessible to more developers, start-ups, and researchers. So our foot is firmly on the gas pedal with driving the MI300 to volume production and our next ROCm release. So I'm really super excited that we'll be shipping ROCm 6 later this month.

I'm really proud of what the team has done with this really big release. ROCm 6 has been optimized for gen AI, particularly large language models, has powerful new features, library optimizations, expanded ecosystem support, and increases performance by factors. It really delivers for AI developers.

ROCm 6 supports FP16, BF16, and the new FP8 data types for higher performance while reducing both memory and bandwidth needs. We've incorporated advanced graph and kernel optimizations and optimized libraries for improved efficiency. We're shipping state-of-the-art attention algorithms like FlashAttention-2 and PagedAttention, which are critical for performance in LLMs and other models. These algorithms and optimizations are complemented with a new release of RCCL, our collective communications library for efficient, very large-scale GPU deployments. So look, the bottom line is ROCm 6 delivers a quantum leap in performance and capability. Now I'm going to first walk you through the inference performance gains you'll see with some of these optimizations on ROCm 6.

So for instance, running a 70 billion parameter Llama 2 model, PagedAttention and other algorithms speed up the token generation by paging attention keys and values, delivering 2.6x higher performance. HIP graph allows processing to be defined in graphs rather than single operations, and that delivers a 1.4x speed up. FlashAttention, which is a widely used kernel for high-performance LLMs, delivers a 1.3x speed up. So all those optimizations together deliver an 8x speed up on the MI300X with ROCm 6 compared to the MI250 and ROCm 5.
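The idea of "paging attention keys and values" can be illustrated with a toy key/value cache that hands out fixed-size blocks on demand and keeps a per-sequence block table, much like virtual-memory paging. This is a minimal sketch of the general concept only. It is not the vLLM or ROCm implementation, all class and method names are hypothetical, the block size is arbitrary, and plain Python objects stand in for the GPU tensors a real system would use.

```python
from dataclasses import dataclass, field


class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("out of KV cache blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)


@dataclass
class SequenceState:
    block_table: list = field(default_factory=list)  # logical block -> physical block
    num_tokens: int = 0


class PagedKVCache:
    """Toy paged KV cache: memory is claimed one block at a time as tokens arrive."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.block_size = block_size
        self.allocator = BlockAllocator(num_blocks)
        self.storage = [[None] * block_size for _ in range(num_blocks)]
        self.sequences = {}

    def append(self, seq_id: str, kv) -> None:
        """Store the key/value entry for the next token of a sequence."""
        seq = self.sequences.setdefault(seq_id, SequenceState())
        slot = seq.num_tokens % self.block_size
        if slot == 0:  # current block is full (or this is the first token)
            seq.block_table.append(self.allocator.allocate())
        self.storage[seq.block_table[-1]][slot] = kv
        seq.num_tokens += 1

    def lookup(self, seq_id: str, position: int):
        """Translate a logical token position into (block, slot) and read it."""
        seq = self.sequences[seq_id]
        if not 0 <= position < seq.num_tokens:
            raise IndexError(position)
        block = seq.block_table[position // self.block_size]
        return self.storage[block][position % self.block_size]

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the pool."""
        for block in self.sequences.pop(seq_id).block_table:
            self.allocator.release(block)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4, block_size=2)
    for i in range(3):
        cache.append("request-a", (f"k{i}", f"v{i}"))
    cache.append("request-b", ("k0", "v0"))
    print(cache.sequences["request-a"].block_table)
    print(cache.lookup("request-a", 2))
```

Because blocks are claimed only as tokens are generated and returned when a request finishes, many requests of different lengths can share one memory pool without each reserving space for its maximum possible length.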

That's 8x performance in a single generation. So this is one of those huge benefits we provide to customers with this great performance improvement with the MI300X. So now let's look at it from a competitive perspective. Lisa had highlighted the performance of large models running on multiple GPUs. What I'm sharing here is the performance of smaller models running on a single GPU, in this case the 13 billion parameter Llama 2 model.

The MI300X and ROCm 6 together deliver 1.2x higher performance than the competition. So this is the reason why our customers and our partners are super excited about creating the next innovations in AI on the MI300X. So we're relentlessly focused on delivering leadership technology and very comprehensive software support for AI developers. And to fuel that drive, we've been significantly strengthening our software teams through both organic and inorganic means, and we're expanding our ecosystem engagements. So we recently acquired Nod.ai and Mipsology.

Nod brings world-class expertise in open source compilers and runtime technology. They've been instrumental in the MLIR compiler technology as well as in the communities. And as part of our team, they are significantly strengthening our customer engagements and they're accelerating our software development plans.

Mipsology also strengthens our capabilities, especially in delivering to customers in very AI-rich applications like autonomous vehicles and industrial automation. So now let me turn over to the ecosystem. In addition to working closely with the ecosystem, oh, sorry. We announced that we had the partnership with Hugging Face just last June.

Today they have 62,000 models running daily on Instinct platforms. And in addition, we've worked closely on getting these LLM optimizations as part of their Optimum library and toolkit. Our partnership with PyTorch Foundation has also continued to thrive with CI/CD pipelines and validation, enabling developers to target our platforms directly. And we continue to make very significant contributions to all the major frameworks, including upstream support for AMD GPUs in JAX, OpenXLA, CuPy, and even initiatives like DeepSpeed for Science. Just yesterday, the AI Alliance was announced with over 50 founding members that also include AMD, IBM, and Meta and other companies. And I'm really delighted to share some very late-breaking news.

AMD GPUs, including the MI300, will be supported in the standard OpenAI Triton distribution starting with the 3.0 release. We're really thrilled to be working with Philippe Tillet, who created Triton, and the whole OpenAI team. AI developers using the OpenAI Triton are more productive working at a higher level of design abstraction, and they still get really excellent performance. This is great for developers and aligned with our strategy to empower developers with powerful and open software stacks and GPU platforms. This is in contrast to the much greater effort developers would need to invest working at a much lower level abstraction in order to eke out performance. Now I've shared a lot with you about the progress we made on software, but the best indication of the progress we've really made are the people who are using our software and GPUs and what they're saying.

So it gives me great pleasure to have three AI luminaries and entrepreneurs from Databricks, Essential AI, and Lamini to join me on stage. Please give a very warm welcome to Ion Stoica, Ashish Vaswani, and Sharon Zhou. Great. Welcome, Ion, Ashish, and Sharon.

Thank you so much for joining us here. Really appreciate it. So I'm gonna ask each of you to talk first about the mission of your company and share about the innovations you're doing with our GPUs and software and what the experience has been like. So Ion, let me start with you. Now you're not only a founder of Databricks, but you're also on the faculty at UC Berkeley, director of the Sky Computing Lab, and you've been involved with Anyscale and many AI startups. So maybe you could talk about your engagement with AMD as well as your experience with the MI200 and MI300.

Yeah, thank you very much. Very glad to be here. And yes, indeed, I collaborated with AMD wearing multiple hats, director of the Sky Computing Lab at Berkeley, which AMD is supporting, and also founder of Anyscale and Databricks. And in all my work over the years, one thing I really focus on is democratizing the access to AI. What this means is improving the scale and performance, and reducing the cost, to run these large AI applications, which means all AI workloads, everything from training, fine-tuning, and inference to generative AI applications. Just to give you some examples, we developed vLLM, which is arguably now the most popular open-source inference engine for LLMs.

We have developed Ray, another open-source framework which is used to distribute machine learning workloads. Ray has been used by OpenAI to train ChatGPT. And more recently, Sky Computing, one of the projects there is SkyPilot, which helps you to run your applications or machine learning applications and workloads across multiple clouds. And why do you want to do that? It's because you want to alleviate the scarcity of the GPUs and reduce the costs. Now, when it comes to our collaborations, we collaborate on all these kind of projects.

And one thing which was a very pleasant surprise is that it was very easy to run and include ROCm in our stack. It really runs out of the box from day one. Of course, you need to do more optimization for that. And this is what we are doing and we are working on. So for instance, we added support for MI250 to Ray. And we are working, actually, collaborating with AMD, like I mentioned, to optimize the inference for vLLM, again, running on MI250 and MI300X.

And from the point of view of SkyPilot, we're really looking forward to have more and more of MI250s and MI300X in various clouds. So we have more choices. It sounds great. Thank you so much for all the collaboration across all those clouds.

Ashish, why don't you tell us about Essential's mission and also your experience with ROCm and Instinct? Thank you. Great to be here, Victor. At Essential, we're really excited to push the boundaries of human-machine partnership in enterprises. We should be able to do it. We're at the beginning stages where we'll be able to do 10x or 50x more than what we can just do by ourselves today.

So we're extremely excited. And what that's going to take, I believe it's going to be a full-stack approach. So you're building the models, serving infrastructure, but more importantly, understanding workflows in enterprises today and giving people the tools to configure these models, teach these models to configure them for their workflows end to end. And so the models learn with feedback. They get better with feedback.

They get smarter. And then they're eventually able to even guide non-experts to do tasks they were not able to do. We're really excited.

And we actually were lucky to start to benchmark the 250s earlier this year. And hey, we want to solve a couple of hard problems, scientific problems. And we were like, hey, are we going to get long context? Check. OK, so are we going to be able to train larger models? Are we able to serve larger models on fewer chips? And so as we saw, the ease of using the software was also very pleasant. And then we saw how things were progressing. For example, I think in two months, I believe, FlashAttention, which is a critical component to actually scale to longer sequences, appeared, so I was generally very happy and just impressed with the progress and excited about the chips. Thanks so much, Ashish.

And Sharon. So Sharon, Lamini has a very innovative business model, working with enterprises on their private models. Why don't you share the mission and how the experience with AMD has been? Yeah, thanks, Victor. So by way of quick background, I'm Sharon, co-founder and CEO of Lamini. Most recently, I was computer science faculty at Stanford, leading a research group in generative AI.

I did my PhD there also under Andrew Ng and teach about a quarter million students and professionals online in generative AI. And I left Stanford to pursue Lamini and co-found Lamini on the premise of making the magical, difficult, expensive pieces of building your own language model inside an enterprise extremely accessible, easy to use so that companies who understand their domain-specific problems best can be the ones who can actually wield this technology and, more importantly, fully own that technology. In just a few lines of code, you can run an LLM and be able to imbue it with knowledge from millions of documents, which is 40,000 times more than hitting Claude 2 Pro on that API.

So just a huge amount of information can be imbued into this technology using our infrastructure. And more importantly, our customers get to fully own their models. For example, NordicTrack, one of our customers that makes all the ellipticals and treadmills in the gym, whose parent company is iFit, has over 6 million users on their mobile app platform. And so they're building an LLM that can actually create this personal AI fitness coach imbued with all the knowledge they have in-house on what a good fitness coach is.

And it turns out it's actually not a professional athlete. They tried to hire Michael Phelps, did not work. So they have real knowledge inside of their company and they're imbuing the LLM with that so that we can all have personal fitness trainers. So we're very excited to be working with AMD. We actually have had a cloud, AMD cloud, in production for over the past year on MI200, so MI210, MI250s. And we're very excited about the MI300s.

And I think something that's been super important to us is that with Lamini software, we've actually reached software parity with CUDA on all the things that matter with large language models, including inference and training. And I would say even beyond CUDA. We have reached beyond CUDA for things that matter for our customers.

So that includes higher memory: higher memory capacity means bigger models. And our customers want to be able to build bigger and more capable models. And then a second point, which Lisa kind of touched on earlier today, is that these chips can actually, given higher bandwidth, return results with lower latency, which matters for the user experience, certainly for a personal fitness coach, but for all of our customers as well. Super exciting, that's great. Great. So, Ion, back to you, changing this up a little bit.

So, you heard that several key components of ROCm are open source. And we did that for rapid adoption and also to get more enhancements from the community, both open source and AI. So what do you think about this strategy, and how do you think this approach might help some of the companies that you've founded? So obviously, given my history, I really love open source.

I love the open source ecosystem. And we have tried over time to make our own contributions. And I think one thing to note is that many of the generative AI tools today are open source. And we are talking here about Hugging Face, about PyTorch, Triton, and, like I mentioned, vLLM, Ray, and many others. And many of these tools can actually run today on AMD and the ROCm stack. And this makes ROCm another key component of the open source ecosystem. And I think this is great.

And in time, and I'm sure actually quite fast, the community will take advantage of the unique capabilities of AMD's MI250 and MI300X to innovate and to improve the performance of all these tools which are running at a higher level of the generative AI stack. Great, and that's our purpose and aim, so I'm glad to hear that.

So I'm gonna, out of order execution, jump over to Sharon. So Sharon, what do you think about how AI workloads are evolving in the future? And what role do you think Instinct GPUs and ROCm, since you have great experience with them, can play in that future of AI development? Okay, so maybe a bit of a spicy take. I think that GOFAI, good old-fashioned AI, is not the future of AI. And I really do think it's LLMs, or some variant of LLMs, these models that are actually able to soak up all this general knowledge that is missing from these traditional algorithms. And we've seen this across so many different algorithms in our customers already.

Those who are even at the bleeding edge of recommendation systems, forecasting systems, classification, are even using this because of that general knowledge that it's able to learn. So I think that's the future. It's maybe more known as Software 2.0, coined by my friend, Andrej Karpathy. And I really do think Software 2.0, which is hitting these models time and time again instead of writing really extensive software inside a company, will be supporting enterprises 2.0, meaning enterprises of the future, of the next generation.

And I think the AMD Instinct GPUs are critical to basically supporting, ubiquitously supporting the Software 2.0 of the future. And we absolutely need compute to be able to run these models efficiently, to run lots of these models, more of these models, and larger models with greater capabilities. So overall, very excited with the direction of not only these AI workloads, but also the direction that AMD is taking in doubling down on these MI300s that, of course, can take on larger models and more capable models for us.

Awesome. So Ashish, we'll finish up with you and I'll give you the same kind of question. So what do you think about the future of AI workloads, what role do you think our GPUs and ROCm can play, and how are you driving things at Essential? Yep. So I think that we have to improve reasoning and planning to solve these complex tasks. Take an analyst who wants to absorb an earnings call and figure out how they should revise their opinion, whether to invest in a company, or what recommendations they should provide. It's actually gonna take reasoning over multiple steps.

It's gonna take ingesting a large document and being able to extract information from it, apply their models, actually ask for information when they don't have any, get world knowledge, but also maybe have some reasoning and some outside reasoning and planning there. So when I look at the MI300 with very large HBM and high memory bandwidth, I think of what's gonna be unlocked and which capabilities are going to be improved and what new capabilities will be available. So I mean, even with what we have today, just imagine a world where you can process long documents or you can make these models much more accurate by adding more examples in the prompt. But imagine just complete user sessions that you can maintain and model state, how they would actually improve the end-to-end user experience, right? And I think that we're moving to a kind of architecture where what typically used to happen in inference, a lot of search, is now gonna go into training, where the models are gonna explore thousands of solutions and eventually pick one that's actually the best option for the goal, the best solution for the goal. And that's good, and definitely the large HBM and high bandwidth is gonna not only be important for serving large models with low latency for better end-to-end experience, but also for some of these new techniques that we're just exploring that are gonna improve the capabilities of these models.

So very excited about the new chip and what it's gonna unlock. Great, thank you, Ashish. Ion, Ashish, Sharon, this has been really terrific. Thank you so much for all the great insights you have provided us. Thank you. And thank you for joining us today.

Thank you. Thank you. Thank you. Thank you. It's just so exciting to hear what companies like Databricks, Essential AI, Lamini are achieving with our GPUs and just super thrilled that their experience with our software has been so smooth and really a delight. So you can tell, they see absolutely no barriers, right? And they're extremely motivated to innovate on AMD platforms.

Okay, to sum it up, what we delivered over the past six months is empowering developers to execute their mission and realize their vision. We'll be shipping ROCm 6 very soon. It's optimized for LLMs and together with the MI300X, it's gonna deliver 8X gen-on-gen performance improvement and it's higher performance in inference than the competition. We have 62,000 models running on Instinct today and more models will be running on the MI300 very soon.

We have very strong momentum, as you can see in the ecosystem, adding OpenAI Triton to our extensive list of industry-standard frameworks, models, runtimes and libraries. And you heard from the panel, right? Our tools are proven and easy to use. Innovators are advancing the state of the art of AI on AMD GPUs today. ROCm 6 and the MI300X will drive an inflection point in developer adoption, I'm confident of that. We're empowering innovators to realize the profound benefits of pervasive AI faster on AMD.

Thank you. And now I'd like to invite Lisa back on the stage. Thank you, Victor. And weren't those innovators great? I mean, you love the energy and just all of the thought there. So look, as you can see, the team has really made great, great progress with ROCm and our overall software ecosystem.

Now, as I said though, we really want broad adoption for MI300X. So let's go through and talk to some additional customers and partners who are early adopters of MI300X. Our next guest is a partner really at the forefront of GenAI innovation and working across models, software and hardware. Please welcome Ajit Mathews of Meta to the stage.

Hello, Ajit, it's so nice of you to be here. We're incredibly proud of our partnership together. Meta and AMD have been doing so much work together. Can you tell us a little bit about Meta's vision in AI? Cause it's really broad and key for the industry.

Absolutely, thanks Lisa. We are excited to partner with you and others and innovate together to bring generative AI to people around the world at scale. Generative AI is enabling new forms of connection for people around the world, giving them the tools to be more creative, expressive and productive. We are investing for the future by building new experiences for people across our services and advancing open technologies and research for the industry.

We recently launched AI stickers, image editing, Meta AI, which is our AI assistant that spans our family of apps and devices, and lots of AIs for people to interact with in our messaging platforms. In July, we opened access to our Llama 2 family of models and, as you've seen, have been blown away by the reception from the community, who have built some truly amazing applications on top of them. We believe that an open approach leads to better and safer technology in the long run, as we have seen from our involvement in the PyTorch Foundation, Open Compute Project and across dozens of previous AI models and data set releases.

We're excited to have partnered with the industry on our generative AI work, including AMD. We have a shared vision to create new opportunities for innovation in both hardware and software to improve the performance and efficiency of AI solutions. That's so great, Ajit. We completely agree with the vision. We agree with the open ecosystem and that really being the path to get all of the innovation from all the smart folks in the industry. Now, we've collaborated a lot on the product front as well, both EPYC and Instinct.

Can you talk a little bit about that work? Yeah, absolutely. We have been working together on EPYC CPUs since 2019 and most recently deployed Genoa and Bergamo-based servers at scale across Meta's infrastructure where it now serves many diverse workloads. But our partnership is much broader than EPYC CPUs and we have been working together on Instinct GPUs starting since the MI100 in 2020. We have been benchmarking ROCm and working together on improvements for its support in PyTorch across each generation of AMD Instinct GPU, leading up to MI300X now.

Over the years, ROCm has evolved, becoming a competitive software platform due to optimizations and ecosystem growth. AMD is a founding member of the PyTorch Foundation and has made a significant commitment to PyTorch, providing day zero support for PyTorch 2.0 with ROCm, torch.compile, torch.export, all of those things are great. We have seen tremendous progress on both Instinct GPU performance and ROCm maturity and are excited to see ecosystem support grow beyond PyTorch 2.0, like OpenAI Triton, today's announcement with respect to being a default backend of AMD, that's great, FlashAttention-2 is great, Hugging Face, great, and other industry frameworks.

All of these are great partnerships. It really means a lot to hear you say that, Ajit. I think we also view that it's been an incredible partnership. I think the teams work super closely together, that's what you need to do to drive innovation. And the work with PyTorch Foundation is foundational for AMD, but really the ecosystem as well.

But our partnership is very exciting right now with GPUs, so can you talk a little bit about the 300X plans? Oh, here we go. We are excited to be expanding our partnership to include Instinct MI300X GPUs in our data centers for AI inference workloads. Thank you so much. So, just to give you a little background, MI300X leverages the OCP accelerator module standard and platform, which has helped us adopt it in record time. In fact, MI300X is trending to be one of the fastest design-to-deployment solutions in Meta's history. We have also had a great experience with ROCm and the performance it is able to deliver with MI300X.

The optimizations and the ecosystem growth over the years have made ROCm a competitive software platform. As model parameters increase and the Llama family of models continues to grow in size and power, which it will, the MI300X with its 192 GB of memory and higher memory bandwidth meets the expanding requirements for large language model inference. We are really pleased with the ROCm optimizations that AMD has done, focused on the Llama 2 family of models on MI300X. We are seeing great, promising performance numbers, which we believe will benefit the industry. So, to summarize, we are thrilled with our partnership and excited about the capabilities offered by the MI300X and the ROCm platform as we start to scale their use in our infrastructure for production workloads.
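The point about 192 GB of memory and growing model sizes can be made concrete with a rough back-of-the-envelope sketch. The function names below are illustrative only, and the estimate counts model weights alone; KV cache, activations and runtime overhead are deliberately ignored, so real deployments need more headroom than this suggests.

```python
# Rough sketch: memory needed just to hold model weights.
# Assumption: weights only; KV cache, activations and runtime overhead are ignored.

MI300X_HBM_GB = 192  # per-accelerator memory figure quoted in the talk


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params: float, bytes_per_param: int,
                    hbm_gb: float = MI300X_HBM_GB) -> bool:
    """True if the weights alone fit in a single accelerator's memory."""
    return weight_memory_gb(num_params, bytes_per_param) <= hbm_gb


if __name__ == "__main__":
    # Llama 2 was released in 7B, 13B and 70B parameter sizes.
    for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        gb = weight_memory_gb(params, 2)  # 2 bytes per parameter (16-bit weights)
        print(f"Llama 2 {name}: {gb:.0f} GB of weights, "
              f"fits on one GPU: {fits_on_one_gpu(params, 2)}")
```

Under these assumptions, a 70-billion-parameter model in 16-bit precision needs about 140 GB for weights, which is under the 192 GB figure, while the same model in 32-bit precision would not fit on a single accelerator.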

That is absolutely fantastic, Ajit. Thank you, Lisa. Thank you so much. We are thrilled with the partnership and we look forward to seeing lots of MI300Xs in your infrastructure.

So, thank you for being here. That's good. Thank you. So, super exciting. We said cloud is really where a lot of the infrastructure is being deployed, but enterprise is also super important.

So, when you think about the enterprise right now, many enterprises are actually thinking about their strategy. They want to deploy AI broadly across both cloud and on-prem, and we're working very closely with our OEM partners to bring very integrated enterprise AI solutions to the market. So, to talk more about this, I'd like to invite one of our closest partners to the stage, Arthur Lewis, President of Dell Technologies Infrastructure Solutions Group. Hey, welcome, Arthur. I'm so glad you could join us for this event. And Dell and AMD have had such a strong history of partnership.

I actually also think, Arthur, you have a very unique perspective of what's happening in the enterprise, just given your purview. So, can we just start with giving the audience a little bit of a view of what's happening in enterprise AI? Yeah, Lisa, thank you for having me today. We are at an inflection point with artificial intelligence. Traditional machine learning and now generative AI is a catalyst for much greater data utilization, making the value of data tangible and therefore quantifiable. Data, as we all know, is growing exponentially.

A hundred zettabytes of data was generated last year, more than doubling over the last three years. And IDC projects that data will double again by 2026. And it is clear that data is becoming the world's most valuable asset. And this data has gravity. 83% of the world's data resides on-prem, and much of the new data will be generated at the edge.

Yet customers are dealing with years of rapid data growth, multiple copies on-prem across clouds, proliferating data sources, formats, and tools. These challenges, if not overcome, will prevent customers from realizing the full potential of artificial intelligence and maximizing real business outcome. Today, customers are faced with two suboptimal choices. Number one, stitch together a complex web of technologies and tools and manage it themselves, or two, replicate their entire data estate in the public cloud. Customers need and deserve a better solution.

Our job is to bring artificial intelligence to the data. That's great perspective, Arthur. And that 83% of the data and where it resides, I think, is something that sticks in my mind a lot. Now let's move to a little bit of the technology. I mean, we've been partnering together to bring some great solutions to the market. Tell us more about what you have planned from a tech standpoint.

Well, today's an exciting day. We are announcing a much-anticipated update to the family of our PowerEdge XE9680, the fastest growing product in Dell ISG history, with the addition of AMD's Instinct MI300X Accelerator for artificial intelligence. Effective today, we are going to be able to offer a new configuration of eight MI300X accelerators, providing 1.5 terabytes of coherent HBM3 memory, delivering bandwidth of 5.3 terabytes per second. This is an unprecedented level of performance in the industry and will allow customers to consolidate large language model inferencing onto a fewer number of servers, while providing for training at scale, while also reducing complexity, cost, and data center footprint.
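The 1.5 terabyte figure follows from the eight-accelerator configuration and the 192 GB per MI300X quoted earlier in the event. A minimal sketch of that arithmetic, with illustrative names and decimal units assumed:

```python
# Sketch of the aggregate-memory arithmetic for an eight-GPU server.
# Assumption: decimal units (1 TB = 1000 GB); names are illustrative only.

GPUS_PER_SERVER = 8    # configuration described in the talk
HBM_PER_GPU_GB = 192   # per-accelerator memory quoted earlier in the event


def total_hbm_tb(gpus: int = GPUS_PER_SERVER,
                 hbm_per_gpu_gb: int = HBM_PER_GPU_GB) -> float:
    """Return the combined HBM capacity of one server in terabytes."""
    return gpus * hbm_per_gpu_gb / 1000


if __name__ == "__main__":
    print(f"{total_hbm_tb():.3f} TB of HBM per server")
```

Eight times 192 GB is 1,536 GB, which rounds to the roughly 1.5 terabytes stated on stage.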

We are also leveraging AMD's Instinct Infinity Platform, which provides a unified fabric for connecting multiple GPUs within and across servers, delivering near linear scaling and low latency for distributed AI. Further, and there's more. Through our collaboration with AMD on software and open source frameworks, which Lisa, you talked a lot about today, including PyTorch and TensorFlow, we can bring seamless services for customers and an out-of-the-box LLM experience. We talked about making it simple.

This makes it incredibly simple. And we've also optimized the entire stack with Dell storage, specifically PowerScale and ObjectScale, providing ultra low latency Ethernet fabrics, which are designed specifically to deliver the best performance and maximum throughput for generative AI training and inferencing. This is an incredibly exciting step forward. And again, effective today, Lisa, we're open for business, we're ready to quote, and we're taking orders. I like the sound of that.

Look, it's so great to see how this all comes together. Our teams have been working so closely together over the last few years and definitely over the last year. Tell us though, there's a lot of co-innovation and differentiation in these solutions.

So just tell us a little bit more about that. Well, our biggest differentiator is really the breadth of our technology portfolio at Dell Technologies. Products like PowerScale, which is our OneFS file system for unstructured data storage, have been helping customers in industries like financial services, manufacturing, and life sciences solve the world's most challenging problems for decades, as the complexity of their workflows and the scale of their data estate increases.

And with AMD, we are bringing these components together with open networking products and AI fabric solutions, taking the guesswork out of building tailored gen AI solutions for customers of all sizes, again, making it simple. We have both partnered with Hugging Face to ensure transformers and LLMs for generative AI don't just work on our combined solutions but are optimized for AMD's accelerators and easy to configure and size for workloads with our products. And in addition to Dell Validated Designs, we have a comprehensive and growing array of services and offerings that can be tailored to meet the needs of customers, from a complimentary gen AI strategy consultation all the way up to a fully managed solution for generative AI. That's fantastic, Arthur.

Great set of solutions, love the partnership and love what we can do for our enterprise customers together. Thank you so much for being here. Thank you for having me, Lisa. Yeah.

Our next guest is another great friend. Supermicro and AMD have been working together to bring leadership computing solutions to the market for many years based on AMD EPYC processors as well as Instinct accelerators. Here to tell us more about that, please join me in welcoming CEO Charles Liang to the stage. Congratulations.

Thank you so much. Hello, Charles. For a successful launch.

Yeah, thank you so much for being here. I mean, Supermicro is really well known for building highly optimized systems for lots of workloads. We've done so much together. Can you share a little bit about how you're approaching gen AI? Thank you. Because our building block solution based on a modularized design. So that enables Supermicro to design product quicker than others and deliver product to customer also quicker, better leverage inventory and better for service.

And thank you for our close relationship. Thank you for all I have. So that's why we are able to design product time to market as soon as possible. Well, I really appreciate that our teams also work very closely together. And we now know that everybody is calling us for AI solutions. You've built a lot of AI infrastructure.

What are you seeing in the market today? Oh, the market continues to grow very fast. The only limitation is- Very fast, right? Very fast. Maybe more than very fast. So all we need is just more chips.

I know. So today, including USA, Netherlands, Taiwan and Malaysia, we have more than 4,000 rack per month capacity, and customers are facing not-enough-power, not-enough-space problems. So with our rack-scale building block solution, with free air cooling, optimized for hybrid air and free air cooling, optimized for liquid cooling, that can help customers save energy up to 30 to even 40%. And that allows customers to install more systems with a fixed power budget, or the same power, same system, but less energy cost. So all of those, together with our rack-scale building block solution, we install a whole rack, including generative CPU, GPU, and storage, switch, firmware, management software, security function. And when we ship to the customer, the customer just simply plugs in two cables, power cable and data cable, and then is ready to run, ready to go online.

For liquid cooling customers, for sure they need a water kind of tube. So that means a customer can easily go online once chips are available. Yeah, no, that's fantastic. Thank you, Charles.

Now, let's talk a little bit about MI300X. What do you have planned for MI300? Okay, the big product. We have products based on MI300X, like 8U for air cooled, and then 4U optimized for liquid cooled. So for air cooled, per rack, we support up to 40 kW or 50 kW.

For liquid cooled, we support up to 80 kW or 100 kW. And it is all rack-scale plug and play. So when the customer needs it, once we have chips, we can ship to the customer quicker. That sounds wonderful.

Well, look, we appreciate all the partnership, Charles, and we will definitely see a lot of opportunity to collaborate together on the generative AI. So thank you so much. Thank you so much. Thank you.

Okay, now let's turn to our next guest. Lenovo and AMD have a broad partnership as well that spans from data center to workstations and PCs, and now to AI. So here to tell us about this special partnership, please welcome to the stage, Kirk Skaugen, EVP and President of Infrastructure Solutions Group at Lenovo. Hello, Kirk. Thank you so much for being here. We truly appreciate the partnership with Lenovo.

You have a great perspective as well. Tell us about your view of AI and what's going on in the market. Sure. Well, AI is not new for Lenovo. We've been talking and innovating around AI for many years. We just had a great Supercomputing, where we're the number one supercomputer provider to the TOP500, and we're proud that IDC just ranked us the number three AI server infrastructure provider in the world as well.

So it's not new to us, but you were at Tech World, so thanks for joining us in Austin. We're trying to help shape the future of AI from the pocket to the edge to the cloud, and we've had this kind of concept of AI for all. So what does that mean? Pocket meaning Motorola, smartphone, AI devices, and then all the way to the cloud with our ODM Plus model. So our collaboration with our customers is really to

2023-12-09 18:09
