GTC 2021 Keynote with NVIDIA CEO Jensen Huang

I am a creator, blending art and technology to immerse our senses. I am a healer, helping us take the next step and see what's possible. I am a pioneer, finding life-saving answers and pushing the edge to the outer limits. I am a guardian, defending our oceans and the magnificent creatures that call them home. I am a protector, helping the earth breathe easier and watching over it for generations to come. I am a storyteller, giving emotion to words and bringing them to life. I am even the composer of the music. I am AI, brought to life by NVIDIA, deep learning, and brilliant minds everywhere.

There are powerful forces shaping the world's industries. Accelerated computing, which we pioneered, has supercharged scientific discovery while providing the computer industry a path forward. Artificial intelligence, in particular, has seen incredible advances.

With NVIDIA GPUs, computers learn, and software writes software no human can. The AI software is delivered as a service from the cloud, performing automation at the speed of light. Software is now composed of microservices that scale across the entire data center - treating the data center as a single unit of computing. AI and 5G are the ingredients to kick-start the 4th industrial revolution, where automation and robotics can be deployed to the far edges of the world. There is one more miracle we need: the metaverse, a virtual world that is a digital twin of ours. Welcome to GTC 2021 - we are going to talk about these dynamics and more.

Let me give you the architecture of my talk. It's organized in four stacks - this is how we work - as a full-stack computing platform company. The flow also reflects the waves of AI and how we're expanding the reach of our platform to solve new problems, and to enter new markets. First is Omniverse - built from the ground-up on NVIDIA's body of work. It is a platform to create and simulate virtual worlds.

We'll feature many applications of Omniverse, like design collaboration, simulation, and future robotic factories. The second stack is DGX and high-performance data centers. I'll feature BlueField, new DGXs, new chips, and the new work we're doing in AI, drug discovery, and quantum computing.

Here, I'll also talk about Arm and new Arm partnerships. The third stack is one of our most important new platforms - NVIDIA EGX with Aerial 5G. Now, enterprises and industries can do AI and deploy AI-on-5G. We'll talk about NVIDIA AI and Pre-Trained Models, like Jarvis Conversational AI.

And finally, our work with the auto industry to revolutionize the future of transportation - NVIDIA Drive. We'll talk about new chips, new platforms and software, and lots of new customers. Let's get started. Scientists, researchers, developers, and creators are using NVIDIA to do amazing things. Your work gets global reach with the installed base of over a billion CUDA GPUs shipped and 250 ExaFLOPS of GPU computing power in the cloud. Two and a half million developers and 7,500 startups are creating thousands of applications for accelerated computing.

We are thrilled by the growth of the ecosystem we are building together and will continue to put our heart and soul into advancing it. Building tools for the Da Vincis of our time is our purpose. And in doing so, we also help create the future. Democratizing high-performance computing is one of NVIDIA's greatest contributions to science. With just a GeForce, every student can have a supercomputer. This is how Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton trained AlexNet, which focused the world's attention on deep learning. And with GPUs in supercomputers, we gave scientists a time machine.

A scientist once told me that because of NVIDIA's work, he can do his life's work in his lifetime. I can't think of a greater purpose. Let me highlight a few achievements from last year. NVIDIA is continually optimizing the full stack. With the chips you have, your software runs faster every year and even faster if you upgrade. On our gold suite of important science codes, we increased performance 13-fold in the last 5 years, and for some, performance doubled every year.

NAMD molecular dynamics simulator, for example, was re-architected and can now run across multiple GPUs. Researchers led by Dr. Rommie Amaro at UC San Diego, used this multi-GPU NAMD, running on Oak Ridge Summit supercomputer's 20,000 NVIDIA GPUs, to do the largest atomic simulation ever - 305 million atoms. This work was critical to a better understanding of the COVID-19 virus and accelerated the making of the vaccine. Dr. Amaro and her collaborators won the Gordon Bell Award for this important work. I'm very proud to welcome Dr. Amaro and more than 100,000 of you to this year's GTC - our largest-ever by double.

We have some of the greatest computer scientists and researchers of our time speaking here. 3 Turing award winners, 12 Gordon Bell award winners, 9 Kaggle Grand Masters - and even 10 Oscar winners. We're also delighted to have the brightest minds from industry sharing their discoveries.

Leaders from every field - healthcare, auto, finance, retail, energy, internet services, every major enterprise IT company. They're bringing you their latest work in COVID research, data science, cybersecurity, new approaches to computer graphics and the most recent advances in AI and robotics. In total, 1,600 talks about the most important technologies of our time, from the leaders in the field that are shaping our world. Welcome to GTC. Let's start where NVIDIA started...computer graphics.

Computer graphics is the driving force of our technology. Hundreds of millions of gamers and creators each year seek out the best NVIDIA has to offer. At its core, computer graphics is about simulation - using mathematics and computer science to simulate the interactions of light and material, the physics of objects, particles, and waves; and now simulating intelligence and animation. The science, engineering, and artistry we dedicate to pursuing mother nature's physics has led to incredible advances, and has allowed our technology to contribute to advancing the basic sciences, the arts, and industry. This last year, we introduced the 2nd generation of RTX - a new rendering approach that fuses rasterization and programmable shading with hardware-accelerated ray tracing and artificial intelligence. This is the culmination of ten years of research.

RTX has reset computer graphics, giving developers a powerful new tool just as rasterization plateaus. Let me show you some amazing footage from games in development. The technology and artistry is amazing. We’re giving the world’s billion gamers an incredible reason to upgrade. RTX is a reset of computer graphics. It has enabled us to build Omniverse - a platform for connecting 3D worlds into a shared virtual world.

One not unlike the science-fiction metaverse first described by Neal Stephenson in his early-1990s novel Snow Crash, where the metaverse is a collective of shared 3D spaces and virtually enhanced physical spaces that are extensions of the internet. Pieces of the early metaverse vision are already here - massive online social games like Fortnite, or user-created virtual worlds like Minecraft. Let me tell you about Omniverse from the perspective of two applications - design collaboration and digital twins. There are several major parts of the platform. First, the Omniverse Nucleus, a database engine that connects users and enables the interchange of 3D assets and scene descriptions.

Once connected, designers doing modeling, layout, shading, animation, lighting, special effects, or rendering can collaborate to create a scene. The Omniverse world is described in the open standard USD, Universal Scene Description, a fabulous interchange framework invented by Pixar. Multiple users can connect to Nucleus, transmitting and receiving changes to their world as USD snippets.
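
To make the idea of a USD snippet concrete, here is a minimal sketch using the open-source USD Python bindings (the pxr module). The file name, prim paths, and attribute values are made-up placeholders; this is only meant to show the kind of scene description that flows between Nucleus clients, not Omniverse's own code.

```python
# Minimal sketch: authoring a tiny USD layer with the open-source USD Python
# bindings (the pxr module). File name, prim paths, and values are invented;
# in Omniverse, Nucleus clients would exchange deltas to a scene like this.
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateNew("factory_scene.usda")      # a new USD layer on disk
UsdGeom.Xform.Define(stage, "/World")                   # a transform prim
crate = UsdGeom.Cube.Define(stage, "/World/Crate")      # a simple cube prim
crate.GetSizeAttr().Set(2.0)                            # edit an attribute
crate.AddTranslateOp().Set(Gf.Vec3d(0.0, 1.0, 0.0))     # move it up one unit

stage.GetRootLayer().Save()
print(stage.GetRootLayer().ExportToString())            # the text "snippet" another user could receive
```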

The 2nd part of Omniverse is the composition, rendering, and animation engine - the simulation of the virtual world. Omniverse is a platform built from the ground up to be physically-based. It is fully path-traced. Physics is simulated with NVIDIA PhysX, materials are simulated with NVIDIA MDL and Omniverse is fully integrated with NVIDIA AI. Omniverse is cloud-native, multi-GPU scalable and runs on any RTX platform and streams remotely to any device.

The third part is NVIDIA CloudXR, a stargate if you will. You can teleport into Omniverse with VR and AIs can teleport out of Omniverse with AR. Omniverse was released to open beta in December. Let me show you what talented creators are doing. Creators are doing amazing things with Omniverse.

At Foster and Partners, designers in 17 locations around the world, are designing buildings together in their Omniverse shared virtual space. ILM is testing Omniverse to bring together internal and external tool pipelines from multiple studios. Omniverse lets them collaborate, render final shots in real time and create massive virtual sets like holodecks.

Ericsson is using Omniverse to do real-time 5G wave propagation simulation, with many multi-path interferences. Twin Earth is creating a digital twin of Earth that will run on 20,000 NVIDIA GPUs. And Activision is using Omniverse to organize their more than 100,000 3D assets into a shared and searchable world. Bentley is the world's leading infrastructure engineering software company. Everything that's constructed - roads and bridges, rail and transit systems, airports and seaports - represents about 3% of the world's GDP, or three-and-a-half trillion dollars a year.

Bentley's software is used to design, model, and simulate the largest infrastructure projects in the world. 90% of the world's top 250 engineering firms use Bentley. They have a new platform called iTwin - an exciting strategy to use the 3D model, after construction, to monitor and optimize the performance throughout its life.

We are super excited to partner with Bentley to create infrastructure digital twins in Omniverse. Bentley is the first 3rd-party company to be developing a suite of applications on the Omniverse platform. This is just an awesome use of Omniverse, a great example of digital twins and Bentley is the perfect partner.

And here's Perry Nightingale from WPP, the largest ad agency in the world, to tell you what they're doing. WPP is the largest marketing services organization on the planet, and because of that, we're also one of the largest production companies in the world. That is a major carbon hotspot for us. We've partnered with NVIDIA to capture locations virtually and bring them to life with studios in Omniverse. Over 10 billion points have been turned into a giant mesh in Omniverse. For the first time, we can shoot locations virtually that are as real as the actual places themselves. Omniverse also changes the way we make work.

It's a collaborative platform, which means multiple artists, at multiple points in the pipeline, in multiple parts of the world, can collaborate on a single scene. Real-time CGI in sustainable studios. Collaboration with Omniverse is the future of film at WPP. One of the most important features of Omniverse is that it obeys the laws of physics.

Omniverse can simulate particles, fluids, materials, springs and cables. This is a fundamental capability for robotics. Once trained, the AI and software can be downloaded from Omniverse. In this video, you'll see Omniverse's physics simulation with rigid and soft bodies, fluids, and finite element modeling.

And a lot more - enjoy! Omniverse is a physically-based virtual world where robots can learn to be robots. They'll come in all sizes and shapes - box movers, pick and place arms, forklifts, cars, trucks. In the future, a factory will be a robot, orchestrating many robots inside, building cars that are robots themselves. We can use Omniverse to create a virtual factory, train and simulate the factory and its robotic workers inside. The AI and software that run the virtual factory are exactly the same as what will run the actual one. The virtual and physical factories and their robots will operate in a loop.

They are digital twins. Connecting to ERP systems, simulating the throughput of the factory, simulating new plant layouts, and becoming the dashboard of the operator - even uplinking into a robot to teleoperate it. BMW may very well be the world's largest custom-manufacturing company.

BMW produces over 2 million cars a year. In their most advanced factory, a car a minute. Every car is different. We are working with BMW to create a future factory. Designed completely in digital, simulated from beginning to end in Omniverse, creating a digital twin, and operating a factory where robots and humans work together. Let's take a look at the BMW factory. Welcome to BMW Production, Jensen. I am pleased to show you why BMW sets the standards for innovation and flexibility. Our collaboration with NVIDIA Omniverse and NVIDIA AI leads into a new era of digitalization of automobile production.

Fantastic to be with you, Milan. I am excited to do this virtual factory visit with you. We are inside the digital twin of BMW's assembly system, powered by Omniverse. For the first time, we are able to have our entire factory in simulation. Global teams can collaborate using different software packages like Revit, CATIA, or point clouds to design and plan the factory in real time 3D. The capability to operate in a perfect simulation revolutionizes BMW's planning processes. BMW regularly reconfigures its factories to accommodate new vehicle launches. Here we see two planning experts located in different parts of the world, testing a new line design in Omniverse. One of them wormholes into an assembly simulation with a motion capture suit and records task movements, while the other expert adjusts the line design - in real time. They work together to optimize the line as well as worker ergonomics and safety. “Can you tell how far I have to bend down there?” “First, I’ll get you a taller one.” “Yeah, it’s perfect.” We would like to be able to do this at scale in simulation.

That's exactly why NVIDIA has Digital Human for simulation. Digital Humans are trained with data from real associates. You can then use Digital Humans in simulation to test new workflows for worker ergonomics and efficiency. Now, your factories employ 57,000 people that share workspace with many robots designed to make their jobs easier. Let's talk about them. You are right, Jensen.

Robots are crucial for a modern production system. With the NVIDIA Isaac robotics platform, BMW is deploying a fleet of intelligent robots for logistics to improve the material flow in our production. This agility is necessary since we produce 2.5 million vehicles per year, 99% of which are custom.

Synthetic data generation and domain randomization available in Isaac are key to bootstrapping machine learning. Isaac Sim generates millions of synthetic images and varies the environment to teach the robots. Domain randomization can generate an infinite permutation of photorealistic objects, textures, orientations, and lighting conditions. It is ideal for generating ground truth, whether for detection, segmentation, or depth perception.

Let me show you an example of how we can combine it all to operate your factory. With NVIDIA's Fleet Command, your associates can securely orchestrate robots and other devices in the factory from Mission Control. They can monitor complex manufacturing cells in real time, update software over the air, launch robot missions, and teleoperate. When a robot needs a helping hand, an alert can be sent to Mission Control and one of your associates can take control to help the robot. We're in the digital twin of one of your factories, but you have 30 others spread across 15 countries.

The scale of BMW production is impressive, Milan. Indeed, Jensen, the scale and complexity of our production network requires BMW to constantly innovate. I am happy about the tight collaboration between our two companies. NVIDIA Omniverse and NVIDIA AI give us the chance to simulate all 31 factories in our production network.

These new innovations will reduce planning times, improve flexibility and precision, and in the end produce 30% more efficient planning processes. Milan, I could not be more proud of the innovations that our collaboration is bringing to the Factories of the Future. I appreciate you hosting me for a virtual visit of the digital twin of your BMW Production. It is a work of art! The ecosystem is really excited about Omniverse. This open platform, with USD universal 3D interchange, connects them into a large network of users.

We have 12 connectors to major design tools already with another 40 in flight. Omniverse Connector SDK is available for download now. You can see that the most important design tools are already signed-up. Our lighthouse partners are from some of the world's largest industries - Media and Entertainment, Gaming, Architecture, Engineering, and Construction; Manufacturing, Telecommunications, Infrastructure, and Automotive. Computer makers worldwide are building NVIDIA-Certified workstations, notebooks, and servers optimized for Omniverse. And starting this summer, Omniverse will be available for enterprise license.

Omniverse - NVIDIA's platform for creating and simulating shared virtual worlds. The data center is the new unit of computing. Cloud computing and AI are driving fundamental changes in the architecture of data centers. Traditionally, enterprise data centers ran monolithic software packages.

Virtualization started the trend toward software-defined data centers - allowing applications to move about and letting IT manage from a "single-pane of glass". With virtualization, the compute, networking, storage, and security functions are emulated in software running on the CPU. Though easier to manage, the added CPU load reduced the data center's capacity to run applications, which is its primary purpose. This illustration shows the added CPU load in the gold-colored part of the stack. Cloud computing re-architected data centers again, now to provision services for billions of consumers.

Monolithic applications were disaggregated into smaller microservices that can take advantage of any idle resource. Equally important, multiple engineering teams can work concurrently using CI/CD methods. Data center networks became swamped by east-west traffic generated by disaggregated microservices.

CSPs tackled this with Mellanox's high-speed low-latency networking. Then, deep learning emerged. Magical internet services were rolled out, attracting more customers, and better engagement than ever. Deep learning is compute-intensive which drove adoption of GPUs. Nearly overnight, consumer AI services became the biggest users of GPU supercomputing technologies. Now, adding Zero-Trust security initiatives makes infrastructure software processing one of the largest workloads in the data center.

The answer is a new type of chip for Data Center Infrastructure Processing, like NVIDIA's BlueField DPU. Let me illustrate this with our own cloud gaming service, GeForce Now, as an example. GeForce Now is NVIDIA's GeForce-in-the-cloud service. GeForce Now serves 10 million members in 70 countries. Incredible growth.

GeForce Now is a seriously hard consumer service to deliver. Everything matters - speed of light, visual quality, frame rate, response, smoothness, start-up time, server cost, and most important of all, security. We're transitioning GeForce Now to BlueField. With BlueField, we can isolate the infrastructure from the game instances, and offload and accelerate the networking, storage, and security. The GeForce Now infrastructure is costly. With BlueField, we will improve our quality of service and concurrent users at the same time - the ROI of BlueField is excellent.

I'm thrilled to announce our first data center infrastructure SDK - DOCA 1.0 is available today! DOCA is our SDK to program BlueField. There's all kinds of great technology inside.

Deep packet inspection, secure boot, TLS crypto offload, RegEX acceleration, and a very exciting capability - a hardware-based, real-time clock that can be used for synchronous data centers, 5G, and video broadcast. We have great partners working with us to optimize leading platforms on BlueField: Infrastructure software providers, edge and CDN providers, cybersecurity solutions, and storage providers - basically the world's leading companies in data center infrastructure. Though we're just getting started with BlueField 2, today we're announcing BlueField 3.

22 billion transistors. The first 400 Gbps networking chip. 16 Arm CPUs to run the entire virtualization software stack - for instance, running VMware ESX. BlueField 3 takes security to a whole new level, fully offloading and accelerating IPSEC and TLS cryptography, secret key management, and regular expression processing.

We are on a pace to introduce a new BlueField generation every 18 months. BlueField 3 will do 400 Gbps and be 10x the processing capability of BlueField 2. And BlueField 4 will do 800 Gbps and add NVIDIA's AI computing technologies to get another 10x boost.

100x in 3 years -- and all of it will be needed. A simple way to think about this is that 1/3rd of the roughly 30 million data center servers shipped each year are consumed running the software-defined data center stack. This workload is increasing much faster than Moore's law. So, unless we offload and accelerate this workload, data centers will have fewer and fewer CPUs to run applications. The time for BlueField has come. At the beginning of the big bang of modern AI, we recognized the need to create a new kind of computer for a new way of developing software.

Software will be written by software running on AI computers. This new type of computer will need new chips, new system architecture, new ways to network, new software, and new methodologies and tools. We've invested billions into this intuition, and it has proven helpful to the industry. It all comes together as DGX - a computer for AI. We offer DGX as a fully integrated system, as well as offer the components to the industry to create differentiated options. I am pleased to see so much AI research advancing because of DGX - top universities, research hospitals, telcos, banks, consumer products companies, car makers, and aerospace companies.

DGX helps their AI researchers - whose expertise is rare and whose work is strategic. It is imperative to make sure they have the right instrument. Simply put, if software is to be written by computers, then companies with the best software engineers will also need the best computers. We offer several configurations - all software compatible. The DGX A100 is a building block that contains 5 petaFLOPS of computing and superfast storage and networking to feed it. DGX Station is an AI data center in-a-box designed for workgroups - it plugs into a normal outlet.

And DGX SuperPOD is a fully integrated, fully network-optimized, AI-data-center-as-a-product. SuperPOD is for intensive AI research and development. NVIDIA's own new supercomputer, called Selene, is 4 SuperPODs. It is the 5th fastest supercomputer and the fastest industrial supercomputer in the world's Top 500. We have a new DGX Station 320G. DGX Station can train large models - 320 gigabytes of super-fast HBM2e connected to 4 A100 GPUs over 8 terabytes per second of memory bandwidth.

8 terabytes transferred in one second. It would take 40 CPU servers to achieve this memory bandwidth. DGX Station plugs into a normal wall outlet, like a big gaming rig, consumes just 1,500 watts, and is liquid-cooled to a silent 37 dB. Take a look at the cinematic that our engineers and creative team did.

A CPU cluster of this performance would cost about a million dollars today. DGX Station is $149,000 - the ideal AI programming companion for every AI researcher. Today we are also announcing a new DGX SuperPOD. Three major upgrades: First, the new 80-gigabyte A100, which brings the SuperPOD to 90 terabytes of HBM2e memory, with an aggregate bandwidth of 2.2 exabytes per second.

It would take 11,000 CPU servers to achieve this bandwidth - about a 250-rack data center - 15 times bigger than the SuperPOD. Second, SuperPOD has been upgraded with NVIDIA BlueField-2. SuperPOD is now the world's first cloud-native supercomputer, multi-tenant shareable, with full isolation and bare-metal performance.

And third, we're offering Base Command, the DGX management and orchestration tool used within NVIDIA. We use Base Command to support thousands of engineers, over two hundred teams, consuming a million-plus GPU-hours a week. DGX SuperPOD starts at seven million dollars and scales to sixty million dollars for a full system. Let me highlight 3 great uses of DGX. Transformers have led to dramatic breakthroughs in Natural Language Processing. Like RNNs and LSTMs, Transformers are designed to operate on sequential data.

However, there is more to Transformers than meets the eye: they are not trained sequentially, but use a mechanism called attention so that they can be trained in parallel. This breakthrough reduced training time, and more importantly enabled the training of huge models with a correspondingly enormous amount of data. Unsupervised learning can now achieve excellent results, but the models are huge.
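
As a rough illustration of why attention parallelizes where RNNs cannot, here is a minimal numpy sketch of scaled dot-product attention, the core operation of a Transformer. The shapes and random data are purely illustrative.

```python
# Minimal sketch of scaled dot-product attention, the core of a Transformer.
# Every position attends to every other position via one matrix multiply, so
# the whole sequence is processed in parallel rather than step-by-step as in
# an RNN. Shapes and random data are illustrative only.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq, seq) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the sequence
    return weights @ V                                  # weighted sum of value vectors

seq_len, d_model = 8, 16
x = np.random.randn(seq_len, d_model)                   # token embeddings
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)                 # all positions computed at once
print(out.shape)                                        # (8, 16)
```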

Google's Transformer was 65 million parameters. OpenAI's GPT-3 is 175 billion parameters. That's roughly 3,000 times larger in just 3 years. The applications for GPT-3 are really incredible. Generate document summaries.

Email phrase completion. GPT-3 can even generate Javascript and HTML from plain English - essentially telling an AI to write code based on what you want it to do. Model sizes are growing exponentially - on a pace of doubling every two and a half months. We expect to see multi-trillion-parameter models by next year, and 100-trillion-plus-parameter models by 2023. As a very loose comparison, the human brain has roughly 125 trillion synapses. So these transformer models are getting quite large.

Training models of this scale is incredible computer science. Today, we are announcing NVIDIA Megatron - for training Transformers. Megatron trains giant Transformer models - it partitions and distributes the model for optimal multi-GPU and multi-node parallelism. Megatron does fast data loading, micro batching, scheduling and syncing, kernel fusing. It pushes the limits of every NVIDIA invention - NCCL, NVLink, Infiniband, Tensor Cores.
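
To show the partitioning idea in the simplest possible terms, here is a toy numpy sketch of tensor (intra-layer) model parallelism - splitting one layer's weight matrix across devices and gathering the partial results. This is only a conceptual sketch, not Megatron's implementation; in practice the gather is an NCCL all-gather over NVLink and InfiniBand.

```python
# Toy sketch of tensor (intra-layer) model parallelism: split a layer's weight
# matrix column-wise across devices, compute each shard independently, then
# gather the partial outputs. Purely conceptual - not Megatron's code.
import numpy as np

hidden, ffn, n_devices = 1024, 4096, 4
x = np.random.randn(2, hidden)                          # a tiny batch of activations
W = np.random.randn(hidden, ffn)                        # the full weight matrix

shards = np.split(W, n_devices, axis=1)                 # each "GPU" holds one shard
partials = [x @ w for w in shards]                      # computed independently, in parallel
y = np.concatenate(partials, axis=1)                    # all-gather of the partial outputs

assert np.allclose(y, x @ W)                            # identical to the unsplit layer
```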

Even with Megatron, a trillion-parameter model will take about 3-4 months to train on Selene. So, lots of DGX SuperPODs will be needed around the world. Inferencing giant Transformer models is also a great computer science challenge. GPT-3 is so big, with so many floating-point operations, that it would take a dual-CPU server over a minute to respond to a single 128-word query. And GPT-3 is so large that it doesn't fit in GPU memory - so it will have to be distributed.

Multi-GPU, multi-node inference has never been done before. Today, we're announcing the Megatron Triton Inference Server. A DGX with Megatron Triton will respond within a second! Not a minute - a second! And for 16 queries at the same time. DGX is 1000 times faster and opens up many new use cases, like call-center support, where a one-minute response is effectively unusable. Naver is Korea's #1 search engine. They installed a DGX SuperPOD and are running their AI platform CLOVA to train language models for Korean.

I expect many leading service providers around the world to do the same - use DGX to develop and operate region-specific and industry-specific language services. NVIDIA Clara Discovery is our suite of acceleration libraries created for computational drug discovery - from imaging, to quantum chemistry, to gene variant-calling, to using NLP to understand genetics, and using AI to generate new drug compounds. Today we're announcing four new models available in Clara Discovery: MegaMolBART is a model for generating biomolecular compounds.

This method has seen recent success with Insilico Medicine using AI to find a new drug in less than two years. NVIDIA ATAC-seq denoising algorithm for rare and single cell epi-genomics is helping to understand gene expression for individual cells. AlphaFold1 is a model that can predict the 3D structure of a protein from the amino acid sequence. GatorTron is the world's largest clinical language model that can read and understand doctors' notes.

GatorTron was developed at the University of Florida, using Megatron, and trained on the DGX SuperPOD gifted to his alma mater by Chris Malachowsky, who founded NVIDIA with Curtis and me. Oxford Nanopore makes third-generation genomic sequencing technology capable of ultra-high throughput in digitizing biology - 1/5 of the SARS-CoV-2 virus genomes in the global database were generated on Oxford Nanopore. Last year, Oxford Nanopore developed a diagnostic test for COVID-19 called LamPORE, which is used by the NHS.

Oxford Nanopore is GPU-accelerated throughout. DNA samples pass through nanopores and the current signal is fed into an AI model - like speech recognition, but trained to recognize genetic code. Another model called Medaka reads the code and detects genetic variants. Both models were trained on DGX SuperPOD. These new deep learning algorithms achieve 99.9% detection accuracy of single nucleotide variants - the gold standard of human sequencing. Pharma is a 1.3 trillion-dollar industry where a new drug can take 10+ years to develop and fails 90% of the time.

Schrodinger is the leading physics-based and machine learning computational platform for drug discovery and material science. Schrodinger is already a heavy user of NVIDIA GPUs, recently entering into an agreement to use hundreds of millions of NVIDIA GPU hours on the Google cloud. Some customers can't use the cloud, so today we are announcing a partnership to accelerate Schrodinger's drug discovery workflow with NVIDIA Clara Discovery libraries and NVIDIA DGX. The world's top 20 pharmas use Schrodinger today. Their researchers are going to see a giant boost in productivity. Recursion is a biotech company using leading-edge computer science to decode biology to industrialize drug discovery. The Recursion Operating System is built on NVIDIA DGX SuperPOD for generating, analyzing and gaining insight from massive biological and chemical datasets.

They call their SuperPOD the BioHive-1 - it's the most powerful computer at any pharma today. Using deep learning on DGX, Recursion is classifying cell responses after exposure to small molecule drugs. Quantum computing is a field of physics that studies the use of natural quantum behavior - superposition, entanglement, and interference - to build a computer.

The computation is performed using quantum circuits that operate on quantum bits - called qubits. Qubits can be 0 or 1, like a classical computing bit, but also in superposition - meaning they exist simultaneously in both states. The qubits can be entangled where the behavior of one can affect or control the behavior of others.

Adding and entangling more qubits lets quantum computers calculate exponentially more information. There is a large community around the world doing research in quantum computers and algorithms. Well over 50 teams in industry, academia, and national labs are researching the field.

We're working with many of them. Quantum computing can solve Exponential Order Complexity problems, like factoring large numbers for cryptography, simulating atoms and molecules for drug discovery, finding shortest path optimizations, like the traveling salesman problem, The limiter in quantum computing is decoherence, falling out of quantum states, caused by the tiniest of background noise So error correction is essential. It is estimated that to solve meaningful problems, several million physical qubits will be required to sufficiently error correct. The research community is making fast progress, doubling physical Qubits each year, so likely achieving the milestone by 2035 to 2040. Well within my career horizon. In the meantime, our mission is to help the community research the computer of tomorrow with the fastest computer of today.

Today, we're announcing cuQuantum - an acceleration library designed for simulating quantum circuits for both Tensor Network Solvers and State Vector Solvers. It is optimized to scale to large GPU memories, multiple GPUs, and multiple DGX nodes. The speed-up of cuQuantum on DGX is excellent. Running the cuQuantum benchmark, state vector simulation takes 10 days on a dual-CPU server but only 2 hours on a DGX A100. cuQuantum on DGX can productively simulate tens of qubits. And Caltech, using Cotengra/Quimb, simulated the Sycamore quantum circuit at depth 20 in record time using cuQuantum on NVIDIA's Selene supercomputer. What would have taken years on CPUs can now run in a few days on cuQuantum and DGX.
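
For a sense of what a state vector solver does, here is a toy numpy sketch that builds a two-qubit Bell state and then shows why memory grows as 2^n amplitudes - the reason large GPU memory and multi-node scaling matter. This is an illustration only, not cuQuantum.

```python
# Toy state-vector simulation: n qubits need 2^n complex amplitudes, and gates
# are applied as matrix products. A Hadamard followed by a CNOT entangles two
# qubits into a Bell state. Illustration only - not cuQuantum.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)            # Hadamard gate
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.zeros(4, dtype=complex)
state[0] = 1.0                                          # start in |00>
state = np.kron(H, I) @ state                           # Hadamard on the first qubit
state = CNOT @ state                                    # entangle: (|00> + |11>) / sqrt(2)
print(np.round(state, 3))

n = 40
print(f"{n} qubits need {2**n * 16 / 1e12:.1f} TB of amplitudes")   # why big GPU memory helps
```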

cuQuantum will accelerate quantum circuit simulators so researchers can design better quantum computers and verify their results, architect hybrid quantum-classical systems - and discover more quantum-optimal algorithms like Shor's and Grover's. cuQuantum on DGX is going to give the quantum community a huge boost. I'm hoping cuQuantum will do for quantum computing what cuDNN did for deep learning. Modern data centers host diverse applications that require varying system architectures. Enterprise servers are optimized for a balance of strong single-threaded performance and a nominal number of cores.

Hyperscale servers, optimized for microservice containers, are designed for a high number of cores, low cost, and great energy efficiency. Storage servers are optimized for a large number of cores and high IO throughput. Deep learning training servers are built like supercomputers - with the largest number of fast CPU cores, the fastest memory, the fastest IO, and high-speed links to connect the GPUs. Deep learning inference servers are optimized for energy efficiency and the ability to process a large number of models concurrently.

The genius of the x86 server architecture is the ability to do a good job using varying configurations of the CPU, memory, PCI express, and peripherals to serve all of these applications. Yet processing large amounts of data remains a challenge for computer systems today - this is particularly true for AI models like transformers and recommender systems. Let me illustrate the bottleneck with half of a DGX. Each Ampere GPU is connected to 80GB of super fast memory running at 2 TB/sec.

Together, the 4 Amperes process 320 GB at 8 terabytes per second. Contrast that with CPU memory, which is 1 TB in size but runs at only 0.2 terabytes per second. The CPU memory is 3 times larger but 40 times slower than the GPU memory.
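
A quick back-of-the-envelope check of those figures, assuming four 80 GB A100s at 2 TB/s each and roughly 1 TB of CPU memory at 0.2 TB/s:

```python
# Back-of-the-envelope check of the capacity and bandwidth figures above.
gpu_mem_gb, gpu_bw_tbs = 4 * 80, 4 * 2.0     # four A100s: 320 GB at 8 TB/s
cpu_mem_gb, cpu_bw_tbs = 1000, 0.2           # CPU memory: ~1 TB at 0.2 TB/s

print(cpu_mem_gb / gpu_mem_gb)               # ~3x larger
print(gpu_bw_tbs / cpu_bw_tbs)               # 40x slower
print(gpu_mem_gb + cpu_mem_gb)               # ~1,320 GB in the whole node
```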

We would love to utilize the full 1,320 GB of memory of this node to train AI models. So, why not something like this? Make faster CPU memories, connect 4 channels to the CPU, a dedicated channel to feed each GPU. Even if a package can be made, PCIe is now the bottleneck.

We can surely use NVLink. NVLink is fast enough. But no x86 CPU has NVLink, not to mention 4 NVLinks. Today, we're announcing our first data center CPU, Project Grace, named after Grace Hopper, a computer scientist and U.S. Navy Rear Admiral who pioneered computer programming in the 1950s. Grace is Arm-based and purpose-built for accelerated computing applications of large amounts of data - such as AI.

Grace highlights the beauty of Arm. Their IP model allowed us to create the optimal CPU for this application, which achieves an x-factor speed-up. The Arm core in Grace is a next-generation off-the-shelf IP for servers. Each CPU will deliver over 300 SPECint, with a total of over 2,400 SPECint_rate of CPU performance for an 8-GPU DGX.

For comparison, today's DGX - the highest-performance computer in the world today - is 450 SPECint_rate. 2,400 SPECint_rate with Grace versus 450 SPECint_rate today. So look at this again - Before, After, Before, After.

Amazing increase in system and memory bandwidth. Today, we're introducing a new kind of computer. The basic building block of the modern data center.

Here it is. What I'm about to show you brings together the latest GPU accelerated computing, Mellanox high performance networking, and something brand new. The final piece of the puzzle.

The world's first CPU designed for terabyte-scale accelerated computing... her secret codename - GRACE. This powerful, Arm-based CPU gives us the third foundational technology for computing, and the ability to rearchitect every aspect of the data center for AI. We're thrilled to announce the Swiss National Supercomputing Center will build a supercomputer powered by Grace and our next generation GPU.

This new supercomputer, called Alps, will be 20 exaflops for AI, 10 times faster than the world's fastest supercomputer today. Alps will be used to do whole-earth-scale weather and climate simulation, quantum chemistry, and quantum physics for the Large Hadron Collider. Alps will be built by HPE and will come online in 2023. We're thrilled by the enthusiasm of the supercomputing community, welcoming us to make Arm a top-notch scientific computing platform. Our data center roadmap is now a rhythm consisting of 3 chips: CPU, GPU, and DPU.

Each chip architecture has a two-year rhythm with likely a kicker in between. One year will focus on x86 platforms. One year will focus on Arm platforms. Every year will see new exciting products from us.

The NVIDIA architecture and platforms will support x86 and Arm - whatever customers and markets prefer. Three chips. Yearly Leaps. One Architecture. Arm is the most popular CPU in the world. For good reason - it's super energy-efficient. Its open licensing model inspires a world of innovators to create products around it.

Arm is used broadly in mobile and embedded today. In other markets - like the cloud, enterprise and edge data centers, supercomputing, and PCs - Arm is just starting and has great growth opportunities. Each market has different applications and has unique systems, software, peripherals, and ecosystems.

For the markets we serve, we can accelerate Arm's adoption. Let's start with the big one - Cloud. One of the earliest designers of Arm CPUs for data centers is AWS - its Graviton CPUs are extremely impressive. Today, we're announcing NVIDIA and AWS are partnering to bring Graviton2 and NVIDIA GPUs together. This partnership brings Arm into the most demanding cloud workloads - AI and cloud gaming. Mobile gaming is growing fast and is the primary form of gaming in some markets.

With AWS-designed Graviton2, users can stream Arm-based applications and Android games straight from AWS. It's expected later this year. We are announcing a partnership with Ampere Computing to create a scientific and cloud computing SDK and reference system. Ampere Computing's Altra CPU is excellent - 80 cores, 285 SPECint17, right up there with the highest performance x86.

We are seeing excellent reception at supercomputing centers around the world and at Android cloud gaming services. We are also announcing a partnership with Marvell to create an edge and enterprise computing SDK and reference system. Marvell Octeon excels at IO, storage and 5G processing. This system is ideal for hyperconverged edge servers. We're announcing a partnership with Mediatek to create a reference system and SDK for Chrome OS and Linux PC's. Mediatek is the world's largest SOC maker. Combining NVIDIA GPUs and Mediatek SOCs will make excellent PCs and notebooks.

AI, computers automating intelligence, is the most powerful technology force of our time. We see AI in four waves. The first wave was to reinvent computing for this new way of doing software - we're all in and have been driving this for nearly 10 years. The first adopters of AI were the internet companies - they have excellent computer scientists, large computing infrastructures, and the ability to collect a lot of training data. We are now at the beginning of the next wave. The next wave is enterprise and the industrial edge, where AI can revolutionize the world's largest industries - from manufacturing, logistics, agriculture, healthcare, and financial services to transportation. There are many challenges to overcome, one of which is connectivity, which 5G will solve. And then autonomous systems. Self-driving cars are an excellent example. But everything that moves will eventually be autonomous. The industrial edge and autonomous systems are the most challenging, but also the largest opportunities for AI to make an impact. Trillion-dollar industries can soon apply AI to improve productivity, and invent new products, services, and business models.

We have to make AI easier to use - turn AI from computer science to computer products. We're building the new computing platform for this fundamentally new software approach - the computer for the age of AI. AI is not just about an algorithm - building and operating AI is a fundamental change in every aspect of software - Andrej Karpathy rightly called it Software 2.0.

Machine learning, at the highest level, is a continuous learning system that starts with data scientists developing data strategies and engineering predictive features - this data is the digital life experience of a company. Training involves inventing or adapting an AI model that learns to make the desired predictions. Simulation and validation test the AI application for accuracy, generalization, and potential bias. And finally, orchestrating a fleet of computers, whether in your data center or at the edge in the warehouse, farms, or wireless base stations. NVIDIA created the chips, systems, and libraries needed for end-to-end machine learning - for example, technologies like Tensor Core GPUs, NVLINK, DGX, cuDNN, RAPIDS, NCCL, GPU Direct, DOCA, and so much more.

We call the platform NVIDIA AI. NVIDIA AI libraries accelerate every step, from data processing to fleet orchestration. NVIDIA AI is integrated into all of the industry's popular tools and workflows.

NVIDIA AI is in every cloud, used by the world's largest companies, and by over 7,500 AI startups around the world. And NVIDIA AI runs on any system that includes NVIDIA GPUs, from PCs and laptops, to workstations, to supercomputers, in any cloud, to our $99 Jetson robot computer. One segment of computing we've not served is enterprise computing. 70% of the world's enterprises run VMware, as we do at NVIDIA. VMware was created to run many applications on one virtualized machine. AI, on the other hand, runs a single job, bare-metal, on multiple GPUs and often multiple nodes.

All of the NVIDIA optimizations for compute and data transfer are now plumbed through the VMware stack so AI workloads can be distributed to multiple systems and achieve bare-metal performance. The VMware stack is also offloaded and accelerated on NVIDIA BlueField. NVIDIA AI now runs in its full glory on VMware, which means everything that has been accelerated by NVIDIA AI now runs great on VMware. AI applications can be deployed and orchestrated with Kubernetes running on VMware Tanzu. We call this platform NVIDIA EGX for Enterprise.

The enterprise IT ecosystem is thrilled - finally the 300,000 VMware enterprise customers can easily build an AI computing infrastructure that seamlessly integrates into their existing environment. In total, over 50 servers from the world's top server makers will be certified for NVIDIA EGX Enterprise. BlueField 2 offloads and accelerates the VMware stack and does the networking for distributed computing. Enterprise can choose big or small GPUs for heavy-compute or heavy-graphics workloads like Omniverse, or mix and match. All run NVIDIA AI.

Enterprise companies make up the world's largest industries and they operate at the edge - in hospitals, factories, plants, warehouses, stores, farms, cities and roads - far from data centers. The missing link is 5G. Consumer 5G is great, but Private 5G is revolutionary. Today, we're announcing the Aerial A100 - bringing together 5G and AI into a new type of computing platform designed for the edge.

Aerial A100 integrates the Ampere GPU and BlueField DPU into one card - this is the most advanced PCI Express card ever created. So, it's not a surprise that Aerial A100 in an EGX system will be a complete 5G base station. Aerial A100 delivers up to 20 Gbps and can process up to nine 100 MHz massive MIMO carriers for 64T64R - 64 transmit and 64 receive antenna arrays - state-of-the-art capabilities. Aerial A100 is software-defined, with accelerated features like PHY, virtual network functions, network acceleration, packet pacing, and line-rate cryptography. Our partners Ericsson, Fujitsu, Mavenir, Altran, and Radisys will build their total 5G solutions on top of the Aerial library.

NVIDIA EGX server with Aerial A100 is the first 5G base-station that is also a cloud-native, secure, AI edge data center. We have brought the power of the cloud to the 5G edge. Aerial also extends the power of 5G into the cloud.

Today, we are excited to announce that Google will support NVIDIA Aerial in the GCP cloud. I have an important new platform to tell you about. The rise of microservice-based applications and hybrid-cloud has exposed billions of connections in a data center to potential attack. Modern Zero-Trust security models assume the intruder is already inside and all container-to-container communications should be inspected, even within a node. This is not possible today. The CPU load of monitoring every piece of traffic is simply too great.

Today, we are announcing NVIDIA Morpheus - a data center security platform for real-time all-packet inspection. Morpheus is built on NVIDIA AI, NVIDIA BlueField, Net-Q network telemetry software, and EGX. We're working to create solutions with industry leaders in data center security - Fortinet, Red Hat, Cloudflare, Splunk, F5, and Aria Cybersecurity. And early customers - Booz Allen Hamilton, Best Buy, and of course, our own team at NVIDIA. Let me show you how we're using Morpheus at NVIDIA. It starts with a network. Here we see a representation of a network, where dots are servers and lines (the edges) are packets flowing between those servers.

Except in this network, Morpheus is deployed. This enables AI inferencing across your entire network, including east/west traffic. The particular model being used here has been trained to identify sensitive information - AWS credentials, GitHub credentials, private keys, passwords. If observed in a packet, these would appear as red lines, and we don't see any of that. Uh oh, what happened? An updated configuration was deployed to a critical business app on this server. This update accidentally removed encryption, and now everything that communicates with that app sends and receives sensitive credentials in the clear.

This can quickly impact additional servers. This translates to continuing exposure on the network. The AI model in Morpheus is searching through every packet for any of these credentials, continually flagging when it encounters such data. And rather than using pattern matching, this is done with a deep neural network - trained to generalize and identify patterns beyond static rule sets. Notice all of the individual lines. It's easy to see how quickly a human could be overwhelmed by the vast amount of data coming in.
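
As a toy illustration of the difference between static pattern matching and a learned classifier, here is a small scikit-learn sketch that trains on a handful of made-up payload strings and flags ones that look like leaked secrets. The data, features, and model are invented for illustration; Morpheus itself uses GPU-accelerated deep learning over live packet streams.

```python
# Toy illustration of rules vs. a learned classifier: train a tiny model on a
# handful of made-up payload strings and flag ones that look like leaked
# secrets. Data, features, and model are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

payloads = [
    "GET /index.html HTTP/1.1",
    "user prefers dark mode",
    "aws_secret_access_key=wJalrXUtnFEMI/K7MDENG",      # fake example credential
    "-----BEGIN RSA PRIVATE KEY-----",
    "password=hunter2&remember=true",
    "POST /api/v1/cart items=3",
]
labels = [0, 0, 1, 1, 1, 0]                             # 1 = contains sensitive data

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(3, 5)),   # character n-grams, not fixed rules
    LogisticRegression(max_iter=1000),
)
model.fit(payloads, labels)

for pkt in ["token=ghp_abcdef0123456789", "GET /logo.png HTTP/1.1"]:
    print(pkt, "->", "flag" if model.predict([pkt])[0] else "ok")
```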

Scrolling through the raw data gives a sense of the massive scale and complexity that is involved. With Morpheus, we immediately see the lines that represent leaked sensitive information. By hovering over one of those red lines, we show complete info about the credential, making it easy to triage and remediate. But what happens when this remediation is necessary? Morpheus enables cyber applications to integrate and collect information for automated incident management and action prioritization. Originating servers, destination servers, actual exposed credentials, and even the raw data is available. This speeds recovery and informs which keys were compromised and need to be rotated. With Morpheus, the chaos becomes manageable. The IT ecosystem has been hungry for an AI computing platform that is enterprise and edge ready.

NVIDIA AI on EGX with Aerial 5G is the foundation of what the IT ecosystem has been waiting for. We are supported by leaders from all across the IT industry - from systems, infrastructure software, storage and security, data analytics, industrial edge solutions, manufacturing design and automation, to 5G infrastructure. To complete our enterprise offering, we now have NVIDIA AI Enterprise software so businesses can get direct-line support from NVIDIA. NVIDIA AI Enterprise is optimized and certified for VMware, and offers services and support needed by mission-critical enterprises. Deep learning has unquestionably revolutionized computing. Researchers continue to innovate at lightspeed with new models and new variants.

We're creating new systems to expand AI into new segments - like Arm, enterprise, or 5G edge. We're also using these systems to do basic research in AI and building new AI products and services. Let me show you some of our work in AI, basic and applied research. DLSS: Deep Learning Super Sampling. StyleGAN: AI high resolution image generator.

GANcraft: a neural rendering engine, turning Minecraft into realistic 3D. GANverse3D: turns photographs into animatable 3D models. Face Vid2Vid: a talking-head rendering engine that can reduce streaming bandwidth by 10x, while re-posing the head and eyes. Sim2Real: a quadruped trained in Omniverse, and the AI can run in a real robot - digital twins. SimNet: a Physics Informed Neural Network solver that can simulate large-scale multi-physics. BioMegatron: the largest biomedical language model ever trained.

3DGT: Omniverse synthetic data generation. And OrbNet: a machine learning quantum solver for quantum chemistry. This is just a small sampling of the AI work we're doing at NVIDIA. We're building AIs to use in our products and platforms. We're also packaging up the models to be easily integrated into your applications.

These are essentially no-coding open-source applications that you can modify. Now, we're offering NGC pre-trained models, that you can plug into these applications, or ones you develop. These pre-trained models are production quality, trained by experts, and will continue to benefit from refinement.

There are new credentials to tell you about the models' development, testing, and use. And each comes with a reference application sample code. NVIDIA pre-trained models are state-of-the-art and meticulously trained, but there is infinite diversity of application domains, environments, and specializations. No one has all the data - sometimes it's rare, sometimes they're trade-secrets.

So we created technology for you to fine-tune and adapt NVIDIA pre-trained models for your applications. TAO applies transfer learning to fine-tune our models with your data. TAO has an excellent federated learning system to let multiple parties collectively train a shared model while protecting data privacy.
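
A minimal sketch of what transfer learning looks like in practice, using plain PyTorch and torchvision as stand-ins: start from a pretrained backbone, freeze it, and fine-tune a small task-specific head on your own labeled data. The class count and fake batch are illustrative; TAO wraps NVIDIA's NGC pre-trained models and tooling rather than this generic recipe.

```python
# Minimal transfer-learning sketch with PyTorch/torchvision as stand-ins:
# freeze a pretrained backbone and fine-tune a new head on a small dataset.
# Class count and the fake batch are illustrative placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)   # pretrained backbone
for p in model.parameters():
    p.requires_grad = False                                        # freeze pretrained features

num_classes = 3                                                    # e.g. person / forklift / defect
model.fc = nn.Linear(model.fc.in_features, num_classes)            # new task-specific head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative step on a fake batch standing in for a few hundred
# site-specific labeled images.
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(0, num_classes, (8,))
loss = loss_fn(model(images), targets)
loss.backward()
optimizer.step()
print(float(loss))
```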

NVIDIA federated learning is a big deal - researchers at different hospitals can collaborate on one AI model while keeping their data separate to protect patient privacy. TAO uses NVIDIA's great TensorRT to optimize the model for the target GPU system. With NVIDIA pre-trained models and TAO, something previously impossible for many, can now be done in hours. Fleet Command is a cloud-native platform for securely operating and orchestrating AI across a distributed fleet of computers - it was purpose-built for operating AI at the edge. Fleet Command running on Certified EGX systems will feature secure boot, attestation, uplink and downlink, to a confidential enclave. From any cloud or on-prem, you can monitor the health of your fleet.

Let's take a look at how one customer is using NGC pre-trained models and TAO to fine-tune models and run them in our Metropolis smart city application, orchestrated by Fleet Command. No two industrial environments are the same, and conditions routinely change. Adapting, managing, and operationalizing AI-enabled applications at the edge for unique, specific sites can be incredibly challenging - requiring a lot of data and time for model training.

Here, we're building applications for a factory with multiple problems to solve: understanding how the factory floor space is used over time; ensuring worker safety, with constantly evolving machinery and risk factors; and inspecting products on the factory line, where operations change routinely. We start with a Metropolis application running 3 pretrained models from NGC. We are up and running in minutes. But since the environment is visually quite different from the data used to train this model, the video analytics accuracy at this specific site isn't great. People are not being accurately recognized and tracked, and defects are being missed. Now let's use NVIDIA TAO to solve for this.

With this simple UI, we re-train and adapt our pre-trained models with labelled data from the specific environment we're deploying in. We select our datasets. Each is a few hundred images, as opposed to the millions of labelled images required if we were to train from scratch.

With NVIDIA TAO we go from 65% to over 90% accuracy. And through pruning and quantization, compute complexity is reduced by 2x - with no reduction in accuracy, and real-time performance is maintained. In this example, the result is 3 models specifically trained for our factory - all in just minutes. With one click, we update and deploy these optimized models onto NVIDIA-Certified servers with Fleet Command, seamlessly and securely from the cloud. From secure boot to our confidential AI enclave in GPUs, application data and critical intellectual property remain safe.
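
For readers unfamiliar with the two optimizations just mentioned, here is a toy numpy sketch of magnitude pruning and symmetric int8 quantization. The numbers are made up and the method is deliberately simplistic; TAO and TensorRT do this far more carefully.

```python
# Toy sketch of magnitude pruning (zero out the smallest weights) and
# symmetric int8 quantization (store weights as 8-bit integers plus a scale).
# Numbers are made up for illustration only.
import numpy as np

w = np.random.randn(4, 4).astype(np.float32)            # a layer's weights

# Magnitude pruning: drop the 50% of weights closest to zero.
threshold = np.percentile(np.abs(w), 50)
w_pruned = np.where(np.abs(w) < threshold, 0.0, w)

# Symmetric int8 quantization: map floats onto [-127, 127] with one scale.
scale = np.abs(w_pruned).max() / 127.0
w_int8 = np.round(w_pruned / scale).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale           # what inference effectively sees

print("nonzero weights:", np.count_nonzero(w_pruned), "of", w.size)
print("max quantization error:", float(np.abs(w_pruned - w_dequant).max()))
```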

AI accuracy, system performance and health can be monitored remotely. This establishes a feedback loop for continuous application enhancements. The factory now has an end-to-end framework to adapt to changing conditions - often and easily. We make it easy to adapt and optimize NGC pretrained models with NVIDIA TAO and deploy and orchestrate applications with Fleet Command.

We have all kinds of computer vision, speech, language, and robotics models, with more coming all the time. Some are the work of our genetics and medical imaging team. For example, this model predicts supplemental oxygen needs from X-rays and electronic health records. This is a collaboration of 20 hospitals across 8 countries and 5 continents for COVID-19. Federated learning and NVIDIA TAO were essential to make this possible. The world needs a state-of-the-art conversational AI that can be customized and processed anywhere.

Today, we're announcing the availability of NVIDIA Jarvis - a state-of-the-art deep learning AI for speech recognition, language understanding, translation, and speech synthesis. End-to-end GPU-accelerated, Jarvis interacts in about 100 milliseconds - listening, understanding, and responding faster than the blink of a human eye. We trained Jarvis for several million GPU-hours, on over 1 billion pages of text and sixty thousand hours of speech in different languages, accents, environments, and lingos. Out of the box, Jarvis achieves a world-class 90% recognition accuracy. You can get even better results for your application by refining with your own data using NVIDIA TAO. Jarvis supports English, Japanese, Spanish, German, French, and Russian today.

Jarvis does excellent translation: on the Bilingual Evaluation Understudy (BLEU) benchmark, Jarvis scores 40 points for English-to-Japanese and 50 points for English-to-Spanish. 40 is high quality and state-of-the-art; 50 is considered fluent translation. Jarvis can be customized for domain jargon. We've trained Jarvis for technical and healthcare scenarios. It's easy with TAO. And Jarvis now speaks with expression and emotion that you can control - no more mechanical talk. And lastly, this is a big one - Jarvis can be deployed in the cloud, on EGX in your data center, at the edge in a shop, warehouse, or factory running on EGX Aerial, or inside a delivery robot running on a Jetson computer. The Jarvis early access program began last May, and our conversational AI software has been downloaded 45,000 times.
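
As a sketch of what a BLEU score measures - n-gram overlap between a candidate translation and reference translations, scaled to 0-100 - here is a small example using NLTK. The sentences are invented; this is not how Jarvis is evaluated.

```python
# Sketch of what a BLEU score measures: n-gram overlap between a candidate
# translation and reference translations, scaled to 0-100. Sentences are
# invented for illustration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "factory", "produces", "a", "car", "every", "minute"]]
hypothesis = ["the", "factory", "builds", "a", "car", "every", "minute"]

score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(round(score * 100, 1))   # close to 50 here; 40 reads as high quality, 50 as fluent
```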

Among the early users is T-Mobile, the U.S. telecom giant. They're using Jarvis to offer exceptional customer service, which demands the high quality and low latency of real-time speech recognition. We're announcing a partnership with Mozilla Common Voice, one of the world's largest multi-language voice datasets, and it's openly available to all. NVIDIA will use our DGXs to process the dataset of 150,000 speakers in 65 languages, train Jarvis with it, and offer Jarvis back to the community for free.

So go to Mozilla Common Voice and make some recordings! Let's make universal translation possible and help people around the world understand each other. Now let me show you Jarvis. The first part of Jarvis is speech recognition.

Jarvis is over 90% accurate out-of-the-box - that's world-class. And you can still use TAO to make it even better for your application, like customizing for healthcare jargon. "Chest X-ray shows left retrocardiac opacity, and this may be due to atelectasis, aspiration, or early pneumonia." I have no idea what I said, but Jarvis recognized it perfectly. Jarvis translation now supports five languages. Let's do Japanese.

"Excuse me, I'm looking for the famous Jangara Ramen shop. It should be nearby, but I don't see it on my map. Can you show me the way? I'm very hungry." That's great. Excellent accuracy. I think. Instantaneous response. You can do German, French, and Russian--with more languages on the way. Jarvis also speaks with feelings.

Let's try this - "The more you buy, the more you save!" "The more you buy, the more you save." I think we are going to need more enthusiasm. "The more you buy, the more you save!" NVIDIA Jarvis - state-of-the-art deep learning conversational AI, interactive response, 5 languages, customizable with TAO, and deployable from cloud, to edge, to autonomous systems. Recommender systems are the most important machine learning pipeline in the world today. They are the engine for search, ads, online shopping, music, books, movies, user-generated content, and news. Recommender systems predict your needs and preferences from your past interactions, your explicit preferences, and your learned preferences, using methods called collaborative filtering and content filtering. Trillions of items to be recommended to billions of people - the problem space is quite large.
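To make the collaborative-filtering idea concrete, here is a minimal sketch of matrix factorization on a tiny, made-up rating matrix. It illustrates the general technique only - the data and hyperparameters are hypothetical, and this is not Merlin's implementation.

import numpy as np

rng = np.random.default_rng(0)
# Rows are users, columns are items; 0 means "not yet interacted".
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 0, 5, 4]], dtype=float)
mask = ratings > 0

k = 2                                                   # latent factor dimension
U = rng.normal(scale=0.1, size=(ratings.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(ratings.shape[1], k))   # item factors
lr, reg = 0.01, 0.02

for _ in range(5000):                    # simple gradient descent over observed entries
    err = mask * (ratings - U @ V.T)     # error only where we have feedback
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

print(np.round(U @ V.T, 1))              # predicted preferences, including unseen items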

We would like to productize a state-of-the-art recommender system so that all companies can benefit from the transformative capabilities of this AI. We built an open-source recommender system framework, called Merlin, which simplifies an end-to-end workflow from ETL, to training, to validation, to inference. It is architected to scale as your dataset grows. And if you already have a ton of data, it is the fastest recommender system ever built.
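As a rough illustration of the ETL stage of that workflow, here is a hedged sketch using NVTabular, Merlin's GPU feature-engineering library. The column names and file paths are hypothetical, and the exact API can differ between versions - treat this as a sketch, not a reference.

import nvtabular as nvt
from nvtabular import ops

# Categorical IDs are mapped to contiguous integers for embeddings;
# continuous features get missing values filled and are normalized.
cats = ["user_id", "item_id"] >> ops.Categorify()
conts = ["price", "age"] >> ops.FillMissing() >> ops.Normalize()

workflow = nvt.Workflow(cats + conts)
train = nvt.Dataset("train.parquet")                  # hypothetical input file

workflow.fit(train)                                   # compute vocabularies and statistics on the GPU
workflow.transform(train).to_parquet("processed/")    # write transformed data for training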

In our benchmarks, we achieved speed-ups of 10-50x for ETL, 2-10x for training, and 3-100x for inference, depending on the exact setup. Merlin is now available on NGC. Our vision for Maxine is to eventually be your avatar in virtual worlds created in Omniverse - to enable virtual presence. The technologies of virtual presence can benefit video conferencing today.

Alex is going to tell you all about it. Hi everyone, I'm Alex and I'm the Product Manager for NVIDIA Maxine. These days, we're spending so much time doing video conferencing. Don't you want a much better video communication experience? Today, I am so thrilled to share with you NVIDIA Maxine, a suite of AI technologies that can improve the video conferencing experience for everyone. When combined with NVIDIA Jarvis, Maxine offers the most accurate speech-to-text.

See, it's now transcribing everything I'm saying, all in real time. And in addition, with the help of Jarvis, Maxine can also translate what I am saying into multiple languages. That would make international meetings so much easier. Another great feature I'd love to share with you is Maxine's Eye Contact feature. Oftentimes when I am presenting, I am not looking at the camera. When I turn on Maxine's Eye Contact feature, it corrects the position of my eyes so that I'm looking back into the camera again. Now, I can make eye contact with everyone else.

The meeting experience just gets so much more engaging! Last but definitely not least, Maxine can even improve the video quality when bandwidth isn't sufficient. Now let's take a look at my video quality when bandwidth drops to as low as 50 kbps. Definitely not great. Maxine's AI face codec can improve the quality of my video even when bandwidth is as low as 50 kbps. The most powerful part of Maxine is that all of these amazing AI features can run simultaneously, all in real time.

Or you can just pick and choose a few. It's just that flexible! Now, before I go, let me turn off all the Maxine features so you can see what I look like without Maxine's help. Not ideal. Now let's turn back on all the Maxine features. See - so much better, thanks to NVIDIA Maxine! NGC has pre-trained models. TAO lets you fine-tune and adapt models to your applications.

Fleet Command deploys and orchestrates your models. The final piece is the inference server - to infer insight from the continuous streams of data coming into your EGX servers or your cloud instance. NVIDIA Triton is our inference server. Triton is a model scheduling and dispatch engine that can handle just about anything you throw at it: any AI model that runs on cuDNN - so basically every AI model - from any framework: TensorFlow, PyTorch, ONNX, OpenVINO, TensorRT, or custom C++/Python backends.

Triton schedules across multiple generations of NVIDIA GPUs and x86 CPUs, and it maximizes CPU and GPU utilization. Triton scales with Kubernetes and handles live updates. Triton is fantastic for everything from image to speech recognition, from recommenders to language understanding.
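As a concrete example of talking to Triton, here is a minimal sketch of an inference request using the tritonclient HTTP API. The model name and the input/output tensor names are hypothetical - they must match the model's configuration in your Triton model repository.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One 224x224 RGB image as a batch of 1 (random data stands in for a real image).
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)

inputs = [httpclient.InferInput("input__0", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [httpclient.InferRequestedOutput("output__0")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
scores = result.as_numpy("output__0")
print("top class:", int(scores.argmax()))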

But let me show you something really hard - generating biomolecules for drug discovery. In drug discovery, it all comes down to finding the right molecule: the right shape, the right interactions with a protein, the right pharmacokinetic properties. Working with scientists from AstraZeneca, NVIDIA researchers trained a language model to understand SMILES - the language of chemical structures.

We developed the model with Megatron and trained it on a SuperPOD. Then we created a generative model that reads the structures of successful drug compounds and generates potentially effective novel compounds. Now we can use AI to generate candidate compounds that can then be further refined with physics-based simulations, like docking or Schrödinger's FEP+. Generating with a CPU is slow - it takes almost 8 seconds to generate one molecule.

On a single A100 with Triton, it takes about 0.3 seconds - 32X faster! Using Triton Inference Server, we can scale this up to a SuperPOD and generate thousands of molecules per second. AI and simulation are going to revolutionize drug discovery. We provide these AI capabilities to our ecosystem of millions of developers around the world.
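For a sense of the post-generation step in this pipeline, here is a hedged sketch that filters candidate SMILES strings with RDKit before any physics-based refinement. The example strings and the helper function are hypothetical; the Megatron-based generator itself is not shown.

from rdkit import Chem

def valid_candidates(smiles_batch):
    """Keep only parseable molecules, in canonical form, without duplicates."""
    kept = set()
    for s in smiles_batch:
        mol = Chem.MolFromSmiles(s)          # returns None for invalid SMILES
        if mol is not None:
            kept.add(Chem.MolToSmiles(mol))  # canonical SMILES for de-duplication
    return sorted(kept)

# Pretend these came from the generative model; in practice you would sample
# thousands per second from an inference server such as Triton.
generated = ["CCO", "c1ccccc1O", "C1CC1N", "not_a_molecule"]
print(valid_candidates(generated))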

Thousands of companies are building their most important services on NVIDIA AI. Best Buy is using Morpheus as the foundation of AI-based anomaly detection in their network. GE Healthcare has built an echocardiogram application that uses TensorRT to quickly identify different views of the wall motion of the heart and select the best ones for analysis. Spotify has over 4 billion playlists. They're using RAPIDS to analyze models more efficiently, which lets them refresh personalized playlists more often.

This keeps t
