AWS re:Invent 2020: Semiconductor design on AWS with Qualcomm

Hello, I'm Mark Duffield, and welcome to Semiconductor Design on AWS, with guest speaker Qualcomm. Today I'm excited to talk about semiconductor design, and also to hand this off shortly to Anupama Astana, who I've been working with for several years on Qualcomm's hybrid options and on running workloads on AWS. A special call-out to Ian and Matt, who also contributed to this session; I'll let Anupama dive into that as well. And welcome to re:Invent: even though it's virtual, we know it's going to be awesome.

Today's session is fairly short but content-rich. We'll cover AWS for semiconductor design, then I'll hand off to Anupama, and at the end I'll close with resources for getting help running your workloads and flows on AWS.

With that said, we like to talk to customers about a fundamental rethink when they're considering transitioning to AWS and pulling workloads in from their on-prem data centers. Stop worrying about capital expense, capacity, and technology options; instead, focus on innovation. Let your engineers and designers think about the next big chip rather than infrastructure details, when the next funding will land, or when the next big technology will arrive. We want to offload that so you can focus on innovation rather than infrastructure issues.

Speaking of infrastructure, here is a brief list of what AWS has today: 24 launched regions, each with multiple Availability Zones; six more announced regions; and 77 Availability Zones in total. Additionally, we have five Local Zones, which enable ultra-low-latency applications, and we're seeing a lot of customers in the semiconductor space specifically take interest in Local Zones and what can be done there. We serve customers across 245 countries and territories, with over 97 Direct Connect locations and 220 points of presence, which power services like Route 53, S3 transfer acceleration, and our CloudFront content delivery network. Looking at the infrastructure of AWS, there's really not a place where we can't help customers, and we're continuing to build on that as well.

Now let's dive into semiconductor design on AWS, and the short, to-the-point approach we like to hand off to our customers. You have the same inputs on the left and the same outputs on the right, but now you're running all of it on AWS, and we're helping you do that. You have the customer specifications, the tools, the design data, the project data, and the libraries you need to actually design a chip, all producing a next-gen connected device, whether that's an SoC, a small chip, or an ASIC. The downstream products are the same, from wafer fabrication to assembled products. And there's really not a workload in this space that we're not helping customers with: IP characterization, place and route, extraction, OPC, front end, back end, RTL. We're doing it all on AWS, including entire SoC tape-outs.

So when you think about running your workflows on AWS, don't think of it as a completely different environment; think of it as another data center that just happens to be spun up in a different location. We can help you transition these workloads to AWS quickly, and it's going to look and feel much like it does on-prem. With that, I'll hand it over to Anupama, who's going to deep dive into some of the architecture and the hybrid applications they've been building at Qualcomm. Anupama, over to you.

Thanks, Mark, for the introduction. I'm Anupama from Qualcomm, and I lead the grid solution engineering team for Qualcomm globally. I'm very excited to be here today, sharing our journey into AWS with EDA workloads and some of the very creative innovations that came about as we went through this project along with Amazon and Synopsys.

A brief introduction to Qualcomm and what we do: Qualcomm is the world's leading wireless technology innovator. Qualcomm led the evolution of the cellular roadmap, which has become increasingly complex over time. As you're already aware, wireless is a very dynamic industry, and that's amplified with 5G, which is driving a revolution across multiple industries. We are inventing and commercializing very critical enabling technologies in this space. What that means is we are fueling a digital transformation across multiple industries, with an expected $13 trillion of global economic impact through 2035.
Some of the major industries we participate in range from healthcare to energy to manufacturing. 5G enables a new cycle of innovation, leading to new business models and driving a lot of innovative design requirements from our engineering organizations globally. What that means is we have to have very rich HPC platforms that can address the ever-increasing requirements of Qualcomm's technology roadmaps and product offerings.

Qualcomm has multiple data centers around the world, and the way we make sure we have enough compute capacity available in those data centers globally is through a very effective demand-management process. We look at the product roadmaps and their consumption over a historical timeline, augment that with technology scaling factors, and produce global compute requirements that look somewhat like the chart you see here. As you can see, with the number of products we are designing, their increasing complexity, and changes in our technology nodes, we are driving a lot of compute demand that comes in peaks and valleys, because tape-outs are cyclic in nature, and that does create compute constraints from time to time. We want to augment our on-prem solutions with an agile and flexible cloud offering that lets us give engineering a very rich capability to address unplanned workloads where possible, and, when time to market is at risk due to a capacity constraint, to address it with cloud offerings.

Our vision is to enable flexibility in the engineering tape-outs during compute demand peaks by giving key workloads the ability to execute in the cloud. The workload we picked was static timing analysis (STA), for one reason: it is the most compute-intensive workload in the design cycle. But it comes with a challenge: it also happens to be the most complex workload in the design cycle. The benefit, of course, is the ability to offload our most CPU- and memory-intensive flow, enable peak shaving, and give design teams flexibility through enhanced time-to-market capabilities.

The challenges were multi-fold. Not only is this a very complex flow, it also sits toward the end of the design cycle, very close to tape-out, so any hiccup in the process could impact your whole tape-out cycle and customer deliverables. There were multiple other areas we needed to focus on to ensure the capability was introduced in a complementary way to the engineering teams. In particular, there were data needs: a process to generate a specific bill of materials for a single invocation of a workflow, and migration of all of those potential data sources to the cloud on limited notice, because when you need the compute you really need it, and you do not have a lot of time to invest in data transfers. Second, there were flow needs: the present flow-manager software for the physical design team had dependencies that required running on-prem, and on-prem data had to stay in sync with the cloud instances. There were make-based workflows that expected local generation of some output files, so that complexity had to be comprehended in the hybrid cloud model. There were also concurrency needs: as you set up the hybrid cloud environment, any time delays in transitioning to the cloud should be compensated by access to vast, highly performant compute in the cloud, because that's the motivator. In addition, there were adoption needs: the transition to this completely new methodology and environment should be as transparent as possible for seamless integration with engineering execution.

So we worked on the rich architecture shown on the screen. On the left you have your on-prem data centers with the data stores and compute, managed by workload-scheduling software. We connected that whole environment through encrypted VPN channels to AWS. There, our data replication was done with a very rich caching technology, and we had a separate solution for reference data. Why split the two? If you try to cache all the data, you're dealing with terabytes and terabytes, which can be very time-consuming. Intelligent partitioning between the active data set and the reference data set was one key decision we made in our design process. Cache presentation allows near-instantaneous peering at the mount level, and we achieved a transfer rate of 1.99 gigabits per second in testing for the caching partitions. For the reference data, we chose multi-threaded APIs to S3, with a metadata manifest that allows the data to be presented as a POSIX file system using Lustre. This was useful because we could tear down the Lustre instances between the executions of different workflows, making it very cost-efficient, and we achieved a better transfer speed of 2.92 gigabits per second.

To make sure the flows worked well, we forwarded jobs from our on-prem scheduler to a cloud-native scheduler using a remote-execution protocol, so it was transparent to the engineering community, and cloud-generated output files were cached back on-prem to make it seamless from a workflow checkpointing and progression perspective. We also eased adoption of this new architecture by making sure the native workflow manager and the job-submission methodologies did not change for engineering, so engineers were able to work as usual without major impact from adopting the hybrid cloud environment.

The results are shown as comparative runs on the screen. In aggregate, the cloud run times are comparable to what you see on-prem. When we were initially populating the cache, we did not see a major impact from cache read operations; the compute phase was quite similar to on-prem, and the write phase, across both the S3 solution and Lustre, performed well enough to meet the objectives of the workflow experiment. In all, we ran two levels of compute experiments with this architecture: physical design running at the core level, and physical design running at the SoC level. As you can see on the screen, we achieved approximately 12.8 percent faster average runtime per corner in both cases, core level as well as SoC level, leading to a maximum turnaround time about 24.9 percent faster at the core level, that is, at the full execution level. This was a successful execution of STA capability at the SoC level using the native tools, PrimeTime and StarRC from Synopsys. We demonstrated turnaround time in the cloud equal to or better than on-prem, and data transfer rates were optimized with efficient caching solutions. Those of you who work in physical design will know that the data requirements for physical design are far higher than those of other functional spaces, so dealing with the data volume physical design drives was a very complex undertaking, but with the cache operations it performed really well for us.

Any project you take on has its architectural offerings as well as the emerging technologies you adopt in the process, but our key to success was also in the orchestration and processes we implemented with engineering, which made it even more powerful. Some of the areas we invested in: full visibility into accounting and budgeting, because if you put effective monitors and controls in place at the LSF level, you can keep a very good watch over the budgeting, which is obviously critical to engineering as well as business operations and saves you from surprises toward the end of the execution plan. We also enabled interactive debug capabilities, which are very productive for physical design engineering, and put monitors in place to reclaim cloud instances when they were not in use or were accidentally left running. And security: when you work in any Fortune 500 company, security is where your crown jewels are; you're dealing with red IP and yellow IP all the time. We went through a security governance process and gained sign-off for the architecture we just shared with you. But the key is continuous oversight; it is a recurring process, and that is an area where we put very good governance in place.

In summary, on our journey to AWS we enabled a hybrid cloud model for a complex workload like STA on a live design. We are very proud of everything we achieved as a team, between us, Synopsys, and all the efforts from Amazon, that led to us demonstrating equal or better performance for complex EDA workloads in the cloud. Thank you to our partners in this journey for all your help in enabling Qualcomm; we look forward to doing even more with the rich team effort we have going with the ecosystem. Thank you very much.

Well, thank you very much, Anupama. That is super exciting to see; it's been a pleasure to work with you and your team, and we're really looking forward to continuing that work.

So with that covered, we've touched on the infrastructure and, of course, this great example that Anupama walked through. Let's talk about how you can actually enable your flows on AWS, and some resources for doing that. Straight away, this link is probably the best place to start. You'll find all kinds of great information there: a curated list of content that we're constantly updating, with blog posts, white papers, and reference architectures ranging from basic to much more complex (I'll dive into one of those in just a second), plus tools, training, videos, and customer testimonials. It's the best place to get up and running quickly on AWS. Please also reach out to your account managers, your solutions architects, anyone at AWS, and we'll be really happy to help you.

Now, this is a fairly complicated reference architecture, and we understand that, but what we're really trying to show here is the entire industry and the reach AWS has across it. In the top left you can see the on-prem data center, with the data and the connections. In the middle you'll see the AWS portions, which include the infrastructure stand-up, license servers and NFS servers, and our native services as well, including FSx for Lustre, EFS, and of course S3. As customers transition to AWS, we want them to think about how to get a proof-of-concept workload up and running quickly: a small set of data, something they feel very comfortable with, so they can see it actually work. That's where we try to get to first. But then we build on that very quickly: think about extending it, maybe converting the original S3 bucket into a data lake, then feeding that into an analytics pipeline and a QuickSight dashboard, so you can start making better use of the data you have, reduce time to market, and increase your ROI.

Another call-out on this diagram is the collaboration you'll see across the entire industry. In the bottom left you'll see the tool providers and the IP providers. Think about how we can enable that collaboration across third parties, but also with your foundry. Consider yield analysis and wafer-defect information: does that potentially affect your design? If you can get that information quickly, while it's happening on the fab floor, does it change the next design you already have in flight? We're trying to get our customers to think about the big picture and about collaboration across the entire industry. And it doesn't stop there: now you can bring in the contract manufacturers and the devices themselves, getting that feedback into collaboration VPCs that allow your entire teams to collaborate across the industry. Although there's a lot going on here, we can walk you through this entire process and stand up this entire environment. We're super excited to enable all of this across the industry, not just an EDA flow but also the IP libraries, the tools, and the foundry data, bringing it all into one place so you can actually see and act on the data and the results.

With that, I want to say thank you. I really appreciate you attending this session, and I very much appreciate everything Anupama did for this presentation. I'm really excited to continue working with her and with Qualcomm. Thank you for attending re:Invent, and don't forget to fill out your surveys. Thank you, take care.
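One of the key architectural decisions described in the talk is intelligently partitioning design data into an active set (replicated through the caching layer) and a reference set (bulk-copied to S3 and presented as a POSIX file system via Lustre). The sketch below illustrates that idea in the simplest possible form, using last-access time as the split heuristic. This is an illustrative assumption, not Qualcomm's actual tooling: the one-week threshold, the `FileInfo` shape, and the function names are all hypothetical.

```python
# Illustrative sketch only: split a workspace into an "active" set
# (sent through the cache layer) and a "reference" set (bulk-copied
# to S3). The access-time heuristic and threshold are assumptions.
import time
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    atime: float  # last access time, epoch seconds


# Hypothetical policy: files touched within the last week are "active".
ACTIVE_WINDOW_SECS = 7 * 24 * 3600


def partition(files, now=None):
    """Split files into (active, reference) lists for the hybrid transfer plan."""
    now = time.time() if now is None else now
    active, reference = [], []
    for f in files:
        (active if now - f.atime < ACTIVE_WINDOW_SECS else reference).append(f)
    return active, reference


def transfer_plan(files, now=None):
    """Summarize how many bytes go over each path (cache vs. S3 bulk copy)."""
    active, reference = partition(files, now)
    return {
        "cache_bytes": sum(f.size_bytes for f in active),
        "s3_bulk_bytes": sum(f.size_bytes for f in reference),
    }
```

In a real deployment the split would likely be driven by the workflow's bill of materials rather than raw access times, but the principle is the same: keep the hot working set on the low-latency path and move the bulky, rarely written reference data over the cheaper, tear-down-friendly S3/Lustre path.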

2021-02-12 15:35
