Lightning talks: Gaming and Entertainment: Content creation at scale
Hi everyone, welcome to this session on RFI Gaming and Entertainment. I am Nikunj Raghuvanshi, the research lead for this area. To start the session, I'm pleased to welcome CJ Williams, who is the entertainment lead in Azure Industries.

Hey CJ, it's great to have you with us today.

Hey, it's great to be here and talk about entertainment with you.

Yeah, so could you tell us a little bit about Microsoft's focus on cloud tech for the entertainment industry?

Yeah, absolutely. You know, when we think about entertainment and the cloud, it really starts with the creator. There are lots of types of creators: professionals at large companies, people who are just doing it for a hobby, and they all have one goal in mind, which is to entertain their fans. What we're focused on is being able to empower those creators, whether they be game developers or someone who's doing video or music, to realize their dreams of entertaining their fans. That's our big focus, and we're looking at it across the board, from where they're actually creating and how we can bring things like AI to help them create the best experiences they can, all the way to how they engage their audiences and how they can understand what their audiences want to be entertained with. So our focus is really across the board: to empower them to, as I said, realize their dreams and entertain their fans and those that are looking at what they're doing.

That sounds super exciting, CJ, and thanks for joining us today.

Yep, great to be here.

Several ongoing technology trends inform our research at Microsoft; I'll note a few here. The tools for creating digital content in movies, TV, and games are converging. For instance, game engines are moving towards photorealistic rendering techniques that have been used in movies, and movies are trending towards virtual production using game engines. So one can imagine a future where massive virtual worlds are created directly in the cloud with a common set of collaborative tools. For distribution, cloud streaming is already standard for movies and TV, and a similar trend is occurring with game streaming, such as Project xCloud. Pixel streaming with game engines allows creators to deploy 3D applications to the cloud, which users can then interact with in a web browser. And lastly, media consumers are no longer passive. This is already true with esports and streaming, where communities are built around games, but it is now accelerating towards games acting as cloud-hosted social platforms, for instance Minecraft, Fortnite, and Roblox, where users socialize, create, and exchange artifacts and experiences, which brings a lot of value to these platforms.

At Microsoft Research we have a diversity of projects addressing the hard problems for this future; let me highlight a few. Project Acoustics brings automatic and detailed physical audio effects to games, using the cloud to reduce months of computation to minutes. It ships today in major game franchises, reaching more than 100 million players. Game AI today is largely based on hard-coded rules, leading to non-player characters with simplistic behaviors. Project Paidia at MSR Cambridge aims for a much richer game AI that learns from experience like people do, with training vastly accelerated by parallelizing in the cloud. 3D scenes for games and movies are expensive and time-consuming to create, as Kate will also discuss today.
At MSR Asia there is an ongoing research theme of deep learning for automated scene acquisition and generation, like the example generated rooms shown here. On the consumption side, Watch For is a cloud-powered live video analytics platform with many diverse applications; for instance, it can automatically surface exciting moments in an esports live stream, and Watch For is currently serving 400 million minutes of video content per month as part of the MSN esports hub and Bing livestream search. Then there is TrueSkill from MSR Cambridge, which is used by Xbox Live and many major games today. The system is used for high-quality ranking and matchmaking in multiplayer games, which is crucial for player engagement, so players are matched with people of similar skill. Cloud-hosted video content is out of reach for communities worldwide with low or no network connectivity, and Project Mishnu from MSR India is solving this problem with a novel networking infrastructure, developed in active collaboration with media and satellite partners in India. Finally, the lag between player inputs and game response in a cloud gaming setting is a major problem, so Project Galena at MSR Montreal aims to improve upon this by using imitation learning. So as you can see, we have a large range of efforts attacking core research problems at MSR, and with the RFI Gaming and Entertainment effort, our focus is on partnering with the industry at large to build technology to empower creators everywhere, as CJ said earlier.

So next up, I'm excited to host two technical leaders in gaming for their talks today. Aaron McLeran will talk first; he's the audio engine lead at Epic Games, and he will talk about the latest innovations in audio content creation with Unreal Engine 5 and MetaSounds. We will conclude this session with a talk from Kate Rayner, Studio Technical Director of The Coalition, who will talk about the many content creation challenges faced by game studios today. And with that, it's goodbye from me. Let's hear from Aaron. Thank you, everybody.

Hello, my name is Aaron McLeran, I'm the audio engine lead at Epic Games, and today I'll be presenting procedural game audio with MetaSounds. Before I get into that, I wanted to talk a little bit about what real-time procedural audio actually is. Procedural audio is fundamentally audio that's dynamic: data-driven from many different sources, whether gameplay, input from players, game save data, and so on, from any source. It's fundamentally interactable; in games it often needs to be something that can be changed or varied depending on gameplay, and it's generated algorithmically in real time. In many ways, in most ways in fact, it's analogous to procedural graphics, which is something games are quite used to: procedural graphics are graphics rendered in real time, and they're fundamentally required for games. So procedural audio is, and has been, required for games for a long time, very much so in XR, and increasingly outside of traditional game media, like music and film, especially in the context of linear real-time or virtual production, which is a growing industry. We talk about procedural versus static audio in this context, where procedural audio, again, is algorithmically generated and fundamentally interactive, versus static audio, which is generated and rendered ahead of time and then just played back; that would be like your media player, like an MP3 player.
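To make that distinction concrete, here is a minimal, standalone C++ sketch; it is my own illustration, not Unreal Engine or MetaSounds code, and the "vehicle speed" parameter and callback shape are assumptions. A static source just plays back pre-rendered samples, while a procedural source computes its samples every render block from live gameplay state.

```cpp
// Minimal illustration of static vs. procedural audio sources.
// Hypothetical standalone example; not Unreal Engine / MetaSounds API.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kSampleRate = 48000.0f;

// "Static" audio: a pre-rendered asset that is simply played back.
struct StaticSource {
    std::vector<float> samples;  // decoded ahead of time (e.g. from a WAV/MP3)
    size_t cursor = 0;

    void Render(float* out, size_t numFrames) {
        for (size_t i = 0; i < numFrames; ++i) {
            out[i] = (cursor < samples.size()) ? samples[cursor++] : 0.0f;
        }
    }
};

// "Procedural" audio: samples are generated algorithmically every callback,
// driven by a live gameplay parameter (here, an assumed "vehicle speed").
struct ProceduralEngineTone {
    float phase = 0.0f;

    void Render(float* out, size_t numFrames, float vehicleSpeed) {
        // Map gameplay state to synthesis parameters each render block.
        const float frequency = 80.0f + vehicleSpeed * 4.0f;  // Hz
        const float gain = std::min(1.0f, 0.2f + vehicleSpeed * 0.01f);
        for (size_t i = 0; i < numFrames; ++i) {
            out[i] = gain * std::sin(phase);
            // Phase wrap omitted for brevity in this sketch.
            phase += 2.0f * 3.14159265f * frequency / kSampleRate;
        }
    }
};
```

In a real engine both would run inside the audio renderer's callback; the point is only where the samples come from.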
Some elements of both of these are typically in play in games; in cinema it's almost always just static audio.

One of the fundamental problems of procedural content creation across the board, not just for audio but also for graphics and any other kind of media in which the art is being generated algorithmically, is that it requires highly technical artists, systems-based thinking, and a whole new set of tools that avoid various technical bottlenecks, like performance, that come with real-time generation of content. This extra cost results in a higher upfront investment in terms of time, technology, and production overhead, but it has huge potential down the line: once the system is created, it can produce massive amounts of variability and interactive content. And of course, this technical landscape requires tooling that supports quick iteration and previewing within an actual real-time environment, so that the artists building these systems can preview them and play with input variables; that kind of system is what allows you to actually create this content at any sort of scale.

In the domain of graphics, a number of tools for real-time procedural generation have been developed across the game industry and in other industries. One of the big innovations, probably about 30 years ago now, was the development of shaders, a standardized kind of language that allows hardware pipelines to be programmed procedurally, very quickly, without requiring a programmer to go in and hard-code exactly what graphics configuration something needs to be in. But a shader is itself a programming language, and it does require programming skills, which is a high technical bar for content creators. So there are many tools out in the world for people who are technical but don't necessarily want to go down to the level of programming in text; tools that optimize that workflow into an easier, more visual, quicker environment for programming these procedural systems. In Unreal Engine we have, for example, the Material graph, which is on the left here, and on the right is Niagara, which is a new system for programming particles. Both of these use shaders under the hood to drive the hardware, and these tools allow for a significant boost of productivity at scale for generating and controlling procedural graphics.

For audio, the analogous tools really don't exist. Traditionally, tools in games are procedural, but they are basically randomly varying and slightly modifying static assets. The typical process is that a sound designer will make static assets in their tools outside of the game engine, import those assets, and then randomly pick from those variations and do some high-level parametric control of them, for example volume and pitch; maybe they'll apply a filter, which is generally pretty hard-coded and programmed into a kind of static pipeline for audio processing; and then panning, which also includes spatialization and HRTF, that whole domain. So in current systems there is often a hard-coded DSP effect pipeline. The effects themselves may be swapped out as plug-ins, but the general process of where the audio is processed and generated is static, and these kinds of effects might be reverb, EQ, or dynamics processing, your standard mastering effects.
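As a rough sketch of that traditional approach, here is an engine-agnostic C++ illustration; the FootstepCue class, the helper names, and the fixed downstream pipeline are assumptions for illustration, not any particular middleware's API.

```cpp
// Sketch of the traditional "random variation of static assets" approach.
// Hypothetical and engine-agnostic; real middleware exposes similar knobs.
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

struct Asset { std::vector<float> samples; };  // pre-rendered sound file

struct OneShotRequest {
    const Asset* asset;
    float volume;  // linear gain
    float pitch;   // playback-rate multiplier
};

class FootstepCue {
public:
    // Assumes at least one variation is supplied.
    explicit FootstepCue(std::vector<Asset> variations)
        : variations_(std::move(variations)) {}

    // Called from gameplay: choose a random variation and randomize
    // volume/pitch slightly so repeated footsteps don't sound identical.
    OneShotRequest Trigger() {
        std::uniform_int_distribution<size_t> pick(0, variations_.size() - 1);
        std::uniform_real_distribution<float> vol(0.8f, 1.0f);
        std::uniform_real_distribution<float> pitch(0.95f, 1.05f);
        // Downstream, a fixed (hard-coded) pipeline would apply filters,
        // panning/HRTF, and bus effects such as reverb, EQ, and dynamics.
        return {&variations_[pick(rng_)], vol(rng_), pitch(rng_)};
    }

private:
    std::vector<Asset> variations_;
    std::mt19937 rng_{std::random_device{}()};
};
```

Note how all of the "procedural" behavior lives in a few random numbers applied to static assets, which is exactly the limitation described above.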
Overall, the process is pretty static, and this includes all of the middleware that I'm aware of and most game audio engines. There is definitely some work in this area outside of games, and on the periphery of games some middleware is going in this direction, but generally speaking this is the traditional approach to procedural audio. The screenshot here is Sound Cues, which is technology in UE4, and it basically just randomly picks a variation of a static asset.

So in UE5 we have some new technology we're working on called MetaSounds, which is basically a new approach to audio in games. They're analogous to audio shaders in a way: a user-programmable DSP pipeline. For audio, it's the analogous tool to the Material graph or Niagara in Unreal Engine 5. This is a screenshot of MetaSounds, and I'll give a demo real quick.

The demo I'm going to show you is inside a project available in Early Access; you can download it today and check out the project. This is a real MetaSound used in the UE5 Early Access project Valley of the Ancient, which is available for download. The MetaSound combines traditional static-asset variation playback and real-time procedural elements. The procedural elements allow the MetaSound to change its duration dynamically, either from the game or from the sound designer, to match animations. The MetaSound features a wide variety of procedural elements, synthesis, modulation and so on, as well as traditional wave-player static-asset playback. This sound I put together for a live Twitch stream where I walked through in detail how to construct this MetaSound, but here I want to showcase the sample-accurate timing for the drum events. [Music] The parameters are also updatable live: here I can mute the synth element, and here I can change the beats per minute. [Music]

I wanted to give a quick tour of some of the projects I've seen posted on social media with MetaSounds; check out the hashtag #MetaSounds on Twitter and YouTube to see some of the work people have been doing. This first one is from Nick Fornell, who integrated his third-party library into MetaSounds; it's a rain physical model that can be driven from any gameplay. This next one is a project by Chris Zuko called The Project Mix; he's integrated MetaSounds into his project, and here he's showcasing the power of MetaSounds with the system he's built, changing parameters live. [Music] The School of Video Game Audio posted this demo of a procedural robot-voice generator. WormJuiceDev here is working on a procedural music system that's driving some visuals in real time. This melodic minimal techno is by Battle Angel Sound, who had a really great-sounding patch, and Arthur Barther is doing some cool stuff with sine-tone generation and chords driving Lumen and Nanite, which are two new graphics technologies in UE5. There are hundreds and hundreds of cool projects that people are posting; please check them out.

So MetaSounds doesn't happen in a vacuum. There is a lot of precedent outside the domain of games: there's SuperCollider, ChucK, Max, Pure Data, Faust, Reaktor Blocks, Csound, and Kyma, and there are probably others I've missed.
In these areas procedural audio is a huge focus, but there are downsides to these systems which prevent them from operating at scale in games. Still, this is not a new idea in the context of audio, and I recommend checking them out if you're interested in the subject.

So MetaSounds, at a high level, is a directed acyclic graph. All of the MetaSounds are actually rendered in parallel in background tasks, using the same architecture as the decoders we have for sounds. That means each MetaSound can have an independent sample rate, channel count, and buffer size, and in fact these can scale as a function of time. This is one of the innovations in MetaSounds that I think separates it from those other examples: we actually have the potential to operate these procedural graphs at scale in the context of a real game. We've architected it to be fundamentally pluggable into our existing audio rendering framework, which means all of the existing features we have in Unreal Engine, like spatialization, audio propagation, submix processing, and some of those other more traditional techniques, will work right out of the box with MetaSounds. We can also have MetaSounds composed within other MetaSounds, which is another innovation in the context of Unreal Engine relative to the Material graph and so on. Third parties can extend MetaSounds easily through a plug-in node registry, and MetaSounds in UE5 operate with presets, so you can create a graph once and then reuse it over and over, versus having a unique graph per instance. And then a really exciting feature of MetaSounds is sample-accurate events, which I'll talk about in more detail in a second.

Real quick, this is probably too much information here, but essentially data is shared between nodes in a way that prevents copying. For example, if node A has an audio buffer output, nodes B, C, and D would reference the audio buffer in A rather than making copies, and in this way we can optimize data flow from node to node. It's also very easy to make new MetaSound nodes with our API; it's all driven through modern C++.

I wanted to give a little bit of an explanation of sample-accurate events. Basically, we have a method where we store events in an array of sample indices, so a trigger sequence in MetaSounds is actually a buffer of events where each element in that buffer represents an index at which an event is intended to happen. With this we can subdivide an audio render callback at an arbitrary resolution, so events can happen at any point in time within a MetaSound graph.

So MetaSounds is really exciting for us, and one of the reasons I'm excited to present here is that I think it's an ideal environment for research into audio and perception in an interactive real-time setting. There are a lot of interesting directions someone could take this technology. It's very easy to extend the node API to include any third-party library or tool, so machine learning systems, third-party DSP libraries, physical modeling research, and in general it's a test bed to investigate new spatialization, propagation, or other perceptual studies.
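Returning to the sample-accurate events described a moment ago, here is a simplified sketch of the idea, assuming triggers arrive as sorted sample offsets within the current block; this is my own reconstruction for illustration, not the actual MetaSounds implementation.

```cpp
// Simplified illustration of sample-accurate triggers: events are stored as
// sample offsets within the current render block, and the block is rendered
// in sub-spans so each event fires exactly on its sample. Not MetaSounds
// source; names and structure are assumptions for illustration.
#include <cstddef>
#include <vector>

struct TriggerBuffer {
    // Sample offsets (0 <= offset < blockSize), assumed sorted ascending.
    std::vector<size_t> offsets;
};

template <typename RenderFn, typename EventFn>
void RenderBlockSampleAccurate(float* out, size_t blockSize,
                               const TriggerBuffer& triggers,
                               RenderFn renderSpan,  // renderSpan(out, start, count)
                               EventFn onTrigger) {  // onTrigger(sampleIndex)
    size_t start = 0;
    for (size_t offset : triggers.offsets) {
        // Render audio up to the trigger, then fire the event at that sample.
        renderSpan(out, start, offset - start);
        onTrigger(offset);
        start = offset;
    }
    // Render the remainder of the block after the last trigger.
    renderSpan(out, start, blockSize - start);
}
```

With this shape, a drum or envelope node can respond exactly on the trigger's sample rather than quantizing the event to the start of the next render callback.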
We're actually working with Microsoft Project Acoustics to build a deep integration with MetaSounds, where we can get the acoustic simulation data directly into sounds and offer a really full systemic integration with procedural audio, in a way that no game engine I'm aware of has done. And I think in particular the sample-accurate triggering of audio events is a new framework and an interesting domain for investigating perception. There have been a lot of studies on latency and the perception of timing relative to AV sync, but I think there is a lot of interesting work to be done on audio-to-audio sync, what I'm dubbing AA sync: basically, event synchronization across different audio events. With MetaSounds and sample-accurate triggering, I think there's some interesting research in that area.

Hello, my name is Kate Rayner, she/her. I'm a Studio Technical Director at The Coalition, one of Microsoft's Xbox Game Studios. I get to work with these game studios, which have some of the greatest game developers and game franchises in our industry today. Today I'm going to talk to you about content generation at scale. I'm going to talk about how we've seen exponential growth in content through different gaming generations, a little bit about gaming virtual worlds and the metaverse, some example content challenges that we face, and I'll also talk a little bit about player engagement challenges that are open for research.

What we've seen through the generations is the size of development teams growing at an exponential rate. In this graph you can see the different generations of console hardware, from the early '80s through the 2000s to 2016; on the y-axis you see iterations of franchises, version one, two, three, and the size of the spheres is the number of developers. The challenge we see: going from generation seven, the Xbox 360 generation, to generation eight, Xbox One, a typical game team grew three to four times in size, typically exceeding 200 developers. It's common for some of the bigger, more successful AAA franchises to have game teams of 400 to 600 developers or more. As we move into generation nine we see this trend continue, with more limitations removed on the detail and fidelity of the content we're able to showcase in real time. However, the cost of creating this content is spiraling out of control. AAA game teams have moved to employ armies of outsourcers and globally distributed development in order to reach scales of up to a thousand-plus people for some of the most content-heavy open-world games. It's as if the size of game teams has been growing at the rate of media storage and the actual content size of the games themselves. As we see in this graph, over the generations game sizes have gone from 4 megabytes in the cartridge days all the way up to 25, 50, and 100 gigabytes being fairly common for a late generation 8 or early generation 9 title. One thing that's also interesting in this graph is a particular jump in the rate of IO with the latest generation, allowing us to fill memory faster and stream even larger worlds into memory as we move around them.

So who is employing these armies of content creators? The reality is that few game publishers are willing to grow teams to this size, except perhaps for their top three franchises. The costs of doing this are prohibitive, and you really need to be a blockbuster success; otherwise you are probably losing money. Indie game developers with smaller teams can't afford to do this, but not all of them want to make 2D games or highly stylized, simplified graphics.
User-generated content is a good way of trying to generate more content from the player base, but it's not a fit for all games, and not at the quality that a professional game team can create. What we're desperately in need of are tools that enable us to scale content production without scaling the human effort and the costs involved. Research, machine learning, and AI can be employed to create powerful tools that accelerate and scale content creation without the human effort, while still retaining the ability for us to provide art direction and create the stylized worlds that are so common for video games.

Microsoft Flight Simulator is an astounding example of content generation at scale using machine learning, cloud computing, and photogrammetry. It's amazing how this data is seeded from real-world satellite imagery and weather data, along with contextual ML and AI models, to create a digital mirror of the world. The statistics, in terms of the content generated at the level of quality achieved, are astounding, and simply unrealistic to create with human effort, especially when you consider that it is regenerated in 72 hours to reflect real-world differences. Now if you think about this: how would we apply research to ground this and get the macro details right as you walk around this world? What about all the interiors in the buildings, and the prop placements? How would you make it a living world? When we think of game virtual worlds, often it's not pure simulation; it's art-directed worlds, where we need artistic, directed style transfer and really powerful tools that enable content creators to realize their vision. We could create worlds generated from game content, to then seed new game environments that retain some of the unique gameplay properties and qualities necessary for that game. Really, we're talking about some of the tools that are going to be essential for creating the metaverse: virtual worlds with a scope and detail that evolve, and that are simply unrealistic to make with human effort alone.

So let's talk about what some of these game content challenges are. We've already talked about virtual worlds when we looked at Flight and the direction it could go. Game developers are more and more often creating large open worlds that are highly stylized. Some of the techniques that we would want to empower, to enable art-directed creation, are listed here: things like terrain generation, prop placement, foliage, lighting, object modeling, texturing, material generation, and the ever-growing challenge of then optimizing that content so it's efficient to display while still being believable and realistic to the viewer as they play their game.

Localization is a unique challenge that all games face. One of the biggest issues here is that even today we are still manually sending our text and audio data to localization houses, where humans, as mechanical turks if you will, come back and localize the game with new text and new audio. Often we use cultural and local audio talent to create authentic voices. The cost of iterating on this is high, and we do it dozens of times throughout production. As a result, not all games are accessible to everyone in their locale. The challenge here is: how can we evolve localization and text-to-speech generation to support more languages, and more importantly, how can we do it in a way that is perceived as native, not as a clearly synthetic or robotic voice? How can we match cultural expectations beyond just a literal translation from one language to the other? Can we retroactively translate a game without actually modifying it? An example extended into real time is in multiplayer games, where we have players talking to each other and dynamically converting the audio, so each player hears the people they're playing with in their own locale, and the others hear it in the language that they understand.
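One way to picture that real-time scenario, purely as a hedged illustration (the interfaces below are hypothetical placeholders, not any real speech or translation API), is a per-speaker pipeline of speech recognition, machine translation, and text-to-speech, rendered separately for each listener's locale.

```cpp
// Hypothetical sketch of a real-time voice localization pipeline for
// multiplayer chat: recognize -> translate -> re-synthesize per listener.
// Interfaces are illustrative placeholders, not a real API.
#include <string>
#include <vector>

struct AudioChunk {
    std::vector<float> samples;
};

struct SpeechToText {
    virtual ~SpeechToText() = default;
    virtual std::string Transcribe(const AudioChunk& audio,
                                   const std::string& srcLocale) = 0;
};

struct Translator {
    virtual ~Translator() = default;
    virtual std::string Translate(const std::string& text,
                                  const std::string& srcLocale,
                                  const std::string& dstLocale) = 0;
};

struct TextToSpeech {
    virtual ~TextToSpeech() = default;
    virtual AudioChunk Synthesize(const std::string& text,
                                  const std::string& dstLocale) = 0;
};

// For one listener: convert a speaker's voice chunk into the listener's locale.
AudioChunk LocalizeVoiceChat(SpeechToText& stt, Translator& mt, TextToSpeech& tts,
                             const AudioChunk& spoken,
                             const std::string& speakerLocale,
                             const std::string& listenerLocale) {
    const std::string transcript = stt.Transcribe(spoken, speakerLocale);
    const std::string translated =
        mt.Translate(transcript, speakerLocale, listenerLocale);
    // The open problems from the talk live inside Synthesize: producing a voice
    // that sounds native and culturally appropriate, not merely literal.
    return tts.Synthesize(translated, listenerLocale);
}
```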
Character animation is another high-cost area of content creation, especially for AAA games, where we're trying to create believable character motion. Game teams today employ up to 50 animators, sometimes over 100, classically trained animators, to create the quality and scope of animation required. Because of this, game teams limit the amount of variation and diversity in the animation in their games due to production costs: stylistic variation like different acting and different mood contexts; diversity, like having female characters that actually move like female characters rather than reusing a large male animation set; different ages, different heights. All of these play into the costs involved in creating the animation. When you're dealing with non-human animation it gets even more complex: motion capturing quadrupeds and animals is incredibly challenging, and often we need art-directed animation if we're creating something that you can't put in a motion capture suit.

This video shows some examples of male and female walk cycles and how they change when the character is, in this case, holding a rifle, and how the weight of that changes their motion and the way they move. What we end up with is a combinatorial explosion. In this simple example, where you just have different walk, run, and sprint cycles, different equipment you might be carrying, different stances, different genders, different energies, and different moods, the number of animations that needs to be created quickly adds up; for example, three locomotion cycles across four equipment types, three stances, two genders, and four moods is already 288 distinct animations. It becomes cost prohibitive to motion capture, and even more so to create manually, especially when you're dealing with stylistic animation. The challenge here is: how do we cost-effectively generate complete motion sets with new styles and new variations? You can extend this to non-human characters; non-human animation production would be a significant win, as it's difficult and very expensive to create. Could we create tooling and processes that enable animators to retarget, repurpose, and adjust animation onto creatures? Could you capture just a subset of animation and then extrapolate it to a larger animation set once you've understood and trained against those style differences? In this slide you can see, from Gears of War, some examples of quadruped creatures and their different animation sets, all going through their walk cycles, and you can imagine these are not things that are easily created without human involvement and human animators.

So here are a few other bonus areas for player engagement challenges in the gaming space where research can apply and solve some very difficult problems. I'll talk briefly about automation testing: the scale of bug detection, bug prediction, and testing in these large games requires large numbers of testers and large amounts of time and effort, and it often affects the quality of the game if we're not able to capture and address issues before a game is released into the wild. Detecting game-breaking design and content changes and predicting issues can really help accelerate and reduce that effort in game development. It's also really important for us to understand our players: their engagement, their preferences, their sentiment; understanding game economies and tuning them; skills assessment and prediction. All of this can be driven through data-informed prediction, machine learning, and research.
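As a toy illustration of the kind of data-informed signal this could start from (my own example, not something from the talk), even a crude statistical outlier check over per-match telemetry can surface players worth a closer look, which connects to the cheat detection discussed next.

```cpp
// Toy illustration of data-informed anomaly detection on player telemetry:
// flag players whose headshot rate sits far above the population mean.
// A real system would use far richer features and models; this is a sketch.
#include <cmath>
#include <cstddef>
#include <vector>

struct MatchStats {
    float headshotRate;  // fraction of kills that were headshots
};

std::vector<size_t> FlagOutliers(const std::vector<MatchStats>& players,
                                 float zThreshold = 4.0f) {
    std::vector<size_t> flagged;
    if (players.empty()) return flagged;

    // Population mean and standard deviation of the chosen statistic.
    float mean = 0.0f;
    for (const auto& p : players) mean += p.headshotRate;
    mean /= static_cast<float>(players.size());

    float variance = 0.0f;
    for (const auto& p : players) {
        const float d = p.headshotRate - mean;
        variance += d * d;
    }
    const float stddev = std::sqrt(variance / static_cast<float>(players.size()));

    // Flag indices whose z-score exceeds the threshold.
    for (size_t i = 0; i < players.size(); ++i) {
        if (stddev > 0.0f &&
            (players[i].headshotRate - mean) / stddev > zThreshold) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```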
Cheating is a big problem, especially in online games. Being able to detect anomalies and understand when someone is breaking the game by undermining the fairness of the economy is important, because it can affect the sentiment and the longevity of your game.

So I've talked a little bit about how we've seen exponential growth in content in game development, and how that trend continues as we scale content to create ever more detailed and immersive worlds. The challenge, and the opportunity, is really applying modern research and modern machine learning techniques to this space. Ultimately, what we really need are tools that allow us to scale content production without scaling the human effort and cost to create this content. We've seen simulated digital mirrors, which are critical for creating the metaverse and things like Microsoft Flight Simulator, but we also need art-directed tools that allow artists and content creators to feel empowered, to have their vision expanded and multiplied through the power of modern machine learning. So this concludes my talk, and thank you very much.