NETINT Technologies on Scalable Distribution in the Age of DRM: Key Challenges and Implications


Welcome to Voices of Video. I'm Jan Ozer. This is where we explore critical streaming-related topics with experts in the field. If you're watching and have questions, please post them as a comment on whichever platform you're watching, and we'll answer live if time permits. Today's episode is all about distribution at scale. Our guest is Alex Zambelli, technical product manager for video platforms at Warner Brothers Discovery. I've known Alex at least 15 years,

going back to his history with Microsoft, and we'll start there, where he was a codec evangelist and a producer of events like the Olympics and NFL football. We'll hear about some of his experiences there. Then we'll walk through the various points in his career on the way to Warner Brothers. There are a lot of stops worth chatting about. Then I'm known, I think, as a codec theorist. I do a lot of testing and I render conclusions. That's

useful in a lot of ways, or at least I hope it is, but it's not real world. Alex just has a ton of real-world experience that he's going to share with us today, things as high level as where the industry needs to go to make it simpler for publishers like Warner Brothers to focus on content as opposed to compatibility, and issues as deep-diving as what's his percentage of VBR? Is it 200% constrained VBR, 300% constrained VBR? Of particular interest to me, when does a company like Warner Brothers look at adopting a new codec? I think Alex is going to talk about the decision that they're in the process of making, which is whether to integrate AV1. So Alex just has a ton of real-world experience in live event production at huge scales, as well as premium content encoding and delivery with some of the biggest names in the industry. So I'm way excited to have Alex joining us today. Alex, thanks for being here. Jan, thank you so much for having me. Real pleasure. I'm looking forward to the next hour talking to you. Yeah. We don't get a chance to do this that often. Let's dive in. I'm not intimately familiar with your CV. Did you start in streaming at Microsoft,

or was there a stop before that? I did start my career at Microsoft.   So that was my very first job out of college,  actually. So this was back in 2002. I started   out as a software tester. So I started as a  software test engineer in Windows Media Player.   I worked on both Windows Media Player  and then the codec team at Microsoft   as a software tester for about five years. And so, it was during that second phase of my   software testing role there working on the codecs  where I started working with the VC-1 codec,   which at the time was a new codec for Microsoft  in the sense that it was the first codec that   Microsoft had standardized. So there was a  codec called Windows Media Video 9, WMV 9,  

and Microsoft took that through SMPTE to basically get it standardized. And so, that became VC-1. Some folks may recall that that was basically one of the required codecs for both HD DVD and Blu-ray at the time. And so, that's what put it on the map. And so, during that time when I was testing the VC-1 encoder, I started interacting a lot with Microsoft's external customers and partners. And so, that then transitioned me into my next job at Microsoft, which was technical evangelism. So I ended up doing technical evangelism for VC-1 for a few years. Then my scope broadened to include really all the Microsoft media technologies that were available at the time and could be used for building large online streaming solutions. And so, when I started at Microsoft working

in digital media, I mean in 2002, it was still  mostly dominated by physical media. So we're still   talking about CDs, DVDs, Blu-rays. By the time I  transitioned into this technical evangelism job,   which was around 2007 or so, streaming was  really starting to pick up steam. And so,  

from that point on, really to this day, my career has been focused on streaming, because that has become the dominant method of distribution for digital media. And so, I mentioned that starting around 2007 or so, I started doing technical evangelism for a whole bunch of different Microsoft media technologies. So at the time, Silverlight was a technology Microsoft was developing. That was a competitor to Flash. And so, it was seen as a solution for building rich webpages, because everything was still primarily online through websites and browsers at the time. Mobile applications hadn't even started picking up yet.

And so, really the primary way of delivering streaming media at the time was through the browser, and this is where Silverlight came in. It was a plugin that allowed both rich web experiences to be built, but also really great premium media experiences as well. And so, that included even things like digital rights management, so using PlayReady DRM to protect the content and so on.

How did that transition to actual production  in your work at the Olympics and with the NFL?  Yeah. So at the time, Microsoft was partnering  with NBC Sports on several projects. The first   one that I was involved with was the 2008  Olympics in Beijing. And so, NBC Sports had   the broadcast rights to the Olympics, still  does, and they wanted to basically put all   of the Olympics content online for essentially  any NBC Sports subscriber to be able to access. 

That was, I think, a first, really the first attempt to put all of the Olympics online for streaming. So up until that point, if you wanted to watch an event, you had to wait for it to be broadcast on either your local NBC station or one of the cable channels. And so, if it wasn't broadcast in live linear, you could never see it. It wasn't available. And so, NBC Sports had the idea to put all of that content online. So the very first version of the NBC Olympics site that we built in 2008 was still using Windows Media for livestreaming, but was starting to use Silverlight in what at the time was actually the very first prototype implementation of adaptive streaming at Microsoft to do on demand. Then the next project we did with NBC Sports in 2009 was supporting Sunday Night Football. For that, we built a fully adaptive streaming-based website. So that was the origin of Microsoft's

Smooth Streaming technology. So Microsoft had taken that prototype that was built during the 2008 Olympics and essentially productized it into Smooth Streaming. So we had both live streams in HD, which was, again, a breakthrough at the time, being able to do HD at scale. Now we just take it for granted. But in 2009, that was really seen as a big deal.

Then 2010 Vancouver Olympics, that's when,  really, we went full-on Smooth Streaming.   Everything was basically available on  demand and live in Smooth Streaming.  So, yeah, those are some really, I would say,  groundbreaking events that we did. We ended up   being nominated for a few sports Emmys, technical  Emmys at the time. I don't remember which years we  

won or didn't win, but, yeah, it was nice to get recognized by the industry for pushing the envelope. I'm remembering ... And I don't want to mix you up with another technology, but I'm remembering either Monday or Sunday Night Football with a player that had four different views that you could page through. Was that you guys? Yup. That was us. Yeah, that was us. Yup, that was Sunday Night Football. So, yeah, we had basically ... You could watch multiple camera

angles simultaneously. One of the cool things about that is that we used Smooth Streaming to do it: it was actually a single manifest that contained all four camera angles. And so, switching between the camera angles was completely seamless because it was similar to switching bitrates the way you do in DASH or HLS today. So it was a very cool solution that ... Actually, I don't think we've even rebuilt it since then. It was a feature that we developed in 2009 and then lost to history.
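To make the single-manifest idea concrete, here is a rough conceptual sketch in Python. It is not Smooth Streaming's actual manifest format, and the track and segment names are made up; the point is simply that all angles share one timeline, so an angle switch is just a track switch at the next segment boundary.

```python
# Conceptual sketch only, not Smooth Streaming's real manifest format.
# All four angles live in one manifest and share one timeline, so switching
# angles is the same operation as switching bitrates: request the next
# segment from a different track.
manifest = {
    "segment_duration_sec": 2,
    "video_tracks": {  # hypothetical camera-angle tracks
        "angle_main":     ["main_0001.m4s", "main_0002.m4s", "main_0003.m4s"],
        "angle_sideline": ["side_0001.m4s", "side_0002.m4s", "side_0003.m4s"],
        "angle_endzone":  ["end_0001.m4s", "end_0002.m4s", "end_0003.m4s"],
        "angle_cablecam": ["cable_0001.m4s", "cable_0002.m4s", "cable_0003.m4s"],
    },
}

def next_segment(manifest, angle, segment_index):
    """Seamless switch: same timeline position, different camera track."""
    return manifest["video_tracks"][angle][segment_index]

print(next_segment(manifest, "angle_endzone", 2))  # end_0003.m4s
```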

Did you actually go to the Olympics or were you working back in the plumbing in Redmond? We were on the backend side of it. So I did get a chance to go to one Olympic event at the Vancouver Olympics since they were close to Seattle, where I live. But other than that, yeah, we spent most of those projects in windowless rooms in data centers, mostly in Redmond, sometimes in Las Vegas, because we were working closely with iStreamPlanet at the time as well, who were based out of Las Vegas. We spent a lot of time in New York as well at 30 Rock, because NBC Sports was still at the 30 Rock location at the time. So, yeah, it was a fun time. What were the big takeaways? If you met somebody on a plane and they asked, "Gosh, I'm doing a livestreaming event that's huge. What did you learn from the Olympics? What are

the high-level things that you took away from that  that you've implemented throughout your career?"  One of the, perhaps, obvious takeaways was that  livestreaming is hard in that it's not on demand.   Everything you know about on-demand streaming,  you have to throw that out the window when   you start working on livestreaming because  you're dealing with very different issues,   you're dealing with real-time issues. And so, even  something as simple as packets getting lost on the   way from your origin encoder to your distribution  encoder, and dealing with packet loss and then   dealing with segment loss on the publishing  side and figuring out how do you handle that   and handling blackouts and ad insertions. And so, everything's under a lot more pressure,   because if you are doing on-demand streaming  and if there's something wrong with the content,   if there's something wrong with the  origin or any part of your delivery chain,   you have a little bit of leeway in that you've  got time to address it, and hopefully you'll   address it very quickly. But if the content goes  down for a few hours, it's fine. People will come   back later. Whereas with live, you don't have  that luxury. You really have to be on top of it.  And so, my memory of it is that every time we  were doing these events, it was all hands on   deck. I mean we had everyone from Microsoft  to NBC, to Akamai, to iStreamPlanet. All the  

different companies were involved in these projects. We would just have everyone on calls ready to go fix whatever needed to be fixed in real time, because that was the nature of it. So the big lesson there was that live is not on demand. You have to really give it a lot more focus, a lot more attention than you would necessarily give to on demand.

Does live ever get easy? I mean even at events like what we're doing today, it seems like there's always something that breaks or there's always the potential for it. You never feel comfortable with it. I think that's a great way to describe it. It's just that you're never comfortable because, yeah, something could go wrong, and then you can't just say, "Well, we'll fix it sometime in the next 24 hours." You have to fix it right now. And so, it's like, yeah, if our Zoom link went down right now, we'd be in trouble, right? No backup for that. So you jumped from the frying pan into the fire. I think your next stop was iStreamPlanet, where you were doing live events all the time. So tell us about that.

At the very end of 2012, I left Microsoft and joined iStreamPlanet. iStreamPlanet, for those not familiar with the company, was a startup out of Las Vegas, started by Mio Babic. They built a reputation for themselves as a premium live event streaming provider. At the time, they wanted to get into live linear and they also wanted to start building their own technology. And so, 2012 was when Mio started a software engineering team in Redmond. And so, the next year, I joined that software engineering team. What I worked on was the very first live encoder that iStreamPlanet built in-house.

And so, one of the ideas at the time was to build it all on commodity hardware. So, again, that's something that we now take for granted because we're accustomed to things running in the cloud. And so, we assume that, yeah, of course you can go spin up a live encoder in the cloud and it's running on just commodity hardware that's there. But in 2012, 2013, that was not the case. It was mostly hardware-based encoders that you had to actually put in a data center and maintain. And so, the idea that Mio had was like, let's

run it on commodity hardware. Let's build a cloud-based live encoder. And so, I worked on that product for about four, four and a half years. In 2015, if my memory serves me correctly, I think it was 2015 or 2016, iStreamPlanet got acquired by Turner, and Turner was part of WarnerMedia. And so, iStreamPlanet became a subsidiary of WarnerMedia. And so, that was a pretty nice ending to that story as well. Real briefly if you can, I'm trying to ... So

we had Silverlight here and then we had Flash here, and somehow we ended up with both of those going away. I guess it was the whole HTML5 thing, and that brought HLS and ... Smooth is in there. But when did you transition from VC-1 to H.264 and how did that work? When Silverlight launched, originally the only video codec it supported was VC-1, and then I think it was the third or fourth version of Silverlight- That's right, yeah. ... where H.264 support was added. I think Flash added it around the same time. I think it was literally one month after the other. So the challenge with basically building any streaming solution in HTML around that time, so, again, going back to the 2007, 2008 timeframe, was that HTML was just not ready.

There were basically no APIs in HTML that would allow you to do streaming with the level of control that was needed. And so, there were some workarounds where, for example, Apple went and ... When they came out with HLS as their streaming protocol, they baked it into the Safari browser. And so, if you used the video tag in HTML in Safari,

you could basically just point it at an M3U8 playlist and it would just work. But that was the exception rather than the rule. I mean most other browser implementations, whether it was Chrome or Firefox, or Internet Explorer at the time, did not do that. And so, there was this

challenge of, well, how do you stream? And so, what basically Flash and Silverlight, I think, brought to the table at that time was an opportunity to really leapfrog HTML, to basically just advance it, even if it was via a proprietary plugin, but advance the technology to a point where it was usable. And so, one of the innovations that Silverlight brought was the concept of a media stream source, which today now exists in HTML. So when you go build a solution in HTML today that's a streaming solution, you're using the Media Source Extensions and the Encrypted Media Extensions portions of the HTML spec. At the time, that was not yet in HTML5. So Silverlight had that approach of, well, we're not going to bake any particular streaming protocol into the plugin. We're going to basically open up an API that allows you to go handle your own downloading of segments and parsing of segments, and then you essentially just pass those video and audio streams into a media buffer and then the plugin goes and decodes and renders that and handles the rest.
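In a browser today this is the Media Source Extensions pattern: fetch and parse segments in the app, then hand the bytes to SourceBuffer.appendBuffer(). As a rough analogue only, here is what that control flow looks like sketched in Python, with a hypothetical segment URL and a plain byte buffer standing in for the platform's media buffer.

```python
# Rough analogue of the MSE / media stream source pattern, not a real player:
# the application controls downloading and scheduling, then hands raw bytes
# to a media buffer and lets the platform decode and render.
import urllib.request

SEGMENT_URL = "https://example.com/stream/720p/seg_{num:04d}.m4s"  # hypothetical
media_buffer = bytearray()  # stands in for SourceBuffer / the plugin's buffer

for num in range(1, 6):
    with urllib.request.urlopen(SEGMENT_URL.format(num=num)) as resp:
        segment = resp.read()     # app-controlled download of one segment
    media_buffer.extend(segment)  # equivalent of appendBuffer(); a real player
                                  # would also parse boxes and pick the next
                                  # bitrate based on measured throughput

print(f"buffered {len(media_buffer)} bytes across 5 segments")
```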

Then another crucial part, I think, of what Silverlight brought to the table was DRM, because that was something that, again, HTML just didn't have a good solution for: content protection. The reality of the industry that we work in is that if you want to provide premium content to audiences, you have to protect it. Generally, content owners and studios will not let you go stream their content just in the clear. And so, it was a big deal that Silverlight could

enable both streaming and protection of the content. Then Flash ended up doing the same with Flash DRM, Adobe DRM, as well. And so, it was around, I think, 2011 or 2012, if I remember, that both Silverlight and Flash went away and were replaced by HTML. That was because by that point, HTML had matured enough that it was feasible. There were still some growing pains there. I remember there was a period where it was like we were neither here nor there. But by, I would say, 2014, 2015, HTML5 had all the needed

APIs to enable basic stuff like implementing DASH and HLS and Smooth Streaming in the browser and protecting it with DRM. So that's where we are today and, yeah, it took a while to get there. Real quickly, what do you do at WarnerMedia? So I'm hearing when ... Were you a programmer or were you a live video producer? You started in testing, which is ... So what's your skillset? So, as I mentioned earlier, when I started my career, I started in engineering and then transitioned to technical evangelism. By the time I moved over to iStreamPlanet, my job had become product management. And so, I've been a product manager since then, for the past 10 years. So after iStream, I went to Hulu,

and I was a product manager for the video platform at Hulu for five years. Then in my most recent job, for the past two years, I've been at Warner Brothers Discovery, also product managing the video platform here. So my responsibility as a product manager is the video platform itself. Specifically today, I focus mostly on transcoding and packaging. So for the most recent launch of Max, which is the new service that combines Discovery+ and HBO Max and just launched last week, I was the product manager for the VOD transcoding and packaging platform. And so, that involved essentially defining the requirements: what are the different codecs and formats we need to support, what should the workflows look like, how do we get content in from the media supply chain, what are all the different permutations or formats we need to produce, what kind of signaling needs to be in the manifest so the players are able to distinguish between HDR and SDR. So all those types of technical details are part of my job.

Let's do a speed round of some technical encoding issues that ... Though your answers are ... You're a pyramid expert. Where are you on encoding cost versus quality? That would translate to: are you using the placebo or the veryslow preset? I don't know if you use x264, but do you use that to get the best possible quality for the bitrate irrespective of encoding cost, or do you do something in the middle? I'm sure you're not in the ultrafast category.

But real quick, where are you in that analysis? So, yeah, we currently do use x264 and x265 for a   lot of transcoding at Warner Brothers Discovery.  So we typically use either the slow or slower   presets for those encoders. Though one of the  things we have been discussing recently is that we   perhaps shouldn't necessarily use the same preset  across all bitrates or even across all content.  And so, that's an idea that we've been exploring  where if you look at your typical encoding ladder,   you've got, let's say, 1080p or 2160p at the top.  But at the bottom of your ladder, you'll have   320 by 180. 360, yeah. 

You might have a 640 by 360. And so, then the question becomes, well, why use the same preset for both those resolutions? Because x264 veryslow is going to take a lot less time on your 640 by 360 resolution than on your 1080p resolution. And so, one of the ideas we've been looking at is, okay, we should probably apply different presets for different resolutions, different complexities. Then not all content is necessarily the same in the sense that it's not equally complex. So perhaps not everything requires the veryslow

preset. Then not all content is equally popular. If there's a particular piece of content that's watched by 20 million viewers versus something that's watched by 10,000 viewers, the one watched by 20 million should probably get the slower, more expensive preset, because whatever extra compute you spend on it is going to be worth it. It'll hopefully translate to some CDN savings on the other side. So, yeah, hopefully that answers your question.
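A minimal sketch of that preset-selection idea, with made-up thresholds rather than Warner Bros. Discovery's actual rules (the preset names are standard x264/x265 presets):

```python
# Illustrative only: pick an x264/x265 preset per rung based on resolution
# and expected audience, rather than using one preset for the whole ladder.
def pick_preset(rung_height, expected_viewers):
    if expected_viewers >= 1_000_000:          # headline titles earn more compute
        return "veryslow" if rung_height >= 1080 else "slower"
    if rung_height >= 1080:
        return "slower"
    if rung_height >= 540:
        return "slow"
    return "medium"                            # tiny rungs: the quality gain is small

for rung, audience in [(2160, 20_000_000), (1080, 20_000_000),
                       (360, 20_000_000), (1080, 10_000)]:
    print(f"{rung}p, {audience:>10,} viewers -> {pick_preset(rung, audience)}")
```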

You talked about x265, that's HEVC. When did you add that and why, or were you even there? Did Warner add it before you got there? Yeah. So HBO Max had already been using HEVC. So we obviously continued using it for Max as well. On the Discovery+ side, we had been using HEVC for some 4K content, but there wasn't a lot of it. And so, it was really mostly all H.264 on the Discovery+ side. But with Max, we are obviously still using H.264 and we are using HEVC as well for both SDR and HDR content. Okay. And so, right now, for example, if you go play something on Max, on most devices, it's actually going to play back in HEVC. So even if it's SDR, it will be 10-bit HEVC. Then obviously if it's HDR, it'll definitely be HEVC. How many encoding ladders do you have for a typical piece of content? So the way we define ... And when you say how many encoding ladders,

you mean different variations of encoding  ladders, or do you mean steps within the ladder?  Different variations of encoding ladders. Literally looking at the spreadsheet right now,   and I think it's about six or eight  different variations right now.   And so, what we've tried to do is  build an encoding ladder where,   depending on the source resolution, we don't have  to necessarily have different permutations of the   ladders. And so, we have a UHD ladder where,  depending on what the source resolution is,   that determines where you stop in that ladder,  but doesn't change the ladder necessarily itself. 

Where the permutations come in is with things like frame rates. So if the source is 24p, 25p, or 30p, that's going to use a different ladder than if the source is 50p or 60p, because high frame rate is one of the things we've added for Max that wasn't supported before. So previously everything was capped at 30 FPS. Most of that was due to the fact that there wasn't really a lot of source content on HBO Max, for example, that required more than 30 FPS. But now that the content libraries of Discovery+ and HBO Max are combined, there's a lot more reality TV on the Discovery+ side. A lot of

that is shot at 50 FPS if it's abroad or 60 FPS  if it's US. And so, we wanted to preserve that   temporal resolution as much as possible. And so,  we've started to support high frame rates as well.  And so, we have different encoding ladders  for different frame rates. Then, of course,   there's different encoding ladders  for SDR versus HDR. Even within HDR,   we have different encoding ladders for  HDR10 versus Dolby Vision 5, for example. 

What about for different devices?  So if I'm watching on my smart   TV and then I transition to my smartphone,  am I seeing the same ladder, or do you   have different ladders for different devices? At this moment, they're the same ladders for all   the devices. We might deliver different subsets  of the ladder for certain devices, but that's   typically capping on the high end of the ladder.  So if, for example, some device cannot handle   60 FPS or if it cannot handle resolutions above  1080p, for example, then we might intentionally   cap the manifest itself that we're delivering  to that device. But in terms of different   bitrates and different encodings, we're not  differentiating it yet between different devices.  So I'll give you my personal take on that  question, which is that in most cases it's not   really necessary, in my opinion, to have different  encoding ladders for different devices, because   your 1080p should look great no matter whether  you're watching it on an iPhone or Apple TV.   And so, having two different 1080p  encodes doesn't necessarily make sense. 

I've definitely heard people say, well, perhaps on the lower end of the bitrate ladder, where you have your lower bitrates and lower resolutions, that's where you need to have differentiation. But, again, in my opinion, there's no harm in delivering 100 or 200 kilobit per second bitrates in a manifest to a smart TV, because most likely it's never going to play them. And so, you can put them in the manifest. You can deliver them to the TV or to the streaming stick. In the vast majority of cases, it's never even going to touch that variant. It's just going to skip right over it and go straight for the HD and the UHD. The only time you might ever see that low bitrate is if something catastrophic happens to your network and the player struggles so badly that it needs to drop down to that level.
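Pulling the last few answers together, here is a minimal sketch of how that can fit, with made-up rung values: one ladder per (frame-rate family, dynamic range) variant, the source resolution only determines where you stop in that ladder, and per device you cap the delivered manifest rather than maintaining separate encodes.

```python
# All numbers are illustrative placeholders, not WBD's actual ladders.
LADDERS = {  # (fps family, dynamic range) -> list of (height, fps, kbps)
    ("low_fps", "sdr"):   [(360, 30, 800), (720, 30, 3000), (1080, 30, 5500), (2160, 30, 14000)],
    ("high_fps", "sdr"):  [(360, 60, 1000), (720, 60, 3800), (1080, 60, 7000), (2160, 60, 17000)],
    ("low_fps", "hdr10"): [(540, 30, 2200), (1080, 30, 7000), (2160, 30, 18000)],
}

def rungs_for_title(fps_family, dynamic_range, source_height):
    ladder = LADDERS[(fps_family, dynamic_range)]
    return [r for r in ladder if r[0] <= source_height]   # stop where the source stops

def manifest_for_device(rungs, max_height, max_fps):
    # Low rungs stay in for every device; only the top end gets capped.
    return [r for r in rungs if r[0] <= max_height and r[1] <= max_fps]

title = rungs_for_title("high_fps", "sdr", source_height=1080)
print(manifest_for_device(title, max_height=1080, max_fps=30))   # capped device
print(manifest_for_device(title, max_height=2160, max_fps=60))   # full set
```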

What's your VBR maximum rate on a percentage  basis? So when we started out, it was CBR.   So your max was 100% of your target. Where are  you now with your VBR for your premium content?  So we've taken an approach with x264 and x265  of relying primarily on the CRF rate control,   but it's a CRF rate control that uses a bitrate  and a buffer cap. So when you're writing your   command line in FFmpeg, you can set the CRF  target, but you can also specify a VBV buffer   size and a VBV max rate. Right. 
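As a rough illustration of the kind of FFmpeg command Alex is describing, with CRF as the primary rate control and VBV maxrate/bufsize as the cap (the numbers and filenames here are placeholders, not WBD's settings):

```python
# Illustrative CRF + VBV-capped encode; every value here is a placeholder.
import subprocess

cmd = [
    "ffmpeg", "-i", "source.mov",
    "-vf", "scale=-2:1080",
    "-c:v", "libx264", "-preset", "slow",
    "-crf", "21",            # quality target; the average bitrate floats
    "-maxrate", "10000k",    # VBV max rate: the ceiling on the bitrate
    "-bufsize", "20000k",    # VBV buffer: constrains short-term peaks (and the codec level)
    "-an",
    "output_1080p.mp4",
]
subprocess.run(cmd, check=True)
```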

And so, we are doing that. The reason behind that  is we want to make sure that we're controlling   essentially the codec level at each resolution and  each bitrate and that the peak's also constrained   that way. I can give you an example where if it's  something like, let's say, HEVC and it's 1080p,   you might want to stay at codec level 4 rather  than codec level 4.1, because 4.1 might ...  Or that one actually maybe is not as big  of a deal. But, for example, what if you're   choosing between level 5 and level 5.1, there  are certain devices that might not support 5.1,  

for example. And so, in order to stay under codec level 5 for HEVC, you have to stay under a certain buffer size. And so, that's what ends up driving a lot of the actual caps that we set. Circling back, I mean CRF gives you a measure of per-title encoding as well. So is that intentional? Yeah, that's part of it. With CRF, when you specify your VBV max rate, you're really just specifying your highest average bitrate for the video. And so, as long as you're comfortable with that max rate, then you can also

count on CRF probably bringing your average bitrate below that max rate most of the time. And so, if we set, for example, 10,000 kilobits per second as the max rate, most of the time the CRF target is really going to bring that average bitrate in much lower, around five or six megabits. And so, that is a way of getting per-title encoding and achieving CDN savings without sacrificing quality, because depending on the complexity of your content, it's either going to be way below your max rate or it's going to hit against the max rate. Then at least you're capping the highest possible bitrate that you'll have for that video. That's a pretty creative way to do it. What's the impact of DRM on the encoding ladder,

if anything? So I know there's a difference between hardware and software DRM, and there are some limitations on content you can distribute with software-based DRM. So can you encapsulate ... We're running a bit short of time, but can you encapsulate that in a minute or two? The way most of the content licensing agreements are structured, typically under the content security chapter, there are requirements around what security levels are essentially required to play back certain resolutions, and then often what kind of output protection is required. And so, typically what you'll see is something like Widevine L1, which is the hardware-based security level of Widevine, or hardware-based protection.

Then on the PlayReady side, something like SL3000, which is also the hardware-based implementation of PlayReady. Those will be required for 1080p and above, for example. So a lot of the content licensing agreements will say that unless you have hardware-backed DRM on the playback client, you cannot play anything at 1080p and above. Then they'll typically ... And they'll have similar requirements around each level. So they'll group the resolutions, typically into SD, HD, full

HD, UHD, and each one of those will have different DRM requirements in terms of security levels. Also requirements around HDCP, whether that needs to be enforced or not, whether it's HDCP 1 or HDCP 2. And so, what that essentially means in practice is that when you're building your ABR ladder, you have to define those security groups based on resolution and you have to assign different content keys to those groups. And so, your video streams up to, let's say, 720p might get encrypted with one key, then everything between 720p and 1080p gets a different key, everything above 1080p gets another key, and audio gets its own key.
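A minimal sketch of that key-group idea: resolution tiers map to separate content keys, and each key later carries its own license policy. The tier boundaries, key IDs, and policy fields here are hypothetical placeholders.

```python
# Hypothetical key groups: each resolution tier (plus audio) gets its own
# content key, so each key can carry its own playback policy at license time.
import uuid

KEY_GROUPS = {
    "SD":    {"max_height": 720,  "key_id": uuid.uuid4(), "hdcp": None,  "hw_drm": False},
    "HD":    {"max_height": 1080, "key_id": uuid.uuid4(), "hdcp": "1.x", "hw_drm": True},
    "UHD":   {"max_height": 2160, "key_id": uuid.uuid4(), "hdcp": "2.3", "hw_drm": True},
    "AUDIO": {"max_height": None, "key_id": uuid.uuid4(), "hdcp": None,  "hw_drm": False},
}

def key_group_for_video_rung(height):
    for name in ("SD", "HD", "UHD"):
        if height <= KEY_GROUPS[name]["max_height"]:
            return name, KEY_GROUPS[name]
    raise ValueError("no key group covers this resolution")

print(key_group_for_video_rung(720)[0])   # SD tier key -> relaxed policy
print(key_group_for_video_rung(2160)[0])  # UHD tier key -> hardware DRM + HDCP 2.3
```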

Wow. And so, by doing that, what you essentially accomplish is that at playback time, when the licenses are being requested by the players for each of those bitrates, because they're using different keys, you can now associate different playback policies with each key. And so, you can say, well, this SD content key, for example, has a policy that doesn't require HDCP to be enforced and doesn't require a hardware level of protection, whereas the HD group or the UHD group might require those. So that's really something that we do today in response to the way the content licensing agreements are structured. And so, in the future, that might change. My impression is

that we're actually moving in a direction of more DRM rather than less DRM. So even as recently as three, four years ago, some studios, some content owners were still allowing certain resolutions to be delivered in the clear, like SD, for example. A lot of that's going away, where now essentially it's like, look, if you're going to do DRM, you might as well do DRM across the board, because it actually makes it less complicated that way. One of the things I've also noticed is that when it comes to HDR, for example, the strictest requirements apply to all of HDR. So even though with HDR you have an encoding ladder that ranges from UHD all the way down to 360p or something, the requirements in the agreements are, well, you must use hardware-based DRM and you must use HDCP 2.3 for the whole HDR ladder. And so,

it seems that the trend in the industry is that we're actually moving towards just using DRM for everything. What's the difference between hardware and software? Hardware, is that a browser versus mobile device thing? Where is software DRM and where is hardware? So the difference is in the implementation of the DRM client itself. And so, if you basically want to get the highest security certification from either Google or Microsoft for their DRM systems, you essentially have to bake their DRM clients into the secure video path of the system. So that typically means tight coupling with the hardware decoder as well, so that essentially when you send a video stream to the decoder, once it goes past the decoder, there's no getting those bits back.

So essentially once you send it to the decoder,  at that point it's secured decoding and secured   decryption. Well, first, I guess, secure  decryption then secure decoding. Then it   goes straight to the renderer. And so, there's  no API call that you can make as an application   that says now that you've decrypted and  decoded these bits, hand them back to me.  And so, that's typically called a secure  video path or secure media path. And so,   that's what you get with a hardware-based  DRM. Software-based DRM does either some   or all of those aspects of decoding and  decryption in software and, therefore,   there's a risk that somebody could essentially  hack that path at some point and get those decoded   bits back and be able to steal the content. So if I'm watching 265 on a browser without  

hardware support, I'm likely to be limited in the resolution I can view if it's premium content, because the publisher says I don't want anything larger than 360p going to software. Exactly, yeah. Today, for example, if you're using Chrome, Widevine DRM is available, but only L3, which is the software-based implementation of Widevine. And so, oftentimes if you're using Chrome, you actually get worse video quality with some of the premium streaming services than if you're using Edge or Safari, for example, because both Safari on Mac and Edge on Windows do support hardware DRM, because they're just more tightly integrated with the operating system. And so, they're able to essentially achieve

that secure video path between the browser  and the operating system and the output.  So let's jump to the packaging, because you ...  Are you in the HLS, DASH, or CMAF camp these days?  Both. So at both Warner Brothers Discovery and  then my previous job at Hulu, we've been using   both HLS and DASH, and, interestingly enough,  actually even distributing it ... The split   between those two is almost identical.  So we use HLS for Apple devices and we   use DASH for streaming to all other devices. What's common to them is the CMAF format. And so,  

one of the things that I get a little bit annoyed about in our industry is when people refer to CMAF as a streaming protocol, and I always feel like I need to correct it and say, "No, no, it's not a streaming protocol," because CMAF is really two things. CMAF is, on one hand, a standardized version of what we frequently call fragmented MP4, the ISO Base Media File Format. What the CMAF spec did is basically just define: look, if you're going to use fMP4 in HLS and DASH, here are the boxes you need to have, and here's how common encryption gets applied to that, and so on. And so, it's really just a more buttoned-down version of what we have always called fMP4. And so, in many cases, if you have been packaging either DASH or HLS in fMP4 media segments, you're most likely already CMAF-compliant. You're already using CMAF.
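As a quick way to see the "it's just buttoned-down fMP4" point, here is a minimal sketch that walks the top-level ISO-BMFF boxes of a media segment (the filename is hypothetical); for a CMAF segment you would typically see something like styp/moof/mdat.

```python
# Minimal top-level ISO-BMFF box walker; enough to peek at a CMAF/fMP4 segment.
import struct

def top_level_boxes(path):
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            if size == 1:                      # 64-bit "largesize" follows the type
                size = struct.unpack(">Q", f.read(8))[0]
                body = size - 16
            elif size == 0:                    # box runs to the end of the file
                boxes.append(box_type.decode("ascii", "replace"))
                break
            else:
                body = size - 8
            boxes.append(box_type.decode("ascii", "replace"))
            f.seek(body, 1)                    # skip the box payload
    return boxes

print(top_level_boxes("segment_0001.m4s"))    # e.g. ['styp', 'moof', 'mdat']
```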

But the other thing about CMAF is that the spec also defines a hypothetical logical media presentation model. And so, it essentially describes something that, when you read between the lines, will sound a lot like HLS or DASH without HLS or DASH. It's really defining: here's the relationship between tracks and segments and fragments and chunks, and here's how you address all those different levels of the media presentation. And so, you can then think of HLS and DASH as really being the physical manifestations of that hypothetical presentation model. There's a really great spec that CTA authored, I think it's CTA-5005. That is the HLS-DASH interoperability spec, and it's heavily based on CMAF, using CMAF as really the unifying model and then describing how both HLS and DASH plug into CMAF and how you can describe the same concepts in both. And so, it's almost like

HLS and DASH are just programming languages that are describing the same pseudocode. I want to come back to some other topics, but one of the topics important to you is whether the CTA is the organization that's going to make it simpler for publishers to publish content and just focus on the content development and not the compatibility. Because it seems like that's a pretty compelling issue for you. I hope that CTA will make some efforts in that space. I think a lot of what they've been doing is trying to improve the interoperability in the streaming industry. And so, I think it does feel like CTA WAVE is the right arena for that. One of the issues that I think today makes deploying streaming solutions really complex and challenging is that we have a lot of different application development platforms. Just before

this call, I went and counted the number of app platforms that we have at WBD just for Max, and it's basically about a dozen to 16 different application development platforms. Now there's overlap between some of them. So Android TV and Fire TV are more or less the same thing with slight differences. But at the end of the day, you're looking at probably, at the very least, half a dozen different app development platforms. Then worst-case scenario, you're looking at upwards of 20 or so app development platforms, especially once you start considering set-top boxes made in Europe or Asia that might be HbbTV-compatible and so on.

And so, that's a lot of complexity, because the same app needs to be built over and over and over again in different programming languages, using different platform APIs. I think, as an industry, we're unique in that sense. I'm not actually aware of any industry other than streaming that needs to develop that many applications for the same thing. If you're working in any other industry, I think, if you're working in fintech or anything else, you typically have to develop three applications, a web app, an iOS app, and an Android app, and you're done. And so, it's crazy that in our industry, we have to go build over a dozen different applications.

But the practical challenge that brings when it comes to things like encoding and packaging and so on is that it's hard to know what the devices support, because there is no spec, there is no standard that essentially allows ... that specifies APIs, for example, that you could call on every different device platform and expect standardized answers. So when we talk about the media capabilities of a device, what are we talking about? We're talking about needing to know what decoders are supported for video and for audio, but also for images, for text, for timed text. We need to know what different segment formats are supported. Is it CMAF? Is it TS? What brand of CMAF? CMAF has this nice concept of brands, but nobody's really using it. In order for that concept to be useful,

you need to be able to query a device and say, well, what CMAF brands do you support? Manifest formats: there are different versions of HLS, there are different profiles of DASH, there are different DRM systems. And so, these are all things that we need to know if we want to play something back on the device and play it well. So how do we standardize the playback side? Probably one of the key steps I think we need to take is to standardize device media capabilities detection APIs. There have been some efforts in W3C of defining those types of

APIs in HTML, for example. But, again, not every platform uses HTML. And so, when it comes to Roku, when it comes to Media Foundation and other media app development platforms, we need essentially the same API, really, to be present on every platform. Then once we have standardized APIs for detecting media support, we also need to have a standardized method of signaling those capabilities to the servers, because if you want to, for example, target specific devices based on their capabilities, the next question becomes, well, how do you express that? How do you signal that to the backend? How do you take action on that? How do you do things like manifest filtering based on that? So I think there's a lot of room there for standardization.
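There is no standardized capability-reporting mechanism today, which is exactly the gap Alex is describing, so the following is purely a hypothetical sketch: clients send back a small capability report, and the backend can then answer questions like the ones that come up later in the conversation (for example, what share of active devices could handle AV1 under hardware DRM).

```python
# Hypothetical capability reports; no such standardized schema exists today.
reports = [
    {"device": "smart_tv_a", "video": ["h264", "hevc", "av1"], "hw_drm": True,  "hdr": ["hdr10"]},
    {"device": "stick_b",    "video": ["h264", "hevc"],        "hw_drm": True,  "hdr": ["hdr10", "dovi"]},
    {"device": "browser_c",  "video": ["h264", "av1"],         "hw_drm": False, "hdr": []},
]

def share(reports, codec, need_hw_drm=False, hdr_format=None):
    def ok(r):
        return (codec in r["video"]
                and (not need_hw_drm or r["hw_drm"])
                and (hdr_format is None or hdr_format in r["hdr"]))
    return sum(ok(r) for r in reports) / len(reports)

print(share(reports, "av1"))                      # devices that can decode AV1
print(share(reports, "av1", need_hw_drm=True))    # ... and only under hardware DRM
print(share(reports, "hevc", hdr_format="dovi"))  # ... HEVC with Dolby Vision
```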

And so, yeah, I'm hoping that CTA WAVE or one of the other industry organizations will take some steps in that direction. Final topic is going to be AV1, or new codec adoption. You're in charge of choosing which technologies you're going to support. When does a technology like AV1 come on your radar screen from a ... I mean you've heard of it since it was announced, obviously, but when does it come on your radar screen in terms of actually supporting it in a Warner Brothers product? The first thing I typically will look at is device adoption, because that's really, I think, the most crucial requirement: there have to be enough devices out there that we can actually deliver media to with a new codec to make it worthwhile, because there's going to be cost involved in deploying a new codec. First, cost comes from just the R&D associated with investigating a new codec, testing it, measuring quality, then optimizing your encoding settings and so on. And so,

that's both time and also either manual or automated effort that needs to be done just to understand: what is this codec? Is it good? Do I want to use it? Then if you decide you want to deploy that codec, there are going to be compute costs associated with that. There are going to be storage costs associated with that. Then in some cases there might be licensing costs as well. If you're using a proprietary encoder, maybe you're paying them, or if you're using an open source encoder, well, you still might owe some royalties just on usage. You're pretty familiar with that. I read one of

your recent blog posts. So I know that you've spent a lot of time looking at royalties and the different business models that different codecs now have. So in order to justify those costs, in order to make those costs actually worthwhile, there need to be enough devices out there that can be reached by that new codec. So really the first question is: what percentage of devices that are ... of active devices on a service are capable of using that codec? Interesting ... This goes back to that previous question that you asked, which is about device capabilities and how we basically improve those things. So without good, healthy data coming back from players, coming back from these apps, that tells us what's supported on the platforms, it's hard to plan what your next codec is that you want to deploy. Right now, for example, if I wanted to estimate

the number of AV1 decoders out there, my best resource would be to go study all the different hardware specs of all the different devices out there and figure out which ones support AV1, for example, or VVC or LCEVC, and then try to extrapolate from that data: okay, what does that mean? How do we project that onto our particular active device base? So, yeah, it's not straightforward today, but I'm hoping that if we can improve device capabilities detection and reporting, then we can also get to a point where we can just run a simple query and say, "Okay, tell me what percentage of devices that the service has seen in the last week supports AV1 decoding, and specifically maybe AV1 decoding with DRM support or AV1 decoding of HDR." And so, it's like ... There are even nuances beyond just which codec is supported. What kind of pressure do you get, if any, from your bosses or your coworkers about new codecs? Because we love to talk about them, we read about them all the time, but are people pounding on you and saying, "Where's AV1 support? Where's VVC? When's VVC?" or do they not care? Is that not part of what they're thinking about? I would say there's not a lot of pressure from leadership to support specific codecs. I think they're more interested in probably cost savings and looking at things like how do we lower CDN costs?

with a new codec, that doesn't necessarily translate into 20% of CDN cost savings, because, in some cases, if somebody's on a three-megabit connection, for example, somebody's on 4G and the most they can get is three megabits per second, you being able to lower your bitrate from 10 to six megabits per second is not really going to impact them. They're still going to be pulling the same amount of data. And so, that's why it's not a clear one-to-one mapping. But, yeah, I would say most of the demand for new codecs comes from that aspect, from that direction, rather than somebody saying, "Well, we have to support VVC because it's the latest, greatest thing out there." Generally that's not the case. If anything, I'm usually the one that's pushing for that and saying, "Well,

we really should be moving on from H.264 and moving on to the next generation of codecs, because, at some point, you do have to leave old codecs behind and slowly deprecate them as you move on to the new technology." I mean do you have a sophisticated financial analysis for doing this, or do you do the numbers on an envelope kind of thing? It's more an envelope kind of thing right now. Yeah, it would be something that would be based on, again, the number of devices supported, then comparing that to average bitrate savings, and comparing that to compute costs and potentially licensing costs associated with it. So, yeah, it is a back-of-the-napkin calculation at this point, but I think the ... The factors are well-known. It's really coming up with the data that feeds into those different variables.
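For what it's worth, the back-of-the-napkin calculation Alex describes might look something like the sketch below; every number is a made-up placeholder, and the savings_realized factor reflects the bandwidth-limited-viewer caveat from a moment ago.

```python
# Illustrative cost/benefit sketch for adopting a new codec; all placeholders.
def new_codec_net_benefit(
    reachable_share=0.40,     # active devices that can decode the new codec
    bitrate_savings=0.20,     # average bitrate reduction vs. the incumbent codec
    savings_realized=0.70,    # reachable sessions not already capped by their connection
    cdn_spend=10_000_000,     # $/year CDN spend on the affected catalog
    extra_compute=400_000,    # $/year added encoding compute and storage
    licensing=0,              # $/year royalties or encoder licensing, if any
):
    cdn_saved = cdn_spend * reachable_share * bitrate_savings * savings_realized
    return cdn_saved - extra_compute - licensing

print(f"estimated net benefit: ${new_codec_net_benefit():,.0f}/year")
```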

A couple of questions. What about LCEVC? Are you doing enough live, or is that even a live versus VOD kind of decision? With LCEVC, I don't think it's even a live versus VOD decision. I think what's interesting with that codec is that it's an enhancement codec. It's a codec that really piggybacks on top of other codecs

and provides better resolution, better dynamic range, for example, at bitrates that would typically be associated with lower resolutions, narrower dynamic ranges. And so, the way LCEVC works is that there's a pre-processor part of it that essentially captures the detail that is otherwise lost when the video is scaled down. So you can start with a 1080p video, scale it down to, let's say, 540p, encode it as 540p, and then the LCEVC decoder on the other end can take some of that sideband data and attempt to reconstruct the full 1080p source signal. And so, that concept works

the same whether the baseline codec that you're using is H.264 or H.265 or VVC or AV1. And so, I think that's what's interesting about that codec: it can always let you be a step ahead of whatever the latest generation of codecs is providing. Then the other nice thing about it is that there's a backwards compatibility option there, because if a decoder doesn't recognize the sideband data that is specific to LCEVC decoding, it'll just decode your base signal, which might be half resolution or quarter resolution. So I think it can be very applicable in ABR, because typically you have a lot of different resolutions in your ladder. So it's like, if you could potentially deliver that 360p resolution in your ladder at 720p, for example, to an LCEVC decoder, then why not?
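As a toy illustration of the enhancement-layer idea (this is not LCEVC itself, just the downscale/residual/reconstruct concept, using a synthetic frame and numpy):

```python
# Toy enhancement-layer demo: base = downscaled frame, residual = the detail
# the base lost, reconstruction = base upscaled + residual. Not real LCEVC.
import numpy as np

rng = np.random.default_rng(0)
full = rng.integers(0, 256, size=(1080, 1920)).astype(np.float32)  # stand-in 1080p frame

base = full.reshape(540, 2, 960, 2).mean(axis=(1, 3))        # 2x2 downscale: the "540p base"
upscaled = np.kron(base, np.ones((2, 2), dtype=np.float32))  # naive upscale back to 1080p
residual = full - upscaled                                   # "sideband" detail carried alongside

reconstructed = upscaled + residual                          # enhancement-aware decoder
print(np.allclose(reconstructed, full))                      # True: full detail restored

# A decoder that ignores the residual simply shows `upscaled`, i.e. the
# backwards-compatible base signal at half resolution.
```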

Well, we've got a technical question here. Are you able to deliver one CMAF package using one DRM, or do you have to have different packages for Apple and the rest of the delivery platforms? Yeah, that's a great question. So right now what we do is encrypt every CMAF segment twice, once with the CBCS encryption mode and once with the CTR (CENC) encryption mode. And so, the CBCS-encrypted segments are the ones that we deliver to the HLS, to FairPlay devices. Then at the moment, the CTR segments are the ones that we package with DASH, and those are used with both PlayReady and Widevine. That said, both Widevine and PlayReady introduced support for CBCS a while ago. It's actually been, I think, probably over five years at this point.
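A sketch of that dual-packaging step using Shaka Packager-style options; the flag spellings are from memory and the key values are placeholders, so treat it as illustrative rather than a copy-paste recipe.

```python
# Illustrative dual packaging: CBCS output for HLS/FairPlay, CENC (AES-CTR)
# output for DASH with Widevine/PlayReady. Keys and filenames are placeholders.
import subprocess

KEY_ID = "11111111111111111111111111111111"   # placeholder 16-byte hex values
KEY = "22222222222222222222222222222222"

def package(scheme, out_dir, manifest_flags):
    cmd = [
        "packager",
        f"in=video_1080p.mp4,stream=video,output={out_dir}/video_1080p.mp4",
        f"in=audio.mp4,stream=audio,output={out_dir}/audio.mp4",
        "--enable_raw_key_encryption",
        f"--keys=label=:key_id={KEY_ID}:key={KEY}",
        f"--protection_scheme={scheme}",
        *manifest_flags,
    ]
    subprocess.run(cmd, check=True)

package("cbcs", "hls_out", ["--hls_master_playlist_output=hls_out/master.m3u8"])
package("cenc", "dash_out", ["--mpd_output=dash_out/manifest.mpd"])
```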

And so, theoretically, we could deliver those CBCS-encrypted segments to all three DRM systems and it would work. The challenge at the moment is that not all devices that are Widevine or PlayReady clients have been updated to the latest version of PlayReady or Widevine, because in a lot of cases these are hardware implementations. And so, without firmware updates from the device manufacturer, they're never going to be up to date with the latest DRM client. And so, we're waiting to see when those

last CTR-only Widevine and PlayReady clients are going to be deprecated and slowly move out of the lifecycle. Once the vast majority of the PlayReady and Widevine clients out there are CBCS-compatible, that opens up the path to delivering CBCS-encrypted segments everywhere. Final question, AV1 this year or not? What do you think? I think probably not this year, I would say. I mean I think we might do some experimentation, just some research into encoder quality and optimization this year with AV1. But I wouldn't expect deployment of AV1 this year, not because of a lack of support, because I think the support is really starting to be there in significant numbers. I think the latest Samsung or LG TVs, for example, now include AV1 decoders as well.

Yeah, yeah. And so, that's always,   I think ... Often people will look at mobile as  being the indicator of codec adoption, especially   Apple. People will be like, "Okay. Well, if Apple  will adopt it in iOS, then clearly it's here."  But when it comes to premium streaming services,  so whether it's Max or Hulu or Amazon Prime or   Netflix, most of that content is watched in living  rooms. And so, really the devices to watch are  

smart TVs and connected streaming sticks. So once  those devices have support for a particular codec,   then, in my opinion, that's really the big  indicator that, yeah, it might be ready.  We're running over, but this is a question  I need the answer on. But what's the HDR  

picture for AV1 and how clear does  that have to be? Because it seems   like there's a bunch of TV sets out there  that we know play Dolby Vision and HDR10+   with HEVC. Do we have the same certainty that  an AV1-compatible TV set will play AV1 in HDR?  I don't think that certainty is there yet.  I do need to do some more research into that   particular topic because I've been curious about  the same thing. So I think some standardization  

efforts have been made. I can't remember off the top of my head if it's CTA or some other- No. HDR10+ is now a standard for AV1. I just don't know if TVs out there will automatically support it. Right, yeah. Then if it doesn't automatically work for you, you've got to make sure, you've got to test.

from a technical perspective that should be  stopping somebody from using AV1 or VVC or   any other new codec with HDR, because there's  nothing specific to the codec that HDR needs.  And so, it's really just a matter of  standardization, a matter of companies   implementing that standard. So, yeah, I'm with  you on this one in that it is one of those where,   yeah, it should work, but until it's been tested  and it's been tested on many different devices,   it's not a real thing, right? Listen, we are way out of time. Alex,  

I don't think we've ever done this for an hour, but it's great. I really appreciate you spending time with us, being so open and honest about how you're producing your video, because I think that helps everybody. Thanks. This has been great. Absolutely. Thank you so much for having me. Yeah, this has been really great. I feel like we could probably keep talking for another hour or two, and I think we'd still have plenty of topics to discuss. Yeah. I was taking some notes while we were doing this, and, yeah, I think I have notes for another hour at least.

Okay. We'll talk to Anita about that. I'll see  you at IBC? You're going to go to that show?  Yeah, I think I'll be at IBC, so  I most likely will see you there.  Cool. Take care, Alex. Thanks a lot. All right. Thanks so much.
