Breaking Analysis: Uber’s architecture represents the future of data apps…meet its architects


From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante.

Uber has one of the most amazing business models ever created. The company's mission is underpinned by technology that helps people go anywhere and get anything, and the results have been stunning: in just over a decade Uber has become a firm with more than 30 billion dollars in annual sales and a market capitalization of nearly 90 billion as of today. Moreover, the company's productivity metrics, when you measure things like revenue per employee, are three to five times greater than what you'd expect to find in a typical technology company. In our view, Uber's technology stack represents the future of enterprise data apps, where organizations will essentially create real-time digital twins of their businesses and, in doing so, deliver enormous customer value.

Hello and welcome to this week's Wikibon Cube Insights, powered by ETR. In this Breaking Analysis, Cube analyst George Gilbert and I will introduce you to one of the architects behind Uber's groundbreaking fulfillment platform. We're going to explore their objectives, the challenges they had to overcome, and how Uber has done it, and we believe the company is a harbinger for the future of technology. The technical team behind Uber's fulfillment platform went on a two-year journey to create what we see as the future of data apps, and it's our distinct pleasure to welcome to the program Uday Kiran Medisetty, a distinguished engineer at Uber who has led, bootstrapped, and scaled major real-time platform initiatives in his time at Uber, and who has agreed to share how the team actually accomplished this impressive feat of software and networking engineering. Uday, welcome to the program, it's great to see you.

Hi George, hi Dave, super nice to be here. I joined Uber back in 2015, when we were primarily doing on-demand UberX and we were primarily in North America, and over the last eight years I have witnessed Uber's tremendous growth: how we have expanded from on-demand mobility to all kinds of personal mobility, and from mobility to all kinds of delivery. The mission you just mentioned, go anywhere and get anything, has an insane total addressable market around the world, and that's what drives us here and what has kept me here with the same energy even after eight years. I work on the core mobility business and a bunch of foundational business platforms that are leveraged across mobility and delivery, and I also lead an Uber-wide senior engineering community where we set best practices so that we can move at the same pace across all of the engineering teams at Uber. So that's my quick intro.

I remember the first time I ever used the Uber app. I was stuck in the hinterlands outside of Milan, couldn't get a cab, and I said I'm going to try this Uber thing. This was the early part of last decade, and it was like my ChatGPT moment. Back in March, and just last week as well, George and I introduced to the audience this idea of Uber as the future of enterprise data apps, and we put forth the premise that the future of digital business is going to manifest itself as a digital twin that represents people, places, and things, and that increasingly business logic is going to be embedded into data
versus the way it works today and applications are going to be built from this set of coherent data elements so when we go back and look at the progression of Enterprise apps throughout history it we think it's useful to share where we think we are on this journey so George put together this graphic to describe the history in simple terms starting with 1.0 which was departments and back office Automation and then in the Orange is the sort of Erp movement where a company like Ford for example could integrate all its financials and supply chain and all its internal resources into a coherent set of data and activities that really drove productivity kind of in the 90s and then web2o for the Enterprise so here we're talking about using data and machine intelligence and a custom platform to manage an internal value chain and where we're using modern techniques that we use the the here the example of amazon.com not AWS but the retail side of the operation and then in the blue we show Enterprise ecosystem apps this is where we place Uber today really one of the first if not the first to build a custom platform to manage an external ecosystem different of course from the gaming industry that we show there on the right hand side and our fundamental premise is that what Uber has built and we're going to get into this because Uber is on its own Journey even within that blue ellipse but our premise is that eventually mainstream companies are going to want to use AI to orchestrate an Uber like ecosystem experience using packaged off-the-shelf software and services and so you see most organizations they don't have a team of udays they can't afford it they can't attract the talent so we think this is where the industry is headed and Uber is a Harbinger example and George you have a burning question for Uday so go ahead it's a big picture question but it has to do with like helping people like understand not just the the consumer experience of the app but the the architecture of an application that is trying to orchestrate an ecosystem and how different that is from where we've been which is these packaged apps that manage repeatable processes that were you know pretty much almost the same across different businesses with maybe room for customization it's so radical and and we are so accustomed to living in it out here in Tech bubble land but tell us you know help us understand um sort of big picture what a big transformation that is from the the applications point of view yeah so one of the fascinating things about building any platforms for Ubers how we need to interconnect what's happening in real world and build large-scale real-time applications that can orchestrate all of this at scale you know like um there is a real person waiting in the real world to get a response from our application whether they can continue with the next step or not um if you think about our scale like you know last FIFA World Cup we had 1.6 million con concurrent consumers interacting with our platform at that point in time this includes Riders eaters Merchants drivers couriers and all all of this uh different entities they are trying to do things in real world and our applications has to be real time they need to be consistent uh they need to be performant and we need to be and on above all of this we need to be cost effective at scale because if we are not performing if you are not leveraging the right set of resources then we can explore that overall cost of managing the infrastructure so these are all some unique 
challenges in building Uber like application um and we can go into more details on various aspects and both its breath and also in depth right yeah so Uday I mean this Vision that you sort of laid out it requires an incredible amount of data to be available as you said in in real time or near real time uh uday's Team a couple of key blogs that we'll put into the show notes uh I mean I've probably got seven hours into them and I'm still like going back and trying to squint through them so I really appreciate your sort of up leveling it here and helping our audience understand it but what was it about the the earlier 2014 architecture you described this in one of your blogs that limited the realization of your mission at scale and catalyze this architectural rewrite and we're particularly interested in the trade-off that you had to make that you've talked about in your your paper your blog to optimize for availability over consistency why was that problematic and let's talk about how you solved that yeah you know if you think about um back in 2014 and what was the um what was the most production ready databases that were available at that point um we could not have used at that point in time traditional SQL like systems because of the scale that we had even at that point in time and the only option we had which provided us some sort of scalable real-time databases was nosql kind of systems um so our app so we were leveraging uh Cassandra uh uh and the entire application that drives the state of the online order state of the driver sessions all of the jobs all of the waypoints all of that has been was stored uh on in Cassandra and over the last eight years we have seen you know the kind of fulfillment use cases that we need to build that has changed a lot so uh whatever assumptions that we have made in our core data models and what kind of entities we can interact it has completely changed so we had to if not anything else change our application just for that reason the second because the entire application was designed with availability as the main requirement and latency was more of a best effort and consistency was more of a best effort mechanism whenever things went wrong it it's made it really hard to debug for example like we don't want a scenario where if you request a right two drivers show up at your pickup point because the system could not reconcile whether this trip was already assigned to a particular driver or it wasn't assigned to anyone and those were real problems that would happen if we don't have a consistent system and um so the three prob three main areas of problems at the infrastructure layer at that point one is consistency that I mentioned uh already and because we didn't have any atomicity we had to make sure the system automatically reconciles and patches the data when things go out of sync based on what we expect the data to be um there was a lot of scalability issues um because we were getting to a best effort consistency we were using at the application layer some sort of hash ring and what we would do is oh let's get all of the updates for a given user routed to a same instance and have a queue in that instance so that even if a database is not providing consistency we have a queue of updates so we make sure there's only one update at any point in time that works when you have updates only in two entities so then at least you can do application Level orchestration to ensure you know they might eventually get in sync but it doesn't scale beyond that and because 
we were using a hash ring, we could not scale our cluster beyond a vertical limit, and that also added to our scale challenges: especially for the large cities we wanted to handle, we couldn't go beyond a certain scale. So these were the key infrastructure problems we had to fix so that we could set ourselves up for the next decade or two.

Yeah, makes sense. So when the last update wins, it may not be the most accurate update. All right, and George, when you and I were talking about this, you said, Dave, it might not just be scale, it was Uber thinking about the future. Elaborate on that, George.

So Uday, what I wanted to know was, you had to think about a platform more broadly than just drivers and riders, because you had new verticals, new businesses that you wanted to support. The application layer manages things, the database generally manages strings, but the new capabilities in the database allowed you, as you were describing, to think about consistency and latency differently. Can you talk about how you generalized the platform to support new businesses?

That's a great question. One of the things we had to make sure of was that, as the kinds of entities in our system change and as we have to build new fulfillment flows, we build a modular and leverageable system at the application level. At the end of the day we want the engineers building core applications and core fulfillment flows abstracted away from all of the underlying complexities around infrastructure, scale, provisioning, latency, and consistency. They should get all of this for free and not need to think about it; when they build something they get the right experience out of the box.

So what we had to do was, at our programming layer, build a modular architecture where every entity, say an order representation, a merchant, a user, or an organization representation, is stored as an individual table, and the relationships between these objects are stored in another table. Whenever new objects get into the system and whenever we need to introduce new relationships, they are stored transactionally within our system. We use the core database as, you can think of it, a transactional key-value store. At the database layer we still only store the key columns that we need, and the rest of the data is stored as a serialized blob, so that we don't have to continuously update the database schema any time we add new attributes for a merchant or a user, and we avoid that operational overhead. But at a high level every object is a table, every relationship is a row in another table, and whenever new objects or relationships get introduced they are transactionally committed.

Dave, I just want to add that what's interesting is he just described an implementation of a semantic layer in the database. Right, we've been talking about this for months, George, and about its importance, and I want to come back to that. Let's help the audience understand at a high level the critical aspects and principles of the new architecture. We're showing here a chart from Uber Engineering, from one of the blogs, and we want to understand how your approach differs from the previous architecture.
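To make the data model Uday just described concrete, here is a minimal sketch of "every object is a table, every relationship is a row in another table, and non-key attributes live in a serialized blob," expressed as Spanner-style DDL embedded in Go. The table and column names are illustrative assumptions, not Uber's actual schema.

```go
// Package fulfillment holds illustrative sketches only; names are assumptions.
package fulfillment

// Each entity type gets its own table keyed by UUID. Only the columns the
// platform must filter or constrain on are first-class; everything else is
// packed into a serialized payload blob.
const orderTableDDL = `
CREATE TABLE orders (
  order_uuid   STRING(36) NOT NULL,
  status       STRING(32) NOT NULL,
  payload      BYTES(MAX),
  updated_at   TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (order_uuid)`

const jobTableDDL = `
CREATE TABLE jobs (
  job_uuid     STRING(36) NOT NULL,
  status       STRING(32) NOT NULL,
  payload      BYTES(MAX),
  updated_at   TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (job_uuid)`

// Relationships between entities are rows in their own table, so new object
// types and new associations can be introduced without schema surgery.
const relationshipTableDDL = `
CREATE TABLE entity_relationships (
  from_uuid    STRING(36) NOT NULL,
  to_uuid      STRING(36) NOT NULL,
  relation     STRING(64) NOT NULL,
  payload      BYTES(MAX),
  updated_at   TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (from_uuid, relation, to_uuid)`
```

Because everything beyond the key columns lives in the payload blob, adding a new attribute to a merchant or an order becomes an application-level change rather than a schema migration, which is exactly the operational overhead Uday says they wanted to avoid.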
You've touched on some of that already. The way we understand this is: the green is the application layer, which is sort of intermixed on the left-hand side, and on the right-hand side you've separated the application services at the top from the data management below, and that's where Spanner comes in. So how should we understand this new architecture in terms of how it's different from the previous one?

In the previous architecture, we went through some of the details: the core data was stored in Cassandra, and because we wanted low-latency reads we had a Redis cache as a backup, for whenever Cassandra fails or whenever we want low-latency reads, and we went through Ringpop, which is application-layer shard management, so that requests get routed to the instance we need. There was one pattern I didn't mention, which was the Saga pattern, from a paper a few decades ago. Ultimately there was a point in time where the kinds of transactions we had to build evolved beyond just two objects. Imagine we want a concept of a batch offer, which means a single driver should accept multiple trips at the same time, or not. Now you don't have a one-to-one association: you have a single driver, maybe two, four, five trips, and some other object establishing this association. If we need to create a transaction across all of these objects, we tried using Saga as a pattern, extending our application-layer transaction coordination, but it became even more complex, because if things go wrong we also have to write compensating actions so that the system is always in a state where it can proceed; we don't want users to get stuck and not get new trips.

So in the new architecture, the key foundations we mentioned were strong consistency and linear scalability, which the NewSQL kind of databases provide. We went through an exhaustive evaluation in 2018 across the multiple choices we had, and at that point in time we picked Spanner as the option. We move all of the transaction coordination and scalability concerns to the database layer, and at the application layer we focus on building the right programming model for building new fulfillment flows. The core transactional data is stored in Spanner. We limit the number of RPCs we make from our on-prem data centers to Google Cloud, because it's a latency-sensitive operation and we don't want a lot of chatter between these two worlds. And we have an on-prem cache which will still provide point-in-time snapshot reads across multiple entities, so that they are consistent with each other. For most use cases reads come from the cache, and Spanner is only used if I want strong reads for a particular object. If I want cached reads across multiple objects I go to my cache; if I want to search across multiple objects we have our own search system, indexed on the specific properties we need, so that if I want all of the nearby orders that are currently not assigned to anyone, we can do that low-latency search at scale. And obviously we also emit Kafka events within the Uber stack, so we can build all sorts of near-real-time or roll-up applications, and these also land in raw tables so you can build more derived tables using Spark jobs.
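To illustrate what moving transaction coordination into the database can look like, here is a rough sketch using the Go client for Cloud Spanner: one user action touches several entities, each update is buffered, and everything commits atomically. The function, table names, and status values are assumptions invented for this sketch, building on the illustrative schema above; they are not Uber's code. Uday describes the actual mechanism, server-side transaction buffering behind a modular application layer, in more detail below.

```go
package fulfillment

import (
	"context"

	"cloud.google.com/go/spanner"
)

// AcceptOffer sketches a single user action ("driver accepts a batch offer")
// that must update several entities atomically: the offer, the driver session,
// and every job in the batch. All writes are buffered inside one read-write
// transaction and only become visible together at commit time.
func AcceptOffer(ctx context.Context, client *spanner.Client, offerID, driverID string, jobIDs []string) error {
	_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
		muts := []*spanner.Mutation{
			spanner.Update("offers", []string{"offer_uuid", "status"}, []interface{}{offerID, "ACCEPTED"}),
			spanner.Update("driver_sessions", []string{"driver_uuid", "status"}, []interface{}{driverID, "EN_ROUTE"}),
		}
		for _, jobID := range jobIDs {
			muts = append(muts,
				spanner.Update("jobs", []string{"job_uuid", "status"}, []interface{}{jobID, "ASSIGNED"}),
				spanner.InsertOrUpdate("entity_relationships",
					[]string{"from_uuid", "relation", "to_uuid"},
					[]interface{}{jobID, "assigned_to", driverID}),
			)
		}
		// Writes are buffered on the server; nothing is visible until commit.
		return txn.BufferWrite(muts)
	})
	return err
}
```

Because Spanner provides external consistency, any strong read issued after this commit sees the assignment, which is what rules out the "two drivers show up at your pickup point" class of bug Uday described in the old architecture.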
But all of these things are happening within Uber's infrastructure, and we use Spanner for strong reads and the core transactions that we want to commit across all of the entities, and for establishing those relationships that I mentioned.

All right, so George, coming back to the premise: this is how you've taken these business entities, Uday, the drivers, riders, routes, ETAs, orders, and reconciled the trade-offs between latency, availability, and consistency. Would it be fair to say, Uday, that because you did such a good job matching the things in the application to the things in the database, you were able to inherit the transactional strengths of the database at both layers, at the database level and to simplify that coordination at the application level, and that you also did something that people talk about but don't do much, which is a deep hybrid architecture, where part of the application is on-prem and part uses a Google service that you couldn't get elsewhere, on Google Cloud?

Yeah, absolutely. And I think one more interesting fact is that most engineers don't even need to understand that behind the scenes it's being powered by Spanner or any particular database. The guarantee we provide to application developers who are building fulfillment flows is: they have a set of entities, and they say, hey, for this user action these are the entities that need to be transactionally consistent and these are the updates I want to make to them. Behind the scenes our application layer leverages Spanner's transaction buffering, makes the updates to each and every entity, and once all the updates are made we commit, so all the updates are reflected in storage and the next strong read will see the latest update.

So the database decision obviously is very important, and we're curious what it was about Spanner that led you to that choice. It's a globally consistent database; what about it made it easier for all the application's data elements to share their status? You said you did a detailed evaluation; how did you land on Spanner?

Any choice like this has a lot of dimensions to evaluate, but one is that we wanted to build using a NewSQL database, because we want the mix of the ACID guarantees that SQL systems provide and the horizontal scalability that NoSQL systems provide. And for building large-scale applications on NewSQL databases, at least around the time we started, there weren't that many examples to choose from; even within Uber we were kind of the first application managing live orders on a NewSQL-based system. But the specific properties we need are, first, external consistency, which I mentioned: Spanner provides the strictest concurrency-control guarantee for transactions, so that when transactions are committed in a certain order, any read after that sees the latest data. That is very important, because imagine we assign a particular job to a specific driver or courier, and the next moment we see that this driver is not assigned to anyone; we might make a wrong business decision and assign one more trip, and that will lead to wrong outcomes. And then, second, horizontal scalability: Spanner automatically shards and rebalances,
so we have this horizontal scalability. In fact we have our own autoscaler that listens to our load and other standard signals and constantly adds and removes nodes, because Uber's traffic pattern changes based on time of day, hour of the day, and day of the week; it's very curvy, so we make sure we have the right number of nodes provisioned to handle the scale at that point in time.

I've mentioned the server-side transaction buffering. That was very important for us so that we can have a modular application: each entity that I'm representing can commit its update independently, a layer above coordinates across all of these entities, and once all of the entities have updated their part we commit the overall transaction. So transaction buffering on the server side helped us make the application layer modular. And then all the things around stale reads, point-in-time reads, and bounded-staleness reads help us build the right caching layer, so for most reads the cache hit rate is probably in the high 60s to 70s percent. For most reads we can go to our on-prem cache, and only when there's a cache miss, or for strong reads, do we go to the storage system. So these were the key things we wanted from NewSQL, and Spanner was the one, partly because of time to market: it was already production-ready and we could leverage it. But all of these interactions are behind an ORM layer with the specific guarantees that we need, which will help us over time figure out whether we need to evaluate other options; right now most developers don't need to understand what is powering things behind the scenes.

The outcome for your customers is pretty remarkable. George and I were really interested, and George was alluding to this before, in the aspects of the system that enable this coherency across all these data elements the system has to manage, in other words your ability to get agreement on the meaning of a driver, or a ride, or a price, et cetera. That layer you designed to enable that coherence is tech you had to develop, correct?

Yeah, absolutely. There are many objects, and we need to really think about which attributes of what a user sees in the app need to be coherent and which can be kind of stale without you necessarily noticing, because not everything needs the same guarantees or the same latency. If you think about some of the attributes we manage, we talked about the concept of orders. If a consumer places an intent, that is an order within the system, and a single intent might require us to decompose it into multiple sub-objects. For example, if you place an Uber Eats order, there is one job for the restaurant to prepare the food and one job object for the courier to pick up and drop off, and within the courier job object we have many waypoints, the pickup waypoint and the drop-off waypoint, and each waypoint can have its own set of tasks to perform, for example taking a signature, taking a photo, paying at the store, all sorts of tasks. All of these are composable and leverageable.
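A rough sketch of the read path Uday describes, where most reads are served from an on-prem cache or a bounded-staleness snapshot and only strong reads hit the primary path, might look like the following. The cache interface, the table name, and the 10-second staleness bound are assumptions made for illustration only.

```go
package fulfillment

import (
	"context"
	"time"

	"cloud.google.com/go/spanner"
)

// Cache is a stand-in for an on-prem cache; this interface is an assumption
// made for the sketch, not a real Uber component.
type Cache interface {
	Get(key string) ([]byte, bool)
	Set(key string, val []byte, ttl time.Duration)
}

// ReadOrderPayload sketches the read path: strong reads go straight to Spanner,
// everything else is served from the cache when possible and otherwise from a
// bounded-staleness snapshot read, which is cheaper and replica-friendly.
func ReadOrderPayload(ctx context.Context, db *spanner.Client, cache Cache, orderID string, strong bool) ([]byte, error) {
	if !strong {
		if payload, ok := cache.Get(orderID); ok {
			return payload, nil // cache hit: the common case (high 60s-70s percent)
		}
	}
	ro := db.Single()
	if !strong {
		// Tolerate data up to 10s old; the exact bound is an illustrative choice.
		ro = ro.WithTimestampBound(spanner.MaxStaleness(10 * time.Second))
	}
	row, err := ro.ReadRow(ctx, "orders", spanner.Key{orderID}, []string{"payload"})
	if err != nil {
		return nil, err
	}
	var payload []byte
	if err := row.Columns(&payload); err != nil {
		return nil, err
	}
	if !strong {
		cache.Set(orderID, payload, 10*time.Second)
	}
	return payload, nil
}
```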
So I can build new things using the same set of objects. And in any kind of marketplace we have supply and demand, and we need to ensure there are the right kinds of dispatching and matching paradigms. In some cases we offer one job to one supply, in some cases it could be m-to-n, in some cases it's blasted to many supplies, and in some cases they might see some other surface showing all of the nearby jobs they could potentially handle. So this is another set of objects which is super real time, because when a driver sees an offer card in the app it goes away in 30 seconds; they have 30 to 40 seconds to make a decision, and based on that we have to figure out the next step, because within Uber's application we have changed users' expectations of how quickly we can perform things; if we are off by a few seconds people will start canceling.

Then, Uber is hyper-local, so we have a lot of attributes around latitude, longitude, route line, the driver's current location, and ETAs. These are probably some of the hardest to get right, because we constantly ingest the current driver location every four seconds; the throughput of that system by itself is in the hundreds of thousands of updates per second. But not every update requires us to change the ETA; your ETA is not changing every four seconds, and your route line is not changing every four seconds. So we do some magic behind the scenes: have you crossed a city boundary, only then might we require you to update something; have you crossed some product boundary, only then do we require some action. We do those inferences to limit the number of updates we make to the core transactional system, and we only store the data that we need. Then there's a completely parallel system that manages the whole pipeline of how we receive things on the driver side of the equation and generate navigation and so on for drivers, and how we convert those updates and show them on the rider app; that stream is completely decoupled from the core orders and jobs.

And if you think about the Uber system, it's not just about building the business platform layer. We have a lot of our own sync infrastructure at the edge API layer, because we need to make sure all of the applications' data is kept in sync; they are going through choppy network conditions, they might be unreliable, and we need to make sure they get updates as quickly as possible, with low latency, irrespective of what kind of network condition they are in. So there are a lot of engineering challenges at that layer as well. Ultimately all of this works together to give you the visibility that, hey, I can see exactly what's going on, because if you're waiting for your driver and they don't move, you might cancel, assuming they might not show up, and we need to make sure those updates flow not just through our system but also from our system back to the rider app as quickly as possible.
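The location-update throttling Uday alludes to, ingesting a GPS ping every few seconds but only propagating the ones that matter to the core transactional system, can be sketched roughly as below. A real system would infer city and product boundary crossings and re-run ETA logic; the simple distance threshold here is purely a stand-in.

```go
package fulfillment

import "math"

// LocationUpdate is a single driver GPS ping (arriving roughly every 4 seconds).
type LocationUpdate struct {
	DriverID string
	Lat, Lng float64
}

// Throttle decides whether a ping is interesting enough to propagate to the
// core transactional system. The 200 m threshold is illustrative only.
type Throttle struct {
	last map[string]LocationUpdate
}

func NewThrottle() *Throttle { return &Throttle{last: map[string]LocationUpdate{}} }

func (t *Throttle) ShouldPropagate(u LocationUpdate) bool {
	prev, seen := t.last[u.DriverID]
	if seen && haversineMeters(prev.Lat, prev.Lng, u.Lat, u.Lng) < 200 {
		return false // small movement: keep it out of the core system
	}
	t.last[u.DriverID] = u
	return true
}

// haversineMeters returns the great-circle distance between two points in meters.
func haversineMeters(lat1, lng1, lat2, lng2 float64) float64 {
	const r = 6371000 // earth radius in meters
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat, dLng := toRad(lat2-lat1), toRad(lng2-lng1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLng/2)*math.Sin(dLng/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}
```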
So hopefully that helps frame it. George, you had a question?

Yeah, this is something new; we're in new territory, at least relative to what we've explored before, Dave. What I'm taking away is that you're not just managing this layer at the application, where you've got Uber's entities or things, and translating that down to the database and its transactional semantics to make those things easier to manage and orchestrate. What you're describing is something where the data's liveliness is an attribute, so that managing it is separate from just mapping it down to the database: you manage how it gets updated and how it gets communicated separately, based on properties that are specific to each data element, and by data element I mean a property, not a driver or a courier. And that is interesting because, Dave, just as a comment, Walmart talked about prioritizing data for communications from stores and the edge, right? That leads into a follow-on question; sorry for the long preamble, but the question I have, Uday, is: what happens when you are orchestrating an ecosystem with 10 or 100 times as many things as you have now, and more data on all those things than you have now? Have you thought about what a world looks like where the centralized database may not be the central foundation?

See, I think that's where the trade-offs come in. We need to be really careful about not putting so much data in the core system that manages these entities and relationships that we end up hitting scale bottlenecks. For example, the fare item that you see on the rider app or on the driver app is made up of hundreds of line items, with different business rules specific to different geos, different localities, different tax items. We don't store all of that in the core object. One attribute of a fare that we can leverage is that a fare only changes if the core properties of the rider's object, the rider's requirements, change. So every time you change your drop-off, we regenerate the fare. I have one fare UUID, and every time we regenerate we create a new version of that fare and store those two IDs along with my core order object, so that I can store, in a completely different system, the fare UUID, the fare version, and all of the data: all of the line items and all of the context we used to generate them. What we need to save transactionally when we save the order is the fare UUID and version; we don't need to save all of the fare attributes along with it. These are some of the design choices we make to limit the amount of data we store for these entities. In some cases we might store the data, in some cases we might version the data and store the version alongside, and in some cases, if it's okay for the data to be a little stale and it doesn't need to be coherent with the core orders and jobs, it can be saved in a completely different online storage system, and then at the presentation layer, where we generate the UI screen, we can enrich this data and generate the screen we need. All of this makes sure we limit the growth of the core transactional system and leverage other systems that are better suited to the specific needs of those data attributes, but all of them still tie into the order object through an association that we maintain.
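Here is a small sketch of the fare-versioning design choice Uday describes: the core order stores only a fare UUID and version transactionally, while the full line-item breakdown lives in a separate store and is regenerated only when the rider's requirements change. All of the names below are hypothetical.

```go
package fulfillment

// FareRef is all the core order object stores about a fare: a pointer, not the data.
type FareRef struct {
	FareUUID string
	Version  int
}

// Fare holds the full breakdown (taxes, tolls, geo-specific line items) and lives
// in a separate store sized for that data, not in the order row.
type Fare struct {
	FareUUID  string
	Version   int
	LineItems map[string]int64 // line item name -> amount in minor currency units
}

// FareStore is an assumed interface over whatever system holds fare details.
type FareStore interface {
	Save(f Fare) error
	Load(ref FareRef) (Fare, error)
}

// Reprice creates a new fare version when the rider's requirements change
// (for example, the drop-off moved) and returns the small reference that is
// committed transactionally alongside the order.
func Reprice(store FareStore, prev FareRef, lineItems map[string]int64) (FareRef, error) {
	next := Fare{FareUUID: prev.FareUUID, Version: prev.Version + 1, LineItems: lineItems}
	if err := store.Save(next); err != nil {
		return FareRef{}, err
	}
	return FareRef{FareUUID: next.FareUUID, Version: next.Version}, nil
}
```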
So this is really important, and we're actually going to revisit it as a guide to the future, but I just want to take a pause and reset here, because hopefully the audience understands that what Uber has built is different, of course, from conventional apps. We tried to put this together in a slide describing these 3.0 apps; Alex, bring up the next one. Starting at the bottom you have the platform resources, then the data layer that provides that single version of the truth, and then the application services that govern and orchestrate the digital representations of the real-world entities, the drivers, riders, packages, et cetera, and all of that supports what the customer sees in the Uber app. The big difference from the cloud stack that we all know and love is that Uber's not selling us compute or storage, we don't even see that; rather, Uber is offering up things: access to drivers and merchants and services. So where are the lines between the commercial off-the-shelf software you were able to use versus the IP that Uber had to develop itself to achieve these objectives? Can you describe that thinking and what went into build versus buy?

In general we rely on a lot of open source technologies, commercial off-the-shelf software, and in some cases in-house developed solutions. Ultimately it depends on the specific use case, time to market, whether you want to optimize for cost or for maintainability; all of these factors come into the picture. For the core orders and the core fulfillment system, we talked about Spanner and how we leverage it with some specific guarantees. We use Spanner even for our identity use cases, where, especially in large organizations, you want to make sure your business rules, your AD groups, and how we capture that for our consumers stay in sync. But there are a lot of other services across Uber's microservices that leverage Cassandra if their use case is high write throughput, and we leverage Redis for all kinds of caching needs, we leverage etcd and ZooKeeper for low-level infrastructure and platform storage needs, and we also have a system built on top of MySQL with a Raft-based algorithm, called Docstore. For the majority of use cases that is our go-to solution: it provides shard-local transactions, it's a multi-model database, so it's useful for most kinds of use cases, and it's optimized for cost because we manage the stateful layer ourselves and deploy it on our own nodes. For most applications that gives us the right balance of cost and efficiency; for applications with the strongest requirements, like fulfillment or identity, we use Spanner, and for high write throughput we use Cassandra.

Beyond this, if you think about our metrics system, M3DB is open source software that Uber contributed to the community a few years ago. It's a time-series database; we ingest millions of metric data points per second and we had to build something on our own, and now it's an active community and a bunch of other companies leverage M3DB for metric storage. So ultimately, in some cases we might have built something and open sourced it, in some cases we leverage off-the-shelf software, and in some cases we use completely open source software and contribute new features. For example, for the data lake, Uber pioneered Apache Hudi back in 2016 and contributed it, and we now have one of the largest transactional data lakes, with maybe 200-plus petabytes of data that we manage.

Got it. Okay, this next snippet that we're going to share comes from an ETR Roundtable; ETR is our data partner and they do these private round tables. We'll pull it up and I'll read
the quote from a pretty famous technical group who's going to remain unnamed only because I'm not sure I have permission to name this individual but he said he says everybody in the world is thinking about real-time data and whether it's Kafka specifically or something that looks like Kafka real-time stream processing is fundamental when people talk about data-driven businesses they very quickly come to the realization that they need real time because that's where there's more value architectures built for batch don't do real time well person mentioned cockroach says it's super exciting I feel weird endorsing a small startup he said but Google spanner is amazing and cockroach is the closest thing that you could actually buy off the shelf and run yourself rather than be married to a managed service from a single Cloud vendor so Uday a couple of questions here um I'm curious as to how you changed the engine in mid-flight going from the previous architecture your pre-2014 and post um and it's you know George mentioned what happens when Real Time overwhelms the centralized databases ability to manage all this data in real time and it sounds like architected at least quite a a Runway to avoid that but talk about two questions there how do you change the engine in mid-flight and when do you see it running out of gas yeah uh you know the the first question now one of the things I I think is there's designing a new Greenfield system is one thing but moving from whatever you have to that green Greenfield system is 10x harder and the complex the hardest engineering challenges that we had to solve was for how we go from A to B without impacting any real any user we don't have the luxury to do a downtime where hey you know we're gonna shut off Uber for an hour and then let's do this migration behind the scenes and then we went through the previous system was using Cassandra with the with some in-memory queue and then the new system is strongly consistent how do you go from the the core database guarantees are different the application apis are different so what we had to build was a proxy layer um that um for any user request we have a backward compatibility so then uh we Shadow what is going to the old system and new system but then because the properties of what transaction gets committed in old and you are also different it's it's extremely hard to even Shadow and get the right metrics for us to get the confidence um but ultimately um so that is the shadowing part and then we what we do is what we did was uh we tagged a particular driver and a particular order that gets created whether it's created in the old system or new system and then we kind of gradually migrate all of the drivers and orders from old to new so there would be at a point in time you might be seeing that Marketplace is kind of uh split where half of those orders and earners are in the old half of them are in the new and then once all of the orders are moved we switch over the state of remaining earners from old to new so one they had to do a lot of unique challenges on shadowing and two uh we have to do a lot of unique tricks to make sure that we give the perception of there is no downtime and then move that state without losing any context without losing any jobs in flight and so on yeah so and then if there is a driver who's currently completing a trip in the old stack we let that complete complete and the moment they are done with that trip we switch them to the new stack so that their state is not transferred Midway through a 
trip so then once you create new trips and new earners through new and then switch them after complete the trip we have a safe point to migrate you know this is similar to uh like 10 years ago I was at VMware and like we used to work on how do you do uh vmotion like virtual machine migration other Host this was kind of like that kind of challenge what is the point at which you can you can move the state uh without having any application impact uh so those are kind of the tricks that we had to do um and the second question and how do we make sure we don't run out of gas you know we kind of went through that right like um one uh obviously we are doing our own scale testing our own projected testing to make sure that we are constantly ahead of our growth and make sure the system can scale and then we are also very diligent about uh looking at the properties of the data choosing the right technology um so that we limit the amount of data that we store for that system and then use um specific kind of systems that are catered to those use cases like for example like all of the our matching system if it wants to query all of the nearby jobs and nearby supplies we don't go to the transactional system to query that we have our own inbuilt search platform where we are doing real-time ingestion of all of these data using CDC and then like so and then we have all kinds of rankers so that we can do real time on the Fly generation of all of the jobs because the more context you have the better Marketplace optimization we can make and that can give you the kind of um efficiency at scale right uh the otherwise we'll know we'll make imperfect decisions which will hurt the overall Marketplace uh efficiency yeah and in your blog post you had said you had to build this architecture to support your business for the next decade so if from inferring you don't see any at least in the near term all these data elements and all this real-time data overwhelming uh the system because of the way you've architected it is that a fair assertion yeah yeah absolutely I think you're confident at least um you know for the foreseeable future what we have is a stable foundation and you know since then you could see the kind of new use cases that we are building right like you know like uber Reserve now you can reserve 30 days in advance now we are indented into grocery we are doing uh where a courier is going and then shopping for you we are doing decent Liberty announcements on Party City on Pet Smart like so we want to make sure that we can go anywhere and get anything we can unbundle every use case uh that you need a car for and then provide a affordable scalable Transportation solution so that we can handle all of your Mobility needs on demand at scale at your fingertips and then we can capture every single Merchant in the world and then capture it in our system every single catalog every single item manage relationships across all of them we have millions and millions of catalog items around the world and then so that you can go and get anything that you need um whether it is a food whether it's alcohol whether it is some party item whether it's some pet food whether it's convenience whether it's Pharmacy everything is handled uh so that is uh um so we we at least right now uh at least I'm confident that we can scale to those needs and then we have the system uh that can scale to that needs right you know last question is George and I have been sort of look into the future using Uber as an example of the future so what do 
you see coming or what do you hope to see if you think about just a broader industry with respect to commercial tools over the next day three to five years that might make it dramatically easier for a mainstream company that doesn't necessarily have Uber's technical bench and depth to build this type of application um in particular how might other companies that need to manage hundreds of thousands of digital twins design their applications using more off-the-shelf technology do you expect that will be possible in let's call it the the midterm future yeah you know I think um the whole landscape around developer tools applications it's it's a rapid evolving space you know what was possible um now was not positive five years ago and like it's constantly changing um but what we see is you know we need to provide value at upper layers of the stack right and then wherever if there is some solution that can provide something of the Shelf we we move to that so then we can focus up the layer like it's not just building taking off the shelf IAS or past Solutions just taking the sheer complexity of representing configuration representing the geodiversity around the world and then building something that can work for any use case in any country uh adhering to those specific local rules that that is what I see is like the core strength of uber like we can manage any kind of payment uh payment disbursements or payments on in the world uh we have the largest support for many payment like any payment method around the world for earners we are disbursing like billions of uh layouts to whatever bank account and whatever payment method they need money in um we have a risk system that can handle nuanced use cases around risk and fraud uh our system around fulfillment that's managing this our system around maps that is managing all of the ground through tolls or charges uh navigation all of that so we have probably one of the largest global map stack where we manage our own navigation and leveraging some data from external providers um so this is the like the core IP and Core Business strength of uber and that is what is allowing us to do many verticals but again the the systems that I can use to build this that over time absolutely I see you know it makes it easier for many companies to leverage this maybe 15 years ago we didn't have spanner so it was much harder to build this now uh with spanner or with similar new SQL other of the Shelf databases it it solves one part of the challenge but then now we need to uh think about like the other Center other layer of the challenge I am so excited that you were able to come on George because uh David was able to come on because George you and I have been talking about this as the future and uh just I think who they just solidified it but I think George we set a new record for breaking analysis in terms of time but uh but uh George what are your takeaways anything last words that you would have to add before we break take away sir I think this is one of those applications that people will look back on many years from now and say you know that really um was the foundation for a new way of doing business not just a building software but of doing business like Amazon was the first one to manage their own internal processes you know where they're orchestrating the people places and things with an internal platform but you guys did it for an external ecosystem and you know made it accessible to Consumers you know in real time um and I think the biggest question I I have 
and it's not really one that you can answer, but it's one that we'll have to see the industry answer: to what extent the industry will make technology that makes it possible for mainstream companies to start building their own Uber-like platforms to manage their own ecosystems. That's my takeaway and my question.

Okay, we're going to leave it there. Thanks so much, I really appreciate your time and your insights, and we'd love to have you back. Absolutely, anytime, ring me up, I'll be there. Thank you so much, it was a pleasure talking to both of you today and being on Breaking Analysis; see you soon.

Fantastic. On behalf of George Gilbert, I want to thank Uday and his team for these amazing insights on the past, present, and future of data-driven apps. I also want to thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters, and Rob Hof is our editor-in-chief over at siliconangle.com. Thank you so much, everybody. Remember, all these episodes are available as podcasts; all you've got to do is search "Breaking Analysis podcast." Pop in the headphones and go for a long walk on this one. I publish each week on wikibon.com and siliconangle.com. You can email me directly at david.vellante at siliconangle.com, DM me, or comment on our LinkedIn posts, and check out etr.ai; they've got great survey data on enterprise tech. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.
