Hacking Developer Productivity by Reducing Complexity | Daniel Gebler (CTO @Picnic) - CTOSummit23

Thanks a lot for having me. So I'm Daniel, CTO of Picnic, an online supermarket that we started a couple of years back in the Netherlands, and then we went to Germany and to France. Our mission is very simple: we want to make grocery shopping simple, fun and affordable for everyone. And actually, and this is pretty much true, we started here.

We actually also started in Hamburg just two weeks ago, so that's very much a coincidence with OMR and with the alphalist CTO Summit. Maybe a quick question: who of you guys has heard about Picnic? Just a quick show of hands. Okay, super, then I will not talk too much about Picnic as an intro, but I will talk a bit about developer productivity. Developer productivity is something which has a bit of a bad reputation, because obviously we need to be productive, we want to be efficient, and some people at some point started calling it developer enablement. But it's not so much about working faster or working harder; it's more about getting to results faster with the same effort, or maybe less effort.

And that is especially important in a time where we have LLMs, GPT-3, 4, 5 and whatever may come next, because everything is becoming even faster. So maybe just a quick introduction. These are the vehicles. I will not talk too much about them, but these are the vehicles that you will see all around Hamburg very soon. In North Rhine-Westphalia you already see around 1,500 of these vehicles, and these are the vehicles that we are using to deliver our groceries.

And if you think a bit about the proposition that we started to deliver to customers: we said we deliver all the goods that you like without delivery fees and at the lowest price, and that opens up the market completely. That is very different from REWE, where you need to pay for deliveries. Here we are actually saying we deliver without delivery fees, and this is exactly what happened a couple of years back in many other verticals. For instance, the same happened in the fashion industry. Zalando was the first one in Europe that said we offer free deliveries and free returns for fashion, and that broke the market open completely. It moved the fashion market from 3% online to around 25% by now.

And we want to do the same for food. And if you think a bit about what you need technology-wise for this, then you obviously have an app where you can do the ordering, but that is only around 5% to 10% of our entire tech stack. The majority is the tech stack that is required to run a supply chain: buying products, running a fulfillment process, a last-mile process, planning and the people process. By now we already have 20,000 people at Picnic that need to be planned, scheduled, et cetera. So this is certainly a very, very complex process, and the majority of the tech that we are building is around supply chain and logistics and not so much around the consumer proposition.

But if you think a bit about maybe one challenge, and then I will go a little bit into the topic of developer productivity: one challenge that we are tackling now is the automation and the robotization of the fulfillment process. What happens in a normal fulfillment center is that an order comes in, say somebody has ordered 30 items, and then an order picker goes through the warehouse and picks the first, second, third product. That is obviously very inefficient. You can do this much smarter. What you can do is reverse the process.

Instead of you as an order picker going to the product itself, you can have robots that bring the products to you. Sounds very obvious, but it's actually very hard to do and nobody has really mastered it yet for food, and we are the first ones that have built this. So let me show you a very quick video of what this looks like in production. This is a fulfillment center that we launched at the beginning of last year. So this is all that I wanted to say about Picnic, but now let's zoom in a little bit into what developer productivity actually means.

And there are many, many factors that drive the efficiency of a development process, and everybody knows it's about cycle time and PR review times, it's about the time that people spend in meetings and the interruptions between different PRs, et cetera. But I don't want to talk too much about that part, because that is very individual to different organizations. Some have a bigger problem, some less. But one thing is common across all start-ups, scale-ups and most tech companies, and that is the complexity of the tech stack. So let me talk a little bit about what this means exactly.

So the first thing is, the complexity of a tech stack is driven by many, many different things, so let me take maybe one example. There is a bias towards new technologies, and all of us have been in meetings where somebody said, well, I have this new database, I have this new library that I want to try out. If you allow this, then you get to a tech stack that is becoming impossible to maintain. So being a little bit boring, having a lean tech stack, a narrow tech stack, will certainly pay off in the long term. But there is also something on the architectural side. We have all been in meetings where somebody was arguing: OK, let's split up a service, let's build a microservice for that, let's build a separate service, et cetera. What we learned, and I think this is something that is very, very important, is that in many cases you are better off with a monolith. Not cool, not sexy, all your engineers will hate you for that, but with a monolith you can in many cases iterate much faster over the first version of a product proposition.

So what we did in the beginning is, for the service proposition that was not yet clear, we kept it a complete monolith. Then we iterated very quickly, and we made microservices only out of those parts of the tech stack where we had a pretty stable interface, where we knew what the interfacing with consumers, but also with third parties, would look like. And that has paid off quite a bit if you want to continue to move fast. If you then think a little bit further, you need to know what you are actually up to and what the complexity is that you are fighting with. And the really interesting thing is that there are different factors that drive complexity in software systems. The most obvious one is the domain complexity: you build something in a specific domain, maybe a game or maybe an e-commerce proposition or maybe a logistical proposition.

That is pretty obvious. This is the rightmost one. But the interesting ones, the ones that drive the real complexity of large-scale systems (a good example is, for instance, Twitter), are actually completely different kinds of complexity parameters. The first one is legacy complexity, so complexity that is based on old technologies that have been built in the past. The typical quote that technology rots also applies to modern systems.

And the second one is solution complexity. Solution complexity is actually a very important one. We all typically have a tendency to over-generalize the solution, to build a solution that is maybe more complex than it should be, and therefore we are introducing solution complexity that is not really warranted for the overall setup. So let me show you a bit what happens with different setups. First, let's look a bit into the legacy and the solution complexities, which we obviously have control over. What we have no control over is the domain complexity.

Domain complexity is what makes up the business proposition. And, even more important, the domain complexity is the actual complexity that somebody will pay you for if you have a product or if you sell a service. Everything else is something that you have control over and that you will be able to either minimize or let explode, one way or the other. An interesting one is that if you work with a lot of engineers, especially junior engineers, you end up in a situation where we fall in love with the technology instead of falling in love with the solution, and therefore we end up with a tech stack that becomes extremely complex to maintain. So the interesting thing here is that there's a tendency that many people try out technologies not only in pet projects but also in start-ups and scale-ups, and I have been a part of a couple of those.

So some of my portfolio companies are actually deploying technologies purely for the reason that people have tried them out, that they want to have them on their CV and they want to explore them. But what then happens is that you're using technologies that usually will not survive the next hype. And the best example for this is usually all the web frameworks that we have around. If you think a bit about what we have now with Angular, those kinds of frameworks have matured quite a bit, but the time before Angular was a total wild west of web frameworks, and whoever chose a web framework then was basically outdated a year or two later. So let's think a bit about how a typical tech stack evolves if you have multiple projects. First you have project one. You build maybe just the app, or just a sales solution, and that project obviously has domain complexity plus solution complexity. But that is only the starting point, because now it becomes really tricky: you're starting your second project.

And if you start your second project, then the first project is not completely abandoned. You have a little bit of maintenance. But over time this first project will become kind of a legacy project, and all kinds of legacy will accumulate because libraries are no longer up to date, some development processes are no longer up to date, you're using CI/CD processes that are no longer up to date. So you have all kinds of legacy here. The second project is pretty much up to date, but the first one is not. So now the question is how do you handle that? And you typically have no time to really tackle this at this point in time, because you start your third project.

So the third project will obviously be a project that is again cutting edge. And then you have two other projects that have all kinds of complexity angles: solution complexity, as usual, but even more, the legacy complexity has basically increased exponentially, because if you have two stacks that are interconnected on the API side, then you certainly have the challenge that nobody wants to maintain those complexities any longer. And you end up in a situation, and I'm just using Twitter as an example because I know the internal setup there pretty well, but there are many, many other organizations, where you have large parts of your tech stack that nobody wants to work with any longer and where nobody dares to change anything any longer.

So you end up in a situation where many people will just write new software, or write the same solution again from scratch, instead of working on the old solution. That's obviously not something that you want to have, and with the fourth project it becomes even bigger. So this is becoming a total mess. So what to do now, or how to tackle this? The first thing is to realize that every increase in complexity will increase the cognitive load on your team, and there is a limit to what your team can handle from a cognitive complexity perspective. So you have maybe a couple of projects and you have some lead engineers that can handle the complexity of the first few projects, but at some point you're just reaching the limit of what the team can handle. And then you're also ending up in the second problem, where you have a few early engineers that know most of the tech stack and the new ones have no possibility to catch up any longer. So this is becoming totally impossible.

So what I'm painting here is pretty much a doom scenario, obviously. And that is not something that we had in this form at Picnic, but I've seen it in quite a few start-ups and scale-ups, usually in the range of five to seven years old, that have accumulated so much tech debt that the only option was to rewrite a large part of the existing code base. So the question remains what to do here. If you're at this stage, then you certainly have only one option: you run a stop-the-world scenario, you rewrite the entire stack, no feature goes out for a year, and you pray that you will survive. So that is obviously not cool, but it's the only thing that those kinds of companies can do. But you can prevent this from an earlier stage, and the five or six complexity drivers that we have identified you can tackle from day one, and also with every new project.

So one thing is obviously the code quality, and code quality is kind of a tricky beast, because in a sense this is something which is based on guidelines that you have set up in your organization, but also on standard guidelines that you take, maybe from third parties. There are all kinds of pull request guidelines that you're using, but there's a standard that you want to apply that is hopefully as close as possible to the industry guidelines. What you want to achieve is that a new engineer who joins you finds their way through the code as quickly as possible, and that is something which you can achieve with well-written code, but certainly also with a proper software architecture. And the architecture is an interesting thing.

Everybody talks about scalable architectures and resilience and all kinds of different factors, but one thing that we learned is, if you don't know your requirements well enough in the beginning, or if you know that you need to make five, seven, ten changes, then the only real driving factor for your architecture should be the cost of change. So how expensive is it to change a feature? And that may mean that you have a non-scalable implementation. That is okay if you need to change it five, six, seven times anyway until you scale it up.

So the cost of change is probably the most important architectural driving factor in the beginning. And then documentation, obviously, kept close to the code. And then you come to two angles, and this is on the left-hand side here: the number of tools and the tool complexity, which are usually called, in complexity terms, the breadth and the depth of the complexity. And what is important here is actually two things.

Number one is to keep it as narrow as possible, so reduce the number of tools, which is pretty obvious. If you look very carefully into your organization, I'm very sure that the tool set has just been growing over time into a very extensive one. The problem is not using new tools. The really hard part is actually phasing out old tools.

So what we did is we had a very simple rule. At some point we said whenever you add a new technology, whenever you bring in a new library, anything new, something needs to go. You decide what it is, but something needs to go. And that is obviously a very black-and-white kind of rule, and it certainly needs a little bit more differentiation, but it forces you to also phase out tools and old technology again in a very explicit way. And the other one is obviously tool complexity. The interesting thing is that most of the tools are getting very complex. And you see it also now with all this discussion about LLMs and ML and deep learning stacks: it becomes so complex that sometimes a much simpler solution that has a little bit less performance is just much better.
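The "something needs to go" rule can be made explicit in tooling. The following is only a minimal sketch under assumptions: the `TechRegistry` class, its `adopt` method and the technology names are hypothetical and are not described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class TechRegistry:
    """Hypothetical list of the technologies a company has approved."""

    approved: set = field(default_factory=set)

    def adopt(self, new_tech: str, retire: str) -> None:
        """Add new_tech only if an existing technology is retired in the same step."""
        if new_tech in self.approved:
            raise ValueError(f"{new_tech} is already in the stack")
        if retire not in self.approved:
            raise ValueError(f"cannot retire {retire}: it is not in the stack")
        self.approved.remove(retire)
        self.approved.add(new_tech)


# Illustrative names only.
registry = TechRegistry(approved={"java", "python", "old-http-client"})
size_before = len(registry.approved)
registry.adopt("new-http-client", retire="old-http-client")
size_after = len(registry.approved)  # unchanged: one in, one out
```

Because `adopt` cannot be called without naming something to retire, the size of the approved set never grows, which is the point of the rule.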

Let me give you an example. For us, it's very important to forecast what our customers are ordering tomorrow and the day after tomorrow: how many bananas, how many cucumbers. You can use extremely complex models which nobody understands, and you will maybe be able to predict a little bit better, but we are actually using a simple regression mechanism that everybody, every marketeer, every category manager, can very easily understand. And that helps us to also have the business teams be designers of the system. So that is something which is certainly applicable, at least in some areas of a tech stack.
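To illustrate how small such a model can be, here is a sketch of an ordinary least-squares line fitted to daily order counts. The talk does not say which features or which regression variant are used, so the single "day index" feature, the `fit_line` helper and the numbers are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up banana orders for five consecutive days (day 0 to day 4).
days = [0, 1, 2, 3, 4]
bananas = [100, 110, 120, 130, 140]

slope, intercept = fit_line(days, bananas)
forecast_day_5 = intercept + slope * 5  # extrapolate one day ahead
```

The whole model is two numbers, a baseline and a daily trend, which a category manager can read and challenge directly.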

So, very clearly, restricting your tech stack is always a very important principle to apply if you want to remain scalable around your technologies. And the other one, and this is not so much about the restriction, is about which kind of tools you are using. There are obviously tools for all kinds of use cases. So we are currently building up an observability strategy, and if you want to build a really full-fledged observability stack, then you end up with not only one solution.

Even if many suppliers tell you that you need only Dynatrace or only Datadog or whatever, you easily end up with a stack of 7, 8, 9, 10 different solutions for all kinds of different use cases. But you need to ask yourself: do you want to have such a large and wide tech stack, or is maybe one single solution for a problem also good enough? And one rule that we applied that has helped us to keep this tech stack very narrow is: for every problem there can only be one tool. Engineers or engineering teams can choose which tool that is, but if you want to use something else, we stop using the old tool and we migrate completely to the new tool. And that certainly helps to keep the tech stack very lean in a very scalable way. And if you look a bit at the benefits, one thing that is very interesting with a narrow tech stack is that you actually get the possibility to have flexibility between teams. So we are by now around 400 engineers.
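One way to check the "one tool per problem" rule is an audit over what each service declares. This is a sketch under assumptions: the shape of the `services` inventory, the `find_duplicate_tools` helper and all service and tool names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_tools(services):
    """Return every problem area that is solved with more than one tool."""
    tools_by_problem = defaultdict(set)
    for tools in services.values():
        for problem, tool in tools.items():
            tools_by_problem[problem].add(tool)
    return {
        problem: sorted(tools)
        for problem, tools in tools_by_problem.items()
        if len(tools) > 1
    }


# Hypothetical inventory: service name -> {problem area: tool in use}.
services = {
    "checkout": {"metrics": "tool-a", "logging": "tool-x"},
    "routing": {"metrics": "tool-b", "logging": "tool-x"},
}

violations = find_duplicate_tools(services)
# "metrics" is reported because two different tools solve the same problem.
```

An empty result means the stack is narrow in the sense described above; any entry is a migration that still has to be finished.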

But if you have a narrow tech stack, a tech stack where you have only a couple of different tools that you need to know, then if you work with one service, you can also move to a new service, because you will find the same tools with the same principles again. If you go the opposite way, if you go very wild west, then you immediately end up in a kind of deadlock situation where you can no longer move engineers easily between teams, and the setup that you have is very limited. Certainly cross-team help is also automatically possible if you have the same stack across all teams. And the third one, and this is probably the most important one: we tracked over time how long it takes until a developer becomes fully productive. What we measure here is the number of pull requests that the developer can do per week or per month. It is a very rough measure. It certainly has a lot of downsides, but it is kind of an indication.
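The pull-requests-per-week measure mentioned above could be computed along these lines. This is a sketch only: the `prs_per_week` helper, the choice to count from the start date in seven-day buckets, and the dates are assumptions, not the dashboard actually used.

```python
from collections import Counter
from datetime import date


def prs_per_week(start, merged_dates):
    """Count merged pull requests per week since the developer's start date."""
    counts = Counter((d - start).days // 7 for d in merged_dates if d >= start)
    if not counts:
        return []
    # Fill weeks without any merged pull request with zero.
    return [counts.get(week, 0) for week in range(max(counts) + 1)]


# Made-up data for one new joiner.
start = date(2023, 1, 2)
merged = [
    date(2023, 1, 5),
    date(2023, 1, 12),
    date(2023, 1, 13),
    date(2023, 1, 24),
    date(2023, 1, 25),
    date(2023, 1, 26),
]

ramp = prs_per_week(start, merged)  # one entry per week since joining
```

As the talk says, this is a rough indicator with real downsides, so it is better read as a trend across many joiners than as a score for one person.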

And what you see is, if you have a narrow stack, then in essence most developers can become pretty productive after one month at the latest. With juniors it is obviously a little bit different than with seniors, but you don't need much more than four weeks to be pretty much into the tech stack. So that is certainly a very interesting one. And then it is also interesting to see whether you use the latest technologies or more battle-tested technologies. So we started Picnic in 2015, and there was, for instance, Scala, a very sexy technology.

Everybody was raving about it. If you think about it now, Scala is probably a little bit less interesting, but back then everybody was saying: use Scala instead of Java.

Looking back, it has been a pretty smart choice to stick with Java, if you think a bit about the breadth of the ecosystem that we can build on now, and that is certainly a case of using a battle-tested technology. And then, and this is probably the most important one, you certainly need to know a bit what you don't know. Usually you also have stuff that you don't know, and the thing is, if you work with technologies that are too cutting edge or where you don't have much experience internally, then the unknown unknowns and the known unknowns are simply something that you cannot resolve internally, and neither can anybody else in the industry. That was, for instance, one of the big topics with the Akka framework around Scala in 2015.

Nobody really knew how to use it, and it was pretty unclear: will people figure this out at some point? Well, the reality now is that it certainly didn't work out so well for Scala, and the Java world has really caught up. And if you think a bit about the typical graphs that you see from different companies, about how many new libraries are used and how many articles are published, then the new shiny technologies are always very well covered, both in new projects and in new articles, et cetera.

And the real power here, and that is especially important for us CTOs, is that you need to withstand a bit this kind of hype-driven development, where there's a tendency that you run with the shiny technology because it is on the right-hand side of this graph, because everybody is talking about it. We will also see how the latest GPT hype will work out in this respect. So let's look now a little bit at what can actually be learned from this.

Then we have one interesting thing. If you think about which kind of problems you want to solve, or how you solve a problem, then it's always about a simple, battle-tested technology. But there's one more cultural element that we really had to apply as a kind of transformation in the organization. You need to train your engineers not to fall too much in love with the technology that they use, but to fall in love with either the problem or the solution that they are working on. So if everybody becomes really happy to have meaningful work, then you certainly can have a very productive team.

So maybe one last thing that I wanted to mention here: what we saw is a very concrete problem. All of us are probably working with pull requests and reviews. But there's one challenge, because reviewing pull requests inside a team works, but the cross-team reviews are usually not working so well. If you have very strong product teams, then usually you commit to your own code base, but working on other code bases is not so strong.

So we introduced something that we call a kind of leadership dashboard. It is a dashboard that shows how much our engineers are reviewing not only their own team's code but also the code of other engineers, and you actually score very well if you review more of other teams' work. Then basically there's a feedback cycle, and you can even earn some Picnic coins, et cetera. But the real thing is to gamify the review process, so that people get feedback and so that you encourage those who are providing feedback to other teams. So that has helped quite a bit.
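A scoring scheme of the kind described above might look as follows. This is a sketch under assumptions: the point weights, the `review_scores` helper, and the names and teams are invented, and the talk does not describe how the real dashboard computes its scores.

```python
from collections import Counter

# Assumed weights: a cross-team review is worth more than an in-team review.
CROSS_TEAM_POINTS = 3
SAME_TEAM_POINTS = 1


def review_scores(reviews, team_of):
    """Rank reviewers from (reviewer, author) pairs, highest score first."""
    scores = Counter()
    for reviewer, author in reviews:
        if reviewer == author:
            continue  # self-reviews earn nothing
        cross_team = team_of[reviewer] != team_of[author]
        scores[reviewer] += CROSS_TEAM_POINTS if cross_team else SAME_TEAM_POINTS
    return scores.most_common()


# Made-up engineers and teams.
team_of = {"ana": "app", "ben": "app", "cho": "warehouse"}
reviews = [
    ("ana", "ben"),
    ("ana", "cho"),
    ("ben", "cho"),
    ("cho", "ana"),
    ("cho", "ben"),
]

ranking = review_scores(reviews, team_of)
```

Weighting cross-team reviews higher is what turns the ranking into an incentive to look beyond one's own code base.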

So, and this is the last slide, if you look a little bit into what we learned from this journey of growing from a team of three engineers to now close to 400 engineers, then there are probably three main things that stand out. The first thing is that everybody at some point needs to reorganize, but don't reorganize too early. Actually, it needs to be super painful in the organization at the point that you reorganize. And this is very counterintuitive, because you try to reorganize proactively so that it doesn't become painful, but you have no idea what the next organization should look like if you don't feel the pain in a very heavy way.

The second one is that while you're scaling, you certainly need to tune your tech culture and your tech organization a bit. And the third one is that at some point leadership is responsible for efficiency: not only for delivering a specific product or feature on time, but also for the efficiency of the organization. So these are the three things that I wanted to give you on your way, and that brings me to the end of my presentation. Thanks for your attention.

Daniel, I hope I didn't pressure you too much. Got a message. We're all hungry and we have to hurry up a bit, but I would give space for one question now and many questions later.

So is there one question? No question, you're very hungry guys, right? So there's one. Okay, this is a very quick question.

So you mentioned Scala, but what about Kotlin?

So the Scala topic was a big topic in 2015. Kotlin is obviously a big topic now, a big topic for the last few years. We have not yet completely decided what to do.

To be frank, we are using it now in a couple of services to try out a little bit how it goes, but at this point in time our stack is mainly Java and Python based. So essentially the entire machine learning part is Python and the rest is Java based, but Kotlin is certainly, I think, better positioned than Scala was in 2015.

Yeah, so Kotlin is a default answer now, right? There's a bit of nuance to it, but we can probably agree on that part of it. Okay, thank you. So, Daniel, thanks a lot.

It was a pleasure having you here.

2023-06-13 11:06
