AWS Summit ANZ 2023: Build highly scalable serverless applications on AWS | AWS Events
Good afternoon, everyone. Welcome to the session: Build highly scalable serverless applications on AWS. My name is Frank. I'm a Senior Solutions Architect at AWS. Joining me today is Cheyne, the SVP of Global Engineering at Playvox. In the last 18 months, I have been working with Cheyne and his team building and scaling their serverless application, their workforce management SaaS product on AWS.
Today, we would like to share the learnings, techniques, and architecture patterns that can help you scale your serverless applications as your business goes through rapid growth or a hyper-scale moment. I will take you through how you can optimise time-sensitive synchronous requests, use caching on different layers to improve performance at scale, leverage asynchronous architecture patterns, and use multiple AWS accounts to scale your application and your engineering teams.
You will hear from Cheyne on the Playvox journey, building their SaaS product using serverless microservices in an event-driven architecture. This is a level 300 session with the assumption that you are already familiar with AWS and basic AWS services. To put all the techniques and architecture patterns together, let's close our eyes for a second and try to visualise a typical three-tier web application you have worked on before, regardless of the technology used. When your users open your web application, they first get a set of static web pages. As your users navigate through the application, it interacts with your application backend, sending and pulling dynamic content through a set of APIs.
You have one or more data stores to save user data and application state. You have some background processing jobs for things like data processing or integration with other internal or external systems. When we implement these components in serverless, in its simplest form, this is usually where we get started. We have the static web content hosted on Amazon S3 and served through Amazon CloudFront. The interaction with the dynamic content is implemented using Amazon API Gateway and AWS Lambda. We use Amazon Aurora and Amazon DynamoDB as the transactional databases.
And we use Amazon SNS for sending push notifications and Amazon SES for sending emails. This is a very basic form of a serverless architecture. What I would like to call out is even this basic serverless architecture can give you high availability and scalability out of the box. It can go a long way without the need for extra tuning. The techniques we cover today are the best practices when you are growing into a large scale where you want extra flexibility to fine-tune your application.
Let's see what scaling may look like. The scaling in this first hypothetical example is driven by the organisation's business growth. The application might have always been stable and performed well over a long period of time until it hit a certain threshold. In the second example, your load might be usually low, but occasionally you get a burst of traffic due to special events.
For example, a Christmas sale in an e-commerce application. The techniques we cover in this session can potentially help you reach tens of thousands to 100,000 requests per second and potentially millions of users. Let's start with the most common: synchronous requests. Synchronous requests are for business transactions where your application users expect instant responses. In our initial architecture, synchronous requests are built on top of Amazon API Gateway, AWS Lambda, Amazon Aurora, and Amazon DynamoDB. Let's start by looking at Amazon API Gateway.
Behind the scenes, API Gateway is implemented through a fleet of nodes across different Availability Zones. As a result, it's highly available and scales very fast. It can go from zero to hundreds of thousands of requests per second almost instantly without the need to pre-warm it. The default quota for API Gateway is 10,000 requests per second per AWS account in a Region.
If it ever becomes a bottleneck to scaling your application, you can easily contact AWS Support and request a quota increase. Practically, API Gateway is not likely to be the bottleneck in your system. The bottleneck is more likely to be in the downstream systems like compute or database. API Gateway has a feature called request validation that can fail invalid requests at the earliest point and reduce unnecessary costs to your downstream systems.
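To make the fail-fast idea concrete, here is a minimal sketch in plain Python of the kind of check API Gateway's request validation performs declaratively against a JSON schema before your backend is ever invoked. The `productId` and `quantity` fields are hypothetical, chosen just for illustration:

```python
def validate_order(body):
    """Fail-fast check on an incoming request body.

    API Gateway request validation does this declaratively against a
    JSON schema model; this plain function just illustrates the same
    idea of rejecting bad requests before they reach compute or the
    database."""
    errors = []
    if not isinstance(body.get("productId"), str):
        errors.append("productId must be a string")
    quantity = body.get("quantity")
    if not isinstance(quantity, int) or quantity < 1:
        errors.append("quantity must be a positive integer")
    return errors

# A valid request passes; a malformed one is rejected immediately.
print(validate_order({"productId": "p-123", "quantity": 2}))  # -> []
print(validate_order({"quantity": 0}))
```

In API Gateway itself you would attach a JSON schema model to the method and enable the body validator, so invalid requests never consume Lambda concurrency at all.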
This will save valuable resources on your downstream systems and provide better scalability for your system overall. The next component in line is AWS Lambda. Lambda is a serverless compute engine with built-in security, scalability, and high availability out of the box. To handle a large number of synchronous requests, let's first try to understand how Lambda concurrency works. Each Lambda invocation runs in a dedicated micro VM, and we call this micro VM the Lambda execution environment.
A single execution environment can process one request at a time, and this is to ensure your Lambda function invocation runs in a secure and isolated environment. When a new request comes in, if there's an existing execution environment available, it will be reused. If there's no available execution environment, Lambda will start a new execution environment for that request. Lambda concurrency is defined as the total number of execution environments processing requests at a given point in time. Lambda concurrency is different from requests per second. Remember, a Lambda execution environment can be reused multiple times within the same second, right after the previous request finishes.
To calculate the Lambda concurrency your application needs, a practical formula is: your concurrency equals the average requests per second multiplied by the average request duration in seconds. For example, if your application is processing 10,000 requests per second at its peak and each request takes around 100 milliseconds on average to complete, your required concurrency will be 1,000. We now understand how Lambda concurrency works. Let's look at the account-level concurrency you need to be aware of.
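The formula is simple enough to check with a few lines of Python; the numbers below are the example from the talk:

```python
def required_concurrency(requests_per_second, avg_duration_ms):
    """Concurrency = average requests per second x average duration in seconds."""
    return requests_per_second * (avg_duration_ms / 1000.0)

# The example from the talk: 10,000 req/s, ~100 ms per request.
print(required_concurrency(10_000, 100))  # -> 1000.0
```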
Lambda has a default account-level concurrency of 1,000 for each Region, and this can be increased to tens of thousands through AWS Support. Before you take any application to production, you should do load testing to make sure the concurrency quota in your AWS account can meet your application's needs. Configure monitoring and alerts on your account concurrency quota so that when your application is approaching the limit, you can proactively contact AWS Support and request a quota increase. And when you expect your application usage to grow quickly, or if you are planning for a bigger event, run frequent load tests and reviews to make sure your account concurrency quota fits your scaling needs.
Now let's look at how we can write better Lambda functions to improve performance at scale. The core principle here is to make your Lambda function package really small so the Lambda execution environment can start quickly, process requests quickly, and get reused quickly. This will improve the request response time and also the requests per second your Lambda function can handle. You can achieve this by having small single-responsibility Lambda functions and avoid writing a large Lambda monolith. With this practice, you will end up with many small Lambda functions to support your application.
So it's very important to have CI/CD to automate the packaging, unit testing, and deployment of your Lambda functions to different environments. Run minification on your production code to further reduce your function's package size. Only introduce libraries and dependencies your function needs at runtime. For example, instead of using frameworks such as Flask in Python or Express in Node.js, you can consider using API Gateway's native routing feature to route requests to small individual Lambda functions. You should also take advantage of execution environment reuse to improve the performance of your function.
Like in this example, initialise the SDK clients and database connections outside the function handler, so subsequent invocations reusing the execution environment do not need to initialise those resources again. Next is to optimise your Lambda's resource settings.
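As a runnable sketch of that pattern, the snippet below creates its expensive resource at module level, outside the handler, so warm invocations skip the setup cost. The stdlib `sqlite3` module stands in for a real AWS SDK client or RDS connection, and the `users` table and event shape are assumptions for illustration:

```python
import sqlite3

# Created once per execution environment (at cold start), then reused by
# every warm invocation that lands on the same environment. In a real
# function this would be an AWS SDK client or a database connection;
# sqlite3 stands in here so the sketch runs anywhere.
connection = sqlite3.connect(":memory:", check_same_thread=False)
connection.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
connection.execute("INSERT INTO users VALUES ('42', 'Ada')")

def handler(event, context):
    # Per-request work only: no client or connection setup in the handler.
    row = connection.execute(
        "SELECT name FROM users WHERE id = ?", (event["id"],)
    ).fetchone()
    if row is None:
        return {"statusCode": 404, "body": "not found"}
    return {"statusCode": 200, "body": row[0]}
```

The first invocation pays the initialisation cost; every warm invocation after that reuses the connection for free.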
AWS Lambda Power Tuning is an open-source tool that can help you optimise your Lambda's resource settings for performance and cost. You can run Power Tuning in your AWS account, and the tool will collect data from the executions and present a graph that can help you find the best configuration for your Lambda function. Like in this example, the pink line is the execution time and the blue line is the cost. As we can see, 1,024 megabytes is the sweet spot for this Lambda function.
Now your API Gateway and Lambda can process tens of thousands to hundreds of thousands of requests per second. The next bottleneck is likely to happen in your database. Each of your Lambda function invocations may open and close relational database connections at a very high rate, and this may exhaust your database connections and your database resources.
Amazon RDS Proxy allows applications to pool and reuse connections established with the database. It makes the connections more efficient and your application more scalable. RDS Proxy supports many Amazon Aurora and RDS database engines, and you can find more details in this link.
It's not only the database connections that may cause the bottleneck. Repeated database read queries in high volumes consume your valuable database resources as well. Amazon ElastiCache is an in-memory caching service that can reduce repeated read operations on your database. When you're using DynamoDB, you can also use DynamoDB Accelerator in combination. DynamoDB Accelerator is an in-memory write-through cache.
For example, when you have a big sales event in an e-commerce application, some popular product information served through GET requests can be cached on API Gateway, and you might not need to query it every time from your data store. You might only need to refresh it every few minutes to an hour through a TTL setting on your API Gateway. Now we've got caching on all the layers: edge caching in Amazon CloudFront, GET request and response caching in API Gateway, Lambda caching through execution environment reuse, and database caching through Amazon ElastiCache and DynamoDB Accelerator.
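As a small illustration of the caching-through-environment-reuse layer, here is a hedged sketch of an in-function TTL cache: a module-level dictionary survives warm invocations of the same execution environment, so repeated reads of a popular item skip the database. The 300-second TTL and the `fetch` callback are assumptions for the sketch, not a specific AWS API:

```python
import time

# Module-level state survives across warm invocations of the same
# execution environment, so it can serve as a per-environment cache.
_cache = {}
TTL_SECONDS = 300  # hypothetical refresh window, like the TTL discussed above

def get_product(product_id, fetch, now=time.time):
    """Return a cached value while it is fresh; re-fetch once the TTL expires."""
    entry = _cache.get(product_id)
    if entry and now() - entry["at"] < TTL_SECONDS:
        return entry["value"]
    value = fetch(product_id)  # the real read against your database
    _cache[product_id] = {"value": value, "at": now()}
    return value
```

Note each execution environment keeps its own copy; for a cache shared across environments you would reach for ElastiCache or DAX as described above.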
We have now optimised the synchronous requests for time-sensitive business transactions where your users are willing to wait for instant responses. In other parts of your application, requests might not be that time-critical. For example, sending a notification or email usually just needs to happen sometime in the near future. Your app users do not need to sit there watching and waiting for it to happen. However, our initial synchronous design keeps your users waiting.
On the technical side, synchronous requests mean we cannot apply batching to improve system throughput. When there's a large volume of requests, synchronous Lambda invocations consume your account-level Lambda concurrency. This may impact the more time-sensitive Lambda functions and business transactions.
The downstream systems of the Lambda function shown in this example are SNS and SES. They are both highly scalable. But in the real world, your downstream system might be a legacy application that's less scalable or a third-party integration where you do not have direct control.
The synchronous design may easily overload those systems when there's a large volume of requests. And lastly, some long-running backend jobs may also go beyond API Gateway's 29-second timeout, which is a hard limit and cannot be increased. So how can we do better? Asynchronous messaging to the rescue. Instead of sending out those emails and notifications right away, we can drop them in SQS queues for a worker Lambda function to pick up from there and send them in batches in its own time. Once the request is in the SQS queue, we can return to the app user and say, "Your message is with us.
Rest assured, we will send it shortly". Under the hood, we can process those messages in batches. This solves the API Gateway 29-second timeout hard limit and improves the throughput of your system.
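A worker Lambda behind an SQS queue might look roughly like the sketch below. It processes the batch and returns only the failed message IDs, so SQS re-drives just those (this partial-batch response requires `ReportBatchItemFailures` to be enabled on the event source mapping). The JSON message shape with a `to` field and the `send_email` stand-in for the real SES call are assumptions for illustration:

```python
import json

def send_email(message):
    # Stand-in for the real SES call; the message shape is assumed.
    print(f"sending to {message['to']}")

def handler(event, context):
    """Worker Lambda for an SQS event source: processes messages in a
    batch and reports per-message failures so only those are retried."""
    failures = []
    for record in event["Records"]:
        try:
            send_email(json.loads(record["body"]))
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    # Partial-batch response: SQS deletes the successes and re-drives
    # only the failed messages.
    return {"batchItemFailures": failures}
```

Because the queue absorbs the burst, the worker can drain it at whatever rate its concurrency settings allow, regardless of how fast requests arrived.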
SQS has a maximum concurrency feature that allows you to control the Lambda function concurrency per SQS queue. Lambda also offers a reserved concurrency setting that allows you to set a concurrency ceiling at the Lambda function level. Both settings allow you to cap the concurrency for less time-sensitive requests and leave your Lambda concurrency for more time-critical requests. This also allows you to protect your downstream systems from being overloaded. What is even better is that API Gateway has native integration with SQS, so you don't need a Lambda function to enqueue the requests. This further reduces the Lambda concurrency needed to execute asynchronous requests.
SQS is not the only AWS service that can help you build decoupled microservices using asynchronous messaging patterns. You can also choose SNS for pub/sub, EventBridge for an event bus, and Amazon Kinesis for data streams. Apart from making your application more scalable, another benefit of asynchronous messaging patterns is that they allow you to build more decoupled microservices, and this enables the microservices to be deployed independently, scale independently, and fail independently. This in turn allows you to have autonomous teams managing those microservices end to end. As your application becomes more complex with many microservices and your engineering teams grow, it may become harder to manage account-level quotas and control the blast radius of your microservices. It may require careful calculation and coordination between different engineering teams.
This might not be ideal because we want our teams to stay autonomous and move fast. If your business gets to this level of scale, consider having different AWS accounts for different areas of your application. In this simplified example of an e-commerce application, we can have product-related services managed by Team A running in AWS account A. We can have order-related services managed by Team B running in AWS account B1, and email notification services also managed by Team B but running in AWS account B2. This setup allows Team A and Team B to manage their own dedicated service quotas without unnecessary inter-team coordination.
The account separation between the order and email services allows each service to operate independently at full capacity without needing to constrain throughput due to Lambda concurrency considerations. So in the real world, try to design your multi-account structure to fit your business needs and evolve your design as your business evolves. So to recap, we looked at ways of improving synchronous requests at scale.
We applied caching on different layers to improve the performance and scalability of your application. We turned non-time-sensitive requests into asynchronous requests. And we explored a multi-account structure to allow your microservices and your teams to operate more independently at scale. Next, let's hear from Cheyne on the Playvox story and see how they scaled their SaaS product on AWS. Thanks, Frank.
So first off, a bit about Playvox. So we're a digital-first workforce engagement platform. We're a global company, and we have three engineering teams distributed across the globe.
So we're in Australia, the UK and Colombia. Today, I'm going to take you on the journey of our Australian engineering team and our journey to serverless. So our early architecture was typical of what you would see in most web stacks. So we had Amazon CloudFront at the front serving up our web pages, routing via ALB towards Amazon ECS for hosting our APIs, and it was backed by RDS.
However, as our customer base grew, we started to find some challenges with this architecture. Some of these came out as operational challenges, and it was starting to become harder to manage. Architecturally and from a product focus, our platform was becoming more integrated with other SaaS platforms, so the requirements were changing. So what we really needed to get to was an event-driven architecture.
Now, the event-driven architecture with serverless was a natural fit. Today, our architecture is a mixture of the patterns I'm showing on the screen, where we use a combination of the asynchronous patterns Frank spoke about earlier, use different data stores, and then publish events between our microservice domains. The key benefits of having a domain-based microservices architecture are that we can scale up and down in different areas, and we can throttle the throughput in certain areas to manage isolation. We're able to create better DevOps alignment with our teams. So as we grow as a business and bring on more teams, we can align the teams better to the different domain services as we scale the org. As I just mentioned, one of the key benefits was our DevOps alignment and the ability to enable and empower teams to own their product end to end.
So we give our teams the whole domain service to run and own. They can manage that from a CI/CD pipeline, they deploy it to production, and they're able to get the observability platform to give them the information they need to measure and learn. This increases their ownership and their accountability in the teams, and they're able to run and own the services themselves. The other key benefit, and Frank touched on before around scaling, is that we're able to scale effortlessly.
So before, we'd have to manage our way through the different stages of scaling. Over 12 months, as you can see on the chart there, we were able to grow approximately 3x with minimal to no architectural changes. We were also able to meet increasing bursty traffic loads, and we still didn't really have to make any changes. So running high-throughput services was really easy. Another benefit was cost. Compared to our original architecture, when we normalise the cost per invocation, our new serverless architecture is actually three times more efficient.
So we're actually getting the benefit of scaling up effortlessly, but it's also a lot cheaper to run. However, it wouldn't be a talk, and it's my favourite part of most talks, without: what broke and what did you learn? So Frank spoke before about AWS account limits. Quick show of hands, who's broken an AWS quota on their account? Yeah, so I'm not alone, alright. We found out the hard way that, yes, Lambda@Edge functions and authorisers can also be included in that. And so we saw some interesting behaviour. So we use reserved concurrency in certain areas of our architecture to manage that risk.
But you really need to continue to pay attention to those limits as you grow. The other part is we really invested in observability early on, and so we're able to measure and learn from our architecture. And on those incidents where we did have problems with our concurrency limits, it was very obvious and we're able to get the mean time to recovery quite low because we knew where the problem was. So it's really critical for understanding how your architecture is performing.
The other benefit of it is that we're able to leverage this to then build out FinOps dashboards. And so we're able to give developers more information about the cost of their product so that they're able to look at it and optimise the solutions. One of the other benefits that Frank spoke about or touched on it earlier is caching. So we had this conversation going for a while around should we add a cache to a particular service.
And we deferred it to the last responsible moment, as you do in an agile way. But what we noticed was that as soon as we implemented it, we kind of regretted not doing it sooner, because it not only reduced the number of Lambda invocations (we put the cache at the API Gateway), but it also took a significant load off our database, and our databases were starting to scale down. So we found that it took a lot of effort out of that as well. So what's next as we go through the rest of our journey? As we move forward, we're going to continue to invest in the control plane, because we've started to find that we're managing serverless at a scale now where we need to do some custom engineering on it. We're going to start looking at workload optimisation.
So far, we've been looking at APIs and the CPU processing on that, but we're starting to get more into data processing, and so we're starting to look at the patterns we need around that so we can optimise there. Based on the way we run multi-tenancy on our SaaS platform, we're starting to look at advanced patterns around canaries and what the best use cases are for us. So there are some tools we can use out of the box, but there are going to be some custom patterns that we're going to implement ourselves.
And as Frank also touched on: performance and load testing. So we're good at that, but we need to get better, and we're always going to be striking the right balance between agility, getting the product out the door, and safety. So if I look back on our journey and the way we've gone: our Playvox cultural values have five pillars.
Here are three of them. So we're always learning. And as I look back on our journey, problems may require custom solutions or alternative techniques. There's the pattern, but you might have to adapt it to your use case. So don't be afraid to experiment. We always live by "what worked yesterday may not be the best solution for today", so we love to keep modifying, listening, and learning, and we're engaged with our AWS account team so that they can help us with the latest techniques and we try to adopt those as we go.
We do it now. So we try and move fast. We don't procrastinate. And that means that there's no best time to start. So we just went, there's no best time, let's just do it.
So we're not afraid to fail, but we try to fail fast. And then, lastly on there, change is good. We don't fear change and we love the engineers being out of their comfort zone. We empower engineers to own the outcome and we support them in the way we can. So our goal is to provide the engineers and teams with the information that they need to be successful. So thank you for listening.
And I'm going to welcome Frank back onto the stage to do the closing. Thank you, Cheyne. We are getting to the end of this session. I hope you found it useful. If you are craving more information, please scan the barcode or take a picture of this slide so you can explore more of what we covered today, as well as more general scaling techniques we didn't get time to cover. In particular, Serverless Land is a site that brings together the latest information and patterns for solving common business problems using serverless.
And the Amazon Builders' Library is a collection of articles on how Amazon develops and operates large applications at scale. And again, Skill Builder is a place where you can explore hundreds of free courses, hands-on labs, and games developed by the AWS Training and Certification team. So if you found this talk helpful and want to hear more content like this, please make sure you complete the survey and give this session a five-star review. This will help us prioritise and plan future sessions and educational content. Thank you very much, everyone, for attending this session.