AWS Lambda Reserved Concurrency vs. Provisioned Concurrency Scaling
AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications, and only pay for what you use.
When you deploy your code to AWS Lambda, it is bundled into a deployment package, and that package is executed whenever you invoke the Lambda function.
One of the best parts of AWS Lambda is that multiple instances of the same function can run at once, giving you maximum parallelism in your architecture. However, this parallelism can also cause problems such as throttling within your processing pipelines.
To help with this, AWS lets you configure concurrency limits on your Lambda functions so that you can manage concurrent invocations to suit your use case.
Types of Lambda Concurrencies
There are two types of concurrency controls available
- Reserved concurrency — Guarantees the maximum number of concurrent instances for the function. When a function has reserved concurrency configured, no other Lambda function in the same AWS account and Region can use that concurrency. There is no charge for configuring reserved concurrency for a function.
- Provisioned concurrency — Initializes a requested number of execution environments so that they are prepared to respond immediately to your function’s invocations. Note that configuring provisioned concurrency incurs charges to your AWS account.
Reserved Concurrency Configuration
Concurrency is the number of requests that your function is serving at any given time. When your function is invoked, Lambda allocates an instance of it to process the event. When the function code finishes executing, that instance can handle another request.
If the function is invoked again while a request is still being processed, another instance is allocated, which increases the function’s concurrency. The total concurrency for all of the functions in your account is subject to a per-Region quota defined by AWS. In other words, the combined concurrency of all the Lambda functions in one Region of your AWS account cannot exceed the total concurrency limit that AWS defines for that Region.
Note: Keep the maximum total concurrency limit for your Region in mind while designing a Lambda-based architecture. You can increase this limit by contacting AWS Support; see the Lambda function scaling documentation in the references below for more information.
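If you want to check these limits programmatically, here is a minimal sketch using boto3 and the GetAccountSettings API; the Region name below is an assumed placeholder, not part of the original steps.

```python
import boto3

# The Region is an assumed placeholder; use the Region your functions run in.
lambda_client = boto3.client("lambda", region_name="us-east-1")

settings = lambda_client.get_account_settings()
limits = settings["AccountLimit"]

# Total concurrent executions allowed across all functions in this Region.
print("Region-wide concurrency limit:", limits["ConcurrentExecutions"])

# Concurrency left in the shared pool after all reserved allocations.
print("Unreserved concurrency pool:", limits["UnreservedConcurrentExecutions"])
```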
Steps to Configure Reserved Concurrency:
- Open the Functions page of the Lambda console and select the function.
- Choose Configuration and then choose Concurrency.
- Under Concurrency, choose Edit.
- Choose Reserve concurrency. Enter the amount of concurrency to reserve for the function and then click on Save.
You can reserve up to the Unreserved account concurrency value that is shown, minus 100 for functions that don’t have reserved concurrency. To throttle a function, set the reserved concurrency to zero. This stops any events from being processed until you remove the limit.
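The same configuration can also be applied outside the console. The sketch below uses the boto3 PutFunctionConcurrency API; the function name my-function is a placeholder.

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 50 concurrent executions for the function ("my-function" is a placeholder).
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=50,
)

# Setting reserved concurrency to zero throttles the function entirely,
# stopping all event processing until the limit is removed.
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=0,
)

# Deleting the configuration returns the function to the unreserved pool.
lambda_client.delete_function_concurrency(FunctionName="my-function")
```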
Advantages of using Reserved Concurrency setting
- Other functions can’t prevent your function from scaling — All of your account’s functions in the same Region that don’t have reserved concurrency share the pool of unreserved concurrency (the maximum concurrency limit). Without reserved concurrency, other functions can use up all of the available concurrency, which prevents your function from scaling up at a critical moment. Defining reserved concurrency prevents this from happening.
- Your function can’t scale out of control — Reserved concurrency also limits your function from using concurrency from the unreserved pool, which caps its maximum concurrency. You can reserve concurrency to prevent your function from using all the available concurrency in the Region, or from overloading downstream resources.
Provisioned Concurrency Configuration
When Lambda allocates an instance of your function, the runtime loads your function’s code and runs initialization code that you define outside of the handler. If your code and dependencies are large, or you create SDK clients during initialization, this process can take some time. When your function has not been used for some time, needs to scale up, or when you update a function, Lambda creates new execution environments. This causes the portion of requests that are served by new instances to have higher latency than the rest, otherwise known as a cold start.
By allocating provisioned concurrency before an increase in invocations, you can ensure that all requests are served by initialized instances with low latency. Lambda functions configured with provisioned concurrency run with consistent start-up latency, making them ideal for building interactive mobile or web backends and latency sensitive microservices.
Each version of a function can only have one provisioned concurrency configuration. This can be directly on the version itself, or on an alias that points to the version. Two aliases can’t allocate provisioned concurrency for the same version.
Steps to Configure Provisioned Concurrency:
- Open the Functions page of the Lambda console and select the Lambda function.
- Choose Configuration and then choose Concurrency.
- Under Provisioned concurrency configurations, choose Add configuration.
- Choose an alias or version of the Lambda function, and then enter the amount of provisioned concurrency to allocate. Then choose Save.
If you change the version that an alias points to, Lambda deallocates the provisioned concurrency from the old version and allocates it to the new version. You can add a routing configuration to an alias that has provisioned concurrency. For more information, see Lambda function aliases.
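As a minimal sketch of the same setup via the API (the function name my-function and the alias live are placeholders), the boto3 PutProvisionedConcurrencyConfig call allocates provisioned concurrency on an alias, and GetProvisionedConcurrencyConfig reports when the environments are ready:

```python
import boto3

lambda_client = boto3.client("lambda")

# Allocate 10 pre-initialized execution environments on the "live" alias.
# Both the function name and the alias here are placeholders.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",
    ProvisionedConcurrentExecutions=10,
)

# Allocation takes a short while; check the status until it reports READY.
status = lambda_client.get_provisioned_concurrency_config(
    FunctionName="my-function",
    Qualifier="live",
)
print(status["Status"], status["AvailableProvisionedConcurrentExecutions"])
```

Because provisioned concurrency incurs charges while it is allocated, it is often managed with Application Auto Scaling around predictable traffic peaks (see the references below).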
Conclusion
This blog should help you understand and gain better control over your Lambda functions, which is an important step when designing serverless architectures with AWS Lambda. With the right concurrency configuration, a number of throttling issues can be avoided at multiple layers of the architecture.
I hope this blog was helpful for cloud engineers working on AWS who are facing these issues and looking for a lasting solution. If you have any questions about AWS Lambda, please feel free to reach out in the comments below.
References
- Lambda function scaling: https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html
- Configuring concurrency with the Lambda API: https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html#configuration-concurrency-api
- Optimizing latency with provisioned concurrency: https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html
- Managing provisioned concurrency with Application Auto Scaling : https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html#managing-provisioned-concurency