Lambda tuning for DBAs – 130X improvement

Problem description - Initial latency numbers

With the advent of the cloud, people are using Lambda functions to automate repetitive tasks. You can use Python, Node.js, etc. as the programming language for your Lambda function. Writing the function is one thing, but getting it to work optimally can be challenging. We are presenting a performance tuning scenario where we improved Lambda performance by more than 130X (130 times): 30,000 ms (before tuning) vs 230 ms (after tuning).

Our Lambda function does a bunch of inserts every few seconds; initially each execution was taking around 30 seconds and generating a lot of errors. We are using Postgres as the backend database for this service.

Below are the initial performance numbers. As you can see, executions were taking up to 30 seconds and there were a lot of errors. (Please ignore the part of the graph where the latency was good in between; that was because there were fewer invocations and a restart of the ingestion task.)

Region change - Make sure your backend is in the same region

Initially our backend server was in the Ohio region whereas our Lambda function was in the California region. Once we moved our Postgres server to the California region, the average duration went down to 15 seconds (earlier it was 30 seconds).

The diagram below depicts the latency improvement after this change.

How do you find where the time is being spent in Lambda? A time breakdown will help in identifying the root cause

AWS X-Ray is a tracing service that can give you this breakdown, i.e. the time taken at each stage of execution. You can enable it in the 'Configuration' section of your Lambda function. Active tracing is not enabled by default; once you enable it you can see the visual map.
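If you prefer the CLI over the console, active tracing can also be turned on with a command like the one below (the function name is a placeholder):

aws lambda update-function-configuration \
    --function-name my-function \
    --tracing-config Mode=Active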

Once you enable active tracing you will be able to see a service map like the one below (NOTE: this is a sample image for your reference and not one from our tests).

Is Lambda closing the connections properly?

One other issue we faced was connection management. Depending on your runtime (Node.js/Python/..), please make sure that you are closing and managing connections properly. In Postgres, creating too many unwanted connections can add a lot of overhead (using PgBouncer/Pgpool can help).
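As an illustration, here is a minimal Python sketch (psycopg2, the PG_DSN environment variable and the table name are assumptions, not our exact code) that guarantees the connection is always closed even when the insert fails; a later section shows the alternative of reusing one connection across warm invocations:

import os
import psycopg2

def handler(event, context):
    conn = psycopg2.connect(os.environ["PG_DSN"])   # DSN supplied via environment variable
    try:
        with conn, conn.cursor() as cur:            # commits on success, rolls back on error
            cur.execute("INSERT INTO metrics (payload) VALUES (%s)", (event["payload"],))
    finally:
        conn.close()                                # never leave idle sessions behind
    return {"status": "ok"}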

As our Lambda function was doing frequent inserts, we wanted to check the mean execution time of the inserts. As you can see below, our inserts were doing fine (0.2 to 32 milliseconds). While the whole Lambda invocation was taking 15 seconds, the inserts themselves only took a few milliseconds.
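A query like the one below can confirm this (it assumes the pg_stat_statements extension is installed and Postgres 13 or newer, where the column is called mean_exec_time; older versions use mean_time):

SELECT query, calls, mean_exec_time, max_exec_time
FROM pg_stat_statements
WHERE query ILIKE 'insert%'
ORDER BY mean_exec_time DESC
LIMIT 20;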

The next thing we noticed was a lot of idle connections. We fixed our Lambda to close connections properly and not create unwanted connections. We also added PgBouncer to pool connections.
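A quick way to spot them is to count sessions by state in pg_stat_activity:

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;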

Another clue came from the stats below: a lot of sessions were being abandoned, which is not good.
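On Postgres 14 and newer, pg_stat_database exposes per-database session counters that make this easy to check, for example:

SELECT datname, sessions, sessions_abandoned, sessions_fatal, sessions_killed
FROM pg_stat_database
WHERE datname = current_database();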

Please note that you can use parameters like idle_session_timeout etc. to control this behavior. NOTE: when you set idle_session_timeout at the global level, be extremely careful not to impact application sessions that you actually need.
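For example, you can scope the timeout to the role your Lambda connects as instead of setting it globally (the role name below is hypothetical; idle_session_timeout is available from Postgres 14 onwards):

ALTER ROLE lambda_writer SET idle_session_timeout = '60s';
-- or globally, with caution:
ALTER SYSTEM SET idle_session_timeout = '5min';
SELECT pg_reload_conf();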

One other thing we did was to merge multiple inserts into one single statement, which helped in reducing the per-statement overhead.
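As a simple illustration (table and columns are hypothetical), a multi-row INSERT sends one statement and one round trip instead of several:

-- instead of many single-row statements ...
INSERT INTO metrics (ts, payload) VALUES (now(), 'a');
INSERT INTO metrics (ts, payload) VALUES (now(), 'b');
-- ... batch the rows into one statement
INSERT INTO metrics (ts, payload) VALUES (now(), 'a'), (now(), 'b'), (now(), 'c');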

Cold starts can be a problem

When a Lambda function is invoked, it internally downloads the code from S3 and prepares a runtime environment before actually executing the code. This stage is called a "cold start". Repeated executions of the same Lambda function reuse the environment, which is called a "warm start". Ideally we want to see as few cold starts as possible.

You can use the query below (a CloudWatch Logs Insights query) to see if cold starts are happening

filter @type = "REPORT"
| fields @memorySize / 1000000 as memorySize
| filter @message like /(?i)(Init Duration)/
| parse @message /^REPORT.*Init Duration: (?<initDuration>.*) ms.*/
| parse @log /^.*\/aws\/lambda\/(?<functionName>.*)/
| stats count() as coldStarts, median(initDuration) as avgInitDuration, max(initDuration) as maxInitDuration by functionName, memorySize

Query source – AWS docs

In our case cold starts were happening only occasionally (below is a screenshot for a 12-hour range). If you are seeing excessive cold starts, you should address this urgently. You can reduce cold starts with provisioned concurrency and other methods.
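As a sketch, provisioned concurrency can be configured with a command like the one below (the function name, alias and count are placeholders; the qualifier must be a published version or alias):

aws lambda put-provisioned-concurrency-config \
    --function-name my-function \
    --qualifier prod \
    --provisioned-concurrent-executions 5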

Memory and computing power

You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB.

In our case the default was 128 MB and changing it to suit our needs helped. You can also leverage https://github.com/alexcasalboni/aws-lambda-power-tuning to tune your memory/compute settings.
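For reference, the memory setting can also be changed from the CLI (function name is a placeholder):

aws lambda update-function-configuration \
    --function-name my-function \
    --memory-size 512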

After changing it to 512 MB there was a huge improvement (as shown below).

Cloudwatch logs - Check for errors

You can run CloudWatch Logs Insights queries to check for errors and other issues.

You can check average latency using something like below

filter @type = "REPORT"
| stats avg(@duration), max(@duration), min(@duration) by bin(5m)

You can check for errors using something like below

fields Timestamp, LogLevel, Message
| filter LogLevel == "ERR"
| sort @timestamp desc
| limit 100

Throttling issues

Sometimes you might notice throttling issues as seen below

In our case we had originally set a low value for reserved concurrency, and increasing it helped. Please set these values according to your needs. There are also account-level concurrency limits that can trigger throttles; please open a ticket with AWS Support to raise this limit as needed.
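For reference, reserved concurrency can be raised with a command like the one below (function name and value are placeholders; pick a value that fits your workload and account limits):

aws lambda put-function-concurrency \
    --function-name my-function \
    --reserved-concurrent-executions 100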

Try to reuse connections

Below is an excerpt from the AWS docs that explains how opening a database connection outside of the function handler can help.
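As an illustration of that pattern, here is a minimal Python sketch (psycopg2 and the PG_DSN environment variable are assumptions) that opens the connection once per execution environment and reuses it on warm invocations:

import os
import psycopg2

# Created once per execution environment; warm invocations reuse it
conn = psycopg2.connect(os.environ["PG_DSN"])

def handler(event, context):
    global conn
    if conn.closed:                       # re-open if the server dropped the connection
        conn = psycopg2.connect(os.environ["PG_DSN"])
    with conn, conn.cursor() as cur:      # commit on success, rollback on error
        cur.execute("INSERT INTO metrics (payload) VALUES (%s)", (event["payload"],))
    return {"status": "ok"}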

Using an SQS queue can help

Depending on your use case, putting an SQS queue in front of the function can help; you can use an SQS dead-letter queue (DLQ) to retry failed invocations and also throttle executions via the queue.
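As a sketch, wiring the queue to the function with an event source mapping looks roughly like this (the ARN, names and batch size are placeholders; the DLQ itself is attached to the source queue via its redrive policy):

aws lambda create-event-source-mapping \
    --function-name my-function \
    --event-source-arn arn:aws:sqs:us-west-1:123456789012:my-ingest-queue \
    --batch-size 10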

Conclusion

Following the above steps, we were able to reduce the average latency to 230 milliseconds (from 30,000 milliseconds). Lambda management is different from cron jobs and the other methods we were using in non-cloud environments. Please also check our other performance articles: PgBouncer multiple instances, Postgres and temporary files, Pgpool and performance tuning, etc.

Looking for Postgres support? Please reach us at support@klouddb.io
