Understand Provisioned Concurrency for AWS Lambda

Artem Arkhipov, Full Stack Web Expert at Techmagic

We love AWS Lambda for:

- Pay-as-you-go pricing model

- Burst scaling

- Faster development & TTM

- No infrastructure maintenance

What we wished for over 5 years:

- Monitoring/debug tools

- Local development

- Cope with cold starts

- Better packaging and deployment

- Longer execution and workflows

sls plugins

from 5 to 15 min

warming approaches

provisioned concurrency

How does AWS Lambda work?

Container

Container 2

Container n

Container Lifecycle

Screenshot from Re:Invent (2017) Video

Initialize

Code

Language and Cold Start Correlation

Lower is better

Initialisation Time

Cold Start depends on:

- Function's language

- Initialization code / out of handler

- VPC affiliation

- Package Size

- Memory allocation
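The "initialization code / out of handler" factor can be illustrated with a minimal sketch. It assumes a Node.js function; `heavyInit` and `initCount` are hypothetical names used only to show that code outside the handler runs once per container, while the handler runs on every invocation.

```javascript
// Code outside the handler runs once per container, during a cold start.
let initCount = 0;

// Hypothetical stand-in for expensive setup (loading files, opening connections).
function heavyInit() {
  initCount += 1;
  return { startedAt: Date.now() };
}

const state = heavyInit(); // executed once, when the container is initialized

// The handler runs on every invocation and reuses the already initialized state.
const handler = async (event) => {
  return { initCount, startedAt: state.startedAt };
};

module.exports = { handler };
```

Every warm invocation served by the same container skips `heavyInit`, which is why the cost of this section shows up only in cold starts.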

Probability of Container to be destroyed

data source: investigation by Mikhail Shilkov (mikhail.io)

Serverless developers:

AWS Lambda:

*keeps containers alive for ~10 minutes of inactivity*

Let's call it every 5-10 minutes

Classic Solution

module.exports.handler = async (event) => {
  if (event.isWarmUp) return;
  
  // do the job here
};

The Simplest Approach

{ "isWarmUp": true }

Sent every 6 mins, keeps 1 container warm constantly

exports.handler = async (event) => {
  // if a warming event
  if (event.isWarmUp) {
    if (event.concurrency) {
      // run N parallel calls to the same LF
    }
    await new Promise(r => setTimeout(r, 75));
    return 'done';
  }
  
  // main job code
}

More Parallel Containers

{ "isWarmUp": true, "concurrency": 5 }

every 6 mins

call itself 5 times in parallel and add a short delay
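The placeholder "run N parallel calls" could be filled in roughly as follows. This is a sketch under assumptions: `warmUp` and `makeHandler` are hypothetical helpers, and `invoke` is an injected function that is expected to call the same Lambda function (for example by wrapping the Lambda Invoke API), so the sketch itself does not depend on any SDK.

```javascript
// Hypothetical helper: fan out `concurrency` warm-up calls in parallel.
// `invoke` is injected; in a real function it would invoke this same Lambda.
async function warmUp(invoke, concurrency, delayMs = 75) {
  const calls = [];
  for (let i = 0; i < concurrency; i++) {
    // No `concurrency` field in the nested payload, so the fan-out does not recurse.
    calls.push(invoke({ isWarmUp: true }));
  }
  await Promise.all(calls);
  // The short delay keeps each warmed container busy for a moment,
  // so parallel warm-up calls are served by different containers.
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return 'done';
}

const makeHandler = (invoke) => async (event) => {
  if (event.isWarmUp) {
    return warmUp(invoke, event.concurrency || 0);
  }
  // main job code
  return 'main job';
};
```

Injecting `invoke` keeps the warm-up logic testable locally with a fake function before wiring it to a real client.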

Old style warm-up possible with:

- Your own code, following the main principles above

- External packages

- External service (lol)

- Serverless plugin

npm serverless-plugin-warmup

npm lambda-warmer

const warmer = require('lambda-warmer');
exports.handler = async (event) => {
  if (await warmer(event)) return;
  // main job
}

AWS Lambda

Provisioned Concurrency

It does literally the same.

It keeps warm containers for the function.

~ 2 - 5 mins

Let's Test It

- Standard, warm-up and provisioned approaches

- Heavy initialization job (one-time)

- Simulate load spike to 500 req in ~15 seconds

- ~200MB package size

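const fs = require('fs');

// Heavy one-time initialization: runs once per container, outside the handler
const job = fs.readFileSync('./9MB_FILE.txt').toString().split('');

module.exports.handler = async (event) => {
  await new Promise(r => setTimeout(r, 450)); // main job delay
  // body is serialized to a string, as an HTTP proxy response expects
  return { statusCode: 200, body: JSON.stringify({}) };
};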

	Standard AWS Lambda (price per GB-second)	Provisioned Concurrency (price per GB-second)
Requests	0.2 $ / 1 million	0.2 $ / 1 million
Execution Duration	0.000016667 $	0.000010841 $
Provisioning	0	0.000004646 $

Pricing

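1 h Idle Time (total)

Provisioned Concurrency: 0.0167 $ / h x 1 concurrency

Provisioned Concurrency: 0.167 $ / h x 10 concurrency

1 h Exec Time (100% util)

Standard: 0.06 $ / h

Provisioned Concurrency: 0.0557 $ / h (0.000010841 $ + 0.000004646 $ per second)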

Chart: cost vs. utilization %, Standard compared with Provisioned Concurrency
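The relationship between cost and utilization can be worked out from the per-second prices in the pricing table. The sketch below assumes 1 GB of memory and ignores the request charge, which is the same in both models; `hourlyCost` and `breakEvenUtilization` are hypothetical helper names.

```javascript
// Per GB-second prices taken from the pricing table in this deck.
const PRICE = {
  standardExec: 0.000016667,
  provisionedExec: 0.000010841,
  provisioning: 0.000004646,
};

// Cost of one hour for one container, given the fraction of the hour it is busy.
function hourlyCost(utilization, memoryGb = 1) {
  const busySeconds = 3600 * utilization;
  return {
    standard: PRICE.standardExec * busySeconds * memoryGb,
    // Provisioning is billed for the whole hour, execution only while busy.
    provisioned:
      (PRICE.provisioning * 3600 + PRICE.provisionedExec * busySeconds) * memoryGb,
  };
}

// Utilization at which both models cost the same.
function breakEvenUtilization() {
  return PRICE.provisioning / (PRICE.standardExec - PRICE.provisionedExec);
}

console.log(hourlyCost(0), hourlyCost(1), breakEvenUtilization());
```

With these prices the break-even point is at roughly 80% utilization: below it the standard model is cheaper, above it provisioned concurrency is.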

You should never enable provisioned concurrency carelessly

Example:

1 Lambda, 1GB memory, 10 concurrency, 1 month

Old solution:

10 short invocations (100ms)

Every 6 minutes (10 times / hour)

24 hours, 30 days

(0.000016667/10) * 10 * 10 * 24 * 30 = 0.12 $

With 1 sec per invocation: 0.000016667 * 10 * 10 * 24 * 30 = 1.20 $

Provisioned Concurrency:

10 containers

0.000004646 $ / second

24 hours, 30 days

0.000004646 * 10 * 3600 * 24 * 30 = 120.4 $

T3.Med

116 $

T3.Med
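The monthly arithmetic of this example can be reproduced with a short script. It is a sketch that assumes 1 GB of memory and a 30-day month, and it leaves out the per-request charge, as the slide does; the function names are hypothetical.

```javascript
const STANDARD_PER_GB_SECOND = 0.000016667;
const PROVISIONING_PER_GB_SECOND = 0.000004646;
const HOURS_PER_MONTH = 24 * 30;

// Warm-up approach: pay only for the short warm-up invocations.
function warmerMonthlyCost({ containers, secondsPerInvocation, invocationsPerHour }) {
  const billedSeconds =
    containers * secondsPerInvocation * invocationsPerHour * HOURS_PER_MONTH;
  return STANDARD_PER_GB_SECOND * billedSeconds;
}

// Always-on provisioned concurrency: pay for every second of the month.
function provisionedMonthlyCost({ containers }) {
  return PROVISIONING_PER_GB_SECOND * containers * 3600 * HOURS_PER_MONTH;
}

const short = warmerMonthlyCost({ containers: 10, secondsPerInvocation: 0.1, invocationsPerHour: 10 });
const long = warmerMonthlyCost({ containers: 10, secondsPerInvocation: 1, invocationsPerHour: 10 });
const provisioned = provisionedMonthlyCost({ containers: 10 });

console.log(short.toFixed(2), long.toFixed(2), provisioned.toFixed(1));
```

Even with one-second warm-up invocations, the warmer costs about a hundredth of always-on provisioning for the same 10 containers.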

The provisioned concurrency feature has great potential, but at the moment it suits quite specific use cases and is not a "silver bullet".

Provisioned concurrency

- Official and reliable feature from AWS

- Cost-effective in case of high utilization

- Can scale to a huge number of containers

- Integration with AWS Auto-Scaling

When to use Provisioned Concurrency

Regular, predictable high load spikes

Chart: provisioned concurrency of 250 during reporting time (17:30 to 19:00)

Scheduled provisioning

Chart: lambda warmer combined with provisioned concurrency of 250 during reporting time (17:30 to 19:00)

You can combine the approaches

Scheduled provisioning

AWS Auto-Scaling, via CLI

AWS CLI

$ aws --region eu-west-1 application-autoscaling put-scheduled-action \
  --service-namespace lambda --scheduled-action-name MyScheduledProvision1 \
  --resource-id function:report-lambda:provision \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scalable-target-action MinCapacity=250,MaxCapacity=250 \
  --schedule "at(2020-02-20T17:25:00)"

With a separate scheduled Lambda and AWS SDK

const AWS = require('aws-sdk'); // AWS SDK for JavaScript v2

const lambda = new AWS.Lambda();
const params = {
  FunctionName: 'report-lambda',
  ProvisionedConcurrentExecutions: 250,
  Qualifier: 'provision' // alias or version to provision
};

lambda.putProvisionedConcurrencyConfig(params, function(err, data) {
  if (err) console.log(err, err.stack);
  else     console.log(data);
});
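A scheduled Lambda usually needs to switch provisioning off again after the spike. The sketch below is one possible shape, not a definitive implementation: the `{ action: 'start' | 'stop' }` event format, `TARGET`, `buildProvisioningRequest` and `makeScheduleHandler` are assumptions for illustration, and `lambdaClient` is expected to be an AWS SDK v2 `AWS.Lambda` instance passed in from outside.

```javascript
// Hypothetical target, mirroring the names used in the example above.
const TARGET = { functionName: 'report-lambda', qualifier: 'provision', concurrency: 250 };

// Map a schedule action to the Lambda API call that applies it.
function buildProvisioningRequest(action, target) {
  if (action === 'start') {
    return {
      method: 'putProvisionedConcurrencyConfig',
      params: {
        FunctionName: target.functionName,
        Qualifier: target.qualifier,
        ProvisionedConcurrentExecutions: target.concurrency,
      },
    };
  }
  if (action === 'stop') {
    return {
      method: 'deleteProvisionedConcurrencyConfig',
      params: { FunctionName: target.functionName, Qualifier: target.qualifier },
    };
  }
  throw new Error(`Unknown action: ${action}`);
}

// `lambdaClient` is injected; with AWS SDK v2 it would be `new AWS.Lambda()`.
const makeScheduleHandler = (lambdaClient) => async (event) => {
  const request = buildProvisioningRequest(event.action, TARGET);
  return lambdaClient[request.method](request.params).promise();
};
```

Two scheduled rules could then trigger the same function, one sending the assumed `start` event shortly before the spike and one sending `stop` after it.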

Provisioning according to utilization

AWS Auto-Scaling, via CLI

AWS CLI

$ aws --region eu-west-1 application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:report-function:alias \
  --policy-name UtilTestPolicy --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration file://config.json

{
  "TargetValue": 0.75,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
    ...
  }
}

config.json
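The meaning of a 0.75 target can be illustrated with simple arithmetic. This is a simplified model and an assumption, not the algorithm Application Auto Scaling actually runs: it only shows how much provisioned concurrency would bring utilization back to the target for a given number of containers in use. Both function names are hypothetical.

```javascript
// Utilization = containers in use / containers provisioned.
function utilization(inUse, provisioned) {
  return inUse / provisioned;
}

// Simplified model: capacity needed so that `inUse` containers
// correspond to the target utilization (for example 0.75).
function desiredProvisionedConcurrency(inUse, targetValue) {
  return Math.ceil(inUse / targetValue);
}

// 90 of 100 provisioned containers busy: utilization 0.9, above the 0.75 target.
console.log(utilization(90, 100), desiredProvisionedConcurrency(90, 0.75));
```

In this model, 90 busy containers call for 120 provisioned ones, and a drop to 30 busy containers would allow scaling in to 40.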

Provisioning according to utilization

Chart: provisioned concurrency over time, with a marker labeled "trigger increasing concurrency"

Wavy and smooth load changes + AWS autoscaling

Chart: time axis from 13:45 to 14:15

As of January 2020, when this presentation was prepared,

there are some issues with this approach.

Utilization-based autoscaling does not work effectively yet and is going to be fixed

What we learned today

- Always-on provisioned concurrency is very expensive

- Warmers are still in the game

- Provisioned concurrency needs a smart approach

- It may work for really high-load systems

- It may work if the business impact of cold starts is more expensive than enabling the feature

Installs of serverless-plugin-warmup increased by 30% over the last two months

"Make the “prewarmer” events 15% more expensive than a regular Lambda invocation if need be, to cover the overhead of scheduling or whatever other magic is happening. But don’t take away my scale-to-zero billing model. That’s what I came to Lambda for in the first place."

Forrest Brazeal, Cloud Architect & AWS Serverless Hero

Artem Arkhipov, Web Development Expert at Techmagic

ar.arkhipov@gmail.co

skype: tamango92

twitter: ar_arkhipov

Cheers!

AWS Lambda - Provisioned Concurrency in Details

By Artem Arkhipov

What are cold starts? Should we cope with them, and how? What is the "provisioned concurrency" feature? Is it always worth using?
