AWS Lambda Cold Starts Explained

By Oleksandr Andrushchenko

AWS Lambda cold starts are one of the most discussed topics in serverless architecture. A cold start happens when AWS Lambda needs to create a new execution environment before running your function. This extra initialization time can increase latency, especially for user-facing APIs.

Cold starts are not always a problem. For background jobs, scheduled tasks, SQS workers, and file processing, the extra startup time, whether milliseconds or seconds, is often acceptable. But for APIs, webhooks, real-time workflows, and other latency-sensitive systems, cold starts can directly affect user experience.

What Is an AWS Lambda Cold Start?

A cold start happens when Lambda does not have an existing execution environment ready for your function. AWS must create a new environment, prepare the runtime, load your code, initialize dependencies, and then call your handler.

Cold start:
Create execution environment
  -> Initialize runtime
  -> Load function code
  -> Run initialization code
  -> Invoke handler

A cold start adds extra latency before your business logic runs.

Lambda Execution Lifecycle

Lambda execution has two important phases: init and invoke.

| Phase | What Happens | Performance Impact |
|---|---|---|
| Init phase | Runtime starts, code loads, global code runs | Affects cold start time |
| Invoke phase | Handler processes the event | Affects normal execution duration |

import boto3

# Init phase
s3_client = boto3.client("s3")

def lambda_handler(event, context):
    # Invoke phase
    return {
        "message": "Hello from Lambda"
    }

Important: code outside the handler runs during initialization. Heavy imports, expensive setup, and network calls outside the handler can make cold starts slower.
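
One way to see this cost is to time the module-level code yourself. The following is a minimal sketch, not a Lambda feature: `INIT_MS` and the stand-in lookup table are hypothetical names chosen for illustration, and the measurement covers only the code between the two timestamps, not the runtime startup that happens before your module loads.

```python
import time

_init_started = time.perf_counter()

# Stand-in for real module-level setup (imports, client creation, config parsing).
LOOKUP_TABLE = {n: n * n for n in range(10_000)}

# Time spent in module-level code; computed once per execution environment.
INIT_MS = (time.perf_counter() - _init_started) * 1000


def lambda_handler(event, context):
    # INIT_MS stays the same across warm invocations of this environment.
    return {"initMs": round(INIT_MS, 2)}
```

Logging a value like this next to the Init Duration that Lambda reports can help show how much of the init phase comes from your own code.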

Cold Start vs Warm Start

A warm start happens when Lambda reuses an existing execution environment. In that case, initialization has already happened, so Lambda can call the handler faster.

| Invocation Type | What Happens | Typical Result |
|---|---|---|
| Cold start | New environment is created | Slower first invocation |
| Warm start | Existing environment is reused | Faster invocation |

First request after scale-up:
cold start

Next request using same environment:
warm start

Key point: warm starts are not guaranteed. Lambda may reuse an environment, but your application should never depend on reuse for correctness.

What Causes Cold Starts?

Cold starts are affected by several factors. Some are controlled by AWS, but many are influenced by your code, dependencies, configuration, and architecture.

| Factor | Why It Matters |
|---|---|
| Runtime | Some runtimes initialize faster than others |
| Package size | Larger packages take longer to load and initialize |
| Dependencies | Heavy imports increase init time |
| Initialization code | Global setup runs before the handler |
| VPC configuration | Private networking can add complexity and latency |
| Memory setting | More memory also gives more CPU, which can speed initialization |
| Traffic pattern | Bursty traffic may require many new environments |

How to Detect Cold Starts

Cold starts should be measured, not guessed. You can detect them using CloudWatch logs, metrics, tracing, or a simple global variable flag.

CloudWatch Init Duration

For cold invocations, the REPORT line that Lambda writes to CloudWatch Logs includes an Init Duration field, which shows how long the initialization phase took. Warm invocations omit the field.

REPORT RequestId: abc...
Duration: 120.45 ms
Billed Duration: 121 ms
Init Duration: 450.32 ms
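
If you export these log lines for analysis, the field can be extracted with a regular expression. This is a minimal sketch that assumes the field appears as `Init Duration: <number> ms`, as in the sample above; the helper name `init_duration_ms` is hypothetical.

```python
import re

INIT_DURATION = re.compile(r"Init Duration: (?P<ms>\d+(?:\.\d+)?) ms")


def init_duration_ms(report_line):
    """Return the Init Duration in milliseconds, or None if the field is absent."""
    match = INIT_DURATION.search(report_line)
    return float(match.group("ms")) if match else None
```

A `None` result means the line has no Init Duration field, which is what you would expect for a warm invocation.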

Manual Cold Start Flag

You can also track cold starts yourself with a global variable.

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

is_cold_start = True

def lambda_handler(event, context):
    global is_cold_start

    logger.info(json.dumps({
        "requestId": context.aws_request_id,
        "coldStart": is_cold_start
    }))

    is_cold_start = False

    return {
        "status": "ok"
    }

Rule of thumb: track cold starts separately from handler duration. Otherwise, you may optimize the wrong thing.
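
To illustrate the rule of thumb, the sketch below splits invocation records by the `coldStart` flag before summarizing them. The `durationMs` field is an assumption: the logging example above does not emit it, so you would need to add it or join it from the REPORT line.

```python
from statistics import median


def summarize(records):
    """Split durations by the coldStart flag and summarize each group."""
    cold = [r["durationMs"] for r in records if r["coldStart"]]
    warm = [r["durationMs"] for r in records if not r["coldStart"]]
    return {
        "coldStartRate": len(cold) / len(records) if records else 0.0,
        "coldMedianMs": median(cold) if cold else None,
        "warmMedianMs": median(warm) if warm else None,
    }
```

If the warm median is already high, the handler itself is the bottleneck and cold start work is unlikely to help much.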

Reduce Package Size

Large deployment packages can increase cold start time. A small Lambda package is easier to load, deploy, inspect, and maintain.

Remove Unused Files

Common package bloat:
- tests
- documentation
- local virtual environments
- cache directories
- unused libraries
- large example files
- development-only tools
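
A quick scan of the build directory can surface some of this bloat before you deploy. The sketch below is only a starting point: the directory names in `BLOAT_DIR_NAMES` are assumptions about a typical Python project, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed directory names that rarely belong in a deployment package.
BLOAT_DIR_NAMES = {"tests", "test", "docs", "__pycache__", ".venv", "venv", ".pytest_cache"}


def is_bloat(relative_path):
    """Return True if any parent directory of the file has a known bloat name."""
    return any(part in BLOAT_DIR_NAMES for part in Path(relative_path).parts[:-1])


def find_bloat(package_dir):
    """Return (relative_path, size_bytes) for suspect files, largest first."""
    root = Path(package_dir)
    hits = [
        (str(path.relative_to(root)), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file() and is_bloat(path.relative_to(root))
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_bloat("build/")` on a hypothetical build directory would list the largest suspect files first, which is usually where trimming pays off most.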

Review Dependencies

Do not include dependencies just because they are convenient. Some packages pull large transitive dependency trees.

| Situation | Better Choice |
|---|---|
| Simple JSON transformation | Use standard library |
| One HTTP call | Use a lightweight client |
| Small validation logic | Avoid importing a large framework if not needed |
| Heavy data processing | Consider whether Lambda is the right compute model |

Rule of thumb: every dependency should justify its cold start cost.
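
One way to put a number on that cost is to time the import in a fresh interpreter. The helper below is a rough sketch with a hypothetical name, `time_import`. CPython's `python -X importtime` flag gives a more detailed per-module breakdown.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Import `module_name` and report how long the import took.

    A module already in sys.modules is served from the import cache, so the
    timing is only meaningful for the first import in a fresh interpreter.
    """
    already_loaded = module_name in sys.modules
    started = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - started
    return {"module": module_name, "seconds": elapsed, "cached": already_loaded}
```

Running this locally gives only a relative comparison between dependencies. Absolute numbers inside Lambda depend on the memory setting and package layout.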

Optimize Initialization Code

Initialization code runs before your handler. Keep it small and predictable.

Bad Initialization Example

# Bad: remote call during initialization
config = load_config_from_remote_api()

def lambda_handler(event, context):
    return {
        "config": config
    }

This makes every cold start depend on a remote API call.

Better Initialization Example

config = None

def get_config():
    global config

    if config is None:
        config = load_config_from_remote_api()

    return config

def lambda_handler(event, context):
    return {
        "config": get_config()
    }

This loads configuration only when needed and reuses it during warm invocations.

Reuse SDK Clients and Connections

Reusable clients are one of the best things to initialize outside the handler. Creating AWS SDK clients on every invocation wastes time.

AWS SDK Client Reuse

import boto3

# Created during init and reused during warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")

def lambda_handler(event, context):
    response = table.get_item(
        Key={"id": event["userId"]}
    )

    return response.get("Item")

Database Connection Reuse

Database connections can also be reused, but they require more care because connections can become stale or closed.

import os
import psycopg2

connection = None

def get_connection():
    global connection

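    if connection is None or connection.closed:
        connection = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"]
        )
        # Commit each statement immediately so a reused connection is not
        # left idle inside an open transaction between invocations.
        connection.autocommit = True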

    return connection

def lambda_handler(event, context):
    conn = get_connection()

    with conn.cursor() as cursor:
        cursor.execute("SELECT now()")
        row = cursor.fetchone()

    return {
        "databaseTime": str(row[0])
    }

Important: for relational databases, consider RDS Proxy and reserved concurrency to avoid connection exhaustion.
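
As a rough sketch of the reserved concurrency part, the helper below derives a limit from the database connection budget and applies it. It assumes each execution environment holds one connection, as in the example above. The helper names and figures are hypothetical, and `lambda_client` is expected to be a `boto3.client("lambda")` passed in by the caller.

```python
def max_safe_concurrency(db_max_connections, reserved_for_other_clients,
                         connections_per_environment=1):
    """Return how many execution environments the database can absorb."""
    usable = db_max_connections - reserved_for_other_clients
    return max(usable // connections_per_environment, 0)


def apply_reserved_concurrency(lambda_client, function_name, limit):
    """Cap concurrent executions of one function.

    `lambda_client` is expected to be boto3.client("lambda"). A limit of 0
    throttles every invocation, so check the value before applying it.
    """
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
```

With a hypothetical database that allows 100 connections and 20 held back for other clients, `max_safe_concurrency(100, 20)` gives a limit of 80 environments.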

Use Lazy Loading

Lazy loading means importing or initializing something only when it is actually needed. This can reduce cold start time when a heavy dependency is used only by some requests.

Lazy Loading Example

def lambda_handler(event, context):
    if event.get("generateReport"):
        import pandas as pd
        return generate_report(pd, event)

    return {
        "message": "Report generation not needed"
    }

Trade-off: lazy loading moves cost from the cold start into the request path that uses the dependency. Use it when only some invocations need the heavy code.

Choose the Right Runtime

Runtime choice affects cold start behavior, dependency size, developer productivity, and ecosystem support.

| Runtime | Common Strength | Cold Start Consideration |
|---|---|---|
| Python | Simple, popular for automation and AWS integrations | Usually good, but heavy libraries can slow init |
| Node.js | Good for I/O-heavy workloads | Dependency trees can grow quickly |
| Java | Strong enterprise ecosystem | Can have heavier cold starts without tuning |
| Go | Single binary, fast startup | Good for small focused services |

Rule of thumb: choose a runtime your team can operate well. Then optimize package size, initialization, and memory.

Tune Memory and CPU

Lambda memory configuration also affects CPU allocation. Increasing memory can reduce both initialization time and handler duration for CPU-bound workloads.

Memory Tuning Example

| Memory | Average Duration | Result |
|---|---|---|
| 256 MB | 1800 ms | Too slow |
| 512 MB | 900 ms | Better |
| 1024 MB | 380 ms | Potentially best trade-off |
| 2048 MB | 340 ms | Diminishing returns |

Important: the lowest memory setting is not always the cheapest. A faster execution at higher memory can sometimes cost the same or less.
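
The arithmetic behind this can be sketched with the figures from the table. The price per GB-second below is an assumption for illustration, so check current Lambda pricing for your region and architecture. The calculation ignores the per-request fee, which does not depend on memory.

```python
# Assumed on-demand price per GB-second; verify against current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667

# Memory (MB) -> average duration (ms), taken from the tuning table above.
MEASUREMENTS = {256: 1800, 512: 900, 1024: 380, 2048: 340}


def compute_cost(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Duration charge for one invocation: GB allocated x seconds x price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price


costs = {mb: compute_cost(mb, ms) for mb, ms in MEASUREMENTS.items()}
cheapest = min(costs, key=costs.get)
```

With these figures, 256 MB and 512 MB both consume 0.45 GB-seconds per invocation, 1024 MB consumes 0.38, and 2048 MB consumes 0.68. In this example the 1024 MB setting is both much faster than the smaller settings and the cheapest per invocation.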

Use Provisioned Concurrency

Provisioned concurrency keeps execution environments initialized and ready before requests arrive. It is the most direct AWS feature for reducing cold starts.

When to Use Provisioned Concurrency

  • User-facing APIs where latency must be predictable.
  • Important business endpoints such as checkout, login, or payment.
  • Heavy runtimes or frameworks with noticeable initialization time.
  • Predictable traffic patterns where capacity can be planned.

When Not to Use Provisioned Concurrency

  • Low-traffic internal tools where occasional cold starts are acceptable.
  • Background workers where latency is less important.
  • Cost-sensitive experimental functions.
  • Functions that are already fast enough.

| Problem | Good Solution |
|---|---|
| Cold starts on important API | Provisioned concurrency |
| Slow database query | Optimize query or database access |
| Slow external API | Timeouts, caching, async processing |
| Too much work in request path | Move work to SQS or Step Functions |

Rule of thumb: use provisioned concurrency after optimizing code and only where cold start latency actually matters.
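
For reference, provisioned concurrency can be configured through the Lambda API. The sketch below wraps the boto3 call `put_provisioned_concurrency_config`. The wrapper name, function name, and alias are hypothetical, and `lambda_client` is expected to be a `boto3.client("lambda")` supplied by the caller.

```python
def enable_provisioned_concurrency(lambda_client, function_name, alias, environments):
    """Keep `environments` execution environments initialized for one alias.

    `lambda_client` is expected to be boto3.client("lambda"). Provisioned
    concurrency applies to a published version or alias, not to $LATEST.
    """
    if alias == "$LATEST":
        raise ValueError("Provisioned concurrency requires a version or alias")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=environments,
    )
```

A call such as `enable_provisioned_concurrency(client, "checkout-api", "live", 5)` would request five pre-initialized environments for a hypothetical `live` alias.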

Design Around Cold Starts

Sometimes the best cold start optimization is architectural. Not every workload needs to be synchronous, and not every function needs to respond directly to users.

Move Slow Work to a Queue

Slow API:
Client -> API Gateway -> Lambda -> heavy processing -> response

Better:
Client -> API Gateway -> Lambda -> SQS -> response
                                      -> worker processes later
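
The API side of the queue pattern can be sketched as a handler that enqueues the work and returns immediately. The names `make_handler`, `jobId`, and `payload` are hypothetical, and `sqs_client` is expected to be a `boto3.client("sqs")` created once during init and passed in.

```python
import json
import uuid


def make_handler(sqs_client, queue_url):
    """Build an API handler that enqueues work instead of doing it inline.

    `sqs_client` is expected to be boto3.client("sqs"), created during init.
    """
    def lambda_handler(event, context):
        job_id = str(uuid.uuid4())
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"jobId": job_id, "payload": event.get("body")}),
        )
        # 202 Accepted: the request was queued, not completed.
        return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}

    return lambda_handler
```

Returning a job identifier lets the client poll for the result or receive it later, while the worker function can be tuned for throughput rather than latency.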

Use Step Functions for Workflows

If a process has many steps, branches, retries, or waits, use Step Functions instead of one large Lambda function.

Validate order
  -> Reserve inventory
  -> Charge payment
  -> Send confirmation
  -> Update status
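
The workflow above could be expressed as an Amazon States Language definition with one Task state per step. The sketch below builds that definition in Python. The state names, the `placeholder_arn` helper, and the account ID in the ARN are placeholders, and a real workflow would add the retries and error handling that this example omits.

```python
import json

STEPS = ["ValidateOrder", "ReserveInventory", "ChargePayment",
         "SendConfirmation", "UpdateStatus"]


def build_state_machine(steps, arn_for):
    """Chain Task states in order using Amazon States Language fields."""
    states = {}
    for index, name in enumerate(steps):
        state = {"Type": "Task", "Resource": arn_for(name)}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0], "States": states}


def placeholder_arn(name):
    # Hypothetical ARN with a placeholder account ID; use real function ARNs.
    return f"arn:aws:lambda:us-east-1:123456789012:function:{name}"


definition = json.dumps(build_state_machine(STEPS, placeholder_arn), indent=2)
```

Each step then runs as its own small function, so a cold start in one step does not force every other step's dependencies to load with it.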

Separate Critical and Non-Critical Functions

Do not put latency-sensitive API logic and slow background work into the same Lambda function. Separate them so each can be optimized differently.

| Function Type | Optimization Focus |
|---|---|
| Public API | Low latency, small package, provisioned concurrency if needed |
| SQS worker | Throughput, batch size, retries, DLQ |
| Scheduled job | Correctness, timeout, observability |
| File processor | Memory, temporary storage, idempotency |

Common Cold Start Mistakes

  • Optimizing cold starts before measuring them.
  • Using provisioned concurrency for every function.
  • Putting heavy imports at global scope unnecessarily.
  • Including unused dependencies in the deployment package.
  • Making network calls during initialization.
  • Using one large Lambda for unrelated workflows.
  • Ignoring memory tuning.
  • Trying to solve slow database queries with cold start fixes.
  • Depending on warm execution environment reuse for correctness.

Cold Start Optimization Checklist

  • Measure Init Duration before changing code.
  • Track cold starts with logs or metrics.
  • Remove unused dependencies from the deployment package.
  • Keep initialization code small.
  • Avoid unnecessary network calls outside the handler.
  • Reuse SDK clients outside the handler.
  • Use lazy loading for rarely used heavy dependencies.
  • Tune memory with realistic workloads.
  • Use provisioned concurrency for latency-sensitive APIs where cold starts still matter after code optimization.
  • Move slow work to queues instead of blocking API responses.
  • Keep functions focused instead of building large monolithic Lambdas.
  • Monitor p95 and p99 latency, not only average duration.
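
To illustrate the last item, the sketch below compares the mean with tail percentiles on made-up data in which 3 of 100 invocations are cold. The numbers are illustrative only, and `latency_summary` is a hypothetical helper built on the standard library's `statistics.quantiles`.

```python
from statistics import mean, quantiles


def latency_summary(durations_ms):
    """Return mean, p95, and p99 for a list of durations (needs >= 2 samples)."""
    cuts = quantiles(durations_ms, n=100, method="inclusive")
    return {"mean": mean(durations_ms), "p95": cuts[94], "p99": cuts[98]}


# Illustrative data: 97 warm invocations at 100 ms and 3 cold ones at 1500 ms.
sample = [100] * 97 + [1500] * 3
summary = latency_summary(sample)
```

For this sample the mean is 142 ms and p95 is 100 ms, while p99 is 1500 ms. Only the p99 figure shows what the slowest requests actually experienced.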

Conclusion

AWS Lambda cold starts are real, but they are not always the biggest problem. For many workloads, database queries, external APIs, package size, memory settings, and architecture choices have a larger impact on performance than the cold start itself.

The best approach is to measure first. Identify whether latency comes from Init Duration, handler execution, network calls, database access, or downstream systems. Then optimize the right layer.

Key takeaway: cold start optimization is about keeping functions small, initialization light, dependencies controlled, clients reusable, memory tuned, and latency-sensitive functions protected with provisioned concurrency when necessary.
