AWS Lambda Cold Starts Explained
By Oleksandr Andrushchenko
AWS Lambda cold starts are one of the most discussed topics in serverless architecture. A cold start happens when AWS Lambda needs to create a new execution environment before running your function. This extra initialization time can increase latency, especially for user-facing APIs.
Cold starts are not always a problem. For background jobs, scheduled tasks, SQS workers, and file processing, a few extra milliseconds or seconds may be acceptable. But for APIs, webhooks, real-time workflows, and latency-sensitive systems, cold starts can directly affect user experience.
Table of Contents
- What Is an AWS Lambda Cold Start?
- Lambda Execution Lifecycle
- Cold Start vs Warm Start
- What Causes Cold Starts?
- How to Detect Cold Starts
- Reduce Package Size
- Optimize Initialization Code
- Reuse SDK Clients and Connections
- Use Lazy Loading
- Choose the Right Runtime
- Tune Memory and CPU
- Use Provisioned Concurrency
- Design Around Cold Starts
- Common Cold Start Mistakes
- Cold Start Optimization Checklist
- Conclusion
What Is an AWS Lambda Cold Start?
A cold start happens when Lambda does not have an existing execution environment ready for your function. AWS must create a new environment, prepare the runtime, load your code, initialize dependencies, and then call your handler.
Cold start:
Create execution environment
-> Initialize runtime
-> Load function code
-> Run initialization code
-> Invoke handler
A cold start adds extra latency before your business logic runs.
Lambda Execution Lifecycle
Lambda execution has two important phases: init and invoke.
| Phase | What Happens | Performance Impact |
|---|---|---|
| Init phase | Runtime starts, code loads, global code runs | Affects cold start time |
| Invoke phase | Handler processes the event | Affects normal execution duration |
import boto3
# Init phase
s3_client = boto3.client("s3")
def lambda_handler(event, context):
    # Invoke phase
    return {
        "message": "Hello from Lambda"
    }
Important: code outside the handler runs during initialization. Heavy imports, expensive setup, and network calls outside the handler can make cold starts slower.
Cold Start vs Warm Start
A warm start happens when Lambda reuses an existing execution environment. In that case, initialization has already happened, so Lambda can call the handler faster.
| Invocation Type | What Happens | Typical Result |
|---|---|---|
| Cold start | New environment is created | Slower first invocation |
| Warm start | Existing environment is reused | Faster invocation |
First request after scale-up:
cold start
Next request using same environment:
warm start
Key point: warm starts are not guaranteed. Lambda may reuse an environment, but your application should never depend on reuse for correctness.
What Causes Cold Starts?
Cold starts are affected by several factors. Some are controlled by AWS, but many are influenced by your code, dependencies, configuration, and architecture.
| Factor | Why It Matters |
|---|---|
| Runtime | Some runtimes initialize faster than others |
| Package size | Larger packages take longer to load and initialize |
| Dependencies | Heavy imports increase init time |
| Initialization code | Global setup runs before the handler |
| VPC configuration | Attaching a function to a VPC adds networking setup; the penalty is much smaller than it used to be, but it is still worth measuring |
| Memory setting | More memory also gives more CPU, which can speed initialization |
| Traffic pattern | Bursty traffic may require many new environments |
How to Detect Cold Starts
Cold starts should be measured, not guessed. You can detect them using CloudWatch logs, metrics, tracing, or a simple global variable flag.
CloudWatch Init Duration
For cold invocations, the REPORT line that Lambda writes to CloudWatch Logs includes an Init Duration field, which shows how long the initialization phase took. Warm invocations omit the field.
REPORT RequestId: abc...
Duration: 120.45 ms
Billed Duration: 121 ms
Init Duration: 450.32 ms
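If you export log lines for offline analysis, the field can be pulled out with a small regular expression. This is a minimal sketch: the two sample lines are made up and abbreviated (real REPORT lines also carry memory fields), and `parse_init_duration` is a name chosen for illustration.

```python
import re

INIT_DURATION_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def parse_init_duration(report_line):
    """Return Init Duration in milliseconds, or None when the field is absent."""
    match = INIT_DURATION_RE.search(report_line)
    return float(match.group(1)) if match else None


# Made-up sample lines for illustration.
cold = "REPORT RequestId: abc Duration: 120.45 ms Billed Duration: 121 ms Init Duration: 450.32 ms"
warm = "REPORT RequestId: def Duration: 95.10 ms Billed Duration: 96 ms"

print(parse_init_duration(cold))  # 450.32
print(parse_init_duration(warm))  # None
```

A line without the field is treated as a warm invocation, which matches how the field appears only on cold ones.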
Manual Cold Start Flag
You can also track cold starts yourself with a global variable.
import json
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
is_cold_start = True
def lambda_handler(event, context):
    global is_cold_start
    logger.info(json.dumps({
        "requestId": context.aws_request_id,
        "coldStart": is_cold_start
    }))
    is_cold_start = False
    return {
        "status": "ok"
    }
Rule of thumb: track cold starts separately from handler duration. Otherwise, you may optimize the wrong thing.
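As a rough illustration of why the two should be kept apart, the sketch below groups log records by the `coldStart` flag and summarizes each group on its own. The `durationMs` field and the sample records are assumptions made for this example; the logging code above does not emit a duration.

```python
from statistics import mean


def summarize(records):
    """Group durations by the coldStart flag and summarize each group separately."""
    groups = {"cold": [], "warm": []}
    for record in records:
        key = "cold" if record["coldStart"] else "warm"
        groups[key].append(record["durationMs"])
    return {
        key: {"count": len(values), "meanMs": mean(values), "maxMs": max(values)}
        for key, values in groups.items()
        if values
    }


# Made-up records: one cold invocation and three warm ones.
records = [
    {"coldStart": True, "durationMs": 570.0},
    {"coldStart": False, "durationMs": 120.0},
    {"coldStart": False, "durationMs": 110.0},
    {"coldStart": False, "durationMs": 130.0},
]
summary = summarize(records)
print(summary)
```

Mixing both groups into one average would hide the fact that warm invocations here sit near 120 ms while the cold one is several times slower.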
Reduce Package Size
Large deployment packages can increase cold start time. A small Lambda package is easier to load, deploy, inspect, and maintain.
Remove Unused Files
Common package bloat:
- tests
- documentation
- local virtual environments
- cache directories
- unused libraries
- large example files
- development-only tools
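A packaging step can filter this kind of bloat out automatically. The following is a minimal sketch using only the standard library; the exclusion patterns are assumptions that would need adjusting to the project layout, and the zip file is expected to live outside the source directory.

```python
import fnmatch
import zipfile
from pathlib import Path

# Assumed exclusion patterns; adjust them to the project layout.
EXCLUDE_PATTERNS = [
    "tests/*", "docs/*", ".venv/*",
    "__pycache__/*", "*/__pycache__/*", "*.pyc", "*.md",
]


def is_excluded(relative_path):
    """Return True when the POSIX-style relative path matches an exclusion pattern."""
    return any(fnmatch.fnmatchcase(relative_path, p) for p in EXCLUDE_PATTERNS)


def build_package(source_dir, zip_path):
    """Zip source_dir into zip_path, skipping excluded files.

    zip_path should be outside source_dir. Returns the list of included paths.
    """
    source = Path(source_dir)
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if not path.is_file():
                continue
            relative = path.relative_to(source).as_posix()
            if is_excluded(relative):
                continue
            archive.write(path, relative)
            included.append(relative)
    return included
```

Printing the returned list after each build is a cheap way to notice when something unexpected slips into the package.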
Review Dependencies
Do not include dependencies just because they are convenient. Some packages pull large transitive dependency trees.
| Situation | Better Choice |
|---|---|
| Simple JSON transformation | Use standard library |
| One HTTP call | Use a lightweight client |
| Small validation logic | Avoid importing a large framework if not needed |
| Heavy data processing | Consider whether Lambda is the right compute model |
Rule of thumb: every dependency should justify its cold start cost.
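To put a number on that cost, you can time an import directly. The sketch below is one simple way to do it; the standard-library module names are placeholders for your real dependencies, and `measure_import_ms` is a name chosen for illustration. Running Python with `-X importtime` gives a more detailed per-module breakdown.

```python
import importlib
import sys
import time


def measure_import_ms(module_name):
    """Import module_name and return (elapsed milliseconds, was already loaded).

    Only the first import in a process is meaningful: later imports are
    served from sys.modules and take almost no time.
    """
    already_loaded = module_name in sys.modules
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, already_loaded


# Placeholder module names; replace them with the dependencies you want to check.
for name in ("json", "decimal", "xml.dom.minidom"):
    elapsed_ms, cached = measure_import_ms(name)
    print(f"{name}: {elapsed_ms:.2f} ms (already loaded: {cached})")
```

Measurements taken on a laptop will differ from Lambda, so treat them as a relative comparison between dependencies rather than an absolute prediction.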
Optimize Initialization Code
Initialization code runs before your handler. Keep it small and predictable.
Bad Initialization Example
# Bad: remote call during initialization
config = load_config_from_remote_api()
def lambda_handler(event, context):
    return {
        "config": config
    }
This makes every cold start depend on a remote API call. If that API is slow or unavailable, initialization is slow or fails before the handler ever runs.
Better Initialization Example
config = None
def get_config():
    global config
    if config is None:
        config = load_config_from_remote_api()
    return config
def lambda_handler(event, context):
    return {
        "config": get_config()
    }
This loads configuration only when needed and reuses it during warm invocations.
Reuse SDK Clients and Connections
Reusable clients are one of the best things to initialize outside the handler. Creating AWS SDK clients on every invocation wastes time.
AWS SDK Client Reuse
import boto3
# Created during init and reused during warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")
def lambda_handler(event, context):
    response = table.get_item(
        Key={"id": event["userId"]}
    )
    return response.get("Item")
Database Connection Reuse
Database connections can also be reused, but they require more care because connections can become stale or closed.
import os
import psycopg2
connection = None
def get_connection():
    global connection
    if connection is None or connection.closed:
        connection = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"]
        )
    return connection
def lambda_handler(event, context):
    conn = get_connection()
    with conn.cursor() as cursor:
        cursor.execute("SELECT now()")
        row = cursor.fetchone()
    return {
        "databaseTime": str(row[0])
    }
Important: for relational databases, consider RDS Proxy and reserved concurrency to avoid connection exhaustion.
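A client-side `closed` flag may not notice a connection that the server or a network device dropped while the environment was idle. One possible safeguard, sketched below under assumptions, is a cheap health check before reuse. The `connect` argument is a hypothetical zero-argument callable (for example, a wrapper around `psycopg2.connect`), which also keeps the sketch runnable without a database driver.

```python
_connection = None


def is_healthy(connection):
    """Run a trivial query; any failure means the connection should be replaced."""
    try:
        cursor = connection.cursor()
        try:
            cursor.execute("SELECT 1")
            cursor.fetchone()
        finally:
            cursor.close()
        connection.rollback()  # end the implicit transaction the check opened
        return True
    except Exception:
        return False


def get_connection(connect):
    """Return the cached connection, reconnecting when it fails the health check."""
    global _connection
    if _connection is not None and not is_healthy(_connection):
        try:
            _connection.close()
        except Exception:
            pass  # the connection is already unusable
        _connection = None
    if _connection is None:
        _connection = connect()
    return _connection
```

The check costs one extra round trip per invocation, so it is a trade-off: it suits functions where a failed request is more expensive than a few milliseconds of latency.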
Use Lazy Loading
Lazy loading means importing or initializing something only when it is actually needed. This can reduce cold start time when a heavy dependency is used only by some requests.
Lazy Loading Example
def lambda_handler(event, context):
    if event.get("generateReport"):
        import pandas as pd
        return generate_report(pd, event)
    return {
        "message": "Report generation not needed"
    }
Trade-off: lazy loading moves cost from the cold start into the request path that uses the dependency. Use it when only some invocations need the heavy code.
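The same idea applies to objects that are expensive to construct, not only to imports. In the sketch below, `ReportEngine` is a hypothetical stand-in for such an object, and `functools.lru_cache` is used so that it is built on the first request that needs it and then reused.

```python
from functools import lru_cache


class ReportEngine:
    """Hypothetical stand-in for an object that is expensive to construct."""

    instances_created = 0

    def __init__(self):
        ReportEngine.instances_created += 1

    def render(self, rows):
        return {"rowCount": len(rows)}


@lru_cache(maxsize=1)
def get_report_engine():
    # Runs once per execution environment, on the first request that needs it.
    return ReportEngine()


def lambda_handler(event, context):
    if event.get("generateReport"):
        return get_report_engine().render(event.get("rows", []))
    return {"message": "Report generation not needed"}
```

Requests that never ask for a report never pay the construction cost, and repeated report requests in the same environment share one instance.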
Choose the Right Runtime
Runtime choice affects cold start behavior, dependency size, developer productivity, and ecosystem support.
| Runtime | Common Strength | Cold Start Consideration |
|---|---|---|
| Python | Simple, popular for automation and AWS integrations | Usually good, but heavy libraries can slow init |
| Node.js | Good for I/O-heavy workloads | Dependency trees can grow quickly |
| Java | Strong enterprise ecosystem | Can have heavier cold starts without tuning |
| Go | Single binary, fast startup | Good for small focused services |
Rule of thumb: choose a runtime your team can operate well. Then optimize package size, initialization, and memory.
Tune Memory and CPU
Lambda allocates CPU in proportion to the configured memory, so the memory setting also controls compute power. Increasing memory can reduce both initialization time and handler duration for CPU-bound workloads.
Memory Tuning Example
| Memory | Average Duration | Result |
|---|---|---|
| 256 MB | 1800 ms | Too slow |
| 512 MB | 900 ms | Better |
| 1024 MB | 380 ms | Potentially best trade-off |
| 2048 MB | 340 ms | Diminishing returns |
Important: the lowest memory setting is not always the cheapest. A faster execution at higher memory can sometimes cost the same or less.
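The arithmetic behind this can be checked with the example numbers from the table above. Lambda's duration charge is based on GB-seconds (memory multiplied by billed time), so the sketch below compares GB-seconds per invocation; it ignores the per-request charge and any free tier, and the durations are the illustrative figures from the table, not measurements.

```python
# (memory in MB, average duration in ms), taken from the example table above
measurements = [(256, 1800), (512, 900), (1024, 380), (2048, 340)]


def gb_seconds(memory_mb, duration_ms):
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


for memory_mb, duration_ms in measurements:
    print(f"{memory_mb} MB: {gb_seconds(memory_mb, duration_ms):.2f} GB-s")
```

With these numbers, 256 MB and 512 MB both come to 0.45 GB-s, 1024 MB drops to 0.38 GB-s, and 2048 MB rises to 0.68 GB-s, so the 1024 MB setting is both faster and cheaper per invocation than the smallest one.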
Use Provisioned Concurrency
Provisioned concurrency keeps execution environments initialized and ready before requests arrive. It is the most direct AWS feature for reducing cold starts.
When to Use Provisioned Concurrency
- User-facing APIs where latency must be predictable.
- Important business endpoints such as checkout, login, or payment.
- Heavy runtimes or frameworks with noticeable initialization time.
- Predictable traffic patterns where capacity can be planned.
When Not to Use Provisioned Concurrency
- Low-traffic internal tools where occasional cold starts are acceptable.
- Background workers where latency is less important.
- Cost-sensitive experimental functions.
- Functions that are already fast enough.
| Problem | Good Solution |
|---|---|
| Cold starts on important API | Provisioned concurrency |
| Slow database query | Optimize query or database access |
| Slow external API | Timeouts, caching, async processing |
| Too much work in request path | Move work to SQS or Step Functions |
Rule of thumb: use provisioned concurrency after optimizing code and only where cold start latency actually matters.
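For reference, provisioned concurrency can be configured through the Lambda API. The sketch below wraps the boto3 call `put_provisioned_concurrency_config`; the function name `checkout-api`, the alias `live`, and the value 5 in the usage comment are assumptions for illustration. The client is passed in so the function can be exercised without AWS access.

```python
def set_provisioned_concurrency(lambda_client, function_name, qualifier, executions):
    """Request provisioned concurrency for a published version or alias.

    lambda_client is expected to be a boto3 Lambda client.
    """
    if qualifier == "$LATEST":
        # Provisioned concurrency applies to a published version or an alias.
        raise ValueError("Provisioned concurrency needs a published version or alias")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=executions,
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
# import boto3
# set_provisioned_concurrency(boto3.client("lambda"), "checkout-api", "live", 5)
```

In practice this setting usually lives in infrastructure-as-code rather than in a script, but the parameters are the same: a function, a version or alias, and a number of environments to keep initialized.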
Design Around Cold Starts
Sometimes the best cold start optimization is architectural. Not every workload needs to be synchronous, and not every function needs to respond directly to users.
Move Slow Work to a Queue
Slow API:
Client -> API Gateway -> Lambda -> heavy processing -> response
Better:
Client -> API Gateway -> Lambda -> SQS -> response
SQS -> worker processes later
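The API side of this pattern might look like the sketch below: the handler queues the work and returns 202 Accepted right away. The SQS client and queue URL are passed in as parameters here so the example runs without AWS; in a real function they would more likely be created at module level (for example with `boto3.client("sqs")` and an environment variable). The `jobId` response field is an assumption for illustration.

```python
import json


def enqueue_job(sqs_client, queue_url, payload):
    """Send the payload to SQS and return the message ID."""
    response = sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(payload),
    )
    return response["MessageId"]


def api_handler(event, sqs_client, queue_url):
    """Accept the request, queue the heavy work, and respond immediately."""
    message_id = enqueue_job(sqs_client, queue_url, json.loads(event["body"]))
    return {
        "statusCode": 202,
        "body": json.dumps({"jobId": message_id}),
    }
```

The client gets a fast acknowledgement, and a separate worker function consumes the queue at its own pace, where a cold start is far less visible.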
Use Step Functions for Workflows
If a process has many steps, branches, retries, or waits, use Step Functions instead of one large Lambda function.
Validate order
-> Reserve inventory
-> Charge payment
-> Send confirmation
-> Update status
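The flow above could be expressed as an Amazon States Language definition with one Task state per step. The sketch below builds such a definition as a Python dictionary; the ARN prefix uses a placeholder account ID and region, the state names are derived from the steps above, and the retry values are illustrative assumptions rather than recommendations.

```python
import json

STEPS = [
    "ValidateOrder",
    "ReserveInventory",
    "ChargePayment",
    "SendConfirmation",
    "UpdateStatus",
]


def build_definition(steps, arn_prefix):
    """Build a linear state machine definition with one Task state per step."""
    states = {}
    for index, name in enumerate(steps):
        state = {
            "Type": "Task",
            "Resource": f"{arn_prefix}:{name}",
            # Illustrative retry policy; tune per step.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
        }
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": steps[0], "States": states}


# Placeholder ARN prefix for illustration only.
definition = build_definition(STEPS, "arn:aws:lambda:us-east-1:123456789012:function")
print(json.dumps(definition, indent=2))
```

Each step stays a small, focused function, and retries live in the workflow definition instead of inside handler code. Steps with side effects, such as charging a payment, need to be idempotent before automatic retries are safe.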
Separate Critical and Non-Critical Functions
Do not put latency-sensitive API logic and slow background work into the same Lambda function. Separate them so each can be optimized differently.
| Function Type | Optimization Focus |
|---|---|
| Public API | Low latency, small package, provisioned concurrency if needed |
| SQS worker | Throughput, batch size, retries, DLQ |
| Scheduled job | Correctness, timeout, observability |
| File processor | Memory, temporary storage, idempotency |
Common Cold Start Mistakes
- Optimizing cold starts before measuring them.
- Using provisioned concurrency for every function.
- Putting heavy imports at global scope unnecessarily.
- Including unused dependencies in the deployment package.
- Making network calls during initialization.
- Using one large Lambda for unrelated workflows.
- Ignoring memory tuning.
- Trying to solve slow database queries with cold start fixes.
- Depending on warm execution environment reuse for correctness.
Cold Start Optimization Checklist
- Measure Init Duration before changing code.
- Track cold starts with logs or metrics.
- Remove unused dependencies from the deployment package.
- Keep initialization code small.
- Avoid unnecessary network calls outside the handler.
- Reuse SDK clients outside the handler.
- Use lazy loading for rarely used heavy dependencies.
- Tune memory with realistic workloads.
- Use provisioned concurrency for latency-sensitive APIs.
- Move slow work to queues instead of blocking API responses.
- Keep functions focused instead of building large monolithic Lambdas.
- Monitor p95 and p99 latency, not only average duration.
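The last item is easy to demonstrate with made-up numbers. The sketch below computes the mean, p95, and p99 of a sample in which a small share of invocations are cold; the durations are invented for illustration, and `statistics.quantiles` from the standard library is used for the percentiles.

```python
from statistics import mean, quantiles


def latency_summary(durations_ms):
    """Return mean, p95, and p99 for a list of durations in milliseconds."""
    cut_points = quantiles(durations_ms, n=100)  # 99 cut points: p1 through p99
    return {
        "mean": mean(durations_ms),
        "p95": cut_points[94],
        "p99": cut_points[98],
    }


# Made-up sample: 94 warm invocations at 100 ms and 6 cold ones at 900 ms.
durations = [100.0] * 94 + [900.0] * 6
summary = latency_summary(durations)
print(summary)
```

In this sample the mean stays close to the warm duration while p95 and p99 land on the cold one, which is why an average alone can make a cold start problem look smaller than users experience it.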
Conclusion
AWS Lambda cold starts are real, but they are not always the biggest problem. For many workloads, database queries, external APIs, package size, memory settings, and architecture choices have a larger impact on performance than the cold start itself.
The best approach is to measure first. Identify whether latency comes from Init Duration, handler execution, network calls, database access, or downstream systems. Then optimize the right layer.
Key takeaway: cold start optimization is about keeping functions small, initialization light, dependencies controlled, clients reusable, memory tuned, and latency-sensitive functions protected with provisioned concurrency when necessary.