AWS Lambda Performance Optimization
By Oleksandr Andrushchenko
AWS Lambda performance optimization is not only about cold starts. A slow Lambda function can be caused by heavy dependencies, poor memory configuration, slow network calls, database connection problems, inefficient batch settings, or downstream systems that cannot handle Lambda concurrency.
This article explains practical ways to make Lambda functions faster, cheaper, and more predictable. We will cover cold starts, memory and CPU tuning, package size, connection reuse, network calls, batch processing, architecture choices, and observability.
Table of Contents
- What Does Performance Mean in AWS Lambda?
- Understand the Lambda Execution Lifecycle
- Optimize Cold Starts
- Tune Memory and CPU
- Reuse Connections and Clients
- Optimize Network Calls
- Optimize Dependencies and Package Size
- Use the Right Architecture
- Optimize Batch Processing
- Choose the Right Storage and Database Access
- Improve Observability
- Common Performance Mistakes
- Performance Optimization Checklist
- Conclusion
What Does Performance Mean in AWS Lambda?
Before optimizing Lambda, define what performance means for your workload. A function can be fast from a latency perspective but expensive. Another function can be cheap but too slow for an API. A queue worker may not care about single-request latency but may care about total throughput.
| Metric | Meaning | Common Problem |
|---|---|---|
| Latency | Total time the caller waits | Slow APIs, cold starts, network calls |
| Cold start time | Time needed to initialize a new execution environment | Heavy dependencies, large packages, VPC setup |
| Execution duration | Time spent running the handler | Inefficient code, slow database queries, external APIs |
| Throughput | How many events can be processed per second | Wrong batch size, downstream bottlenecks |
| Cost | Price based on requests, duration, memory, and architecture | Over-provisioning, slow execution, unnecessary invocations |
Latency
Latency matters most for synchronous workloads such as APIs, webhooks, and user-facing endpoints. If a user waits for the response, cold starts, slow imports, database queries, and external API calls directly affect user experience.
Cold Start Time
A cold start happens when Lambda creates a new execution environment before invoking your function. This includes preparing the runtime, loading code, initializing dependencies, and running code outside the handler.
Execution Duration
Execution duration is the time your function spends running after invocation starts. It is affected by your code, CPU allocation, memory, database access, network calls, and event size.
Throughput
Throughput matters for asynchronous workloads such as SQS workers, Kinesis consumers, DynamoDB Streams, and batch processors. The goal is not always to process one event as fast as possible, but to process many events efficiently and safely.
Cost
Lambda cost depends on the number of invocations, execution duration, configured memory, and optional features such as provisioned concurrency. A faster function can sometimes be cheaper even if it uses more memory.
Understand the Lambda Execution Lifecycle
To optimize Lambda performance, you need to understand where time is spent. Lambda execution has two major parts: initialization and invocation.
Cold invocation:
Create environment
-> Load runtime
-> Load function code
-> Run initialization code
-> Run handler
Warm invocation:
Reuse existing environment
-> Run handler
Init Phase
The init phase happens before the handler runs. It includes imports, global variables, SDK client creation, configuration loading, and framework initialization.
import boto3

# Runs once during initialization
s3_client = boto3.client("s3")

def lambda_handler(event, context):
    # Runs on every invocation
    return {
        "message": "Hello"
    }
Important: heavy code outside the handler increases cold start time. Useful reusable clients are fine, but expensive unnecessary initialization should be avoided.
Invoke Phase
The invoke phase is the actual handler execution. This is where your function processes input, calls databases, invokes APIs, writes logs, and returns a result.
def lambda_handler(event, context):
    user_id = event["userId"]
    # get_user_from_database is a placeholder for your own data access code
    user = get_user_from_database(user_id)
    return {
        "user": user
    }
Cold Starts vs Warm Starts
| Invocation Type | What Happens | Performance Impact |
|---|---|---|
| Cold start | New execution environment is created | Slower |
| Warm start | Existing environment is reused | Faster |
What Runs Outside the Handler
Code outside the handler runs once per execution environment, and whatever it creates is reused by later warm invocations. This is useful for reusable clients and cached configuration.
import os
import boto3

TABLE_NAME = os.environ["TABLE_NAME"]
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)

def lambda_handler(event, context):
    response = table.get_item(
        Key={"id": event["id"]}
    )
    return response.get("Item")
Rule of thumb: initialize reusable clients outside the handler, but keep initialization small and predictable.
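Because module-level code runs once per execution environment, a module-level flag is one simple way to tell cold invocations from warm ones in your own logs. The sketch below is a minimal illustration; the `_is_cold_start` name and the `coldStart` response field are assumptions made for this example, not part of any Lambda API.

```python
# Module-level state: set once when the execution environment is initialized.
_is_cold_start = True


def lambda_handler(event, context):
    global _is_cold_start
    cold = _is_cold_start
    # Every later invocation handled by this environment is a warm one.
    _is_cold_start = False
    return {"coldStart": cold}
```

Logging this flag next to the request ID makes it easier to separate cold-start latency from ordinary handler latency when reading logs.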
Optimize Cold Starts
Cold starts are most important for user-facing APIs and latency-sensitive workloads. They are usually less important for queue workers, scheduled jobs, and background processing.
Reduce Deployment Package Size
Large deployment packages take longer to download, unpack, and initialize. Remove unused dependencies, tests, documentation, local build artifacts, and unnecessary libraries.
Bad package:
app/
node_modules/
tests/
docs/
local-cache/
unused-libraries/
Better package:
app/
handler.py
only-required-dependencies/
Avoid Heavy Imports During Initialization
Heavy imports increase initialization time. Some libraries load many modules before your handler even runs.
# Avoid importing heavy libraries globally if they are rarely used.
import pandas as pd

def lambda_handler(event, context):
    return process(event)
If a dependency is used only for rare branches, consider lazy loading.
Use Lazy Loading
def lambda_handler(event, context):
    if event.get("generateReport"):
        # Imported only on the branch that needs it
        import pandas as pd
        return generate_report(pd, event)
    return {
        "message": "No report needed"
    }
Trade-off: lazy loading moves cost from cold start into the specific request path. Use it when the heavy dependency is not always needed.
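Before moving an import, it helps to know how expensive it actually is. The helper below is a rough sketch for timing a single import; `timed_import` is a name invented for this example, and the measurement is only meaningful in a fresh process because Python caches modules after the first import.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module by name and return (module, elapsed milliseconds)."""
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return module, elapsed_ms
```

For example, `timed_import("pandas")` run in a new interpreter gives an approximate upper bound on what that import adds to initialization. Running Python with `-X importtime` gives a more detailed per-module breakdown.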
Keep Initialization Code Small
Do not perform unnecessary network calls, database queries, or expensive computations during initialization.
# Bad: network call during initialization
CONFIG = load_config_from_remote_api()

def lambda_handler(event, context):
    return CONFIG

# Better: load only when needed and cache after first use
config = None

def get_config():
    global config
    if config is None:
        config = load_config_from_remote_api()
    return config

def lambda_handler(event, context):
    return get_config()
Use Provisioned Concurrency When Needed
Provisioned concurrency keeps execution environments initialized and ready to respond. This can reduce cold start latency for important APIs.
| Use Provisioned Concurrency When | Avoid It When |
|---|---|
| API latency must be predictable | Workload is mostly background processing |
| Cold starts are visible to users | Traffic is very low and cost-sensitive |
| Function has heavy initialization | Occasional cold starts are acceptable |
Rule of thumb: optimize code first, then use provisioned concurrency for latency-critical functions.
Tune Memory and CPU
Lambda memory configuration affects more than memory. As memory increases, Lambda also provides more CPU capacity. This means a function with more memory can run faster and sometimes cost less overall.
How Memory Affects CPU
If a function is CPU-bound, increasing memory can reduce duration significantly. If a function is waiting on a slow external API, increasing memory may not help much.
| Workload Type | Memory Increase Helps? | Reason |
|---|---|---|
| CPU-heavy JSON processing | Usually yes | More CPU can reduce execution time |
| Image processing | Usually yes | More CPU and memory can speed up processing |
| External API call | Usually limited | Most time is spent waiting on network |
| Database query | Sometimes | Depends whether Lambda or database is the bottleneck |
Why More Memory Can Be Faster and Cheaper
Lambda cost depends on memory and duration together. If doubling memory cuts duration by more than half, the total cost can decrease while performance improves.
Example idea:
512 MB function runs for 1000 ms
1024 MB function runs for 400 ms
Here the larger setting uses 0.4 GB-seconds per invocation instead of 0.5, so it is both faster and about 20% cheaper in duration charges.
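The arithmetic behind this comparison can be sketched in a few lines. The example assumes that duration charges scale with configured memory multiplied by billed duration (GB-seconds) and ignores the per-request charge, which is the same for both settings. The `gb_seconds` helper is a name chosen for this illustration.

```python
def gb_seconds(memory_mb, duration_ms):
    """Billed compute for one invocation, expressed in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


small = gb_seconds(512, 1000)   # 0.5 GB-seconds
large = gb_seconds(1024, 400)   # 0.4 GB-seconds

# Fraction of duration cost saved by the larger setting (about 0.2).
savings = 1 - large / small
```

Multiplying either figure by the per-GB-second price for your region and architecture gives the duration cost of one invocation.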
Finding the Right Memory Setting
Do not guess the best memory value. Test several memory settings with realistic inputs and compare duration, cost, and error rate.
| Memory | Average Duration | Cost Direction | Result |
|---|---|---|---|
| 256 MB | 2200 ms | Low memory, long duration | Too slow |
| 512 MB | 1100 ms | Balanced | Better |
| 1024 MB | 480 ms | Higher memory, shorter duration | Potentially best |
| 2048 MB | 430 ms | More expensive, small gain | Diminishing returns |
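One way to read a table like this is to pick the cheapest setting that still meets a latency target. The sketch below does that with the figures from the table; `cheapest_within` and the latency targets are assumptions for illustration, and the cost model is the same memory-times-duration approximation that ignores per-request charges.

```python
# (memory in MB, average duration in ms), taken from the table above
MEASUREMENTS = [(256, 2200), (512, 1100), (1024, 480), (2048, 430)]


def cheapest_within(measurements, max_duration_ms):
    """Return the (memory_mb, duration_ms) pair with the lowest GB-seconds
    among those that meet the latency target, or None if none qualifies."""
    candidates = [m for m in measurements if m[1] <= max_duration_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda m: (m[0] / 1024) * (m[1] / 1000))
```

With a 500 ms target, `cheapest_within(MEASUREMENTS, 500)` selects the 1024 MB setting: 2048 MB is slightly faster but costs almost twice as much per invocation.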
AWS Lambda Power Tuning
AWS Lambda Power Tuning is an open-source tool that runs your function at several memory settings (using a Step Functions state machine) and compares duration against cost. The goal is to find the best trade-off, not simply the smallest memory setting.
Rule of thumb: tune Lambda memory with real workloads, not synthetic empty events.
Reuse Connections and Clients
Creating clients and connections on every invocation wastes time. Lambda may reuse execution environments, so you can often initialize clients outside the handler and reuse them during warm invocations.
Reuse AWS SDK Clients
import boto3

# Created once per execution environment
s3_client = boto3.client("s3")

def lambda_handler(event, context):
    response = s3_client.list_buckets()
    return {
        "bucketCount": len(response["Buckets"])
    }
Good: AWS SDK client is created outside the handler and can be reused.
import boto3

def lambda_handler(event, context):
    # Less efficient: client is created on every invocation
    s3_client = boto3.client("s3")
    response = s3_client.list_buckets()
    return {
        "bucketCount": len(response["Buckets"])
    }
Avoid: creating SDK clients inside the handler unless there is a specific reason.
Reuse Database Connections Carefully
Database connections are more complicated than SDK clients. Reusing them can improve performance, but stale connections, timeouts, and concurrency limits must be handled.
import os
import psycopg2

# Reused across warm invocations in the same execution environment
connection = None

def get_connection():
    global connection
    if connection is None or connection.closed:
        connection = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"]
        )
    return connection

def lambda_handler(event, context):
    conn = get_connection()
    with conn.cursor() as cursor:
        cursor.execute("SELECT now()")
        row = cursor.fetchone()
    return {
        "databaseTime": str(row[0])
    }
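A locally tracked "closed" flag may not notice a connection that the server or a network device dropped while the execution environment was idle. One common approach is to run a trivial query before reuse and reconnect if it fails. The sketch below shows the pattern with `sqlite3` as a stand-in so it runs anywhere; `_connect` and `_is_alive` are hypothetical helpers, and with psycopg2 the ping would go through a cursor instead.

```python
import sqlite3

_connection = None


def _connect():
    # Stand-in for psycopg2.connect(...); sqlite3 keeps the sketch runnable.
    return sqlite3.connect(":memory:")


def _is_alive(conn):
    """Run a trivial query; any driver error means the connection is unusable."""
    try:
        conn.execute("SELECT 1").fetchone()
        return True
    except sqlite3.Error:
        return False


def get_connection():
    global _connection
    if _connection is None or not _is_alive(_connection):
        _connection = _connect()
    return _connection
```

The ping costs one extra round trip per invocation, so it is a trade-off: worthwhile when stale connections cause visible errors, less so when the database is close and connections are short-lived.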
Use RDS Proxy for Relational Databases
Lambda can scale faster than a relational database can accept new connections. RDS Proxy helps pool and manage database connections.
Problem:
1,000 Lambda invocations
-> 1,000 direct database connections
-> RDS connection exhaustion
Better:
1,000 Lambda invocations
-> RDS Proxy
-> managed connection pool
-> RDS / Aurora
Avoid Creating Clients Inside the Handler
Rule of thumb: create reusable clients outside the handler, validate connections before reuse, and use managed pooling when connecting Lambda to relational databases.
Optimize Network Calls
Many Lambda functions are slow not because the code is slow, but because they wait on network calls: databases, external APIs, internal services, secrets managers, or storage services.
Reduce External API Calls
Every external call adds latency and failure risk. Avoid unnecessary calls, combine requests when possible, and cache stable data.
Bad:
Lambda
-> API call for user
-> API call for orders
-> API call for settings
-> API call for permissions
Better:
Lambda
-> aggregated endpoint
-> cached configuration
-> fewer network round trips
Set Timeouts
Never let external requests wait forever. Set explicit timeouts for HTTP clients, database calls, and SDK operations.
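import requests

def lambda_handler(event, context):
    response = requests.get(
        "https://api.example.com/users/123",
        timeout=3  # seconds; without this, requests waits indefinitely
    )
    # Fail fast on 4xx/5xx instead of parsing an error body as data
    response.raise_for_status()
    return response.json()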
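A fixed timeout can still be longer than the time the invocation has left. One option is to derive the timeout from the Lambda context, which exposes `get_remaining_time_in_millis()` in the Python runtime. The helper below is a sketch; `request_timeout_seconds`, `reserve_ms`, and `max_timeout` are names and defaults assumed for this example.

```python
def request_timeout_seconds(context, reserve_ms=500, max_timeout=3.0):
    """Pick an HTTP timeout that fits inside the time the invocation has left.

    reserve_ms is held back so the handler can still log and return an error.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    budget = (remaining_ms - reserve_ms) / 1000
    # Never exceed max_timeout, and never go below a small positive floor.
    return max(0.1, min(max_timeout, budget))
```

The result can be passed as the `timeout` argument of the HTTP call, so a slow dependency produces a handled error rather than a Lambda timeout.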
Use Retries Carefully
Retries can improve reliability, but they can also make latency worse and overload downstream systems.
| Retry Situation | Good Strategy |
|---|---|
| Temporary network error | Retry with backoff |
| Rate limit response | Backoff or send to queue |
| Invalid input | Do not retry forever |
| Downstream outage | Use queue, DLQ, and alarms |
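The "retry with backoff" row can be sketched as capped exponential backoff with jitter. This is an illustrative helper, not a library API: `TransientError` and `call_with_backoff` are hypothetical names, and your code would need to decide which real exceptions count as temporary.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_backoff(operation, max_attempts=3, base_delay=0.2, max_delay=2.0,
                      sleep=time.sleep):
    """Call operation(), retrying TransientError with capped exponential
    backoff and full jitter. Any other exception propagates immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random delay in [0, delay] spreads retries from many invocations.
            sleep(random.uniform(0, delay))
```

Keeping `max_attempts` small matters in Lambda: every retry adds to billed duration and to the latency the caller sees, and errors such as invalid input are deliberately not retried.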
Parallelize Independent I/O
If multiple network calls are independent, running them sequentially can waste time. Use parallel execution carefully.
import concurrent.futures

def lambda_handler(event, context):
    user_id = event["userId"]
    # get_user, get_orders, and get_settings are placeholders for independent I/O calls
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        user_future = executor.submit(get_user, user_id)
        orders_future = executor.submit(get_orders, user_id)
        settings_future = executor.submit(get_settings, user_id)
        return {
            "user": user_future.result(),
            "orders": orders_future.result(),
            "settings": settings_future.result()
        }
Important: parallel calls can improve latency, but they also increase pressure on downstream systems.
Cache Static Data
If configuration, reference data, or public keys rarely change, cache them in memory during warm invocations.
# Survives across warm invocations in the same execution environment
cached_config = None

def get_config():
    global cached_config
    if cached_config is None:
        cached_config = load_config_from_database()
    return cached_config

def lambda_handler(event, context):
    config = get_config()
    return {
        "featureEnabled": config["featureEnabled"]
    }
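A cache that never expires can serve outdated values for as long as the execution environment lives. A small time-to-live is one way to bound that. The sketch below is a minimal illustration; `get_cached`, the 300-second default, and the injectable `now` clock are assumptions made for this example.

```python
import time

_cache = {"value": None, "expires_at": 0.0}


def get_cached(loader, ttl_seconds=300, now=time.monotonic):
    """Return the cached value, reloading it once the TTL has elapsed."""
    current = now()
    if _cache["value"] is None or current >= _cache["expires_at"]:
        _cache["value"] = loader()
        _cache["expires_at"] = current + ttl_seconds
    return _cache["value"]
```

Each execution environment keeps its own copy, so different environments may briefly see different values. That is usually acceptable for configuration and reference data, but not for data that must be consistent across requests.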
Optimize Dependencies and Package Size
Dependencies affect both cold start time and deployment complexity. A simple function should not carry a full application framework unless it really needs one.
Remove Unused Dependencies
Review dependency files regularly. Remove packages that are no longer used. Avoid installing development-only dependencies into production artifacts.
Common package bloat:
- test frameworks
- local development tools
- unused SDKs
- large data files
- documentation
- example files
- unnecessary transitive dependencies
Avoid Heavy Frameworks When Not Needed
A small webhook handler does not always need a full web framework. For simple Lambda handlers, plain runtime code may be faster and easier to deploy.
| Situation | Better Choice |
|---|---|
| One simple endpoint | Plain Lambda handler |
| Many HTTP routes and middleware | API framework may be useful |
| Background queue worker | Plain handler usually enough |
| Complex existing application | Framework may reduce migration work |
Use Lambda Layers Carefully
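Lambda Layers can share dependencies across functions, but they are not a performance feature: layer contents still count toward the function's unzipped deployment size limit and still have to be loaded at cold start. Too many shared layers can also make dependency management harder.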
- Use layers for shared libraries used by many functions.
- Avoid layers for one-off dependencies.
- Version layers carefully to avoid unexpected changes.
- Do not hide dependency bloat inside layers.
Choose the Right Runtime
Runtime choice affects cold starts, ecosystem, developer productivity, and performance. Choose based on your team and workload, not only benchmark numbers.
| Runtime | Common Strength | Common Concern |
|---|---|---|
| Python | Simple, strong AWS and data ecosystem | Heavy data libraries can increase package size |
| Node.js | Good for I/O-heavy workloads | Dependency trees can grow quickly |
| Java | Strong enterprise ecosystem | Cold starts can be heavier without optimization |
| Go | Fast startup and single binary deployment | Less dynamic than scripting languages |
Use the Right Architecture
Sometimes the best Lambda optimization is not inside the function. It is changing the architecture so Lambda does less synchronous work.
Move Slow Work to SQS
If an API endpoint performs slow work, move that work to SQS and return quickly.
Slow API:
Client -> API Gateway -> Lambda -> Send email -> Generate report -> Response
Better:
Client -> API Gateway -> Lambda -> SQS -> Response
SQS -> Worker processes job later
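The API side of this pattern can be sketched as a handler that enqueues the job and returns right away. In the example below the SQS client is injected so the code runs without AWS credentials; in a real function you would pass `boto3.client("sqs")` created outside the handler. `make_handler`, the `jobId` field, and the message shape are assumptions for illustration.

```python
import json
import uuid


def make_handler(sqs_client, queue_url):
    """Build a handler that enqueues the job and returns immediately.

    sqs_client is expected to behave like boto3.client("sqs").
    """
    def lambda_handler(event, context):
        job_id = str(uuid.uuid4())
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"jobId": job_id, "payload": event}),
        )
        # 202 Accepted: the work is queued, not finished.
        return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}

    return lambda_handler
```

Returning a job identifier gives the client something to poll or correlate with a later notification, which is the usual price of making the request asynchronous.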
Use EventBridge for Decoupling
Use EventBridge when multiple systems need to react to business events without tightly coupling services.
OrderCreated event
-> Email service
-> Analytics service
-> Inventory service
-> Fraud service
Use Step Functions for Workflows
Do not put a complex multi-step workflow into one large Lambda function. Use Step Functions when you need branches, retries, waits, compensation, or visibility into each step.
Validate order
-> Reserve inventory
-> Charge payment
-> Send confirmation
-> Update order status
Avoid Long Synchronous Requests
Long synchronous Lambda requests are fragile. They are more likely to hit timeouts, user disconnects, and client retries, and they lead to a poor user experience.
Rule of thumb: keep APIs fast. Move slow, retryable, or heavy work to asynchronous processing.
Optimize Batch Processing
Batch processing is common with SQS, Kinesis, DynamoDB Streams, and Kafka. Batch settings can have a large impact on throughput, cost, and failure behavior.
Tune Batch Size
Larger batches can improve throughput and reduce invocation count, but they can also increase memory usage and make failures more expensive.
| Batch Size | Benefit | Risk |
|---|---|---|
| Small | Simple failure handling | More invocations, lower throughput |
| Medium | Balanced throughput and risk | Needs monitoring |
| Large | High throughput | Longer retries, more memory, harder debugging |
Use Partial Batch Failures
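If one message fails, you usually do not want the entire batch to be retried. Use partial batch failure handling when the event source supports it (SQS, Kinesis, and DynamoDB Streams do): enable ReportBatchItemFailures on the event source mapping and return the identifiers of the failed records. The example below is for SQS, where the identifier is the messageId. For stream sources it is the record's sequence number.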
def lambda_handler(event, context):
    failed_items = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Report only this record as failed; the rest of the batch succeeds
            failed_items.append({
                "itemIdentifier": record["messageId"]
            })
    return {
        "batchItemFailures": failed_items
    }
Handle Poison Messages
A poison message is a message that always fails. Without proper handling, it can be retried repeatedly and block useful work.
- Validate input early.
- Use dead-letter queues.
- Log enough context to debug failed records.
- Separate temporary failures from permanent failures.
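The last point, separating temporary from permanent failures, can be sketched on top of partial batch responses for SQS. `PermanentError`, `handle_batch`, and the `on_permanent_failure` callback are hypothetical names for this illustration; which exceptions count as permanent is a decision your own code has to make.

```python
class PermanentError(Exception):
    """Hypothetical marker: the record can never succeed (e.g. invalid input)."""


def handle_batch(records, process_record, on_permanent_failure):
    """Process SQS records and report only retryable failures back to Lambda."""
    retryable = []
    for record in records:
        try:
            process_record(record)
        except PermanentError as exc:
            # Retrying cannot help: hand the record to the caller and move on.
            on_permanent_failure(record, exc)
        except Exception:
            # Assume anything else is temporary and let SQS redeliver it.
            retryable.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": retryable}
```

Records that are not reported as failed are treated as processed, so the callback should log or store permanently failed records somewhere durable if they need later inspection.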
Control Concurrency
High concurrency can improve throughput, but it can also overload downstream systems.
Queue has 100,000 messages
-> Lambda scales up
-> database receives too many writes
-> database becomes bottleneck
Better:
reserved concurrency + batch tuning + backpressure
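A rough way to size a concurrency limit is to treat concurrency as arrival rate multiplied by average duration. The helpers below sketch that estimate in both directions; the function names and the example numbers are assumptions, and real traffic is burstier than an average suggests.

```python
import math


def required_concurrency(events_per_second, avg_duration_ms):
    """Estimate concurrent executions as arrival rate times average duration."""
    return math.ceil(events_per_second * avg_duration_ms / 1000)


def max_events_per_second(concurrency_limit, avg_duration_ms):
    """Estimate the throughput a given concurrency limit can sustain."""
    return concurrency_limit / (avg_duration_ms / 1000)
```

For example, 200 events per second at 250 ms each need about 50 concurrent executions, while a reserved concurrency of 20 caps the same function at roughly 80 events per second. Comparing that cap with what the database can absorb is a reasonable starting point for the limit.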
Choose the Right Storage and Database Access
Lambda performance often depends on the storage system it calls. Choose the right database or storage service for the access pattern.
DynamoDB for Key-Value Access
DynamoDB is often a good fit for Lambda because it scales well and avoids connection pooling problems common with relational databases.
import boto3
import os

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["TABLE_NAME"])

def lambda_handler(event, context):
    response = table.get_item(
        Key={"id": event["id"]}
    )
    return response.get("Item")
S3 for Object Storage
S3 is a good fit for files, documents, images, reports, backups, and large objects. Avoid storing large blobs directly inside databases when object storage is more appropriate.
RDS with RDS Proxy
RDS is useful when you need relational queries, joins, transactions, and SQL. But Lambda + RDS needs connection management.
- Use RDS Proxy for connection pooling.
- Reuse connections when safe.
- Limit concurrency to protect the database.
- Keep transactions short.
ElastiCache for Low-Latency Reads
ElastiCache can help when many Lambda invocations repeatedly read the same data. Cache reference data, computed results, tokens, or expensive database lookups when appropriate.
Rule of thumb: optimize the access pattern, not only the Lambda code.
Improve Observability
You cannot optimize what you cannot see. Before changing memory, architecture, or dependencies, measure where the time is going.
Measure Duration
Track average, p95, and p99 duration. Average duration can look fine while p99 latency is bad for real users.
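To see how an average can hide a slow tail, the sketch below summarizes a list of durations with the standard library. `latency_summary` is a name chosen for this example, and the "inclusive" quantile method is one reasonable choice among several.

```python
import statistics


def latency_summary(durations_ms):
    """Return average, p95, and p99 for a list of durations in milliseconds."""
    # n=100 yields 99 cut points; index 94 is the 95th percentile, 98 the 99th.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(durations_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

With 95 invocations at 100 ms and 5 at 2000 ms, the average is 195 ms while p99 is 2000 ms, which is the latency the slowest users actually experience.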
Track Init Duration
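For cold starts, look at Init Duration, which Lambda includes in the REPORT log line only for invocations that started a new execution environment. This helps separate initialization problems from handler execution problems.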
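If you export logs for analysis, the REPORT line can be parsed to separate cold invocations from warm ones. The sketch below assumes the usual REPORT layout with fields such as "Duration", "Billed Duration", "Max Memory Used", and "Init Duration"; `parse_report` and the `cold_start` key are names invented for this example.

```python
import re

# Longer field names come first so "Duration" does not match inside them.
_REPORT_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Duration): ([\d.]+)"
)


def parse_report(line):
    """Extract numeric fields from a Lambda REPORT log line.

    "Init Duration" is present only on cold starts, so it doubles as a
    cold-start marker.
    """
    fields = {name: float(value) for name, value in _REPORT_FIELD.findall(line)}
    fields["cold_start"] = "Init Duration" in fields
    return fields
```

Grouping the parsed records by `cold_start` shows how much of the tail latency comes from initialization rather than from the handler itself.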
Use CloudWatch Metrics
| Metric | What It Tells You |
|---|---|
| Duration | How long the function runs |
| Errors | How often invocations fail |
| Throttles | Whether concurrency limits are being hit |
| ConcurrentExecutions | How many invocations run in parallel |
| IteratorAge | Whether stream processing is falling behind |
Use Structured Logs
import json
import logging
import time

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    start = time.time()
    result = process(event)
    # One JSON object per log line is easy to query in log tooling
    logger.info(json.dumps({
        "requestId": context.aws_request_id,
        "durationMs": int((time.time() - start) * 1000),
        "message": "Lambda invocation completed"
    }))
    return result
Use AWS X-Ray or Tracing
Tracing helps identify slow downstream calls, database queries, external API latency, and service-to-service bottlenecks.
Rule of thumb: measure first, optimize second.
Common Performance Mistakes
Doing Too Much in One Function
A Lambda function that validates input, calls five APIs, generates a file, writes to a database, sends email, and updates analytics is doing too much. Split workflows or move slow work to asynchronous processing.
Opening Connections on Every Invocation
Creating SDK clients or database connections inside every invocation adds unnecessary latency and can overload downstream systems.
Using Lambda for Long-Running Jobs
Lambda is designed for bounded execution: a single invocation can run for at most 15 minutes. If a job may run longer, or regularly comes close to that limit, consider Step Functions, ECS, AWS Batch, or another compute model.
Ignoring Downstream Limits
Lambda can scale quickly. Your database, third-party API, or internal service may not. Always design around downstream capacity.
Optimizing Without Measuring
Guessing is one of the most common performance mistakes. Measure cold starts, duration, memory usage, errors, retries, and downstream latency before making changes.
Performance Optimization Checklist
- Measure before optimizing. Check duration, init duration, errors, throttles, and downstream latency.
- Reduce package size. Remove unused dependencies and development files.
- Keep initialization small. Avoid unnecessary work outside the handler.
- Use lazy loading. Load heavy dependencies only when needed.
- Reuse SDK clients. Create clients outside the handler.
- Reuse database connections carefully. Validate stale connections and use RDS Proxy when appropriate.
- Tune memory. More memory can reduce duration and sometimes cost.
- Set network timeouts. Do not let external requests hang forever.
- Move slow work to SQS. Keep synchronous APIs fast.
- Use Step Functions for workflows. Do not put complex multi-step processes into one huge function.
- Tune batch size. Balance throughput, memory usage, and failure handling.
- Use partial batch failures. Avoid retrying successful records when one record fails.
- Control concurrency. Protect databases and external systems.
- Use structured logs. Make debugging and performance analysis easier.
- Use tracing. Find slow downstream calls and service bottlenecks.
Conclusion
AWS Lambda performance optimization is a system design problem, not just a code problem. Cold starts matter, but they are only one part of performance. Real Lambda performance depends on initialization, memory and CPU tuning, dependency size, network calls, database access, event source configuration, concurrency, and observability.
The best optimization usually starts with measurement. Find where time is spent, then choose the right fix. Sometimes the answer is smaller packages or lazy imports. Sometimes it is more memory. Sometimes it is connection reuse, RDS Proxy, SQS buffering, Step Functions, or changing the architecture.
Key takeaway: fast Lambda functions are small, measured, event-driven, connection-aware, dependency-conscious, and designed around downstream limits.