Threads, Async, Await, and Event Loops in AWS Lambda

By Oleksandr Andrushchenko — Published on

AWS Lambda already gives you horizontal scalability by creating more execution environments when more events arrive. But inside one Lambda invocation, your Python code may still spend a lot of time waiting for APIs, databases, S3, DynamoDB, Redis, or other services.

This article explains how threads, async, await, and the event loop behave inside AWS Lambda, when they help, when they do not help, and how to use them safely in real serverless applications.

AWS Lambda Execution Model

One Invocation Per Execution Environment

An AWS Lambda function runs inside an execution environment. When an event arrives, Lambda invokes your handler. In Python, the handler is usually a normal synchronous function such as lambda_handler(event, context). The invocation runs until the handler returns a response, exits, or times out.

def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "body": "Hello from Lambda"
    }

In the standard Lambda model, each execution environment processes one invocation at a time. If more requests arrive concurrently, AWS Lambda scales by creating more execution environments instead of running several unrelated invocations inside the same environment.

Lambda Already Scales Horizontally

Lambda concurrency is not the same as Python concurrency. AWS Lambda scales the function by running multiple execution environments. Python threads or async do not make Lambda itself scale horizontally. They only help your code do more work inside a single invocation.

Request 1 -> Lambda environment 1
Request 2 -> Lambda environment 2
Request 3 -> Lambda environment 3
Request 4 -> Lambda environment 4

This distinction is important. If your Lambda function receives 1,000 API Gateway requests at the same time, AWS can run many execution environments in parallel, up to your account and function concurrency limits. Separately, if one invocation needs to call three APIs, threads or async can help that single invocation call those APIs concurrently.

Cold Starts and Warm Starts

A cold start happens when Lambda creates a new execution environment. Python imports modules, initializes global variables, creates clients, and prepares the runtime. A warm start happens when Lambda reuses an existing environment for another invocation.

Cold start:
Create environment
Import modules
Initialize global objects
Run handler


Warm start:
Reuse environment
Run handler again

Global variables may survive between warm invocations. This is useful for reusing SDK clients, database pools, configuration, and sometimes event loop-related objects. However, you should never assume a warm start is guaranteed.
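As a minimal sketch of how module-level state behaves across warm invocations, consider the following. The load_config helper and the INIT_COUNT counter are hypothetical names used only for illustration; in a real function the expensive step would more likely be creating an SDK client or a connection pool.

```python
import time

# Module-level code runs once per execution environment (on a cold start).
INIT_COUNT = 0
_config = None


def load_config():
    """Hypothetical expensive initialization, e.g. reading settings or creating a client."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"loaded_at": time.time()}


def get_config():
    """Create the config lazily and reuse it on later (warm) invocations."""
    global _config
    if _config is None:
        _config = load_config()
    return _config


def lambda_handler(event, context):
    config = get_config()
    return {"init_count": INIT_COUNT, "loaded_at": config["loaded_at"]}
```

Calling the handler twice in the same process initializes the config only once, which is what happens on a warm start. After a cold start the module is imported again and the cache starts empty, so the code must always be able to rebuild it.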

Memory and CPU Allocation

Lambda allocates CPU in proportion to the configured memory, so increasing memory can speed up CPU-bound work such as imports, compression, JSON processing, and cryptography, as well as some network-heavy workloads. Performance tuning is therefore not only about code: the memory setting changes both execution speed and cost.

Rule of thumb: for IO-heavy Lambda functions, use threads or async to reduce waiting time. For CPU-heavy Lambda functions, test higher memory settings or move the work to a more appropriate compute model.

Concurrency Inside One Lambda Invocation

Why Invocation-Level Concurrency Matters

Inside one Lambda invocation, code can still be slow if it waits for multiple external operations sequentially. For example, an API Lambda may need to fetch a user profile, order history, permissions, and feature flags before returning a response.

Sequential flow:

Get user       300 ms
Get orders     300 ms
Get payments   300 ms

Total: about 900 ms

If these operations are independent, they can be performed concurrently.

Concurrent flow:

Get user       300 ms
Get orders     300 ms
Get payments   300 ms

Total: about 300 ms

This matters in Lambda because duration affects latency and cost. Reducing a function from 900 ms to 300 ms can make the system faster and cheaper, assuming the added complexity is justified.
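The arithmetic above can be checked locally with a small simulation. This is only a sketch: fake_call is a hypothetical stand-in that sleeps instead of doing real network IO, and the delay is shortened to 0.1 seconds to keep the run quick.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_call(name, delay=0.1):
    """Stand-in for a network call; sleeps instead of doing real IO."""
    time.sleep(delay)
    return name


def run_sequential(names):
    start = time.perf_counter()
    results = [fake_call(name) for name in names]
    return results, time.perf_counter() - start


def run_concurrent(names):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(names)) as executor:
        results = list(executor.map(fake_call, names))
    return results, time.perf_counter() - start


names = ["user", "orders", "payments"]
seq_results, seq_time = run_sequential(names)
con_results, con_time = run_concurrent(names)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the concurrent run takes roughly the longest single delay. Real services add variance, so measure against the actual dependencies before drawing conclusions.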

IO-Bound vs CPU-Bound Work

IO-bound work spends most of its time waiting for external systems. Examples include HTTP calls, DynamoDB requests, S3 operations, Redis calls, database queries, and queue interactions. Threads and async can help because they allow other work to continue while one operation waits.

CPU-bound work spends most of its time using the processor. Examples include image processing, compression, encryption, large JSON transformations, heavy calculations, and machine learning inference. Threads and async are usually not the best tools for CPU-heavy Python code, especially because of the global interpreter lock (GIL) in standard CPython, which allows only one thread to execute Python bytecode at a time.

| Workload | Examples | Good Lambda Concurrency Option |
| --- | --- | --- |
| IO-bound | HTTP APIs, S3, DynamoDB, Redis, SQL queries | Threads or async |
| CPU-bound | Image processing, compression, heavy calculations | More memory/CPU, multiprocessing, worker service, external compute |
| Mixed | Fetch data, transform it, store result | Measure bottleneck first |

Threads in AWS Lambda

How Threads Work in Lambda

Threads run multiple execution paths inside the same Lambda process. They are useful when your code uses blocking libraries such as boto3, requests, traditional SQL drivers, or blocking SDKs.

When one thread waits for a network response, another thread can continue. This can reduce the total duration of one Lambda invocation when several independent blocking operations need to happen.

          Lambda invocation
                  |
         ThreadPoolExecutor
                  |
    +-------------+-------------+
    |             |             |
API call       S3 call    DynamoDB call

ThreadPoolExecutor Example

Use ThreadPoolExecutor instead of manually creating many threads. It limits concurrency and keeps the code easier to control.

from concurrent.futures import ThreadPoolExecutor
import requests

def fetch_json(url):
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

def lambda_handler(event, context):
    urls = [
        "https://api.example.com/user",
        "https://api.example.com/orders",
        "https://api.example.com/payments",
    ]

    with ThreadPoolExecutor(max_workers=3) as executor:
        results = list(executor.map(fetch_json, urls))

    return {
        "statusCode": 200,
        "body": results,
    }

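This can be much faster than calling each URL sequentially if the APIs are independent and most time is spent waiting for the network. Note that with an API Gateway proxy integration the body must be a string, so a real handler would return json.dumps(results); the examples in this article return Python objects for brevity.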

Threads with boto3

boto3 is synchronous and blocking. If you need to perform multiple independent AWS SDK calls inside one Lambda invocation, threads can be a practical option.

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def get_object_text(bucket, key):
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")

def lambda_handler(event, context):
    objects = [
        ("my-bucket", "files/a.txt"),
        ("my-bucket", "files/b.txt"),
        ("my-bucket", "files/c.txt"),
    ]

    with ThreadPoolExecutor(max_workers=3) as executor:
        contents = list(
            executor.map(lambda item: get_object_text(item[0], item[1]), objects)
        )

    return {
        "statusCode": 200,
        "body": {
            "files": len(contents),
        },
    }

This pattern is useful for parallel S3 reads, multiple independent DynamoDB queries, or several independent API calls. Keep the worker count reasonable because each thread consumes memory and increases pressure on downstream systems.
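One detail worth knowing: executor.map re-raises the first exception when its results are iterated, so one failed call hides the results of the others. The sketch below uses submit and as_completed to collect successes and failures per key. The fetch function is a hypothetical stand-in for a blocking call such as s3.get_object, and the "missing.txt" rule exists only to trigger an error.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(key):
    """Hypothetical stand-in for a blocking call such as s3.get_object."""
    if key.endswith("missing.txt"):
        raise KeyError(key)
    return f"contents of {key}"


def fetch_all(keys, max_workers=3):
    """Run fetch for every key and keep successes and failures separate."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetch, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                results[key] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                errors[key] = repr(exc)
    return results, errors
```

For example, fetch_all(["files/a.txt", "files/missing.txt"]) returns the contents of the first key and an error entry for the second. Whether a partial result is acceptable depends on the use case; some handlers should still fail the whole invocation.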

When Threads Help

  • You use blocking libraries such as boto3, requests, or synchronous database drivers.
  • You have multiple independent IO operations inside one invocation.
  • You want to improve existing synchronous Lambda code without rewriting everything to async.
  • You need moderate concurrency, not thousands of concurrent tasks.
  • You want simple parallelism for AWS SDK calls within one invocation.

Thread Limitations

Threads are useful, but they are not free. Too many threads can increase memory usage, scheduling overhead, and downstream pressure. Threads also share memory, which can create race conditions if multiple threads mutate the same data.

Important: threads do not make CPU-heavy Python code truly parallel in standard CPython because of the GIL. For CPU-heavy Lambda workloads, increasing memory, using multiprocessing carefully, using native libraries, or moving work to another service may be better.
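To illustrate the shared-memory risk, here is a minimal sketch of protecting mutable state with a lock. The Counter class and count_in_threads function are hypothetical examples, not part of any library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """Shared counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write step atomic across threads.
        with self._lock:
            self.value += 1


def count_in_threads(workers=4, increments_per_worker=1000):
    counter = Counter()

    def work():
        for _ in range(increments_per_worker):
            counter.increment()

    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(work) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raise any worker exception

    return counter.value
```

A simpler alternative is often to avoid shared mutation entirely: let each worker return its own result and combine the results in the main thread after the pool finishes.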

Async and Await in AWS Lambda

What Async Means in Lambda

Async IO allows a single thread to handle many waiting operations using an event loop. Instead of blocking while one request waits, the coroutine pauses at await, and the event loop runs another coroutine.

Async is useful in Lambda when you have many network calls and async-compatible libraries, such as httpx.AsyncClient, aiohttp, async database drivers, or async Redis clients.

Sync Handler with Async Main

In normal Python Lambda code, keep the top-level Lambda handler synchronous and call async code from inside it. This keeps the Lambda entry point simple while still allowing async concurrency internally.

import asyncio

async def main(event):
    return {
        "message": "Hello from async code",
        "event": event,
    }

def lambda_handler(event, context):
    result = asyncio.run(main(event))

    return {
        "statusCode": 200,
        "body": result,
    }

This pattern is simple and works well for many Lambda functions. It creates an event loop, runs the async function, and closes the loop when finished.

asyncio.run Example

Here is a more realistic example using async HTTP calls.

import asyncio
import httpx

async def fetch_json(client, url):
    response = await client.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

async def main(event):
    urls = [
        "https://api.example.com/user",
        "https://api.example.com/orders",
        "https://api.example.com/payments",
    ]

    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(
            *(fetch_json(client, url) for url in urls)
        )

    return results

def lambda_handler(event, context):
    results = asyncio.run(main(event))

    return {
        "statusCode": 200,
        "body": results,
    }

When one request waits for the network, the event loop can continue with another request. This can reduce total Lambda duration when operations are independent.
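By default, asyncio.gather propagates the first exception to the caller. When partial results are acceptable, return_exceptions=True returns exceptions in place of results so the handler can decide what to do with each one. The sketch below simulates this with asyncio.sleep; fake_fetch and its fail flag are hypothetical stand-ins for real HTTP calls.

```python
import asyncio


async def fake_fetch(name, delay, fail=False):
    """Hypothetical stand-in for an async HTTP call."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name} failed")
    return {"name": name}


async def main():
    results = await asyncio.gather(
        fake_fetch("user", 0.01),
        fake_fetch("orders", 0.01, fail=True),
        fake_fetch("payments", 0.01),
        return_exceptions=True,  # exceptions are returned, not raised
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [str(r) for r in results if isinstance(r, Exception)]
    return {"ok": ok, "failed": failed}


def lambda_handler(event, context):
    return asyncio.run(main())
```

Results come back in the same order as the awaitables passed to gather, regardless of which one finishes first.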

Event Loop Reuse

For many Lambda functions, asyncio.run() inside the handler is simple and acceptable. However, it creates and closes an event loop for each invocation. In more advanced cases, especially frameworks or libraries that manage an event loop, you may need a different pattern.

For example, if you use FastAPI with Mangum, you normally let the framework and adapter manage the ASGI lifecycle. If you build a custom async Lambda handler, keep the event loop management consistent and avoid creating loops in multiple places.

Simple Lambda:
lambda_handler()
  -> asyncio.run(main())


ASGI Lambda:
API Gateway
  -> Lambda
  -> Mangum
  -> FastAPI
  -> ASGI event loop behavior

Rule of thumb: use asyncio.run() for simple standalone async Lambda functions. Let frameworks manage the event loop when using ASGI frameworks such as FastAPI with Mangum.
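If loop reuse across warm invocations is genuinely needed, for example because an async client is tied to the loop it was created on, one possible pattern is to keep a single loop at module level. This is a sketch under that assumption, not a recommendation over asyncio.run(); the get_loop helper is a hypothetical name.

```python
import asyncio

_loop = None  # survives warm invocations; recreated after a cold start


def get_loop():
    """Return the module-level loop, creating it on first use."""
    global _loop
    if _loop is None or _loop.is_closed():
        _loop = asyncio.new_event_loop()
    return _loop


async def main(event):
    await asyncio.sleep(0)  # placeholder for real async IO
    return {"echo": event}


def lambda_handler(event, context):
    loop = get_loop()
    # run_until_complete does not close the loop, so it can be reused.
    result = loop.run_until_complete(main(event))
    return {"statusCode": 200, "body": result}
```

The trade-off is that anything left pending on the loop carries over into the next invocation, so every task should be awaited or cancelled before the handler returns.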

When Async Helps

  • You call many external APIs inside one invocation.
  • You use async-compatible HTTP clients or database drivers.
  • You build API handlers with FastAPI or another async framework.
  • You process SQS batches where each message triggers async IO.
  • You want high concurrency with lower overhead than many threads.

Event Loop in Lambda

What the Event Loop Does

The event loop runs coroutines, pauses them when they wait, and resumes them when IO is ready. It does not make slow APIs faster. It prevents the Lambda invocation from wasting time while one operation waits.

Task A starts API call
Task A waits
Event loop runs Task B

Task B starts database call
Task B waits
Event loop runs Task C

Task A response is ready
Event loop resumes Task A

What Happens During await?

When Python reaches await, the current coroutine pauses. The event loop can run another coroutine while the first one waits. When the awaited operation completes, the event loop resumes the original coroutine.

async def get_user(client, user_id):
    response = await client.get(f"https://api.example.com/users/{user_id}")
    return response.json()

The code after await runs only when the HTTP response is ready. Other coroutines can run during that wait.
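The interleaving can be made visible with a small simulation. In this sketch, worker is a hypothetical coroutine that records when it starts and finishes, and asyncio.sleep stands in for a network wait.

```python
import asyncio


async def worker(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # the coroutine pauses here; others may run
    log.append(f"{name} end")


async def main():
    log = []
    await asyncio.gather(
        worker("A", 0.05, log),
        worker("B", 0.01, log),
    )
    return log


log = asyncio.run(main())
print(log)  # ['A start', 'B start', 'B end', 'A end']
```

Worker B starts while A is still waiting and finishes first because its wait is shorter. Nothing runs in parallel here; the single thread simply switches between coroutines at each await.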

Blocking the Event Loop

The biggest async mistake in Lambda is blocking the event loop. Marking a function as async does not make blocking code non-blocking.

import requests

async def bad_example():
    response = requests.get("https://api.example.com/users/1")
    return response.json()

This is still blocking because requests.get() blocks the thread that runs the event loop, so no other coroutine can make progress until the response arrives. A better async version uses an async HTTP client.

import httpx

async def good_example():
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.example.com/users/1")

    return response.json()

Important: if one blocking operation runs inside the event loop, it can delay every other coroutine in that invocation.

Threads vs Async in Lambda

Comparison Table

| Feature | Threads | Async |
| --- | --- | --- |
| Execution model | Multiple threads in one Lambda process | Coroutines managed by an event loop |
| Best for | Blocking IO libraries | Async-compatible IO libraries |
| Works with boto3 | Yes | Not directly, because boto3 is blocking |
| Works with requests | Yes | No, use httpx.AsyncClient or aiohttp |
| Memory usage | Higher as thread count grows | Lower for many waiting tasks |
| Complexity | Shared-memory risks | Event loop and cancellation complexity |
| CPU-heavy work | Poor fit in CPython | Poor fit |
| Good Lambda use case | Parallel S3 or DynamoDB calls with boto3 | Many async HTTP calls in one invocation |

Decision Table

| Situation | Recommended Approach |
| --- | --- |
| Simple Lambda with one database call | Sequential code |
| Multiple independent boto3 calls | ThreadPoolExecutor |
| Multiple independent calls using requests | ThreadPoolExecutor or switch to async HTTP client |
| Many HTTP calls with httpx.AsyncClient or aiohttp | Async IO |
| FastAPI application on Lambda | Async routes where useful, managed through Mangum |
| CPU-heavy image processing | Increase memory/CPU, use workers, or use another compute service |
| Small script-style Lambda | Keep it simple and synchronous |

FastAPI and Mangum in Lambda

Why FastAPI Uses Async

FastAPI is built around ASGI and supports async route handlers. This is useful when routes perform IO, such as calling APIs, querying databases with async drivers, reading from Redis, or waiting for external services.

from fastapi import FastAPI
import httpx

app = FastAPI()

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    async with httpx.AsyncClient() as client:
        response = await client.get(f"https://api.example.com/users/{user_id}")

    return response.json()

While the route waits for the external API, the event loop is free to run other coroutines. In Lambda, each invocation handles a single request, so this matters most when one request performs several IO operations that can overlap.

What Mangum Does

Mangum adapts API Gateway or Lambda Function URL events to an ASGI application such as FastAPI. It acts as a bridge between Lambda’s event format and the ASGI interface expected by FastAPI.

API Gateway
  |
AWS Lambda
  |
Mangum
  |
FastAPI
  |
Route handler

from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

@app.get("/health")
async def health():
    return {"status": "ok"}

handler = Mangum(app)

With this pattern, you normally do not call asyncio.run() manually inside every route. The ASGI adapter handles the framework integration.

Common FastAPI Lambda Mistakes

  • Using requests inside async def routes.
  • Creating a new database client on every request instead of reusing global clients when safe.
  • Making every function async even when no async IO is used.
  • Assuming async makes CPU-heavy code faster.
  • Forgetting timeouts on external API calls.
  • Opening too many connections during one Lambda invocation.

Real-World Examples

Calling Multiple External APIs

A common Lambda pattern is aggregating data from several APIs. Sequential calls increase latency. Async can reduce total duration if the APIs are independent.

import asyncio
import httpx

async def fetch(client, url):
    response = await client.get(url, timeout=5)
    response.raise_for_status()
    return response.json()

async def main():
    urls = [
        "https://api.example.com/profile",
        "https://api.example.com/orders",
        "https://api.example.com/permissions",
    ]

    async with httpx.AsyncClient() as client:
        profile, orders, permissions = await asyncio.gather(
            *(fetch(client, url) for url in urls)
        )

    return {
        "profile": profile,
        "orders": orders,
        "permissions": permissions,
    }

def lambda_handler(event, context):
    result = asyncio.run(main())

    return {
        "statusCode": 200,
        "body": result,
    }

Parallel S3 Operations

Because boto3 is synchronous, threads are often the practical option for parallel S3 operations.

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")

def read_s3_object(key):
    response = s3.get_object(Bucket="my-bucket", Key=key)
    return response["Body"].read()

def lambda_handler(event, context):
    keys = [
        "reports/a.json",
        "reports/b.json",
        "reports/c.json",
    ]

    with ThreadPoolExecutor(max_workers=3) as executor:
        files = list(executor.map(read_s3_object, keys))

    return {
        "statusCode": 200,
        "body": {
            "objects_read": len(files),
        },
    }

Keep max_workers small and intentional. Too much parallelism can increase memory usage and hit downstream limits.

Multiple DynamoDB Queries

If a Lambda function needs several independent DynamoDB queries, threads can reduce duration. This is useful when the queries do not depend on each other.

from concurrent.futures import ThreadPoolExecutor
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("AppTable")

def get_item(pk, sk):
    response = table.get_item(
        Key={
            "pk": pk,
            "sk": sk,
        }
    )

    return response.get("Item")

def lambda_handler(event, context):
    keys = [
        ("USER#1", "META"),
        ("USER#1", "SETTINGS"),
        ("USER#1", "PERMISSIONS"),
    ]

    with ThreadPoolExecutor(max_workers=3) as executor:
        items = list(executor.map(lambda key: get_item(key[0], key[1]), keys))

    return {
        "statusCode": 200,
        "body": {
            "items": items,
        },
    }

This can improve latency, but it does not remove DynamoDB limits. You still need to consider read capacity, hot partitions, retries, and throttling.

SQS Batch Processing

Lambda can receive a batch of SQS messages. If each message requires an external API call, processing sequentially may be slow. Threads or async can help process independent messages concurrently.

from concurrent.futures import ThreadPoolExecutor
import requests

def process_record(record):
    payload = record["body"]

    response = requests.post(
        "https://api.example.com/process",
        json={"payload": payload},
        timeout=5,
    )

    response.raise_for_status()
    return response.json()

def lambda_handler(event, context):
    records = event.get("Records", [])

    with ThreadPoolExecutor(max_workers=5) as executor:
        results = list(executor.map(process_record, records))

    return {
        "processed": len(results),
    }

Be careful with partial failures. For SQS, your error handling strategy should match how you want messages retried or sent to a dead-letter queue.
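One way to handle partial failures is the partial batch response format, which lets a handler report only the failed messages so that the successful ones are not retried. The sketch below assumes that ReportBatchItemFailures is enabled on the SQS event source mapping; process_record and its "bad" payload rule are hypothetical stand-ins for real processing.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record):
    """Hypothetical processing step; raises on a payload it cannot handle."""
    if record["body"] == "bad":
        raise ValueError("cannot process payload")
    return record["messageId"]


def safe_process(record):
    """Return the messageId of a failed record, or None on success."""
    try:
        process_record(record)
        return None
    except Exception:
        return record["messageId"]


def lambda_handler(event, context):
    records = event.get("Records", [])

    with ThreadPoolExecutor(max_workers=5) as executor:
        failed_ids = [mid for mid in executor.map(safe_process, records) if mid]

    # Only the listed messages are retried when ReportBatchItemFailures is enabled.
    return {
        "batchItemFailures": [{"itemIdentifier": mid} for mid in failed_ids]
    }
```

Catching the exception inside each worker keeps one bad message from aborting the whole batch. Without the event source mapping setting, Lambda ignores this response shape and treats the batch as fully successful, so the configuration and the code need to change together.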

Common Mistakes

Using async def as the Lambda Handler Directly

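In normal Python Lambda functions, keep the Lambda entry point synchronous and call async code from inside it. The Python runtime calls the handler as a regular function, so an async def handler returns a coroutine object that is never awaited, and the invocation typically fails when the runtime tries to serialize that object as the response. Do not rely on Lambda automatically awaiting your coroutine handler.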

# Avoid this as the direct Lambda handler.
async def lambda_handler(event, context):
    return {"statusCode": 200}

Use a synchronous wrapper instead.

import asyncio

async def main(event):
    return {"statusCode": 200}

def lambda_handler(event, context):
    return asyncio.run(main(event))

Creating Event Loops Everywhere

Event loop management should be centralized. Creating loops in many helper functions makes code harder to reason about and can cause runtime issues.

Better pattern: keep one async entry point such as main(), and call it once from the synchronous Lambda handler.

Blocking the Event Loop

Using blocking libraries inside async functions prevents the event loop from switching to other coroutines until the blocking call returns.

import requests

async def bad():
    response = requests.get("https://api.example.com")
    return response.json()

Use async-compatible libraries, or move blocking work to threads.
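Moving blocking work to threads from async code can be sketched with asyncio.to_thread, which is available in Python 3.9 and later. Here blocking_call is a hypothetical stand-in for a blocking SDK call such as a boto3 request, simulated with time.sleep.

```python
import asyncio
import time


def blocking_call(name):
    """Hypothetical stand-in for a blocking SDK call such as a boto3 request."""
    time.sleep(0.1)
    return name


async def main():
    start = time.perf_counter()
    # Each blocking call runs in a worker thread, so the event loop stays free.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_call, "a"),
        asyncio.to_thread(blocking_call, "b"),
        asyncio.to_thread(blocking_call, "c"),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The three calls overlap, so the total time is close to one call rather than three. This uses the event loop's default thread pool, which means the thread limitations described earlier still apply.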

Assuming Background Tasks Continue After Return

Do not assume that background threads or async tasks will safely continue after the Lambda handler returns. Lambda freezes the execution environment once the invocation ends, so unfinished work may be paused indefinitely, resumed during a later invocation, or never completed. If work must be completed reliably, finish it before returning or move it to another service such as SQS, EventBridge, Step Functions, or another Lambda invocation.

Using Async for CPU Work

Async does not make CPU-heavy Python code faster. If the function spends most of its time calculating, the event loop has nothing useful to switch to. Consider increasing memory, using optimized native libraries, using multiprocessing carefully, or moving the work to ECS, Batch, Step Functions, or another compute layer.

Unlimited Concurrency Inside One Invocation

Running too many threads or async tasks can overload downstream systems. Lambda concurrency, internal thread concurrency, async task concurrency, database connection limits, and API rate limits all interact.

import asyncio

async def limited_gather(tasks, limit):
    semaphore = asyncio.Semaphore(limit)

    async def run(task):
        async with semaphore:
            return await task()

    return await asyncio.gather(*(run(task) for task in tasks))

Rule of thumb: limit internal concurrency intentionally. Faster is not better if it causes throttling, retries, or downstream outages.
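A usage sketch for the helper above: each task is a zero-argument callable that returns a coroutine, so no coroutine is created until a semaphore slot is free. The fake_call coroutine and the peak counter are hypothetical additions used only to observe how many calls run at once.

```python
import asyncio


async def limited_gather(tasks, limit):
    # Same helper as above, repeated so this sketch runs on its own.
    semaphore = asyncio.Semaphore(limit)

    async def run(task):
        async with semaphore:
            return await task()

    return await asyncio.gather(*(run(task) for task in tasks))


async def main():
    active = 0
    peak = 0

    async def fake_call(i):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # track the highest number of concurrent calls
        await asyncio.sleep(0.01)
        active -= 1
        return i * 2

    # Bind i as a default argument so each callable keeps its own value.
    tasks = [lambda i=i: fake_call(i) for i in range(10)]
    results = await limited_gather(tasks, limit=3)
    return results, peak


results, peak = asyncio.run(main())
print(results, peak)
```

Ten calls are scheduled, but the semaphore keeps at most three in flight at any moment. The right limit depends on what the downstream system tolerates, multiplied by how many Lambda environments may run at the same time.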

Production Recommendations

  • Remember that Lambda already scales horizontally. Use threads or async only to reduce waiting time inside one invocation.
  • Use simple synchronous code when it is fast enough. Do not add concurrency without a measured bottleneck.
  • Use ThreadPoolExecutor for blocking libraries. This is practical for boto3, requests, and synchronous database drivers.
  • Use async for async-compatible libraries. Async works best with httpx.AsyncClient, aiohttp, async Redis clients, and async database drivers.
  • Keep the Lambda handler synchronous. Wrap async logic inside a synchronous handler with a clear async entry point.
  • Do not block the event loop. Avoid blocking SDKs inside async functions unless you move them to a thread pool.
  • Use timeouts everywhere. Lambda has a timeout, but every network call should also have its own timeout.
  • Limit internal concurrency. Use thread pool sizes, semaphores, connection pools, and rate limits.
  • Reuse clients in global scope when safe. This can reduce warm invocation overhead.
  • Do not rely on background work after return. Use SQS, EventBridge, Step Functions, or another invocation for reliable async work.
  • Measure duration and cost. Concurrency should reduce real execution time, not only make the code look more advanced.
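
The timeout recommendation can be tied to the invocation itself. The Lambda context object provides get_remaining_time_in_millis(), which a handler can use to derive a budget for its IO. The sketch below is one possible approach: SAFETY_MARGIN_MS is an assumed value, and slow_operation is a hypothetical stand-in for real async work.

```python
import asyncio

SAFETY_MARGIN_MS = 500  # assumed buffer for building and returning the response


async def slow_operation(delay):
    """Hypothetical stand-in for real async IO."""
    await asyncio.sleep(delay)
    return "done"


async def run_with_deadline(context, delay):
    # Derive the IO budget from the time left in this invocation.
    remaining_ms = context.get_remaining_time_in_millis()
    budget_s = max(remaining_ms - SAFETY_MARGIN_MS, 0) / 1000
    try:
        return await asyncio.wait_for(slow_operation(delay), timeout=budget_s)
    except asyncio.TimeoutError:
        return "timed out"


def lambda_handler(event, context):
    result = asyncio.run(run_with_deadline(context, event.get("delay", 0.01)))
    return {"result": result}
```

Stopping before the Lambda timeout gives the function a chance to return a controlled error or a partial result instead of being terminated mid-operation.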

Conclusion

AWS Lambda already scales horizontally by creating more execution environments. Threads and async are not used to scale Lambda itself. They are used to improve concurrency inside a single invocation.

Use threads when you need concurrency around blocking libraries such as boto3, requests, or synchronous database drivers. Use async when your stack supports async libraries and you need many concurrent IO operations. For CPU-heavy work, neither threads nor async are usually the right primary solution.

Key takeaway: the best concurrency model in Lambda is the simplest one that reduces invocation duration without making the function difficult to maintain. Start simple, measure the bottleneck, add concurrency only where waiting time is actually hurting performance, and always protect downstream systems with limits and timeouts.
