Database Transactions
By Oleksandr Andrushchenko
Database transactions are the foundation of correctness in almost every non-trivial system. Any system handling money transfers, inventory updates, or user state changes relies on transactions to prevent partial writes and inconsistent reads. A payment service that debits an account but fails to credit another illustrates how missing transactional boundaries quickly lead to data corruption and operational incidents.
In production environments, transaction design is rarely about theory alone. Choices around isolation levels, locking strategies, and retry behavior directly affect latency, throughput, and failure modes. A high-throughput order processing system often discovers that default transaction settings work during development but fail under peak load, exposing deadlocks, long lock waits, or cascading retries.
What a Transaction Actually Guarantees
A transaction is a logical unit of work that groups multiple database operations into a single, atomic outcome. For example, creating an order record and decrementing product inventory should either both succeed or both fail. Without a transaction boundary, a system crash between statements leaves the database in an invalid intermediate state.
The classic ACID properties—Atomicity, Consistency, Isolation, and Durability—define these guarantees. In practice, most relational databases provide configurable isolation while always enforcing atomicity and durability. A banking ledger using PostgreSQL relies on write-ahead logging (WAL) to ensure committed balances survive power loss, while isolation determines how concurrent balance reads behave.
Atomicity and Consistency in Practice
Atomicity ensures that partial changes are never visible. A typical example is user registration, where an account row and a profile row must be created together. If profile creation fails due to a constraint violation, the account row must not exist either.
Consistency enforces database rules such as foreign keys, uniqueness constraints, and application invariants. For example, an inventory table may use a CHECK constraint to enforce that stock never goes below zero. A transaction that attempts to oversell items fails as soon as the offending statement runs (or at commit, for deferred constraints), preserving consistency even under concurrent purchase attempts. The sketch after the checklist below shows both guarantees from application code.
- Wrap logically dependent SQL statements in a single transaction block.
- Rely on database constraints as a last line of defense, not only application checks.
- Fail fast inside transactions to minimize lock holding time.
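As a concrete illustration, here is a minimal Python sketch using psycopg2. The `accounts` and `profiles` tables, their columns, and the connection string are assumptions for the example, not a prescribed schema.

```python
import psycopg2
from psycopg2 import errors

# Assumed schema: accounts(id serial PRIMARY KEY, email text UNIQUE)
# and profiles(account_id int REFERENCES accounts, display_name text).
conn = psycopg2.connect("dbname=app")  # illustrative connection string

def register_user(email, display_name):
    try:
        # `with conn:` wraps one transaction: commit on success,
        # rollback on any exception, so a failed profile insert
        # also removes the freshly inserted account row.
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO accounts (email) VALUES (%s) RETURNING id",
                    (email,),
                )
                account_id = cur.fetchone()[0]
                cur.execute(
                    "INSERT INTO profiles (account_id, display_name)"
                    " VALUES (%s, %s)",
                    (account_id, display_name),
                )
        return account_id
    except errors.UniqueViolation:
        # A constraint fired and the transaction was rolled back;
        # neither row exists.
        return None
```

Because the context manager rolls back on any exception, atomicity holds even for failure paths the application author did not anticipate.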
Isolation Levels and Their Real-World Impact
Isolation controls how concurrent transactions interact. A reporting query reading orders while another transaction inserts new rows illustrates the difference between isolation levels. Under READ COMMITTED, the report may see new orders mid-query, while under REPEATABLE READ, the dataset remains stable.
Higher isolation reduces anomalies but increases locking and contention. SERIALIZABLE isolation prevents phantom reads but can cause transaction aborts under high concurrency. A flash-sale system often experiences a spike in serialization failures if SERIALIZABLE is used indiscriminately. The sketch after the table shows how a session opts into a stricter level.
| Isolation Level | Prevents | Common Use Case | Trade-Off |
|---|---|---|---|
| READ COMMITTED | Dirty reads | CRUD APIs, dashboards | Non-repeatable reads |
| REPEATABLE READ | Dirty + non-repeatable reads | Financial summaries | Higher lock contention |
| SERIALIZABLE | All anomalies | Critical invariants | Transaction aborts |
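To show how a session opts into a stricter level, here is a hedged psycopg2 sketch assuming PostgreSQL snapshot semantics and a hypothetical `orders` table with `status` and `total` columns:

```python
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=app")  # illustrative connection string

# Pin the session to REPEATABLE READ so both reads below see the same
# snapshot, even if another transaction commits new orders in between.
conn.set_session(
    isolation_level=psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ
)

with conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM orders WHERE status = 'paid'")
        paid_count = cur.fetchone()[0]
        cur.execute(
            "SELECT coalesce(sum(total), 0) FROM orders WHERE status = 'paid'"
        )
        paid_total = cur.fetchone()[0]
        # Under READ COMMITTED each statement sees a fresh snapshot, so
        # these two aggregates could disagree; here they are consistent.

print(paid_count, paid_total)
```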
Locking, Blocking, and Deadlocks
Locks are the mechanism databases use to enforce isolation. A transaction updating a row typically acquires a row-level exclusive lock. If another transaction attempts to update the same row, it blocks until the first transaction commits or rolls back.
Deadlocks occur when two transactions wait on each other’s locks. For example, transaction A updates row X then row Y, while transaction B updates row Y then row X. Most databases detect this cycle and abort one transaction, which the application must then retry, as the sketch after the checklist shows.
- Access shared resources in a consistent order across transactions.
- Keep transactions short and avoid user interaction inside them.
- Monitor deadlock metrics and lock wait times in production.
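In application code, both recommendations can be combined: acquire locks in a fixed order, and retry if the database still aborts the transaction as a deadlock victim. A minimal sketch, assuming psycopg2 against PostgreSQL and a hypothetical `accounts` table:

```python
import time
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=app")  # illustrative connection string

def transfer(cur, from_id, to_id, amount):
    # Lock accounts in ascending id order so two concurrent transfers
    # can never wait on each other's locks.
    for acct in sorted((from_id, to_id)):
        cur.execute("SELECT 1 FROM accounts WHERE id = %s FOR UPDATE", (acct,))
    cur.execute(
        "UPDATE accounts SET balance = balance - %s WHERE id = %s",
        (amount, from_id),
    )
    cur.execute(
        "UPDATE accounts SET balance = balance + %s WHERE id = %s",
        (amount, to_id),
    )

def transfer_with_retry(from_id, to_id, amount, attempts=3):
    for attempt in range(attempts):
        try:
            with conn:
                with conn.cursor() as cur:
                    transfer(cur, from_id, to_id, amount)
            return True
        except errors.DeadlockDetected:
            # The database chose this transaction as the deadlock victim;
            # back off briefly and retry the whole unit of work.
            time.sleep(0.05 * (2 ** attempt))
    return False
```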
Transaction Boundaries
Defining correct transaction boundaries is as important as choosing isolation levels. A common anti-pattern is starting a transaction at request entry and committing at response exit. A REST endpoint performing external API calls inside a transaction holds locks far longer than necessary.
A better pattern isolates only the critical database mutations. For example, an order service may validate input and pricing outside the transaction, then open a transaction only to insert the order and adjust inventory. This approach minimizes contention and improves throughput; a sketch of this shape follows the SQL examples below.
```sql
-- Successful transaction:
BEGIN;

-- Lock the row so concurrent transactions must wait.
SELECT stock
FROM products
WHERE id = 42
FOR UPDATE;

UPDATE products
SET stock = stock - 1
WHERE id = 42;

INSERT INTO orders (user_id, product_id, qty)
VALUES (10, 42, 1);

COMMIT;

-- Failed transaction:
BEGIN;

SELECT stock
FROM products
WHERE id = 42
FOR UPDATE;

-- Assume stock = 0: with a CHECK (stock >= 0) constraint,
-- the next statement errors and aborts the transaction.
UPDATE products
SET stock = stock - 1
WHERE id = 42;

-- ROLLBACK ends the aborted transaction; no changes persist.
ROLLBACK;
```
Keep in mind that transactions and locks solve different problems: the transaction defines the atomic unit of work, while locks coordinate concurrent access to the rows inside it.
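In application code, the same narrow boundary might look like the following sketch; the validation and pricing helpers are hypothetical stand-ins for work that should happen before any lock is taken:

```python
import psycopg2

conn = psycopg2.connect("dbname=shop")  # illustrative connection string

def validate_order(user_id, product_id, qty):
    # Hypothetical placeholder: input checks, fraud rules, etc.
    if qty <= 0:
        raise ValueError("qty must be positive")

def price_order(product_id, qty):
    # Hypothetical placeholder: pricing lookup, discounts, etc.
    return qty * 999  # integer cents, illustrative

def place_order(user_id, product_id, qty):
    # Validation and pricing run before BEGIN, so no locks are held
    # while this (possibly slow) work executes.
    validate_order(user_id, product_id, qty)
    total = price_order(product_id, qty)

    # Only the critical mutations run inside the transaction.
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE products SET stock = stock - %s"
                " WHERE id = %s AND stock >= %s",
                (qty, product_id, qty),
            )
            if cur.rowcount == 0:
                raise RuntimeError("insufficient stock")  # rolls back
            cur.execute(
                "INSERT INTO orders (user_id, product_id, qty, total)"
                " VALUES (%s, %s, %s, %s)",
                (user_id, product_id, qty, total),
            )
```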
Retries and Idempotency
Transactions may fail due to deadlocks, serialization errors, or transient network issues. Retrying blindly can duplicate side effects unless idempotency is designed in. A payment API that retries a failed transaction without an idempotency key risks double charges.
Idempotent writes allow safe retries. For example, using a unique request identifier stored in a table ensures that repeated attempts return the same result. Combined with retry-on-conflict logic, this pattern improves reliability under contention; a sketch follows the checklist below.
- Retry only on known transient errors such as serialization failures.
- Use unique constraints or idempotency keys to prevent duplicates.
- Apply exponential backoff to reduce retry storms.
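One way to implement the unique-request-identifier pattern on PostgreSQL is sketched below; the `processed_requests` table and the payment logic are assumptions for illustration:

```python
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=payments")  # illustrative connection string

# Assumed schema: processed_requests(request_id text PRIMARY KEY).
def charge_once(request_id, account_id, amount):
    try:
        with conn:
            with conn.cursor() as cur:
                # The unique key makes the whole transaction idempotent:
                # a retry of an already committed request_id violates the
                # primary key and rolls back before the debit runs twice.
                cur.execute(
                    "INSERT INTO processed_requests (request_id) VALUES (%s)",
                    (request_id,),
                )
                cur.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, account_id),
                )
        return "charged"
    except errors.UniqueViolation:
        # A previous attempt already committed; the retry is a no-op.
        return "duplicate"
```

Note that a first attempt that failed mid-transaction also rolls back its key row, so a genuine retry still goes through.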
Distributed Transactions and Their Limits
Transactions across multiple databases or services introduce significant complexity. Two-phase commit (2PC) attempts to coordinate commits but often reduces availability. A microservices system updating an order service and a billing service synchronously may block if one service becomes unavailable.
Many modern systems avoid distributed transactions in favor of eventual consistency. For example, an order is created in one transaction, followed by an asynchronous billing event. Failures are handled through compensating actions, such as refund workflows. The outbox sketch after the table shows one common shape of this pattern.
| Approach | Consistency | Availability | Operational Cost |
|---|---|---|---|
| 2PC | Strong | Low under failure | High |
| Eventual Consistency | Eventual | High | Moderate |
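A common concrete form of this is the transactional outbox: the order and a billing event commit in one local transaction, and a separate relay publishes the event asynchronously. A minimal sketch, assuming an `outbox` table and amounts stored as integer cents:

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=orders")  # illustrative connection string

# Assumed schema: outbox(id serial, topic text, payload jsonb,
# published_at timestamptz NULL until a relay worker sends it).
def create_order_with_billing_event(user_id, total_cents):
    with conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO orders (user_id, total) VALUES (%s, %s)"
                " RETURNING id",
                (user_id, total_cents),
            )
            order_id = cur.fetchone()[0]
            # The event commits atomically with the order; a separate
            # relay reads unpublished rows and calls the billing service,
            # retrying until it succeeds.
            cur.execute(
                "INSERT INTO outbox (topic, payload) VALUES (%s, %s)",
                (
                    "billing.requested",
                    json.dumps({"order_id": order_id,
                                "total_cents": total_cents}),
                ),
            )
    return order_id
```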
Observability and Metrics
Transaction issues rarely surface during local testing. Production observability reveals slow commits, lock waits, and retry rates. For example, a sudden increase in average transaction duration often correlates with new queries added to critical paths.
Key metrics include transaction duration percentiles, deadlock counts, and rollback rates. Logging slow transactions with their query plans allows targeted optimization, such as adding missing indexes or restructuring update patterns; a sampling sketch follows the checklist below.
- Track p95 and p99 transaction durations.
- Alert on sustained increases in rollback or retry rates.
- Review slow query logs after each schema or query change.
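On PostgreSQL, the built-in statistics views are a cheap starting point; the sketch below samples the cumulative commit, rollback, and deadlock counters from `pg_stat_database` (how you ship them to your metrics system is up to you):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # illustrative connection string
conn.autocommit = True  # a monitoring probe needs no transaction

with conn.cursor() as cur:
    # xact_commit, xact_rollback, and deadlocks are cumulative counters;
    # sample them periodically and alert on their rate of change.
    cur.execute(
        "SELECT xact_commit, xact_rollback, deadlocks"
        " FROM pg_stat_database WHERE datname = current_database()"
    )
    commits, rollbacks, deadlocks = cur.fetchone()
    print(f"commits={commits} rollbacks={rollbacks} deadlocks={deadlocks}")
```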
Choosing the Right Trade-Offs
There is no universally correct transaction strategy. A financial ledger prioritizes correctness over throughput, often using higher isolation. A social media feed prioritizes availability and speed, accepting occasional anomalies.
Effective system design evaluates data criticality, concurrency patterns, and failure modes. Explicit transaction design, combined with observability and disciplined retries, prevents subtle data corruption and scaling bottlenecks as systems evolve.