Scalability for Dummies — Part 1: Clones

By Oleksandr Andrushchenko, Published on Nov 24, 2025

A simple, practical guide to the first and most commonly used approach to scaling: making copies of what works. No PhD required, just common sense, a few engineering patterns, and awareness of the pitfalls.
Part 1 in a short series on practical scalability.

What are "clones"?

A clone is simply another identical instance of your application — a process, container, VM, or server running the same code and configuration. When load grows, you scale horizontally by adding more copies.
Cloning is intuitive: instead of turning one server into a monster machine, you add more normal-sized ones. Like adding more chefs to a kitchen instead of making one chef eight times bigger.

Why use clones?

Simplicity: Same code, same behavior, fewer surprises.
Predictability: If one instance handles X traffic, N instances handle roughly N×X, as long as shared dependencies such as the database are not the bottleneck.
Fault isolation: One bad clone doesn’t bring down the whole system.
Elasticity: Add/remove instances based on load.
Parallel processing: Multiple clones handle concurrent work.
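
The "N instances handle roughly N×X" rule can be turned into a quick capacity estimate. The sketch below is only back-of-the-envelope arithmetic: the function name, the 30% headroom default, and the traffic numbers are assumptions for illustration, not measurements from a real system.

```python
import math


def clones_needed(peak_rps: float, per_clone_rps: float, headroom: float = 0.3) -> int:
    """Estimate the clone count for a peak load, keeping some spare capacity.

    headroom=0.3 means each clone is planned to run at no more than 70% of
    its measured capacity, leaving room for spikes and for a clone failing.
    """
    if per_clone_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_clone_rps must be > 0 and headroom in [0, 1)")
    usable_rps = per_clone_rps * (1 - headroom)
    return max(1, math.ceil(peak_rps / usable_rps))


if __name__ == "__main__":
    # Hypothetical numbers: one clone was measured at 200 requests/second.
    print(clones_needed(peak_rps=1500, per_clone_rps=200))
```

The estimate only holds while the clones are the limiting factor. Once a shared dependency saturates, adding clones stops helping.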

The hard parts

1. Statefulness

If each clone keeps session data or files locally, switching users between clones breaks things. Stateless apps are easy to clone; stateful apps get painful fast.
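
To make the problem concrete, here is a minimal sketch of two clones that keep sessions in local memory, followed by two clones that share one store. The class names are hypothetical, and a plain dictionary stands in for a real external store.

```python
from typing import Dict, Optional


class LocalStateClone:
    """A clone that keeps sessions in its own memory (the problematic case)."""

    def __init__(self) -> None:
        self.sessions: Dict[str, str] = {}

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


class SharedStateClone:
    """A clone that keeps sessions in a store shared by all clones."""

    def __init__(self, store: Dict[str, str]) -> None:
        self.store = store  # stand-in for an external store such as Redis

    def login(self, session_id: str, user: str) -> None:
        self.store[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        return self.store.get(session_id)


if __name__ == "__main__":
    # Local state: the user logs in on clone A, the next request lands on clone B.
    a, b = LocalStateClone(), LocalStateClone()
    a.login("s1", "alice")
    print(b.whoami("s1"))  # None: clone B has never seen this session

    # Shared state: any clone can serve any request.
    shared: Dict[str, str] = {}
    c, d = SharedStateClone(shared), SharedStateClone(shared)
    c.login("s1", "alice")
    print(d.whoami("s1"))  # alice
```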

2. Sticky sessions

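Sticky sessions route each user to the same clone on every request. That sidesteps the state problem, but it creates hotspots when busy users land on the same clone, loses sessions when that clone dies, and makes it harder to add or remove clones freely.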

3. Data consistency and shared resources

Multiple clones hitting the same database can cause race conditions and overload. Cloning your app doesn’t automatically clone your database capacity.
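
One common race is the lost update: two clones read the same value, then each writes back its own result. The sketch below simulates that interleaving in a single process and shows one possible remedy, a compare-and-set on a version number (often called optimistic locking). `VersionedStore` is a hypothetical in-memory stand-in for a database row. In a real database the check and the write would have to happen in one atomic statement.

```python
from typing import Tuple


class VersionedStore:
    """Tiny in-memory stand-in for a database row with a version column."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def read(self) -> Tuple[int, int]:
        return self.value, self.version

    def write_blind(self, new_value: int) -> None:
        # Overwrites whatever is there: the source of lost updates.
        self.value = new_value
        self.version += 1

    def write_if_unchanged(self, new_value: int, expected_version: int) -> bool:
        # Compare-and-set: only write if nobody else wrote since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def decrement_with_retry(store: VersionedStore, max_attempts: int = 5) -> bool:
    """Decrement the value safely, re-reading and retrying on conflict."""
    for _ in range(max_attempts):
        value, version = store.read()
        if store.write_if_unchanged(value - 1, version):
            return True
    return False


if __name__ == "__main__":
    # Two clones read the same stock level, then both write blindly.
    stock = VersionedStore(10)
    seen_by_a, _ = stock.read()
    seen_by_b, _ = stock.read()
    stock.write_blind(seen_by_a - 1)
    stock.write_blind(seen_by_b - 1)
    print(stock.value)  # 9, although two items were sold

    # Same interleaving with compare-and-set: the stale write is rejected.
    stock = VersionedStore(10)
    value_a, ver_a = stock.read()
    value_b, ver_b = stock.read()
    print(stock.write_if_unchanged(value_a - 1, ver_a))  # True
    print(stock.write_if_unchanged(value_b - 1, ver_b))  # False: clone B must retry
    decrement_with_retry(stock)
    print(stock.value)  # 8
```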

4. Configuration drift

If clones come from different builds or configs, subtle differences cause hard-to-debug issues. Always build from a single, versioned source of truth.
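
One way to spot drift is to have every clone report a short fingerprint of its build version and effective configuration, then compare them. The following is a rough sketch of that idea. The function names, clone names, and config keys are made up for illustration.

```python
import hashlib
import json
from collections import Counter
from typing import Dict, List


def fingerprint(build_version: str, config: dict) -> str:
    """Hash the build version and config so clones can be compared cheaply."""
    # sort_keys makes the hash independent of dictionary key order.
    canonical = json.dumps({"build": build_version, "config": config}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def find_drift(fingerprints: Dict[str, str]) -> List[str]:
    """Return the clones whose fingerprint differs from the most common one."""
    if not fingerprints:
        return []
    majority, _ = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(name for name, fp in fingerprints.items() if fp != majority)


if __name__ == "__main__":
    reported = {
        "api-1": fingerprint("1.4.2", {"timeout": 30, "log_level": "info"}),
        "api-2": fingerprint("1.4.2", {"log_level": "info", "timeout": 30}),
        "api-3": fingerprint("1.4.2", {"timeout": 60, "log_level": "info"}),
    }
    print(find_drift(reported))  # ['api-3']
```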

Patterns for scaling with clones

Stateless architecture

Move state out of the app and into external stores, such as:

Redis or Memcached for sessions
Object storage for files
Databases with connection pools and replicas

Load balancing

A load balancer distributes traffic across clones and stops sending requests to unhealthy ones. Common choices include AWS ALB/ELB, NGINX, HAProxy, Envoy, and Kubernetes Services.
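
The core behavior fits in a few lines. Below is a toy round-robin balancer that skips clones marked unhealthy. It is a teaching sketch with hypothetical names, not how any of the products above are implemented, and real balancers offer more strategies, such as least connections or weighted routing.

```python
from itertools import count
from typing import List


class RoundRobinBalancer:
    """Hands out clones in rotation, skipping any marked unhealthy."""

    def __init__(self, clones: List[str]) -> None:
        self.clones = list(clones)
        self.healthy = set(clones)
        self._counter = count()

    def mark_unhealthy(self, clone: str) -> None:
        self.healthy.discard(clone)

    def mark_healthy(self, clone: str) -> None:
        if clone in self.clones:
            self.healthy.add(clone)

    def pick(self) -> str:
        """Return the next healthy clone in rotation."""
        # Try each clone at most once before giving up.
        for _ in range(len(self.clones)):
            candidate = self.clones[next(self._counter) % len(self.clones)]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy clones available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["api-1", "api-2", "api-3"])
    print([lb.pick() for _ in range(4)])  # ['api-1', 'api-2', 'api-3', 'api-1']

    lb.mark_unhealthy("api-2")  # e.g. a failed health check
    print([lb.pick() for _ in range(4)])  # ['api-3', 'api-1', 'api-3', 'api-1']
```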

Health checks and graceful shutdown

Each clone must expose a health check so the load balancer knows when it is ready. On shutdown, it must first stop accepting new requests and finish the in-flight ones before exiting. Otherwise, requests get dropped.
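
The bookkeeping behind a graceful shutdown can be sketched independently of any web framework. The hypothetical `CloneLifecycle` class below tracks in-flight requests and reports 503 on its health check once draining starts. In a real service, `shutdown()` would typically be called from a SIGTERM handler and `health_status()` would back the health endpoint. Both of those wiring details are assumptions left out of this sketch.

```python
import threading


class CloneLifecycle:
    """Tracks readiness and in-flight requests for one clone."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)  # shares the same lock
        self._draining = False
        self._in_flight = 0

    def health_status(self) -> int:
        """HTTP status for the health check: 200 when ready, 503 when draining."""
        with self._lock:
            return 503 if self._draining else 200

    def begin_request(self) -> bool:
        """Admit a request unless the clone is draining."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._idle:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.notify_all()

    def shutdown(self, timeout: float = 30.0) -> bool:
        """Stop admitting requests, then wait for in-flight ones to finish.

        Returns True if the clone drained in time, False on timeout.
        """
        with self._idle:
            self._draining = True
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout=timeout)


if __name__ == "__main__":
    clone = CloneLifecycle()
    clone.begin_request()
    # Simulate the in-flight request finishing shortly after shutdown starts.
    threading.Timer(0.1, clone.end_request).start()
    print(clone.shutdown(timeout=5.0))  # True once the request has finished
    print(clone.health_status())        # 503: the balancer should stop routing here
    print(clone.begin_request())        # False: new work is refused
```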

Autoscaling

Add or remove clones based on CPU, latency, queue depth, or custom signals. Tune thresholds and cooldown periods carefully so the clone count does not oscillate (thrash) as load fluctuates.
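
A simple scaling rule is proportional: if the metric is 50% above target, run 50% more clones. The sketch below applies that rule with a tolerance band so that small fluctuations change nothing. The function name, the 10% tolerance, and the 1 to 100 bounds are assumptions for illustration. Production autoscalers usually add cooldown or stabilization windows on top of this.

```python
import math


def desired_clones(current: int, metric: float, target: float,
                   min_clones: int = 1, max_clones: int = 100,
                   tolerance: float = 0.1) -> int:
    """Proportional scaling: size the fleet so the metric lands near the target."""
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough: do nothing, which avoids thrashing
    wanted = math.ceil(current * ratio)
    return max(min_clones, min(max_clones, wanted))


if __name__ == "__main__":
    # Hypothetical readings: average CPU percent across clones, target 60%.
    print(desired_clones(current=4, metric=90, target=60))    # 6: scale out
    print(desired_clones(current=4, metric=63, target=60))    # 4: within tolerance
    print(desired_clones(current=10, metric=20, target=60))   # 4: scale in
    print(desired_clones(current=50, metric=300, target=60))  # 100: capped at max
```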

Shared caches and rate limits

Shared caches reduce repeated expensive work; shared rate limits protect downstream systems when you suddenly add more clones.
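
For the rate-limit half of this pattern, the key point is that the counter lives outside the clones, so the limit applies to the whole fleet rather than to each clone separately. Here is a minimal fixed-window sketch. The class name is hypothetical, and a dictionary stands in for the shared store. A real deployment would need an atomic increment and expiry of old windows in that store.

```python
import time
from typing import Optional


class SharedRateLimiter:
    """Fixed-window limiter whose counters live in a store shared by all clones."""

    def __init__(self, store: dict, limit: int, window_seconds: int = 1) -> None:
        self.store = store  # stand-in for a shared store such as Redis
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if one more call for `key` fits in the current window."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        bucket = (key, window)
        # Read-then-write is not atomic; it is only safe in this
        # single-process sketch.
        used = self.store.get(bucket, 0)
        if used >= self.limit:
            return False
        self.store[bucket] = used + 1
        return True


if __name__ == "__main__":
    shared_store: dict = {}
    clone_a = SharedRateLimiter(shared_store, limit=3)
    clone_b = SharedRateLimiter(shared_store, limit=3)

    # Both clones draw from the same budget of 3 calls per second.
    print(clone_a.allow("db", now=100.0))  # True
    print(clone_b.allow("db", now=100.2))  # True
    print(clone_a.allow("db", now=100.4))  # True
    print(clone_b.allow("db", now=100.6))  # False: fleet-wide limit reached
    print(clone_a.allow("db", now=101.0))  # True: a new window has started
```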

A simple architecture example

A web API scaling from 1 to 100 clones might look like this:

Clients → Load Balancer → API clones
Clones are stateless; sessions go to Redis; files go to object storage
Database has read replicas for load distribution
Autoscaler manages clone count based on metrics

Client
   |
   v
+------------------+
|   Load Balancer  |
+------------------+
   |      |      |
   v      v      v
+------+ +------+ +------+
|API 1 | |API 2 | |API N |   ← clones (stateless)
+------+ +------+ +------+
    \       |       /
     \      |      /
      v     v     v

   +------------------+
   |   Redis / Cache  |
   +------------------+
            |
            v
   +------------------+
   | Primary Database |
   +------------------+
            |
            v
   +------------------+
   |  Read Replicas   |
   +------------------+

Quick example: Docker Compose file

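# docker-compose.yml (example)
version: '3.8'  # optional: current Docker Compose ignores this key
services:
  web:
    image: myapp:latest
    deploy:
      replicas: 3  # run three clones of the same image
    ports:
      # Publish only the container port so each clone gets its own host port.
      # A fixed mapping such as "8080:80" can be bound by just one container
      # under `docker compose up`, so the other clones would fail to start.
      - "80"
    environment:
      # Shared session store, reached through the Compose service name.
      - REDIS_URL=redis://redis:6379

  redis:
    image: redis:6-alpine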

In Kubernetes, set the replicas field in a Deployment to create clones.

Pre-flight checklist

Is your app stateless or can it be made stateless?
Are health checks and graceful shutdown implemented?
Are your configurations reproducible?
Can downstream systems handle more concurrent traffic?
Do you have metrics, logging, and tracing?

Common mistakes

No observability — scaling blind.
Ignoring database bottlenecks.
Assuming clones won’t fail.
Relying on sticky sessions instead of fixing state.

Conclusion

Clones are the simplest and most reliable way to scale horizontally, especially for stateless services. But cloning doesn’t solve every bottleneck. Prepare your data layer, eliminate local state, and monitor everything. When done right, cloning becomes your most dependable scaling tool.

As we’ve seen, even perfectly cloned servers can be slowed down by the database. In Scalability for Dummies - Part 2: Database, we will dive into why databases become bottlenecks, how to scale them, and how strategies like sharding, replication, and caching keep your system fast and resilient.