Communication Protocols in Distributed Systems

By Oleksandr Andrushchenko

Modern distributed systems rely on communication protocols that define how services exchange data. As a software developer designing APIs, microservices, real-time systems, or event-driven platforms, choosing the correct protocol directly impacts latency, scalability, observability, cost, and developer productivity.

In this article, we will analyze the most widely used protocols in system design:

| Protocol | Built On | Model | Performance | Real-Time | Strengths | Weaknesses | Examples |
|---|---|---|---|---|---|---|---|
| HTTP / REST | TCP + TLS | Request/Response | Medium | No | Simple, ubiquitous, cacheable, stateless scaling | Over-fetching, chatty APIs | Public APIs, CRUD services |
| gRPC | HTTP/2 over TCP + TLS | RPC + Streaming | High | Streaming | High performance, strongly typed contracts | Binary tooling complexity | Internal microservices |
| WebSocket | HTTP Upgrade → TCP | Bidirectional Persistent | High | Yes | Low-latency push, persistent connection | Connection scaling complexity | Live dashboards, chat, real-time apps |
| GraphQL | HTTP (usually) | Query-Based | Medium | No (Subscriptions possible) | Flexible data fetching, single endpoint | Query abuse risk, high schema complexity | Frontend data aggregation, flexible APIs |
| MQTT | TCP / TLS | Publish/Subscribe | High | Yes | Lightweight, efficient on unstable networks | Limited native security, broker dependency | IoT telemetry, device communication |
| AMQP | TCP | Message Queue | High | Async | Reliable delivery, routing flexibility | Broker management overhead | Reliable async workflows |
| Kafka Protocol | TCP | Event Streaming (Log) | Very High | Async | High throughput, replayable events | Operational complexity | Event-driven architectures |

1. HTTP / REST

HTTP is the foundation of the web. REST (Representational State Transfer) is an architectural style built on top of HTTP that maps operations onto verbs such as GET, POST, PUT, and DELETE.


[Figure: HTTP/REST example]

Advantages

  • Universal support
  • Easy debugging
  • Human-readable (JSON)
  • Cache support
  • Stateless design
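Cache support rests on standard HTTP validators such as ETags: the server hands the client a fingerprint of the resource, and later revalidation requests can be answered with a bodiless 304. A minimal sketch of that conditional-GET decision (the function names are illustrative, not part of the Express example below):

```python
import hashlib
import json

def make_etag(resource):
    """Derive a validator token from the serialized resource."""
    payload = json.dumps(resource, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]

def conditional_get(resource, if_none_match):
    """Return (status, body) as a REST server would for a GET with an ETag."""
    etag = make_etag(resource)
    if if_none_match == etag:
        return 304, None      # client cache is still valid: no body sent
    return 200, resource      # full response (plus the fresh ETag header)

user = {"id": 1, "name": "John"}
status, body = conditional_get(user, None)              # first request: 200
status2, body2 = conditional_get(user, make_etag(user)) # revalidation: 304
```

The same mechanism is what lets CDNs and browsers cache REST responses without any protocol extensions.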

Disadvantages

  • Over-fetching and under-fetching
  • High latency in chatty services
  • No built-in schema enforcement
  • Text-based overhead

When to Use / Real-World Use Cases

  • Public APIs and client-server communication with simple request/response models (e.g., mobile apps, web apps, third-party integrations)
  • CRUD-based applications and resource management systems (e.g., user management, product catalogs, content platforms)
  • Systems requiring strong caching support and HTTP ecosystem tooling (e.g., CDN caching, browser caching, API gateways)
  • External integrations and public-facing services (e.g., payment APIs, SaaS APIs, partner integrations)
  • Simple architectures where transparency, simplicity, and wide adoption are priorities (e.g., small to medium backend services)

Example

Server (get/create user API, Node.js Express):

const express = require('express');
const app = express();

app.use(express.json());

// Get user by ID
app.get('/users/:id', (req, res) => {
  res.json({ id: req.params.id, name: "John" });
});

// Create new user
app.post('/users', (req, res) => {
  res.status(201).json({ id: 1, ...req.body });
});

app.listen(3000);

Client (get/create user API calls, JavaScript):

// Get user by ID
async function getUser(id) {
  const response = await fetch(`http://localhost:3000/users/${id}`);
  if (!response.ok) {
    throw new Error("Failed to fetch user");
  }
  return await response.json();
}

getUser(1).then(console.log).catch(console.error);

// Create new user
async function createUser(userData) {
  const response = await fetch("http://localhost:3000/users", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify(userData),
  });

  if (!response.ok) {
    throw new Error("Failed to create user");
  }

  return await response.json();
}

createUser({ name: "Alice" }).then(console.log).catch(console.error);

2. gRPC

gRPC is a high-performance RPC framework built on HTTP/2 and Protocol Buffers.


[Figure: gRPC]

Key Characteristics

  • Binary serialization (Protobuf)
  • HTTP/2 multiplexing
  • Strongly typed contracts
  • Streaming support

Advantages

  • High performance
  • Low latency
  • Code generation
  • Streaming support

Disadvantages

  • Harder debugging
  • Binary protocol not human-readable
  • Less browser-native support
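Much of gRPC's compactness, and its non-human-readable wire format, comes from Protobuf's base-128 varint encoding: small integers occupy a single byte, with a continuation bit marking longer values. A minimal sketch of that encoding, independent of any gRPC library:

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: final byte
            return bytes(out)

def decode_varint(data):
    """Decode a varint byte string back into an integer."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")

# 300 needs two bytes on the wire; 1 needs only one.
assert encode_varint(300) == b"\xac\x02"
assert encode_varint(1) == b"\x01"
```

This is why a Protobuf message is far smaller than its JSON equivalent, and also why you cannot eyeball it in transit.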

When to Use / Real-World Use Cases

  • High-performance internal microservices communication (e.g., auth service → user service → billing service)
  • Backend-to-backend RPC communication with strict contracts (e.g., payment processing, order validation)
  • Systems requiring low-latency and high-throughput communication (e.g., trading systems, real-time data processing)
  • Polyglot environments with strongly typed service definitions (e.g., services written in Go, Java, Python, Node)
  • Streaming data between services or from client to server (e.g., live log streaming, data ingestion pipelines)

Example

Protobuf definition:

syntax = "proto3";

service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
}

message UserRequest {
  int32 id = 1;
}

message UserResponse {
  int32 id = 1;
  string name = 2;
}

Server (get user service, Node.js):

const grpc = require("@grpc/grpc-js");
const protoLoader = require("@grpc/proto-loader");
const PROTO_PATH = "./user.proto";
const packageDefinition = protoLoader.loadSync(PROTO_PATH);
const proto = grpc.loadPackageDefinition(packageDefinition);

const userService = {
  GetUser: (call, callback) => {
    const userId = call.request.id;

    // Example fake DB
    const user = {
      id: userId,
      name: "John Doe",
    };

    callback(null, user);
  },
};

const server = new grpc.Server();

server.addService(proto.UserService.service, userService);

server.bindAsync("0.0.0.0:50051", grpc.ServerCredentials.createInsecure(), (err, port) => {
  if (err) {
    console.error("Failed to bind:", err);
    return;
  }
  // server.start() is deprecated (a no-op) in recent @grpc/grpc-js versions
  console.log(`🚀 gRPC server running on port ${port}`);
});

Client (get user service, JavaScript):

const grpc = require("@grpc/grpc-js");
const protoLoader = require("@grpc/proto-loader");
const PROTO_PATH = "./user.proto";
const packageDefinition = protoLoader.loadSync(PROTO_PATH);
const proto = grpc.loadPackageDefinition(packageDefinition);

// Create client
const client = new proto.UserService(
  "localhost:50051",
  grpc.credentials.createInsecure()
);

// Call RPC
client.GetUser({id: 1}, (error, response) => {
  if (error) {
    console.error("Error:", error);
    return;
  }

  console.log("User received from gRPC:", response);
});

3. WebSocket

WebSocket enables full-duplex persistent connections between client and server.

Advantages

  • Real-time communication
  • Low overhead after handshake
  • Bidirectional

Disadvantages

  • No built-in reconnection logic
  • Harder horizontal scaling
  • Requires sticky sessions or pub/sub backend
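Because the protocol itself provides no reconnection logic, production clients usually layer exponential backoff (often with jitter) on top of the socket. A sketch of just the delay schedule; the actual socket wiring is omitted, and the parameter names are illustrative:

```python
import random

def backoff_delays(attempts, base=0.5, cap=30.0, jitter=False):
    """Yield reconnect delays: base * 2^n seconds, capped, optionally jittered."""
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        # full jitter spreads reconnect storms across clients
        yield random.uniform(0, delay) if jitter else delay

print(list(backoff_delays(6)))  # 0.5, 1.0, 2.0, 4.0, 8.0, 16.0
```

A reconnecting wrapper would sleep for each yielded delay before retrying the handshake, resetting the schedule after a successful connection.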

When to Use / Real-World Use Cases

  • Real-time communication between client and server (e.g., chat applications, live support systems)
  • Live dashboards and monitoring tools (e.g., system metrics, stock prices, analytics updates)
  • Collaborative applications requiring instant updates (e.g., shared editing, whiteboards, multiplayer apps)
  • Applications requiring server-to-client push notifications (e.g., alerts, status updates, activity feeds)
  • High-frequency data streaming with persistent connections (e.g., IoT live monitoring, gaming events)

Example

Server (get/create user, Node.js):

const WebSocket = require("ws");
const wss = new WebSocket.Server({port: 8080});

let users = [{id: 1, name: "John"}];

wss.on("connection", (ws) => {
  ws.on("message", (msg) => {
    const data = JSON.parse(msg);

    if (data.action === "getUser") {
      const user = users.find(u => u.id === data.id);
      ws.send(JSON.stringify(user || {}));
    }

    if (data.action === "createUser") {
      const newUser = {
        id: users.length + 1,
        name: data.name,
      };

      users.push(newUser);
      ws.send(JSON.stringify(newUser));
    }
  });
});

console.log("WebSocket server running on ws://localhost:8080");

Client (get/create user, JavaScript):

const ws = new WebSocket("ws://localhost:8080");

ws.onopen = () => {
  ws.send(JSON.stringify({action: "getUser", id: 1}));
  ws.send(JSON.stringify({action: "createUser", name: "Alice"}));
};

ws.onmessage = (event) => {
  console.log("Response:", event.data);
};

4. GraphQL

GraphQL is a query language for APIs that lets clients request exactly the data they need, and nothing more.

Advantages

  • Eliminates over-fetching
  • Strong schema
  • Single endpoint

Disadvantages

  • Complex caching
  • N+1 query problem
  • More complex backend
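The N+1 problem appears when every resolved item issues its own database query; batching loaders, popularized by the DataLoader pattern, collect keys during a resolution pass and fetch them all in one call. A deliberately simplified, synchronous sketch of the idea (real DataLoader implementations batch asynchronously per tick; all names here are illustrative):

```python
class BatchLoader:
    """Collect keys during a resolution pass, then fetch them in one call."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # takes a list of keys, returns {key: value}
        self.pending = []

    def load(self, key):
        self.pending.append(key)   # no fetch yet, just remember the key
        return key

    def dispatch(self):
        results = self.batch_fn(self.pending)  # one round trip instead of N
        self.pending = []
        return results

calls = []
def fetch_users(ids):
    calls.append(list(ids))        # record each simulated DB round trip
    return {i: {"id": i, "name": f"user{i}"} for i in ids}

loader = BatchLoader(fetch_users)
keys = [loader.load(i) for i in (1, 2, 3)]  # three resolvers ask for users
users = loader.dispatch()                   # ...but only one query runs
assert len(calls) == 1 and users[1]["name"] == "user1"
```

In the Apollo example above, `getUser` hitting a real database per item is exactly where such a loader would slot in.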

When to Use / Real-World Use Cases

  • Frontend-driven applications requiring flexible data fetching (e.g., React/SPA apps, mobile apps)
  • Applications with complex or nested data relationships (e.g., user → orders → products → reviews)
  • Backend-for-Frontend (BFF) layer for aggregating multiple services (e.g., combining user + billing + profile data)
  • APIs where clients need control over response fields to avoid over-fetching (e.g., dashboards, analytics tools)
  • Public or internal APIs requiring schema-based contracts and introspection (e.g., developer platforms, ecosystem APIs)

Example

Server (get/create user API, Node.js Apollo):

const {ApolloServer, gql} = require("apollo-server");

let users = [{id: 1, name: "John"}];

// Schema (Types + Operations)
const typeDefs = gql`
  type User {
    id: Int!
    name: String!
  }

  type Query {
    getUser(id: Int!): User
  }

  type Mutation {
    createUser(name: String!): User
  }
`;

// Resolvers (Business Logic)
const resolvers = {
  Query: {
    getUser: (_, {id}) => {
      return users.find(user => user.id === id);
    },
  },

  Mutation: {
    createUser: (_, {name}) => {
      const newUser = {
        id: users.length + 1,
        name,
      };

      users.push(newUser);
      return newUser;
    },
  },
};

const server = new ApolloServer({typeDefs, resolvers});

server.listen({port: 4000}).then(({url}) => {
  console.log(`🚀 GraphQL server running at ${url}`);
});

Client (get/create user API calls, JavaScript):

async function req(query, variables = {}) {
  const response = await fetch("http://localhost:4000/", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query,
      variables,
    }),
  });

  const result = await response.json();
  return result.data;
}

// Get User
req(
  `
  query GetUser($id: Int!) {
    getUser(id: $id) {
      id
      name
    }
  }
  `,
  {id: 1}
).then(console.log);

// Create User
req(
  `
  mutation CreateUser($name: String!) {
    createUser(name: $name) {
      id
      name
    }
  }
  `,
  {name: "Alice"}
).then(console.log);

5. MQTT

MQTT is a lightweight publish/subscribe protocol designed for constrained devices and unreliable networks, which makes it a natural fit for IoT.

Characteristics

  • Low bandwidth usage
  • Pub/Sub model
  • QoS levels (0, 1, 2)
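Topic filters drive the pub/sub model: subscribers use `+` to match exactly one level of the topic hierarchy and `#` to match everything below. A broker-independent sketch of the matching rules (simplified; it ignores special cases such as `$`-prefixed system topics):

```python
def topic_matches(filter_, topic):
    """Match an MQTT topic against a subscription filter with + and # wildcards."""
    f_parts, t_parts = filter_.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":                      # multi-level: matches the remainder
            return True
        if i >= len(t_parts):
            return False                  # filter is deeper than the topic
        if f != "+" and f != t_parts[i]:  # + matches exactly one level
            return False
    return len(f_parts) == len(t_parts)

assert topic_matches("devices/+/temperature", "devices/42/temperature")
assert topic_matches("devices/#", "devices/42/temperature")
assert not topic_matches("devices/+", "devices/42/temperature")
```

The example below publishes to `devices/temperature`; a subscriber to `devices/#` would receive it along with every other device topic.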

Advantages

  • Very lightweight
  • Battery-efficient
  • Reliable delivery options

Disadvantages

  • Not ideal for complex APIs
  • Limited message size

When to Use / Real-World Use Cases

  • IoT device communication with lightweight messaging (e.g., sensors, smart home devices, industrial equipment)
  • Real-time telemetry data collection (e.g., temperature monitoring, GPS tracking, device metrics)
  • Unstable or low-bandwidth network environments (e.g., remote devices, mobile-connected hardware)
  • Publish/subscribe systems for event distribution (e.g., device status updates, live alerts)
  • Systems requiring persistent lightweight connections with QoS guarantees (e.g., remote device control, fleet management)

Example

Publisher (Python, sends temperature data):

import paho.mqtt.client as mqtt
import json
import random
import time

BROKER = "localhost"
PORT = 1883
TOPIC = "devices/temperature"

# paho-mqtt 1.x style; with paho-mqtt 2.x use mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)
client = mqtt.Client()
client.connect(BROKER, PORT)

print("✅ Publisher connected to broker")

while True:
    message = {
        "deviceId": 1,
        "temperature": random.randint(10, 40),
        "timestamp": int(time.time())
    }

    client.publish(TOPIC, json.dumps(message))
    print("📤 Published:", message)

    time.sleep(2)

Subscriber (Python):

import paho.mqtt.client as mqtt
import json

BROKER = "localhost"
PORT = 1883
TOPIC = "devices/temperature"

def on_connect(client, userdata, flags, rc):
    print("✅ Subscriber connected")
    client.subscribe(TOPIC)
    print(f"📡 Subscribed to {TOPIC}")

def on_message(client, userdata, msg):
    data = json.loads(msg.payload.decode())
    print("📩 Topic:", msg.topic)
    print("📊 Received:", data)

# paho-mqtt 1.x callback API; 2.x requires mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)
client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message

client.connect(BROKER, PORT)
client.loop_forever()

6. AMQP

AMQP (Advanced Message Queuing Protocol) is an open standard messaging protocol implemented by brokers such as RabbitMQ.

Advantages

  • Reliable messaging
  • Flexible routing
  • Durability
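Reliable delivery hinges on explicit acknowledgements: the broker holds a message until the consumer acks it, and redelivers if the consumer nacks or disconnects. A broker-free, in-memory sketch of that contract (class and method names are illustrative, not the pika API):

```python
from collections import deque

class ReliableQueue:
    """Sketch of at-least-once delivery: a message is only gone once acked."""
    def __init__(self):
        self.ready = deque()   # messages waiting for a consumer
        self.unacked = {}      # delivered but not yet acknowledged
        self._tag = 0

    def publish(self, body):
        self.ready.append(body)

    def get(self):
        body = self.ready.popleft()
        self._tag += 1
        self.unacked[self._tag] = body            # held until ack/nack
        return self._tag, body

    def ack(self, tag):
        del self.unacked[tag]                     # consumer finished: discard

    def nack(self, tag):
        self.ready.appendleft(self.unacked.pop(tag))  # requeue for redelivery

q = ReliableQueue()
q.publish({"orderId": 1})
tag, order = q.get()
q.nack(tag)              # consumer failed: message goes back to the queue
tag2, again = q.get()    # redelivered to the next consumer
q.ack(tag2)
assert again == order and not q.unacked
```

The `basic_ack` call in the consumer example below is the real-world counterpart of `ack` here; setting `auto_ack=False` is what makes redelivery possible.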

Disadvantages

  • Operational complexity
  • Broker dependency

When to Use / Real-World Use Cases

  • Reliable async communication with guaranteed message delivery (e.g., order events, payment confirmations)
  • Background job processing and task queues (e.g., image processing, report generation, email sending)
  • Complex service-to-service routing via message broker (e.g., microservices event distribution)
  • Order, payment, and transaction processing systems (e.g., e-commerce checkout pipeline)
  • Event-driven architectures requiring durability and retries (e.g., audit logging, analytics ingestion)
  • Enterprise system integration and workflow automation (e.g., legacy system integration, ERP communication)

Example

Publisher (Python, order service):

import pika
import json

connection = pika.BlockingConnection(
    pika.ConnectionParameters("localhost")
)

channel = connection.channel()

# Create queue
channel.queue_declare(queue="orders")

order = {
    "orderId": 1,
    "amount": 250,
    "status": "CREATED"
}

channel.basic_publish(
    exchange="",
    routing_key="orders",
    body=json.dumps(order)
)

print("📤 Order published:", order)

connection.close()

Consumer (Python, payment service):

import pika
import json

def callback(ch, method, properties, body):
    order = json.loads(body)
    print("📩 Received order:", order)

    # Simulate processing
    print("💳 Processing payment...")

    # Acknowledge message
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(
    pika.ConnectionParameters("localhost")
)

channel = connection.channel()
channel.queue_declare(queue="orders")
channel.basic_consume(
    queue="orders",
    on_message_callback=callback,
    auto_ack=False
)

print("✅ Waiting for messages...")
channel.start_consuming()

7. Kafka Protocol

The Apache Kafka protocol is optimized for distributed streaming and event sourcing, built around a partitioned, append-only commit log.

Advantages

  • High throughput
  • Durable log storage
  • Event replay
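Event replay falls out of Kafka's core data structure, the append-only log: the broker never mutates records, and each consumer simply tracks its own read offset and may rewind it. An in-memory sketch of that abstraction (names are illustrative, not the Kafka client API):

```python
class EventLog:
    """Sketch of Kafka's core abstraction: an append-only log read by offset."""
    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1   # offset assigned to the new record

    def read(self, offset):
        # consumers track their own offsets; the broker just serves the log
        return self.records[offset:]

log = EventLog()
for i in (1, 2, 3):
    log.append({"orderId": i, "status": "CREATED"})

live = log.read(3)     # a caught-up consumer sees nothing new
replay = log.read(0)   # a new consumer replays all history from offset 0
assert live == [] and len(replay) == 3
```

This is why `auto_offset_reset="earliest"` in the consumer example below causes a brand-new consumer group to process every past order, not just new ones.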

Disadvantages

  • Complex operations
  • Eventual consistency challenges

When to Use / Real-World Use Cases

  • High-throughput event streaming systems (e.g., user activity tracking, clickstream analytics)
  • Event-driven microservices architectures (e.g., order events, inventory updates, payment events)
  • Real-time data pipelines and stream processing (e.g., log aggregation, metrics ingestion, fraud detection)
  • System decoupling through asynchronous event communication (e.g., service-to-service event publishing)
  • Data integration between distributed systems with replay capability (e.g., audit logs, data replication, CDC)

Example

Producer (Python, order service):

from kafka import KafkaProducer
import json
import time

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8")
)

topic = "orders"
order_id = 1

while True:
    event = {
        "orderId": order_id,
        "amount": 100 + order_id,
        "status": "CREATED"
    }

    producer.send(topic, event)
    print("📤 Published:", event)

    order_id += 1
    time.sleep(2)

Consumer (Python, analytics or payment service):

from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    enable_auto_commit=True,
    value_deserializer=lambda x: json.loads(x.decode("utf-8"))
)

print("✅ Waiting for messages...")

for message in consumer:
    event = message.value
    print("📩 Received:", event)

    # Simulate processing
    print("⚙ Processing order:", event["orderId"])

Final Thoughts

There is no universally “best” protocol. The correct choice depends on:

  • Latency requirements
  • Data size
  • Traffic patterns
  • Consistency model
  • Operational expertise

The best architects understand trade-offs, not just technologies.