ClickHouse vs PostgreSQL: Real-World Query Performance at Billion-Row Scale

By Oleksandr Andrushchenko


This benchmark compares PostgreSQL and ClickHouse using the same schema and the same queries to evaluate performance in a read-heavy analytical workload. Note that the ClickHouse dataset is 10× larger than the PostgreSQL one (see Benchmark Environment below).

Notes

Each query was executed once, with no warm-up runs or averaging, so the timings include cold-cache effects. Both PostgreSQL and ClickHouse were run with default configurations (no custom tuning).

The goal of this article is to demonstrate ClickHouse’s capabilities for analytical workloads, especially for teams storing historical data in PostgreSQL.

Benchmark Environment

ClickHouse: 3.9B rows (300M/month × 13)
PostgreSQL: 390M rows (30M/month × 13)

Both systems use the same dataset structure and indexing strategy. Full data generator (Laravel): ClickhouseOutgoingSmsTestsCommand.php.

  • CPU: Intel i7-1355U (10c/12t, up to 5.0 GHz)
  • RAM: 62 GB
  • Storage: 1.9 TB NVMe (ext4)
  • OS: Ubuntu 24.04 (Linux 6.17, x86_64)
  • Deployment: Docker Compose
  • PostgreSQL: 14.5 (postgres:14.5-bullseye, default config)
  • ClickHouse: 26.3 (clickhouse/clickhouse-server:26.3, default config)

ClickHouse

Table Schema:

f5801fb7ab8c :) SHOW CREATE TABLE outgoing_sms FORMAT TSVRaw;
...
CREATE TABLE default.outgoing_sms
(
    `account_id` UInt16,
    `gateway` LowCardinality(String),
    `phone` UInt64,
    `created_at` DateTime,
    `sender` LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (created_at, phone, account_id, sender, gateway)
SETTINGS index_granularity = 8192
...
Elapsed: 0.002 sec.

Table rows count:

f5801fb7ab8c :) SELECT count() FROM outgoing_sms;
...
   ┌────count()─┐
1. │ 3900000000 │ -- 3.90 billion
   └────────────┘
...
Elapsed: 0.003 sec.

Table disk space used:

2b23d97d412e :) SELECT formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'outgoing_sms' AND active = 1;
...
   ┌─size──────┐
1. │ 35.42 GiB │
   └───────────┘
...
Elapsed: 0.005 sec. 

Table rows examples:

2b23d97d412e :) SELECT * FROM outgoing_sms ORDER BY rand() LIMIT 3;
...
   ┌─account_id─┬─gateway───┬──────phone─┬──────────created_at─┬─sender─────┐
1. │        942 │ gateway8  │ 1139249300 │ 2025-04-07 20:59:24 │ sender_31  │
2. │        819 │ gateway7  │ 1297747971 │ 2025-11-19 12:05:55 │ sender_100 │
3. │        771 │ gateway10 │ 1175999072 │ 2025-04-28 21:19:38 │ sender_20  │
   └────────────┴───────────┴────────────┴─────────────────────┴────────────┘
...
Elapsed: 2.663 sec. Processed 3.90 billion rows, 46.74 GB (1.46 billion rows/s., 17.55 GB/s.). Peak memory usage: 72.67 MiB.

PostgreSQL

Table schema:

examples=> \d outgoing_sms;
                        Table "public.outgoing_sms"
   Column   |            Type             | Collation | Nullable | Default
------------+-----------------------------+-----------+----------+---------
 account_id | integer                     |           | not null |
 gateway    | text                        |           | not null |
 phone      | bigint                      |           | not null |
 created_at | timestamp without time zone |           | not null |
 sender     | text                        |           | not null |
Indexes:
    "outgoing_sms_idx_main" btree (created_at, phone, account_id, sender, gateway)

Table rows count:

examples=# SELECT count(*) FROM outgoing_sms;
   count   
-----------
 390000000
(1 row)

Time: 4226.194 ms (00:04.226)

Table disk space used:

examples=> SELECT pg_size_pretty(pg_total_relation_size('outgoing_sms')) AS size;
 size
-------
 49 GB
(1 row)

Time: 0.737 ms
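
The two size figures reported above make the storage gap concrete. A quick back-of-the-envelope calculation gives the bytes stored per row, assuming both formatReadableSize and pg_size_pretty report binary units (1 GiB = 2^30 bytes) and noting that pg_total_relation_size includes the B-tree index:

```python
# Rough storage-efficiency comparison using the sizes reported above.
GIB = 2**30

ch_bytes = 35.42 * GIB   # ClickHouse: 35.42 GiB for 3.9B rows
pg_bytes = 49 * GIB      # PostgreSQL: 49 GB for 390M rows (table + index)

ch_rows = 3_900_000_000
pg_rows = 390_000_000

ch_bytes_per_row = ch_bytes / ch_rows   # ~9.8 bytes/row
pg_bytes_per_row = pg_bytes / pg_rows   # ~135 bytes/row

print(f"ClickHouse: {ch_bytes_per_row:.1f} bytes/row")
print(f"PostgreSQL: {pg_bytes_per_row:.1f} bytes/row")
print(f"Ratio: {pg_bytes_per_row / ch_bytes_per_row:.1f}x")
```

Per row stored, ClickHouse is roughly 14× more compact here, mostly thanks to columnar compression and the LowCardinality dictionary encoding on gateway and sender.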

Table rows examples:

examples=> SELECT * FROM outgoing_sms ORDER BY random() LIMIT 3;
 account_id |  gateway  |   phone    |         created_at         |  sender
------------+-----------+------------+----------------------------+-----------
        780 | gateway10 | 1005334874 | 2025-12-22 13:30:33.889572 | sender_35
        780 | gateway10 | 1006531245 | 2025-11-29 23:42:48.363665 | sender_20
        951 | gateway1  | 1009439392 | 2025-12-11 18:48:20.520782 | sender_19
(3 rows)

Time: 49903.070 ms (00:49.903)
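
As the two sampling queries above show, ORDER BY rand()/random() is an expensive way to grab a few rows: both engines must scan and effectively sort the entire table just to keep three. When uniform sampling is needed in application code, a single-pass reservoir sample (Algorithm R) avoids the sort entirely; the sketch below is illustrative and independent of either database:

```python
import random

def reservoir_sample(rows, k, rng=None):
    """Uniformly sample k items from an iterable in one pass (Algorithm R)."""
    rng = rng or random.Random()
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)    # pick a slot in [0, i]
            if j < k:
                sample[j] = row         # replace with probability k/(i+1)
    return sample

# Example: 3 uniformly chosen ids from a stream of a million,
# without materializing or sorting the stream.
picked = reservoir_sample(range(1_000_000), 3, random.Random(42))
print(picked)
```

This runs in O(n) time and O(k) memory, versus the O(n log n) sort both databases perform for the ORDER BY random() pattern.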

Benchmark Queries

1. COUNT + GROUP BY phone

ClickHouse:

2b23d97d412e :) SELECT phone, count() FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY phone
LIMIT 3;
...
   ┌──────phone─┬─count()─┐
1. │ 1117500375 │       1 │
2. │ 1124490014 │       2 │
3. │ 1035951478 │       1 │
   └────────────┴─────────┘
...
Elapsed: 3.341 sec. Processed 298.70 million rows, 3.58 GB (89.41 million rows/s., 1.07 GB/s.).
Peak memory usage: 13.06 GiB.

PostgreSQL:

examples=> SELECT phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY phone
LIMIT 3;
   phone    | count
------------+-------
 1000000000 |     1
 1000000001 |     1
 1000000003 |     1
(3 rows)

Time: 5543.397 ms (00:05.543)

Results:

Engine       Time       Rows Processed          Notes
ClickHouse   3.34 sec   298.7M                  High throughput, higher memory usage
PostgreSQL   5.54 sec   ~30M (scaled dataset)   Slower despite smaller dataset

Summary: ClickHouse outperforms PostgreSQL on large-scale aggregations, processing significantly more rows in less time despite higher memory consumption.
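
Because the two tables differ 10× in size, raw timings understate the gap; normalizing to rows per second (using the figures from the table above, and taking the one-month PostgreSQL slice as roughly 30M rows) shows the per-row difference:

```python
# Per-row throughput for benchmark query 1, from the timings above.
ch_rows, ch_secs = 298_700_000, 3.34   # ClickHouse
pg_rows, pg_secs = 30_000_000, 5.54    # PostgreSQL (scaled-down dataset)

ch_rps = ch_rows / ch_secs   # ~89.4M rows/s (matches the reported rate)
pg_rps = pg_rows / pg_secs   # ~5.4M rows/s

print(f"ClickHouse: {ch_rps / 1e6:.1f}M rows/s")
print(f"PostgreSQL: {pg_rps / 1e6:.1f}M rows/s")
print(f"Per-row throughput ratio: {ch_rps / pg_rps:.0f}x")
```

On a per-row basis, ClickHouse is roughly 16× faster for this query, not the 1.7× the wall-clock numbers alone suggest.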

2. COUNT + GROUP BY phone (top-N)

ClickHouse:

2b23d97d412e :) SELECT phone, count()  FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY phone
ORDER BY count() DESC
LIMIT 3;
...
   ┌──────phone─┬─count()─┐
1. │ 1176538152 │       2 │
2. │ 1192629491 │       2 │
3. │ 1281635230 │       2 │
   └────────────┴─────────┘
...
Elapsed: 5.396 sec. Processed 298.70 million rows, 3.58 GB (55.36 million rows/s., 664.32 MB/s.).
Peak memory usage: 12.31 GiB.

PostgreSQL:

examples=> SELECT phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY phone
ORDER BY count(*) DESC
LIMIT 3;
   phone    | count
------------+-------
 1005979635 |     2
 1024774770 |     2
 1002306879 |     2
(3 rows)

Time: 14730.330 ms (00:14.730)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   5.40 sec    298.7M                  Efficient sorting + aggregation over large dataset
PostgreSQL   14.73 sec   ~30M (scaled dataset)   Slower due to combined GROUP BY + ORDER BY on large intermediate set

Summary: ClickHouse handles top-N aggregations more efficiently, maintaining lower execution time even with a full dataset scan and sorting, while PostgreSQL shows significantly higher latency on the same pattern.

3. COUNT DISTINCT phone

ClickHouse:

2b23d97d412e :) SELECT countDistinct(phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00';
...
   ┌─countDistinct(phone)─┐
1. │            265162814 │ -- 265.16 million
   └──────────────────────┘

Elapsed: 4.035 sec. Processed 298.70 million rows, 3.58 GB (74.02 million rows/s., 888.28 MB/s.).
Peak memory usage: 7.53 GiB.

PostgreSQL:

examples=> SELECT count(distinct phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00';
  count
----------
 26515782
(1 row)

Time: 11007.715 ms (00:11.008)

Results:

Engine       Time        Result             Notes
ClickHouse   4.04 sec    265.16M distinct   Exact distinct aggregation over full scan
PostgreSQL   11.01 sec   26.52M distinct    Slower due to hashing/sorting for DISTINCT

Summary: ClickHouse computes DISTINCT counts significantly faster on large datasets, leveraging its columnar execution and optimized aggregation, while PostgreSQL incurs higher latency due to row-based processing and DISTINCT handling.
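
One caveat: ClickHouse's countDistinct() is an alias for uniqExact(), so the result above is exact, not approximate; for bounded-memory approximate counts ClickHouse also provides uniq(). The idea behind such estimators can be illustrated with a k-minimum-values (KMV) sketch: keep only the k smallest normalized hashes and infer cardinality from how densely they pack. This is a generic illustration, not ClickHouse's actual implementation:

```python
import hashlib
import heapq

def kmv_distinct_estimate(values, k=1024):
    """Approximate COUNT(DISTINCT) via the k-minimum-values sketch:
    if the k smallest of n uniform hashes reach down to h_k, then
    roughly k-1 of n hashes land below h_k, so n ~ (k-1)/h_k."""
    heap = []        # max-heap (negated) holding the k smallest hashes
    members = set()  # the same k hashes, for O(1) duplicate checks
    for v in values:
        digest = hashlib.blake2b(str(v).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big") / 2**64   # uniform in [0, 1)
        if h in members:
            continue                     # duplicate value, already counted
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)                 # fewer than k distinct values: exact
    return (k - 1) / -heap[0]            # -heap[0] is the k-th smallest hash

# 200,000 distinct values estimated with ~k hashes of state instead of a full set.
est = kmv_distinct_estimate(range(200_000), k=1024)
print(f"estimate: {est:,.0f}")
```

The trade-off is a few percent of error (roughly 1/sqrt(k)) in exchange for constant memory, which is why approximate variants exist alongside exact ones.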

4. COUNT + GROUP BY account_id, phone

ClickHouse:

2b23d97d412e :) SELECT account_id, phone, count() FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id, phone LIMIT 3;
...
   ┌─account_id─┬──────phone─┬─count()─┐
1. │        294 │ 1153551821 │       1 │
2. │        110 │ 1238940380 │       1 │
3. │        783 │ 1112154803 │       1 │
   └────────────┴────────────┴─────────┘
...
Elapsed: 4.305 sec. Processed 298.70 million rows, 4.18 GB (69.38 million rows/s., 971.36 MB/s.).
Peak memory usage: 19.08 GiB.

PostgreSQL:

examples=> SELECT account_id, phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id, phone
LIMIT 3;
 account_id |   phone    | count
------------+------------+-------
          0 | 1000007580 |     1
          0 | 1000008730 |     1
          0 | 1000009936 |     1
(3 rows)

Time: 7078.238 ms (00:07.078)

Results:

Engine       Time       Rows Processed          Notes
ClickHouse   4.31 sec   298.7M                  Multi-column aggregation with high memory usage
PostgreSQL   7.08 sec   ~30M (scaled dataset)   Slower but handles composite grouping efficiently on smaller dataset

Summary: ClickHouse maintains strong performance for multi-column GROUP BY queries at scale, though with increased memory consumption, while PostgreSQL performs reasonably on the smaller dataset but with higher execution time.

5. COUNT + GROUP BY account_id, phone (top-N)

ClickHouse:

2b23d97d412e :) SELECT account_id, phone, count() FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id, phone
ORDER BY count() DESC
LIMIT 3;
...
   ┌─account_id─┬──────phone─┬─count()─┐
1. │        715 │ 1176691774 │       2 │
2. │        615 │ 1004560939 │       2 │
3. │        690 │ 1163640899 │       2 │
   └────────────┴────────────┴─────────┘
...
Elapsed: 13.535 sec. Processed 298.70 million rows, 4.18 GB (22.07 million rows/s., 308.96 MB/s.).
Peak memory usage: 24.36 GiB.

PostgreSQL:

examples=> SELECT account_id, phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id, phone
ORDER BY count(*) DESC
LIMIT 3;
 account_id |   phone    | count
------------+------------+-------
          5 | 1004083438 |     2
        413 | 1012185151 |     2
        372 | 1022163696 |     2
(3 rows)

Time: 16656.843 ms (00:16.657)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   13.54 sec   298.7M                  High cost due to aggregation + sorting on large intermediate result
PostgreSQL   16.66 sec   ~30M (scaled dataset)   Slower with top-N sorting on grouped results

Summary: For top-N queries on multi-column aggregations, ClickHouse performs better even with large intermediate datasets and sorting, while PostgreSQL shows higher latency under similar query patterns.

6. COUNT DISTINCT phone + GROUP BY account_id

ClickHouse:

2b23d97d412e :) SELECT account_id, countDistinct(phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id
LIMIT 3;
...
   ┌─account_id─┬─countDistinct(phone)─┐
1. │          0 │               298307 │
2. │          1 │               298585 │
3. │          2 │               298663 │
   └────────────┴──────────────────────┘
...
Elapsed: 4.538 sec. Processed 298.70 million rows, 4.18 GB (65.82 million rows/s., 921.55 MB/s.).
Peak memory usage: 10.23 GiB.

PostgreSQL:

examples=> SELECT account_id, count(distinct phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id
LIMIT 3;
 account_id | count
------------+-------
          0 | 14916
          1 | 29645
          2 | 29783
(3 rows)

Time: 5629.956 ms (00:05.630)

Results:

Engine       Time       Rows Processed          Notes
ClickHouse   4.54 sec   298.7M                  Efficient distinct aggregation per group with moderate memory usage
PostgreSQL   5.63 sec   ~30M (scaled dataset)   Slower DISTINCT per group due to per-group hashing/sorting overhead

Summary: ClickHouse handles DISTINCT aggregations within GROUP BY efficiently at scale, maintaining competitive performance, while PostgreSQL incurs additional overhead when computing distinct values per group.

7. COUNT DISTINCT phone + GROUP BY account_id (top-N)

ClickHouse:

2b23d97d412e :) SELECT account_id, countDistinct(phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id
ORDER BY countDistinct(phone) DESC
LIMIT 3;
...
   ┌─account_id─┬─countDistinct(phone)─┐
1. │        321 │               300848 │
2. │        890 │               300318 │
3. │         50 │               300288 │
   └────────────┴──────────────────────┘
...
Elapsed: 4.893 sec. Processed 298.70 million rows, 4.18 GB (61.05 million rows/s., 854.71 MB/s.).
Peak memory usage: 9.94 GiB.

PostgreSQL:

examples=> SELECT account_id, count(distinct phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY account_id
ORDER BY count(distinct phone) DESC
LIMIT 3;
 account_id | count
------------+-------
        250 | 30485
        593 | 30371
        571 | 30326
(3 rows)

Time: 12255.649 ms (00:12.256)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   4.89 sec    298.7M                  Efficient distinct aggregation with ordering over pre-aggregated results
PostgreSQL   12.26 sec   ~30M (scaled dataset)   Slower due to DISTINCT computation combined with sorting on grouped aggregates

Summary: ClickHouse maintains strong performance for top-N queries over DISTINCT aggregates, while PostgreSQL shows higher latency when combining DISTINCT, GROUP BY, and ORDER BY operations.

8. COUNT + GROUP BY gateway, sender, phone

ClickHouse:

2b23d97d412e :) SELECT gateway, sender, phone, count() FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender, phone
LIMIT 3;
...
   ┌─gateway───┬─sender────┬──────phone─┬─count()─┐
1. │ gateway10 │ sender_66 │ 1202014038 │       1 │
2. │ gateway12 │ sender_52 │ 1113074900 │       1 │
3. │ gateway11 │ sender_25 │ 1127219607 │       1 │
   └───────────┴───────────┴────────────┴─────────┘
...
Elapsed: 28.784 sec. Processed 298.70 million rows, 4.18 GB (10.38 million rows/s., 145.28 MB/s.).
Peak memory usage: 31.18 GiB.

PostgreSQL:

examples=> SELECT gateway, sender, phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender, phone
LIMIT 3;
 gateway |  sender  |   phone    | count
---------+----------+------------+-------
 aws_ses | sender_1 | 1000000731 |     1
 aws_ses | sender_1 | 1000002137 |     1
 aws_ses | sender_1 | 1000002548 |     1
(3 rows)

Time: 28417.221 ms (00:28.417)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   28.78 sec   298.7M                  High memory usage due to high-cardinality multi-column grouping
PostgreSQL   28.42 sec   ~30M (scaled dataset)   Comparable latency on smaller dataset, but less data processed

Summary: Multi-column high-cardinality GROUP BY significantly increases resource usage in ClickHouse, while PostgreSQL shows similar execution time on a smaller dataset but does not scale to the same data volume.
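
The numbers above hint at why this pattern is the most expensive so far: hash aggregation keeps one entry per distinct key combination, and since phone is nearly unique within a one-month window, GROUP BY gateway, sender, phone produces almost one group per input row. A small sketch with made-up cardinalities (illustrative only, not measured from the benchmark dataset) shows the effect:

```python
# Why high-cardinality GROUP BY is expensive: the aggregation hash table
# holds one entry per distinct (gateway, sender, phone) combination.
import random

rng = random.Random(1)
n_rows = 100_000
rows = [
    (f"gateway{rng.randrange(20)}",        # ~20 gateways (assumed)
     f"sender_{rng.randrange(100)}",       # ~100 senders (assumed)
     rng.randrange(10**9, 2 * 10**9))      # phone: effectively unique
    for _ in range(n_rows)
]

groups = {}
for key in rows:
    groups[key] = groups.get(key, 0) + 1

# Nearly every row lands in its own group, so the hash table is ~row-sized.
print(f"{len(groups)} groups for {n_rows} rows "
      f"({len(groups) / n_rows:.0%} of input)")
```

Adding the near-unique phone column to the key turns the aggregation state into a row-sized hash table, which is exactly what the 31 GiB peak memory figure above reflects at full scale.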

9. COUNT + GROUP BY gateway, sender, phone (top-N)

ClickHouse:

2b23d97d412e :) SELECT gateway, sender, phone, count() FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender, phone
ORDER BY count() DESC
LIMIT 3;
...
   ┌─gateway───┬─sender────┬──────phone─┬─count()─┐
1. │ gateway9  │ sender_61 │ 1126310721 │       2 │
2. │ gateway17 │ sender_76 │ 1288868318 │       2 │
3. │ gateway17 │ sender_40 │ 1052600883 │       2 │
   └───────────┴───────────┴────────────┴─────────┘
...
Elapsed: 51.013 sec. Processed 298.70 million rows, 4.18 GB (5.86 million rows/s., 81.98 MB/s.).
Peak memory usage: 30.88 GiB.

PostgreSQL:

examples=> SELECT gateway, sender, phone, count(*) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender, phone
ORDER BY count(*) DESC
LIMIT 3;
 gateway |  sender  |   phone    | count
---------+----------+------------+-------
 aws_ses | sender_1 | 1004283459 |     2
 aws_ses | sender_1 | 1022101875 |     2
 aws_ses | sender_1 | 1028962957 |     2
(3 rows)

Time: 43754.256 ms (00:43.754)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   51.01 sec   298.7M                  High cost due to sorting over large, high-cardinality grouped data
PostgreSQL   43.75 sec   ~30M (scaled dataset)   Lower absolute time but on significantly smaller dataset

Summary: Top-N queries on high-cardinality multi-column aggregations with sorting are expensive in both systems, with ClickHouse showing higher absolute time due to full-scale data processing, while PostgreSQL benefits from the reduced dataset size.

10. COUNT DISTINCT phone + GROUP BY gateway, sender

ClickHouse:

2b23d97d412e :)  SELECT gateway, sender, countDistinct(phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender
LIMIT 3;
...
   ┌─gateway───┬─sender────┬─countDistinct(phone)─┐
1. │ gateway14 │ sender_96 │               327806 │
2. │ gateway15 │ sender_20 │               208841 │
3. │ gateway12 │ sender_55 │               268560 │
   └───────────┴───────────┴──────────────────────┘
...
Elapsed: 6.245 sec. Processed 298.70 million rows, 4.18 GB (47.83 million rows/s., 669.63 MB/s.).
Peak memory usage: 8.48 GiB.

PostgreSQL:

examples=> SELECT gateway, sender, count(distinct phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender
LIMIT 3;
 gateway |   sender   | count
---------+------------+-------
 aws_ses | sender_1   | 26695
 aws_ses | sender_10  | 26813
 aws_ses | sender_100 | 26882
(3 rows)

Time: 30876.245 ms (00:30.876)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   6.25 sec    298.7M                  Efficient DISTINCT aggregation across grouped dimensions
PostgreSQL   30.88 sec   ~30M (scaled dataset)   Higher latency due to per-group DISTINCT computation

Summary: ClickHouse demonstrates strong performance for DISTINCT aggregations combined with GROUP BY across multiple dimensions, while PostgreSQL incurs significantly higher latency when computing distinct values per group.

11. COUNT DISTINCT phone + GROUP BY gateway, sender (top-N)

ClickHouse:

2b23d97d412e :) SELECT gateway, sender, countDistinct(phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender
ORDER BY countDistinct(phone) DESC
LIMIT 3;
...
   ┌─gateway───┬─sender────┬─countDistinct(phone)─┐
1. │ gateway14 │ sender_32 │               329836 │
2. │ gateway14 │ sender_27 │               329752 │
3. │ gateway14 │ sender_19 │               329661 │
   └───────────┴───────────┴──────────────────────┘
...
Elapsed: 7.828 sec. Processed 298.70 million rows, 4.18 GB (38.16 million rows/s., 534.22 MB/s.).
Peak memory usage: 8.49 GiB.

PostgreSQL:

examples=> SELECT gateway, sender, count(distinct phone) FROM outgoing_sms
WHERE created_at >= '2025-09-05 00:00:00' AND created_at < '2025-10-05 00:00:00'
GROUP BY gateway, sender
ORDER BY count(distinct phone) DESC
LIMIT 3;
  gateway  |  sender   | count
-----------+-----------+-------
 gateway13 | sender_77 | 33215
 gateway13 | sender_98 | 33158
 gateway13 | sender_5  | 33153
(3 rows)

Time: 36286.881 ms (00:36.287)

Results:

Engine       Time        Rows Processed          Notes
ClickHouse   7.83 sec    298.7M                  Efficient top-N over DISTINCT aggregates with moderate memory usage
PostgreSQL   36.29 sec   ~30M (scaled dataset)   High latency due to DISTINCT computation combined with sorting on aggregated results

Summary: ClickHouse performs significantly better for top-N queries over DISTINCT aggregates, especially when combined with GROUP BY and ORDER BY, while PostgreSQL shows substantially higher execution time under the same pattern.

Conclusions

Across all tested query patterns, ClickHouse demonstrates strong performance on large-scale analytical workloads, consistently processing hundreds of millions of rows efficiently. It performs particularly well for aggregation-heavy queries, DISTINCT computations, and multi-column GROUP BY operations, even when combined with ORDER BY and top-N limits.

PostgreSQL, while performant on smaller datasets, shows higher execution times as query complexity increases, especially for DISTINCT aggregations and large GROUP BY operations. Its performance is further impacted by row-based storage and the need to process significantly more intermediate data when scaling.

Overall, ClickHouse is better suited for read-heavy analytical workloads at billion-row scale, whereas PostgreSQL remains a strong general-purpose database but is less optimized for large-scale OLAP-style querying.
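
Reading the eleven result tables together, dividing each PostgreSQL time by the corresponding ClickHouse time summarizes the benchmark (keeping in mind that ClickHouse scanned 10× more rows, so these ratios understate the per-row gap):

```python
# Wall-clock speedup per benchmark query, from the timings reported above.
timings = {            # query: (clickhouse_sec, postgresql_sec)
    1:  (3.34, 5.54),
    2:  (5.40, 14.73),
    3:  (4.04, 11.01),
    4:  (4.31, 7.08),
    5:  (13.54, 16.66),
    6:  (4.54, 5.63),
    7:  (4.89, 12.26),
    8:  (28.78, 28.42),
    9:  (51.01, 43.75),
    10: (6.25, 30.88),
    11: (7.83, 36.29),
}

for q, (ch, pg) in timings.items():
    print(f"query {q:2d}: {pg / ch:4.1f}x")
# ClickHouse wins on 9 of 11 queries despite the 10x larger dataset; only the
# high-cardinality top-N queries (8 and 9) are a wash or slower in absolute time.
```

The largest wall-clock gaps are the per-group DISTINCT queries (10 and 11, roughly 5×), which matches the per-section summaries above.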
