Lead Lookup System Design (case study)

By Eugene, Published on

System Description

A high-scale lead management system designed to store and retrieve contact information for 200M users. The system supports lookups by phone number, email address, and SSN to resolve associated user profiles. A typical request maps a single contact identifier to exactly one user.

Primary Use Cases

  • Fast contact lookups (phone / email / SSN → user profile)
  • Multiple contacts per user (avg: 1 phone, 1 email; ~0.001% have 2–3)
  • Contact reassignment between users
  • Full contact history tracking

Requirements

Functional Requirements

  • Lookups: Find user by phone, email, or SSN
  • Contact management:
    • User can have multiple phones (avg 1, rarely 2–3)
    • User can have multiple emails (avg 1, rarely 2–3)
    • User has exactly one SSN (1:1 relationship)
  • Uniqueness: Each phone and email belongs to only one user at a time
  • History: Track all contact changes in a separate table (no impact on lookup performance)
  • Operations: Add/remove contacts, reassign contacts, update lead data

Non-Functional Requirements

  • Scale: 200M users
  • Read load: ~200K lookups/day (~2–3 QPS)
  • Write load: 10–30% of users updated once every 1–2 months

Solution 1: PostgreSQL

Migrations

CREATE TABLE leads ( 
user_id BIGSERIAL PRIMARY KEY, 
lead_data JSONB NOT NULL, 
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), 
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
); 
CREATE TABLE phone_index ( 
phone VARCHAR(20) PRIMARY KEY, 
user_id BIGINT NOT NULL REFERENCES leads(user_id) ON DELETE CASCADE, 
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), 
CONSTRAINT unique_phone UNIQUE (phone) 
); 

CREATE INDEX idx_phone_hash ON phone_index USING hash(phone); 
CREATE INDEX idx_phone_user ON phone_index USING btree(user_id); 
CREATE TABLE email_index (
email VARCHAR(255) PRIMARY KEY, 
user_id BIGINT NOT NULL REFERENCES leads(user_id) ON DELETE CASCADE, 
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), 
CONSTRAINT unique_email UNIQUE (email) 
); 

CREATE INDEX idx_email_hash ON email_index USING hash(email); 
CREATE INDEX idx_email_user ON email_index USING btree(user_id); 
CREATE FUNCTION hash_text(input TEXT) RETURNS BIGINT AS $$ BEGIN RETURN ('x' || substr(md5(input), 1, 16))::bit(64)::bigint; END; $$ LANGUAGE plpgsql IMMUTABLE; 

CREATE TABLE ssn_index ( 
ssn_hash BIGINT PRIMARY KEY, 
ssn_last4 CHAR(4) NOT NULL, 
user_id BIGINT UNIQUE NOT NULL REFERENCES leads(user_id) ON DELETE CASCADE, 
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() 
); 

CREATE INDEX idx_ssn_hash ON ssn_index USING hash(ssn_hash);
CREATE INDEX idx_ssn_last4 ON ssn_index USING btree(ssn_last4); 
CREATE INDEX idx_ssn_user ON ssn_index USING btree(user_id); 
CREATE TABLE contact_history ( 
id BIGSERIAL PRIMARY KEY, 
user_id BIGINT NOT NULL REFERENCES leads(user_id) ON DELETE CASCADE,
contact_type VARCHAR(20) NOT NULL CHECK (contact_type IN ('phone', 'email')), 
contact_value VARCHAR(255) NOT NULL, 
added_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), 
removed_at TIMESTAMPTZ 
); 

CREATE INDEX idx_contact_history_user ON contact_history(user_id, added_at DESC); 
CREATE INDEX idx_contact_history_contact ON contact_history(contact_type, contact_value); 

Query Examples (PostgreSQL)

SELECT l.* 
FROM leads l 
JOIN phone_index p ON l.user_id = p.user_id 
WHERE p.phone = '12025551234'; 
SELECT l.* 
FROM leads l 
JOIN email_index e ON l.user_id = e.user_id 
WHERE e.email = 'john.doe@example.com'; 

Solution 2: ClickHouse (ReplacingMergeTree)

Core Problem

ReplacingMergeTree is optimized for append-only analytical workloads, not transactional lookups with updates. Deduplication occurs during background merges, making lookup correctness timing-dependent. Queries may return multiple versions unless explicitly constrained.

Key Issues

  • Eventual consistency: Multiple versions exist before merge
  • Query overhead: FINAL or ORDER BY + LIMIT required (10–50× slower)
  • No foreign keys: No ON DELETE CASCADE, orphaned rows possible
-- Reassignment example 
INSERT INTO phone_index VALUES ('+1234567890', 123, now()); 
INSERT INTO phone_index VALUES ('+1234567890', 456, now()); 

SELECT * 
FROM phone_index 
WHERE phone = '+1234567890'; -- Returns both rows until merge 

Why ClickHouse Still Has Advantages

  • Excellent compression: 5–10× smaller than PostgreSQL
  • Fast analytics: 10–100× faster aggregations
  • Automatic deduplication: No manual version cleanup
  • Append-only writes: 100K+ inserts/sec, no B-tree overhead
SELECT toYYYYMM(added_at) AS month, contact_type, count() AS additions, uniq(user_id) AS unique_users 
FROM contact_history 
GROUP BY month, contact_type 
ORDER BY month DESC; 

Solution 3: Hybrid (PostgreSQL + ClickHouse)

Architecture

  • PostgreSQL (hot path): phone_index, email_index, ssn_index
    • O(1) lookups
    • ACID uniqueness guarantees
  • ClickHouse (storage & analytics):
    • leads table
    • contact_history table

Query Flow

-- Step 1: lookup 
SELECT user_id 
FROM phone_index 
WHERE phone = '+12025551234'; 

-- Step 2: fetch lead data 
SELECT lead_data 
FROM leads 
WHERE user_id = 123 
ORDER BY updated_at DESC 
LIMIT 1;