
System Design: Organizational Chart Builder

Learn how to design a scalable organizational chart builder supporting real-time edits, hierarchy traversal, and org-wide reporting. Covers tree data structures, RBAC, and change propagation across large enterprises.

12 min read · Updated Jan 15, 2025
Tags: system-design · hr-tech · org-chart · tree-structures · rbac

Requirements

Functional Requirements:

  • Create and edit organizational hierarchies with managers, direct reports, and departments
  • Support drag-and-drop reorganization of employees across teams and reporting lines
  • Provide full-text search by employee name, title, or department
  • Export org charts as PDF, PNG, or shareable read-only links
  • Role-based access: HR admins can edit; managers see their subtree; employees see read-only view
  • Track history of org changes with timestamps and author attribution

Non-Functional Requirements:

  • Support orgs with up to 500,000 employees without degradation in query performance
  • Render charts for subtrees of 10,000 nodes in under 2 seconds
  • 99.9% uptime with optimistic concurrency for simultaneous edits
  • All changes must be audit-logged for compliance
  • GDPR-compliant data handling with PII masking for export

Scale Estimation

For a large enterprise SaaS product: assume 5,000 companies at an average of 2,000 employees each, for 10M employee records. Peak edit traffic during annual reorgs can spike to 20x the normal load. Chart rendering is read-heavy: 95% reads, 5% writes. Expect 50,000 concurrent viewers during all-hands events.
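
As a quick sanity check, the arithmetic can be scripted. The baseline edit rate and per-viewer request rate below are illustrative assumptions, not figures from the estimates above:

```python
# Back-of-envelope check of the scale estimates.
companies = 5_000
avg_employees = 2_000
records = companies * avg_employees          # 10,000,000 employee records

baseline_edit_qps = 50                       # assumed normal edit load
reorg_edit_qps = baseline_edit_qps * 20      # 20x spike during annual reorgs

viewers = 50_000                             # all-hands peak
requests_per_viewer_per_s = 0.1              # assumed ~1 chart request / 10 s
peak_read_qps = viewers * requests_per_viewer_per_s

print(f"{records:,} records; ~{peak_read_qps:,.0f} peak read QPS; "
      f"~{reorg_edit_qps} edit QPS during a reorg spike")
```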

High-Level Architecture

The system separates the org data model from the rendering layer. A Graph Service owns the canonical tree stored in a relational DB with adjacency list and materialized path columns. A read-optimized Projection Service materializes subtrees into Redis for fast chart rendering. An Event Bus propagates changes so downstream services (directory sync, HRIS integrations) stay consistent.

For concurrent edits, the system uses optimistic locking on node versions. When two HR admins attempt to reparent the same employee simultaneously, the second write gets a conflict response and must re-fetch before retrying. A CRDT-inspired merge strategy handles non-conflicting changes (editing different subtrees) automatically.
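
A minimal sketch of the version check, assuming psycopg and the org_nodes schema described under Database Design (the path recomputation that follows a successful reparent is handled asynchronously, as covered later):

```python
import psycopg


class VersionConflict(Exception):
    """Another writer updated this node first; re-fetch and retry."""


def reparent_node(conn: psycopg.Connection, node_id, new_parent_id,
                  expected_version: int) -> None:
    with conn.cursor() as cur:
        # The UPDATE matches zero rows if the version the client last
        # read is no longer current -- that is the conflict signal.
        cur.execute(
            """
            UPDATE org_nodes
               SET parent_node_id = %s,
                   version = version + 1,
                   updated_at = now()
             WHERE node_id = %s AND version = %s
            """,
            (new_parent_id, node_id, expected_version),
        )
        if cur.rowcount == 0:
            raise VersionConflict(node_id)
    conn.commit()

# Usage: conn = psycopg.connect("dbname=orgchart")
#        reparent_node(conn, node, new_parent, expected_version=7)
```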

An Export Service handles async chart generation. Large org exports are queued as jobs, processed by workers, stored in S3, and delivered via signed URL. This prevents large PDF renders from blocking the API.

Core Components

Graph Service

Owns CRUD operations on the org tree. Uses PostgreSQL with an org_nodes table containing node_id, parent_node_id, employee_id, path (materialized path as ltree), and version (full schema under Database Design). The ltree extension enables efficient ancestor/descendant queries backed by GiST indexes. All writes go through this service, which emits change events to Kafka.
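
As an illustration, a descendant query over the materialized path might look like this sketch, assuming psycopg (the function name and depth handling are ours):

```python
def fetch_subtree(conn, root_path: str, max_depth: int):
    """All nodes under root_path, at most max_depth levels deeper."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT node_id, employee_id, path, depth
              FROM org_nodes
             WHERE path <@ %s::ltree                        -- descendant-of, GiST-indexed
               AND nlevel(path) <= nlevel(%s::ltree) + %s   -- depth cap
             ORDER BY path
            """,
            (root_path, root_path, max_depth),
        )
        return cur.fetchall()
```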

Projection & Cache Layer

Consumes Kafka events and maintains Redis sorted sets representing each manager's direct reports and subtree membership. Chart rendering clients call a dedicated read endpoint backed by Redis, achieving sub-10ms responses for subtrees of up to 1,000 nodes. Invalidation is event-driven, with a 30-second TTL as a backstop, so even a missed event cannot keep a stale entry alive for more than 30 seconds after a structural change.
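
A sketch of the projection using redis-py; the key naming scheme and the rebuild-on-event strategy are illustrative:

```python
import redis

r = redis.Redis()


def project_direct_reports(manager_id: str,
                           reports: list[tuple[str, int]]) -> None:
    """Rebuild one manager's direct-report set from a Kafka change event.

    reports: (employee_id, sort_order) pairs carried by the event.
    """
    key = f"org:reports:{manager_id}"
    pipe = r.pipeline()
    pipe.delete(key)                                          # rebuild from scratch
    if reports:
        pipe.zadd(key, {emp_id: order for emp_id, order in reports})
    # TTL backstop: a missed invalidation event cannot keep a stale entry
    # alive past 30 s; hot keys are re-projected on a read miss (assumption).
    pipe.expire(key, 30)
    pipe.execute()


def get_direct_reports(manager_id: str) -> list[bytes]:
    return r.zrange(f"org:reports:{manager_id}", 0, -1)
```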

Access Control Engine

Enforces visibility rules at query time. Given a requesting user's employee ID, the engine resolves their role and maximum visible scope (full org, own subtree, department only). Visibility filters are pushed into SQL/Redis queries rather than applied post-fetch, preventing information leakage and improving performance at scale.
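
For example, clamping a subtree read to the caller's visible scope can be done by ANDing a second ltree predicate into the query. A sketch assuming psycopg, with role-to-scope resolution already done:

```python
def fetch_visible_subtree(conn, requested_root_path: str, scope_path: str):
    """scope_path is the widest path the caller may see, resolved from
    their role before the query runs. ANDing it into the WHERE clause
    means out-of-scope rows never leave the database."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT node_id, employee_id, path
              FROM org_nodes
             WHERE path <@ %s::ltree    -- what the client asked for
               AND path <@ %s::ltree    -- what the caller may see
             ORDER BY path
            """,
            (requested_root_path, scope_path),
        )
        return cur.fetchall()
```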

Database Design

The primary store is PostgreSQL with the ltree extension. The org_nodes table holds node_id UUID, employee_id UUID, parent_node_id UUID, path ltree, depth INT, version BIGINT, created_at, updated_at. The path column enables queries like "all nodes under VP of Engineering" as a single indexed range scan. A separate org_history table captures every write with old/new parent, editor ID, and timestamp for audit trails.
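
A sketch of this schema as DDL, executed via psycopg; constraint and index details beyond the columns named above are assumptions:

```python
# Creates the org tables described in the text, assuming PostgreSQL + ltree.
DDL = """
CREATE EXTENSION IF NOT EXISTS ltree;

CREATE TABLE IF NOT EXISTS org_nodes (
    node_id        UUID PRIMARY KEY,
    employee_id    UUID NOT NULL,
    parent_node_id UUID REFERENCES org_nodes(node_id),
    path           LTREE NOT NULL,
    depth          INT    NOT NULL,
    version        BIGINT NOT NULL DEFAULT 1,
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- GiST index powering the ancestor/descendant operators (<@, @>)
CREATE INDEX IF NOT EXISTS org_nodes_path_gist
    ON org_nodes USING GIST (path);

CREATE TABLE IF NOT EXISTS org_history (
    change_id     BIGSERIAL PRIMARY KEY,
    node_id       UUID NOT NULL,
    old_parent_id UUID,
    new_parent_id UUID,
    editor_id     UUID NOT NULL,
    changed_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""


def create_schema(conn):
    with conn.cursor() as cur:
        cur.execute(DDL)
    conn.commit()
```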

For search, an Elasticsearch index is kept in sync via the Kafka event stream. Documents include employee name, title, department, email, and manager chain as a denormalized string, supporting full-text and faceted search across the entire org without hitting PostgreSQL.
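
A sketch of the consumer that upserts a search document, assuming elasticsearch-py; the field names, index name, and event shape are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")


def index_employee(event: dict) -> None:
    """Consume an org-change event and upsert the search document."""
    doc = {
        "name": event["name"],
        "title": event["title"],
        "department": event["department"],
        "email": event["email"],
        # Manager chain denormalized to one string, e.g. "CEO > VP Eng > ..."
        "manager_chain": " > ".join(event["manager_chain"]),
    }
    es.index(index="org-employees", id=event["employee_id"], document=doc)
```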

API Design

GET /api/v1/org/subtree?root={nodeId}&depth={n} — returns the subtree rooted at nodeId up to n levels deep, from Redis cache.

PUT /api/v1/org/nodes/{nodeId}/parent — reparents a node; body includes new_parent_id and expected_version for optimistic locking.

GET /api/v1/org/search?q={query}&scope={nodeId} — full-text search within a scope, proxied to Elasticsearch.

POST /api/v1/org/exports — enqueues an export job; returns job_id. Poll GET /api/v1/org/exports/{jobId} for status and download URL.
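
A client might drive the export flow like this sketch, using requests; the request and response field names (root, format, state, download_url) are assumptions, since only the endpoints themselves are specified above:

```python
import time

import requests

BASE = "https://api.example.com/api/v1/org"   # illustrative host


def export_subtree(root_node_id: str, fmt: str = "pdf") -> str:
    """Enqueue an export job, poll for completion, return the signed URL."""
    job = requests.post(f"{BASE}/exports",
                        json={"root": root_node_id, "format": fmt}).json()
    job_id = job["job_id"]
    while True:
        status = requests.get(f"{BASE}/exports/{job_id}").json()
        if status["state"] == "done":
            return status["download_url"]      # signed S3 URL
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "export failed"))
        time.sleep(2)                          # simple poll; back off in practice
```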

Scaling & Bottlenecks

The main bottleneck is mass reorgs — when a company restructures an entire division, thousands of nodes get reparented in minutes. Bulk write APIs with transactional batching mitigate single-row update overhead. The materialized path column must be recomputed for all descendants on reparent; this is offloaded to an async worker that processes subtree path updates out-of-band, with the chart temporarily showing a "pending" state.
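
The descendant rewrite itself can be a single indexed UPDATE using ltree's subpath and nlevel; a sketch assuming psycopg and that depth equals nlevel(path):

```python
def rewrite_subtree_paths(conn, old_path: str, new_parent_path: str) -> None:
    """Reattach everything under old_path beneath new_parent_path.

    Run by the async worker after a reparent. For example, with
    old_path 'acme.sales.emea' and new_parent_path 'acme.revenue',
    'acme.sales.emea.uk' becomes 'acme.revenue.emea.uk'.
    """
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE org_nodes
               SET path  = %(new)s::ltree
                           || subpath(path, nlevel(%(old)s::ltree) - 1),
                   depth = nlevel(%(new)s::ltree)
                           + nlevel(path) - nlevel(%(old)s::ltree) + 1
             WHERE path <@ %(old)s::ltree   -- whole subtree, one indexed scan
            """,
            {"old": old_path, "new": new_parent_path},
        )
    conn.commit()
```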

For very large orgs (100k+ employees), chart rendering must be paginated. The UI requests one level at a time (lazy expansion), so no single API call fetches more than a few hundred nodes. Server-side pagination with cursor-based navigation keeps memory bounded. Redis memory usage is managed by only caching recently accessed subtrees with LRU eviction.
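
A sketch of keyset pagination for one-level expansion, assuming psycopg; the page size and cursor encoding (the last node_id seen) are illustrative:

```python
def fetch_children_page(conn, parent_node_id, cursor=None, page_size=200):
    """One page of direct children; cursor is the last node_id returned."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT node_id, employee_id
              FROM org_nodes
             WHERE parent_node_id = %s
               AND (%s::uuid IS NULL OR node_id > %s::uuid)  -- keyset cursor
             ORDER BY node_id
             LIMIT %s
            """,
            (parent_node_id, cursor, cursor, page_size),
        )
        rows = cur.fetchall()
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor
```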

Key Trade-offs

  • Materialized path vs. adjacency list: Materialized path gives O(1) ancestor queries but requires path recomputation on reparent. Adjacency list is simpler to write but requires recursive CTEs for traversal (see the sketch after this list).
  • Optimistic vs. pessimistic locking: Optimistic locking scales better for read-heavy org browsing but causes retry friction during bulk reorgs; a short-lived advisory lock per subtree root can reduce conflicts.
  • Sync vs. async cache invalidation: Synchronous invalidation on write keeps caches always fresh but adds write latency; async via Kafka is faster for writes but introduces a brief inconsistency window.
  • Full-tree export vs. lazy rendering: Generating a full PDF of a 100k-node org is expensive; scoping exports to subtrees with configurable depth limits prevents runaway resource usage.
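
For comparison with the ltree queries above, here is the adjacency-list traversal from the first trade-off as a recursive CTE; a sketch assuming psycopg and the same table:

```python
def fetch_subtree_recursive(conn, root_node_id, max_depth: int):
    """Adjacency-list traversal: walks parent_node_id links level by level."""
    with conn.cursor() as cur:
        cur.execute(
            """
            WITH RECURSIVE subtree AS (
                SELECT node_id, employee_id, parent_node_id, 0 AS rel_depth
                  FROM org_nodes
                 WHERE node_id = %s
                UNION ALL
                SELECT n.node_id, n.employee_id, n.parent_node_id,
                       s.rel_depth + 1
                  FROM org_nodes n
                  JOIN subtree s ON n.parent_node_id = s.node_id
                 WHERE s.rel_depth < %s
            )
            SELECT * FROM subtree
            """,
            (root_node_id, max_depth),
        )
        return cur.fetchall()
```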
