System Design: Electronic Health Records (EHR)

Requirements

Functional Requirements:

Store and retrieve complete patient medical records including demographics, diagnoses, medications, lab results, and clinical notes
Support clinical documentation workflows with structured and unstructured data entry (SOAP notes, templates)
Implement HL7 FHIR R4 APIs for interoperability with external health systems, labs, pharmacies, and insurers
Provide clinical decision support with drug-allergy alerts, duplicate order detection, and evidence-based guidelines
Maintain a complete, immutable audit trail of every record access and modification with user identity and timestamp
Role-based access control (RBAC) with break-the-glass emergency override for critical care scenarios
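
The RBAC and break-the-glass requirement can be illustrated with a minimal sketch. The role names, permission strings, and the rule that only a role already holding the permission may invoke the override are assumptions made for illustration, not part of the design above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role-to-permission map; a real system would load this from policy storage.
ROLE_PERMISSIONS = {
    "physician": {"read_chart", "write_orders", "write_notes"},
    "nurse": {"read_chart", "write_notes"},
    "billing": {"read_demographics"},
}


@dataclass(frozen=True)
class AccessDecision:
    allowed: bool
    break_glass: bool  # True when access was granted only through the emergency override
    reason: str


def check_access(role: str, permission: str, on_treatment_team: bool,
                 emergency_reason: Optional[str] = None) -> AccessDecision:
    """Grant access by role, falling back to break-the-glass when a reason is given."""
    role_permits = permission in ROLE_PERMISSIONS.get(role, set())
    if role_permits and on_treatment_team:
        return AccessDecision(True, False, "role permits; user is on the treatment team")
    # Assumption: the override is limited to roles that already hold the permission
    # and always requires a stated reason, which is carried into the audit trail.
    if role_permits and emergency_reason:
        return AccessDecision(True, True, emergency_reason)
    return AccessDecision(False, False, "denied")
```

The break_glass flag on the decision is what a caller would copy into the audit record so that every override can be reviewed afterwards.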

Non-Functional Requirements:

Support 500,000 concurrent users across 2,000+ healthcare facilities
p99 chart retrieval latency under 500 ms, since clinicians cannot wait during patient encounters
99.99% availability — EHR downtime directly impacts patient safety
Full HIPAA compliance: encryption at rest (AES-256) and in transit (TLS 1.3), audit logging, minimum necessary access
Data retention for 30+ years per regulatory requirements with no data loss
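
To make the availability target concrete, the short calculation below converts 99.99% into a yearly downtime budget. It assumes a 365.25-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Return the allowed downtime per year, in minutes, for an availability fraction."""
    return MINUTES_PER_YEAR * (1 - availability)


budget = downtime_budget_minutes(0.9999)
print(f"{budget:.1f} minutes of downtime per year")  # roughly 52.6 minutes
```

At this target the whole year's budget is under an hour, which leaves little room for planned maintenance windows that take the system offline.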

Scale Estimation

A large health system manages 50M patient records with an average of 200 clinical documents per patient over their lifetime, totaling 10B documents. Each patient encounter generates 5-15KB of structured data (vitals, orders, diagnoses as FHIR resources) plus 20-50KB of unstructured clinical notes. Daily write volume: 2M encounters/day producing 100GB of new clinical data. Read-heavy workload: clinicians access charts 30M times/day with an average of 8 FHIR resource queries per chart open = 240M read queries/day (2,800 QPS average, 15,000 QPS peak during morning rounds 7-9 AM). Audit log volume: every read and write is logged, producing 300M audit entries/day = 5TB/month of audit data.
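
The estimates above can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in this section and assumes decimal units (1 GB = 10^6 KB, 1 TB = 10^12 bytes) and a 30-day month.

```python
SECONDS_PER_DAY = 86_400

# Document volume
patients = 50_000_000
docs_per_patient = 200
total_documents = patients * docs_per_patient  # 10 billion

# Write volume: 100 GB/day spread over 2M encounters
encounters_per_day = 2_000_000
daily_write_gb = 100
avg_kb_per_encounter = daily_write_gb * 1_000_000 / encounters_per_day  # 50 KB

# Read volume: 30M chart opens, 8 FHIR queries each
chart_opens_per_day = 30_000_000
queries_per_open = 8
reads_per_day = chart_opens_per_day * queries_per_open  # 240M
avg_read_qps = reads_per_day / SECONDS_PER_DAY          # about 2,778

# Audit volume: implied size of one audit entry
audit_entries_per_day = 300_000_000
audit_tb_per_month = 5
bytes_per_audit_entry = audit_tb_per_month * 1e12 / (audit_entries_per_day * 30)

print(total_documents, avg_kb_per_encounter, round(avg_read_qps), round(bytes_per_audit_entry))
```

The 50 KB average per encounter sits inside the 25-65 KB range implied by the structured and unstructured sizes, and the audit figures imply an entry of roughly 556 bytes.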

High-Level Architecture

The EHR system follows a service-oriented architecture with clear domain boundaries aligned to clinical workflows. The API layer exposes HL7 FHIR R4-compliant RESTful endpoints through an API Gateway (Kong) handling OAuth 2.0/SMART on FHIR authentication, rate limiting, and request routing. Behind the gateway, domain services are organized by clinical function: Patient Demographics Service, Clinical Documentation Service, Order Entry Service (CPOE), Results Service, and Medication Service. Each service owns its data store but shares patient identity through a Master Patient Index (MPI).

The data architecture uses a polyglot persistence strategy. Structured clinical data (diagnoses, medications, lab results) is stored in PostgreSQL with a FHIR-native schema where each resource type maps to a table with a JSONB column for the FHIR resource payload plus extracted columns for common query fields. Unstructured clinical notes are stored in object storage (S3) with metadata indexed in Elasticsearch for full-text clinical search. The Master Patient Index uses PostgreSQL with probabilistic matching algorithms (Jaro-Winkler string similarity on name, DOB, SSN) to link patient identities across facilities.

All data is encrypted at rest using AES-256 with envelope encryption managed by AWS KMS. A dedicated Audit Service consumes events from every service via Kafka and writes immutable audit records to a time-series append-only store (TimescaleDB). The audit trail captures: who accessed what data, when, from which IP, for which patient, and whether break-the-glass was invoked.

Core Components

Master Patient Index (MPI)

The MPI is the identity backbone of the EHR, ensuring a single longitudinal record per patient across all facilities. When a patient registers, the MPI runs a probabilistic matching algorithm comparing incoming demographics (name, DOB, gender, SSN, address, phone) against existing records using weighted Jaro-Winkler similarity scores. Matches above 0.95 confidence auto-link; scores from 0.80 to 0.95 are queued for manual review by Health Information Management (HIM) staff; scores below 0.80 are treated as non-matches. The MPI stores a golden record per patient with links to facility-specific medical record numbers. Enterprise Master Person Index (EMPI) federation allows cross-organization identity resolution via FHIR Patient/$match operations. The MPI runs on PostgreSQL with GiST trigram indexes on name fields (as provided by the pg_trgm extension) for fast fuzzy search.
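
The matching step can be sketched as follows. This is a plain-Python Jaro-Winkler implementation combined with the 0.95 and 0.80 thresholds from the text; the field weights and the restriction to three fields are assumptions for illustration, and a production MPI would tune weights against labelled data.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical field weights; they must sum to 1.
FIELD_WEIGHTS = {"name": 0.4, "dob": 0.3, "ssn": 0.3}


def match_score(incoming: dict, candidate: dict) -> float:
    return sum(w * jaro_winkler(incoming[f], candidate[f]) for f, w in FIELD_WEIGHTS.items())


def classify(score: float) -> str:
    if score > 0.95:
        return "auto_link"
    if score >= 0.80:
        return "manual_review"
    return "no_match"
```

A score of exactly 0.95 falls into manual review here, matching the wording "above 0.95" for auto-linking.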

Clinical Documentation Service

This service handles the creation, versioning, and retrieval of clinical documents. Every document is stored as a FHIR DocumentReference resource with the actual content in S3. The service supports structured templates (FHIR Questionnaire/QuestionnaireResponse) for standardized data capture and free-text SOAP notes. Document versioning is append-only: updates create new versions linked to the previous version via a provenance chain (FHIR Provenance resource). The service enforces document-level access control: behavioral health notes, HIV status, and substance use disorder records carry additional sensitivity flags requiring explicit consent for access beyond the treatment team. Substance use disorder records are governed federally by 42 CFR Part 2; protections for the other categories typically come from HIPAA's psychotherapy-notes provisions and state law. Elasticsearch indexes clinical notes with medical NLP enrichment (extracted diagnoses, medications, procedures using cTAKES/MetaMap) enabling semantic clinical search.
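
Append-only versioning can be sketched with an in-memory store. The class and field names below are hypothetical, the dictionary stands in for the real database, and content_key stands in for an S3 object key.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DocumentVersion:
    doc_id: str
    version: int
    content_key: str                 # hypothetical object-storage key for the note body
    author: str
    previous_version: Optional[int]  # link that forms the provenance chain


class DocumentStore:
    """In-memory stand-in for the document table; versions are only ever appended."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[DocumentVersion]] = {}

    def save(self, doc_id: str, content_key: str, author: str) -> DocumentVersion:
        history = self._versions.setdefault(doc_id, [])
        prev = history[-1].version if history else None
        new_version = DocumentVersion(doc_id, (prev or 0) + 1, content_key, author, prev)
        history.append(new_version)  # earlier versions are never modified or removed
        return new_version

    def current(self, doc_id: str) -> DocumentVersion:
        return self._versions[doc_id][-1]

    def history(self, doc_id: str) -> List[DocumentVersion]:
        return list(self._versions[doc_id])
```

An "update" is simply another save call; readers that need only the latest state call current, while audits and point-in-time views walk history.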

Clinical Decision Support (CDS) Engine

The CDS Engine runs rules and alerts inline with clinical workflows. When a provider enters a medication order, the CDS Service evaluates it against: (1) the patient's documented allergies (drug-allergy cross-reference using RxNorm codes), (2) current medications for drug-drug interactions (using the FDB or Medi-Span knowledge base), (3) patient conditions for contraindications, and (4) evidence-based order sets and clinical practice guidelines. The engine uses a Rete-based rule engine (Drools) with rules authored by clinical informaticists and published as FHIR CDS Hooks services. Alert fatigue mitigation is critical — the system categorizes alerts by severity (hard stop, soft alert, informational) and suppresses previously overridden alerts of the same type for the same provider to reduce interruption burden.
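
The drug-allergy check and the override-based suppression can be sketched as below. This is not how Drools rules are written; it is a simplified Python illustration. The drug codes are placeholders rather than real RxNorm codes, and the rule that hard stops are never suppressed is an assumption added here for safety.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    alert_type: str  # e.g. "drug-allergy"
    severity: str    # "hard_stop", "soft", or "info"
    message: str


def evaluate_order(drug_code: str, patient_allergy_codes: Set[str],
                   hard_stop_codes: Set[str]) -> List[Alert]:
    """Return a drug-allergy alert when the ordered drug matches a documented allergy."""
    alerts = []
    if drug_code in patient_allergy_codes:
        severity = "hard_stop" if drug_code in hard_stop_codes else "soft"
        alerts.append(Alert("drug-allergy", severity, f"Documented allergy to {drug_code}"))
    return alerts


def filter_alerts(alerts: List[Alert], provider_id: str,
                  overridden: Set[Tuple[str, str]]) -> List[Alert]:
    """Drop alerts this provider has previously overridden for the same alert type.

    Assumption: hard stops are always shown, regardless of earlier overrides.
    """
    return [
        a for a in alerts
        if a.severity == "hard_stop" or (provider_id, a.alert_type) not in overridden
    ]
```

The overridden set holds (provider_id, alert_type) pairs, so an override by one provider does not silence the alert for anyone else.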

Database Design

The primary clinical data store uses PostgreSQL (Citus-sharded by patient_id). The core schema follows a FHIR-native pattern: each FHIR resource type has a table (patient, encounter, observation, medication_request, condition, diagnostic_report) with columns: id (UUID), patient_id (shard key), resource_type, resource (JSONB containing the full FHIR resource), version_id, last_updated, status, and extracted search parameter columns (e.g., observation.code, observation.effective_date). GIN indexes on the JSONB resource column support arbitrary FHIR search parameters. The version history table stores every prior version of every resource, enabling point-in-time reconstruction of any patient chart.
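
The FHIR-native row layout can be illustrated by mapping one Observation resource to a row dictionary: the full resource goes into the payload column and a few fields are copied out for indexing. The function name is hypothetical, version_id is fixed at 1 for a newly created resource, and only the first coding is extracted, which a real mapper would need to generalize.

```python
import json
import uuid
from datetime import datetime, timezone


def to_observation_row(resource: dict) -> dict:
    """Flatten a FHIR Observation into the columns described for the observation table."""
    patient_ref = resource["subject"]["reference"]  # e.g. "Patient/123"
    return {
        "id": resource.get("id") or str(uuid.uuid4()),
        "patient_id": patient_ref.split("/", 1)[1],  # shard key
        "resource_type": resource["resourceType"],
        "resource": json.dumps(resource),            # stored in the JSONB column
        "version_id": 1,
        "last_updated": datetime.now(timezone.utc).isoformat(),
        "status": resource["status"],
        # Extracted search parameter columns
        "code": resource["code"]["coding"][0]["code"],
        "effective_date": resource.get("effectiveDateTime"),
    }


observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "status": "final",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
    "effectiveDateTime": "2024-03-01T08:30:00Z",
}
row = to_observation_row(observation)
```

Because the API response is just the stored resource, reads can return the payload column without reassembling it from normalized tables.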

The audit database uses TimescaleDB (time-series optimized PostgreSQL) with a hypertable partitioned by week. Its columns are audit_id, timestamp, user_id, user_role, patient_id, resource_type, resource_id, action (CREATE/READ/UPDATE/DELETE), access_context (TREATMENT/PAYMENT/OPERATIONS/EMERGENCY), client_ip, facility_id, and break_glass_flag. Continuous aggregates provide real-time dashboards for HIPAA compliance officers to detect unusual access patterns.
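
The audit record can be sketched as an immutable value with the listed columns and the two enumerations validated at construction time. The factory function and its argument order are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

ACTIONS = {"CREATE", "READ", "UPDATE", "DELETE"}
ACCESS_CONTEXTS = {"TREATMENT", "PAYMENT", "OPERATIONS", "EMERGENCY"}


@dataclass(frozen=True)  # frozen: an audit event cannot be changed after creation
class AuditEvent:
    audit_id: str
    timestamp: str
    user_id: str
    user_role: str
    patient_id: str
    resource_type: str
    resource_id: str
    action: str
    access_context: str
    client_ip: str
    facility_id: str
    break_glass_flag: bool


def make_audit_event(user_id, user_role, patient_id, resource_type, resource_id,
                     action, access_context, client_ip, facility_id,
                     break_glass=False) -> AuditEvent:
    """Build an audit event, rejecting unknown actions or access contexts."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if access_context not in ACCESS_CONTEXTS:
        raise ValueError(f"unknown access context: {access_context}")
    return AuditEvent(
        audit_id=str(uuid.uuid4()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id, user_role=user_role, patient_id=patient_id,
        resource_type=resource_type, resource_id=resource_id,
        action=action, access_context=access_context,
        client_ip=client_ip, facility_id=facility_id,
        break_glass_flag=break_glass,
    )
```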

API Design

GET /fhir/r4/Patient/{id}/$everything: Retrieve the complete patient chart as a FHIR Bundle containing all linked resources (encounters, conditions, observations, medications); supports the _since parameter for incremental sync
POST /fhir/r4/Encounter: Create a new patient encounter; body is a FHIR Encounter resource with participant references, service provider, and class (inpatient/outpatient/emergency)
POST /fhir/r4/MedicationRequest: Submit a medication order; triggers CDS Hooks evaluation; returns the created order plus any CDS alerts as OperationOutcome resources
GET /fhir/r4/Observation?patient={id}&category=laboratory&date=ge2024-01-01: Search observations with FHIR search parameters; returns a paginated Bundle, with page size set by _count and later pages reached through the Bundle's next link (an _offset parameter is a server-specific extension rather than part of the FHIR R4 search specification)
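
As an illustration of the search endpoint, the helper below builds the Observation query URL from its parts. The function name and the default page size of 50 are assumptions; the base path and parameter names follow the endpoint listed above.

```python
from urllib.parse import urlencode


def observation_search_url(base: str, patient_id: str, category: str,
                           date_from: str, count: int = 50) -> str:
    """Build a FHIR Observation search URL with a 'ge' (on or after) date filter."""
    params = [
        ("patient", patient_id),
        ("category", category),
        ("date", f"ge{date_from}"),  # FHIR date prefix: greater than or equal
        ("_count", count),           # requested page size
    ]
    return f"{base}/Observation?{urlencode(params)}"


url = observation_search_url("/fhir/r4", "123", "laboratory", "2024-01-01")
print(url)
```

A client would issue a GET to this URL and then follow the next link in each returned Bundle until no further link is present.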

Scaling & Bottlenecks

The chart retrieval query ($everything) is the most expensive operation, as it joins across multiple resource tables for a single patient. Citus sharding by patient_id ensures all of a patient's data is co-located on one shard, making $everything a single-shard query. With 50M patients across 32 shards, each shard holds ~1.6M patients. Read replicas (2 per shard) handle the read-heavy workload, with writes directed to the primary. Connection pooling via PgBouncer limits each shard to 200 connections. The morning rounds traffic spike (7-9 AM) is handled by pre-warming caches: a Redis cluster caches recently accessed patient summaries with a 30-minute TTL, reducing database load by 60% for repeat chart opens.
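
The caching behaviour for repeat chart opens follows the cache-aside pattern, sketched below. The in-memory TTLCache class is a stand-in for the Redis cluster, and load_from_db is a hypothetical callable that runs the real database query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal in-memory cache with per-entry expiry, standing in for Redis."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)


def get_patient_summary(patient_id: str, cache: TTLCache,
                        load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: serve from cache on a hit, otherwise load and populate."""
    cached = cache.get(patient_id)
    if cached is not None:
        return cached
    summary = load_from_db(patient_id)
    cache.set(patient_id, summary)
    return summary


SUMMARY_TTL_SECONDS = 30 * 60  # the 30-minute TTL from the text
```

Pre-warming before morning rounds would amount to calling get_patient_summary ahead of time for the patients expected to be seen, so the first real chart open is already a hit.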

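The audit log write path must not add latency to clinical operations. Audit events are published to Kafka asynchronously: the clinical service returns the response to the user before the audit record is confirmed written. Kafka's durability settings (acks=all, min.insync.replicas=2) ensure that acknowledged audit records survive a broker failure; an event still buffered in the producer when a service instance crashes falls outside that guarantee. TimescaleDB ingests audit records at 5,000 writes/sec with batch inserts from Kafka consumers, which is above the roughly 3,500 entries/sec average implied by 300M entries/day, with Kafka buffering bursts during peak hours.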
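
The batch-insert step on the consumer side can be sketched as a generator that groups a stream of audit events into fixed-size batches. The function name and batch size are illustrative; the Kafka consumer loop and the database insert are left out.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_events(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size events, flushing any remainder at the end."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


batches = list(batch_events(range(5), batch_size=2))
```

Each yielded batch would be written with one multi-row insert, which is what lets a small number of consumers sustain thousands of audit writes per second.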

Key Trade-offs

FHIR-native storage over relational normalization: Storing full FHIR JSON resources in JSONB columns enables flexible querying and direct API serialization, but increases storage by 2-3x compared to fully normalized tables — acceptable given the query pattern benefits and reduced serialization complexity
Append-only document versioning over in-place updates: Never modifying or deleting clinical data ensures complete provenance and regulatory compliance, but increases storage requirements and complicates queries that need only the current state — mitigated by a current_version materialized view
Asynchronous audit logging over synchronous: Decoupling audit writes from clinical operations keeps chart retrieval fast, but creates a brief window where an access is not yet auditable — Kafka durability guarantees and a maximum 2-second lag make this acceptable for HIPAA compliance
Probabilistic MPI matching over deterministic identifiers: No universal patient identifier exists in the US healthcare system, so probabilistic matching is necessary — but it introduces false positives (merged records that shouldn't be) and false negatives (duplicate records), requiring ongoing HIM staff review