System Design: KYC (Know Your Customer) System
Design a KYC system handling identity verification, document validation, sanctions screening, risk scoring, and ongoing monitoring for financial institutions complying with BSA/AML and global regulations.
Requirements
Functional Requirements:
- Identity verification: validate customer identity using government-issued ID (passport, driver's license), selfie biometric matching, and SSN/TIN verification
- Document verification: authenticate identity documents using ML-based forgery detection
- Sanctions and PEP (Politically Exposed Persons) screening against global watchlists
- Risk scoring: assign customer risk levels (low, medium, high) based on identity, geography, and activity
- Enhanced Due Diligence (EDD) workflow for high-risk customers requiring additional documentation
- Ongoing monitoring: continuous screening against updated watchlists and transaction monitoring
Non-Functional Requirements:
- Process 100,000 KYC verifications/day with 90% automated pass-through for low-risk customers
- Identity verification completion within 60 seconds for automated cases
- False rejection rate below 2% (minimizing customer friction)
- Compliance with BSA/AML (US), 4AMLD/5AMLD/6AMLD (EU), FATF recommendations
- Full audit trail retained for 5 years after customer relationship ends
Scale Estimation
100K verifications/day = about 1.2 verifications/sec on average, peaking at 5/sec during business hours. Each verification involves document image upload and analysis (2 images minimum: ID front + selfie), OCR extraction from the ID, biometric face matching (ID photo vs. selfie), SSN verification against government databases, and sanctions screening.
- Document processing: 200K images/day at 5MB average = 1TB/day of identity document storage (highly sensitive, encrypted at rest)
- Biometric face matching: 100K comparisons/day using face embedding models
- Sanctions screening: each name checked against 500K+ watchlist entries
- Ongoing monitoring: 10M existing customers rescreened monthly against updated watchlists = 333K screenings/day
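The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, "monthly" is assumed to mean a 30-day cycle, and storage uses decimal units (1 TB = 1,000,000 MB).

```python
SECONDS_PER_DAY = 24 * 60 * 60

verifications_per_day = 100_000
images_per_verification = 2        # ID front + selfie (the stated minimum)
avg_image_mb = 5
existing_customers = 10_000_000
rescreen_cycle_days = 30           # assumption: "monthly" = 30 days

avg_verifications_per_sec = verifications_per_day / SECONDS_PER_DAY
images_per_day = verifications_per_day * images_per_verification
storage_tb_per_day = images_per_day * avg_image_mb / 1_000_000  # MB -> TB (decimal)
rescreens_per_day = existing_customers / rescreen_cycle_days

print(f"average load:   {avg_verifications_per_sec:.2f} verifications/sec")
print(f"images:         {images_per_day:,} per day")
print(f"new storage:    {storage_tb_per_day:.1f} TB per day")
print(f"rescreenings:   {rescreens_per_day:,.0f} per day")
```

The average rate works out to roughly 1.16/sec, which the text rounds to 1.2.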
High-Level Architecture
The KYC system is architected as a workflow-driven platform using Temporal for orchestrating the multi-step verification process. The system is divided into the Verification Pipeline (processes new customer applications), the Screening Engine (sanctions and PEP checks), the Risk Engine (assigns risk scores), and the Monitoring Service (ongoing compliance).
The verification flow: Customer uploads identity documents via the client application's KYC SDK → images are uploaded to an encrypted S3 bucket → the Verification Orchestrator (Temporal workflow) initiates parallel processing: (1) Document Verification Service analyzes the ID for authenticity, (2) OCR Service extracts text fields (name, DOB, address, ID number), (3) Biometric Service compares the selfie against the ID photo, (4) Data Verification Service, which runs once the OCR output is available, validates extracted data against authoritative sources (SSN verification via SSA, address verification via USPS). Once all verifications complete, the Risk Engine computes a composite risk score on a 0-100 scale. Low-risk customers (0-30) are auto-approved. Medium-risk customers (31-70) receive enhanced automated checks, and high-risk customers (71-100) are routed to the manual review queue for EDD.
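The fan-out step can be sketched with plain asyncio. This is not Temporal code: it only illustrates the concurrency shape, and every function here is a hypothetical stand-in that returns placeholder values. It also assumes that data verification depends on OCR output, so those two steps are chained while the other checks run alongside them.

```python
import asyncio

# Hypothetical stand-ins for the four services; a real system would call backends.
async def verify_document(application_id: str) -> float:
    await asyncio.sleep(0)
    return 0.97  # placeholder authenticity score

async def extract_fields(application_id: str) -> dict:
    await asyncio.sleep(0)
    return {"name": "JANE DOE", "dob": "1990-01-01"}  # placeholder OCR output

async def match_biometrics(application_id: str) -> float:
    await asyncio.sleep(0)
    return 0.91  # placeholder face-match score

async def verify_data(fields: dict) -> bool:
    await asyncio.sleep(0)
    return bool(fields.get("name")) and bool(fields.get("dob"))

async def ocr_then_data_checks(application_id: str) -> tuple:
    # Data verification needs the OCR fields, so these two run in sequence.
    fields = await extract_fields(application_id)
    return fields, await verify_data(fields)

async def run_verification(application_id: str) -> dict:
    # The three branches run concurrently; gather waits for all of them.
    doc_score, bio_score, (fields, data_ok) = await asyncio.gather(
        verify_document(application_id),
        match_biometrics(application_id),
        ocr_then_data_checks(application_id),
    )
    return {
        "document_score": doc_score,
        "biometric_score": bio_score,
        "fields": fields,
        "data_verified": data_ok,
    }

result = asyncio.run(run_verification("app-123"))
print(result)
```

The collected results would then be handed to the Risk Engine. In a workflow engine, each stand-in would be an activity with its own retry and timeout policy.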
The Screening Engine runs both at onboarding and continuously thereafter. At onboarding, it screens the customer's name and known aliases against OFAC SDN, EU consolidated lists, UN sanctions, Interpol notices, and country-specific PEP databases. Ongoing monitoring re-screens existing customers against updated watchlists (lists are typically updated weekly, and the full customer base is rescreened on the monthly cycle sized in Scale Estimation) and also monitors transaction patterns for suspicious activity.
Core Components
Document Verification Service
The Document Verification Service authenticates identity documents using a multi-model ML pipeline:
- Stage 1, Document Classification: a CNN model identifies the document type (passport, driver's license, national ID) and issuing country from the image.
- Stage 2, Forgery Detection: a specialized model checks for signs of tampering, including inconsistent fonts, misaligned security features, and digital manipulation artifacts (assessed via noise analysis and JPEG compression artifacts). It also validates holographic/UV features when they are captured under appropriate lighting (the progressive web app guides users to tilt the ID).
- Stage 3, Data Extraction: an OCR pipeline optimized for identity documents extracts full name, date of birth, document number, expiration date, address (where present), and the Machine Readable Zone (MRZ) for passports. MRZ check digits are validated algorithmically.
The pipeline achieves 98% accuracy on document authentication with a 0.5% false acceptance rate.
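The MRZ check-digit validation mentioned above follows the scheme in ICAO Doc 9303: each character is mapped to a number, multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. A minimal sketch follows; the function names are illustrative, and the sample values come from the ICAO specimen passport.

```python
def mrz_char_value(ch: str) -> int:
    """Map an MRZ character to its numeric value (digits, A-Z, or '<' filler)."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")

def mrz_check_digit(field: str) -> int:
    """Compute the check digit for one MRZ field using weights 7, 3, 1."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10

def mrz_field_is_valid(field: str, check_digit: str) -> bool:
    """Compare the computed check digit with the one printed in the MRZ."""
    return check_digit in "0123456789" and mrz_check_digit(field) == int(check_digit)

# Fields from the ICAO specimen: document number, date of birth, expiry date.
print(mrz_field_is_valid("L898902C3", "6"))
print(mrz_field_is_valid("740812", "2"))
print(mrz_field_is_valid("120415", "9"))
```

A mismatch usually indicates an OCR misread or a tampered field, so it is a cheap first filter before the heavier forgery models run.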
Biometric Matching Service
The Biometric Service performs facial comparison between the ID photo and the live selfie. The process: (1) face detection using MTCNN to locate faces in both images, (2) liveness detection to prevent spoofing (analyzing micro-movements in a short video capture, checking for 3D depth consistency, detecting screen reflections or printed photo edges), (3) face encoding using a FaceNet model that produces a 128-dimensional embedding vector for each face, (4) similarity comparison using cosine similarity between embeddings, where a score above the 0.85 threshold confirms a match. (The score must be a similarity rather than a distance, since higher values mean the faces are more alike.) The service handles challenging cases: aging (the ID photo may be 10 years old), glasses, facial hair changes, and different lighting conditions. For edge cases near the threshold (0.75-0.85), the system requests a second selfie with specific instructions (remove glasses, improve lighting).
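The comparison and threshold logic can be sketched as follows. The sketch treats the score as cosine similarity (higher means more alike), which is the reading under which "above 0.85 confirms a match" makes sense. The decision labels and function names are illustrative, and short toy vectors stand in for the 128-dimensional embeddings.

```python
import math

MATCH_THRESHOLD = 0.85   # above this: match confirmed
RETRY_THRESHOLD = 0.75   # 0.75-0.85: ask for a second selfie

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm

def match_decision(score: float) -> str:
    """Map a similarity score to the three outcomes described in the text."""
    if score > MATCH_THRESHOLD:
        return "MATCH"
    if score >= RETRY_THRESHOLD:
        return "RETRY_SELFIE"
    return "NO_MATCH"

id_photo = [0.2, 0.1, 0.9, 0.3]     # toy embedding, not real model output
selfie = [0.25, 0.05, 0.85, 0.35]   # toy embedding, not real model output
score = cosine_similarity(id_photo, selfie)
print(round(score, 3), match_decision(score))
```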
Risk Scoring Engine
The Risk Engine assigns a composite risk score (0-100) based on multiple risk dimensions: (1) Identity Risk (verification confidence score, document authenticity score, biometric match score), (2) Geographic Risk (customer's country rated by FATF grey/black list status, Transparency International CPI score, and jurisdiction-specific risk ratings), (3) Product Risk (account type and expected transaction volume — high-value investment accounts score higher than basic checking), (4) Behavioral Risk (for existing customers: transaction patterns, unusual activity flags). Each dimension produces a sub-score weighted by configurable factors. The composite score maps to risk tiers: Low (0-30, auto-approve), Medium (31-70, enhanced automated checks), High (71-100, manual EDD required). Risk scores are recalculated monthly for existing customers based on updated transaction behavior.
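A weighted composite of this kind might look like the sketch below. The weights are hypothetical placeholders (the text only says they are configurable), and each sub-score is assumed to already be on a 0-100 scale where higher means riskier.

```python
# Hypothetical weights in percent; they must sum to 100.
WEIGHTS = {"identity": 40, "geographic": 25, "product": 15, "behavioral": 20}

def composite_score(sub_scores: dict) -> int:
    """Weighted average of the four 0-100 sub-scores, rounded to an integer."""
    for dimension, value in sub_scores.items():
        if dimension not in WEIGHTS:
            raise KeyError(f"unknown risk dimension: {dimension}")
        if not 0 <= value <= 100:
            raise ValueError(f"{dimension} sub-score out of range: {value}")
    total = sum(WEIGHTS[d] * sub_scores.get(d, 0) for d in WEIGHTS)
    return round(total / 100)

def risk_tier(score: int) -> str:
    """Map a composite score to the tiers given in the text."""
    if score <= 30:
        return "LOW"      # auto-approve
    if score <= 70:
        return "MEDIUM"   # enhanced automated checks
    return "HIGH"         # manual EDD required

new_customer = {"identity": 10, "geographic": 20, "product": 40, "behavioral": 0}
score = composite_score(new_customer)
print(score, risk_tier(score))
```

A new customer has no transaction history, so the behavioral sub-score is set to 0 here. Whether a missing dimension should count as zero risk or be excluded from the weighting is a policy choice the source does not specify.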
Database Design
The KYC database is PostgreSQL with column-level (field-level) encryption for PII fields. Core tables:
- customers (customer_id, legal_name_encrypted, dob_encrypted, ssn_hash, nationality, risk_score, risk_tier LOW/MEDIUM/HIGH, kyc_status PENDING/APPROVED/REJECTED/UNDER_REVIEW, onboarded_at, last_reviewed_at)
- verifications (verification_id, customer_id, type DOCUMENT/BIOMETRIC/SSN/ADDRESS, status PASSED/FAILED/MANUAL_REVIEW, confidence_score, details JSONB, verified_at)
- documents (document_id, customer_id, document_type PASSPORT/DL/NATIONAL_ID, s3_key_encrypted, ocr_data_encrypted JSONB, authenticity_score, uploaded_at, expires_at)
- screening_results (screening_id, customer_id, list_type OFAC/EU/UN/PEP, match_status NO_MATCH/POTENTIAL_MATCH/CONFIRMED_MATCH, matched_entity, match_score, reviewed_by, screened_at)
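As an illustration of how application code might mirror one of these tables, the sketch below models a verifications row with enums for the constrained columns. The class names and field types are assumptions. The column names and allowed values come from the schema above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class VerificationType(Enum):
    DOCUMENT = "DOCUMENT"
    BIOMETRIC = "BIOMETRIC"
    SSN = "SSN"
    ADDRESS = "ADDRESS"

class VerificationStatus(Enum):
    PASSED = "PASSED"
    FAILED = "FAILED"
    MANUAL_REVIEW = "MANUAL_REVIEW"

@dataclass(frozen=True)
class Verification:
    """In-memory shape of one row in the verifications table."""
    verification_id: str
    customer_id: str
    type: VerificationType
    status: VerificationStatus
    confidence_score: float
    details: dict = field(default_factory=dict)  # maps to the JSONB column
    verified_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

row = Verification(
    verification_id="ver-001",
    customer_id="cust-001",
    type=VerificationType.BIOMETRIC,
    status=VerificationStatus.PASSED,
    confidence_score=0.91,
    details={"match_score": 0.91},
)
print(row.type.value, row.status.value)
```

Using enums keeps invalid status strings out of the application layer. The database would still enforce the same values, for example with a CHECK constraint or a PostgreSQL enum type.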
Document images are stored in an encrypted S3 bucket with a separate KMS key from the main application. Access to the bucket requires both IAM authentication and a signed token from the KYC service. Images are retained for 5 years after customer relationship ends (regulatory requirement) and automatically deleted thereafter via S3 lifecycle policies. A separate audit_log table records every access to customer PII: who accessed what, when, and why (purpose_code).
API Design
- POST /v1/verifications: Initiate KYC verification; body contains customer_id, document_images (front, back), selfie_image, consent_token; returns verification_id, status (PROCESSING)
- GET /v1/verifications/{verification_id}: Check verification status and results; returns status, risk_score, risk_tier, individual check results (document, biometric, sanctions), required_actions if any
- POST /v1/screening/batch: Submit batch screening request for ongoing monitoring; body contains customer_ids[]; returns job_id for async result polling
- GET /v1/customers/{customer_id}/risk-profile: Comprehensive risk profile including current risk score, tier, all historical verifications, screening results, and EDD status
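To make the request and response shapes concrete, here is a minimal in-memory sketch of the POST /v1/verifications handler. It is not a real web handler: the function name, the validation rule, the image references, and the use of UUIDs for verification_id are all assumptions. The field names are the ones listed in the API description.

```python
import uuid

REQUIRED_FIELDS = ("customer_id", "document_images", "selfie_image", "consent_token")

def initiate_verification(body: dict) -> dict:
    """Validate the request body and return the initial async response."""
    missing = [name for name in REQUIRED_FIELDS if name not in body]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if "front" not in body["document_images"]:
        raise ValueError("document_images must include a 'front' image")
    # Processing continues asynchronously; the client polls
    # GET /v1/verifications/{verification_id} for the result.
    return {"verification_id": str(uuid.uuid4()), "status": "PROCESSING"}

example_body = {
    "customer_id": "cust-001",
    "document_images": {"front": "upload-ref-front", "back": "upload-ref-back"},
    "selfie_image": "upload-ref-selfie",
    "consent_token": "consent-abc",
}
response = initiate_verification(example_body)
print(response)
```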
Scaling & Bottlenecks
Document verification ML models are the primary compute bottleneck. The multi-model pipeline (classification + forgery detection + OCR) takes 8-12 seconds per document on GPU. With 200K images/day, the pipeline requires 20 T4 GPUs. During customer onboarding surges (fintech launch campaigns driving 10x normal volume), auto-scaling GPU nodes take 5-10 minutes to provision — a pre-warmed pool of 5 standby GPUs absorbs initial spikes. The biometric matching service is less GPU-intensive (FaceNet inference takes 100ms per pair) but liveness detection video processing adds 3-5 seconds.
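A rough capacity check for the GPU figure can be sketched as below. It assumes every one of the 200K daily images passes through the full 8-12 second pipeline, one image per GPU at a time, with load spread evenly over 24 hours. None of these assumptions is stated in the text, and batching or skipping stages for selfies would lower the result.

```python
SECONDS_PER_DAY = 86_400
IMAGES_PER_DAY = 200_000

def gpus_at_full_utilization(seconds_per_image: float) -> float:
    """GPUs needed if each GPU processes one image at a time, around the clock."""
    gpu_seconds_per_day = IMAGES_PER_DAY * seconds_per_image
    return gpu_seconds_per_day / SECONDS_PER_DAY

low = gpus_at_full_utilization(8)    # fast end of the 8-12 s range
high = gpus_at_full_utilization(12)  # slow end of the 8-12 s range
print(f"{low:.1f} to {high:.1f} GPUs at 100% utilization")
```

Under these assumptions the range is about 18.5 to 27.8 GPUs, so a 20-GPU fleet sits near the fast end and leaves little headroom for the business-hours peak. This is one reason a pre-warmed standby pool and auto-scaling matter here.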
Ongoing monitoring (re-screening 10M customers monthly against updated watchlists) is a batch-intensive workload. Naive approach: 10M customers × 500K watchlist entries = 5 trillion comparisons. Optimized approach: pre-compute phonetic hashes (Double Metaphone) and trigram tokens for all watchlist entries → build an inverted index → for each customer, generate the same tokens and look up potential matches in the index, so the index lookup takes under a millisecond per customer and only the few candidates it returns need a full fuzzy comparison. The monthly batch completes in 8 hours across 50 workers.
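The inverted-index idea can be sketched with trigram tokens alone. This is a simplified illustration: it omits the Double Metaphone phonetic keys, uses Jaccard overlap as a stand-in for whatever fuzzy score a production system would apply, and the class name, threshold, and sample names are all hypothetical.

```python
from collections import defaultdict

def trigrams(name: str) -> set:
    """Character trigrams of a lowercased, space-padded name."""
    padded = f"  {name.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

class WatchlistIndex:
    def __init__(self, entries: dict):
        # entries maps entry_id -> listed name
        self.entry_tokens = {eid: trigrams(name) for eid, name in entries.items()}
        self.postings = defaultdict(set)  # trigram -> entry_ids containing it
        for eid, tokens in self.entry_tokens.items():
            for token in tokens:
                self.postings[token].add(eid)

    def candidates(self, name: str, min_jaccard: float = 0.5) -> list:
        """Return (entry_id, score) pairs whose trigram overlap passes the threshold."""
        query = trigrams(name)
        seen = set()
        for token in query:
            seen |= self.postings.get(token, set())
        scored = []
        for eid in seen:  # only entries sharing at least one trigram are scored
            tokens = self.entry_tokens[eid]
            score = len(query & tokens) / len(query | tokens)
            if score >= min_jaccard:
                scored.append((eid, score))
        return sorted(scored, key=lambda pair: -pair[1])

index = WatchlistIndex({"w1": "Ivan Petrov", "w2": "Maria Gonzalez"})
print(index.candidates("Ivan Petrof"))   # close spelling variant of w1
print(index.candidates("John Smith"))    # shares no trigrams with either entry
```

The index is built once per watchlist version, so the per-customer cost depends on the number of candidate entries sharing tokens with the query rather than on the full 500K-entry list.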
Key Trade-offs
- Multi-model ML pipeline over single end-to-end model: Separate models for classification, forgery detection, and OCR enable independent improvement and debugging of each stage, but increase total latency — parallel execution where possible (OCR and forgery detection run simultaneously) mitigates this
- Liveness detection via video over single photo: Video-based liveness with micro-movement analysis provides stronger anti-spoofing than single-photo analysis, but increases user friction (users must record a 3-second video) — reducing the video requirement to a simple head turn balances security and UX
- Pre-computed screening index over real-time fuzzy matching: Building an inverted index for watchlist screening enables sub-millisecond per-customer lookups, but requires index rebuilds on every watchlist update (weekly) — incremental index updates handle daily additions efficiently
- Auto-approval for low-risk over all-manual review: Automating 90% of verifications reduces onboarding time from days to minutes, but risks approving sophisticated identity fraud — a random audit of 1% of auto-approved cases provides ongoing quality assurance
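The 1% audit of auto-approved cases mentioned in the last trade-off could be selected as in the sketch below. It swaps true randomness for a deterministic hash of the verification ID, so the same case always gets the same answer and the selection can be reproduced later for auditors. That substitution, the function name, and the ID format are assumptions, not something the source specifies.

```python
import hashlib

AUDIT_RATE_PERCENT = 1  # audit roughly 1% of auto-approved cases

def selected_for_audit(verification_id: str) -> bool:
    """Deterministically place about 1% of IDs into the audit sample."""
    digest = hashlib.sha256(verification_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < AUDIT_RATE_PERCENT

sampled = sum(selected_for_audit(f"ver-{i}") for i in range(100_000))
print(f"{sampled} of 100000 cases selected for audit")
```

Because the selection is predictable from the ID, a production version would likely mix in a secret salt so that the audit sample cannot be anticipated from outside.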