System Design: Loan Origination System
Design a loan origination system covering application intake, automated underwriting, credit decisioning, document management, and compliance with TILA/RESPA regulations for processing thousands of loan applications daily.
Requirements
Functional Requirements:
- Multi-channel loan application intake (web, mobile, branch, broker portal)
- Automated credit decisioning with configurable underwriting rules and ML risk models
- Document collection, verification, and management (income verification, identity docs, property appraisals)
- Workflow engine managing the loan lifecycle from application through funding
- Compliance checks for TILA (Truth in Lending Act), RESPA, ECOA (Equal Credit Opportunity Act)
- Integration with credit bureaus (Experian, Equifax, TransUnion), income verification services, and title companies
Non-Functional Requirements:
- Process 10,000 loan applications/day with automated decisioning in under 30 seconds
- Support complex loan products (mortgage, auto, personal, business) with configurable parameters
- 99.9% availability for the application intake flow
- Audit trail for every decision with explainability for adverse action notices
- Data encryption at rest and in transit with role-based access to PII
Scale Estimation
At 10,000 applications/day, the system averages about 0.12 applications/sec, peaking at roughly 1 application/sec during business hours. Each application triggers 5-8 third-party API calls: 3 credit bureau pulls (Experian, Equifax, TransUnion), income verification (Plaid/Finicity), identity verification (Socure/Jumio), fraud screening, and, for mortgages, property valuation. Document storage: an average of 15 documents per application at 2MB each comes to 30MB per application, or 300GB/day of new documents. Active pipeline: 50,000 loans in various stages simultaneously, each with 20-30 data fields updated by multiple parties (borrower, loan officer, underwriter, closing agent). Workflow events: 500K state transitions/day across all active loans.
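The arithmetic behind these estimates can be checked with a short script. This is a back-of-the-envelope sketch using only the figures stated above; the yearly storage figure is derived here and assumes intake runs every calendar day, which the estimate does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the scale estimate.
applications_per_day = 10_000
docs_per_application = 15
doc_size_mb = 2
transitions_per_day = 500_000

avg_apps_per_sec = applications_per_day / SECONDS_PER_DAY
daily_storage_gb = applications_per_day * docs_per_application * doc_size_mb / 1000
transitions_per_sec = transitions_per_day / SECONDS_PER_DAY

# Each application triggers 5 to 8 third-party calls.
daily_api_calls = (applications_per_day * 5, applications_per_day * 8)

# Assumption: intake runs all 365 days of the year.
yearly_storage_tb = daily_storage_gb * 365 / 1000

print(f"average intake: {avg_apps_per_sec:.2f} applications/sec")
print(f"new documents: {daily_storage_gb:.0f} GB/day, {yearly_storage_tb:.1f} TB/year")
print(f"workflow events: {transitions_per_sec:.1f} transitions/sec on average")
print(f"third-party calls: {daily_api_calls[0]:,} to {daily_api_calls[1]:,} per day")
```

The request rate is small; the numbers that drive the design are document volume and the count of slow third-party calls.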
High-Level Architecture
The loan origination system (LOS) follows a workflow-centric architecture where a central Workflow Engine (Camunda or Temporal) orchestrates the loan lifecycle. The system is organized into domain services: Application Service (intake and data capture), Decision Engine (automated underwriting), Document Service (storage and OCR), Compliance Service (regulatory checks), and Integration Service (third-party API management).
The loan lifecycle flows through stages: Application → Pre-qualification → Full Application → Credit Decision → Conditional Approval → Document Collection → Underwriting Review → Final Approval → Closing → Funding → Servicing Handoff. The Workflow Engine manages transitions between stages, enforcing prerequisites (e.g., cannot move to Final Approval until all conditions are satisfied) and triggering automated actions (e.g., pull credit report when application is submitted, send adverse action notice on decline). Each stage can have multiple parallel activities: while the borrower uploads income documents, the system simultaneously pulls credit reports and runs fraud screening.
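The prerequisite check on stage transitions can be illustrated with a small in-memory sketch. It does not use the Temporal or Camunda APIs; the `Loan` class, the `advance` function, and the choice to treat WAIVED conditions as resolved are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Stage order as listed in the lifecycle description.
STAGES = [
    "APPLICATION", "PRE_QUALIFICATION", "FULL_APPLICATION", "CREDIT_DECISION",
    "CONDITIONAL_APPROVAL", "DOCUMENT_COLLECTION", "UNDERWRITING_REVIEW",
    "FINAL_APPROVAL", "CLOSING", "FUNDING", "SERVICING_HANDOFF",
]

@dataclass
class Loan:
    loan_id: str
    stage: str = "APPLICATION"
    # Maps condition_id to a status: PENDING, RECEIVED, WAIVED, or SATISFIED.
    conditions: dict = field(default_factory=dict)

def open_conditions(loan: Loan) -> list:
    """Return the ids of conditions that are neither satisfied nor waived."""
    return [cid for cid, status in loan.conditions.items()
            if status not in ("SATISFIED", "WAIVED")]

def advance(loan: Loan) -> str:
    """Move the loan to the next stage, enforcing the Final Approval prerequisite."""
    idx = STAGES.index(loan.stage)
    if idx == len(STAGES) - 1:
        raise ValueError("loan is already in the final stage")
    next_stage = STAGES[idx + 1]
    if next_stage == "FINAL_APPROVAL" and open_conditions(loan):
        raise ValueError(f"open conditions: {open_conditions(loan)}")
    loan.stage = next_stage
    return next_stage
```

In a real workflow engine the same guard would live in the workflow definition, with durability and retries handled by the engine rather than by application code.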
The Decision Engine implements a two-tier architecture: a rules engine (Drools) evaluates hard cutoffs (minimum credit score, maximum DTI ratio, prohibited property types) and an ML model provides risk scoring for borderline applications. The rules engine ensures regulatory compliance (no decisions based on prohibited factors under ECOA), while the ML model optimizes for default prediction.
Core Components
Decision Engine
The Decision Engine evaluates loan applications through a pipeline: data enrichment (augmenting the application with credit bureau data, income verification results, property valuation) → rules evaluation (200+ configurable rules organized by product type) → ML scoring (gradient-boosted model predicting probability of default at 12, 24, and 60 months) → decision synthesis (combining rules output and ML score into APPROVE, DECLINE, or REFER for manual review). Rules are version-controlled and deployed independently of application code — compliance teams can modify rules without engineering releases. Every decision records the full input feature set, all rules evaluated (pass/fail), ML model version and score, and the final decision rationale — required for adverse action notices under ECOA and for model governance audits.
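The decision synthesis step might look like the following sketch. The two rules, their thresholds (620 and 0.43), the probability-of-default cutoffs, and all function and field names are hypothetical; a production system would load rules from the versioned rules engine rather than hard-code them.

```python
from dataclasses import dataclass

@dataclass
class RuleResult:
    rule_id: str
    passed: bool

def evaluate_rules(app: dict) -> list:
    # Hypothetical hard cutoffs with illustrative thresholds.
    return [
        RuleResult("MIN_CREDIT_SCORE", app["credit_score"] >= 620),
        RuleResult("MAX_DTI", app["dti"] <= 0.43),
    ]

def decide(app: dict, pd_12m: float, model_version: str = "demo-0",
           approve_below: float = 0.02, decline_above: float = 0.10) -> dict:
    """Combine rule outcomes and an ML default probability into one decision record."""
    rules = evaluate_rules(app)
    failed = [r.rule_id for r in rules if not r.passed]
    if failed:
        result, reasons = "DECLINE", failed          # hard cutoffs always win
    elif pd_12m < approve_below:
        result, reasons = "APPROVE", []
    elif pd_12m > decline_above:
        result, reasons = "DECLINE", ["HIGH_PREDICTED_DEFAULT_RISK"]
    else:
        result, reasons = "REFER", ["BORDERLINE_RISK_SCORE"]
    # Everything needed to explain the decision later is stored with it.
    return {
        "inputs": dict(app),
        "rules_results": {r.rule_id: r.passed for r in rules},
        "ml_score": pd_12m,
        "model_version": model_version,
        "result": result,
        "reasons": reasons,
    }
```

Returning the full record, rather than just the outcome, is what makes adverse action notices and model audits possible without re-running the pipeline.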
Document Management Service
The Document Service handles the lifecycle of loan documents: upload, classification, OCR extraction, verification, and storage. Documents are uploaded by borrowers (pay stubs, tax returns, bank statements) or generated by the system (disclosures, pre-approval letters). An ML-based document classifier (fine-tuned ResNet) automatically categorizes uploaded documents into 40+ types (W-2, 1099, bank statement page 1, bank statement page 2, etc.). OCR (Tesseract + GPT-4 for complex layouts) extracts key fields: employer name and income from pay stubs, deposit amounts from bank statements. Extracted data is compared against application data for consistency, and discrepancies are flagged for underwriter review. Documents are stored in S3 with AES-256 encryption and lifecycle policies (retain for 7 years post-closing, a conservative policy that exceeds the retention periods in TILA's implementing Regulation Z).
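The consistency check between extracted and stated data could be sketched as below. The 90% confidence floor comes from the Key Trade-offs section; the 5% numeric tolerance, the field names, and the `review_flags` function are assumptions, and the sketch assumes numeric fields have already been parsed into numbers.

```python
def review_flags(stated: dict, extracted: dict, confidence: dict,
                 numeric_tolerance: float = 0.05, min_confidence: float = 0.90) -> list:
    """Return (field, reason) pairs that need underwriter review."""
    flags = []
    for field_name, value in extracted.items():
        # Low-confidence extractions go to a human before any comparison.
        if confidence.get(field_name, 0.0) < min_confidence:
            flags.append((field_name, "LOW_CONFIDENCE"))
            continue
        expected = stated.get(field_name)
        if expected is None:
            continue  # nothing on the application to compare against
        if isinstance(expected, (int, float)):
            # Numeric fields match if they fall within a relative tolerance.
            if abs(value - expected) > numeric_tolerance * abs(expected):
                flags.append((field_name, "MISMATCH"))
        elif str(value).strip().lower() != str(expected).strip().lower():
            flags.append((field_name, "MISMATCH"))
    return flags

stated = {"employer_name": "Acme Corp", "monthly_income": 5000}
extracted = {"employer_name": "ACME CORP ", "monthly_income": 5600}
confidence = {"employer_name": 0.97, "monthly_income": 0.95}
print(review_flags(stated, extracted, confidence))
```

Here the employer name matches after normalization, while the income differs by more than 5% and would be flagged.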
Compliance & Regulatory Service
The Compliance Service runs checks at every stage of the loan lifecycle. At application: HMDA (Home Mortgage Disclosure Act) data collection and fair lending monitoring. At decisioning: adverse action notice generation per ECOA with specific reasons (template library of 200+ reason codes). At closing: TILA disclosure accuracy verification (the disclosed APR must be within 1/8 of a percentage point of the correct value) and comparison of the Loan Estimate, which replaced the RESPA Good Faith Estimate for most mortgages under the TRID rule, against final closing costs (tolerance checking). The service also runs continuous fair lending monitoring: statistical analysis of approval rates by demographic group (race, gender, age) to detect potential disparate impact, using the 80% (four-fifths) rule and regression analysis. Results are reported to the compliance team with drill-down by decision factor.
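Two of these checks reduce to simple arithmetic, sketched below. The group labels and counts are placeholders rather than real data, and the function names are hypothetical. A ratio under 0.80 is a screening signal that prompts further analysis, not a conclusion in itself.

```python
def adverse_impact_ratios(outcomes: dict, reference_group: str) -> dict:
    """outcomes maps group -> (approved, total); returns each group's
    approval rate divided by the reference group's approval rate."""
    rates = {group: approved / total for group, (approved, total) in outcomes.items()}
    reference_rate = rates[reference_group]
    return {group: rate / reference_rate for group, rate in rates.items()}

def flag_disparate_impact(outcomes: dict, reference_group: str,
                          threshold: float = 0.80) -> list:
    """Groups whose ratio falls below the 80% threshold."""
    ratios = adverse_impact_ratios(outcomes, reference_group)
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)

def apr_within_tolerance(disclosed_apr: float, computed_apr: float,
                         tolerance: float = 0.125) -> bool:
    """APR values are in percentage points; 0.125 is 1/8 of a point."""
    return abs(disclosed_apr - computed_apr) <= tolerance

# Placeholder counts for illustration only.
outcomes = {"group_a": (600, 1000), "group_b": (420, 1000), "group_c": (500, 1000)}
print(flag_disparate_impact(outcomes, reference_group="group_a"))
print(apr_within_tolerance(6.50, 6.60))
```

In this example group_b is approved at 70% of the reference rate and is flagged, while group_c at about 83% is not.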
Database Design
The LOS uses PostgreSQL as the primary store. Core tables: applications (application_id, borrower_id, co_borrower_id, product_type, loan_amount, property_address, status, loan_officer_id, branch_id, created_at), credit_reports (report_id, application_id, bureau, score, report_data JSONB, pulled_at), decisions (decision_id, application_id, decision_type AUTO/MANUAL, result APPROVE/DECLINE/REFER, rules_results JSONB, ml_score, model_version, reasons JSONB, decided_by, decided_at), conditions (condition_id, application_id, condition_type, description, status PENDING/RECEIVED/WAIVED/SATISFIED, document_id).
The workflow state is managed by Temporal/Camunda's internal database (separate from the LOS database). Document metadata is stored in PostgreSQL (document_id, application_id, document_type, file_key_s3, ocr_data JSONB, classification_confidence, uploaded_by, uploaded_at) while document binary content resides in S3. A full-text search index (Elasticsearch) enables searching across applications by borrower name, address, loan officer, or status for pipeline management.
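A trimmed version of the applications and decisions tables is sketched here against an in-memory SQLite database so that it runs without a server. SQLite stands in for PostgreSQL: the JSONB columns become TEXT holding JSON, and the CHECK constraints and sample values are assumptions added for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applications (
    application_id TEXT PRIMARY KEY,
    borrower_id    TEXT NOT NULL,
    product_type   TEXT NOT NULL,
    loan_amount    NUMERIC NOT NULL,
    status         TEXT NOT NULL,
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE decisions (
    decision_id    TEXT PRIMARY KEY,
    application_id TEXT NOT NULL REFERENCES applications(application_id),
    decision_type  TEXT NOT NULL CHECK (decision_type IN ('AUTO', 'MANUAL')),
    result         TEXT NOT NULL CHECK (result IN ('APPROVE', 'DECLINE', 'REFER')),
    rules_results  TEXT NOT NULL,   -- JSONB in PostgreSQL
    ml_score       REAL,
    model_version  TEXT,
    reasons        TEXT,            -- JSONB in PostgreSQL
    decided_by     TEXT,
    decided_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

conn.execute(
    "INSERT INTO applications (application_id, borrower_id, product_type, loan_amount, status) "
    "VALUES (?, ?, ?, ?, ?)",
    ("app-1", "bor-1", "AUTO_LOAN", 25000, "CREDIT_DECISION"),
)
conn.execute(
    "INSERT INTO decisions (decision_id, application_id, decision_type, result, "
    "rules_results, ml_score, model_version, reasons) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("dec-1", "app-1", "AUTO", "REFER",
     json.dumps({"MIN_CREDIT_SCORE": True}), 0.05, "demo-0",
     json.dumps(["BORDERLINE_RISK_SCORE"])),
)
conn.commit()
```

Decisions are inserted as new rows rather than updated in place, which keeps the audit trail intact when an application is re-evaluated.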
API Design
- POST /v1/applications: Submit a loan application; body contains borrower info, loan details, property info, consent for credit pull; returns application_id and initial pre-qualification result
- GET /v1/applications/{app_id}/decision: Retrieve the decisioning result including approval status, conditions, rate/term offer, and adverse action reasons if declined
- POST /v1/applications/{app_id}/documents: Upload a document; multipart form with file and optional document_type hint; returns document_id, auto-classified type, and OCR extraction results
- PATCH /v1/applications/{app_id}/conditions/{condition_id}: Update a condition status (satisfy, waive); triggers workflow progression if all conditions are met
Scaling & Bottlenecks
The primary bottleneck is third-party API latency. Credit bureau pulls take 2-5 seconds each; income verification 5-10 seconds; property valuation 10-30 seconds. Running these sequentially would make decisioning take 30+ seconds. The Integration Service runs all independent calls in parallel using async/await patterns, with circuit breakers per provider (Resilience4j) and fallback logic (if Experian is down, proceed with Equifax and TransUnion scores only). Response caching with a 24-hour TTL prevents redundant pulls when an application is re-evaluated.
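The parallel fan-out with a partial-result fallback can be sketched with Python's asyncio. The provider calls are simulated with short sleeps, and `call_provider`, `pull_credit`, the timeout, and the two-bureau minimum are assumptions. No circuit breaker is implemented here; Resilience4j, named above, is a Java library, and this sketch only shows the concurrency and fallback behavior.

```python
import asyncio

async def call_provider(name: str, latency_s: float, fail: bool = False) -> dict:
    """Simulated third-party call; a real client would make an HTTP request."""
    await asyncio.sleep(latency_s)
    if fail:
        raise ConnectionError(f"{name} unavailable")
    return {"provider": name, "score": 700}

async def pull_credit(providers: list, timeout_s: float = 1.0,
                      min_bureaus: int = 2) -> list:
    """Call all bureaus concurrently and proceed if enough of them respond."""
    calls = [asyncio.wait_for(call_provider(name, latency, fail), timeout_s)
             for name, latency, fail in providers]
    # return_exceptions=True keeps one failure from cancelling the others.
    outcomes = await asyncio.gather(*calls, return_exceptions=True)
    successes = [o for o in outcomes if not isinstance(o, Exception)]
    if len(successes) < min_bureaus:
        raise RuntimeError("too few bureau responses for a decision")
    return successes

# Each entry is (name, simulated latency in seconds, whether the call fails).
providers = [("experian", 0.1, True), ("equifax", 0.1, False), ("transunion", 0.1, False)]
reports = asyncio.run(pull_credit(providers))
print([r["provider"] for r in reports])
```

Because the calls overlap, total wall-clock time tracks the slowest call rather than the sum of all calls.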
Document OCR processing is CPU-intensive and bursty. Borrowers often upload 10-15 documents at once, creating processing spikes. This is handled by a queue-based architecture: uploads are accepted immediately (stored in S3), and OCR jobs are submitted to an SQS queue consumed by auto-scaling worker pods. OCR results typically complete within 60 seconds and trigger a WebSocket notification to the loan officer's dashboard.
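The accept-then-process pattern can be sketched with the standard library. A dict stands in for S3, `queue.Queue` for SQS, and threads for worker pods; `fake_ocr` and the `OcrPipeline` class are hypothetical placeholders rather than real OCR or AWS calls.

```python
import queue
import threading

def fake_ocr(content: bytes) -> dict:
    # Placeholder for a real OCR call; it only decodes the bytes.
    return {"text": content.decode("utf-8", errors="replace")}

class OcrPipeline:
    def __init__(self, n_workers: int = 2):
        self.blob_store = {}         # stand-in for S3
        self.jobs = queue.Queue()    # stand-in for the SQS queue
        self.results = {}
        self._lock = threading.Lock()
        for _ in range(n_workers):
            threading.Thread(target=self._work, daemon=True).start()

    def accept_upload(self, doc_id: str, content: bytes) -> dict:
        """Store the file and enqueue the OCR job; returns immediately."""
        self.blob_store[doc_id] = content
        self.jobs.put(doc_id)
        return {"document_id": doc_id, "status": "QUEUED"}

    def _work(self) -> None:
        while True:
            doc_id = self.jobs.get()
            try:
                extracted = fake_ocr(self.blob_store[doc_id])
                with self._lock:
                    self.results[doc_id] = extracted
            finally:
                self.jobs.task_done()

    def wait_until_idle(self) -> None:
        """Block until every queued job has been processed."""
        self.jobs.join()

pipeline = OcrPipeline(n_workers=3)
acks = [pipeline.accept_upload(f"doc-{i}", f"page {i}".encode()) for i in range(12)]
pipeline.wait_until_idle()
print(len(pipeline.results), "documents processed")
```

The upload path never waits on OCR, so a burst of 10-15 documents only lengthens the queue instead of slowing the borrower's request.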
Key Trade-offs
- Workflow engine (Temporal/Camunda) over custom state machine: Workflow engines provide durability, retry logic, and visibility out of the box, but introduce a dependency on the engine's database and add operational complexity; the payoff is a dramatic reduction in custom orchestration code
- Rules engine + ML model over pure ML decisioning: The rules engine ensures hard regulatory compliance (no prohibited factor decisions) while ML optimizes risk prediction — pure ML would risk inadvertent fair lending violations
- Parallel third-party calls over sequential: Parallel execution reduces total decisioning time from 30+ seconds to roughly the latency of the slowest call (about 10 seconds when no property valuation is required), but complicates error handling; partial results may be sufficient for a preliminary decision, with full data required before final approval
- OCR extraction with human review over manual data entry: OCR reduces loan officer workload by 60% but introduces extraction errors; extractions with a confidence score below 90% trigger mandatory human verification