System Design: Mortgage Application Platform

Design a digital mortgage application platform supporting multi-step application workflows, document collection, underwriting integration, and loan status tracking. Covers compliance, data security, and third-party integration patterns.

13 min read | Updated Jan 15, 2025
system-design, real-estate, mortgage, fintech, document-processing, compliance

Requirements

Functional Requirements:

  • Multi-step mortgage application with progressive disclosure and save-and-resume
  • Document collection: pay stubs, W-2s, bank statements with automated data extraction
  • Integration with credit bureaus for hard and soft credit pulls
  • Automated underwriting with rule-based pre-approval and risk scoring
  • Loan officer assignment and borrower-lender communication portal
  • Loan status tracking from application through closing with milestone notifications

Non-Functional Requirements:

  • Application data encrypted at rest with field-level encryption for PII and financial data
  • Compliant with RESPA, TRID, HMDA, and the Equal Credit Opportunity Act (ECOA)
  • Audit trail for all underwriting decisions (required by ECOA for adverse action notices)
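  • Document retention: at least 3 years after consummation for Loan Estimate records and 5 years for the Closing Disclosure under TRID (Regulation Z); HMDA and ECOA impose their own retention periods
  • 99.9% uptime, since borrowers face rate lock expiry deadlines that an outage could cause them to miss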

Scale Estimation

For a large lender: 500k loan applications/year = ~1,370/day = ~1/minute on average. Each application involves 5-20 document uploads averaging 2MB each = 10-40MB per application = up to 20TB of documents/year. Credit bureau API calls: 2 pulls per application × 500k = 1M calls/year. Underwriting rule evaluations: 500k applications × 200 rules = 100M rule evaluations/year per underwriting pass. Concurrent loan officers: 10k at peak (morning hours) working their pipelines.
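The arithmetic above can be checked with a few lines of Python. This only restates the section's own assumptions (500k applications, up to 20 documents of 2MB, 2 credit pulls, 200 rules) in decimal units; because the Underwriting Rules Engine described below runs up to three passes per application, the rule-evaluation total is per pass.

```python
# Back-of-envelope figures, restating the scale-estimation assumptions.
APPS_PER_YEAR = 500_000
DOCS_PER_APP_MAX = 20
DOC_SIZE_MB = 2
CREDIT_PULLS_PER_APP = 2
RULES_PER_PASS = 200

apps_per_day = APPS_PER_YEAR / 365
apps_per_minute = apps_per_day / (24 * 60)
max_mb_per_app = DOCS_PER_APP_MAX * DOC_SIZE_MB
max_tb_per_year = APPS_PER_YEAR * max_mb_per_app / 1_000_000  # decimal: 1 TB = 10^6 MB
credit_calls_per_year = APPS_PER_YEAR * CREDIT_PULLS_PER_APP
rule_evals_per_year = APPS_PER_YEAR * RULES_PER_PASS          # one underwriting pass

print(f"{apps_per_day:,.0f} applications/day, {apps_per_minute:.2f}/minute")
print(f"up to {max_mb_per_app} MB/application, up to {max_tb_per_year:.0f} TB/year")
print(f"{credit_calls_per_year:,} credit pulls/year, {rule_evals_per_year:,} rule evaluations/year")
```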

High-Level Architecture

The platform is organized around an Application Service, a Document Processing Pipeline, an Underwriting Engine, and a Loan Management System. The Application Service handles the borrower-facing intake flow — a multi-step wizard that auto-saves progress and validates inputs in real time. Completed applications are submitted to the Document Processing Pipeline, which extracts data from uploaded documents and feeds the Underwriting Engine.

Document processing is async and CPU-intensive. An OCR and data extraction worker fleet (using AWS Textract for financial documents) processes uploads from an SQS queue. Extracted data (income, assets, liabilities) is reconciled against applicant-stated values and discrepancies are flagged for loan officer review. A confidence score per extracted field determines whether human review is required.
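A minimal sketch of the reconcile-and-route step, assuming each extracted field arrives with a 0-1 confidence score and maps to one applicant-stated value. The thresholds, field names, and reason labels (`CONFIDENCE_THRESHOLD`, `DISCREPANCY_TOLERANCE`, `LOW_CONFIDENCE`, and so on) are hypothetical; the design does not specify them.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.90   # hypothetical: below this, a human checks the value
DISCREPANCY_TOLERANCE = 0.05  # hypothetical: >5% relative difference is flagged

@dataclass(frozen=True)
class ExtractedField:
    name: str          # application field the value maps to, e.g. "monthly_gross_income"
    value: float       # value read from the document
    confidence: float  # 0.0-1.0 score from the extraction step

@dataclass(frozen=True)
class FieldResult:
    name: str
    needs_review: bool
    reasons: tuple[str, ...]

def reconcile(field: ExtractedField, stated: dict[str, float]) -> FieldResult:
    """Compare one extracted value with the applicant-stated value and decide routing."""
    reasons = []
    if field.confidence < CONFIDENCE_THRESHOLD:
        reasons.append("LOW_CONFIDENCE")
    stated_value = stated.get(field.name)
    if stated_value is None:
        reasons.append("NO_STATED_VALUE")
    elif stated_value == 0:
        if field.value != 0:
            reasons.append("DISCREPANCY")
    elif abs(field.value - stated_value) / abs(stated_value) > DISCREPANCY_TOLERANCE:
        reasons.append("DISCREPANCY")
    return FieldResult(field.name, bool(reasons), tuple(reasons))
```

With a stated monthly income of 8000, an extracted 7950 at confidence 0.97 passes straight through, while an extracted 6000 is flagged as a discrepancy for loan officer review.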

The Underwriting Engine applies FNMA/FHLMC guidelines and lender overlays as a rule set. Rules include DTI (debt-to-income) ratio checks, LTV (loan-to-value) checks, credit score thresholds, and property appraisal requirements. The engine is deterministic and fully auditable: each evaluation stores the input data snapshot, rules version, and decision with reason codes. ECOA-required adverse action notices are auto-generated from the decision record when applications are denied.
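The DTI and LTV checks reduce to simple ratios: DTI is total monthly debt payments (including the proposed housing payment) divided by gross monthly income, and LTV is the loan amount divided by the property value (the `property_value` field in the schema below). The sketch shows how a deterministic evaluation can emit the decision record described above; the thresholds, reason codes, and `RULES_VERSION` label are hypothetical placeholders, not agency figures.

```python
import json
from dataclasses import dataclass

RULES_VERSION = "demo-2025.01"  # hypothetical version label

# Hypothetical thresholds; real values come from agency guidelines plus lender overlays.
MAX_DTI = 0.45
MAX_LTV = 0.95
MIN_CREDIT_SCORE = 620

@dataclass(frozen=True)
class Decision:
    rules_version: str
    input_snapshot: str            # canonical JSON of the exact inputs evaluated
    decision: str                  # "APPROVE" or "DENY"
    reason_codes: tuple[str, ...]  # feed adverse action notices on denial

def evaluate(inputs: dict) -> Decision:
    # monthly_debt is assumed to already include the proposed housing payment.
    dti = inputs["monthly_debt"] / inputs["monthly_income"]
    ltv = inputs["loan_amount"] / inputs["property_value"]
    reasons = []
    if dti > MAX_DTI:
        reasons.append("DTI_EXCEEDS_MAX")
    if ltv > MAX_LTV:
        reasons.append("LTV_EXCEEDS_MAX")
    if inputs["credit_score"] < MIN_CREDIT_SCORE:
        reasons.append("CREDIT_SCORE_BELOW_MIN")
    return Decision(
        rules_version=RULES_VERSION,
        input_snapshot=json.dumps(inputs, sort_keys=True),
        decision="DENY" if reasons else "APPROVE",
        reason_codes=tuple(reasons),
    )
```

Because the inputs are serialized with sorted keys and the thresholds are fixed per rules version, evaluating the same inputs twice yields an identical record, which is what makes the decision replayable for audit.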

Core Components

Application Service & Progressive Disclosure Engine

A form workflow engine driven by a JSON schema defining multi-step application sections (personal info, income, assets, liabilities, property) with conditional logic (self-employed flows show additional sections). Each section auto-saves to an application_drafts store on blur/section-exit. The wizard resumes from the last completed section on return visits. The service also tracks TRID timing: once the six data points that make up a TRID "application" are captured (name, income, Social Security number, property address, estimated property value, and loan amount sought), it records the receipt date and emits a compliance workflow event, because the Loan Estimate must be delivered or placed in the mail no later than the third business day after receipt.
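A minimal sketch of the conditional-section and resume logic, assuming a simplified schema in which a section may carry a `show_if` condition on one earlier answer. The section ids mirror the prose; the schema shape and the function names are hypothetical, not a published format.

```python
from typing import Optional

# Hypothetical, simplified schema: section ids and the "show_if" shape are assumptions.
SCHEMA = [
    {"id": "personal_info"},
    {"id": "income"},
    {"id": "self_employment",
     "show_if": {"field": "employment_type", "equals": "self_employed"}},
    {"id": "assets"},
    {"id": "liabilities"},
    {"id": "property"},
]

def visible_sections(answers: dict) -> list[str]:
    """Return the section ids that apply, given the answers saved so far."""
    visible = []
    for section in SCHEMA:
        cond = section.get("show_if")
        if cond is None or answers.get(cond["field"]) == cond["equals"]:
            visible.append(section["id"])
    return visible

def resume_point(answers: dict, completed: set[str]) -> Optional[str]:
    """First applicable section not yet completed, or None when the wizard is done."""
    for section_id in visible_sections(answers):
        if section_id not in completed:
            return section_id
    return None
```

Because visibility is recomputed from the saved answers, changing `employment_type` later adds or removes the self-employment section, which is the invalidation risk noted under Key Trade-offs.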

Document Processing Pipeline

An SQS-backed worker fleet processing uploaded documents. Each document goes through: virus scan → format validation → OCR extraction (Textract for paystubs and bank statements, custom models for complex documents) → structured data extraction (income figures, account balances) → confidence scoring → match-to-application-field. Low-confidence extractions are queued for loan officer review via the task management interface. Processed document metadata and extracted data are stored in PostgreSQL; raw files in S3 with SSE-KMS encryption.
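The stage sequence can be sketched as a list of functions that each either return an enriched document or raise to stop processing. Everything here is illustrative: the stubs stand in for the virus scanner and the OCR service, structured extraction is folded into the OCR stub, the final match-to-application-field step is omitted, and the status names, allowed formats, weakest-field confidence rule, and 0.9 review threshold are assumptions. In production each document would arrive as an SQS message rather than a dict.

```python
from typing import Callable

class StageFailure(Exception):
    """Raised by a stage to stop processing; carries the status to record."""

    def __init__(self, status: str) -> None:
        super().__init__(status)
        self.status = status

Stage = Callable[[dict], dict]

ALLOWED_TYPES = {"application/pdf", "image/jpeg", "image/png"}  # assumed whitelist

def virus_scan(doc: dict) -> dict:
    # Stub: a real stage would hand the file to an antivirus engine.
    if doc.get("infected"):
        raise StageFailure("REJECTED_MALWARE")
    return doc

def validate_format(doc: dict) -> dict:
    if doc.get("content_type") not in ALLOWED_TYPES:
        raise StageFailure("REJECTED_FORMAT")
    return doc

def ocr_extract(doc: dict) -> dict:
    # Stub standing in for an OCR/extraction call (e.g. Amazon Textract);
    # here the "extracted" fields are supplied by the caller.
    return {**doc, "fields": doc.get("fake_ocr_fields", {})}

def score_confidence(doc: dict) -> dict:
    # Document confidence = weakest field; no fields at all counts as zero.
    scores = [field["confidence"] for field in doc["fields"].values()]
    return {**doc, "confidence_score": min(scores) if scores else 0.0}

PIPELINE: list[Stage] = [virus_scan, validate_format, ocr_extract, score_confidence]

def process(doc: dict, review_threshold: float = 0.9) -> dict:
    """Run the stages in order, short-circuiting on the first failure."""
    try:
        for stage in PIPELINE:
            doc = stage(doc)
    except StageFailure as failure:
        return {**doc, "processing_status": failure.status}
    status = "NEEDS_REVIEW" if doc["confidence_score"] < review_threshold else "PROCESSED"
    return {**doc, "processing_status": status}
```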

Underwriting Rules Engine

A forward-chaining rules engine evaluating loan eligibility. Rules are defined in a DSL by underwriting analysts and compiled to executable artifacts. The engine is invoked multiple times per application: initial pre-qualification (soft rules, applicant-stated data), full underwriting (hard credit pull, verified income), and pre-closing (final check after appraisal). Each evaluation is immutable: the inputs, rules version, and outputs are stored in underwriting_decisions. Reason codes map to the ECOA-required language for adverse action notices and to the denial reasons reported under HMDA.
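A minimal forward-chaining sketch: rules fire when their conditions hold over the current facts, derived facts can enable further rules, and evaluation stops at a fixpoint. This is a generic illustration of the technique with hypothetical rule names and a hypothetical `MAX_DTI`; it is not the DSL or compiler the section describes.

```python
from dataclasses import dataclass
from typing import Callable

MAX_DTI = 0.45  # hypothetical threshold, not an agency figure

@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[dict], bool]  # condition over the current facts
    then: Callable[[dict], dict]  # facts to assert when the rule fires

def forward_chain(facts: dict, rules: list[Rule]) -> tuple[dict, list[str]]:
    """Fire rules until no unfired rule's condition holds (a fixpoint)."""
    facts = dict(facts)  # work on a copy so the caller's input snapshot stays intact
    fired: list[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.name not in fired and rule.when(facts):
                facts.update(rule.then(facts))
                fired.append(rule.name)
                changed = True
    return facts, fired

def check_dti(facts: dict) -> dict:
    codes = facts.get("reason_codes", ())
    if facts["dti"] > MAX_DTI:
        codes = codes + ("DTI_EXCEEDS_MAX",)
    return {"dti_checked": True, "reason_codes": codes}

# Each condition names the facts it depends on, so for this rule set the
# outcome does not depend on the order of the list.
RULES = [
    Rule("compute_dti", lambda f: "dti" not in f,
         lambda f: {"dti": f["monthly_debt"] / f["monthly_income"]}),
    Rule("check_dti", lambda f: "dti" in f and "dti_checked" not in f, check_dti),
    Rule("decide", lambda f: "dti_checked" in f and "decision" not in f,
         lambda f: {"decision": "DENY" if f["reason_codes"] else "APPROVE"}),
]
```

Running it on `{"monthly_debt": 5000, "monthly_income": 10000}` derives `dti = 0.5`, adds the `DTI_EXCEEDS_MAX` reason code, and ends with `DENY`; the `fired` list doubles as a trace for the audit record.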

Database Design

Applications: applications (application_id UUID, borrower_id, co_borrower_id, property_address, loan_purpose ENUM, loan_amount, property_value, status ENUM, loan_officer_id, created_at, locked_rate, rate_lock_expiry). An application_fields JSONB column stores all borrower-entered data to accommodate schema evolution without migrations.

Documents: documents (doc_id, application_id, doc_type ENUM, s3_key, uploaded_at, processing_status ENUM, extracted_data JSONB, confidence_score, reviewed_by).

Underwriting: underwriting_decisions (decision_id, application_id, stage ENUM, rules_version, input_snapshot JSONB, decision ENUM, reason_codes[], created_at).

Communications: loan_messages (message_id, application_id, sender_id, recipient_id, body, sent_at, read_at). All lender-borrower communications are logged here for compliance.
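One way to enforce the "each evaluation is immutable" property at the storage layer is to make underwriting_decisions append-only. The sketch uses SQLite from the Python standard library as a stand-in for PostgreSQL, so JSONB and array columns become JSON text and the ENUMs become CHECK constraints; the specific stage and decision values and the `record_decision` helper are assumptions.

```python
import json
import sqlite3

# SQLite stand-in for the PostgreSQL table; enum values are hypothetical.
DDL = """
CREATE TABLE underwriting_decisions (
    decision_id    TEXT PRIMARY KEY,
    application_id TEXT NOT NULL,
    stage          TEXT NOT NULL CHECK (stage IN ('PRE_QUAL', 'FULL', 'PRE_CLOSING')),
    rules_version  TEXT NOT NULL,
    input_snapshot TEXT NOT NULL,   -- JSON
    decision       TEXT NOT NULL CHECK (decision IN ('APPROVE', 'DENY', 'REFER')),
    reason_codes   TEXT NOT NULL,   -- JSON array
    created_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: reject any change to or removal of an existing decision row.
CREATE TRIGGER underwriting_decisions_no_update
BEFORE UPDATE ON underwriting_decisions
BEGIN
    SELECT RAISE(ABORT, 'underwriting_decisions is append-only');
END;
CREATE TRIGGER underwriting_decisions_no_delete
BEFORE DELETE ON underwriting_decisions
BEGIN
    SELECT RAISE(ABORT, 'underwriting_decisions is append-only');
END;
"""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn

def record_decision(conn: sqlite3.Connection, decision_id: str, application_id: str,
                    stage: str, rules_version: str, inputs: dict,
                    decision: str, reason_codes: list[str]) -> None:
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO underwriting_decisions (decision_id, application_id, stage, "
            "rules_version, input_snapshot, decision, reason_codes) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (decision_id, application_id, stage, rules_version,
             json.dumps(inputs, sort_keys=True), decision, json.dumps(reason_codes)),
        )
```

In PostgreSQL the same guarantee could come from an equivalent trigger, or from granting the application role INSERT but not UPDATE or DELETE on the table.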

API Design

POST /api/v1/applications: creates a new loan application; once the TRID application data points are present, it records the receipt date and starts the 3-business-day Loan Estimate clock (compliance trigger).

POST /api/v1/applications/{appId}/documents: uploads a document; returns {doc_id, processing_status}; asynchronous processing is queued immediately.

GET /api/v1/applications/{appId}/underwriting: returns current underwriting status, required conditions, and any adverse action information.

POST /api/v1/applications/{appId}/messages: sends a message within the loan communication portal; logged for compliance.
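The Loan Estimate deadline triggered by POST /api/v1/applications is the third business day after the creditor receives the application. For this delivery rule Regulation Z's general definition applies (a day the creditor's offices are open to the public for substantially all business functions), so the sketch below assumes a Monday to Friday schedule minus a lender-supplied holiday set; `loan_estimate_due` is a hypothetical helper.

```python
from datetime import date, timedelta

def loan_estimate_due(received: date, holidays: frozenset[date] = frozenset()) -> date:
    """Third business day after `received`.

    Assumption: business days are Monday-Friday excluding `holidays`; a lender
    open on Saturdays would need a different calendar.
    """
    day = received
    counted = 0
    while counted < 3:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:  # weekday(): Mon=0 .. Sun=6
            counted += 1
    return day
```

For an application received Wednesday, January 15, 2025, the due date is Monday, January 20; if the lender is closed that Monday for Martin Luther King Jr. Day, it moves to Tuesday, January 21.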

Scaling & Bottlenecks

Document processing is CPU-bound and bursty. The SQS worker fleet auto-scales based on queue depth. AWS Textract has regional rate limits; the platform uses multiple AWS accounts across regions to maximize throughput. High-priority documents (applications whose rate locks are about to expire) are placed in a priority queue with a dedicated worker pool. A circuit breaker detects Textract throttling and routes affected documents to the human review queue until the service recovers.
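A minimal circuit-breaker sketch for the Textract fallback, assuming the OCR client raises a distinguishable throttling exception. `ThrottledError`, the thresholds, and the injectable clock are hypothetical; in the scenario above, `primary` would call the OCR service and `fallback` would enqueue the document for human review. It is single-threaded for clarity.

```python
import time
from typing import Callable, Optional

class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error an OCR client would raise."""

class CircuitBreaker:
    """Opens after `failure_threshold` consecutive throttling errors. While open,
    work goes straight to the fallback; after `reset_after` seconds a trial call
    is let through (half-open), and one more failure re-opens the breaker."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable, fallback: Callable, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback(*args)  # open: do not touch the throttled service
            self.opened_at = None       # half-open: let this call try the primary
        try:
            result = primary(*args)
        except ThrottledError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback(*args)
        self.failures = 0               # success closes the breaker fully
        return result
```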

The underwriting engine is stateless and parallelizable. Rule evaluation for a single application takes 50-200ms depending on complexity. Horizontal scaling of the engine service handles burst demand at quarter-end (high loan application volume). Rules deployment requires zero-downtime rolling updates with version pinning — in-flight applications must complete evaluation on the same rules version they started with.
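Version pinning can be sketched as a registry that keeps every deployed rules version and records which version each in-flight application started on. This in-memory version is only an illustration; a real deployment would persist the pin (for example alongside the rules_version already stored in underwriting_decisions), and `RulesRegistry` and its methods are hypothetical names.

```python
from typing import Optional

class RulesRegistry:
    """In-memory sketch: every deployed rules version is kept, and each application
    is pinned to the version that was current when it was first evaluated."""

    def __init__(self) -> None:
        self._versions: dict[str, dict] = {}
        self._pins: dict[str, str] = {}
        self._current: Optional[str] = None

    def deploy(self, version: str, rules: dict) -> None:
        if version in self._versions:
            raise ValueError(f"rules version {version!r} already exists; versions are immutable")
        self._versions[version] = rules
        self._current = version  # affects only applications not yet pinned

    def rules_for(self, application_id: str) -> tuple[str, dict]:
        if self._current is None:
            raise RuntimeError("no rules version deployed")
        version = self._pins.setdefault(application_id, self._current)
        return version, self._versions[version]

    def release(self, application_id: str) -> None:
        """Unpin an application once it is no longer in flight."""
        self._pins.pop(application_id, None)
```

Old versions must stay loadable for as long as any application is pinned to them, which is why the rolling update adds versions rather than replacing them.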

Key Trade-offs

  • Automated vs. human underwriting: Full automation maximizes throughput and consistency but cannot handle edge cases well; a hybrid model routes clear pass/fail applications to automation and edge cases to human underwriters, balancing speed and accuracy.
  • Progressive disclosure vs. upfront full application: Progressive disclosure improves completion rates but means data collected early may be invalidated by later answers (income type affects which documents are required); full upfront collection is more efficient but has higher abandonment.
  • Field-level vs. full-record encryption: Field-level encryption of SSNs and bank account numbers protects the highest-risk data with minimal performance impact and leaves the other columns queryable; encrypting the full record at the application layer protects every field but makes SQL queries on any of them impossible without decryption.
  • Rules DSL vs. code: A DSL lets underwriting analysts update rules without engineering, critical for regulatory changes; but DSL tooling requires investment in validation, testing, and debugging infrastructure that pure-code rule systems already have.
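The hybrid routing described in the first trade-off can be sketched as a three-band rule on a risk score, assuming a score in [0, 1] where higher means riskier plus an optional list of failed hard rules. The band edges and route names are hypothetical; the point is that only the ambiguous middle band reaches a human underwriter.

```python
# Hypothetical band edges; the design does not specify them.
LOW_RISK_MAX = 0.20
HIGH_RISK_MIN = 0.70

def route(risk_score: float, failed_rules: tuple[str, ...] = ()) -> str:
    """Send clear passes and clear fails to automation, edge cases to a human."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if failed_rules or risk_score >= HIGH_RISK_MIN:
        return "AUTO_DENY"          # clear fail: adverse action notice built from reason codes
    if risk_score <= LOW_RISK_MAX:
        return "AUTO_APPROVE"       # clear pass
    return "HUMAN_UNDERWRITER"      # edge case
```

Widening the middle band trades throughput for accuracy: more applications get human attention, fewer borderline files are decided by automation alone.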
