System Design: Tax Filing Platform
Design a government tax filing platform handling millions of submissions during peak season, with real-time validation, fraud detection, and secure storage of sensitive financial records. Covers reliability, auditability, and compliance.
Requirements
Functional Requirements:
- Taxpayers submit annual returns with income, deductions, and supporting documents
- Real-time validation of form completeness and arithmetic before submission
- Integration with employer wage reporting APIs to pre-populate income fields
- Calculate refund or amount owed and initiate payment/disbursement
- Issue confirmation receipts with a legally binding submission timestamp
- Support amended returns, extensions, and installment payment plans
Non-Functional Requirements:
- Handle roughly 10M submissions per day during the final week before the filing deadline
- 99.95% availability; filing system downtime during deadline week is a legal issue
- All financial data encrypted at rest with AES-256; in transit with TLS 1.3
- Immutable audit trail for every state change in a submission record
- Accessibility: WCAG 2.1 AA compliance across all interfaces
Scale Estimation
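For a country with 150M taxpayers filing annually, assume ~60% file in the final 4 weeks: 90M returns over 28 days, or ~3.2M per day on average. A 3x spike in the final week gives ~9.6M/day, which is ~111 returns/second averaged over 24 hours; intra-day peaks will run higher, so capacity should be planned above that figure. Document storage: 5 attachments × 1MB per return × 90M returns = 450TB for the peak window alone (about 750TB if all 150M returns carry the same attachments). Validation API calls to employer wage services: ~500M calls/season.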
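The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: every input is one of the estimate's own assumptions, and the variable names are illustrative.

```python
# Back-of-envelope sizing; every input below is an assumption from the estimate.
TAXPAYERS = 150_000_000
LATE_SHARE_PCT = 60            # percent of taxpayers filing in the final 4 weeks
WINDOW_DAYS = 28
SPIKE_FACTOR = 3               # final-week multiplier over the 4-week daily average
ATTACHMENTS_PER_RETURN = 5
ATTACHMENT_MB = 1
SECONDS_PER_DAY = 86_400

late_returns = TAXPAYERS * LATE_SHARE_PCT // 100
avg_per_day = late_returns / WINDOW_DAYS
peak_per_day = avg_per_day * SPIKE_FACTOR
peak_per_second = peak_per_day / SECONDS_PER_DAY      # 24-hour average, not a burst peak
storage_tb = late_returns * ATTACHMENTS_PER_RETURN * ATTACHMENT_MB / 1_000_000

print(f"late returns:     {late_returns:,}")
print(f"average per day:  {avg_per_day:,.0f}")
print(f"peak per day:     {peak_per_day:,.0f}")
print(f"peak per second:  {peak_per_second:.0f}")
print(f"attachment store: {storage_tb:.0f} TB")
```

Changing SPIKE_FACTOR or LATE_SHARE_PCT shows quickly how sensitive the per-second figure is to the filing-behavior assumptions.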
High-Level Architecture
The platform is structured around a Submission Pipeline: an API Gateway receives submissions, a Validation Service checks business rules synchronously, and a Processing Queue handles the asynchronous computation (tax calculation, fraud scoring, payment initiation). This decoupling lets the API tier respond quickly with an acknowledgment while heavy computation runs in the background.
Third-party integrations (employer wage APIs, bank verification, payment processors) are wrapped in an Integration Hub with circuit breakers and retry queues. When an upstream employer API is slow during peak season, submissions queue up rather than timing out. Pre-population of returns is done in a separate pre-filing window: the system calls employer APIs in batch during off-peak hours and stores prefill data in a staging area for taxpayers to review.
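A minimal sketch of the circuit-breaker behavior described above, assuming a simple consecutive-failure threshold and a fixed cool-down. The class name, thresholds, and the injectable clock are illustrative choices, not part of the design as stated.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the upstream call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None   # None means the circuit is closed

    def call(self, fn: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Fail fast; the caller can park the request on a retry queue.
                raise CircuitOpenError("upstream marked unhealthy")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()    # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller in the Integration Hub would catch CircuitOpenError and enqueue the request for later instead of letting the taxpayer-facing request wait on a slow employer API.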
A Fraud Detection Service consumes every submission event from Kafka and runs rule-based and ML-based checks asynchronously. Flagged returns are routed to a Review Queue for auditor attention rather than blocking the filing confirmation. Audit investigators access submissions through a separate internal portal with its own access control tier.
Core Components
Submission & Validation Service
Handles the synchronous leg of the filing flow. Validates schema conformance, required fields, arithmetic consistency (income - deductions = taxable income), and cross-field business rules. Returns structured validation errors in a single response, not one error at a time. Once validation passes, assigns a unique submission ID, writes to the database with status PENDING, and enqueues a processing message. The response to the taxpayer includes the submission ID and a timestamped receipt hash.
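The validate-then-acknowledge step could look roughly like the sketch below. It collects every error before returning, applies the arithmetic rule as stated (income - deductions = taxable income), and derives a receipt hash with SHA-256. The field names, error codes, and the exact inputs to the hash are assumptions made for illustration.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# Hypothetical form fields; a real return has many more.
REQUIRED_FIELDS = ("taxpayer_id", "tax_year", "income", "deductions", "taxable_income")
AMOUNT_FIELDS = ("income", "deductions", "taxable_income")


@dataclass(frozen=True)
class FieldError:
    field: str
    code: str
    message: str


def validate_return(form: dict) -> List[FieldError]:
    """Return all validation errors at once rather than stopping at the first."""
    errors: List[FieldError] = []
    for name in REQUIRED_FIELDS:
        if form.get(name) in (None, ""):
            errors.append(FieldError(name, "required", f"{name} is required"))
    amounts = {}
    for name in AMOUNT_FIELDS:
        raw = form.get(name)
        if raw in (None, ""):
            continue
        try:
            amounts[name] = Decimal(str(raw))   # Decimal avoids float rounding on money
        except InvalidOperation:
            errors.append(FieldError(name, "not_a_number", f"{name} must be numeric"))
    if len(amounts) == len(AMOUNT_FIELDS):
        expected = amounts["income"] - amounts["deductions"]
        if amounts["taxable_income"] != expected:
            errors.append(FieldError("taxable_income", "arithmetic_mismatch",
                                     f"expected {expected}"))
    return errors


def issue_receipt(form: dict, now: Optional[datetime] = None) -> dict:
    """Build the acknowledgment; the hash binds the ID, timestamp, and payload."""
    submitted_at = (now or datetime.now(timezone.utc)).isoformat()
    submission_id = str(uuid.uuid4())
    canonical = json.dumps(form, sort_keys=True, separators=(",", ":"), default=str)
    digest = hashlib.sha256(
        f"{submission_id}|{submitted_at}|{canonical}".encode("utf-8")
    ).hexdigest()
    return {"submission_id": submission_id, "submitted_at": submitted_at,
            "receipt_hash": digest, "status": "PENDING"}
```

Serializing the form with sorted keys before hashing means the same payload always produces the same digest for a given ID and timestamp, which is what makes the receipt checkable later.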
Tax Calculation Engine
A stateless computation service consuming from the processing queue. Applies the tax year's rate tables, credit rules, and deduction limits to compute the liability or refund. Rate tables are loaded from a versioned configuration store (not hardcoded) so annual tax law changes are deployable without code changes. Results are written back to the submission record with status CALCULATED. For complex returns (business income, foreign assets), the engine routes to a specialized calculation worker pool.
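To illustrate why rate tables belong in configuration, here is a sketch of a marginal-bracket calculation driven by a per-year table. The brackets and rates are invented for the example and do not reflect any real tax law; in the design above they would be loaded from the versioned configuration store rather than defined inline.

```python
from decimal import Decimal

# Hypothetical rate table keyed by tax year: (bracket upper bound or None, marginal rate).
RATE_TABLES = {
    2024: [
        (Decimal("10000"), Decimal("0.10")),
        (Decimal("40000"), Decimal("0.20")),
        (None, Decimal("0.30")),            # top bracket has no upper bound
    ],
}


def compute_liability(taxable_income: Decimal, tax_year: int) -> Decimal:
    """Apply marginal rates bracket by bracket and round to cents."""
    tax = Decimal("0")
    lower = Decimal("0")
    for upper, rate in RATE_TABLES[tax_year]:
        if taxable_income <= lower:
            break
        top = taxable_income if upper is None else min(taxable_income, upper)
        tax += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return tax.quantize(Decimal("0.01"))


print(compute_liability(Decimal("50000"), 2024))
```

With these illustrative brackets, 50,000 of taxable income is taxed as 10,000 at 10%, 30,000 at 20%, and 10,000 at 30%. Supporting a new tax year means adding a table entry, with no change to compute_liability.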
Payment & Disbursement Service
For refunds: initiates ACH transfers via a banking API, polling for confirmation and updating status through PAYMENT_INITIATED → PAYMENT_CONFIRMED. For amounts owed: generates payment instructions with a unique payment reference, monitors the payment processing system for receipt confirmation, and issues a final settled receipt. All payment state transitions are logged to the audit trail. Failed payments trigger a retry workflow with exponential backoff and taxpayer notification.
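The payment status flow and retry timing can be sketched as a small transition table plus a capped exponential backoff. PAYMENT_FAILED is a hypothetical state added here so the retry path has somewhere to go; the base delay, cap, and optional jitter are likewise assumptions.

```python
import random
from typing import List, Optional, Tuple

# Allowed payment transitions. PAYMENT_FAILED is an assumed state, not named in the text.
ALLOWED_TRANSITIONS = {
    "CALCULATED": {"PAYMENT_INITIATED"},
    "PAYMENT_INITIATED": {"PAYMENT_CONFIRMED", "PAYMENT_FAILED"},
    "PAYMENT_FAILED": {"PAYMENT_INITIATED"},      # retry path
    "PAYMENT_CONFIRMED": set(),                   # terminal
}


def transition(current: str, target: str, audit_log: List[Tuple[str, str]]) -> str:
    """Validate a state change and record it; illegal moves raise ValueError."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal payment transition {current} -> {target}")
    audit_log.append((current, target))
    return target


def backoff_delay(attempt: int, base_s: float = 2.0, cap_s: float = 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based): base * 2^attempt, capped.

    Passing an rng applies full jitter, spreading retries over [0, ceiling].
    """
    ceiling = min(cap_s, base_s * 2 ** attempt)
    return ceiling if rng is None else rng.uniform(0.0, ceiling)


print([backoff_delay(a) for a in range(5)])
```

Jitter matters here because many refunds failing against the same banking API at once would otherwise all retry at the same instants.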
Database Design
Submission records are stored in PostgreSQL with the fields that drive the status state machine: submission_id, taxpayer_id, tax_year, status (ENUM), submitted_at, calculated_at, refund_amount, tax_owed. A companion append-only submission_history table records every status transition with its timestamp and actor, forming the immutable audit trail. Row-level security policies restrict access so that agents can read only the returns assigned to their region.
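The append-only history idea can be demonstrated with a runnable sketch. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so the example is self-contained; the trigger-based protection, the reduced column set, and the from_status/to_status column names are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE submissions (
    submission_id TEXT PRIMARY KEY,
    taxpayer_id   TEXT NOT NULL,
    tax_year      INTEGER NOT NULL,
    status        TEXT NOT NULL
);
CREATE TABLE submission_history (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    submission_id TEXT NOT NULL REFERENCES submissions(submission_id),
    from_status   TEXT,
    to_status     TEXT NOT NULL,
    actor         TEXT NOT NULL,
    changed_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Reject any attempt to rewrite or remove history rows.
CREATE TRIGGER history_no_update BEFORE UPDATE ON submission_history
BEGIN SELECT RAISE(ABORT, 'submission_history is append-only'); END;
CREATE TRIGGER history_no_delete BEFORE DELETE ON submission_history
BEGIN SELECT RAISE(ABORT, 'submission_history is append-only'); END;
"""


def change_status(conn: sqlite3.Connection, submission_id: str,
                  to_status: str, actor: str) -> None:
    """Update the current status and append the transition in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT status FROM submissions WHERE submission_id = ?",
            (submission_id,),
        ).fetchone()
        if row is None:
            raise KeyError(submission_id)
        conn.execute(
            "UPDATE submissions SET status = ? WHERE submission_id = ?",
            (to_status, submission_id),
        )
        conn.execute(
            "INSERT INTO submission_history (submission_id, from_status, to_status, actor) "
            "VALUES (?, ?, ?, ?)",
            (submission_id, row[0], to_status, actor),
        )
```

Writing the status update and the history row in the same transaction keeps the two from drifting apart. In PostgreSQL the same guarantee would more typically come from revoking UPDATE and DELETE privileges on the history table, possibly alongside triggers.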
Attachments (W-2s, 1099s, receipts) are stored in S3 using server-side encryption with customer-managed KMS keys (SSE-KMS). S3 Object Lock in compliance mode prevents deletion or overwrite for a configurable retention period (7 years for tax records). A relational index maps submission_id → [s3_keys]. PII fields (SSN, bank account) are tokenized by a format-preserving encryption service before storage, so stored values keep their original length and format without exposing the real identifiers.
API Design
POST /api/v1/returns — submits a new tax return; synchronous validation, async processing; returns {submission_id, receipt_hash, status: "PENDING"}.
GET /api/v1/returns/{submissionId} — returns current status, calculated amounts, and any review flags.
POST /api/v1/returns/{submissionId}/amend — initiates an amended return, locking the original from further amendment until resolved.
GET /api/v1/prefill/{taxYear} — returns pre-populated income data from employer integrations for the authenticated taxpayer.
Scaling & Bottlenecks
The filing deadline creates the most predictable traffic spike in government systems. Auto-scaling policies for the submission tier are pre-configured to scale to 10x normal capacity starting 3 days before the deadline. Database connection pooling (PgBouncer) is critical: submission service instances should connect through the pooler rather than each holding direct connections to PostgreSQL, because a 10x scale-out of the service tier would otherwise exhaust the database's connection limit. Read replicas serve status polling traffic (most users check status repeatedly), keeping it off the primary write path.
The validation service is CPU-bound when validating complex returns. A separate high-CPU compute tier handles these, while simple returns (W-2 only) are routed to a lightweight validation pool. The integration hub's employer API calls are the main source of latency; batch-fetching during off-peak hours and caching the results removes most of the real-time dependency on third-party APIs during the deadline rush.
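The routing between validation pools can be as simple as the sketch below, assuming each return declares the set of income source types it contains. The source labels and pool names are hypothetical.

```python
from typing import Set

# Hypothetical labels: a return made up only of these sources counts as "simple".
SIMPLE_SOURCES = frozenset({"w2"})


def pick_validation_pool(income_sources: Set[str]) -> str:
    """Route W-2-only returns to the lightweight pool, everything else to high-CPU."""
    is_simple = bool(income_sources) and income_sources <= SIMPLE_SOURCES
    return "lightweight" if is_simple else "high_cpu"


print(pick_validation_pool({"w2"}))
print(pick_validation_pool({"w2", "business_income"}))
```

Returns with no declared sources fall through to the high-CPU pool, on the assumption that an unclassifiable return is safer to treat as complex.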
Key Trade-offs
- Synchronous vs. asynchronous processing: Fully synchronous processing gives taxpayers immediate results but blocks during computation; the hybrid approach (sync validation, async calculation) gives fast acknowledgment with a polling model for results.
- Pre-population accuracy vs. staleness: Prefilling from employer data reduces errors, but data fetched days earlier may miss late-amended W-2s; refreshing at submission time detects discrepancies.
- Audit trail granularity: Logging every field change provides maximum auditability but at significant storage cost; logging status transitions and full snapshots at key milestones is a practical middle ground.
- Monolith vs. microservices: A monolith simplifies the transaction model for submission + calculation + payment; microservices allow independent scaling of the calculation tier during peak but introduce distributed transaction complexity.
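For the pre-population trade-off, the refresh-on-submission check amounts to comparing the staged prefill against freshly fetched employer data. A minimal sketch, assuming wages are keyed by a hypothetical employer ID and compared with an optional tolerance:

```python
from decimal import Decimal
from typing import Dict, Optional, Tuple

Diff = Tuple[Optional[Decimal], Optional[Decimal]]


def prefill_discrepancies(staged: Dict[str, Decimal], fresh: Dict[str, Decimal],
                          tolerance: Decimal = Decimal("0")) -> Dict[str, Diff]:
    """Return {employer_id: (staged_wages, fresh_wages)} for every mismatch.

    An employer present on only one side is reported with None on the other.
    """
    diffs: Dict[str, Diff] = {}
    for employer_id in staged.keys() | fresh.keys():
        old = staged.get(employer_id)
        new = fresh.get(employer_id)
        if old is None or new is None or abs(old - new) > tolerance:
            diffs[employer_id] = (old, new)
    return diffs


staged = {"E1": Decimal("50000"), "E2": Decimal("1000")}
fresh = {"E1": Decimal("52000"), "E2": Decimal("1000"), "E3": Decimal("300")}
print(prefill_discrepancies(staged, fresh))
```

An empty result means the staged prefill is still current; any entry would prompt the taxpayer to review the changed figure before the return is accepted.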