
System Design: Employee Onboarding Platform

System design of an employee onboarding platform covering task workflow automation, document collection, training assignment, IT provisioning integration, and compliance tracking for new hires.

15 min read · Updated Jan 15, 2025
Tags: system-design, employee-onboarding, workflow-automation, hr-tech

Requirements

Functional Requirements:

  • HR creates onboarding workflows with customizable task checklists (role-based templates for engineering, sales, marketing, etc.)
  • New hires complete pre-boarding tasks before day one: tax forms (W-4, I-9), emergency contacts, direct deposit setup, equipment preferences
  • Automated IT provisioning: create email accounts, assign software licenses, provision laptop, grant access to required systems
  • Training and orientation scheduling: assign required training modules, schedule orientation sessions, track completion
  • Buddy/mentor assignment with introduction workflows
  • Progress dashboard for HR, managers, and new hires showing onboarding completion status

Non-Functional Requirements:

  • Support 10K companies onboarding 500K new hires/month
  • Task completion updates reflected across all dashboards within 5 seconds
  • 99.9% availability; onboarding delays create poor first impressions
  • Document storage with encryption and retention policies (under federal law, I-9 forms are retained for 3 years after the hire date or 1 year after termination, whichever is later)
  • Integration with 50+ third-party systems (HRIS, identity providers, IT asset management, LMS)

Scale Estimation

  • New hires: 500K/month ≈ 16,700/day ≈ 0.19/sec
  • Tasks created: 30-50 tasks per onboarding workflow = 15M-25M tasks/month
  • Task completions: assuming tasks are completed over 30 days, 20M completions/month ≈ 7.7/sec
  • Document uploads: 500K hires × 5 documents each = 2.5M documents/month
  • IT provisioning API calls: 500K hires × 10 system provisioning actions = 5M API calls/month
  • Email/notification volume: 500K hires × 20 notifications = 10M/month
  • Storage: 2.5M documents × 500KB average = 1.25TB/month
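The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the 30-day month and decimal units (1 TB = 10^9 KB) are assumptions, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: a 30-day month

hires_per_month = 500_000
hires_per_day = hires_per_month / DAYS_PER_MONTH        # ≈ 16,700
hires_per_sec = hires_per_day / SECONDS_PER_DAY         # ≈ 0.19

tasks_per_month_low = hires_per_month * 30              # 15M
tasks_per_month_high = hires_per_month * 50             # 25M

completions_per_month = 20_000_000
completions_per_sec = completions_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)  # ≈ 7.7

documents_per_month = hires_per_month * 5               # 2.5M
provisioning_calls_per_month = hires_per_month * 10     # 5M
notifications_per_month = hires_per_month * 20          # 10M

# 500 KB per document, decimal units: 1 TB = 1e9 KB
storage_tb_per_month = documents_per_month * 500 / 1e9  # 1.25

print(f"{hires_per_sec:.2f} hires/sec, {completions_per_sec:.1f} completions/sec, "
      f"{storage_tb_per_month:.2f} TB/month")
```

The per-second rates are small; the interesting load is the fan-out per hire (tasks, notifications, provisioning calls), not the hire rate itself.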

High-Level Architecture

The platform uses a workflow engine at its core with an integration layer for third-party systems. The Workflow Engine manages onboarding workflows as DAGs of tasks with dependencies, assignments, and deadlines. When a new hire record is created (via HRIS integration or manual entry), the engine instantiates a workflow from the appropriate template (selected based on role, department, location, and employment type). Each task in the workflow has properties: assignee (new hire, manager, HR, IT, or auto), due date (relative to start date, e.g., "7 days before start"), required vs optional, dependencies (e.g., "equipment order" cannot start until "equipment preference" is complete).
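To make template instantiation concrete, here is a minimal sketch of how a template with relative due dates and dependencies might be turned into concrete tasks. The class names, field names, and the sample template are hypothetical, and the sketch checks only for unknown dependencies, not for cycles.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TaskTemplate:
    key: str
    assignee_type: str        # "new_hire", "manager", "hr", "it", or "auto"
    due_offset_days: int      # relative to start date; negative means before start
    required: bool = True
    depends_on: tuple = ()    # keys of tasks that must complete first


@dataclass
class Task:
    key: str
    assignee_type: str
    due_date: date
    required: bool
    depends_on: tuple
    status: str = "pending"


def instantiate(template, start_date):
    """Turn a list of TaskTemplate entries into concrete tasks keyed by task key."""
    keys = {t.key for t in template}
    tasks = {}
    for t in template:
        unknown = set(t.depends_on) - keys
        if unknown:
            raise ValueError(f"{t.key} depends on unknown tasks: {sorted(unknown)}")
        tasks[t.key] = Task(
            key=t.key,
            assignee_type=t.assignee_type,
            due_date=start_date + timedelta(days=t.due_offset_days),
            required=t.required,
            depends_on=t.depends_on,
        )
    return tasks


# Hypothetical two-task slice of an engineering template.
ENGINEERING_TEMPLATE = [
    TaskTemplate("equipment_preference", "new_hire", due_offset_days=-7),
    TaskTemplate("equipment_order", "it", due_offset_days=-5,
                 depends_on=("equipment_preference",)),
]

workflow = instantiate(ENGINEERING_TEMPLATE, start_date=date(2025, 3, 3))
```

Storing offsets rather than absolute dates in the template means a changed start date only requires re-running the instantiation step.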

The Integration Layer connects to external systems via a unified adapter framework. Adapters exist for identity providers (Okta, Azure AD) for account creation, IT asset management (JAMF, Intune) for device provisioning, LMS platforms (Cornerstone, Litmos) for training enrollment, e-signature services (DocuSign, HelloSign) for document signing, and HRIS systems for employee data sync. Each integration is modeled as a task in the workflow: when the task is triggered, the adapter executes the API call, handles errors/retries, and marks the task complete on success.
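One way to picture the unified adapter framework is a small common interface plus a registry keyed by integration type. The sketch below is an assumption about its shape, not a real vendor SDK: `IntegrationAdapter`, `AdapterResult`, and `FakeIdentityAdapter` are hypothetical names, and the fake adapter records calls instead of contacting an identity provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AdapterResult:
    ok: bool
    detail: str = ""


class IntegrationAdapter(ABC):
    """Common interface every third-party adapter implements (hypothetical)."""

    @abstractmethod
    def execute(self, hire, config):
        """Perform the external action and report success or failure."""


class FakeIdentityAdapter(IntegrationAdapter):
    """Stand-in for an identity-provider adapter; makes no network calls."""

    def __init__(self):
        self.created = []

    def execute(self, hire, config):
        self.created.append((hire["email"], tuple(config.get("groups", []))))
        return AdapterResult(ok=True, detail="account created")


REGISTRY = {"identity": FakeIdentityAdapter()}


def run_integration_task(task, hire):
    """Dispatch a workflow task to its adapter and return the updated task."""
    adapter = REGISTRY.get(task["integration_type"])
    if adapter is None:
        return {**task, "status": "blocked", "detail": "no adapter registered"}
    result = adapter.execute(hire, task["integration_config"])
    status = "completed" if result.ok else "blocked"
    return {**task, "status": status, "detail": result.detail}
```

Because the workflow engine only sees `execute`, adding a new vendor means registering one more adapter rather than changing the engine.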

The Notification Engine sends context-aware communications across channels (email, Slack, SMS). Notifications are triggered by workflow events: task assignments, approaching deadlines, task completions, and escalations. A preference engine determines the best channel per recipient and batches notifications to avoid overwhelming users (maximum 3 notifications per day per person, with digest mode for lower-priority updates).
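The daily cap with digest fallback could be sketched as follows. The rule that overflow beyond the cap is rolled into a digest, and that a lower number means higher priority, are assumptions made for illustration.

```python
from collections import defaultdict

DAILY_LIMIT = 3  # maximum individual notifications per person per day


def plan_day(notifications):
    """Split one day's notifications into immediate sends and digest items.

    notifications: iterable of (recipient, priority, message) tuples, where a
    lower priority number means more urgent (an assumption of this sketch).
    """
    immediate = defaultdict(list)
    digest = defaultdict(list)
    # sorted() is stable, so equal-priority items keep their arrival order.
    for recipient, _priority, message in sorted(notifications, key=lambda n: n[1]):
        if len(immediate[recipient]) < DAILY_LIMIT:
            immediate[recipient].append(message)
        else:
            digest[recipient].append(message)
    return dict(immediate), dict(digest)
```

For example, five notifications for one person on the same day yield three immediate sends (the most urgent ones) and a two-item digest.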

Core Components

Workflow Engine

The engine executes workflows as state machines. Each workflow instance tracks its tasks in a task table with statuses: pending, available (dependencies met), in_progress, completed, skipped, blocked. A scheduler runs every 30 seconds, evaluating all active workflows: it marks tasks as available when their dependencies are fulfilled, sends reminders for overdue tasks, and escalates to managers when tasks are more than 3 days overdue. The engine supports conditional branching: if the new hire is in California, add California-specific compliance tasks; if the role is engineering, add GitHub access provisioning. Conditions are expressed in a simple predicate language evaluated against the new hire's profile attributes.
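A single scheduler pass and the predicate evaluation might look roughly like this. The `(attribute, operator, value)` condition format and the operator names are hypothetical, and the sketch assumes that only `completed` dependencies unblock a task.

```python
import operator
from datetime import date

# Hypothetical operator set for the predicate language.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "in": lambda value, options: value in options,
}


def matches(condition, profile):
    """Evaluate one (attribute, op, expected) condition against a hire profile."""
    attribute, op, expected = condition
    return attribute in profile and OPS[op](profile[attribute], expected)


def conditional_tasks(rules, profile):
    """Return the extra tasks whose condition holds for this new hire."""
    return [task for condition, task in rules if matches(condition, profile)]


def promote_available(tasks):
    """Mark pending tasks as available once all their dependencies are completed."""
    completed = {tid for tid, t in tasks.items() if t["status"] == "completed"}
    promoted = []
    for tid, t in tasks.items():
        if t["status"] == "pending" and all(dep in completed for dep in t["depends_on"]):
            t["status"] = "available"
            promoted.append(tid)
    return promoted


def needs_escalation(task, today, grace_days=3):
    """True when an unfinished task is more than grace_days past its due date."""
    unfinished = task["status"] not in ("completed", "skipped")
    return unfinished and (today - task["due_date"]).days > grace_days
```

Note that `promote_available` advances one level of the DAG per pass: a task whose dependency was only just promoted becomes available after that dependency completes, on a later cycle.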

Document Collection & E-Signature

The document collection flow uses a portal where new hires complete forms and upload documents. Tax forms (W-4) are rendered as interactive web forms; the data is validated client-side and server-side, then submitted to the payroll system. I-9 verification uses a two-step process: the employee completes Section 1 remotely, no later than the first day of employment, then the employer examines identity documents on day one (in person, or over live video for remote employees where the employer is eligible for remote examination). Documents requiring signatures are routed through DocuSign or HelloSign via API; signed documents are stored in S3 with server-side encryption (AES-256) and tagged with retention policies (I-9: the later of 3 years after hire or 1 year after termination; W-4: 4 years). A compliance checker validates that all required documents are collected before the new hire's start date.
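The retention tagging and the compliance check can be sketched in a few lines. This assumes the federal I-9 rule (the later of 3 years after the hire date or 1 year after termination); the `signature_status` values and function names are hypothetical.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Mar 1 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=3, day=1)


def i9_retention_until(hire_date, termination_date):
    """Later of 3 years after hire or 1 year after termination."""
    return max(add_years(hire_date, 3), add_years(termination_date, 1))


def missing_documents(required_types, collected):
    """Return required doc types that are not yet collected and finalized.

    collected: list of dicts with "doc_type" and "signature_status"; the
    status values "signed" and "not_required" are assumptions of this sketch.
    """
    have = {d["doc_type"] for d in collected
            if d["signature_status"] in ("signed", "not_required")}
    return sorted(set(required_types) - have)
```

Since the I-9 date depends on termination, `retention_until` for that document would be set (or recomputed) when the employee leaves rather than at upload time.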

IT Provisioning Orchestrator

IT provisioning is the most integration-heavy component. When a new hire's onboarding workflow reaches the IT tasks, the orchestrator executes provisioning steps in dependency order: (1) Create identity (Okta/Azure AD) → (2) Create email (Google Workspace/Microsoft 365) → (3) Assign software licenses (Slack, Zoom, Jira) → (4) Grant access to internal systems (VPN, wiki, code repos) → (5) Order and ship equipment, with the device enrolled in MDM via JAMF/Intune. Each step is an idempotent API call with retry logic (exponential backoff, max 5 retries). A step that still fails after the final retry is logged and falls back to manual IT ticket creation. A deprovision workflow (triggered by termination) reverses all provisioning steps within 24 hours.
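The retry-then-fallback behaviour can be sketched as below. The function names and the `completed` set are hypothetical: the set stands in for persisted step state so that a re-run skips finished steps, while true idempotency would also need support from each external API. The `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time


def run_with_retry(step, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds, backing off 1s, 2s, 4s, ... between tries."""
    for attempt in range(max_retries + 1):  # one initial try plus max_retries
        try:
            return step()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)


def provision(hire_id, steps, completed, open_ticket, sleep=time.sleep):
    """Run provisioning steps in order; fall back to a manual ticket on failure.

    steps: ordered list of (name, callable taking hire_id).
    completed: set of (hire_id, name) pairs already done, so re-runs skip them.
    open_ticket: callable(hire_id, step_name, error_text) for the manual fallback.
    """
    for name, step in steps:
        if (hire_id, name) in completed:
            continue
        try:
            run_with_retry(lambda: step(hire_id), sleep=sleep)
        except Exception as exc:
            open_ticket(hire_id, name, str(exc))
            return False  # later steps depend on this one, so stop here
        completed.add((hire_id, name))
    return True
```

Stopping at the first failed step reflects the dependency order: there is no point creating an email account for an identity that does not exist yet.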

Database Design

PostgreSQL schema:

  • new_hires (hire_id UUID PK, company_id, employee_id, name, email, role, department, location, start_date, manager_id, buddy_id, status ENUM(pre_boarding, day_one, in_progress, completed), created_at)
  • workflows (workflow_id, hire_id, template_id, status, started_at, completed_at)
  • tasks (task_id, workflow_id, name, description, assignee_type ENUM(new_hire, manager, hr, it, auto), assignee_id, status, due_date, completed_at, dependency_task_ids ARRAY, integration_type nullable, integration_config JSONB)
  • documents (doc_id, hire_id, doc_type ENUM(w4, i9, direct_deposit, nda, offer_letter), s3_path, signature_status, retention_until DATE, uploaded_at)

Indexes: (company_id, status) on new_hires for company dashboards, (assignee_id, status) on tasks for "my pending tasks", and (hire_id) on workflows and documents for individual onboarding views. The integration_config JSONB on tasks stores adapter-specific parameters (e.g., for Okta provisioning: {"provider": "okta", "groups": ["engineering", "all-staff"], "apps": ["slack", "github"]}). A task_events table (event_id, task_id, event_type, actor_id, metadata JSONB, timestamp) provides an audit trail of all task state changes.
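As an illustration of the audit trail, the sketch below appends a task_events row for every status change, using the columns listed above. The in-memory list, the `status_changed` event type, and the metadata keys are assumptions; in PostgreSQL the event insert and the task update would normally share one transaction.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TaskEvent:
    event_id: str
    task_id: str
    event_type: str
    actor_id: str
    metadata: dict
    timestamp: datetime


def record_status_change(log, task, new_status, actor_id):
    """Append an audit event, then apply the status change to the task."""
    event = TaskEvent(
        event_id=str(uuid.uuid4()),
        task_id=task["task_id"],
        event_type="status_changed",  # hypothetical event type name
        actor_id=actor_id,
        metadata={"from": task["status"], "to": new_status},
        timestamp=datetime.now(timezone.utc),
    )
    log.append(event)
    task["status"] = new_status
    return event
```

Because events are only ever appended, the full history of a task can be reconstructed by reading its events in timestamp order.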

API Design

  • POST /api/v1/onboarding — Create an onboarding workflow for a new hire; body contains hire details and template_id; returns workflow_id
  • GET /api/v1/onboarding/{workflow_id}/tasks — Fetch all tasks in the onboarding workflow with statuses
  • PUT /api/v1/tasks/{task_id}/complete — Mark a task as complete; body contains completion data (uploaded document, form responses); triggers dependent task evaluation
  • GET /api/v1/dashboard?company_id={id}&status=in_progress — Fetch onboarding progress for all active new hires in a company
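The dashboard endpoint ultimately reduces each workflow's tasks to a completion figure. A minimal sketch of that aggregation follows; the `required` flag on each task and the rule that skipped tasks are excluded from the denominator are assumptions, since the response shape is not specified here.

```python
def progress_percent(tasks):
    """Percentage of required, non-skipped tasks that are completed."""
    counted = [t for t in tasks if t["required"] and t["status"] != "skipped"]
    if not counted:
        return 100
    done = sum(1 for t in counted if t["status"] == "completed")
    return round(100 * done / len(counted))


def dashboard_rows(workflows):
    """Build one summary row per workflow for a company dashboard.

    workflows: list of dicts with "hire_id" and "tasks" (hypothetical shape).
    """
    return [{"hire_id": wf["hire_id"], "progress": progress_percent(wf["tasks"])}
            for wf in workflows]
```

Counting only required tasks keeps optional items (for example a buddy coffee chat) from holding a hire below 100%.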

Scaling & Bottlenecks

The integration layer is the primary bottleneck due to third-party API rate limits and reliability. Suppose Okta's API allows a tenant 600 requests/minute (actual limits vary by endpoint and plan). Provisioning 1,000 new hires in a day requires 10,000 Okta API calls (10 operations per hire), which is about 17 minutes of the full budget. Spread evenly over a day that fits comfortably, but start dates tend to cluster and the limit is shared with the tenant's other Okta traffic, so a burst can exhaust it. The system therefore queues provisioning requests and processes them at a rate just below the API limit, using a token bucket rate limiter per integration per tenant. During a burst, provisioning for a single new hire may be delayed while the queue drains. A priority queue ensures day-one hires are provisioned first.
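A token bucket combined with a start-date priority queue might be sketched like this. The class names are hypothetical, priority is assumed to be "earliest start date first", and the clock is passed in explicitly (in production it would come from something like `time.monotonic()`) so the behaviour is deterministic. One bucket would exist per (tenant, integration) pair.

```python
import heapq


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = 0.0

    def try_acquire(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ProvisioningQueue:
    """Pops the request with the earliest start date first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps insertion order for equal dates

    def push(self, start_date, request):
        heapq.heappush(self._heap, (start_date, self._seq, request))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def drain(queue, bucket, now):
    """Send as many queued requests as the bucket currently allows."""
    sent = []
    while len(queue) and bucket.try_acquire(now):
        sent.append(queue.pop())
    return sent
```

Setting the bucket rate slightly under the vendor limit (for example 9 requests/second against a 10 requests/second budget) leaves headroom for the tenant's other traffic.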

The workflow scheduler, which evaluates all active workflows every 30 seconds, is the main compute cost: 500K active workflows × 40 tasks each = 20M task evaluations per cycle. The scheduler partitions workflows across 10 worker instances (by company_id hash), so each worker evaluates about 2M tasks in under 3 seconds using batch SQL queries (in pseudo-SQL: SELECT tasks WHERE status = 'pending' AND every dependency has status 'completed'). This avoids per-task database queries.
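Partitioning by company_id hash needs a hash that is stable across processes and restarts, which rules out Python's built-in `hash()` for strings because it is salted per process. A minimal sketch, with hypothetical function names and SHA-256 chosen only for stability:

```python
import hashlib
from collections import defaultdict


def worker_for(company_id, num_workers=10):
    """Map a company to a worker index using a process-independent hash."""
    digest = hashlib.sha256(str(company_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(workflows, num_workers=10):
    """Group workflow ids by the worker that owns their company."""
    shards = defaultdict(list)
    for wf in workflows:
        shards[worker_for(wf["company_id"], num_workers)].append(wf["workflow_id"])
    return dict(shards)
```

Keeping all of a company's workflows on one worker lets that worker issue one batch query per company. The trade-off is that changing the worker count reassigns most companies, and a few very large tenants can skew the load.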

Key Trade-offs

  • Template-based workflows vs fully custom: Templates with conditional branching cover 90% of use cases with minimal configuration, while fully custom workflow builders add flexibility at the cost of usability — the template approach reduces HR administrator burden
  • Automated IT provisioning vs manual ticketing: Automation reduces onboarding time from 3 days to 30 minutes for IT setup, but requires maintaining integrations with 50+ third-party APIs that change frequently — the ROI on automation is clear at scale
  • Pre-boarding before day one vs all on day one: Pre-boarding (tax forms, equipment preferences) reduces day-one information overload but requires the new hire to engage before employment officially starts — most hires appreciate the smoother day-one experience
  • Centralized workflow engine vs per-task microservices: A centralized engine simplifies dependency management and state tracking but creates a single point of failure — mitigated with active-passive failover and persistent task state in PostgreSQL
