System Design: Image Hosting Service (Imgur-style)

System design of an Imgur-style image hosting service covering image upload, thumbnail generation, CDN delivery, content moderation, and gallery curation for 300 million monthly visitors.

Requirements

Functional Requirements:

  • Users upload images (JPEG, PNG, GIF, WebP) and short videos (MP4 under 60 seconds) anonymously or to an account
  • Automatic generation of multiple thumbnail sizes (small 160px, medium 320px, large 640px, original)
  • Shareable short links and direct image URLs with hotlink support
  • Gallery/album creation grouping multiple images with titles and descriptions
  • Community voting (upvote/downvote) and comments on public gallery posts
  • NSFW content detection and tagging using ML-based moderation

Non-Functional Requirements:

  • 300 million monthly visitors, 10 million uploads per day
  • Image serving latency under 100ms from CDN edge (p95)
  • 99.95% availability for image serving; uploads can tolerate brief degradation
  • Images must be stored durably with 99.999999999% (11 nines) durability
  • Support images up to 20MB; animated GIFs up to 200MB (converted to MP4)

Scale Estimation

  • Uploads: 10 million/day ≈ 116 uploads/sec; at a 2MB average image size, ~20TB/day of raw uploads
  • Thumbnail generation: 4 sizes per image = 40M thumbnails/day
  • Storage after 5 years: 20TB/day × 1,825 days = 36.5PB (before compression and deduplication)
  • Read traffic: 300M monthly visitors × 20 images/visit = 6B image serves/month ≈ 2,315 images/sec average
  • CDN bandwidth: at a 500KB average served size, 1.16GB/sec = 9.26 Gbps sustained, ~30 Gbps peak
  • Community engagement: 50M votes/day, 5M comments/day

High-Level Architecture

The architecture separates the upload pipeline from the serving pipeline. The Upload Pipeline begins when a user selects an image: the client uploads directly to an S3-compatible object store via a pre-signed URL obtained from the Upload Service (sketched below), bypassing the application servers for large file transfers. Once S3 confirms the upload, an S3 event notification triggers a Processing Pipeline (AWS Lambda or a dedicated worker fleet), which:

  • Validates the file type and dimensions using libmagic and image header parsing (not just the file extension)
  • Strips EXIF metadata for privacy (GPS coordinates, camera serial numbers)
  • Generates thumbnails at 4 sizes using libvips (chosen over ImageMagick for 5-10x better performance and lower memory usage)
  • Converts animated GIFs to MP4 using FFmpeg (typically a 90% file size reduction)
  • Runs NSFW classification using a pre-trained CNN model (ResNet-based, fine-tuned on labeled content)
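
A minimal sketch of the Upload Service's pre-signed URL step, assuming boto3; the bucket name, key layout, and short-id generation are illustrative, not the platform's actual scheme:

```python
# Sketch: hand the client a short image_id and a pre-signed PUT URL so the
# upload bypasses the application servers entirely.
import secrets

import boto3

s3 = boto3.client("s3")
UPLOAD_BUCKET = "imghost-uploads"  # assumed bucket name


def create_upload_url(content_type: str, expires_in: int = 300) -> dict:
    image_id = secrets.token_urlsafe(6)  # short URL-safe id, e.g. 'abc123'-style
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={
            "Bucket": UPLOAD_BUCKET,
            "Key": f"raw/{image_id}",
            "ContentType": content_type,  # S3 enforces this at PUT time
        },
        ExpiresIn=expires_in,  # short expiry limits leaked-URL abuse
    )
    return {"image_id": image_id, "upload_url": url}
```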

The Serving Pipeline uses a multi-tier CDN strategy. When an image URL is requested (e.g., i.imghost.com/abc123.jpg), DNS routes to the nearest CDN edge node (CloudFront or Fastly). On cache hit (85% hit rate), the edge serves directly. On miss, the request goes to a regional origin shield (reduces origin load by collapsing duplicate requests), then to S3 origin. Image URLs are immutable — once uploaded, an image at a given URL never changes, enabling infinite CDN TTLs (Cache-Control: max-age=31536000, immutable). Deletion removes the S3 object and issues a CDN invalidation.
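
To make the immutable-URL policy concrete, a processing worker might publish each derived image with the year-long cache header like this (a sketch; bucket and key names are assumptions):

```python
# Sketch: write a processed variant to the serving bucket with an immutable
# cache policy, so CDN edges can hold it indefinitely.
import boto3

s3 = boto3.client("s3")


def publish_variant(image_id: str, size: str, data: bytes, content_type: str) -> str:
    key = f"serve/{image_id}_{size}.jpg"
    s3.put_object(
        Bucket="imghost-serve",
        Key=key,
        Body=data,
        ContentType=content_type,
        # The URL never changes, so edges may cache for a full year.
        CacheControl="max-age=31536000, immutable",
    )
    return key
```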

The Community Layer handles galleries, votes, and comments. Gallery posts are stored in PostgreSQL; vote counts use Redis sorted sets for real-time ranking on the front page (scored by a time-decayed upvote formula similar to Reddit's hot ranking). Comments use a threaded model stored in PostgreSQL with materialized path encoding for efficient tree retrieval.
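
The time-decayed formula could look like the classic open-source Reddit hot ranking; the 45,000-second decay constant below follows that public formula and is an assumption here, not the platform's exact tuning:

```python
# Sketch of a Reddit-style hot score: log-scaled net votes plus a
# time bonus, so old posts must keep earning votes to stay ranked.
import math
from datetime import datetime, timezone

EPOCH = datetime(2025, 1, 1, tzinfo=timezone.utc)  # arbitrary site epoch


def hot_score(upvotes: int, downvotes: int, created_at: datetime) -> float:
    s = upvotes - downvotes
    order = math.log10(max(abs(s), 1))          # diminishing returns on votes
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = (created_at - EPOCH).total_seconds()
    return round(sign * order + seconds / 45000, 7)
```

The resulting value is what gets written both to the post's score column and to the Redis sorted set described below.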

Core Components

Image Processing Pipeline

The pipeline processes 116 images/sec using a fleet of worker containers. Each worker runs libvips for resizing (chosen for its streaming architecture that processes images without loading them entirely into memory — critical for 200MB GIFs). Thumbnail generation uses Lanczos3 resampling for quality. For animated GIFs, the pipeline first extracts frame count and duration; GIFs over 100 frames are converted to H.264 MP4 with a poster frame extracted as the static thumbnail. WebP output is generated alongside JPEG for browsers that support it (30% smaller file sizes). The pipeline uses a fan-out pattern: a single input message spawns 4 parallel thumbnail tasks plus the NSFW classification task, coordinated by an SQS-based workflow.
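
A sketch of the thumbnail step using pyvips (the Python libvips binding); the widths mirror the small/medium/large tiers above, and Q=82 matches the JPEG quality setting discussed later:

```python
# Sketch: generate the three resized variants with pyvips. thumbnail() uses
# a shrink-on-load pipeline, so large sources are never fully decoded into
# memory; Lanczos3 is libvips's default high-quality reduction kernel.
import pyvips

SIZES = {"small": 160, "medium": 320, "large": 640}


def generate_thumbnails(src_path: str) -> dict[str, bytes]:
    out = {}
    for name, width in SIZES.items():
        img = pyvips.Image.thumbnail(src_path, width)
        out[name] = img.write_to_buffer(".jpg", Q=82)
    return out
```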

Content Moderation

The NSFW classifier is a ResNet-50 model fine-tuned on a labeled dataset of 10M images across categories (safe, suggestive, explicit). The model runs inference on GPU instances (g4dn.xlarge) with batched processing, achieving 200 images/sec per GPU. Images classified as explicit are auto-tagged NSFW and hidden behind interstitials. Edge cases (confidence between 0.4 and 0.8) are routed to a human moderation queue. A separate system detects illegal content (CSAM) using perceptual hashing (PhotoDNA) against a known-hash database; matches trigger immediate removal and reporting.
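
The routing logic implied by those thresholds is simple; this sketch uses the 0.4 and 0.8 cut-offs from the text, with the action names being illustrative:

```python
# Sketch of the threshold routing for classifier output.
def route_moderation(explicit_score: float) -> str:
    """Map the classifier's explicit-content score to a moderation action."""
    if explicit_score >= 0.8:
        return "auto_tag_nsfw"       # high confidence: hide behind interstitial
    if explicit_score >= 0.4:
        return "human_review_queue"  # ambiguous band: route to moderators
    return "safe"                    # low confidence of explicit content
```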

CDN & Hotlink Management

Hotlinking (embedding images on external sites) is a significant traffic cost driver. The system implements tiered hotlink policies: images in public galleries allow hotlinking (this drives traffic and ad impressions), while private/unlisted images use signed URLs with time-limited tokens. Bandwidth abuse is detected by monitoring per-image request rates; images exceeding 10K requests/hour from non-gallery contexts trigger rate limiting at the CDN edge via edge compute functions (CloudFront Functions). Origin shield regions (3 globally) collapse cache misses from hundreds of edge nodes into a single origin fetch.
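
One way to implement the per-image abuse check is a fixed-window Redis counter; the 10K/hour threshold comes from the text, while the key scheme and client wiring are assumptions:

```python
# Sketch: count non-gallery requests per image per hour and flag images
# that exceed the hotlink threshold for edge rate limiting.
import time

import redis

r = redis.Redis()
HOTLINK_LIMIT = 10_000  # requests/hour from non-gallery contexts


def record_hotlink_hit(image_id: str) -> bool:
    """Return True if this image should now be rate limited at the edge."""
    window = int(time.time() // 3600)  # fixed hourly window
    key = f"hotlink:{image_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 7200)  # keep slightly past the window, then drop
    return count > HOTLINK_LIMIT
```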

Database Design

Image metadata is stored in PostgreSQL: images (image_id short-hash PK, uploader_id nullable, original_filename, content_type, width, height, file_size_bytes, s3_key, thumbnails JSONB, nsfw_score FLOAT, created_at, deleted_at nullable). The thumbnails JSONB field stores paths to each size variant: {"small": "s3://...", "medium": "s3://...", ...}. Albums use a separate table: albums (album_id, creator_id, title, description, cover_image_id, privacy ENUM, created_at) with a junction table album_images (album_id, image_id, position).
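
For concreteness, the images table expressed as DDL and applied via psycopg2; column names follow the text, and the types are reasonable guesses where the text leaves them open:

```python
# Sketch: create the images table described above. DSN is an assumption.
import psycopg2

DDL = """
CREATE TABLE images (
    image_id          VARCHAR(12) PRIMARY KEY,  -- short hash, e.g. 'abc123'
    uploader_id       BIGINT NULL,              -- null for anonymous uploads
    original_filename TEXT,
    content_type      TEXT NOT NULL,
    width             INT,
    height            INT,
    file_size_bytes   BIGINT,
    s3_key            TEXT NOT NULL,
    thumbnails        JSONB,                    -- {"small": "s3://...", ...}
    nsfw_score        FLOAT,
    created_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at        TIMESTAMPTZ NULL
);
"""

with psycopg2.connect("dbname=imghost") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```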

Community data: gallery_posts (post_id, image_id, title, description, section, upvotes INT, downvotes INT, score FLOAT, created_at) with indexes on (section, score DESC) for front-page queries. Votes are stored in a votes table (user_id, post_id, vote_type) with a unique constraint preventing double-voting. Redis sorted sets mirror the score for real-time ranking: ZADD section:hot score post_id. Comments use materialized paths: comments (comment_id, post_id, user_id, parent_id nullable, path VARCHAR e.g. '001.003.002', body, created_at).
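
A sketch of the vote write path: an upsert that leans on the unique (user_id, post_id) constraint for idempotent re-votes, then a mirror of the recomputed score into the sorted set. The DSN and exact Redis key layout are illustrative:

```python
# Sketch: record a vote idempotently, then mirror the new hot score so
# front-page reads never touch PostgreSQL.
import psycopg2
import redis

conn = psycopg2.connect("dbname=imghost")  # assumed DSN
r = redis.Redis()

VOTE_SQL = """
INSERT INTO votes (user_id, post_id, vote_type)
VALUES (%s, %s, %s)
ON CONFLICT (user_id, post_id) DO UPDATE SET vote_type = EXCLUDED.vote_type;
"""


def cast_vote(user_id: int, post_id: int, vote_type: str,
              new_score: float, section: str) -> None:
    with conn.cursor() as cur:
        cur.execute(VOTE_SQL, (user_id, post_id, vote_type))
    conn.commit()
    r.zadd(f"section:{section}:hot", {str(post_id): new_score})
```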

API Design

  • POST /api/v1/upload — Request a pre-signed upload URL; returns upload_url and image_id (short hash)
  • GET /api/v1/image/{image_id} — Fetch image metadata including dimensions, thumbnails, and NSFW status
  • GET /api/v1/gallery/hot?section={section}&page={n} — Fetch hot gallery posts with pagination (see the sketch after this list)
  • POST /api/v1/gallery/{post_id}/vote — Submit upvote or downvote; body contains direction (up/down); idempotent
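
As a sketch, the hot-gallery endpoint can read page slices straight from the Redis ranking mirror; FastAPI and the page size are assumptions, while the parameter names follow the spec above:

```python
# Sketch: serve front-page rankings directly from the sorted set.
import redis
from fastapi import FastAPI

app = FastAPI()
r = redis.Redis(decode_responses=True)
PAGE_SIZE = 50  # assumed page size


@app.get("/api/v1/gallery/hot")
def gallery_hot(section: str, page: int = 0) -> dict:
    start = page * PAGE_SIZE
    # Highest hot scores first; post bodies would be hydrated from
    # PostgreSQL (or a cache) in a second step, omitted here.
    post_ids = r.zrevrange(f"section:{section}:hot", start, start + PAGE_SIZE - 1)
    return {"section": section, "page": page, "post_ids": post_ids}
```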

Scaling & Bottlenecks

The image processing pipeline is the primary compute bottleneck. At 116 uploads/sec with 5 processing tasks each, the system handles 580 tasks/sec. Auto-scaling the worker fleet based on SQS queue depth ensures processing latency stays under 30 seconds. The GIF-to-MP4 conversion is the slowest task (up to 60 seconds for large GIFs); these are routed to a separate high-memory worker pool to avoid blocking thumbnail generation. During traffic spikes (viral content), the upload rate can increase 10x; pre-provisioned capacity handles the burst while auto-scaling catches up.
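
The queue-depth scaling rule reduces to a small calculation; the per-worker throughput below is an assumed figure for illustration, while the 30-second budget comes from the text:

```python
# Sketch: size the worker fleet so the current SQS backlog drains within
# the processing-latency budget.
import math

import boto3

sqs = boto3.client("sqs")

TASKS_PER_WORKER_PER_SEC = 2.0  # assumption: measured per-container throughput
TARGET_DRAIN_SECONDS = 30.0     # latency budget from the design


def desired_workers(queue_url: str, min_workers: int = 10) -> int:
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
    # Floor at a pre-provisioned minimum to absorb sudden viral spikes.
    return max(min_workers, math.ceil(backlog / (TASKS_PER_WORKER_PER_SEC * TARGET_DRAIN_SECONDS)))
```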

CDN costs dominate the operational budget. At 30 Gbps peak, CDN egress is the largest expense. Optimization strategies include aggressive WebP serving (30% bandwidth reduction for supported browsers), quality-based compression (JPEG quality 82 provides optimal quality-to-size ratio), and tiered storage (images not accessed in 90 days move to S3 Infrequent Access, saving 40% on storage costs). Deduplication via perceptual hashing (pHash) detects re-uploads of identical images and serves the existing copy instead.
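
A sketch of the pHash dedup check, assuming the imagehash and Pillow libraries; the Hamming-distance threshold is an illustrative value, not a tuned production setting:

```python
# Sketch: compute a perceptual hash on upload and compare against stored
# hashes; small Hamming distances indicate a re-upload of the same image.
import imagehash
from PIL import Image

NEAR_DUP_THRESHOLD = 4  # max Hamming distance to treat as the same image


def phash_hex(path: str) -> str:
    return str(imagehash.phash(Image.open(path)))


def is_near_duplicate(hash_a: str, hash_b: str) -> bool:
    # ImageHash subtraction yields the Hamming distance between hashes.
    return imagehash.hex_to_hash(hash_a) - imagehash.hex_to_hash(hash_b) <= NEAR_DUP_THRESHOLD
```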

Key Trade-offs

  • libvips over ImageMagick: libvips uses 10x less memory and is 5x faster for common operations, but has a smaller ecosystem and fewer format plugins — acceptable since the platform only needs JPEG/PNG/GIF/WebP support
  • Pre-signed S3 uploads vs proxy through application servers: Direct-to-S3 eliminates the application server as a bandwidth bottleneck but requires careful security (signed URLs with short expiry, file type validation post-upload) and adds complexity for progress tracking
  • Immutable URLs vs mutable: Immutable URLs enable infinite CDN caching (massive cost savings) but mean edited images get new URLs, breaking existing embeds — the trade-off favors immutability given the platform's link-sharing model
  • Animated GIF to MP4 conversion vs serving raw GIFs: MP4 is 90% smaller and plays smoother, but breaks the "right-click save as GIF" user expectation — mitigated by offering both formats with MP4 as default and GIF as a download option
