
Docker Interview Questions for Senior Engineers (2026)

Advanced Docker interview questions with detailed answer frameworks covering container internals, image optimization, networking, security, orchestration, and production-grade patterns used at companies like Google, Spotify, and Netflix.

20 min read · Updated Apr 21, 2026
interview-questions · docker · senior-engineer · containers · devops

Why Docker Expertise Matters in Senior Engineering Interviews

Docker fundamentally changed how software is built, shipped, and run. While Kubernetes gets the headlines for orchestration, Docker and its underlying container runtime technology remain the foundation that everything else builds on. Senior engineering candidates are expected to understand not just how to write a Dockerfile, but how containers actually work at the kernel level, how to optimize images for production, how to secure the container supply chain, and when containers are the wrong abstraction.

At companies like Google, Spotify, and Netflix, every engineer interacts with containers daily. The interview questions go far beyond basic syntax. Interviewers want to see that you can debug a container that behaves differently in production than it did in CI, optimize an image build pipeline that takes too long, secure a container runtime against privilege escalation, and design multi-stage builds that produce minimal, auditable production images.

This guide covers fifteen questions that test deep Docker knowledge. Each includes the hidden intent, a structured answer framework, and follow-up questions you should anticipate. For foundational reading, start with how Docker works and explore Docker vs Podman for a modern perspective on container runtimes. For broader interview preparation, see our system design interview guide and explore learning paths.


1. Explain how Linux containers work under the hood — namespaces, cgroups, and union filesystems.

What the interviewer is really asking: Do you understand the kernel primitives that make containers possible, or do you just use Docker as a black box?

Answer framework:

Start by establishing that containers are not virtual machines. A container is a regular Linux process (or group of processes) that has a restricted view of the system. This restriction comes from three kernel features working together: namespaces, cgroups, and union filesystems.

Namespaces provide isolation. Each namespace type isolates a different aspect of the system. PID namespace gives the container its own process tree — PID 1 inside the container is just a regular process on the host with a different PID. Network namespace gives the container its own network stack — its own interfaces, routing table, iptables rules, and port space. Mount namespace gives the container its own filesystem view. UTS namespace isolates hostname and domain name. IPC namespace isolates System V IPC and POSIX message queues. User namespace maps UIDs inside the container to different UIDs on the host — root (UID 0) inside the container can map to an unprivileged user (say UID 100000) on the host, which is the foundation of rootless containers.

Cgroups (control groups) provide resource management. Cgroups v2 (the current standard, unified hierarchy) organizes processes into a tree of groups and applies resource controllers. The CPU controller limits CPU time using CFS bandwidth throttling (cpu.max sets the quota within a period) or CPU weight for proportional sharing (cpu.weight). The memory controller limits memory usage (memory.max) and tracks current consumption (memory.current). When a process exceeds its memory limit, the kernel OOM-kills it. The I/O controller limits block device throughput and IOPS. The PID controller limits the number of processes, preventing fork bombs.
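
To make the controller interface concrete, here is a minimal shell sketch of the cgroup v2 files named above. It assumes cgroup v2 is mounted at /sys/fs/cgroup with the cpu and memory controllers enabled in the parent group, it must run as root, and the group name "demo" is illustrative:

    mkdir /sys/fs/cgroup/demo
    echo "50000 100000" > /sys/fs/cgroup/demo/cpu.max    # 50ms of CPU per 100ms period, i.e. half a CPU
    echo "268435456" > /sys/fs/cgroup/demo/memory.max    # 256 MiB hard limit; exceeding it triggers the OOM killer
    echo $$ > /sys/fs/cgroup/demo/cgroup.procs           # move the current shell into the group
    cat /sys/fs/cgroup/demo/memory.current               # observe the group's current memory usage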

Union filesystems provide efficient image layering. OverlayFS (the default storage driver) combines multiple directory layers into a single merged view. The image layers are read-only (the lowerdir). A thin writable layer (the upperdir) captures all filesystem changes made by the running container. Copy-on-write means that reading a file in a lower layer is zero-cost (it is accessed directly), but modifying a file first copies it to the upper layer. This is why large file modifications inside containers are expensive — the entire file is copied even if you change one byte.
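
The union mount itself can be reproduced by hand. A rough sketch with hypothetical paths (not Docker's exact invocation, and it requires root):

    # Two read-only image layers, one writable layer, plus OverlayFS's required work directory.
    mount -t overlay overlay \
      -o lowerdir=/layers/app:/layers/base,upperdir=/container/upper,workdir=/container/work \
      /container/merged
    # Reads come from the highest layer containing the file; the first write to a file from a
    # lower layer copies it up into /container/upper, and all later writes go there directly.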

Connect this to Docker specifically. When you run docker run, the Docker daemon calls containerd, which calls runc (the OCI runtime). Runc is a small binary that creates the namespaces, configures the cgroups, sets up the root filesystem using the union mount, and exec's the container entrypoint process. Once the container is running, runc exits — containerd and its shim process manage the container lifecycle from that point.

This understanding is critical for debugging. If a container behaves differently than expected, knowing whether it is a namespace issue (the container can see something it should not), a cgroup issue (the container is being throttled or killed), or a filesystem issue (copy-on-write overhead) directly informs your investigation.

Follow-up questions:

  • What is the difference between cgroups v1 and v2, and why does it matter?
  • How do rootless containers use user namespaces to run without root privileges on the host?
  • What happens when you run docker exec — which namespaces does the new process join?

2. How do you design a multi-stage Dockerfile for a production application?

What the interviewer is really asking: Can you build optimized, secure, and reproducible container images that follow production best practices?

Answer framework:

Start with the problem multi-stage builds solve. A single-stage build includes build tools, compilers, development dependencies, and source code in the final image. This bloats the image size (Go build images can be over 1GB versus 10MB for the final binary) and expands the attack surface (every package in the image is a potential vulnerability).

A multi-stage build separates build-time and runtime concerns. The first stage (the builder) has everything needed to compile the application: the compiler, build tools, test frameworks, and all dependencies. The final stage starts from a minimal base and copies only the compiled artifacts. Nothing from the builder stage appears in the final image unless explicitly copied.

Walk through a concrete example for a Go application. The builder stage uses the official Go image, copies go.mod and go.sum first (to cache dependency downloads), runs go mod download, then copies the source code and runs go build with appropriate flags (-ldflags to strip debug symbols, CGO_ENABLED=0 for static linking). The final stage uses distroless/static or scratch, copies the binary from the builder, sets a non-root USER, and defines the ENTRYPOINT.
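
A sketch of that Dockerfile might look like the following. The image tags, module layout, and output name are illustrative, and a production version would pin the base images by digest:

    # --- builder stage: full Go toolchain, never shipped ---
    FROM golang:1.22 AS builder
    WORKDIR /src
    COPY go.mod go.sum ./                 # copy manifests first so dependency download stays cached
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /out/server .

    # --- final stage: minimal runtime, only the binary ---
    FROM gcr.io/distroless/static:nonroot
    COPY --from=builder /out/server /server
    USER nonroot
    ENTRYPOINT ["/server"]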

For interpreted languages like Python or Node.js, the pattern differs. The builder stage installs dependencies, runs any build steps (webpack, compilation of native extensions), and produces a virtual environment or node_modules directory. The final stage uses a slim runtime image, copies the installed dependencies and application code, but excludes build tools (gcc, make, python3-dev).

Discuss layer caching strategy. Docker caches each layer based on the instruction and the files involved. Order your Dockerfile instructions from least-frequently-changing to most-frequently-changing. Copy dependency manifests and install dependencies before copying source code. This way, changing application code does not invalidate the dependency installation cache.

Address security hardening in the final image. Run as a non-root user: create a user in the Dockerfile and use the USER directive. Use a read-only filesystem in production. Do not install a shell unless absolutely necessary (distroless images have no shell). Set HEALTHCHECK to define a health check within the image specification. Use .dockerignore to exclude git directories, test files, documentation, and secrets from the build context.

Discuss reproducibility. Pin base image versions using SHA256 digests rather than tags (tags are mutable). Pin dependency versions in lock files. Use BuildKit's --mount=type=cache for persistent build caches that survive image rebuilds. Use --mount=type=secret to pass secrets during build without embedding them in any layer.
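
A small sketch of digest pinning and the secret mount (the digest, secret id, and private index URL are placeholders):

    FROM python:3.12-slim@sha256:<digest>    # pin by digest; a tag alone can be repointed
    WORKDIR /app
    COPY requirements.txt .
    # The token is available only for this instruction at /run/secrets/pip_token and never enters a layer.
    RUN --mount=type=secret,id=pip_token \
        PIP_INDEX_URL="https://__token__:$(cat /run/secrets/pip_token)@pypi.example.com/simple" \
        pip install -r requirements.txt
    # Built with: docker build --secret id=pip_token,src=./pip_token.txt .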

For more on container technology decisions, see Docker vs Podman and how Docker works.

Follow-up questions:

  • How does BuildKit differ from the legacy Docker build engine, and what features does it enable?
  • How would you debug a multi-stage build when the final image is missing a file or library?
  • What is the difference between COPY --from and using named stages?

3. How does Docker networking work, and what are the trade-offs between network modes?

What the interviewer is really asking: Can you reason about container networking from first principles and troubleshoot connectivity issues?

Answer framework:

Start with the default bridge network. When Docker starts, it creates a Linux bridge (docker0) on the host. Each container gets a veth pair: one end in the container's network namespace, the other attached to the docker0 bridge. Containers on the same bridge can communicate via IP address. The Docker daemon runs an embedded DNS server that resolves container names to IP addresses, but only for user-defined bridge networks — the default bridge network does not support DNS resolution by container name.

Explain user-defined bridge networks, which are the recommended approach. They provide automatic DNS resolution between containers by name, better isolation (containers on different networks cannot communicate), and configurable subnets, gateways, and IP ranges. Containers can be attached to multiple user-defined networks, enabling controlled connectivity between services.
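
A quick way to see the DNS difference in practice (network, image, and container names are arbitrary):

    docker network create --subnet 172.28.0.0/16 app-net     # user-defined bridge with an explicit subnet
    docker run -d --name db --network app-net postgres:16
    docker run --rm --network app-net alpine ping -c 1 db    # "db" resolves via the embedded DNS server
    # The same ping by name fails for containers on the default bridge, which has no name-based DNS.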

Host network mode eliminates the network namespace entirely. The container shares the host's network stack, using the host's IP address and port space directly. This removes the overhead of the veth pair, NAT, and bridge forwarding. The performance benefit is measurable: reduced latency (typically 5-20 microseconds less per packet) and higher throughput. Use host networking for latency-sensitive applications or when the container needs to bind to many ports. The trade-off is lost isolation — port conflicts are possible, and the container can see all host network traffic.

None network mode creates a container with a loopback interface only. Use this for batch jobs that do not need network access, reducing the attack surface.

Macvlan network mode assigns a MAC address to the container, making it appear as a physical device on the network. The container gets an IP address from the physical network's DHCP server or a static assignment. This is useful for legacy applications that expect to be directly addressable on the LAN or for workloads that need to be reachable from outside without port mapping.

Overlay networks span multiple Docker hosts. They use VXLAN encapsulation to create a virtual L2 network across hosts. Each packet is encapsulated with an outer UDP header containing the VXLAN Network Identifier (VNI). This is the networking model used by Docker Swarm for multi-host communication. The overhead is approximately 50 bytes per packet for the encapsulation header.

Discuss port mapping. The -p flag creates iptables DNAT rules that translate connections to the host port into connections to the container IP and port. Docker adds these rules to the DOCKER chain in the nat table. Understanding this is critical for debugging — if iptables is flushed or another tool conflicts with Docker's rules, port mapping breaks silently.
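
You can observe these rules directly on a host using the iptables backend (output details vary by Docker version):

    docker run -d --name web -p 8080:80 nginx
    sudo iptables -t nat -L DOCKER -n -v
    # Expect a DNAT rule translating tcp dpt:8080 on the host to <container-ip>:80.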

Address DNS resolution in detail. Docker's embedded DNS server (at 127.0.0.11 inside the container) resolves container names and service names. It forwards unknown queries to the host's configured DNS servers. In complex environments, DNS resolution issues are a common source of container connectivity problems.

Follow-up questions:

  • How would you troubleshoot a container that can reach the internet but cannot resolve DNS names?
  • What are the security implications of using host network mode?
  • How does Docker's iptables integration interact with a host firewall like firewalld or ufw?

4. How do you optimize Docker image size and build time for a CI/CD pipeline?

What the interviewer is really asking: Can you reduce feedback loop time and infrastructure cost through systematic build optimization?

Answer framework:

Start with why image size and build time matter. Large images increase pull time (critical in autoscaling scenarios where new instances must start quickly), consume more registry storage, and contain more packages that could harbor vulnerabilities. Slow builds reduce developer productivity and increase CI/CD costs.

For image size optimization, work from the bottom up. Choose the right base image: alpine (5MB) versus debian-slim (80MB) versus ubuntu (77MB) versus distroless (2-20MB depending on variant) versus scratch (0MB). Each choice has trade-offs — Alpine uses musl libc instead of glibc, which can cause subtle compatibility issues with some applications. Distroless images have no package manager or shell, which improves security but complicates debugging.

Minimize layers. Each RUN, COPY, and ADD instruction creates a layer. Combine related commands in a single RUN instruction using && to chain them. Clean up in the same layer — removing temporary files in a separate RUN instruction does not reduce image size because the files still exist in the previous layer. For example: RUN apt-get update && apt-get install -y --no-install-recommends package && rm -rf /var/lib/apt/lists/*.

Use multi-stage builds (as discussed in question 2) to exclude build tools from the final image. Use .dockerignore to prevent large or unnecessary files from entering the build context — this also speeds up the context transfer to the daemon.

For build time optimization, focus on layer caching. Structure the Dockerfile so that frequently changing content appears later. Copy dependency manifests first, install dependencies, then copy source code. This way, source code changes do not invalidate the expensive dependency installation cache.

Use BuildKit's cache mounts for package managers. --mount=type=cache,target=/root/.cache/pip persists the pip cache across builds, so unchanged packages are not re-downloaded. Similarly for npm (--mount=type=cache,target=/root/.npm), Maven (--mount=type=cache,target=/root/.m2), and Go modules (--mount=type=cache,target=/root/go/pkg/mod).
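
As an example, a Node.js install step with a persistent cache mount might look like this (base image and file names are illustrative):

    FROM node:20-slim
    WORKDIR /app
    COPY package.json package-lock.json ./
    # The npm cache persists on the builder across builds but is not included in the image.
    RUN --mount=type=cache,target=/root/.npm \
        npm ci
    COPY . .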

Implement remote caching for CI environments. BuildKit can push cache layers to a registry and pull them in subsequent builds using --cache-from and --cache-to. This means CI runners that start with empty caches can still benefit from layer caching. Use inline cache mode (--cache-to type=inline) for simple cases or registry cache mode (--cache-to type=registry) for more control.

Parallelize multi-stage builds. BuildKit automatically parallelizes stages that do not depend on each other. Structure your Dockerfile with independent stages (frontend build, backend build, test runner) that run concurrently.

Consider buildx bake for complex build pipelines. It defines build targets in an HCL or JSON file, specifying dependencies and parallelism explicitly. This is useful when building multiple images from the same repository.

Follow-up questions:

  • How does BuildKit's cache differ from the legacy builder's cache?
  • What is the impact of the build context size on build performance?
  • How would you handle a CI pipeline where builds must be reproducible but also fast?

5. What is the difference between CMD and ENTRYPOINT, and how does the interaction between them affect container behavior?

What the interviewer is really asking: Do you understand the subtleties of container startup configuration that affect operational behavior?

Answer framework:

Start with the fundamental distinction. ENTRYPOINT defines the executable that runs when the container starts. CMD provides default arguments to the ENTRYPOINT. When a user passes arguments to docker run, those arguments replace CMD but not ENTRYPOINT. This design enables containers that behave like executables (with configurable arguments) versus containers that behave like environments (with a default command).

Explain the two forms. Shell form (CMD command arg1 arg2) wraps the command in /bin/sh -c, which means the command runs as a child process of the shell. This has implications: the shell process is PID 1, not your application, so SIGTERM is sent to the shell, which may not forward it to the application. Exec form (CMD ["command", "arg1", "arg2"]) runs the command directly as PID 1. Always use exec form for ENTRYPOINT so your application receives signals directly.

Walk through the interaction matrix. If only CMD is set: the CMD command runs, and docker run arguments replace it entirely. If only ENTRYPOINT is set: the ENTRYPOINT runs with docker run arguments appended. If both are set: ENTRYPOINT is the executable, CMD provides default arguments, and docker run arguments replace CMD. This is the recommended pattern for production images.

Provide a concrete example. For an HTTP server image: ENTRYPOINT ["./server"] and CMD ["--port", "8080", "--workers", "4"]. Running docker run myimage starts the server with default configuration. Running docker run myimage --port 9090 overrides only the port. Running docker run --entrypoint /bin/sh myimage gives a shell for debugging, overriding the entrypoint entirely.

Discuss the PID 1 problem in depth. In Linux, PID 1 has special responsibilities: it must reap zombie processes (orphaned child processes whose parent has exited). If your application is PID 1 and it does not handle SIGCHLD, zombie processes accumulate. If you use shell form and /bin/sh is PID 1, the shell typically does reap zombies, but it may not forward signals properly.

Solutions to the PID 1 problem: use exec form so your application is PID 1 and handles signals properly, use tini (a tiny init system designed for containers) as the ENTRYPOINT that forwards signals and reaps zombies, or use Docker's --init flag which injects tini automatically. In Kubernetes, the pause container in each pod handles this role.

Discuss how ENTRYPOINT scripts work in practice. Many production images use an entrypoint.sh script that performs initialization (waiting for database availability, running migrations, generating configuration from environment variables) and then exec's the main process. The exec replaces the shell process with the application process, ensuring the application becomes PID 1 and receives signals directly.
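
A minimal entrypoint script following that pattern might look like this; the dependency wait and migration command are placeholders for whatever your application actually needs:

    #!/bin/sh
    set -e
    # Illustrative initialization: wait for the database, then run migrations.
    until nc -z "$DB_HOST" 5432; do sleep 1; done
    /app/migrate
    # Replace the shell with the real process so it becomes PID 1 and receives signals directly.
    exec "$@"

In the Dockerfile this pairs with ENTRYPOINT ["/entrypoint.sh"] and CMD ["/app/server"], so the script receives the server command as its arguments.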

Follow-up questions:

  • How does the STOPSIGNAL instruction interact with ENTRYPOINT?
  • What happens when docker stop sends SIGTERM and the process does not exit within the grace period?
  • How would you debug a container that does not shut down gracefully?

6. How does Docker handle storage, and what are the performance implications of different storage drivers?

What the interviewer is really asking: Do you understand the I/O performance characteristics that affect containerized workloads, especially databases and high-throughput applications?

Answer framework:

Start with the layered storage model. A Docker image consists of read-only layers stacked on top of each other. When a container is created, a thin read-write layer is added on top. All filesystem writes go to this writable layer using copy-on-write (COW) semantics. Reading a file traverses layers from top to bottom until the file is found.

Explain the primary storage driver: overlay2. OverlayFS uses two directories: lowerdir (read-only image layers, merged into a single view) and upperdir (writable container layer). When a container modifies a file from a lower layer, the file is first copied to the upper layer (copy-up), then modified. This copy-up operation is the main performance cost of overlay2. It is a one-time cost per file — subsequent modifications to the same file are direct writes to the upper layer. However, for large files (database data files, log files), this copy-up can cause significant latency spikes.

Discuss performance implications. Sequential reads from lower layers are fast because OverlayFS reads directly from the underlying filesystem. Random reads across many small files may be slower due to directory lookup overhead across layers. Writes to new files (not from lower layers) go directly to the upper layer with no copy-up cost. Write-heavy workloads on existing files from lower layers suffer from copy-up overhead.

For high-performance storage needs, Docker volumes bypass the storage driver entirely. A volume is a directory on the host filesystem (or a network-attached storage device) that is bind-mounted into the container. Reads and writes go directly to the underlying filesystem with no overlay or copy-on-write overhead. This is why databases, message queues, and any write-heavy workload must use volumes rather than writing to the container's writable layer.

Explain volume types. Named volumes are managed by Docker and stored in /var/lib/docker/volumes/. Bind mounts map a specific host directory into the container. tmpfs mounts store data in memory, useful for sensitive data that should not persist to disk (application secrets, session files). Volume drivers enable network-attached storage: NFS, Amazon EBS, Azure Disk, and proprietary storage systems.
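
The three mount types side by side (names and paths are arbitrary):

    docker volume create pgdata                                         # named volume managed by Docker
    docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:16
    docker run --rm -v "$(pwd)/config:/etc/app:ro" alpine ls /etc/app   # bind mount, read-only
    docker run --rm --tmpfs /run/scratch:rw,size=16m alpine df -h /run/scratch   # tmpfs, memory only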

Discuss the writable layer and its limitations. The writable layer grows with every write and is deleted when the container is removed. Per-container storage quotas are limited with overlay2: --storage-opt size=10G only works when the backing filesystem is xfs mounted with project quotas, whereas devicemapper supported per-container limits natively. A runaway container writing to its writable layer can therefore fill the host disk, so monitor container disk usage with docker system df.

Address build cache storage. BuildKit stores build caches separately from image layers. The build cache can grow large over time. Use docker builder prune to clean it. In CI environments, configure garbage collection policies to prevent disk exhaustion.

For understanding how storage decisions affect container orchestration, CSI (Container Storage Interface) drivers in Kubernetes provide the abstraction layer between storage vendors and the orchestrator.

Follow-up questions:

  • How would you diagnose I/O performance issues in a containerized application?
  • What is the difference between a bind mount and a named volume, and when would you choose each?
  • How does container storage work differently on macOS versus Linux (Docker Desktop uses a VM)?

7. Explain Docker security — what attack vectors exist and how do you mitigate them?

What the interviewer is really asking: Can you reason about container security holistically, from image to runtime to host?

Answer framework:

Start with the threat model. Container security operates at four layers: the image (what code and dependencies are packaged), the runtime (how the container process is isolated), the host (the kernel and system services shared by all containers), and the network (how containers communicate).

Image security: supply chain attacks. Base images from public registries may contain malware or known vulnerabilities. Pin base images by digest. Scan images with Trivy, Grype, or Snyk. Sign images with Cosign and verify signatures before deployment. Generate SBOMs to track all components. Use minimal base images (distroless, Alpine, scratch) to reduce the vulnerability surface. Never include secrets, credentials, or private keys in image layers — they persist in the layer history even if deleted in a later layer. Use BuildKit's --mount=type=secret for build-time secrets.

Runtime security: container escape. By default, Docker containers run as root inside the container, and this maps to root on the host (unless user namespace remapping is enabled). A container escape vulnerability (like CVE-2019-5736 in runc) allows root-in-container to become root-on-host. Mitigations: run containers as non-root users (USER directive in Dockerfile), enable user namespace remapping (--userns-remap), drop all Linux capabilities and add back only what is needed (--cap-drop=ALL --cap-add=NET_BIND_SERVICE), apply seccomp profiles that restrict system calls, use AppArmor or SELinux for mandatory access control, set --security-opt=no-new-privileges to prevent privilege escalation via setuid binaries.
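
Several of those mitigations combined into one illustrative docker run invocation (the image reference and UID are placeholders):

    docker run -d --name api \
      --user 10001:10001 \
      --cap-drop=ALL \
      --cap-add=NET_BIND_SERVICE \
      --security-opt=no-new-privileges \
      --read-only \
      --tmpfs /tmp \
      registry.example.com/api:1.4.2
    # --user runs as a non-root UID/GID; --cap-drop/--cap-add leaves a minimal capability set;
    # no-new-privileges blocks setuid escalation; --read-only plus --tmpfs gives an immutable
    # root filesystem with an in-memory /tmp for scratch space.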

Host security: kernel sharing. All containers share the host kernel. A kernel vulnerability affects every container. Keep the host kernel updated. Use a minimal host OS designed for containers (Flatcar Container Linux, Bottlerocket, Talos). Do not run unnecessary services on the host. Restrict access to the Docker socket (/var/run/docker.sock) — mounting it into a container gives that container full control over the Docker daemon, which is equivalent to root access on the host.

Network security: lateral movement. By default, all containers on the same Docker network can communicate. Use user-defined bridge networks to segment containers. Apply iptables rules or Docker network policies to restrict traffic. For sensitive workloads, use encrypted overlay networks (--opt encrypted for Docker Swarm overlay networks). Never expose the Docker daemon API (port 2375/2376) to the network without TLS and certificate authentication.

Discuss rootless Docker. Running the Docker daemon as a non-root user eliminates the largest class of container escape attacks. Rootless Docker uses user namespaces, unprivileged overlayfs, and slirp4netns for networking. Trade-offs: slightly reduced networking performance, inability to bind to privileged ports (below 1024), and some storage driver limitations.

Address Docker vs Podman from a security perspective. Podman is daemonless — there is no long-running root process. Each container runs as a direct child of the user's session. This eliminates the Docker daemon as a single point of compromise. Podman also defaults to rootless mode, making it inherently more secure for many use cases.

Follow-up questions:

  • How do seccomp profiles work, and how would you create a custom profile for a specific application?
  • What is the attack surface of the Docker socket, and how do CI/CD systems mitigate it?
  • How does gVisor or Kata Containers provide stronger isolation than standard containers?

8. How do you handle logging in containerized applications?

What the interviewer is really asking: Can you design a logging architecture that works with the ephemeral nature of containers and scales to production requirements?

Answer framework:

Start with the fundamental principle: containers should write logs to stdout and stderr. The container runtime (containerd, CRI-O) captures these streams and writes them to files on the host. This decouples the application from the logging infrastructure — the application does not need to know where logs are stored or how they are shipped.

Explain the Docker logging driver architecture. Docker supports pluggable logging drivers that determine how container logs are processed. The json-file driver (default) writes JSON-formatted log entries to /var/lib/docker/containers/<container-id>/<container-id>-json.log. The local driver is more efficient (compressed, with rotation built in). The journald driver sends logs to systemd journal. The syslog driver sends logs via syslog protocol. The fluentd driver sends logs to a Fluentd collector. The awslogs, gcplogs, and splunk drivers send logs directly to cloud services.

Discuss the critical trade-off: blocking versus non-blocking log delivery. By default, logging drivers are blocking — if the logging destination is slow or unavailable, the application process blocks on stdout writes. This can cause your application to hang because of a logging infrastructure problem. Set the log driver to non-blocking mode (--log-opt mode=non-blocking --log-opt max-buffer-size=4m), which buffers log entries in a ring buffer and drops the oldest entries when the buffer is full. This trades log completeness for application availability — the correct trade-off for most production services.

For production logging architecture, describe the DaemonSet pattern used in Kubernetes (but applicable to any container platform). A log collection agent (Fluent Bit, the OpenTelemetry Collector, or Vector) runs on every host, reads log files from all containers, enriches them with metadata (container name, image, labels), and ships them to a centralized store. Fluent Bit is preferred over Fluentd for the collection tier because it uses significantly less memory (typically 15MB versus 60MB per node).

Discuss structured logging. Applications should emit structured logs (JSON) rather than unstructured text. Structured logs enable efficient parsing, filtering, and querying without regex-based extraction. Include correlation IDs (trace IDs from distributed tracing), timestamps in RFC 3339 format, log levels, and relevant context fields. At scale, structured logging reduces storage costs by enabling column-oriented compression and targeted queries.
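
A single structured log line following those conventions might look like this (field names and values are illustrative):

    {"timestamp":"2026-04-21T09:15:32Z","level":"error","service":"checkout","trace_id":"4bf92f3577b34da6a3ce929d0e0e4736","message":"payment provider timeout","duration_ms":5021}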

Address log storage trade-offs. Elasticsearch provides full-text search and complex aggregations but is expensive to operate at scale (memory-intensive JVM, disk-heavy indexing). Grafana Loki indexes only labels (namespace, pod, container, level) and stores log content in object storage, making it 10-100x cheaper for log storage but slower for full-text searches. Cloud logging services (CloudWatch Logs, Google Cloud Logging, Azure Monitor) eliminate operational overhead but introduce vendor lock-in and can become expensive at high volume.

Cover log rotation. Without rotation, container logs can fill the host disk. Configure max-size and max-file for the json-file driver (--log-opt max-size=10m --log-opt max-file=3). In Kubernetes, the kubelet manages log rotation based on containerLogMaxSize and containerLogMaxFiles configuration.

Follow-up questions:

  • How do you correlate logs from multiple containers involved in a single request?
  • What happens to logs when a container crashes before the logging agent collects them?
  • How would you handle multi-line log entries (stack traces) in a log collection pipeline?

9. How does Docker Compose work and when is it appropriate for production use?

What the interviewer is really asking: Can you distinguish between development tooling and production infrastructure, and do you understand the limitations of simpler orchestration?

Answer framework:

Start with what Docker Compose is: a tool for defining and running multi-container applications using a declarative YAML file. You define services (containers), networks, volumes, and their relationships in a docker-compose.yml (or compose.yaml) file. The docker compose up command creates all defined resources and starts the containers.

Explain the runtime behavior. Compose creates a dedicated Docker network for the project (named after the directory by default). All services are attached to this network and can resolve each other by service name via Docker's embedded DNS. Compose manages the lifecycle: starting, stopping, rebuilding, and scaling services as a group. It tracks state using container labels that associate containers with the Compose project.

Discuss Compose's strengths. For local development, it is unmatched. A single file defines the entire application stack: the application, database, cache, message queue, and any supporting services. New developers clone the repo, run docker compose up, and have a working environment in minutes. Environment variable substitution (${VAR:-default}) and profiles (--profile debug) allow customization without modifying the file. The watch feature (docker compose watch) enables live code reloading by syncing source code changes into running containers.

For testing and CI/CD, Compose excels at creating isolated, reproducible environments. Run integration tests against a Compose stack that mirrors production topology. Use depends_on with condition: service_healthy to ensure services start in the correct order and are ready before tests begin.
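
A trimmed compose.yaml showing that ordering pattern (service names, images, and the health check command are examples):

    services:
      db:
        image: postgres:16
        healthcheck:
          test: ["CMD-SHELL", "pg_isready -U postgres"]
          interval: 5s
          timeout: 3s
          retries: 10
      api:
        build: .
        depends_on:
          db:
            condition: service_healthy   # do not start api until the db health check passes
        environment:
          DATABASE_URL: postgres://postgres@db:5432/postgres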

Now discuss the limitations that make Compose inappropriate for most production use cases. Single-host only: Compose does not distribute containers across multiple machines, so there is no resilience to host failure. No rolling updates: docker compose up --detach recreates containers, causing brief downtime for each service. No health-based traffic management: Compose does not remove unhealthy containers from load balancing. No autoscaling: Compose does not dynamically adjust replica counts. Limited secrets management: Compose supports Docker secrets but the implementation is basic compared to production solutions.

However, be nuanced. Compose is appropriate for production in specific scenarios: single-server applications where the traffic and availability requirements are modest, internal tools that do not require high availability, and small projects where Kubernetes or Docker Swarm would be overengineering. The key question is: can you tolerate the downtime of a single-host failure and manual recovery?

For production workloads that need orchestration, discuss the graduation path. Compose defines the application topology. That same understanding of services, dependencies, and configuration translates directly to Kubernetes manifests or Helm charts. Some tools (Kompose) even convert docker-compose.yml files to Kubernetes resources, though the output usually requires significant refinement. For understanding the full orchestration landscape, see how container orchestration works and Kubernetes vs Docker Swarm.

Follow-up questions:

  • How does Docker Compose handle service dependencies and startup ordering?
  • What is the difference between Docker Compose v1 (docker-compose) and v2 (docker compose)?
  • How would you use Compose profiles to manage different environment configurations?

10. How do you debug a containerized application that works locally but fails in production?

What the interviewer is really asking: Can you systematically troubleshoot containers in environments where you may have limited access and cannot simply attach a debugger?

Answer framework:

Start with a systematic debugging methodology. Container behavior differences between environments are caused by a finite set of factors: different base images or dependencies, different configuration (environment variables, config files, mounted secrets), different resource constraints (CPU limits, memory limits), different network topology (DNS resolution, service discovery, firewall rules), and different storage (volume mounts, filesystem permissions).

Step one: reproduce the failure. Get the exact image digest running in production (not just the tag, which may have been overwritten). Pull that specific image. Gather the runtime configuration: environment variables, volume mounts, resource limits, network settings. Reconstruct the production environment as closely as possible locally. If the failure reproduces, you have a local debugging target.

Step two: examine container state. docker inspect reveals the full container configuration: mounts, network settings, environment variables, resource constraints, health check status, and restart count. docker logs shows stdout and stderr output — check for error messages, stack traces, and unexpected behavior during startup. docker stats shows real-time CPU, memory, network, and block I/O usage — compare against resource limits to identify throttling or OOM conditions.

Step three: get inside the container. docker exec -it <container> /bin/sh (or /bin/bash) gives you a shell inside the running container. From there, inspect the filesystem (are config files correct?), check network connectivity (can the container reach its dependencies?), examine processes (is the application running as expected?), and review resource usage (top, free -h, df -h). If the image has no shell (distroless), use docker debug (introduced in Docker Desktop 4.27) or build an ephemeral debug image that shares the same namespaces.
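
The commands from steps two and three, collected in one place (the container name is a placeholder):

    docker inspect api                         # full config: mounts, env, limits, health, restart count
    docker logs --tail 200 --timestamps api    # recent stdout/stderr with timestamps
    docker stats --no-stream api               # point-in-time CPU, memory, network, and block I/O
    docker exec -it api /bin/sh                # interactive shell, if the image contains one
    docker exec api env | sort                 # compare environment variables against expectations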

Step four: check the platform layer. In Kubernetes, kubectl describe pod reveals events (scheduling failures, image pull errors, probe failures, OOM kills). kubectl logs --previous shows logs from the previous container instance (useful for CrashLoopBackOff). kubectl top pod shows actual resource usage. kubectl get events --field-selector involvedObject.name=<pod-name> shows the full event history.

Step five: compare configurations. Use docker diff to see filesystem changes in a running container. Compare environment variables between local and production. Check for differences in mounted volumes and secrets. Verify DNS resolution: nslookup or dig from inside the container to confirm service discovery is working correctly.

Discuss the most common root causes. Configuration drift: environment variables or config files differ between environments. Resource constraints: the application works without CPU or memory limits locally but gets throttled or OOM-killed in production. DNS resolution: service names that resolve locally do not resolve in the production network. Filesystem permissions: the application runs as root locally but as a non-root user in production, and file permissions prevent access. Time zone differences: the container uses UTC but the application expects local time.

Share a war story pattern. At Spotify, engineers have developed internal debugging toolkits that include ephemeral debug containers, preconfigured with diagnostic tools, that can be attached to any running pod without modifying the original container image.

Follow-up questions:

  • How would you capture a heap dump from a Java application running in a container?
  • What tools would you use to trace system calls made by a containerized process?
  • How do you debug networking issues when the container has no diagnostic tools installed?

11. How does Docker handle image layer caching, and how does it affect reproducibility?

What the interviewer is really asking: Do you understand the caching semantics well enough to optimize builds without sacrificing correctness?

Answer framework:

Start with the caching rules. The legacy builder uses a straightforward algorithm: for each instruction, Docker checks if an existing layer was produced by the same instruction with the same inputs. For RUN instructions, the cache key is the instruction string itself — Docker does not execute the command to check if the output has changed. This means RUN apt-get update && apt-get install -y curl produces a cached layer even if new package versions are available. For COPY and ADD, Docker checksums the source files and invalidates the cache if any file has changed.

Once a cache miss occurs, all subsequent instructions are rebuilt. This is why instruction ordering matters so much: putting frequently-changing instructions (like COPY . .) early in the Dockerfile invalidates the cache for all following layers, including expensive operations like dependency installation.

Explain BuildKit's improvements. BuildKit uses a content-based cache rather than instruction-based. It builds a dependency graph of all instructions and only rebuilds what is actually affected by changes. BuildKit can also skip building stages that are not needed for the final output, and it parallelizes independent stages automatically.

BuildKit introduces cache mounts (--mount=type=cache) that persist across builds. Unlike layer caching, which stores the entire layer state, cache mounts persist a specific directory. This is ideal for package manager caches: the pip download cache, the npm cache, the Maven local repository. Cache mounts are not included in the final image, so they do not affect image size.

Discuss the reproducibility tension. Layer caching improves build speed but can mask reproducibility issues. A cached RUN apt-get install might produce a different result if re-run from scratch (different package versions). Strategies for reproducibility: pin package versions explicitly (apt-get install -y curl=7.88.1-10), use lock files for language-specific dependencies (package-lock.json, poetry.lock, go.sum), pin base images by digest rather than tag, and run periodic cache-busted builds in CI to verify that a clean build still works.

Cover remote caching for CI/CD environments. CI runners typically start with empty caches. BuildKit can export cache to a registry (--cache-to type=registry,ref=registry.example.com/myimage:cache) and import it in subsequent builds (--cache-from type=registry,ref=registry.example.com/myimage:cache). This makes CI builds almost as fast as local builds with warm caches. The trade-off is the time spent pushing and pulling cache layers, which is justified only when the cached layers are expensive to rebuild.
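
In a CI job this typically looks like a buildx invocation along these lines (registry, image name, and tag variable are placeholders):

    docker buildx build \
      --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
      --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
      --tag registry.example.com/myapp:${GIT_SHA} \
      --push .
    # mode=max also exports cache for intermediate stages, not just layers of the final image.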

Discuss cache invalidation strategies. Use ARG instructions to create cache-busting keys when needed (ARG CACHEBUST=1, then reference it before a RUN instruction). Use --no-cache to rebuild from scratch. Use --build-arg to inject build-time variables that naturally change (commit SHA, build number) without invalidating unrelated layers — place the ARG that references them after the expensive layers.

Follow-up questions:

  • How does the cache behave when building for multiple platforms (docker buildx build --platform linux/amd64,linux/arm64)?
  • What is the difference between inline cache and registry cache in BuildKit?
  • How would you set up a shared build cache for a team of developers?

12. How do you implement health checks in Docker, and how do they interact with orchestrators?

What the interviewer is really asking: Can you design health checking strategies that enable self-healing infrastructure without causing cascading failures?

Answer framework:

Start with Docker's HEALTHCHECK instruction. It defines a command that Docker runs periodically to determine if a container is healthy. The command returns 0 for healthy, 1 for unhealthy. Docker tracks three states: starting (within the start-period), healthy (health check passing), and unhealthy (health check failing for the configured retries). Configuration parameters: --interval (time between checks, default 30s), --timeout (maximum time for a check to complete, default 30s), --start-period (grace period for startup, default 0s), --retries (consecutive failures before marking unhealthy, default 3).

Design effective health checks. A good health check verifies that the application can serve its purpose, not just that the process is running. For a web server: HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1. But this requires curl to be installed in the image. For minimal images: write a dedicated health check binary or use the application's built-in health endpoint with wget or a language-specific HTTP client.
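
Putting the parameters together, a sketch for an image that ships wget (busybox or alpine based; the endpoint and timings are illustrative):

    HEALTHCHECK --interval=15s --timeout=3s --start-period=30s --retries=3 \
      CMD wget -q -O /dev/null http://localhost:8080/health || exit 1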

The health check should verify downstream dependencies conditionally. An HTTP health endpoint should return 200 if the application can serve requests. It should report dependency status (database connectivity, cache availability) in the response body for observability but should not fail the health check for non-critical dependency failures. If the database is down and the application cannot serve any requests, then yes, the health check should fail. But if a non-essential cache is temporarily unavailable and the application can serve requests (just more slowly), the health check should still pass.

Explain the interaction with orchestrators. In Docker Compose, depends_on with condition: service_healthy ensures a dependent service does not start until the dependency is healthy. In Docker Swarm, unhealthy containers are stopped and replaced. In Kubernetes, health checks are implemented differently: liveness probes determine if the container should be restarted, readiness probes determine if the container should receive traffic, and startup probes give slow-starting applications time to initialize.

Discuss the danger of health checks that are too aggressive. Setting a short interval and timeout with few retries can cause healthy containers to be killed during transient load spikes (when the health check endpoint is slow to respond). This leads to cascading failures: containers are killed, traffic is redistributed to fewer containers, those containers become overloaded, their health checks fail, and more containers are killed. Configure intervals and timeouts based on your application's actual performance characteristics under load.

Address health check design patterns. The graduated health check: a lightweight check runs frequently (is the process listening on the port?) while a deeper check runs less frequently (can the application execute a test query against the database?). The dependency-aware health check: report overall health as a composite of component health checks, with clear degraded states between fully healthy and unhealthy. The startup-aware health check: use a startup probe or start-period that accommodates the slowest reasonable startup time, including initial cache warming, migration checks, and leader election.

For understanding how health checks drive orchestration behavior at scale, see how Kubernetes works and explore Netflix's architecture for health checking patterns at massive scale.

Follow-up questions:

  • How would you implement a health check for a WebSocket server that does not serve HTTP?
  • What is the difference between a liveness probe and a readiness probe in Kubernetes, and why do you need both?
  • How can overly aggressive health checks cause cascading failures in a microservices architecture?

13. Explain Docker content trust and image signing — how do you verify image integrity?

What the interviewer is really asking: Do you understand software supply chain security and can you implement image verification in a CI/CD pipeline?

Answer framework:

Start with the problem. When you pull an image from a registry, how do you know it has not been tampered with? The registry could be compromised. The network could be intercepted (even with TLS, a compromised CA is possible). The tag could have been overwritten with a different image. Without verification, you are trusting the registry to serve the correct image — and in a defense-in-depth security model, you should not trust any single component.

Explain Docker Content Trust (DCT), which is based on The Update Framework (TUF) and Notary. When DCT is enabled (DOCKER_CONTENT_TRUST=1), Docker signs images during push and verifies signatures during pull. The publisher generates a signing key, signs the image manifest, and uploads the signature to a Notary server. When a consumer pulls the image, Docker verifies the signature against the publisher's public key before proceeding.

DCT uses two types of keys. The root key (offline key) is the ultimate trust anchor. It should be stored in a hardware security module (HSM) or air-gapped machine. The repository key (signing key) is used for day-to-day signing. It can be delegated to CI/CD systems for automated signing. Key rotation is supported: if a signing key is compromised, the root key can revoke it and issue a new one.

Discuss the modern alternative: Sigstore and Cosign. Cosign is simpler than DCT, supports keyless signing (using OIDC identity providers for ephemeral keys), and stores signatures as OCI artifacts alongside the image in the registry (no separate Notary server needed). Keyless signing works by: the signer authenticates with an OIDC provider (Google, GitHub, Microsoft), Sigstore's Fulcio CA issues a short-lived certificate binding the OIDC identity to a signing key, the image is signed with this ephemeral key, and the signing event is recorded in Rekor (a transparency log).

For production implementation, describe an end-to-end pipeline. CI/CD builds the image. Vulnerability scanning runs (Trivy, Grype). If the scan passes policy, the image is signed with Cosign using the CI system's OIDC identity (GitHub Actions workload identity, GitLab CI OIDC token). The signature is pushed to the registry. An admission controller in the Kubernetes cluster (Sigstore Policy Controller, Kyverno with Cosign verification, or OPA Gatekeeper with a Cosign plugin) verifies the signature before allowing the pod to run. Unsigned or incorrectly signed images are rejected.
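
The signing and verification steps might look roughly like this with Cosign's keyless flow; the image reference, identity, and issuer are placeholders, and exact flags vary between Cosign versions:

    # In CI, after the image is pushed (uses the runner's OIDC identity; --yes skips the interactive prompt):
    cosign sign --yes registry.example.com/myapp@sha256:<digest>

    # At deploy or admission time, verify against the expected identity and issuer:
    cosign verify \
      --certificate-identity-regexp 'https://github.com/myorg/myapp/.*' \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      registry.example.com/myapp@sha256:<digest>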

Address SBOM attestations. Beyond signing, Cosign supports attaching attestations to images: SBOM attestations (what components are in the image), SLSA provenance attestations (where and how the image was built), and vulnerability scan attestations (the scan results at build time). These attestations are signed with the same key, providing a verified chain of evidence about the image's origin and contents.

Discuss the organizational challenges. Key management is the hardest part. Who controls the root key? How are signing keys distributed to CI/CD systems? What is the process when a key is compromised? Keyless signing with Sigstore eliminates most key management complexity, which is why it is becoming the industry standard. Companies like Google use binary authorization (based on similar principles) to ensure only verified images run in production.

Follow-up questions:

  • What happens if the Notary server or Rekor transparency log is unavailable during a pull?
  • How do you handle image verification in air-gapped environments without internet access?
  • What is SLSA (Supply-chain Levels for Software Artifacts) and how does it relate to image signing?

14. How do you manage Docker in production — daemon configuration, resource management, and garbage collection?

What the interviewer is really asking: Have you operated Docker at scale and dealt with the operational challenges that come with long-running Docker hosts?

Answer framework:

Start with daemon configuration. The Docker daemon is configured via /etc/docker/daemon.json. Critical production settings: log-driver and log-opts to configure log rotation (prevent disk exhaustion from container logs), storage-driver (overlay2 is the standard), default-address-pools to control the IP ranges assigned to Docker networks (prevent conflicts with corporate VPN or infrastructure subnets), live-restore: true to keep containers running during daemon restarts (critical for minimizing disruption during Docker upgrades), and default-ulimits to set process, file descriptor, and memory lock limits for all containers.
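
A daemon.json pulling those settings together might look like this; the values are starting points to tune, not prescriptions:

    {
      "log-driver": "json-file",
      "log-opts": { "max-size": "10m", "max-file": "3" },
      "storage-driver": "overlay2",
      "live-restore": true,
      "default-address-pools": [
        { "base": "10.200.0.0/16", "size": 24 }
      ],
      "default-ulimits": {
        "nofile": { "Name": "nofile", "Soft": 65536, "Hard": 65536 }
      }
    }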

Discuss resource management at the daemon level. Configure default CPU and memory limits using default-runtime options or runtime-specific configuration. Set container storage quotas to prevent individual containers from consuming all available disk. Configure the OOM score adjustment to control which processes the kernel kills first during memory pressure — Docker containers should generally be killed before critical system services.

Address garbage collection, which is the most common operational issue with Docker hosts. Docker accumulates unused data over time: stopped containers, unused images, dangling images (layers not referenced by any tagged image), unused volumes, and build cache. Without cleanup, Docker can consume all available disk space.

Automate garbage collection with docker system prune. Run docker system prune --filter until=24h daily via cron or systemd timer to remove containers, networks, and images not used in the last 24 hours. For build cache, run docker builder prune --keep-storage=5GB to cap build cache size. For volumes, docker volume prune removes only anonymous volumes — named volumes are never automatically removed to prevent data loss.
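
As a sketch, a nightly cron entry for this cleanup (schedule and retention are arbitrary; --force is needed because cron is non-interactive):

    # /etc/cron.d/docker-gc
    0 3 * * * root docker system prune --force --filter "until=24h" && docker builder prune --force --keep-storage=5GB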

For orchestrated environments, discuss image garbage collection policy. The kubelet has built-in image garbage collection: imageGCHighThresholdPercent (default 85%) triggers cleanup, and imageGCLowThresholdPercent (default 80%) is the target after cleanup. It removes the least recently used images first. For Docker hosts not managed by Kubernetes, implement a similar policy using a scheduled script that identifies images not referenced by any running container and older than a configurable retention period.

Cover monitoring and alerting. Monitor disk usage on /var/lib/docker/ (or wherever Docker stores its data). Alert when usage exceeds 70% to allow proactive cleanup. Monitor Docker daemon health: docker info shows storage driver status, number of containers and images, and logging driver. Monitor container restart counts — a high restart count indicates crashlooping containers that may consume excessive resources through rapid image pulls and container creation.

Discuss security configuration. Enable user namespace remapping for defense-in-depth. Configure TLS for remote Docker API access. Use authorization plugins to control who can perform which Docker operations. Enable audit logging to track Docker API calls.

For production Docker infrastructure that feeds into container orchestration, these daemon-level concerns multiply across every node in the cluster.

Follow-up questions:

  • How would you handle a Docker host that has run out of disk space and cannot start new containers?
  • What is the impact of daemon restarts on running containers, and how does live-restore mitigate it?
  • How do you upgrade Docker on production hosts without downtime?

15. How does Docker compare to alternative container runtimes, and when would you choose something other than Docker?

What the interviewer is really asking: Do you understand the container ecosystem beyond Docker, and can you make informed technology choices?

Answer framework:

Start with the current container runtime landscape. Docker is a complete platform: it includes image building (BuildKit), image distribution (registry interaction), container runtime (containerd plus runc), networking (libnetwork), and orchestration (Docker Compose, formerly Swarm). However, each of these functions has specialized alternatives that may be better suited for specific use cases.

Podman is the most prominent Docker alternative. It is daemonless (no long-running root process), rootless by default, and CLI-compatible with Docker (alias docker=podman works for most commands). Podman uses the same OCI standards, so images built with Docker run identically on Podman and vice versa. Podman's pod concept (grouping containers that share namespaces, inspired by Kubernetes pods) makes it natural to develop for Kubernetes locally. For a detailed comparison, see Docker vs Podman.

Containerd is the container runtime that Docker uses internally, but it can be used directly without Docker. Kubernetes migrated from Docker (dockershim) to containerd as the default runtime. Using containerd directly reduces the runtime stack by removing the Docker daemon layer. For Kubernetes nodes, this means lower memory usage and fewer components to maintain. The trade-off is that containerd does not provide build tooling or the developer-friendly CLI that Docker offers.

CRI-O is a lightweight container runtime designed specifically for Kubernetes. It implements the Container Runtime Interface (CRI) with minimal additional functionality. CRI-O uses runc (or alternative OCI runtimes) for container execution and supports the same images as Docker and Podman. Red Hat OpenShift uses CRI-O as its default runtime.

For image building, discuss alternatives to Docker build. Kaniko builds images inside containers without requiring a Docker daemon or privileged access — ideal for CI/CD systems where mounting the Docker socket is a security concern. Buildah is the build component from the Podman ecosystem — it builds OCI-compliant images using Dockerfiles or a scriptable API, without a daemon. Both Kaniko and Buildah produce images identical to those built by Docker.

Discuss specialized runtimes for enhanced security. gVisor (by Google) interposes a user-space kernel between the container and the host kernel, intercepting system calls and reducing the attack surface. It introduces performance overhead (especially for I/O-heavy workloads) but provides much stronger isolation than standard containers. Kata Containers run each container inside a lightweight VM, providing hardware-level isolation comparable to virtual machines with container-like density and speed. Firecracker (by Amazon) is a micro-VM hypervisor designed for serverless and container workloads, providing VM-level isolation with millisecond startup times.

Provide decision guidance. For local development: Docker Desktop or Podman Desktop — familiarity and tooling ecosystem matter most. For Kubernetes nodes: containerd or CRI-O — lighter, purpose-built, and sufficient. For CI/CD image building: Kaniko (in Kubernetes), Buildah (on Linux hosts), or BuildKit with remote caching. For multi-tenant environments requiring strong isolation: gVisor or Kata Containers. For serverless platforms: Firecracker.

The key insight: the OCI (Open Container Initiative) standards — image format, runtime specification, distribution specification — mean that the ecosystem is interoperable. Images built with Docker run on Podman. Images built with Buildah run on containerd. The choice of runtime is an operational decision, not an application design decision.

Follow-up questions:

  • What are the performance implications of gVisor compared to standard runc containers?
  • How did the Kubernetes deprecation of dockershim affect Docker users?
  • What is the OCI runtime specification, and why is it important for the container ecosystem?

Common Mistakes in Docker Interviews

  1. Treating Docker as a virtual machine. Candidates who describe containers as lightweight VMs reveal a fundamental misunderstanding. Containers are isolated processes sharing the host kernel, not machines. This distinction affects security (kernel vulnerabilities affect all containers), performance (no hypervisor overhead but also no hardware isolation), and operations (containers should be ephemeral and stateless by default).

  2. Ignoring image security and supply chain concerns. Building FROM ubuntu:latest in production, never scanning images, and storing secrets in Dockerfiles are common anti-patterns. Senior candidates should proactively discuss supply chain security, image scanning, and secret management without being prompted.

  3. Not understanding the build cache. Candidates who cannot explain why changing a single source file rebuilds all dependencies, or who do not know that RUN apt-get update results are cached indefinitely, have not operated Docker build pipelines at scale. Understanding cache behavior is essential for maintaining fast, correct CI/CD pipelines.

  4. Overlooking resource limits and their kernel-level effects. Many candidates set CPU and memory limits without understanding CFS throttling or OOM kill behavior. In production, misconfigured resource limits are a top cause of performance degradation and unexpected container restarts.

  5. Conflating Docker with container orchestration. Docker Compose is not a production orchestrator. Docker Swarm is effectively deprecated. Senior candidates should clearly articulate when Docker alone is sufficient and when an orchestrator like Kubernetes is needed, rather than using the terms interchangeably.


How to Prepare for Docker Interviews

Build images from scratch — literally. Start FROM scratch and add only the binary and its dependencies. This exercise forces you to understand every layer, every dependency, and every configuration that your application needs. You will learn more from building a scratch-based image than from reading a hundred tutorials.

Study the kernel primitives. Experiment with namespaces and cgroups directly using unshare and cgcreate commands. Create a container manually without Docker. This gives you the vocabulary and mental model to debug container issues from first principles rather than searching Stack Overflow for error messages.
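
A rough sketch of that exercise (requires root; the rootfs directory and image name are examples):

    mkdir rootfs
    docker export "$(docker create alpine)" | tar -xC rootfs   # unpack an image's filesystem
    # Start a shell in fresh PID, mount, UTS, IPC, and network namespaces, rooted at ./rootfs.
    sudo unshare --pid --fork --mount --uts --ipc --net chroot rootfs /bin/sh
    # Inside, mount /proc so tools like ps see only this namespace's processes:
    #   mount -t proc proc /proc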

Optimize a real build pipeline. Take an existing project's Dockerfile, measure its build time and image size, and systematically optimize both. Track the before and after metrics. This exercise gives you concrete examples to discuss in interviews and develops intuition about caching, layering, and multi-stage builds.

Read CVE reports for container escapes. Understanding vulnerabilities like CVE-2019-5736 (runc container escape via /proc/self/exe), CVE-2020-15257 (containerd-shim API exposed to host-network containers over an abstract Unix socket), and CVE-2024-21626 (runc working directory file descriptor escape) teaches you what the security boundaries actually are and why defense-in-depth matters.

Practice debugging containers without a shell. Distroless images and scratch-based images have no shell, no package manager, and no diagnostic tools. Learn to use docker cp, docker export, nsenter, and strace from the host to investigate running containers. In production, you rarely have the luxury of installing debugging tools.

Understand the ecosystem. Know what Podman, Buildah, Kaniko, containerd, CRI-O, gVisor, and Kata Containers are, what problems they solve, and when you would choose each. Interviewers expect senior candidates to have a broader perspective than just Docker.

For a structured study plan covering Docker and the broader container ecosystem, explore our learning paths and see our pricing for full access to all preparation materials.

