
PostgreSQL Interview Questions for Senior Engineers (2026)

Master advanced PostgreSQL interview questions covering indexing strategies, query optimization, replication, partitioning, MVCC internals, and production-grade database design for senior engineering roles.

20 min read · Updated Apr 19, 2026

Why PostgreSQL Matters in Senior Engineering Interviews

PostgreSQL has become the default relational database for modern applications, powering everything from early-stage startups to enterprise platforms processing billions of transactions daily. Companies like Instagram, Stripe, and Notion rely heavily on PostgreSQL, and interviewers at these companies expect senior engineers to understand not just SQL syntax but the internal mechanics that determine whether a query takes 2 milliseconds or 20 seconds.

At the senior level, PostgreSQL questions go far beyond writing SELECT statements. Interviewers want to see that you can design schemas that scale, diagnose production performance issues under pressure, choose the right indexing strategy for complex query patterns, and make informed decisions about replication topologies. They want engineers who understand why PostgreSQL behaves a certain way, not just how to use it.

This guide covers the most frequently asked PostgreSQL interview questions at companies like Google, Amazon, and Stripe, with structured answer frameworks that demonstrate senior-level thinking. Each question includes the hidden intent behind what the interviewer is really testing, along with follow-up questions you should be prepared for. For broader database concepts, see our ACID properties deep dive and CAP theorem explainer.


1. Explain MVCC in PostgreSQL and how it differs from lock-based concurrency control.

What the interviewer is really asking: Do you understand PostgreSQL's fundamental concurrency model? Can you reason about visibility rules, vacuum, and the performance implications of long-running transactions?

Answer framework:

Multi-Version Concurrency Control (MVCC) is PostgreSQL's approach to handling concurrent access without readers blocking writers or writers blocking readers. Instead of acquiring locks on rows during reads, PostgreSQL maintains multiple versions of each row and uses transaction visibility rules to determine which version each transaction can see.

Every row in PostgreSQL has hidden system columns: xmin (the transaction ID that created this row version) and xmax (the transaction ID that deleted or updated this row version, creating a new version). When a transaction reads a row, PostgreSQL checks whether xmin is committed and visible to the current transaction, and whether xmax is either not set or not yet committed from the current transaction's perspective.
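
To make this concrete, the hidden columns can be inspected directly. A minimal sketch, assuming a hypothetical accounts table:

    -- xmin and xmax are hidden system columns, selectable by name
    SELECT xmin, xmax, id, balance
    FROM accounts
    WHERE id = 1;

    -- While another session holds an uncommitted UPDATE on this row,
    -- xmax on the old version is set to that session's transaction ID,
    -- but this session still sees the original version.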

The key advantage over lock-based systems is concurrency. A SELECT never blocks an UPDATE, and an UPDATE never blocks a SELECT. Each transaction sees a consistent snapshot of the database taken at its first statement (REPEATABLE READ) or at the start of each statement (READ COMMITTED, the default).

However, MVCC introduces its own challenges. Dead tuples accumulate as updates create new row versions while old versions remain until no transaction needs them. This is why VACUUM is essential — it reclaims space from dead tuples. Autovacuum handles this automatically, but misconfigured autovacuum settings are one of the most common causes of PostgreSQL performance degradation in production.

A critical production issue is transaction ID wraparound. PostgreSQL uses 32-bit transaction IDs, which means after roughly 2 billion transactions, IDs wrap around. PostgreSQL must freeze old transaction IDs before this happens. If autovacuum falls behind, PostgreSQL will eventually shut down to prevent data corruption — a scenario every senior engineer should know how to prevent.
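
A common way to monitor wraparound risk is to watch the age of each database's oldest unfrozen transaction ID. A rough sketch:

    -- Databases closest to wraparound first
    SELECT datname,
           age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;

    -- Alert well before age(datfrozenxid) approaches ~2 billion;
    -- autovacuum begins aggressive freezing at autovacuum_freeze_max_age
    -- (200 million by default).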

Compared to MySQL's InnoDB (which also uses MVCC but stores undo information in a separate undo log), PostgreSQL stores old row versions directly in the table's heap, which can lead to table bloat but simplifies the implementation and avoids undo log contention.

Follow-up questions:

  • How would you tune autovacuum for a table with 100 million rows receiving 50,000 updates per second?
  • What is the difference between VACUUM and VACUUM FULL, and when would you use each?
  • How does HOT (Heap-Only Tuple) optimization reduce the overhead of updates?

2. How do you choose between B-tree, GIN, GiST, and BRIN indexes?

What the interviewer is really asking: Can you match index types to real query patterns? Do you understand the storage and maintenance trade-offs of each index type?

Answer framework:

PostgreSQL offers multiple index types, and choosing the right one is critical for query performance.

B-tree is the default and handles equality and range queries on scalar values. It works best for columns with high selectivity (many distinct values). B-tree indexes support ordering and can satisfy ORDER BY without a separate sort step. Use B-tree for primary keys, foreign keys, and columns frequently used in WHERE clauses with =, <, >, BETWEEN.

GIN (Generalized Inverted Index) is designed for composite values where you need to search for elements within a value. It excels at full-text search (tsvector), JSONB containment queries (@>), and array element searches. GIN indexes are slower to build and update but very fast for lookups. For a table storing JSONB documents, a GIN index on the JSONB column lets you efficiently query WHERE data @> '{"status": "active"}'.

GiST (Generalized Search Tree) handles spatial data (PostGIS geometry), range types, and full-text search. GiST indexes are lossy — they may return false positives that PostgreSQL rechecks against the heap. They support nearest-neighbor queries (ORDER BY distance) which B-tree cannot do. Use GiST for geographic queries, IP range lookups, and overlapping range queries.

BRIN (Block Range Index) is extremely compact and works best for naturally ordered data like timestamps in append-only tables. BRIN stores min/max values per block range (groups of pages). For a time-series table where rows are inserted in chronological order, a BRIN index on the timestamp column might be 1000x smaller than a B-tree index while providing comparable query performance for range scans.

A real-world example: for an e-commerce product search, you might use a B-tree on product_id, a GIN index on a tags array column for filtering, a GiST index on a location column for proximity searches, and a BRIN index on created_at for time-range queries. See our PostgreSQL vs MySQL comparison for how indexing strategies differ across databases.
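
A sketch of how these index types might coexist; the table and index names here are illustrative:

    CREATE TABLE products (
        product_id bigint PRIMARY KEY,     -- B-tree, created implicitly
        tags       text[],
        location   point,
        created_at timestamptz NOT NULL
    );

    CREATE INDEX products_tags_gin      ON products USING GIN  (tags);        -- WHERE tags @> ARRAY['sale']
    CREATE INDEX products_location_gist ON products USING GIST (location);    -- ORDER BY location <-> point '(0,0)'
    CREATE INDEX products_created_brin  ON products USING BRIN (created_at);  -- time-range scans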

Follow-up questions:

  • When would a partial index be more effective than a full index?
  • How do you use EXPLAIN ANALYZE to verify that PostgreSQL is actually using your index?
  • What is index-only scan and what conditions must be met for PostgreSQL to use one?

3. Walk me through how you would diagnose and fix a slow query in production.

What the interviewer is really asking: Do you have a systematic approach to performance debugging? Can you read execution plans and identify the root cause rather than guessing?

Answer framework:

Start with EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) on the problematic query. This gives you the actual execution plan with real timings, not just estimates.
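
For example, against a hypothetical orders table (BUFFERS adds the cache-hit figures discussed below):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT id, total
    FROM orders
    WHERE customer_id = 42
      AND created_at >= now() - interval '30 days';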

Read the plan bottom-up, looking for the nodes with the highest actual time. Key things to watch for:

  1. Sequential scans on large tables — the planner chose not to use an index. Check if an appropriate index exists, if the table statistics are stale (run ANALYZE), or if the query is fetching too many rows for an index scan to be worthwhile.

  2. Nested loop joins with large outer tables — can indicate a missing index on the join column of the inner table. PostgreSQL defaults to nested loops when the outer table is small, but if row estimates are wrong, it may choose nested loops inappropriately.

  3. Large sort operations spilling to disk — check Sort Method: external merge vs Sort Method: quicksort. If sorting spills to disk, increase work_mem for the session or add an index that provides pre-sorted output.

  4. Rows removed by filter vs actual rows — a large discrepancy means PostgreSQL is reading far more rows than it returns. This often indicates a missing index or a composite index where the column order doesn't match the query pattern.

  5. Buffer usage — shared hit means the page was in the buffer cache; shared read means it went to disk. High read counts indicate poor cache utilization.

Beyond EXPLAIN, check pg_stat_statements for aggregate query statistics, pg_stat_user_tables for sequential scan counts and dead tuple ratios, and pg_locks for lock contention.
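
A typical starting query against pg_stat_statements (column names as of PostgreSQL 13+; the extension must be loaded via shared_preload_libraries):

    SELECT calls,
           round(total_exec_time::numeric, 1) AS total_ms,
           round(mean_exec_time::numeric, 2)  AS mean_ms,
           left(query, 80)                    AS query
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 10;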

A common production scenario: a query that was fast for months suddenly becomes slow. This often indicates bloat (autovacuum falling behind), statistics drift (data distribution changed), or a plan change (new planner choice after ANALYZE). Pinning a plan with pg_hint_plan or creating a more selective index usually resolves it.

For a broader perspective on database performance in distributed systems, see our guide on how caching works.

Follow-up questions:

  • How does PostgreSQL's query planner decide between a sequential scan and an index scan?
  • What is the role of random_page_cost and effective_cache_size in plan selection?
  • How would you handle a query that performs well in testing but poorly in production?

4. How would you design a PostgreSQL schema for a multi-tenant SaaS application?

What the interviewer is really asking: Can you balance data isolation, query performance, operational complexity, and cost in a multi-tenancy architecture?

Answer framework:

There are three primary approaches to multi-tenancy in PostgreSQL:

Shared table with tenant_id column — All tenants share the same tables, distinguished by a tenant_id column. This is the most common approach. Add tenant_id as the leading column in every index and use Row-Level Security (RLS) policies to enforce isolation. PostgreSQL's RLS lets you define policies like CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')). The application sets this session variable on each request.

Pros: simple operations, easy migrations, efficient resource utilization. Cons: noisy neighbor risk (one large tenant's queries affect others), all data in one table means larger indexes.
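
A minimal sketch of the RLS setup described above, assuming an orders table with a bigint tenant_id and an app.tenant_id session setting:

    ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

    CREATE POLICY tenant_isolation ON orders
        USING (tenant_id = current_setting('app.tenant_id')::bigint);

    -- Per request, before running queries:
    SET app.tenant_id = '42';
    SELECT * FROM orders;  -- only tenant 42's rows are visible

    -- Note: policies do not apply to the table owner unless you also run
    -- ALTER TABLE orders FORCE ROW LEVEL SECURITY;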

Schema per tenant — Each tenant gets their own PostgreSQL schema within the same database. Tables are identical across schemas. The application sets search_path per request. This provides better logical isolation and makes it easier to handle tenant-specific customizations.

Pros: cleaner isolation, per-tenant backup/restore possible, easier to reason about. Cons: schema migrations must be applied to every schema, connection pooling is more complex (PgBouncer needs careful configuration), hundreds of schemas can slow down pg_dump.

Database per tenant — Maximum isolation but highest operational overhead. Only justified for compliance requirements (healthcare, finance) or when tenants need completely different configurations.

For most SaaS applications, the shared table approach with RLS is the right starting point. As you scale, you can shard large tenants to dedicated infrastructure while keeping small tenants on shared tables.

Key design considerations: use composite primary keys (tenant_id, entity_id) so partition pruning works efficiently. Consider table partitioning by tenant_id for very large deployments. Implement connection pooling with PgBouncer in transaction mode. Monitor per-tenant query patterns to identify noisy neighbors.

This connects directly to broader system design patterns and backend development principles.

Follow-up questions:

  • How would you handle schema migrations across 10,000 tenant schemas?
  • What are the trade-offs of using PostgreSQL partitioning for multi-tenancy?
  • How would you implement tenant-specific rate limiting at the database level?

5. Explain PostgreSQL's WAL (Write-Ahead Log) and its role in replication and crash recovery.

What the interviewer is really asking: Do you understand how PostgreSQL ensures durability and how replication actually works under the hood?

Answer framework:

The Write-Ahead Log (WAL) is the foundation of PostgreSQL's durability and replication. The core principle is simple: before any change is written to the actual data files (heap, indexes), it must first be written to the WAL. If PostgreSQL crashes, it replays the WAL from the last checkpoint to reconstruct any changes that were committed but not yet flushed to data files.

The WAL is a sequence of records stored in 16MB segment files (by default) in pg_wal/. Each WAL record describes a physical change to a data page. A checkpoint is a point where PostgreSQL guarantees all data pages are consistent on disk — during recovery, replay starts from the last checkpoint.

Key configuration parameters: wal_level controls how much information is written (minimal, replica, or logical). max_wal_size controls how much WAL accumulates before a checkpoint is triggered. synchronous_commit controls whether commits wait for WAL to be flushed to disk (on) or return before flush (off — faster but risks losing the last few transactions on crash).

Streaming Replication uses the WAL directly. The primary streams WAL records to standby servers in real-time. Standbys apply these records to maintain an almost-identical copy. synchronous_standby_names configures synchronous replication where the primary waits for at least one standby to confirm WAL receipt before acknowledging a commit. This guarantees zero data loss (RPO=0) at the cost of higher commit latency.
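
As a sketch, quorum-based synchronous replication can be configured like this (the standby names are illustrative):

    -- Wait for any one of two named standbys to confirm each commit
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)';
    SELECT pg_reload_conf();

    -- Per-session escape hatch for non-critical writes:
    SET synchronous_commit = 'local';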

Logical Replication (introduced in PostgreSQL 10) decodes WAL records into logical changes (INSERT, UPDATE, DELETE operations) and publishes them. This enables selective table replication, cross-version replication, and replication to different database systems.

Production considerations: monitor pg_stat_replication for replication lag. Set wal_keep_size to prevent WAL segments from being recycled before standbys consume them. Use replication slots to guarantee WAL retention but monitor slot lag — an inactive slot will cause WAL to accumulate indefinitely and fill the disk.
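
Two monitoring queries worth knowing (a sketch; alert thresholds depend on your workload):

    -- Lag per connected standby
    SELECT client_addr, state, write_lag, flush_lag, replay_lag
    FROM pg_stat_replication;

    -- WAL retained by each replication slot; an inactive slot keeps growing
    SELECT slot_name, active,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)::bigint) AS retained_wal
    FROM pg_replication_slots;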

This connects to how distributed databases work and consistency models.

Follow-up questions:

  • What happens if a synchronous standby goes down? How do you prevent the primary from hanging?
  • How would you set up point-in-time recovery (PITR) using WAL archiving?
  • What is the difference between physical and logical replication, and when would you choose each?

6. How does PostgreSQL handle table partitioning, and when should you use it?

What the interviewer is really asking: Can you identify when partitioning helps versus when it adds unnecessary complexity? Do you understand the operational implications?

Answer framework:

PostgreSQL supports declarative partitioning (since version 10) with three strategies: range, list, and hash.

Range partitioning divides data by value ranges — most commonly used for time-series data. A table partitioned by month allows queries with a date filter to scan only relevant partitions (partition pruning). Example: CREATE TABLE events (id bigint, created_at timestamptz, data jsonb) PARTITION BY RANGE (created_at), then create monthly partitions.
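
Spelled out, with illustrative partition bounds (the upper bound is exclusive):

    CREATE TABLE events (
        id         bigint,
        created_at timestamptz NOT NULL,
        data       jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2026_04 PARTITION OF events
        FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

    -- Retention is a metadata operation, not a DELETE:
    ALTER TABLE events DETACH PARTITION events_2026_04;
    DROP TABLE events_2026_04;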

List partitioning divides data by discrete values — useful for multi-tenant tables partitioned by tenant_id or geographic data partitioned by region.

Hash partitioning distributes data evenly across a fixed number of partitions — useful when you need to spread write load but don't have a natural range or list key.

Partitioning helps when: the table exceeds hundreds of millions of rows, queries consistently filter on the partition key, you need to efficiently drop old data (detach and drop a partition is instant vs DELETE which creates dead tuples), or you need to maintain different storage policies per partition (recent data on fast storage, old data on cheap storage).

Partitioning hurts when: queries don't filter on the partition key (cross-partition queries can be slower), you have fewer than 10 million rows (overhead isn't worth it), or you have many partitions with foreign key constraints (PostgreSQL's FK support with partitioned tables has limitations).

Operational considerations: automate partition creation (use pg_partman), monitor for queries that aren't pruning partitions (check EXPLAIN output for Append nodes scanning all partitions), and be aware that unique constraints must include the partition key.

Follow-up questions:

  • How would you migrate a large existing table to a partitioned table with minimal downtime?
  • What are the limitations of partitioning with foreign keys?
  • How does partition pruning work at plan time vs execution time?

7. How do you implement and manage database migrations in a zero-downtime deployment?

What the interviewer is really asking: Do you understand which DDL operations lock tables and how to restructure schema changes to avoid downtime?

Answer framework:

The fundamental challenge is that many DDL operations in PostgreSQL acquire an ACCESS EXCLUSIVE lock, which blocks all reads and writes. For a table with active queries, even a brief lock acquisition can cause a cascade of blocked connections.

Safe operations (no or minimal locking): CREATE INDEX CONCURRENTLY (builds the index without holding a lock that blocks writes), ADD COLUMN with no default or with a non-volatile default (since PostgreSQL 11, adding a column with a constant default is instant), DROP COLUMN (marks column as dropped but doesn't rewrite the table).

Dangerous operations requiring careful handling: ALTER COLUMN TYPE (rewrites the entire table), ADD COLUMN with a volatile DEFAULT (always rewrites the table; before PostgreSQL 11, any default triggered a rewrite), ADD NOT NULL constraint (requires scanning all rows), CREATE INDEX without CONCURRENTLY (locks writes for the duration).

The expand-contract pattern for safe migrations:

  1. Expand: Add new column/table without removing old. Deploy application code that writes to both old and new structures. Backfill historical data.
  2. Contract: Once all data is migrated and the application only reads from the new structure, remove the old column/table.

For adding a NOT NULL constraint safely: first add a CHECK constraint with NOT VALID (doesn't scan existing rows), then VALIDATE CONSTRAINT in a separate transaction (acquires a weaker SHARE UPDATE EXCLUSIVE lock and scans rows without blocking writes).

Always set lock_timeout before running migrations: SET lock_timeout = '3s'. If the lock can't be acquired within 3 seconds (because of long-running queries), the migration fails instead of queuing behind and blocking all subsequent queries.
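
Putting the pieces together, a hedged sketch of a safe migration script against a hypothetical orders table:

    SET lock_timeout = '3s';  -- fail fast instead of queueing behind long queries

    -- Step 1: add the constraint without scanning existing rows
    ALTER TABLE orders
        ADD CONSTRAINT orders_total_not_null CHECK (total IS NOT NULL) NOT VALID;

    -- Step 2 (separate transaction): scans rows without blocking writes
    ALTER TABLE orders VALIDATE CONSTRAINT orders_total_not_null;

    -- Indexes: CONCURRENTLY avoids blocking writes (cannot run inside a transaction)
    CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);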

See our backend development guide for broader deployment patterns.

Follow-up questions:

  • How would you rename a column without downtime?
  • What is the difference between SHARE UPDATE EXCLUSIVE and ACCESS EXCLUSIVE locks?
  • How do you handle migrations that need to backfill data in a table with 500 million rows?

8. Explain connection pooling in PostgreSQL and when you would use PgBouncer vs built-in pooling.

What the interviewer is really asking: Do you understand why PostgreSQL's process-per-connection model creates scalability limits and how to architect around it?

Answer framework:

PostgreSQL forks a new backend process for every client connection. Each process consumes memory (typically 5-10MB of RSS per connection), and context switching between hundreds of processes degrades performance. A common rule of thumb is that PostgreSQL performs best with a connection count close to (2 * CPU cores) + number of disks for OLTP workloads.

PgBouncer is an external connection pooler that sits between the application and PostgreSQL. It maintains a pool of server connections and multiplexes client connections onto them. Three pooling modes:

Transaction pooling — the most common mode. A server connection is assigned to a client for the duration of a transaction, then returned to the pool. This allows 1000+ application connections to share 50 PostgreSQL connections. Limitation: you cannot use session-level features like prepared statements (without protocol-level workarounds), LISTEN/NOTIFY, or SET commands.

Session pooling — a server connection is assigned for the entire client session. This supports all PostgreSQL features but provides less multiplexing benefit.

Statement pooling — a server connection is assigned per statement. Maximum multiplexing but doesn't support multi-statement transactions.

PostgreSQL itself has no built-in connection pooler, so an external pooler is standard practice. Alternatives to PgBouncer include pgpool-II (which also provides load balancing and replication management) and Supabase's Supavisor (built in Elixir for cloud-native deployments).

For most production deployments, PgBouncer in transaction mode with 50-100 server connections handles thousands of concurrent application connections efficiently. Place PgBouncer on the same host as PostgreSQL to minimize latency, or run it as a sidecar in Kubernetes.
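
PgBouncer exposes an admin console that speaks a SQL-like protocol; connecting to the special pgbouncer database lets you inspect pool health. A sketch:

    -- psql -p 6432 -U pgbouncer pgbouncer
    SHOW POOLS;   -- cl_waiting > 0 means clients are queueing for a server connection
    SHOW STATS;   -- per-database query and transaction throughput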

This connects to how companies like Stripe and Instagram scale their PostgreSQL infrastructure.

Follow-up questions:

  • How would you handle prepared statements with PgBouncer in transaction mode?
  • What monitoring would you set up for a PgBouncer deployment?
  • When would pgpool-II be a better choice than PgBouncer?

9. How would you implement full-text search in PostgreSQL, and when would you reach for Elasticsearch instead?

What the interviewer is really asking: Can you make pragmatic technology choices? Do you know PostgreSQL's search capabilities and their limits?

Answer framework:

PostgreSQL has robust built-in full-text search. The core components are tsvector (a processed document representation with lexemes and positions) and tsquery (a search query with boolean operators). Use to_tsvector('english', document) to create the vector and to_tsquery('english', 'search & terms') to create the query. The @@ operator matches them.

For performance, create a GIN index on the tsvector column. You can also use a stored generated column: ALTER TABLE articles ADD COLUMN search_vector tsvector GENERATED ALWAYS AS (to_tsvector('english', coalesce(title,'') || ' ' || coalesce(body,''))) STORED and index it.
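
End to end, the pattern looks like this, assuming a hypothetical articles table:

    ALTER TABLE articles ADD COLUMN search_vector tsvector
        GENERATED ALWAYS AS (
            to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
        ) STORED;

    CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

    -- Ranked search
    SELECT id, title, ts_rank(search_vector, query) AS rank
    FROM articles, to_tsquery('english', 'index & strategy') AS query
    WHERE search_vector @@ query
    ORDER BY rank DESC
    LIMIT 10;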

PostgreSQL FTS supports: multiple languages with stemming, phrase search ('quick <-> brown <-> fox'), prefix matching ('search:*'), ranking with ts_rank and ts_rank_cd, custom dictionaries and stop words, and highlighting with ts_headline.

Choose PostgreSQL FTS when: the dataset is under 10-50 million documents, the search requirements are straightforward, you want to keep your stack simple (no additional infrastructure), or you need transactional consistency between search and other data.

Choose Elasticsearch when: you need fuzzy matching and typo tolerance, you need faceted search with complex aggregations, the dataset exceeds 100 million documents, you need real-time analytics on the search data, or you need features like "did you mean" suggestions and autocomplete at scale.

A common pattern is to start with PostgreSQL FTS and migrate to Elasticsearch only when you hit its limitations. See our Elasticsearch deep dive and PostgreSQL vs Elasticsearch comparison for more details.

Follow-up questions:

  • How would you implement search suggestions/autocomplete in PostgreSQL?
  • What is the performance impact of GIN index updates on write-heavy tables?
  • How would you handle multilingual search in PostgreSQL?

10. Describe how you would set up high availability for a PostgreSQL database.

What the interviewer is really asking: Do you understand failover mechanisms, split-brain prevention, and the operational complexity of HA PostgreSQL?

Answer framework:

A production HA PostgreSQL setup typically consists of: a primary server, one or more synchronous/asynchronous standbys, a connection proxy or virtual IP for transparent failover, and a consensus mechanism for leader election.

The standard architecture uses Patroni (the industry standard for PostgreSQL HA) which manages a PostgreSQL cluster using a distributed configuration store (etcd, ZooKeeper, or Consul) for leader election. Patroni handles automatic failover: if the primary becomes unreachable, Patroni promotes the most up-to-date standby to primary and reconfigures other standbys to follow the new primary.

Key components:

  1. Patroni on each PostgreSQL node — manages PostgreSQL lifecycle, handles promotion and demotion
  2. etcd cluster (3 or 5 nodes) — stores cluster state and provides leader election
  3. HAProxy or PgBouncer — routes connections to the current primary. HAProxy health checks query Patroni's REST API to determine which node is primary
  4. Monitoring — alert on replication lag, failover events, and etcd health

Critical design decisions:

  • Synchronous vs asynchronous replication: synchronous guarantees zero data loss but adds latency (every commit waits for standby acknowledgment). Asynchronous has lower latency but may lose the last few transactions during failover.
  • Split-brain prevention: Patroni uses the distributed consensus store (etcd) to ensure only one node believes it's the primary. If a primary loses connection to etcd, it demotes itself.
  • Fencing: ensure the old primary cannot accept writes after a failover. Patroni handles this by shutting down PostgreSQL on the old primary before promoting the new one.

For cloud deployments, managed services like AWS RDS, Google Cloud SQL, and Azure Database for PostgreSQL handle HA automatically with multi-AZ deployments. The trade-off is less control over failover behavior and replication configuration.

This connects to distributed systems concepts and how consensus algorithms work.

Follow-up questions:

  • How would you test your failover procedure?
  • What is the impact of synchronous replication on write latency, and how do you mitigate it?
  • How would you handle a scenario where the promoted standby has less data than the old primary?

11. How do you handle time-series data in PostgreSQL?

What the interviewer is really asking: Do you know when PostgreSQL is sufficient for time-series workloads versus when you need a specialized database? Can you optimize PostgreSQL for append-heavy, time-windowed queries?

Answer framework:

PostgreSQL can handle time-series workloads effectively with the right design, especially with the TimescaleDB extension which adds native time-series capabilities.

Core design principles for time-series in PostgreSQL:

Partitioning by time: Partition the table by time range (daily, weekly, or monthly depending on data volume). This enables efficient partition pruning for time-range queries and instant data retention (drop old partitions instead of DELETE). Use pg_partman to automate partition creation and dropping.

BRIN indexes: For time-series data inserted in chronological order, BRIN indexes on the timestamp column are 100-1000x smaller than B-tree indexes while providing excellent performance for range scans.

Compression: PostgreSQL 14+ supports LZ4 as an alternative TOAST compression method, and PostgreSQL 15 adds zstd for WAL compression. TimescaleDB adds native columnar compression that can achieve 90-95% compression ratios for time-series data.

Continuous aggregates: Pre-compute rollups (hourly, daily averages) using materialized views that refresh incrementally. TimescaleDB automates this with its continuous aggregate feature.

Retention policies: Automate data lifecycle — keep raw data for 30 days, hourly aggregates for 1 year, daily aggregates indefinitely.

When to use PostgreSQL for time-series: moderate write volume (under 100K rows/second), need to join time-series with relational data, team already knows PostgreSQL, regulatory requirements to keep data in one system.

When to use a dedicated time-series database (InfluxDB, TimescaleDB, ClickHouse): extreme write throughput, specialized query functions (downsampling, gap filling, interpolation), need for columnar storage and compression from the start.

See our system design for monitoring systems for how time-series databases fit into larger architectures.

Follow-up questions:

  • How would you handle late-arriving data that needs to be inserted into already-compressed partitions?
  • What is the performance difference between BRIN and B-tree indexes for time-range queries?
  • How would you implement efficient percentile calculations on time-series data?

12. Explain PostgreSQL's JSON/JSONB capabilities and when to use them versus a document database.

What the interviewer is really asking: Can you make pragmatic schema design choices that balance flexibility with queryability and performance?

Answer framework:

PostgreSQL's JSONB type stores JSON in a decomposed binary format that supports efficient indexing and querying. Unlike the JSON type (which stores raw text), JSONB allows direct access to nested keys, supports containment and existence operators, and can be indexed with GIN indexes.

Key JSONB operators: -> extracts a JSON element as JSON, ->> extracts as text, @> tests containment, ? tests key existence, #> extracts at a path. JSONB also supports jsonb_path_query for SQL/JSON path expressions (PostgreSQL 12+).

Indexing JSONB: a default GIN index (CREATE INDEX ON table USING GIN (data)) indexes all keys and values, supporting @>, ?, ?|, and ?& operators. For queries on specific paths, an expression index is more efficient: CREATE INDEX ON table ((data->>'status')) for queries filtering on data->>'status'.
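
Side by side, on a hypothetical events table:

    -- Whole-column GIN: supports @>, ?, ?|, ?& across all keys
    CREATE INDEX events_data_gin ON events USING GIN (data);
    SELECT * FROM events WHERE data @> '{"status": "active"}';

    -- Expression index: smaller and faster for one hot path
    CREATE INDEX events_status_idx ON events ((data->>'status'));
    SELECT * FROM events WHERE data->>'status' = 'active';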

Use JSONB when: the schema is semi-structured and varies across rows (user preferences, metadata, event payloads), you need flexibility without frequent migrations, the JSON data is queried alongside relational data, or you want the transactional guarantees of PostgreSQL for your document data.

Use a document database like MongoDB when: the entire data model is document-oriented with deeply nested structures, you need horizontal sharding across hundreds of nodes, write throughput exceeds what a single PostgreSQL instance can handle, or the team is building a prototype that needs maximum schema flexibility.

A common pattern is to use relational columns for frequently queried/indexed fields and a JSONB column for flexible metadata: CREATE TABLE events (id bigint, type text, timestamp timestamptz, user_id bigint, metadata jsonb). This gives you the best of both worlds — relational performance for core queries and flexibility for variable data.

Follow-up questions:

  • How would you enforce a partial schema on a JSONB column?
  • What is the performance difference between GIN and B-tree indexes on JSONB expressions?
  • How do you handle JSONB in application code without losing type safety?

13. How do you approach PostgreSQL security for a production environment?

What the interviewer is really asking: Do you think about security holistically — authentication, authorization, encryption, and auditing — or just application-level concerns?

Answer framework:

Production PostgreSQL security spans multiple layers:

Authentication (pg_hba.conf): Use scram-sha-256 (not md5 or trust). Require SSL for all connections (hostssl entries). For cloud deployments, use IAM authentication (AWS RDS IAM, GCP Cloud SQL IAM) to avoid password management.

Authorization: Follow least privilege. Create separate roles for the application (read-write on specific tables), read replicas (read-only), migrations (DDL permissions), and monitoring (access to pg_stat_* views). Use GRANT and REVOKE explicitly. Never let the application connect as a superuser.

Row-Level Security (RLS): Essential for multi-tenant applications. Define policies that restrict row visibility based on the current user or session variables. Test RLS policies thoroughly — a misconfiguration can expose data across tenants.

Encryption: Enable SSL/TLS for connections in transit. For data at rest, use filesystem-level encryption (LUKS, AWS EBS encryption, GCP CMEK). For column-level encryption of sensitive data (PII, payment info), use pgcrypto extension, but be aware that encrypted columns cannot be indexed for range queries.
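
A hedged sketch of column-level encryption with pgcrypto (in practice the key comes from a KMS, never a literal in SQL; the users table and column are hypothetical):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    -- email_encrypted is a bytea column
    INSERT INTO users (email_encrypted)
    VALUES (pgp_sym_encrypt('alice@example.com', 'key-from-kms'));

    SELECT pgp_sym_decrypt(email_encrypted, 'key-from-kms') AS email
    FROM users;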

Auditing: Use the pgaudit extension to log DDL and DML operations. Configure which operations to audit (reads, writes, role changes) and send audit logs to a separate, append-only store.

Network security: Place PostgreSQL in a private subnet. Use security groups/firewall rules to allow connections only from application servers. For Kubernetes deployments, use NetworkPolicies to restrict pod-to-database access.

This connects to broader security patterns in backend development and cryptography concepts.

Follow-up questions:

  • How would you rotate database credentials without downtime?
  • What are the performance implications of enabling pgaudit on a high-throughput database?
  • How do you handle PII in PostgreSQL to comply with GDPR right to erasure?

14. What strategies do you use for PostgreSQL backup and disaster recovery?

What the interviewer is really asking: Do you have production experience with backup strategies, and do you understand RPO/RTO trade-offs?

Answer framework:

A comprehensive backup strategy combines multiple approaches:

Logical backups (pg_dump, pg_dumpall): Produce SQL or custom-format dumps. Portable across PostgreSQL versions. Slow for large databases (>100GB) but essential for selective table restoration and cross-version upgrades. Run against a standby to avoid impacting the primary.

Physical backups (base backup + WAL archiving): pg_basebackup takes a binary copy of the data directory. Combined with continuous WAL archiving (to S3, GCS, or Azure Blob), this enables Point-in-Time Recovery (PITR) — restore to any moment within your WAL retention window. Tools like pgBackRest and barman automate this with incremental backups, compression, and parallel restore.
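
A minimal sketch of enabling WAL archiving; the archive_command here is the documentation's canonical local-copy example, while production setups ship segments to object storage via pgBackRest or barman:

    ALTER SYSTEM SET archive_mode = 'on';   -- requires a restart
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f';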

RPO and RTO targets:

  • RPO (Recovery Point Objective): With continuous WAL archiving, RPO is typically seconds (the time between the last archived WAL segment and the failure). Synchronous replication achieves RPO=0.
  • RTO (Recovery Time Objective): Depends on database size. A 1TB database might take 30-60 minutes to restore from a physical backup. Incremental backups and parallel restore reduce this.

Testing backups: A backup that hasn't been tested is not a backup. Automate regular restore tests — spin up a temporary instance, restore the latest backup, run validation queries, and tear it down. Alert if restore tests fail.

Disaster recovery architecture: Primary in one region/AZ, synchronous standby in another AZ (same region, low latency), asynchronous standby in another region (cross-region DR). WAL archives replicated to a separate region.

For managed services, understand the built-in backup capabilities and their limitations. AWS RDS automated backups have a 35-day maximum retention, so you may need additional backup strategies for compliance.

Follow-up questions:

  • How would you restore a single table from a full physical backup?
  • What is the difference between pgBackRest's full, differential, and incremental backup types?
  • How would you handle a scenario where WAL archiving falls behind and you lose WAL segments?

15. How do you handle PostgreSQL upgrades across major versions?

What the interviewer is really asking: Have you actually done major version upgrades in production? Do you understand the risks and testing required?

Answer framework:

Major version upgrades (e.g., PostgreSQL 15 to 16) require careful planning because they involve changes to the internal storage format.

Three upgrade approaches:

pg_upgrade (in-place upgrade): The fastest approach. Uses hard links to avoid copying data files, completing in minutes even for terabyte-scale databases. Requires downtime (typically 5-30 minutes). Steps: install new PostgreSQL version, stop the old server, run pg_upgrade --link, start the new server, run ANALYZE to rebuild optimizer statistics. Run pg_upgrade --check first for a dry run.

Logical replication upgrade: Zero-downtime approach. Set up logical replication from the old version to a new-version instance. Once replication catches up, switch the application to the new instance. This works well but requires careful handling of sequences, DDL changes during replication, and large objects.

pg_dump/pg_restore: Simplest conceptually but slowest. Dump from old version, restore to new version. Only practical for smaller databases (<50GB) or when you need to restructure significantly.

Pre-upgrade testing is critical:

  1. Clone production data to a test environment
  2. Run the upgrade
  3. Run your full test suite and critical query benchmarks
  4. Compare EXPLAIN plans for top queries between old and new versions (planner behavior can change)
  5. Test all extensions for compatibility with the new version

Post-upgrade tasks: run ANALYZE on all tables (statistics are not preserved by pg_upgrade), monitor query performance closely for the first 24-48 hours, keep the old cluster available for quick rollback.

See our guide on system design interviews for how database upgrade strategies fit into broader architecture discussions.

Follow-up questions:

  • How would you handle an upgrade where a critical extension is not yet compatible with the new version?
  • What is the rollback procedure if the upgrade causes performance regressions?
  • How do you coordinate an upgrade across a Patroni-managed HA cluster?

Common Mistakes in PostgreSQL Interviews

  1. Not understanding MVCC trade-offs — Candidates describe MVCC benefits without mentioning dead tuples, bloat, or the need for VACUUM. Senior engineers must understand the maintenance implications.

  2. Defaulting to B-tree indexes for everything — PostgreSQL offers GIN, GiST, BRIN, and hash indexes for specific use cases. Choosing the right index type shows depth of knowledge.

  3. Ignoring connection limits — Suggesting thousands of direct connections without mentioning connection pooling reveals a lack of production experience.

  4. Underestimating schema migration complexity — Saying "just ALTER TABLE" without considering locking, downtime, and the expand-contract pattern shows lack of operational maturity.

  5. Not mentioning monitoring and observability — Production PostgreSQL requires monitoring replication lag, connection counts, cache hit ratios, long-running queries, and bloat. Senior engineers proactively discuss observability.

  6. Treating PostgreSQL as interchangeable with MySQL — While both are relational databases, their internals differ significantly (MVCC implementation, replication models, extension ecosystem). Demonstrate specific PostgreSQL knowledge.

How to Prepare

Week 1: Set up a local PostgreSQL instance. Practice EXPLAIN ANALYZE on complex queries. Understand the output of pg_stat_statements and pg_stat_user_tables.

Week 2: Implement partitioning, create different index types, and set up streaming replication locally using Docker Compose.

Week 3: Study real-world PostgreSQL architectures from companies like Instagram (sharding), Notion (migration from MongoDB), and Citus (distributed PostgreSQL).

Week 4: Practice articulating trade-offs. For every design decision, prepare to explain why you chose PostgreSQL over alternatives, and when PostgreSQL would not be the right choice.

For a comprehensive study plan, see our system design interview guide and explore the learning paths for structured preparation.
