Schedule - PGCon 2020

Distributed snapshots and global deadlock detection

Date: 2020-05-28
Time: 10:00–10:45
Room: Stream 2
Level: Intermediate

In this talk we share our experience implementing MVCC in an open source scale-out environment based on PostgreSQL. A scale-out environment consists of multiple PostgreSQL servers, with one master PostgreSQL server designated as the entry point for clients. Transactions may update data residing on more than one PostgreSQL instance, and isolating two or more such transactions running concurrently is a major challenge in a scale-out system. Each isolation level has its own challenges, with serialisable being the hardest to implement efficiently. Distributed snapshots enable individual PostgreSQL instances within a scale-out system to determine the status (in-progress, committed, aborted) of a transaction and to decide whether the effects of that transaction are visible to a given snapshot. The talk covers this distributed snapshot mechanism in detail, including several corner cases that we found tricky to get right. Note that each PostgreSQL instance in the scale-out system continues to create its own local snapshots and local transactions.
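
As a rough illustration of the visibility decision described above, the following Python sketch models a distributed snapshot as a range of distributed transaction ids plus the set that was running when the snapshot was taken. It is not the implementation presented in the talk: the names `DistributedSnapshot`, `TxStatus`, `is_visible`, and the `xmin`/`xmax`/`in_progress` fields are assumptions chosen for illustration, loosely modelled on how PostgreSQL's local snapshots are commonly described.

```python
from dataclasses import dataclass
from enum import Enum


class TxStatus(Enum):
    IN_PROGRESS = "in-progress"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass(frozen=True)
class DistributedSnapshot:
    """Hypothetical snapshot shared by all instances for one statement."""
    xmin: int                # all distributed xids below this had finished
    xmax: int                # first distributed xid not yet assigned
    in_progress: frozenset   # distributed xids running at snapshot time


def is_visible(dxid, snapshot, status):
    """Decide whether the effects of transaction `dxid` are visible.

    `status` maps a distributed xid to its TxStatus as known to the
    instance doing the check (an assumed lookup, not a real API).
    """
    if dxid >= snapshot.xmax:
        return False  # started after the snapshot was taken
    if dxid >= snapshot.xmin and dxid in snapshot.in_progress:
        return False  # still running when the snapshot was taken
    # Finished before the snapshot: visible only if it committed.
    return status.get(dxid) is TxStatus.COMMITTED


snapshot = DistributedSnapshot(xmin=5, xmax=10, in_progress=frozenset({7}))
status = {
    3: TxStatus.COMMITTED,
    4: TxStatus.ABORTED,
    7: TxStatus.COMMITTED,   # committed later, but was running at snapshot time
    8: TxStatus.COMMITTED,
    12: TxStatus.COMMITTED,
}
visible = [dxid for dxid in sorted(status) if is_visible(dxid, snapshot, status)]
print(visible)
```

Because every instance evaluates the same snapshot, a transaction that committed on one instance after the snapshot was taken (transaction 7 in this example) stays invisible everywhere, which is the consistency property a distributed snapshot is meant to provide.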

A distributed deadlock occurs when the wait cycle spans multiple PostgreSQL instances. To detect it, the wait graphs from each PostgreSQL instance need to be aggregated and cycle detection performed on the aggregated data. The talk describes how we model vertices and edges in such a graph and the method used to aggregate this information while being mindful of the performance of the distributed system.
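
The aggregation step can be sketched as follows, under assumptions that are ours rather than the speakers': vertices are distributed transaction ids, an edge (waiter, holder) means the waiter is blocked on a lock held by the holder, and each instance reports its local edges as a plain list. The function names `aggregate_wait_graphs` and `find_cycle` and the instance labels are hypothetical, and the depth-first search is a textbook cycle check, not necessarily the method the talk describes.

```python
from collections import defaultdict


def aggregate_wait_graphs(local_graphs):
    """Merge per-instance edge lists into one global adjacency map."""
    graph = defaultdict(set)
    for edges in local_graphs.values():
        for waiter, holder in edges:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Return one wait cycle as a list of vertices, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, done
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from nxt to node.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Neither instance sees a cycle locally, but the merged graph has one.
local_graphs = {
    "instance_a": [(1, 2)],   # transaction 1 waits for 2 on instance_a
    "instance_b": [(2, 1)],   # transaction 2 waits for 1 on instance_b
}
local_cycles = [find_cycle(aggregate_wait_graphs({name: edges}))
                for name, edges in local_graphs.items()]
global_cycle = find_cycle(aggregate_wait_graphs(local_graphs))
print(local_cycles, global_cycle)
```

The example shows why local deadlock detection is not enough: each instance's graph is acyclic on its own, and the cycle only appears once the edges are combined.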

Speakers

Asim Rama Praveen
Hubert Zhang