Donut - A Distributed Hash Table

Chord Implementation

Introduction

Chord is an overlay network that maps logical addresses to physical nodes in a Peer-to-Peer system. Chord associates a ranges of keys with a particular node using a variant of consistent hashing. While traditional consistent hashing schemes require nodes to maintain state of the the system as a whole, Chord only requires that nodes know of a fixed number of “finger” nodes, allowing both massive scalability while maintaining efficient lookup time (O(log n)).

Terms

Chord

Chord works by arranging keys in a ring, such that when we reach the highest value in the key-space, the following key is zero. Nodes take on a particular key in this ring.

Successor

Node n is called the successor of a key k if and only if n is the closest node after or equal to k on the ring.

Predecessor

Node n is called the predecessor of a key k if and only n is the closest node before k in the ring.

Finger

Nodes maintain a set of other nodes, called fingers, which refer to other points on the ring. The first finger of node n is always the successor of n.

Variables

Variable	Definition
K	the total number of possible keys
k	a particular key in the key-space
m	log(K)
N	total number of nodes currently in the ring
n	a particular node
r	number of successors to keep in the successor list

Basic Design

A node n with key k stores a finger table containing the successors of the keys k + 2⁰, k + 2¹, k + 2², …, k + 2^m. (Note: it may be the case that one node in two different finger entries). Given this distribution of fingers, the number of lookup of any key’s successor, as well as the number of messages to achieve the same, is O(log(N)).

Chord’s lookup protocol is defined by the findSuccessor procedure. If node n realizes it is the predecessor of requested key k, n.successor is returned. Otherwise, the node finds the closest predecessor of k by searching fingers and invoking findSuccessor on the closest preceding node.

Donut’s findSuccessor is recursive, meaning each node in the call tree blocks until a result is found. It is also possible to do this asynchronously, where the predeccessor of the key returns its successor directly to the client. Donut is particularly attune to this optimization since the clients are request servers still internal to the system. However, network hops are not a primary performance bottleneck, so this optimization has not been implemented.

Joins

Nodes joining a chord ring must bootstrap using an existing node in the circle. To join existing known node n, joining node n’ invokes n.findSuccessor(n’). Once n’ finds its successor, n", n’ notifies n" which instructs n" to set its predecessor to n’. This is done in the stabilize routine. The joined is complete once the node immediately preceding n’ recognizes that it is the predecessor of n’.

Stabilize and Notify, Fix Fingers and Check Predecessor

With joining and leaving nodes, finger table entries, successors and predecessors must be periodically updated for all nodes. These operations are done at scheduled intervals spanning three procedures: fixFingers, stabilize, and checkPredecessor.

stabilize verifies a node’s immediate successor and then notifies that successor to set its predecessor correctly. stabilize first queries its successor for its predecessor. If the successor has a predecessor between the two nodes, the node corrects who its successor is. This can occur when a new node joins the ring and its key lies between the stabilizing node and it’s immediate successor.

fixFingers sequentially updates the finger table by invoking findSuccessor. Specifically, to update the i^th entry in the finger table, findSuccessor is called on the key k + 2ⁱ, where k is the current node’s key.

checkPredecessor ensures that a node’s predecessor is alive and well. A node periodically pings its predecessor. If the node does not respond, it is assumed dead and the node sets itself as the predecessor, awaiting notification from its new predecessor.

Leaves and Successor Lists

As nodes leave the ring, usually due to system crashes or network errors, the ring stabilizes itself through fixFingers, stabilize and checkPredecessor. Between these three functions the predecessor gets set by notify, invalidated by checkPredecessor, fingers get updated by fixFingers, and the successor gets updated by stabilize. However, when a node’s successor fails it must be invalidated and set to a new successor in order for the ring to be correct. Unlike a node’s predecessor, a node will not be notified of a new, valid successor. For added robustness, a node keeps a list of r successors from which to remove the failed, old, successor and immediately update to the next successor. More formally, a successor list is the set of r successors to a node. The successor list is populated by the notify response from a node to its predecessor.

By using a successor list to handle node leaves, Donut can ensure the overlay network is resilient from up to r – 1 concurrent node failures within a defined interval.