## Chord Implementation

### Introduction

Chord is an overlay network that maps logical addresses to physical nodes in a Peer-to-Peer system. Chord associates a ranges of keys with a particular node using a variant of consistent hashing. While traditional consistent hashing schemes require nodes to maintain state of the the system as a whole, Chord only requires that nodes know of a fixed number of “finger” nodes, allowing both massive scalability while maintaining efficient lookup time (O(log n)).

### Terms

#### Chord

Chord works by arranging keys in a ring, such that when we reach the highest value in the key-space, the following key is zero. Nodes take on a particular key in this ring.

#### Successor

Node *n* is called the successor of a key *k* if and only if *n* is the closest node after or equal to *k* on the ring.

#### Predecessor

Node *n* is called the predecessor of a key *k* if and only *n* is the closest node before *k* in the ring.

#### Finger

Nodes maintain a set of other nodes, called *fingers*, which refer to other points on the ring. The first *finger* of node *n* is always the successor of *n*.

#### Variables

Variable | Definition |
---|---|

K |
the total number of possible keys |

k |
a particular key in the key-space |

m |
log(K) |

N |
total number of nodes currently in the ring |

n |
a particular node |

r |
number of successors to keep in the successor list |

### Basic Design

A node *n* with key *k* stores a finger table containing the successors of the keys *k* + 2^{0}, *k* + 2^{1}, *k* + 2^{2}, …, *k* + 2^{m}. (Note: it may be the case that one node in two different finger entries). Given this distribution of fingers, the number of lookup of any key’s successor, as well as the number of messages to achieve the same, is O(log(*N*)).

Chord’s lookup protocol is defined by the *findSuccessor* procedure. If node *n* realizes it is the predecessor of requested key *k*, *n*.*successor* is returned. Otherwise, the node finds the closest predecessor of *k* by searching *fingers* and invoking *findSuccessor* on the closest preceding node.

Donut’s *findSuccessor* is recursive, meaning each node in the call tree blocks until a result is found. It is also possible to do this asynchronously, where the predeccessor of the key returns its successor directly to the client. Donut is particularly attune to this optimization since the clients are request servers still internal to the system. However, network hops are not a primary performance bottleneck, so this optimization has not been implemented.

### Joins

Nodes joining a chord ring must bootstrap using an existing node in the circle. To join existing known node *n*, joining node *n’* invokes *n.findSuccessor(n’)*. Once *n’* finds its successor, *n"*, *n’* notifies *n"* which instructs *n"* to set its predecessor to *n’*. This is done in the *stabilize* routine. The joined is complete once the node immediately preceding *n’* recognizes that it is the predecessor of *n’*.

### Stabilize and Notify, Fix Fingers and Check Predecessor

With joining and leaving nodes, finger table entries, successors and predecessors must be periodically updated for all nodes. These operations are done at scheduled intervals spanning three procedures: *fixFingers*, *stabilize*, and *checkPredecessor*.

*stabilize* verifies a node’s immediate successor and then notifies that successor to set its predecessor correctly. *stabilize* first queries its successor for its predecessor. If the successor has a predecessor between the two nodes, the node corrects who its successor is. This can occur when a new node joins the ring and its key lies between the stabilizing node and it’s immediate successor.

*fixFingers* sequentially updates the finger table by invoking *findSuccessor*. Specifically, to update the *i ^{th}* entry in the finger table,

*findSuccessor*is called on the key

*k*+ 2

^{i}, where

*k*is the current node’s key.

*checkPredecessor* ensures that a node’s predecessor is alive and well. A node periodically pings its predecessor. If the node does not respond, it is assumed dead and the node sets itself as the predecessor, awaiting notification from its new predecessor.

### Leaves and Successor Lists

As nodes leave the ring, usually due to system crashes or network errors, the ring stabilizes itself through *fixFingers*, *stabilize* and *checkPredecessor*. Between these three functions the predecessor gets set by *notify*, invalidated by *checkPredecessor*, fingers get updated by *fixFingers*, and the successor gets updated by *stabilize*. However, when a node’s successor fails it must be invalidated and set to a new successor in order for the ring to be correct. Unlike a node’s predecessor, a node will not be notified of a new, valid successor. For added robustness, a node keeps a list of *r* successors from which to remove the failed, old, successor and immediately update to the next successor. More formally, a successor list is the set of *r* successors to a node. The successor list is populated by the *notify* response from a node to its predecessor.

By using a successor list to handle node leaves, Donut can ensure the overlay network is resilient from up to *r* – 1 concurrent node failures within a defined interval.