Entry Date:
September 5, 2004

EpiChord: Large Routing State plus Parallel Queries equals Improved DHT Lookup Performance and Resilience

Principal Investigator Barbara Liskov

Co-investigator Erik Demaine


In recent years, more than a dozen DHT lookup algorithms and routing topologies have been proposed. Most initial DHT research was directed towards minimizing the amount of routing state per node while preserving good routing performance. In a dynamic network with a relatively high churn rate, routing state will inevitably become outdated over time and timeouts will occur when a node tries to query another node that has failed or has left the network. Most DHTs hence attempt to keep the amount of routing state reasonably small (O(log n)), so that nodes can periodically refresh their routing state by probing their neighbors at a frequency high enough to make the probability of timeouts negligible.

We make two observations: (1) Existing DHTs tend to decouple the lookup process from routing state maintenance, but there seems to be no compelling reason to do so, other than perhaps that lookup traffic is unpredictable and it is not feasible to depend entirely on network traffic to update the routing state if we want to provide guarantees on the lookup performance. (2) Limiting the amount of routing state to O(log n) seems to artificially limit the routing performance of a DHT even if the level of network churn is sufficiently low for a DHT to maintain larger amounts of network state, thereby achieving better network performance.

We propose EpiChord, a DHT that allows for the maintanence of large amounts (> O(log n)) of routing state by gleaning information from lookup traffic, while still providing a worst-case O(log n)-hop performance guarantee without introducing significant communication overhead in terms of additional amounts of network traffic.

We have implemented our lookup algorithm on the SSFNet simulation framework and are currently evaluating its performance in terms of both lookup hop count, latency and traffic generated relative to that for the Chord lookup algorithm.

EpiChord is currently not fully optimized. There is still significant flexibility for nodes to adopt individual policies to further enhance and optimize their individual (and thereby global) lookup performance, if so desired. For example, a node that discovers a high rate of node failures within the network (i.e., from the fact that many queries are unacknowledged) can adaptively increase the number of parallel queries per lookup as well as be more aggressive in flushing old entries from its cache. One can also imagine improving the dissemination of routing state by piggybacking additional random node entries on requests or responses.