# Distance oracle

In computing, a distance oracle (DO) is a data structure for calculating distances between vertices in a graph.

## Introduction

Let G(V,E) be an undirected, weighted graph, with n = |V| nodes and m = |E| edges. We would like to answer queries of the form "what is the distance between the nodes s and t?".

One way to do this is just run the Dijkstra algorithm. This takes time ${\displaystyle O(m+n\log n)}$, and requires no extra space (besides the graph itself).

In order to answer many queries more efficiently, we can spend some time in pre-processing the graph and creating an auxiliary data structure.

A simple data structure that achieves this goal is a matrix which specifies, for each pair of nodes, the distance between them. This structure allows us to answer queries in constant time ${\displaystyle O(1)}$, but requires ${\displaystyle O(n^{2})}$ extra space. It can be initialized in time ${\displaystyle O(n^{3})}$ using an all-pairs shortest paths algorithm, such as the Floyd–Warshall algorithm.

A DO lies between these two extremes. It uses less than ${\displaystyle O(n^{2})}$ space in order to answer queries in less than ${\displaystyle O(m+n\log n)}$ time. Most DOs have to compromise on accuracy, i.e. they don't return the accurate distance but rather a constant-factor approximation of it.

## Approximate DO

Thorup and Zwick[1] describe more than 10 different DOs. They then suggest a new DO that, for every k, requires space ${\displaystyle O(kn^{1+1/k})}$, such that any subsequent distance query can be approximately answered in time ${\displaystyle O(k)}$. The approximate distance returned is of stretch at most ${\displaystyle 2k-1}$, that is, the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and ${\displaystyle 2k-1}$. The initialization time is ${\displaystyle O(kmn^{1/k})}$.

Some special cases include:

• For ${\displaystyle k=1}$ we get the simple distance matrix.
• For ${\displaystyle k=2}$ we get a structure using ${\displaystyle O(n^{1.5})}$ space which answers each query in constant time and approximation factor at most 3.
• For ${\displaystyle k=\lfloor \log n\rfloor }$, we get a structure using ${\displaystyle O(n\log n)}$ space, query time ${\displaystyle O(\log n)}$, and stretch ${\displaystyle O(\log n)}$.

Higher values of k do not improve the space or preprocessing time.

### DO for general metric spaces

The oracle is built of a decreasing collection of k+1 sets of vertices:

• ${\displaystyle A_{0}=V}$
• For every ${\displaystyle i=1,\ldots ,k-1}$: ${\displaystyle A_{i}}$ contains each element of ${\displaystyle A_{i-1}}$, independently, with probability ${\displaystyle n^{-1/k}}$. Note that the expected size of ${\displaystyle A_{i}}$ is ${\displaystyle n^{1-i/k}}$. The elements of ${\displaystyle A_{i}}$ are called i-centers.
• ${\displaystyle A_{k}=\emptyset }$

For every node v, calculate its distance from each of these sets:

• For every ${\displaystyle i=0,\ldots ,k-1}$: ${\displaystyle d(A_{i},v)=\min {(d(w,v)|w\in A_{i}}}$ and ${\displaystyle p_{i}(v)=\arg \min {(d(w,v)|w\in A_{i}}}$. I.e., ${\displaystyle p_{i}(v)}$ is the i-center nearest to v, and ${\displaystyle d(A_{i},v)}$ is the distance between them. Note that for a fixed v, this distance is weakly increasing with i. Also note that for every v, ${\displaystyle d(A_{0},v)=0}$ and ${\displaystyle p_{0}(v)=v}$.
• ${\displaystyle d(A_{k},v)=\infty }$.

For every node v, calculate:

• For every ${\displaystyle i=0,\ldots ,k-1}$: ${\displaystyle B_{i}(v)=\{w\in A_{i}\setminus A_{i+1}|d(w,v). ${\displaystyle B_{i}(v)}$ contains all vertices in ${\displaystyle A_{i}}$ which are strictly closer to v than all vertices in ${\displaystyle A_{i+1}}$. The partial unions of ${\displaystyle B_{i}}$s are balls in increasing diameter, that contain vertices with distances up to the first vertex of the next level.

For every v, compute its bunch:

• ${\displaystyle B(v)=\bigcup _{i=0}^{k-1}B_{i}(v).}$

It is possible to show that the expected size of ${\displaystyle B(v)}$ is at most ${\displaystyle kn^{1/k}}$.

For every bunch ${\displaystyle B(v)}$, construct a hash table that holds, for every ${\displaystyle w\in B(V)}$, the distance ${\displaystyle d(w,v)}$.

The total size of the data structure is ${\displaystyle O(kn+\Sigma |B(v)|)=O(kn+nkn^{1/k})=O(kn^{1+1/k})}$

Having this structure initialized, the following algorithm finds the distance between two nodes, u and v:

• ${\displaystyle w:=u,i:=0}$
• while ${\displaystyle w\notin B(v)}$:
• ${\displaystyle i:=i+1}$
• ${\displaystyle (u,v):=(v,u)}$ (swap the two input nodes; this does not change the distance between them since the graph is undirected).
• ${\displaystyle w:=p_{i}(u)}$
• return ${\displaystyle d(w,u)+d(w,v)}$

It is possible to show that, in each iteration, the distance ${\displaystyle d(w,u)}$ grows by at most ${\displaystyle d(v,u)}$. Since ${\displaystyle A_{k-1}\subseteq B(v)}$, there are at most k-1 iterations, so finally ${\displaystyle d(w,u)\leq (k-1)d(v,u)}$. Now by the triangle inequality, ${\displaystyle d(w,v)\leq d(w,u)+d(u,v)\leq kd(v,u)}$, so the distance returned is at most ${\displaystyle (2k-1)d(u,v)}$.

### Improvements

The above result was later improved by Patrascu and Roditty[2] who suggest a DO of size ${\displaystyle O(n^{4/3}m^{1/3})}$ which returns a factor 2 approximation.

## Reduction from set intersection oracle

If there is a DO with an approximation factor of at most 2, then it is possible to build a set intersection oracle (SIO) with query time ${\displaystyle O(1)}$ and space requirements ${\displaystyle O(N+n)}$, where n is the number of sets and N the sum of their sizes; see set intersection oracle#Reduction to approximate distance oracle.

It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires ${\displaystyle \Omega (n^{2})}$ space to answer queries in time ${\displaystyle O(1)}$, e.g. using an n-by-n table with the intersection between each two sets. If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time.[2]

## References

1. ^ Thorup, M.; Zwick, U. (2005). "Approximate distance oracles". Journal of the ACM. 52: 1. doi:10.1145/1044731.1044732.
2. ^ a b Patrascu, M.; Roditty, L. (2010). Distance Oracles beyond the Thorup–Zwick Bound. 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS). p. 815. doi:10.1109/FOCS.2010.83. ISBN 978-1-4244-8525-3.