Metric k-center

In graph theory, the metric k-center problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities so as to minimize the maximum distance from a city to its nearest warehouse. In graph-theoretic terms, this means finding a set of k vertices that minimizes the largest distance from any vertex to its closest vertex in the k-set. The vertices must lie in a metric space, which yields a complete graph satisfying the triangle inequality.

Formal definition

Let ${\displaystyle ({\mathcal {X}},d)}$ be a metric space, where ${\displaystyle {\mathcal {X}}}$ is a set and ${\displaystyle d}$ is a metric.
A set ${\displaystyle \mathbf {V} \subseteq {\mathcal {X}}}$ is provided together with a parameter ${\displaystyle k}$. The goal is to find a subset ${\displaystyle {\mathcal {C}}\subseteq \mathbf {V} }$ with ${\displaystyle |{\mathcal {C}}|=k}$ such that the maximum distance of a point in ${\displaystyle \mathbf {V} }$ to the closest point in ${\displaystyle {\mathcal {C}}}$ is minimized. The problem can be formally defined as follows:
For a metric space (${\displaystyle {\mathcal {X}}}$,d),

• Input: a set ${\displaystyle \mathbf {V} \subseteq {\mathcal {X}}}$, and a parameter ${\displaystyle k}$.
• Output: a set ${\displaystyle {\mathcal {C}}\subseteq \mathbf {V} }$ of ${\displaystyle k}$ points.
• Goal: Minimize the cost ${\displaystyle r^{\mathcal {C}}(\mathbf {V} )=\max _{v\in \mathbf {V} }d(v,{\mathcal {C}})}$

That is, every point in a cluster is within distance at most ${\displaystyle r^{\mathcal {C}}(\mathbf {V} )}$ of its respective center.[1]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(vi, vj) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

${\displaystyle \max _{v\in V}\min _{c\in C}d(v,c)}$
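As an illustration of this objective, the cost function and an exact (exponential-time) solver can be sketched in Python. The distance-matrix representation and the function names are assumptions for the sake of the example, not part of the definition:

```python
from itertools import combinations

def kcenter_cost(dist, centers):
    """r^C(V): the largest distance from any point to its nearest center."""
    return max(min(dist[v][c] for c in centers) for v in range(len(dist)))

def kcenter_brute_force(dist, k):
    """Exact optimum by trying every size-k subset; exponential, small n only."""
    n = len(dist)
    return min(combinations(range(n), k),
               key=lambda centers: kcenter_cost(dist, centers))
```

For three points on a line at coordinates 0, 1, and 10 with k = 2, the brute-force search places one center at 10 and one in {0, 1}, achieving radius 1.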

Computational complexity

In a complete undirected graph G = (V, E), sort the edges in non-decreasing order of distance, d(e1) ≤ d(e2) ≤ ... ≤ d(em), and let Gi = (V, Ei), where Ei = {e1, e2, ..., ei}. The k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k.[2]

Since Dominating Set is NP-complete, this equivalence shows that the k-center problem is NP-hard. Note that the equivalence does not by itself certify the optimality of a given feasible solution: that would require knowing in advance the size of the optimal solution (i.e. the smallest index i such that Gi has a dominating set of size at most k), which is precisely the hard core of the problem. A Turing reduction can get around this issue by trying all values of i.
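The equivalence above can be sketched directly in Python: add edges in non-decreasing order of distance and stop at the first bottleneck graph that admits a small dominating set. The brute-force dominating-set test is exponential, so this is only a didactic sketch (assuming k < n and a symmetric distance matrix), not a practical algorithm:

```python
from itertools import combinations

def has_dominating_set(adj, k):
    """Brute-force test for a dominating set of size at most k (exponential)."""
    n = len(adj)
    return any(
        all(v in s or adj[v] & s for v in range(n))
        for size in range(1, k + 1)
        for s in map(set, combinations(range(n), size))
    )

def kcenter_radius_via_dominating_set(dist, k):
    """Optimal k-center radius: weight of the first edge e_i such that
    G_i = (V, {e_1, ..., e_i}) has a dominating set of size at most k."""
    n = len(dist)
    edges = sorted((dist[u][v], u, v) for u in range(n) for v in range(u + 1, n))
    adj = [set() for _ in range(n)]
    for w, u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        if has_dominating_set(adj, k):
            return w
```

On the three-point example above (points at 0, 1, 10 with k = 2), the first edge of weight 1 already yields the dominating set {1, 2}, so the optimal radius 1 is returned.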

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds ${\displaystyle {\mathcal {C}}}$ using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

• Pick an arbitrary point ${\displaystyle {\bar {c}}_{1}}$ as the first center, giving the set ${\displaystyle C_{1}}$.
• For every point ${\displaystyle v\in \mathbf {V} }$, compute the distance ${\displaystyle d_{1}[v]}$ from ${\displaystyle {\bar {c}}_{1}}$.
• Pick the point ${\displaystyle {\bar {c}}_{2}}$ with the greatest distance from ${\displaystyle {\bar {c}}_{1}}$.
• Add it to the set of centers and denote this expanded set of centers as ${\displaystyle C_{2}}$. Continue this until k centers are found.
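The steps above can be sketched in Python, assuming the input is an n × n distance matrix (the function name and representation are illustrative):

```python
def greedy_kcenter(dist, k):
    """Farthest-first traversal: a 2-approximate k-center in O(nk) time.

    dist is a symmetric n x n distance matrix satisfying the triangle
    inequality; returns the chosen centers and the clustering radius.
    """
    n = len(dist)
    centers = [0]        # arbitrary first center
    d = list(dist[0])    # d[v] = distance from v to its nearest chosen center
    for _ in range(k - 1):
        farthest = max(range(n), key=lambda v: d[v])   # new center
        centers.append(farthest)
        d = [min(d[v], dist[farthest][v]) for v in range(n)]
    return centers, max(d)
```

Each iteration scans the n points once to find the farthest point and once to update the distances, which gives the O(nk) running time discussed next.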

Running time

• The ith iteration of choosing the ith center takes ${\displaystyle {\mathcal {O}}(n)}$ time.
• There are k such iterations.
• Thus, overall the algorithm takes ${\displaystyle {\mathcal {O}}(nk)}$ time.[3]

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points ${\displaystyle \mathbf {V} \subseteq {\mathcal {X}}}$ in a metric space (${\displaystyle {\mathcal {X}}}$,d), the greedy k-center algorithm computes a set ${\displaystyle \mathbf {K} }$ of k centers such that ${\displaystyle \mathbf {K} }$ is a 2-approximation to the optimal k-center clustering of ${\displaystyle \mathbf {V} }$.

i.e. ${\displaystyle r^{\mathbf {K} }(\mathbf {V} )\leq 2r^{opt}(\mathbf {V} ,{\textit {k}})}$ [1]

This theorem can be proven by considering two cases, as follows:

Case 1: Every cluster of ${\displaystyle {\mathcal {C}}_{opt}}$ contains exactly one point of ${\displaystyle \mathbf {K} }$

• Consider a point ${\displaystyle v\in \mathbf {V} }$
• Let ${\displaystyle {\bar {c}}}$ be the center it belongs to in ${\displaystyle {\mathcal {C}}_{opt}}$
• Let ${\displaystyle {\bar {k}}}$ be the center of ${\displaystyle \mathbf {K} }$ that lies in ${\displaystyle \Pi ({\mathcal {C}}_{opt},{\bar {c}})}$, the cluster of points whose closest center in ${\displaystyle {\mathcal {C}}_{opt}}$ is ${\displaystyle {\bar {c}}}$
• ${\displaystyle d(v,{\bar {c}})=d(v,{\mathcal {C}}_{opt})\leq r^{opt}(\mathbf {V} ,k)}$
• Similarly, ${\displaystyle d({\bar {k}},{\bar {c}})=d({\bar {k}},{\mathcal {C}}_{opt})\leq r^{opt}}$
• By the triangle inequality: ${\displaystyle d(v,{\bar {k}})\leq d(v,{\bar {c}})+d({\bar {c}},{\bar {k}})\leq 2r^{opt}}$

Case 2: There are two centers ${\displaystyle {\bar {k}}}$ and ${\displaystyle {\bar {u}}}$ of ${\displaystyle \mathbf {K} }$ that are both in ${\displaystyle \Pi ({\mathcal {C}}_{opt},{\bar {c}})}$, for some ${\displaystyle {\bar {c}}\in {\mathcal {C}}_{opt}}$ (by the pigeonhole principle, this is the only other possibility)

• Assume, without loss of generality, that ${\displaystyle {\bar {u}}}$ was added later to the center set ${\displaystyle \mathbf {K} }$ by the greedy algorithm, say in the ith iteration.
• But since the greedy algorithm always chooses the point farthest away from the current set of centers, we have that ${\displaystyle {\bar {k}}\in {\mathcal {C}}_{i-1}}$ and,

${\displaystyle {\begin{aligned}r^{\mathbf {K} }(\mathbf {V} )\leq r^{{\mathcal {C}}_{i-1}}(\mathbf {V} )&=d({\bar {u}},{\mathcal {C}}_{i-1})\\&\leq d({\bar {u}},{\bar {k}})\\&\leq d({\bar {u}},{\bar {c}})+d({\bar {c}},{\bar {k}})\\&\leq 2r^{opt}\end{aligned}}}$ [1]

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-Center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k. It computes a maximal independent set of the square graph Gi², searching for the smallest index i whose maximal independent set has size at most k; that independent set is then returned as the set of centers.[4] It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP.[5] Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.[6]
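This bottleneck approach can be sketched in Python. The sketch uses a distance-based relaxation of the square graph — two points are made adjacent whenever their distance is at most twice the current threshold, which by the triangle inequality covers all two-hop neighbors — so it is an illustrative variant under that assumption, not the exact algorithm of [4]:

```python
def maximal_independent_set(adj):
    """Greedy maximal (not necessarily maximum) independent set."""
    chosen, blocked = [], set()
    for v in range(len(adj)):
        if v not in blocked:
            chosen.append(v)
            blocked |= adj[v]
    return chosen

def bottleneck_kcenter(dist, k):
    """2-approximation: smallest threshold w whose 'squared' graph
    (edges between points at distance <= 2w) has a maximal independent
    set of size at most k; that set is returned as the centers."""
    n = len(dist)
    for w in sorted({dist[u][v] for u in range(n) for v in range(u + 1, n)}):
        adj = [{u for u in range(n) if u != v and dist[v][u] <= 2 * w}
               for v in range(n)]
        mis = maximal_independent_set(adj)
        if len(mis) <= k:
            return mis
```

Maximality of the independent set guarantees every point is within 2w of some center, while any two points in one optimal cluster are within twice the optimal radius of each other and hence adjacent once w reaches the optimum, bounding the set's size by k.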

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0, when using k as the parameter.[7] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.[8] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.[9] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.[10]

10. ^ Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized k-Center: Distinguishing Doubling and Highway Dimension". In Bekos, Michael A.; Kaufmann, Michael (eds.). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. Vol. 13453. Cham: Springer International Publishing. pp. 215–229. arXiv:2209.00675. doi:10.1007/978-3-031-15914-5_16. ISBN 978-3-031-15914-5.