Subset sum problem

From Wikipedia, the free encyclopedia

Revision as of 15:13, 1 June 2021

The subset sum problem (SSP) is a decision problem in computer science. In its most general formulation, there is a multiset S of integers and a target-sum T, and the question is to decide whether any subset of the integers sums to precisely T.[1] The problem is known to be NP-complete. Moreover, some restricted variants of it are NP-complete too, for example:[1]

  • The variant in which all inputs are positive.
  • The variant in which inputs may be positive or negative, and T = 0. For example, given the set {−7, −3, −2, 9000, 5, 8}, the answer is yes because the subset {−3, −2, 5} sums to zero.
  • The variant in which all inputs are positive, and the target sum is exactly half the sum of all inputs, i.e., T = (1/2)·sum(S). This special case of SSP is known as the partition problem.

SSP can also be regarded as an optimization problem: find a subset whose sum is at most T, and subject to that, as close as possible to T. It is NP-hard, but there are several algorithms that can solve it reasonably quickly in practice.

SSP is a special case of the knapsack problem.[2]

Computational hardness

The run-time complexity of SSP depends on two parameters: n - the number of input integers, and L - the precision of the problem, i.e., the number of binary digits it takes to state the problem.

  • If n (the number of integers) is a small fixed number, then an exhaustive search for the solution is practical.
  • If L (the number of binary digits) is a small fixed number, then there are dynamic programming algorithms that can solve it exactly.

When both n and L are large, SSP is NP-hard. The complexity of the best known algorithms is exponential in the smaller of the two parameters n and L. The following variants are known to be NP-hard:

  • All input integers are positive (and the target-sum T is a part of the input). This can be proved by a direct reduction from 3SAT.[3][4]
  • The input integers can be both positive and negative, and the target-sum T = 0. This can be proved by reduction from the variant with positive integers. Denote that variant by SubsetSumPositive and the current variant by SubsetSumZero. Given an instance (S, T) of SubsetSumPositive, construct an instance of SubsetSumZero by adding a single element with value -T. Given a solution to the SubsetSumPositive instance, adding the -T yields a solution to the SubsetSumZero instance. Conversely, given a solution to the SubsetSumZero instance, it must contain the -T (since all integers in S are positive), so to get a sum of zero, it must also contain a subset of S with a sum of +T, which is a solution of the SubsetSumPositive instance.
  • The input integers are positive, and T = sum(S)/2. This can also be proved by reduction from the general variant; see partition problem.
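The reduction from the positive variant to the zero-target variant can be sketched in code (a minimal illustration; the function names are ours, and the brute-force checker is only for verifying the construction on tiny instances):

```python
from itertools import combinations

def reduce_positive_to_zero(S, T):
    """Map a SubsetSumPositive instance (S, T) to a SubsetSumZero
    instance: the same positive integers plus one element -T."""
    return list(S) + [-T]

def subset_sum_zero_brute(nums):
    """Exponential-time check: does any nonempty subset sum to zero?"""
    return any(sum(c) == 0
               for r in range(1, len(nums) + 1)
               for c in combinations(nums, r))

# {3, 5} sums to 8, so the reduced instance {3, 5, 8, -8} has a
# zero-sum subset ({3, 5, -8}).
assert subset_sum_zero_brute(reduce_positive_to_zero([3, 5, 8], 8))
# No subset of {2, 4} sums to 7, and accordingly {2, 4, -7} has
# no zero-sum subset.
assert not subset_sum_zero_brute(reduce_positive_to_zero([2, 4], 7))
```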

Exponential time algorithms

There are several ways to solve SSP in time exponential in n.[5]

Inclusion-Exclusion

The most naïve algorithm would be to cycle through all subsets of n numbers and, for every one of them, check if the subset sums to the right number. The running time is of order O(2^n · n), since there are 2^n subsets and, to check each subset, we need to sum at most n elements.

The algorithm can be implemented by depth-first search of a binary tree: each level in the tree corresponds to an input number; the left branch corresponds to excluding the number from the set, and the right branch corresponds to including the number (hence the name Inclusion-Exclusion). The memory required is O(n). The run-time can be improved by several heuristics:[5]

  • Process the input numbers in descending order.
  • If the sum of the integers included in a given node exceeds the target sum, the node is pruned (since all inputs are positive, no extension of it can become feasible again).
  • If the sum of the integers included in a given node, plus the sum of all remaining integers, is less than the sum of the best subset found so far, the node is pruned.
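As a sketch, the depth-first search with these pruning rules can be written as follows (a minimal illustration for the optimization version, where we seek the largest subset sum not exceeding T; names are ours):

```python
def best_subset_sum(nums, T):
    """Largest subset sum not exceeding T, by depth-first
    inclusion-exclusion with the pruning heuristics above."""
    nums = sorted(nums, reverse=True)       # process in descending order
    suffix = [0] * (len(nums) + 1)
    for i in range(len(nums) - 1, -1, -1):  # suffix[i] = sum of nums[i:]
        suffix[i] = suffix[i + 1] + nums[i]
    best = 0

    def dfs(i, current):
        nonlocal best
        if current > T:                     # infeasible: prune
            return
        best = max(best, current)
        if i == len(nums) or current + suffix[i] <= best:
            return                          # even taking all that remains cannot improve
        dfs(i + 1, current + nums[i])       # right branch: include nums[i]
        dfs(i + 1, current)                 # left branch: exclude nums[i]

    dfs(0, 0)
    return best

assert best_subset_sum([3, 34, 4, 12, 5, 2], 9) == 9   # 4 + 5
assert best_subset_sum([1, 2, 5], 4) == 3              # 1 + 2
```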

Horowitz and Sahni

Horowitz and Sahni[6] published a faster exponential-time algorithm, which runs in time O(2^(n/2)), but requires much more space - O(2^(n/2)). The algorithm splits arbitrarily the n elements into two sets of n/2 each. For each of these two sets, it stores a list of the sums of all 2^(n/2) possible subsets of its elements. Each of these two lists is then sorted. Using a standard comparison sorting algorithm for this step would take time O(2^(n/2) n). However, given a sorted list of subset sums for k elements, the list can be expanded to two sorted lists with the introduction of a (k + 1)th element, and these two sorted lists can be merged in time O(2^k). Thus, each list can be generated in sorted form in time O(2^(n/2)). Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to T in time O(2^(n/2)). To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than T, the algorithm moves to the next element in the first array. If it is less than T, the algorithm moves to the next element in the second array. If two elements that sum to T are found, it stops.
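This meet-in-the-middle idea can be sketched in code (a minimal illustration; for simplicity it generates each half's sums by doubling and uses library sorting, so it does not reproduce the merge-based O(2^(n/2)) generation described above):

```python
def horowitz_sahni_decision(nums, T):
    """Decide subset sum by splitting nums into two halves, listing all
    subset sums of each half, and scanning the two sorted lists toward
    each other (meet in the middle)."""
    half = len(nums) // 2

    def all_sums(arr):
        sums = [0]
        for x in arr:
            sums += [s + x for s in sums]   # double the list: without/with x
        return sorted(sums)

    A = all_sums(nums[:half])   # ascending
    B = all_sums(nums[half:])   # ascending
    i, j = len(A) - 1, 0        # A scanned downward, B scanned upward
    while i >= 0 and j < len(B):
        s = A[i] + B[j]
        if s == T:
            return True
        if s > T:
            i -= 1              # sum too large: shrink the A side
        else:
            j += 1              # sum too small: grow the B side
    return False

assert horowitz_sahni_decision([3, 34, 4, 12, 5, 2], 9)        # 4 + 5
assert not horowitz_sahni_decision([3, 34, 4, 12, 5, 2], 30)   # unreachable
```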

Schroeppel and Shamir

Schroeppel and Shamir[7] presented an algorithm based on Horowitz and Sahni that requires similar runtime - O(2^(n/2)) - but much less space - O(2^(n/4)). Rather than generating all subsets of n/2 elements in advance, they partition the elements into 4 sets of n/4 elements each, and generate subsets of n/2 elements dynamically using a min heap.

Due to space requirements, the HS algorithm is practical for up to about 50 integers, and the SS algorithm is practical for up to 100 integers.[5]

Howgrave-Graham and Joux

Howgrave-Graham and Joux[8] presented a probabilistic algorithm that runs faster than all previous ones, in time O(2^(0.337n)). It solves only the decision problem: it cannot prove there is no solution for a given sum, and it does not return the subset sum closest to T.

Pseudo-polynomial time dynamic programming solution

SSP can be solved in pseudo-polynomial time using dynamic programming. Suppose the input sequence is x1, ..., xN,

sorted in increasing order, and we wish to determine if there is a nonempty subset which sums to zero. Define the boolean-valued function Q(i, s) to be the value (true or false) of

"there is a nonempty subset of x1, ..., xi which sums to s."

Thus, the solution to the problem "Given a set of integers, is there a non-empty subset whose sum is zero?" is the value of Q(N, 0).

Let A be the sum of the negative values and B the sum of the positive values. Clearly, Q(i, s) = false if s < A or s > B. So these values do not need to be stored or computed.

Create an array to hold the values Q(i, s) for 1 ≤ i ≤ N and A ≤ s ≤ B.

The array can now be filled in using a simple recursion. Initially, for A ≤ s ≤ B, set

Q(1, s) := (x1 == s)

where (x1 == s) is a boolean function that returns true if x1 is equal to s, false otherwise.

Then, for i = 2, ..., N, set

Q(i, s) := Q(i − 1, s) or (xi == s) or Q(i − 1, s − xi),   for A ≤ s ≤ B.

For each assignment, the values of Q on the right side are already known, either because they were stored in the table for the previous value of i or because Q(i − 1, s − xi) = false if s − xi < A or s − xi > B. Therefore, the total number of arithmetic operations is O(N(B − A)). For example, if all the values are O(N^k) for some k, then the time required is O(N^(k+2)).

This algorithm is easily modified to return the subset with sum 0 if there is one.
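A sketch of the recursion in code, using a dictionary of reachable sums instead of an explicit (i, s) table (an illustration under our own naming; it also returns a witness subset, as noted above):

```python
def zero_subset(nums):
    """Return a nonempty subset of nums summing to zero, or None.
    reachable[s] stores one nonempty subset of a prefix of nums with
    sum s; only sums between A and B (the negative and positive
    totals) can ever occur, giving O(N(B - A)) work overall."""
    reachable = {}                    # sum -> witness subset
    for x in nums:
        new = {x: [x]}                # the singleton {x}
        for s, subset in reachable.items():
            new.setdefault(s + x, subset + [x])
        for s, subset in new.items():
            reachable.setdefault(s, subset)
    return reachable.get(0)

witness = zero_subset([-3, -2, 9000, 5, 8])
assert witness is not None and sum(witness) == 0   # e.g. [-3, -2, 5]
assert zero_subset([1, 2, 3]) is None
```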

The dynamic programming solution has runtime O(sN), where s is the sum we want to find in the set of N numbers. This solution does not count as polynomial time in complexity theory, because s is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of A and B, which are exponential in their numbers of bits.

For the case that each xi is positive and bounded by a fixed constant C, Pisinger found a linear time algorithm having time complexity O(NC) (note that this is for the version of the problem where the target sum is not necessarily zero, otherwise the problem would be trivial).[9][10] In 2015, Koiliaris and Xu found a deterministic Õ(T√N) algorithm for the subset sum problem where T is the sum we need to find.[11] In 2017, Bringmann found a randomized Õ(T + N) time algorithm.[12]

In 2014, Curtis and Sanches found a simple recursion, highly scalable on SIMD machines, having O(n(m − xmin)/p) time and O(n + m − xmin) space, where p is the number of processing elements, m = min(T, x1 + ... + xn − T), and xmin is the lowest integer.[13] This is the best theoretical parallel complexity known so far.

A comparison of practical results and the solution of hard instances of the SSP is discussed by Curtis and Sanches.[14]

Polynomial time approximation algorithms


Suppose all inputs are positive. An approximation algorithm to SSP aims to find a subset of S with a sum of at most T and at least r times the optimal sum, where r is a number in (0,1) called the approximation ratio.

Simple 1/2-approximation

The following very simple algorithm has an approximation ratio of 1/2:[15]

  • Order the inputs by descending value;
  • Put the next-largest input into the subset, as long as it fits there.

When this algorithm terminates, either all inputs are in the subset (which is obviously optimal), or there is an input that does not fit. The first such input is at most as large as all the inputs already in the subset (inputs larger than T can be discarded up front). If the subset's sum s were at most T/2, the rejected input would have to exceed T − s ≥ T/2 ≥ s, contradicting that it is no larger than the inputs already chosen. Therefore, the sum of inputs in the subset is more than T/2, which is obviously more than OPT/2.
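A sketch of this greedy rule (names are ours):

```python
def greedy_half(nums, T):
    """Greedy 1/2-approximation: scan the inputs in descending order
    and add each one that still fits under the bound T."""
    total = 0
    chosen = []
    for x in sorted(nums, reverse=True):
        if total + x <= T:
            total += x
            chosen.append(x)
    return total, chosen

# The optimum is 6 (3 + 3); the greedy sum 4 is within the 1/2 ratio.
total, chosen = greedy_half([4, 3, 3, 3], 6)
assert total == 4 and chosen == [4]
```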

Fully-polynomial time approximation scheme

The following algorithm attains, for every ε > 0, an approximation ratio of (1 − ε). Its run time is polynomial in n and 1/ε. Recall that n is the number of inputs and T is the upper bound to the subset sum.

initialize a list L to contain one element 0.

for each i from 1 to n do
    let Ui be a list containing all elements y in L, and all sums xi + y for all y in L.
    sort Ui in ascending order
    make L empty 
    let y be the smallest element of Ui
    add y to L
    for each element z of Ui in increasing order do
        // Trim the list by eliminating numbers close to one another
        // and throw out elements greater than the target sum T.
        if y + ε T/n < z ≤ T then
            y = z
            add z to L

return the largest element in L.

Note that without the trimming step (the inner "for each" loop), the list L would contain the sums of all subsets of inputs. The trimming step does two things:

  • It ensures that all sums remaining in L are below T, so they are feasible solutions to the subset-sum problem.
  • It ensures that the list L is "sparse", that is, the difference between each two consecutive partial-sums is at least ε T/n.

These properties together guarantee that the list contains no more than n/ε elements; therefore the run-time is polynomial in n and 1/ε.

When the algorithm ends, if the optimal sum is in L, then it is returned and we are done. Otherwise, it must have been removed in a previous trimming step. Each trimming step introduces an additive error of at most ε T/n, so n steps together introduce an error of at most ε T. Therefore, the returned solution is at least OPT − ε T. Since OPT > T/2 whenever not all inputs fit together (by the argument for the simple 1/2-approximation above), this is at least (1 − 2ε)OPT, and running the scheme with ε/2 in place of ε yields the stated ratio (1 − ε). (When all inputs do fit, their total sum is trivially the optimum.)
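The pseudocode above can be sketched directly in Python (a minimal illustration, not a tuned implementation):

```python
def approx_subset_sum(nums, T, eps):
    """FPTAS for subset sum with positive inputs: maintain a sorted
    list of achievable sums, dropping sums above T and trimming sums
    closer than eps*T/n to the previously kept sum."""
    n = len(nums)
    L = [0]
    for x in nums:
        U = sorted(L + [s + x for s in L])  # old sums and sums extended by x
        L = [U[0]]
        y = U[0]
        for z in U[1:]:
            # Keep z only if it is far enough from y and still feasible.
            if y + eps * T / n < z <= T:
                y = z
                L.append(z)
    return max(L)

# Classic example instance: the optimum is 307 (104 + 102 + 101).
result = approx_subset_sum([104, 102, 201, 101], 308, 0.2)
assert 308 - 0.2 * 308 <= result <= 308   # within eps*T of the optimum
```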

The above algorithm provides the solution for the original subset sum problem in the case where the numbers are small (again, for non-negative numbers). If any sum of the numbers can be specified with at most P bits, then solving the problem approximately with ε = 2^−P is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in n and 2^P (i.e., exponential in P).

Another FPTAS for subset sum is presented by Kellerer, Pferschy and Pisinger.[16]

Multiple subset sum problem

The multiple subset sum problem[15] (MSSP) is a generalization of SSP in which we are allowed to select some m>1 subsets, where the sum in each subset is at most T. The problem has two variants:

  • Max-sum MSSP: the goal is that the sum of all subsets is as large as possible.
  • Max-min MSSP (also called bottleneck MSSP or BMSSP): the goal is that the smallest subset sum is as large as possible.

When m is variable (a part of the input), both problems are strongly NP-hard, by reduction from 3-partition. This means that they have no FPTAS unless P=NP.

Even when m=2, the problems do not have an FPTAS unless P=NP. This can be shown by a reduction from the equal-cardinality partition problem (EPART):

  • Given an instance a1,...,an of EPART with target sum T, construct an instance 2T+a1, ..., 2T+an of MSSP with target sum (n+1)T for both subsets.
  • A solution to EPART consists of two parts, each of which has n/2 elements with a sum of T. It corresponds to an optimal solution of both MSSP variants: two subsets with a sum of (n+1)T, which is the largest possible. Similarly, each optimal solution of MSSP corresponds to a solution to EPART.
  • Any non-optimal solution to MSSP leaves at least one item unallocated, so its sum is at most 2nT and its minimum subset sum is at most nT. In both variants, the approximation ratio of a non-optimal solution is therefore at most n/(n+1).
  • Therefore, any algorithm with approximation ratio strictly greater than n/(n+1) must find the optimal solution if it exists.
  • If we had an FPTAS, then we would have an algorithm with ratio, e.g., 1 − 1/(2(n+1)) > n/(n+1), with run-time polynomial in n. This algorithm could be used to solve EPART in time polynomial in n. But this is not possible unless P=NP.

The following approximation algorithms are known:[15]

  • For max-sum MSSP, with variable m: a PTAS, which runs in time O(m + n) when ε is fixed. The run-time is at least exponential in 1/ε, and the authors consider it impractical.
  • For max-min MSSP:
    • With variable m: a 2/3-approximation, in time O(n log n). No better approximation is possible unless P=NP (by reduction from 3-partition).
    • With fixed m: a PTAS.
    • With a fixed number of distinct input values: a PTAS using Lenstra's algorithm.

References

  1. ^ a b Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design (2nd ed.). p. 491. ISBN 0-321-37291-3.
  2. ^ Martello, Silvano; Toth, Paolo (1990). "4 Subset-sum problem". Knapsack problems: Algorithms and computer interpretations. Wiley-Interscience. pp. 105–136. ISBN 0-471-92420-2. MR 1086874.
  3. ^ Goodrich, Michael. "More NP complete and NP hard problems" (PDF).
  4. ^ Informal-CS (2018). "Reduction : 3-CNF SAT to Subset Sum". YouTube.
  5. ^ a b c Korf, Richard E.; Schreiber, Ethan L.; Moffitt, Michael D. (2014). "Optimal Sequential Multi-Way Number Partitioning" (PDF).
  6. ^ Horowitz, Ellis; Sahni, Sartaj (1974), "Computing partitions with applications to the knapsack problem" (PDF), Journal of the Association for Computing Machinery, 21 (2): 277–292, doi:10.1145/321812.321823, hdl:1813/5989, MR 0354006, S2CID 16866858
  7. ^ Schroeppel, Richard; Shamir, Adi (1981-08-01). "A $T = O(2^{n/2} )$, $S = O(2^{n/4} )$ Algorithm for Certain NP-Complete Problems". SIAM Journal on Computing. 10 (3): 456–464. doi:10.1137/0210033. ISSN 0097-5397.
  8. ^ Howgrave-Graham, Nick; Joux, Antoine (2010). Gilbert, Henri (ed.). "New Generic Algorithms for Hard Knapsacks". Advances in Cryptology – EUROCRYPT 2010. Lecture Notes in Computer Science. 6110. Berlin, Heidelberg: Springer: 235–256. doi:10.1007/978-3-642-13190-5_12. ISBN 978-3-642-13190-5.
  9. ^ http://hjemmesider.diku.dk/~pisinger/codes.html
  10. ^ Pisinger, D. (1999). "Linear Time Algorithms for Knapsack Problems with Bounded Weights". Journal of Algorithms. 33 (1): 1–14.
  11. ^ Koiliaris, Konstantinos; Xu, Chao (2015-07-08). "A Faster Pseudopolynomial Time Algorithm for Subset Sum". arXiv:1507.02318 [cs.DS].
  12. ^ Bringmann, Karl (2017). "A near-linear pseudopolynomial time algorithm for subset sum". Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). Society for Industrial and Applied Mathematics. pp. 1073–1084.
  13. ^ Curtis, V. V.; Sanches, C. A. A. (January 2016). "An efficient solution to the subset-sum problem on GPU: An efficient solution to the subset-sum problem on GPU". Concurrency and Computation: Practice and Experience. 28 (1): 95–113. doi:10.1002/cpe.3636. S2CID 20927927.
  14. ^ Curtis, V. V.; Sanches, C. A. A. (July 2017). "A low-space algorithm for the subset-sum problem on GPU". Computers & Operations Research. 83: 120–124. doi:10.1016/j.cor.2017.02.006.
  15. ^ a b c Caprara, Alberto; Kellerer, Hans; Pferschy, Ulrich (2000-02-01). "The Multiple Subset Sum Problem". SIAM Journal on Optimization. 11 (2): 308–319. doi:10.1137/S1052623498348481. ISSN 1052-6234.
  16. ^ Hans Kellerer; Ulrich Pferschy; David Pisinger (2004), Knapsack problems, Springer, p. 97, ISBN 9783540402862
  17. ^ Nicosia, Gaia; Pacifici, Andrea; Pferschy, Ulrich (2015). Hoefer, Martin (ed.). "Brief Announcement: On the Fair Subset Sum Problem". Algorithmic Game Theory. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 309–311. doi:10.1007/978-3-662-48433-3_28. ISBN 978-3-662-48433-3.
