Maximum subarray problem
This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)(Learn how and when to remove this template message)
In computer science, the maximum subarray problem is the task of finding a contiguous subarray with the largest sum, within a given one-dimensional array A[1...n] of numbers. Formally, the task is to find indices and with , such that the sum
is as large as possible. (Some formulations of the problem also allow the empty subarray to be considered; by convention, the sum of all values of the empty subarray is zero.) Each number in the input array A could be positive, negative, or zero.
For example, for the array of values [−2, 1, −3, 4, −1, 2, 1, −5, 4], the contiguous subarray with the largest sum is [4, −1, 2, 1], with sum 6.
Some properties of this problem are:
- If the array contains all non-negative numbers, then the problem is trivial; the maximum subarray is the entire array.
- If the array contains all non-positive numbers, then the solution is the number in the array with the smallest absolute value (or the empty subarray, if it is permitted).
- Several different sub-arrays may have the same maximum sum.
This problem can be solved using several different algorithmic techniques, including brute force, divide and conquer, dynamic programming, and reduction to shortest paths.
The maximum subarray problem was proposed by Ulf Grenander in 1977 as a simplified model for maximum likelihood estimation of patterns in digitized images. Grenander was looking to find a rectangular subarray with maximum sum, in a two-dimensional array of real numbers. A brute-force algorithm for the two-dimensional problem runs in O(n6) time; because this was prohibitively slow, Grenander proposed the one-dimensional problem to gain insight into its structure. Grenander derived an algorithm that solves the one-dimensional problem in O(n2) time, improving the brute force running time of O(n3).
Jay Kadane of Carnegie Mellon University soon after designed an O(n)-time algorithm for the one-dimensional problem, which is clearly as fast as possible. The same O(n)-time algorithm was later automatically derived by algebraic manipulation of the brute-force algorithm using the Bird–Meertens formalism.
Grenander's two-dimensional generalization can be solved in O(n3) time either by using Kadane's algorithm as a subroutine, or through a divide-and-conquer approach. Slightly faster algorithms based on distance matrix multiplication have been proposed by Hisao Tamaki and Takeshi Tokuyama in 1998 and by Tadao Takaoka in 2002. There is some evidence that no significantly faster algorithm exists; an algorithm that solves the two-dimensional maximum subarray problem in O(n3−ε) time, for any ε>0, would imply a similarly fast algorithm for the all-pairs shortest paths problem.
The score-based technique of finding the segment with the highest total sum is used in many problems similar to the MSP. In genomic sequence analysis, these problems include conserved segments, GC-rich regions, tandem repeats, low-complexity filter, DNA binding domains, and regions of high charge.
For computer vision , maximum subarray algorithms are used in bitmap images to detect the highest score subsequence which represents the brightest area in an image. The image is a two dimensional array of positive values that corresponds to the brightness of a pixel. The algorithm is evaluated after normalizing every value in the array by subtracting them from the mean brightness.
Data mining is an application of maximum subarray algorithms with numerical attributes. To understand the role of the maximum subarray problem in data mining it is important to be familiar with the association rule and its parts. The association rule is an if/then statement that creates relationships between unrelated pieces of data. The two parts of the association rule are the antecedent (if statement) and the consequent (then statement). In business applications of data mining, association rules are important in examining and foreseeing customer's actions/behavior. Data mining is used for companies to anticipate common trends, by representing problems with maximum subarray problem into an sub-array to be normalized and solved. The highest clarify] of the array will give companies information of what customers are responding well to and will influence the companies' actions as a result.[
A bit of a background: Kadane's algorithm is based on splitting up the set of possible solutions into mutually exclusive (disjoint) sets. It exploits the fact that any solution (i.e., any member of the set of solutions) will always have a last element (this is what is meant by "sum ending at position "). Thus, we simply have to examine, one by one, the set of solutions whose last element's index is , the set of solutions whose last element's index is , then , and so forth to . It turns out that this process can be carried out in linear time.
Kadane's algorithm begins with a simple inductive question: if we know the maximum subarray sum ending at position (call this ), what is the maximum subarray sum ending at position (equivalently, what is )? The answer turns out to be relatively straightforward: either the maximum subarray sum ending at position includes the maximum subarray sum ending at position as a prefix, or it doesn't (equivalently, , where is the element at index ).
Thus, we can compute the maximum subarray sum ending at position for all positions by iterating once over the array. As we go, we simply keep track of the maximum sum we've ever seen. Thus, the problem can be solved with the following code, expressed here in Python:
1 def max_subarray(A): 2 max_ending_here = max_so_far = A 3 for x in A[1:]: 4 max_ending_here = max(x, max_ending_here + x) 5 max_so_far = max(max_so_far, max_ending_here) 6 return max_so_far
Note: with a bit of reasoning you will see that
max_so_far is equal to . In particular, with every iteration of
x, we know that
max_ending_here contains the sum of the largest subset right before
max_ending_here will then contain the largest subset ending with
x. The algorithm will then update
max_so_far to be equal to the larger of
max_ending_here. Thus, we see that with every iteration of some element
x, we find the largest subset that could end with
x. Since we know that the largest subset has to end with an element in
A, we have found the largest subset in
The algorithm can also be easily modified to keep track of the starting and ending indices of the maximum subarray as well as the case where we want to allow zero-length subarrays (with implicit sum 0) if all elements are negative.
1 def max_subarray(A): 2 max_ending_here = A 3 startOld = start = end = max_so_far = 0 4 for i, x in enumerate(A[1:], 1): 5 max_ending_here = max(x, max_ending_here + x) 6 max_so_far = max(max_so_far, max_ending_here) 7 if max_ending_here < 0: 8 start = i + 1 9 elif max_ending_here == max_so_far: 10 startOld = start 11 end = i 12 return (max_so_far , startOld, end)
Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple/trivial example of dynamic programming.
The runtime complexity of Kadane's algorithm is .
Similar problems may be posed for higher-dimensional arrays, but their solutions are more complicated; see, e.g., Takaoka (2002). Brodal & Jørgensen (2007) showed how to find the k largest subarray sums in a one-dimensional array, in the optimal time bound .
The Maximum sum k-disjoint subarrays can also be computed in the optimal time bound .
- Bentley, Jon (1984). "Programming Pearls: Algorithm Design Techniques". Communications of the ACM. 27 (9): 865–873. doi:10.1145/358234.381162.
- Richard S. Bird (1989). "Algebraic Identities for Program Calculation" (PDF). The Computer Journal. 32 (2): 122–126. Here: Sect.8, p.126
- Tamaki, Hisao; Tokuyama, Takeshi (1998). "Algorithms for the Maximum Subarray Problem Based on Matrix Multiplication". Proceedings of the 9th SODA (Symposium on Discrete Algorithms): 446–452. Retrieved November 17, 2018.
- Takaoka, Tadao (2002). "Efficient Algorithms for the Maximum Subarray Problem by Distance Matrix Multiplication". Electronic Notes in Theoretical Computer Science. 61: 191–200. doi:10.1016/S1571-0661(04)00313-5.
- Backurs, Arturs; Dikkala, Nishanth; Tzamos, Christos (2016). "Tight Hardness Results for Maximum Weight Rectangles". Proc. 43rd International Colloquium on Automata, Languages, and Programming: 81:1–81:13. doi:10.4230/LIPIcs.ICALP.2016.81.
- Bengtsson, Fredrik; Chen, Jingsen (2007). "Computing maximum-scoring segments optimally" (PDF). Luleå University of Technology (3).
- Brodal, Gerth Stølting; Jørgensen, Allan Grønlund (2007), "A linear time algorithm for the k maximal sums problem", Mathematical Foundations of Computer Science 2007, Lecture Notes in Computer Science, 4708, Springer-Verlag, pp. 442–453, doi:10.1007/978-3-540-74456-6_40.
- Takaoka, T. (2002), "Efficient algorithms for the maximum subarray problem by distance matrix multiplication" (PDF), Electronic Notes in Theoretical Computer Science, 61.
- TAN, Lirong. "Maximum Contiguous Subarray Sum Problems" (PDF).
- Mu, Shin-Cheng (2010). "The Maximum Segment Sum Problem: Its Origin, and a Derivation".
- Bae, Sung Eun (2007). "Sequential and Parallel Algorithms for the Generalized Maximum Subarray Problem" (PDF).
- "Notes on Maximum Subarray Problem". 2012.