Maximum subarray problem
In computer science, the maximum subarray problem is the task of finding a contiguous subarray with the largest sum, within a given one-dimensional array A[1...n] of numbers. Formally, the task is to find indices and with , such that the sum
is as large as possible. (Some formulations of the problem also allow the empty subarray to be considered; by convention, the sum of all values of the empty subarray is zero.) Each number in the input array A could be positive, negative, or zero.
For example, for the array of values [−2, 1, −3, 4, −1, 2, 1, −5, 4], the contiguous subarray with the largest sum is [4, −1, 2, 1], with sum 6.
Some properties of this problem are:
- If the array contains all non-negative numbers, then the problem is trivial; the maximum subarray is the entire array.
- If the array contains all non-positive numbers, then the solution is the number in the array with the smallest absolute value (or the empty subarray, if it is permitted).
- Several different sub-arrays may have the same maximum sum.
Grenander was looking to find a rectangular subarray with maximum sum, in a two-dimensional array of real numbers. A brute-force algorithm for the two-dimensional problem runs in O(n6) time; because this was prohibitively slow, Grenander proposed the one-dimensional problem to gain insight into its structure. Grenander derived an algorithm that solves the one-dimensional problem in O(n2) time,[note 1] improving the brute force running time of O(n3). When Michael Shamos heard about the problem, he overnight devised an O(n log n) divide-and-conquer algorithm for it. Soon after, Shamos described the one-dimensional problem and its history at a Carnegie Mellon University seminar attended by Jay Kadane, who designed within a minute an O(n)-time algorithm, which is clearly as fast as possible. In 1982, David Gries obtained the same O(n)-time algorithm by applying Dijkstra's "standard strategy"; in 1989, Richard Bird derived it by purely algebraic manipulation of the brute-force algorithm using the Bird–Meertens formalism.
Grenander's two-dimensional generalization can be solved in O(n3) time either by using Kadane's algorithm as a subroutine, or through a divide-and-conquer approach. Slightly faster algorithms based on distance matrix multiplication have been proposed by Tamaki & Tokuyama (1998) and by Takaoka (2002). There is some evidence that no significantly faster algorithm exists; an algorithm that solves the two-dimensional maximum subarray problem in O(n3−ε) time, for any ε>0, would imply a similarly fast algorithm for the all-pairs shortest paths problem.
This section needs attention from an expert in Computational biology. The specific problem is: fix inline tags.September 2019)(
The score-based technique of finding the segment with the highest total sum is used in many problems similar to the MSP. In genomic sequence analysis, these problems include conserved segments, GC-rich regions, tandem repeats, low-complexity filter, DNA binding domains, and regions of high charge.
For computer vision , maximum subarray algorithms are used in bitmap images to detect the highest score subsequence which represents the brightest area in an image. The image is a two dimensional array of positive values that corresponds to the brightness of a pixel. The algorithm is evaluated after normalizing every value in the array by subtracting them from the mean brightness.
Data mining is an application of maximum subarray algorithms with numerical attributes. To understand the role of the maximum subarray problem in data mining it is important to be familiar with the association rule and its parts. The association rule is an if/then statement that creates relationships between unrelated pieces of data. The two parts of the association rule are the antecedent (if statement) and the consequent (then statement). In business applications of data mining, association rules are important in examining and foreseeing customer's actions/behavior. Data mining is used for companies to anticipate common trends, by representing problems with maximum subarray problem into an sub-array to be normalized and solved. The highest clarify] of the array will give companies information of what customers are responding well to and will influence the companies' actions as a result.[
Kadane's algorithm scans the given array from left to right.
In the th step, it computes the subarray with the largest sum ending at ; this sum is maintained in variable
Moreover, it computes the subarray with the largest sum anywhere in , maintained in variable
and easily obtained as the maximum of all values of
current_sum seen so far, cf. line 6 of the algorithm.
As a loop invariant, in the th step, the old value of
current_sum holds the maximum over all of the sum .[note 4]
is the maximum over all of the sum . To extend the latter maximum to cover also the case , it is sufficient to consider also the empty subarray . This is done in line 5 by assigning
current_sum as the new value of
current_sum, which after that holds the maximum over all of the sum .
1 def max_subarray(numbers): 2 """Find a contiguous subarray with the largest sum.""" 3 best_sum = 0 # or: float('-inf') 4 current_sum = 0 5 for x in numbers: 6 current_sum = max(0, current_sum + x) 7 best_sum = max(best_sum, current_sum) 8 return best_sum
This version of the algorithm will return 0 if the input contains no positive elements (including when the input is empty). For the variant of the problem which disallows empty subarrays,
best_sum should be initialized to negative infinity instead and also in the for loop
current_sum should be updated as
max(x, current_sum + x).[note 6]
In that case, if the input contains no positive element, the returned value is that of the largest element (i.e., the least negative value), or negative infinity if the input was empty.
The algorithm can be modified to keep track of the starting and ending indices of the maximum subarray as well:
1 def max_subarray(numbers): 2 """Find a contiguous subarray with the largest sum.""" 3 best_sum = 0 # or: float('-inf') 4 best_start = best_end = 0 # or: None 5 current_sum = 0 6 for current_end, x in enumerate(numbers): 7 if current_sum <= 0: 8 # Start a new sequence at the current element 9 current_start = current_end 10 current_sum = x 11 else: 12 # Extend the existing sequence with the current element 13 current_sum += x 14 15 if current_sum > best_sum: 16 best_sum = current_sum 17 best_start = current_start 18 best_end = current_end + 1 # the +1 is to make 'best_end' exclusive 19 20 return best_sum, best_start, best_end
In Python, arrays are indexed starting from 0, and the end index is typically excluded, so that the subarray [22, 33] in the array [-11, 22, 33, -44] would start at index 1 and end at index 3.
Because of the way this algorithm uses optimal substructures (the maximum subarray ending at each position is calculated in a simple way from a related but smaller and overlapping subproblem: the maximum subarray ending at the previous position) this algorithm can be viewed as a simple/trivial example of dynamic programming.
Similar problems may be posed for higher-dimensional arrays, but their solutions are more complicated; see, e.g., Takaoka (2002). Brodal & Jørgensen (2007) showed how to find the k largest subarray sums in a one-dimensional array, in the optimal time bound .
The Maximum sum k-disjoint subarrays can also be computed in the optimal time bound .
- By using a precomputed table of cumulative sums to compute the subarray sum in constant time
MaxEndingHerein Bentley (1989), and
cin Gries (1982)
MaxSoFarin Bentley (1989), and
sin Gries (1982)
- This sum is when , corresponding to the empty subarray .
In the Python code, is expressed as
x, with the index left implicit.
- While the latter modification is not mentioned by Bentley (1989), it achieves maintaining the modified loop invariant
current_sumat the beginning of the th step.
- Backurs, Arturs; Dikkala, Nishanth; Tzamos, Christos (2016), "Tight Hardness Results for Maximum Weight Rectangles", Proc. 43rd International Colloquium on Automata, Languages, and Programming: 81:1–81:13, doi:10.4230/LIPIcs.ICALP.2016.81
- Bae, Sung Eun (2007), Sequential and Parallel Algorithms for the Generalized Maximum Subarray Problem (PDF) (Ph.D. thesis), University of Canterbury.
- Bengtsson, Fredrik; Chen, Jingsen (2007), Computing maximum-scoring segments optimally (PDF) (Research report), Luleå University of Technology
- Bentley, Jon (1984), "Programming Pearls: Algorithm Design Techniques", Communications of the ACM, 27 (9): 865–873, doi:10.1145/358234.381162
- Bentley, Jon (May 1989), Programming Pearls (2nd? ed.), Reading, MA: Addison Wesley, ISBN 0-201-10331-1
- Bird, Richard S. (1989), "Algebraic Identities for Program Calculation" (PDF), The Computer Journal, 32 (2): 122–126, doi:10.1093/comjnl/32.2.122
- Brodal, Gerth Stølting; Jørgensen, Allan Grønlund (2007), "A linear time algorithm for the k maximal sums problem", Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, 4708, Springer-Verlag, pp. 442–453, doi:10.1007/978-3-540-74456-6_40.
- Gries, David (1982), "A Note on the Standard Strategy for Developing Loop Invariants and Loops" (PDF), Science of Computer Programming, 2: 207–241
- Takaoka, Tadao (2002), "Efficient algorithms for the maximum subarray problem by distance matrix multiplication", Electronic Notes in Theoretical Computer Science, 61: 191–200, doi:10.1016/S1571-0661(04)00313-5.
- Tamaki, Hisao; Tokuyama, Takeshi (1998), "Algorithms for the Maximum Subarray Problem Based on Matrix Multiplication", Proceedings of the 9th Symposium on Discrete Algorithms (SODA): 446–452, retrieved November 17, 2018
- TAN, Lirong. "Maximum Contiguous Subarray Sum Problems" (PDF).[dead link]
- Mu, Shin-Cheng (2010). "The Maximum Segment Sum Problem: Its Origin, and a Derivation".
- "Notes on Maximum Subarray Problem". 2012.
- greatest subsequential sum problem on Rosetta Code
- geeksforgeeks page on Kadane's Algorithm