Talk:Sieve of Eratosthenes: Difference between revisions
Jitse Niesen (talk | contribs) add reference saying that complexity is O(N log log N)
:As I wrote, my implementation (which is not fast compared to several other) of the Sieve of Eratosthenes runs around 7 times faster than '''your''' implementation. I did not compare it to your references. The timings are on one core of my 2.4 GHz Core 2 Duo, where your code takes 3.4s to 10^8, and mine takes 0.47s. You cannot expect me to time it on the same hardware as you happen to have, where your code apparently takes 10s. 7 times faster is an approximately constant factor due to optimization details of the essentially same algorithm. It's not asymptotically faster but it appears you don't understand the concept of asymptotic time complexity as expressed with [[big-O notation]]. You point to many references but those I and Jitse have examined don't support your claims. You question my work and integrity on sieving. I will let my record speak for itself: My prime sieves, usually based on Eratosthenes' idea, have set over 100 computational prime records. [http://hjem.get2net.dk/jka/math/myrecords.htm] The primes are published on the web at the links and you are welcome to verify them (preferably not with your own software). I have spent hours on this discussion with no result and will try to focus on more productive editing now. But I will continue to watch the article and look out for original research. [[User:PrimeHunter|PrimeHunter]] 00:51, 14 May 2007 (UTC)
::Harry G. Mairson, Some new upper bounds on the generation of prime numbers, ''Communications of the ACM'' '''20'''(9):664-669, 1977, {{doi|10.1145/62065.62072}}, states that the time complexity of a standard implementation of the Sieve of Eratosthenes is O(N log log N) (look at the very start of the second page). None of the links provided by Ausples say anything on the alternative algorithm that Ausples shows, therefore, we can't add to the article. -- [[User:Jitse Niesen|Jitse Niesen]] ([[User talk:Jitse Niesen|talk]]) 01:31, 14 May 2007 (UTC)
"}" added. Thank you for your references, and academic approach. I will spend time researching this further. Thank for your feedback.--[[User:Ausples|Ausples]] 04:39, 14 May 2007 (UTC) |
Revision as of 04:39, 14 May 2007
Apparently:
KOΣKINON EPATOΣΘENOϒΣ. or, The Sieve of Eratosthenes. Being an Account of His Method of Finding All the Prime Numbers, by the Rev. Samuel Horsley, F. R. S. Samuel Horsley Philosophical Transactions (1683-1775), Vol. 62. (1772), pp. 327-347.
Is the paper that popularized this algorithm, if anyone wants to fill in historical details.
--Imran 21:15 13 Jun 2003 (UTC)
If a person wants fully working code, they can check code repositories. If a person wants performance hacks, they can check programmer's journals. Wikipedia is neither a code repository nor a programmer's journal.
Pseudocode is used on Wikipedia as a last resort to explain an algorithm, because most programming languages are not readable by common encyclopedia readers. There is also no excuse for repeating the fairly understandable explanation of the algorithm with pseudocode. Feel free, however, to insert external links to either. For an example, see Sieve of Atkin, which, while it does not have a good explanation yet, does have an external link to working, optimized code. — 131.230.74.194 21:28, 11 July 2005 (UTC)
I added a more simplified pseudocode -- ba 24.57.157.81 06:52, 5 March 2007 (UTC)
"A Reduced Redundant Exclusions (RRE) Prime Sieve"
The main reason for the inefficiency of the Sieve of Eratosthenes is simply that it spends much time excluding non-primes which have already been excluded. This is not so apparent in the range 1–100, where it is usually demonstrated. But a quick check of prime densities: 25 up to 100; 168 up to 1,000; 78,498 up to 1,000,000; and 50,847,534 up to 1,000,000,000 makes it obvious that a prime sieve that reduces the number of these redundant exclusions will go much faster. I was unable to check the Atkin sieve due to its complexity, but a simple variation of the Sieve of Eratosthenes will vastly reduce redundant exclusions, if not eliminate them altogether:
Step 1 Sequentially write down the integers from 2 to m
Step 2 Cross out all numbers > 2 which are products of 2 and numbers in the table >= 2
Step 3 Find the smallest remaining number > 2; it is 3
Step 4 Cross out all numbers > 3 which are products of 3 and un-crossed-out numbers in the table >= 3
Step 5 Find the smallest remaining number > 3; it is 5
Step 6 Cross out all numbers > 5 which are products of 5 and un-crossed-out numbers in the table >= 5
Step 7 Continue while "the smallest remaining number" squared is <= m
— Preceding unsigned comment added by Rhnmcl (talk • contribs)
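For concreteness, the steps above can be sketched in code (a Python sketch for illustration, not from the original poster; reading each step's "un-crossed out numbers" as a snapshot taken before that pass begins is my interpretation, and the function name is my own):

```python
def rre_sieve(m):
    """Sketch of the 'Reduced Redundant Exclusions' steps above.

    'Un-crossed out numbers' is read as a snapshot taken before each
    pass begins (an interpretation, matching the {9, 15, 21, 27, ...}
    example given for step 4 below).
    """
    crossed = [False] * (m + 1)
    p = 2
    while p * p <= m:  # Step 7: continue while p squared is <= m
        # numbers >= p not yet crossed out when this pass begins
        survivors = [q for q in range(p, m // p + 1) if not crossed[q]]
        for q in survivors:
            crossed[p * q] = True  # Steps 2/4/6: cross out the product p*q
        p += 1
        while crossed[p]:          # Steps 3/5: smallest remaining number > p
            p += 1
    return [n for n in range(2, m + 1) if not crossed[n]]
```

Under this reading, each composite is crossed out exactly once (by its smallest prime factor times a cofactor that no smaller prime has crossed), at the cost of scanning the survivor list on each pass.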
- I'm sorry, but I don't see any difference with the algorithm in the article, except for redundancy in your description (all numbers >n are >=n). Qwertyus 12:46, 17 June 2006 (UTC)
Consider step 4 (for example); "the algorithm in the article" will cross out {6, 9, 12, 15, 18, ...}, where 6, 12, 18 are redundant exclusions... they have already been crossed out! Whereas step 4 (if read as I intend) will cross out only {9, 15, 21, 27, 33, ...} — Preceding unsigned comment added by Rhnmcl (talk • contribs)
- The article already states: The crossing-off of multiples can be started at the square of the number, as lower multiples have already been crossed out in previous steps. This saves a lot of time, particularly for large numbers. Qwertyus 15:30, 19 June 2006 (UTC)
A good first step is to disregard all multiples of two: that is, the sieve represents only odd numbers. This is easy enough, because the pattern is just as simple as for consecutive numbers; you just have to be careful about the starting point for striking out every third, fifth, seventh, etc. Having the sieve elements not represent multiples of three as well is more difficult, and matters worsen if even more primes are not represented. Possibly this is what the fancier method involves for a certain collection of primes to be skipped. Another ploy is to choose the sieve length to be the product of the first few primes, say N, with the intention of producing a batch of primes over 0 to N - 1, then start again with N to 2N - 1 for a second batch, and so on for however many batches you wish (since in actual computers, sieve arrays cannot be of indefinite length). The point of this is that the initial pattern of the sieve is the same and so does not have to be prepared more than once. But one can do better...
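The odd-only representation described above can be sketched as follows (a Python sketch for illustration; the index-to-number mapping i → 2i+3 is one common choice, assumed here, and the function name is my own):

```python
def odd_sieve(n):
    """Sieve storing only odd numbers: index i represents 2*i + 3."""
    size = (n - 1) // 2          # odd numbers 3, 5, ..., up to n
    is_comp = [False] * size
    i = 0
    while True:
        p = 2 * i + 3            # the odd number this slot represents
        if p * p > n:
            break
        if not is_comp[i]:
            # the careful starting point: first strike is p*p,
            # which sits at index (p*p - 3) // 2; the stride is p slots
            for j in range((p * p - 3) // 2, size, p):
                is_comp[j] = True
        i += 1
    return [2] + [2 * i + 3 for i in range(size) if not is_comp[i]]
```

Note the stride in index space is p (not 2p), because consecutive slots are 2 apart in value; getting that starting point and stride right is exactly the "be careful" part mentioned above.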
No redundant exclusions
Nevertheless, it is irksome to sieve away and strike out elements that have previously been struck out when sieving with a smaller prime. This can be handled. So far, prime production is via division (test factorisation), and addition (sieving). Now comes prime production via multiplication, relying on the fundamental theorem of arithmetic that every number has a unique prime factorisation. The plan therefore is to generate all products of primes, and strike out the corresponding element of the sieve: there will be one strike only on each composite number. Imagine a sieve of some large size. Start a process say "Breed" with two; it generates all powers of two and naturally, strikes them from the sieve up to the limit of the sieve. Further, for each power of two (being 1 to whatever) it takes that number and commands a Breed with that and three (the next higher prime) and five, and seven, and so forth until the multiple exceeds the sieve length. Likewise start a basic Breed with three to generate all powers of three and further with those powers and the next primes up, five, seven, etc. And likewise a basic Breed with five, and so on, until the first level would start beyond the length of the sieve. In this scheme it is helpful to schedule matters so that the smaller composite numbers are generated first so that the needed additional primes for the process can be extracted from the sieve as it is being adjusted.
As for running speeds, the administration is much more complex. Clearly the multiplicative process requires time slightly less than linear with respect to the highest number N, being one action for every composite number, of which there are N less the number of primes up to N, or N - (N/Ln(N)) approximately, whereas the additive sieving method requires one pass for every prime less than the square root of N (very slightly less), each pass of which requires N/prime(i) actions, that is N/2 actions for 2, N/3 for 3 and so on; the summation of 1/p(i) for i = 1 to n, such that p(n + 1)**2 > N. The summation of 1/p(i) has been studied (by Hardie, I recall), but I lack the reference just now. Whichever, the action count is proportional to N*(summation of 1/p(i)), or N*(something that increases with N), which is clearly more than linear.
Supposing that multiplication and addition take about the same time (analysis of the processes says they have the same complexity, but offers little advice on methods to perform the processes), then for ever-larger N, the multiplicative generation of composite numbers should eventually outrun the additive generation with redundancy. NickyMcLean 22:13, 18 June 2006 (UTC)
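For reference, the summation of 1/p over primes discussed above has a classical closed asymptotic form (this is Mertens' second theorem, which may be the result recalled as "Hardie" above):

```latex
\sum_{p \le x} \frac{1}{p} \;=\; \ln\ln x \,+\, M \,+\, O\!\left(\frac{1}{\ln x}\right),
\qquad M \approx 0.2615 \ \text{(the Meissel--Mertens constant)}
```

So the additive sieve's action count is proportional to N ln ln N, which is indeed slightly more than linear, as stated.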
Apart from the computational advantages of a "zero redundant exclusions" prime sieve (which remain to be established), there is the theoretical possibility that such a sieve could be used to establish an expression for the number of compound numbers less than n, and hence the number of primes less than n, say p, from p = n - c. Regards, rhnmcl
A Deterministic Boolean Approach
The following code has a runtime of O(N−P), meaning the range we are checking for prime numbers minus the number of primes found (there are no calculations for finding a prime number here, only true/false checks; primality is determined based on previous factors).
In computer science you can use boolean checks to create the Sieve of Eratosthenes that runs in sub-linear O(N−P) memory operations, using a boolean array location for each number up to N, as follows:
O(100000000);

O(N) {
    int S = N;
    int p = 0;                       //this will always be a prime number only. Pre-Deterministic.
    int d = 0;                       //this will always be a factor of a prime number only. Deterministic.
    bool[] primeCheck = new bool[S]; //create a boolean array for the length to be checked for primes 0->N.
    for (p = 2; p < S; p++) {
        switch (primeCheck[p]) {
            case true:
                continue;
            case false:
                for (d = p + p; d < S; d += p) {
                    switch (primeCheck[d]) {
                        case true:
                            continue;
                        case false:
                            primeCheck[d] = true;
                            break;
                    }
                }
                //trace(p); //p will be the NEXT prime number.
                break;
        }
    }
}
Functional closed circuit case.
Closed Circuit Time Testing
6/01/2006 4:19:20 PM
Scope(O): 100,000,000
Primes Found(P): 5,761,455
Memory Operations(N): 94,238,545 (Actual CPU runtime operations) (Time Complexity. Processor memory operations)
Time(Actual):
hours:0
minutes:0
seconds:10
milliseconds:843
6/01/2006 4:19:31 PM
As you can see, the code finds 5.7 million primes in 100 million possible choices in about 10 seconds. Linear? Try it yourself.
This code means that it is possible to run the sieve with sublinear O(N-P) operations in a computer without using special data structures but simply a boolean array. Modifying the "for" loops to skip 2's does not improve the time complexity.
At first glance, it would seem this algorithm is of complexity O(N^2) or simply N^2, as they are nested loops. However, the sieve logic pre-determines what the computer will or will not operate on at runtime. The computer algorithm will only operate on memory once for each location and does not operate on prime number locations (but may optionally output at runtime). This gives us a sublinear O(N-P) where N is the length the computer is checking for primes and P is the number of primes visited. This algorithm is of course restricted to finding primes that can be accounted for as the maximum number of booleans in an array you are allowed to initialize by the code compiler. There is no conventional math in this code. Even though it works. :-)
Here's why this works in O(N-P)... At no point in this code does the computer calculate. We are only checking true/false decision gates, which is a free processor check for the algorithm, costing the algorithm "0" operations. Since we are concerned with primes, we are concerned with location based on rules, just like calculus. The rules of the sieve can be found in the article. We have determined all prime locations for this stream N by knowing what is not prime, and telling the loop to visit the next prime skipping non-primes (determined by a "true"), saving a calculation and or operation. So remember, this is a computer science algorithm with runtime complexity O(N-P). It works as a clever solution. :-)
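As a concrete way to examine the claim above, here is a small Python model of the boolean-array sieve (my own instrumentation sketch, not the original code) that counts memory writes and true/false reads separately:

```python
def sieve_counts(n):
    """Model of the boolean-array sieve above, counting writes vs. reads."""
    check = [False] * n
    writes = reads = 0
    for p in range(2, n):
        reads += 1                 # models the outer switch(primeCheck[p]) check
        if check[p]:
            continue               # p already marked composite; skip
        for d in range(p + p, n, p):
            reads += 1             # models the inner switch(primeCheck[d]) check
            if not check[d]:
                check[d] = True    # first-time strike of the composite d
                writes += 1
    return writes, reads
```

For n = 100 this gives 73 writes (one per composite, i.e. N−P as claimed) but 242 true/false reads, and the read count grows faster than linearly in n. Whether the reads "count" toward time complexity is exactly the disagreement in this thread.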
Bibliography:
A1. Computer Science: An Overview by J. Glenn Brookshear, 1997 Addison Wesley publishing company. 0.3 The Evolution of Computer Science, p. 9 (Conceptual Theory)
B1. The World Treasury of Physics, Astronomy, and Mathematics: Edited by Timothy Ferris, Time Warner Book Group. Part Three: The Cosmos of Numbers::Artificial Intelligence and All That:::"The Computer and the Brain" by John von Neumann, 1958. p. 483. (Full excerpt: pp. 478-491) (Application Theory)
B2. The World Treasury of Physics, Astronomy, and Mathematics: Edited by Timothy Ferris, Time Warner Book Group. Part Three: The Cosmos of Numbers::Artificial Intelligence and All That:::"Can a Machine Think?" by Alan Turing, ~1950. pp. 495-496. (Full excerpt: pp. 492-519) (Application Theory)
C1. Algorithmics: The Spirit of Computing by David Harel, 1987~1996 Addison Wesley publishing company. 6.3 Order of Magnitude Improvements. p. 130-131. (Conceptual Theory)
D1. The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory by Brian Greene, 1999 Vintage Books. The Super In Superstrings. p. 166,167 (System~Model Conceptual Theory)
Keep in mind this is a clever computer science solution, not a conventional one. --Ausples 16:04, 11 May 2007 (UTC)
- O(N-P) is not sublinear, but linear: O(N-P) = O(N - N/ln N) = O(N). -- Jitse Niesen (talk) 16:09, 11 May 2007 (UTC)
- Below comments of the form "<-- (...)" were inserted by Ausples. PrimeHunter 18:04, 11 May 2007 (UTC)
- Your code makes the assignment "primeCheck[d] = true" when the first prime factor of the composite d is found. There are N-P composites, so the instruction is performed N-P times. As Jitse Niesen notes, O(N-P) = O(N). See Big-O notation.
- But reading memory is also a memory operation <-- (no, it's a true/false gate. it does not change the memory, no operations.), and it takes time (at least one clock cycle and usually more) <-- (this is a processor speed issue, of course the processor checks; it takes about a millisecond. But it checks free of operations. It is pre-determined. Memory operations take longer. This is not an algorithm issue, but a processor one). So all evaluations of primeCheck[d] in your "switch(primeCheck[d])" must also be counted. <-- (no. see above) The total number in the nested loop is above linear. <-- (yes. it is N.) There does not exist an implementation of the Sieve of Eratosthenes which is sublinear or linear in total memory operations or time. <-- (see code above. Memory size used is N. The array of booleans being checked for primality. There are two loops that check it in conjunction.) It can easily be modified to use sublinear memory size but that is another issue. <-- (please supply code for this. An algorithm could not be found easily by me to reduce the size of the array. It must be N. One of the conditions for this algorithm to work, is that it must be able to account for all positions in memory N, because the loops must work in conjunction with each other to create the check system the algorithm uses to identify prime locations in the boolean array.) Also note Wikipedia:No original research. Even if you think you have a sublinear algorithm, you are not allowed to add that claim to an article unless it has been supported by a reliable source. <-- (Agreed. We will say it is linear until it is fully verified as sub-linear). By the way, I have implemented the Sieve of Eratosthenes at least a dozen times and know it well. The Sieve of Atkin is asymptotically faster.(<-- This algorithm uses slopes to help determine a prime location. This algorithm requires calculation. This assumes we're trying to find a number, not a location in memory. 
Above we are finding logical primes, AKA the memory register number. You're trying to find just prime numbers. We're going a step further and finding the location corresponding to a number. We are limited here only by available registers. Remember, primes are derived. They are part of an algorithm, not an equation. The algorithm guarantees that a number will be prime as it is pre-determined. Atkin's does mathematical tests for primality. Above code does not.) PrimeHunter 16:37, 11 May 2007 (UTC)
- Thank you for the feedback. Please define sub-linear in terms of time complexity. This claim of sub-linear is based on the following:
We know it runs N. It does not operate on Prime number locations, therefore, it runs N-P. As above. We do not know until the end of N how many exist in P, so we know that it moves faster than N operations. If it moves faster than N, but the variable P is not known until the end of runtime, the algorithm time complexity is reduced by the factor P, so it is N-P, which is less than linear for some factor which equals the total number of primes found in N. How do we say that in Mathematics? I hope this is clearer. I am ready to concede that it is not sub-linear with further explanation of what we mean by sub-linear, as we already agree this algorithm is not more than linear. :-/ --Ausples 17:24, 11 May 2007 (UTC)
- Please don't delete talk page comments by others (unless they are vandalism, or deletion is allowed by a specific Wikipedia guideline). Write your own comments below and not inside the comments of others.
- The "segmented" Sieve of Eratosthenes (you can Google it) uses sublinear memory. People often omit to say "segmented". There are often claims that the Sieve of Eratosthenes is linear in time (also in Wikipedia), but it's actually slower. However, it's so close that the difference doesn't matter in practice at the sizes computers can handle. The sieve detects each composite d for each prime factor below sqrt(d). And the average number of prime factors is log (log N) which grows very slowly.
- Apparently we have different definitions of "memory operations", but the important issue is time, and we agree that memory reads take time. Sublinear time means o(N) time. (Note the little o, it's not O(N)). See Big-O notation for definitions. PrimeHunter 18:04, 11 May 2007 (UTC)
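For readers curious about the "segmented" variant mentioned above, here is a minimal Python sketch (the function name and segment size are my own choices; only the segment and the small primes up to sqrt(limit) are held in memory at once):

```python
def segmented_sieve(limit, segment_size=32768):
    """Primes up to `limit`, sieving one fixed-size segment at a time."""
    root = int(limit ** 0.5) + 1
    # ordinary sieve for the small primes up to sqrt(limit)
    base = [True] * (root + 1)
    base[0:2] = [False, False]
    for p in range(2, int(root ** 0.5) + 1):
        if base[p]:
            for m in range(p * p, root + 1, p):
                base[m] = False
    small = [p for p in range(2, root + 1) if base[p]]
    primes = [p for p in small if p <= limit]
    lo = root + 1
    while lo <= limit:
        hi = min(lo + segment_size - 1, limit)
        seg = [True] * (hi - lo + 1)
        for p in small:
            # first multiple of p inside [lo, hi], but never below p*p
            start = max(p * p, ((lo + p - 1) // p) * p)
            for m in range(start, hi + 1, p):
                seg[m - lo] = False
        primes.extend(n for n in range(lo, hi + 1) if seg[n - lo])
        lo = hi + 1
    return primes
```

The working array is O(sqrt(limit) + segment_size) rather than O(limit), which is the "sublinear memory" point being made; the time complexity is unchanged.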
- You can test how many memory operations by incrementing a counter once after a line of code that sets a value in memory, e.g. "x=3", or in the case above, "primeCheck[d] = true". Although, interesting to note → Asymptotic Time Complexity: The limiting behavior of the execution time of an algorithm when the size of the problem goes to infinity. This is usually denoted in big-O notation. [1]. The above code runs asymptotic, so I guess it might be sub-linear after all?
So then based on Computational_complexity_theory.
This algorithm would have following complexities:
Time Complexity: O(N-P) ~Here we are using memory operations time.
Space Complexity: O(N) ~Here would be the boolean array in bits.
Circuit Complexity: O(N) ~How long it takes the loops to work in conjunction to determine primality for N.
So we could say overall, this algorithm/codebase has runtime O(N), or O(N-P) however, it is still linear either way. Would that be correct?
Note1: The "continue" statement forces the processor to move on before any operations occur.
Note2: I see no practical test code for a sub-linear sieve. We were talking about time complexity. How fast a processor moves is arbitrary to the algorithm. You can only determine primality one way, it's calculus, they are derived from a set of instructions. It is pre-determined. There is no known equation that produces primes, nor tests for primality 100% accurate when primes are assumed. This can only be accomplished through an algorithm, following the instructions of the algorithm designer. This is exclusively a Sieve of Eratosthenes.
Note 3: The algorithm starts slow, then at N/2, it will visit only primes, making it run at O(P). So log(log N) does not apply for the duration of the algorithm. Only the first half. It still gets done in N operations. So for N=100000000 it will take the time it takes the processor to operate on N-P bits of memory to determine primality. On 2.79 Ghz, about 10 seconds + the miscellaneous milliseconds for the gate checks. But that does not interfere with the time complexity. Just the actual amount of time it will physically take to produce prime numbers. Run the code and output to the screen. Start with about 100,000,000 and watch the primes stream out. You will see the difference in approach here.
--Ausples 18:38, 11 May 2007 (UTC)
- Circuit complexity is not about time. Time complexity is the asymptotic total number of individual steps needed to solve a problem. It's not limited to memory writes. And as I said, it's not linear for the Sieve of Eratosthenes. It's actually O(N*log log N) for primes up to N. Counting steps or measuring time in examples with specific N can only allow guessing of the asymptotic behaviour. The algorithm must be analyzed mathematically to prove the complexity. Other qualified mathematicians have already done that, and the log log n term is real. There does not exist a constant k such that the total number of steps is below k*N for all N. This means the algorithm is not linear.
- Category:Primality tests mentions several algorithms which can prove primality without testing potential prime factors, but that is unrelated to the Sieve of Eratosthenes. PrimeHunter 22:10, 11 May 2007 (UTC)
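The N·log log N growth described above can be observed empirically; here is a small Python sketch (my own, for illustration) counting every crossing-off action of a standard sieve that starts each prime at its square:

```python
import math

def strike_count(n):
    """Count every crossing-off action a standard sieve performs up to n."""
    comp = [False] * (n + 1)
    ops = 0
    for p in range(2, math.isqrt(n) + 1):
        if not comp[p]:
            for m in range(p * p, n + 1, p):
                comp[m] = True   # one crossing-off action
                ops += 1
    return ops
```

Already at n = 100 the sieve performs 104 strikes, more than n itself, and the ratio ops/n keeps creeping upward as n grows (roughly like log log n plus a constant), which is why no constant k bounds the total step count by k·n.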
- Okay, we said above it was linear, now it's not. Which is it? And why? So far, there has occurred a lot of double chat. It sounds like you're not sure. Can we find another *eye* on this? Jitse says it is not sub-linear, but it is linear. PrimeHunter agrees, but no longer agrees. Anyone else wanna stab at it? — Preceding unsigned comment added by 76.105.207.61 (talk)
- Jitse only noted that O(N-P) = O(N) is linear and not sublinear, and I agreed. None of us claimed it was the time complexity of the Sieve of Eratosthenes. PrimeHunter 23:34, 11 May 2007 (UTC)
- Computer science algorithm time complexities → [2]. Test the code; it runs O(N), "running in at most N steps on inputs of length N exists." It seems like that sums it up. For computer science, this code has algorithm time complexity O(N-P) processor memory operations. Which, according to the books below, is how we gauge time complexity in computer science. (as above):
A1. Computer Science: An Overview by J. Glenn Brookshear, 1997 Addison Wesley publishing company. 0.3 The Evolution of Computer Science, p. 9 (Conceptual Theory)
C1. Algorithmics: The Spirit of Computing by David Harel, 1987~1996 Addison Wesley publishing company. 6.3 Order of Magnitude Improvements. p. 130-131. (Conceptual Theory)
The Sieve of Eratosthenes is an NP-complete problem. [3] So far, this is the only known sieve that satisfies the computer science definition for time complexity as per references above in N. Could another computer scientist explain further? Or explain why this is not O(N) algorithm time complexity based on computer science, not on math?
--Ausples 01:38, 12 May 2007 (UTC)
- I seriously doubt any of the mentioned books use your time complexity definition which does not indicate the time to run the algorithm. An algorithm with eN iterations of a loop but no memory writes would have time complexity 0 by your definition. NP-complete problems are a class of decision problems. The sieve of Eratosthenes is an algorithm and not a decision problem. It makes no sense to say it's NP-complete or even discuss it. Your reference [4] says nothing about the Sieve of Eratosthenes. PrimeHunter 02:47, 12 May 2007 (UTC)
- It's all in the memory assignment. When you set X=3, that is one memory operation. At runtime, the algorithm sets N-P bits of memory according to Eratosthenes' instructions to determine primality. What is "0" complexity is the time to check whether the space is true or false. The computer does that for free because it is not a memory operation, but a free check. Like I said, a clever solution, not a conventional one. This is very practical in game logic. It writes to memory once and only once for each memory location that does not correspond to a prime number. You can compare against time complexity of other known solutions. --76.105.207.61 04:52, 12 May 2007 (UTC)
- Checking for true or false takes time so I would not call it "free". I don't think you have a reliable source (or any source) which says that it can be ignored in time complexity. If I run a program, I care about how long it takes, and not about how the time is split between memory reads, memory writes, boolean checks, and so on. PrimeHunter 13:03, 12 May 2007 (UTC)
- Those are college textbooks and fit the requirements for reliable sources. Here, Wikipedia says that PRIMES is in NP or coNP: Primality_test#Complexity. We are still not concerned with the actual time in seconds, as loop counters are constant time. They factor into the time complexity as c*O(N), which makes them arbitrary (although this algorithm's actual time so far is less than any other known sieve, probably because it's not factoring; it is determined at runtime without calculation, i.e. factoring integers... 7%7=0 makes 7 prime, etc.). One could not apply the current conventional rules to this algorithm. This has little to do with math. It's all boolean logic. This algorithm follows Eratosthenes' requirements listed in the 3rd paragraph here → [5]. This also is a reliable source. So far, we find this is pretty general knowledge and known. The time complexity is correct for memory operations. The loops are constants. So, I believe that it is safe to say that the algorithm/code above has runtime complexity of C*O(N). We can drop the C, since C is constant, and say O(N). → [6]. I would suggest testing the code above against other sieves and let me know. I've already done these tests. One might have to detach themselves from conventional thoughts about the Sieve of Eratosthenes. He was pretty clear about the requirements, and gave us instructions on how to complete it in linear time (N). The code above literally follows his instructions, which are also found here, from NIST [7]. Most current implementations use factoring to determine primality. Again, we are pre-determining the prime factors at runtime above, as per the written user requirements. For comparison consideration, here is a square root implementation (so far, the fastest published pre-deterministic sieve implementation) that has been tested against the above code. The following referenced algorithm comes from an Advanced Artificial Intelligence class.
The above code uses fewer operations and is actually faster. Read Algorithm Implementation 1 here → [8]. This square sieve is also pre-deterministic, uses an initialization loop, a nested loop, and another loop to display the primes. It cannot report at runtime (and you couldn't say this runs at N+N^2+N; it runs much faster). In the code above, we visit prime locations (and may report them) at runtime, in N. Again, there are conditions: you must start at 2, and you must have memory size of N. There is no mathematical factoring. Boolean logic only.
Note 1: If you want it to go faster, just add processors to reduce actual runtime to N/number of processors. This is basic parallel computing.
--Ausples 04:28, 13 May 2007 (UTC)
- The only thing important for Wikipedia is whether there is a reliable source stating whether the algorithm you show runs in linear time.
- The question whether the algorithm does in reality run in linear time may be interesting, but it is not of relevance for Wikipedia. We all agree that the line primeCheck[d] = true is executed N−P times. We disagree about whether the line switch(primeCheck[d]) is free, as you claim; where in what text book did you find this? What computational model are you using? And we haven't even talked about the instruction d+=p . -- Jitse Niesen (talk) 06:23, 13 May 2007 (UTC)
- Switches are conditional control structures set up by the compiler. This is a decision gate and is constant in N. Therefore, it does not interfere with time complexity. Second paragraph and Conditional section for if/else(applies to switches) -> [9]. Again, a switch is not an operation, but compiler control structures set up to reduce algorithm operational runtime and improve actual time results. Switches check for true and false. Calculations and memory checks are done by the processor free of memory operations (as per control structures). Memory operations that require the computer to change memory are 1 in complexity. These are also called statements. We make N-P statements throughout the algorithm. Our worst case time complexity could therefore be said to be O(N-P). As for d+=p, you have to control the loop in order to produce the desired result of sieving out the next non-prime, before the location is checked by the control structures, as it has been determined that "p" is a prime location based on the rules of the algorithm written by Eratosthenes. Further, using "continue" tells the compiler to move to the next location before any operation/calculation is made by the processor. This reduces the number of operations/calculations to N-P. The algorithm knows a location is prime by virtue of the fact it will be the next location it visits in the outer loop. Please ask more questions. A graph of actual test data using real time output (i.e. Sec/Millisec), will still produce a linear result. --Ausples 10:42, 13 May 2007 (UTC)
- As the link you gave says, not only statements incur a cost but expressions do too. There is a cost associated with the expression primeCheck[d] in the switch statement; O(1) according to the lecture notes. So they're not free. -- Jitse Niesen (talk) 11:03, 13 May 2007 (UTC)
- Exactly. And that O(1) expression is evaluated O(n*log log n) times in total, so the time complexity (following conventional definitions) is at least that. Ausples wrote: "One could not apply the current conventional rules to this algorithm". That sums up the main problem. Ausples is in WP:OR territory and Wikipedia does not allow that. By the way, as one who has implemented and seen the sieve many times, I can say with confidence that Ausples' implementation is relatively slow in actual running time compared to many other implementations. PrimeHunter 11:39, 13 May 2007 (UTC)
- I believe that if you plot the result, it looks linear. It's hard to see the difference between O(N) and O(N log N) and O(N log log N), and in practice there is little difference. -- Jitse Niesen (talk) 11:24, 13 May 2007 (UTC)
My statement "One could not apply the current conventional rules to this algorithm" is retracted. You can certainly apply conventional computer science rules here, as this algorithm follows those standards (see above references). I have already plotted this. It is N. PrimeHunter wrote: "I can say with confidence that Ausples' implementation is relatively slow in actual running time compared to many other implementations." What other implementations? And when did you run the code above? This research has been done several times over. Here is your hard data (where is yours?):
This was done with a 1GHz processor. (It could be done with a 1MHz processor; it's still going to come out as N.) You will want to plot starting at N=1,000,000 (1M) as the computer does not spend any measurable time in milliseconds finding primes until N=1,000,000. (And apparently finds 1,624,528 primes when N=26,000,000 in about two and a half seconds. At higher ranges, it is still consistent with N.)
N (range 0->N) | Time in Milliseconds | Primes Found | Total Processor Memory Operations (N-P) |
1000 | 0 | 169 | 830 |
10000 | 0 | 1230 | 8769 |
100000 | 0 | 9593 | 90406 |
1000000 | 46 | 78499 | 921500 |
2000000 | 171 | 148934 | 1851065 |
3000000 | 265 | 216817 | 2783182 |
4000000 | 343 | 283147 | 3716852 |
5000000 | 484 | 348514 | 4651485 |
6000000 | 562 | 412850 | 5587149 |
7000000 | 656 | 476649 | 6523350 |
8000000 | 750 | 539778 | 7460221 |
9000000 | 906 | 602490 | 8397509 |
10000000 | 921 | 664580 | 9335419 |
11000000 | 1015 | 726518 | 10273481 |
12000000 | 1156 | 788061 | 11211938 |
13000000 | 1250 | 849253 | 12150746 |
14000000 | 1406 | 910078 | 13089921 |
15000000 | 1468 | 970705 | 14029294 |
16000000 | 1578 | 1031131 | 14968868 |
17000000 | 1671 | 1091315 | 15908684 |
18000000 | 1718 | 1151368 | 16848631 |
19000000 | 1859 | 1211051 | 17788948 |
20000000 | 1921 | 1270608 | 18729391 |
21000000 | 2015 | 1329944 | 19670055 |
22000000 | 2140 | 1389262 | 20610737 |
23000000 | 2234 | 1448222 | 21551777 |
24000000 | 2375 | 1507123 | 22492876 |
25000000 | 2421 | 1565928 | 23432071 |
26000000 | 2531 | 1624528 | 24375471 |
and on and on |
You can get more results for 100M->1xxM and the result will be the same: linear. Slight millisecond variance (if any) is due to OS restrictions on processor time for multi-threaded OS background tasks. This is common on most operating systems. You can test this up to the limit the OS will allocate for a boolean array. It will still be a linear result. I can draw the graph for you, or you can plot it in a spreadsheet yourself. It's O(N-P). Actual runtime indicates it; actual operation count proves it. The graph is a straight line. There is no log N anywhere in the above code. You are guessing. And please provide references for your refutation, otherwise this cannot be challenged. I do not understand terms like "I think", "I believe", "I feel"; they are not scientific terms. Please break down the time complexity using computer science based references. This has been done by myself throughout this discussion. Where is your reference for refutation? There is nothing original about this algorithm. It is based purely on material from educational sources and the user requirements of Eratosthenes. This is a clever solution to the Sieve of Eratosthenes, nothing more, nothing less. It applies here and only here. You are more than welcome to test this yourself. Remember, this sieve is not exclusively a math problem. It's an algorithm, not an equation. It is a deterministic way to find prime locations (not necessarily numbers). If I were to "believe", I believe that this is what Eratosthenes had in mind, since it runs in O(N-P) on the computer and matches exactly his user requirements. --Ausples 13:35, 13 May 2007 (UTC)
- As I have said, asymptotic time complexity must be proved by analyzing the algorithm mathematically. It can never be proved by running test cases. As MathWorld says [10], the average number below N has O(log log N) prime factors. That log log N gets into the real time complexity. This is really outside the scope of a talk page which is intended for discussing improvements to the article, but I will compare actual timings to my code. I debugged (the last "break" must be removed) and ran Ausples' C code:
- 5761455 primes up to 100000000 computed in 3.41 seconds.
- An old C implementation by me with same compiler and cpu:
- 5761455 primes up to 100000000 computed in 0.47 seconds (7 times faster).
- My implementation is the segmented Sieve of Eratosthenes and works to 10^12 with a couple MB ram. Many people have made faster implementations than me. Use Google to find some. By the way, the mentioned bug means that Ausples' program, as displayed above and previously inserted in the article, actually is linear, but it's not the Sieve of Eratosthenes, and it does not compute primes. I have spent far too much time on this discussion and may stop now. As long as you don't put original research into the article, we don't have to agree. PrimeHunter 17:56, 13 May 2007 (UTC)
Please provide references. Plenty of references have been provided by myself above, and more will be provided based on questions that must be answered about the algorithm. What you have failed to provide is an intelligent (referenced) counter-argument. Further, no breaks need be removed from this code. The last break belongs to the false case in the first switch statement; this is not a bug. For N=100M (which you claim above produces primes in .47 seconds), we measure around ~10 seconds using the algorithm above, not 3.4 seconds. Actual time is a technological limitation, not an algorithmic one, and is therefore arbitrary. What is important is that over several controlled tests the algorithm produces N. You are also claiming that your algorithm would go faster than the above-mentioned "square sieve" (or the quadratic sieve for that matter) cited in that masters-level artificial intelligence class ->[11] (reference AGAIN provided for convenience). Are we to believe that you have an algorithm that runs 7 times faster than a masters-level class in artificial intelligence, without any references? Call me skeptical on that note. ;-) Please cite your sources. If you do not provide references for your claims, they must be false because they would then be based in ego ("I"), not reality. We are made to "believe", which again is not scientific. Are you hiding facts that don't exist? Further, if the code above is not producing prime numbers at runtime, then what is it doing? That trace line is a console output. "p" is always a prime number in this algorithm. This can be proven by adding the console output line, which was apparently not done, as PrimeHunter says, "but it's not the Sieve of Eratosthenes, and it does not compute primes." Right, it's pre-determined; the outer loop knows that the next location it visits will be a prime location. And it does in fact meet the user requirements set out by Eratosthenes. We covered that. But at least we agree it runs linear, O(N-P).
As for your citation above, this applies outside the bounds of computer science in "factoring" primes. The Sieve of Eratosthenes is concerned with the NEXT prime, not factoring them. Read the requirements again (see above), and use a historical context this time. There is no mention of factoring anywhere. It's all about how you get to the NEXT prime (last I checked in calculus, a prime is a derivative location represented by a linear slope for an equation). This is not a matter of agreeing. This is the Sieve of Eratosthenes (it is based on his user requirements). It is a hard fact. --Ausples 20:02, 13 May 2007 (UTC)
- Check your above code which you also inserted in the article. It contains 5 '{' and only 4 '}'. Your indentation clearly indicates the missing '}' should be at the end. The last break does not belong to a switch as you claim. It jumps out of the inner for loop after a single iteration, and that is certainly wrong. The output for N=50 if your alleged primes p are printed is:
- 2 3 5 7 8 9 11 12 13 15 17 19 20 21 23
- If a '}' is instead added in a strange (in view of indentation) place before the last break, then that would fix your bug without having to delete the break.
- As I wrote, my implementation (which is not fast compared to several other) of the Sieve of Eratosthenes runs around 7 times faster than your implementation. I did not compare it to your references. The timings are on one core of my 2.4 GHz Core 2 Duo, where your code takes 3.4s to 10^8, and mine takes 0.47s. You cannot expect me to time it on the same hardware as you happen to have, where your code apparently takes 10s. 7 times faster is an approximately constant factor due to optimization details of the essentially same algorithm. It's not asymptotically faster but it appears you don't understand the concept of asymptotic time complexity as expressed with big-O notation. You point to many references but those I and Jitse have examined don't support your claims. You question my work and integrity on sieving. I will let my record speak for itself: My prime sieves, usually based on Eratosthenes' idea, have set over 100 computational prime records. [12] The primes are published on the web at the links and you are welcome to verify them (preferably not with your own software). I have spent hours on this discussion with no result and will try to focus on more productive editing now. But I will continue to watch the article and look out for original research. PrimeHunter 00:51, 14 May 2007 (UTC)
- Harry G. Mairson, Some new upper bounds on the generation of prime numbers, Communications of the ACM 20(9):664-669, 1977, doi:10.1145/62065.62072, states that the time complexity of a standard implementation of the Sieve of Eratosthenes is O(N log log N) (look at the very start of the second page). None of the links provided by Ausples say anything on the alternative algorithm that Ausples shows, therefore, we can't add to the article. -- Jitse Niesen (talk) 01:31, 14 May 2007 (UTC)
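In outline, the O(N log log N) bound follows from Mertens' second theorem. This is a standard derivation, not a quotation from Mairson's paper:

```latex
% Each prime p \le \sqrt{N} crosses out roughly N/p entries, so the
% total work of the classic sieve is bounded by
\sum_{\substack{p \le \sqrt{N} \\ p \text{ prime}}} \frac{N}{p}
  \;\le\; N \sum_{p \le N} \frac{1}{p}
  \;=\; N\bigl(\ln \ln N + M + o(1)\bigr)
  \;=\; O(N \log \log N),
% where M \approx 0.2615 is the Meissel--Mertens constant.
```

The log log N factor is thus not an artifact of any particular implementation; it is the total number of crossings-out the algorithm itself prescribes.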
"}" added. Thank you for your references and academic approach. I will spend time researching this further. Thank you for your feedback.--Ausples 04:39, 14 May 2007 (UTC)