I added the [dubious ] tag to the following
In most cases, the programmer would choose the former method. If it was necessary to find the minimum of two integers millions of times per second, the programmer might exploit his knowledge of binary representation of integers and come up with the latter method. Since that method does not use branching, it runs faster on some processors.
Already the first part is not true, in most cases the programmer would use the MIN(x,y) macro implemented as (x<y)?x:y, or a min(x,y) procedure that would be inlined by the compiler and avoid duplicate evaluation of x,y if this would eb a concern. But the main problem is the last phrase; since so far I only know counter-examples providing evidence against the truth of the statement, so even if it is m-m-maybe conceivable that the idea could apply (and still, I could not think of a microprogram realizing that situation), if there is no commercially available CPU with this property, then I don't think it is relevant and the statement should better be removed since it would only create confusion in > 99% of the cases. For current processors as AMD and Pentium my tests provide evidence for the fact that (x<y)?x:y virtually takes no time (almost all of the time of the loop, being needed to decrement & zero test the counter and branch back), while the "complicated" solution takes over 20% of the total time.— MFH:Talk 15:59, 5 June 2008 (UTC)
In principle the non-branching method would work better because the processor cannot predict the branching outcome, and therefore wouldn't have to wait for the result of the boolean test; but since I haven't tested this I will go with what you said. Also, most processors would only have one "pipeline" for the thread that is finding the minimum of the integers, meaning that all of the information is going to have to go through at some point. Heuvelton (talk) 01:25, 22 June 2008 (UTC)
On a simple mips machine, or RISC architecture in general both can be emulated with 5 assembly mnemonics, however the second uses less branching, which means it avoids pipeline stalls and is therefore faster on average. Although in this case the branches are short so would not cause cache misses, various architectures would slow down due to this, including AMD and Intel, and a simple experiment isn't proof that it is wrong. Also, DSPs where bit twiddling is more likely would likely reveal this result. Video cards may also benefit from such manipulation. Using a standard library definition avoids the point and is a totally different problem, specifically since this is not a language specific issue. Anonymous
(the previous paragraph was not signed, my stuff starts here) Whatever the outcome, please definitelly leave the example there, as it is very informational. I program for many years but I never realized this is possible. It doesn't matter which is faster - that always depends on other factors. Ypocat (talk) 19:23, 4 September 2008 (UTC)
The results will be compiler/language & executing architecture dependent AND actual flow dependent. A good compiler/RT arch should spew out a correct imple..but that is (still) far from always true.