Talk:Very long instruction word

WikiProject Computing (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.


Does the VLIW architecture solve branch prediction problems? The article implies that it does, but as far as I know, VLIW does not help at all in that regard.

Sort of. Most VLIW processors will never make an incorrect branch prediction. Not because the branch-prediction hardware is supernaturally intelligent, but because it doesn't *have* branch-prediction hardware. Branch prediction has been moved into the compiler. Ordinary superscalar machines predict branches to keep all their functional units busy. VLIW machines, on the other hand, rely on the compiler to explicitly tell the processor exactly what every functional unit is doing at any instant -- all packed into a single instruction (the Very Long Instruction Word).
In particular, when a program has an if-then-else, normal superscalar machines will guess whether the condition is true or false and start speculatively executing the appropriate instructions. If the machine later finds out it guessed wrong, it cancels the effect of all those instructions and starts the other side from the beginning.
A compiler for a VLIW machine will schedule the instructions for the "true" side of the condition into some of the functional units, and the "false" side of the condition into other functional units, so both sides get executed simultaneously. Later the compiler explicitly schedules instructions that cancel the effect of the "wrong" side.
--DavidCary 05:44, 20 Jul 2004 (UTC)
David, you're half right.  :-) What you describe is "if-conversion," and is intended to eliminate short branches by replacing them with compiler-controlled speculation. On machines with many "predicate registers" (the registers that hold the conditions that determine whether the instructions execute), you can have many independent streams of execution scheduled in parallel. Scheduling algorithms such as trace scheduling rely on this heavily.
That doesn't mean there are no branches, nor that the remaining branches can't be candidates for hardware prediction. Branch prediction addresses the inherent pipeline latency involved in a branch. There are some number of cycles between when the machine first computes the condition that controls a conditional branch and when the branch target instructions arrive at the functional units. Flushing the pipeline is the simplest and least-performant strategy for handling these branches. Many VLIWs (such as the one I'm most familiar with, TI's C6x family of DSPs) expose the branch delay, allowing code to execute in those slots. The code in those slots may be code that logically appeared before the branch prior to instruction scheduling, or some mixture of "fall-thru" and "branch-target" code, suitably predicated. Still other variations on VLIW, such as Intel's EPIC, rely on branch prediction and static prediction hints to attempt to eliminate stalls due to the branch delay.
The exposed delay slot case is an interesting case, because it leaves everything to the compiler. In many cases, the compiler can fill the delay slots of the branch with code that resides at the branch target--a practice sometimes referred to as branch delay slot stuffing. There are some very important cases where the compiler cannot do this easily: Function calls and returns. If the compiler has available the text of the called function, it may be able to pull portions of the target function's code into the call's delay slots. On the return path, however, unless the compiler can prove that the function always returns to the same place (which may be true in the case of tail-call optimization), it has no text available to pull into those delay slots.
So, to sum up, I'd say that it's not fair at all to say that VLIWs have "solved branch prediction" or that "they don't need/have it," but rather have made the problem "different."
--Mr z 23:24, 16 May 2006 (UTC)
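To make the exposed-delay-slot behaviour described above concrete, here is a toy sketch in Python. It is not a model of the C6x or any real ISA: the instruction format (`("op", name)` and `("br", target_index)`), the `run` function and the `delay_slots` parameter are all made up for illustration. The only point it shows is that instructions placed after a branch still execute before the branch "lands", which is what the compiler has to fill usefully.

```python
def run(program, delay_slots):
    """Execute a toy program and return the names of the ops that ran.

    program     -- list of ("op", name) or ("br", target_index) tuples
    delay_slots -- how many instructions after a branch still execute
                   before control transfers (hypothetical parameter)
    """
    pc = 0
    trace = []
    remaining = None  # delay slots left before the pending branch lands
    target = None
    while pc < len(program):
        kind, arg = program[pc]
        if kind == "br":
            if remaining is not None:
                raise ValueError("branch inside a delay slot is not modelled")
            remaining, target = delay_slots, arg
        else:
            trace.append(arg)
            if remaining is not None:
                remaining -= 1  # this op occupied one delay slot
        if remaining == 0:
            pc, remaining, target = target, None, None  # branch takes effect
        else:
            pc += 1
    return trace


program = [
    ("op", "A"),
    ("br", 5),        # branch to index 5
    ("op", "B"),      # delay slot 1
    ("op", "C"),      # delay slot 2
    ("op", "skipped"),
    ("op", "T"),      # branch target
]

print(run(program, delay_slots=2))  # ['A', 'B', 'C', 'T']
print(run(program, delay_slots=0))  # ['A', 'T']
```

With two delay slots, `B` and `C` run even though they follow the branch in program order. A scheduler for such a machine wants those slots to hold useful work, which is exactly what is hard on a function return when the return address is not known at compile time.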
On reading David's segment a little closer, I should provide a minor mea culpa: There are two common ways that I am aware of to implement if-conversion: Speculative computation with fixup code, and predicated execution. In the "fixup-code" case, you might have a conditional move instruction or similar that commits the speculatively computed number to its final destination. In the "predication" case, you have an additional field in the instruction opcode or similar that specifies that the instruction is conditional based on some condition register (also known as a predicate register). Predication effectively adds an "if (cond)" in front of the instruction.
CPUs such as TI's C6000 DSP and Intel's EPIC implement predication. Other machines (including x86--a non-VLIW CPU) implement conditional moves. Predication is usually more general, since it can be applied to memory references as well as normal computation.
--Mr z 23:40, 16 May 2006 (UTC)
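For readers who want to see the two flavours of if-conversion side by side, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Op` record, the `execute` interpreter and the register names (`a`, `b`, `t`, `f`, `x`, predicate `p`) are hypothetical and do not correspond to any real instruction set. Both op lists compute `x = a + b if p else a - b` without any branch in the toy "machine code".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Op:
    dst: str                                  # destination register name
    compute: Callable[[Dict[str, int]], int]  # value computed from registers
    pred: Optional[str] = None                # guarding predicate, None = always
    negate: bool = False                      # True means "execute if !pred"


def execute(ops, regs, preds):
    """Run ops in order; an op whose guard is false is nullified."""
    regs = dict(regs)  # work on a copy so the caller's registers are untouched
    for op in ops:
        if op.pred is not None and bool(preds[op.pred]) == op.negate:
            continue  # behaves like "if (cond)" in front of the instruction
        regs[op.dst] = op.compute(regs)
    return regs


# Style 1: speculative computation with fixup code. Both sides are always
# computed; a conditional move commits the right one at the end.
FIXUP = [
    Op("t", lambda r: r["a"] + r["b"]),   # "then" side, speculative
    Op("f", lambda r: r["a"] - r["b"]),   # "else" side, speculative
    Op("x", lambda r: r["f"]),            # default result
    Op("x", lambda r: r["t"], pred="p"),  # conditional move: x = t if p
]

# Style 2: predicated execution. Each side carries its own guard.
PREDICATED = [
    Op("x", lambda r: r["a"] + r["b"], pred="p"),
    Op("x", lambda r: r["a"] - r["b"], pred="p", negate=True),
]

for p in (True, False):
    regs = {"a": 7, "b": 3}
    print(p, execute(FIXUP, regs, {"p": p})["x"],
          execute(PREDICATED, regs, {"p": p})["x"])
# True 10 10
# False 4 4
```

The interpreter here runs ops one after another; on a real VLIW the independent ops would be packed into the same instruction word. The sketch also hints at why predication is the more general of the two: a guard can nullify any op, including a store, whereas the fixup style needs the speculative work to be free of side effects.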

This page is missing any mention of Cydrome, the other company that was pioneering VLIW concepts. Dyl 21:38, Aug 14, 2004 (UTC)

Since a Multiflow article was recently created, it might be better to move company specific information there, leaving the VLIW article a little more general. As I previously mentioned, Cydrome was a company pioneering VLIW concepts in the same timeframe as Multiflow, but there's no mention of that in the article. Dyl 21:29, August 9, 2005 (UTC)

I think it could make sense to capitalize the title, as it is usually seen. Nicolas1981 20:37, 30 January 2006 (UTC)

Ugh... I wish I had more time. I would love to step back, refactor this article, and write a more comprehensive survey of VLIW architectures and related technologies. I have been involved with TI's C6000 DSP architecture development over the last several years, and I feel I have concrete expertise to contribute in this space. So far, I've made a few edits around the edges. Perhaps I'll find time to make more later. --Mr z 01:22, 17 May 2006 (UTC)

This article really could use a rewrite. Anyway, Pizzadeliveryboy added "also known as static superscalar or compile-time superscalar" right upfront. I'm taking the liberty of removing this for several reasons. (1) "static superscalar" is an all-but-trademarked term for the TigerSharc VLIW, and one doesn't hear "compiled superscalar". One could make up lots of good terms, many better than VLIW, but that doesn't mean they're used. (2) More importantly, these terms make VLIW sound like a variant of superscalar, and one wouldn't say "dynamic VLIW" for superscalar either. It's its own thing, with similarities and differences. --Josh 18:53, 6 June 2006 (UTC)

Someone changed VLIW to stand for Very Large Instruction Word. It's long, not large. See the term as initially defined in the 1983 ISCA paper. Some people do say "large", but it is incorrect. Really, this article could use a total rewrite. People have been making it incrementally better in the Wiki way, but, man, this needs a fresh start. I'll try to get to it sometime.

Benoit.dinechin 19:16, 10 March 2007 (UTC): I think reorganization of the article should make the following points clear:

  • VLIWs are historically the first compiler-friendly statically scheduled processors. Compiler friendliness meant support for trace scheduling in the Trace machines and support for modulo scheduling in the Cydrome machines.
  • A reliable check to know if we have a VLIW architecture is whether the machine can do a register swap in one cycle by using a bundle like: MOVE r2 = r1 || MOVE r1 = r2. (Suggested by S. Freudenberger.)
  • Other statically scheduled processors include horizontal microcoded machines (AP-120, FPS-164), some RISC without register interlocks like MIPS and i860, traditional DSPs like the TI C5xxx series, EPIC like IA64, and more exotic machines like Transport Triggered Architectures (TTA).
  • Trace scheduling compiler support in Trace-like VLIWs included register-register architecture, partial predication (select instructions), multiway branches, and front-door / back-door memory interfaces. Someone from Multiflow should also confirm that predicated store instructions were planned for addition.
  • Modulo scheduling compiler support in Cydrome-like VLIWs included register-register architecture, full predication, rotating registers, and programmable memory latency.
  • Clustered VLIWs originate from the Trace-like VLIWs. Nowadays these include the TI C6K machines and the Lx.
  • Rotating register files and full predication as seen in EPIC originate from Cydrome-like VLIWs, the intermediate step being the HP-PlayDoh.
  • EPIC is more than a Cydrome-like VLIW. Specifically, it adds memory hierarchy hints, data speculation support (advanced loads), and a bundle structure that can encode sequentiality. So EPIC is more than VLIW (but not better IMHO).
  • There could be a short discussion of trace scheduling and modulo scheduling and IF-conversion as key techniques for effective VLIW compilation. It should be mentioned that modern compilers for VLIWs combine the best of these three techniques, in addition to loop dependence analysis and loop restructuring techniques inherited from vector computing. See for instance Open64, or Ruttenberg's modulo scheduler in MIPSPRO.
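The one-cycle register swap test from the list above (MOVE r2 = r1 || MOVE r1 = r2) can be illustrated with a small Python sketch. This is a toy model under one assumption, which is what the test relies on: all operations in a bundle read their source registers before any of them writes its destination. The function names `execute_bundle` and `execute_sequential` and the `(dst, src)` move format are invented for the example.

```python
def execute_bundle(regs, moves):
    """Run all (dst, src) moves as one bundle: read everything, then write."""
    values = [regs[src] for _, src in moves]   # read phase, old values only
    result = dict(regs)
    for (dst, _), value in zip(moves, values):
        result[dst] = value                    # write phase
    return result


def execute_sequential(regs, moves):
    """Run the same moves one after another, as a scalar machine would."""
    result = dict(regs)
    for dst, src in moves:
        result[dst] = result[src]              # sees earlier writes
    return result


swap = [("r2", "r1"), ("r1", "r2")]            # MOVE r2 = r1 || MOVE r1 = r2
regs = {"r1": 1, "r2": 2}

print(execute_bundle(regs, swap))      # {'r1': 2, 'r2': 1}
print(execute_sequential(regs, swap))  # {'r1': 1, 'r2': 1}
```

Issued as a single bundle, the two moves exchange the registers. Issued one after the other, the second move reads the value the first one just wrote, so the original contents of r2 are lost and a temporary register would be needed.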

The rotating register files were in fact introduced by the AP120 / FPS164, based on an architecture designed by Glenn Culler in 1972. One could argue these machines are the direct ancestors of the modern Transport Triggered Architectures. Also, modulo scheduling was first productized in the FPS-164 Fortran compiler (Touzeau CC'1984 paper). The traditional DSPs are also directly inspired by the AP120.
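Since rotating register files come up several times in this thread, here is a rough Python sketch of the general idea. It assumes a rotating base added to the logical register number modulo the file size, with the base decremented on each rotation; that is one common way to describe the mechanism and is not meant as a description of how the AP120, the Cydrome machines or any other specific machine implemented it. The class name and methods are hypothetical.

```python
class RotatingRegisterFile:
    """Toy rotating register file: logical names shift by one per rotation."""

    def __init__(self, size):
        self.phys = [0] * size  # physical storage
        self.base = 0           # rotating base (assumed scheme)

    def _index(self, logical):
        return (logical + self.base) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After a rotation, the value written as register n is seen as n + 1.
        self.base = (self.base - 1) % len(self.phys)


rrf = RotatingRegisterFile(4)
rrf.write(0, 10)   # iteration 0 produces a value in logical register 0
rrf.rotate()       # loop-back: start of iteration 1
rrf.write(0, 20)   # iteration 1 writes the same logical name, no clobber
rrf.rotate()       # start of iteration 2
print(rrf.read(1), rrf.read(2))  # 20 10
```

This is what makes the feature useful for modulo scheduling: overlapped loop iterations can all use the same register names in the loop body, and the hardware renaming keeps each iteration's values apart without unrolling the loop.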

The first two paragraphs of this article are very confusing, because they begin by talking about what is not VLIW. Only in the third paragraph does the article describe what VLIW is. I realize the author is trying to explain it by contrasting it with other technologies, but for an introduction to the topic it's very difficult to comprehend. I had to read the first paragraph several times, gave up, read the second paragraph several times, gave up, moved to the third paragraph, then finally understood what the article was saying. The third paragraph should be first, in my opinion. (talk) 13:48, 22 June 2010 (UTC)

Michael Flynn in Computer Architecture [1] describes VLIW as SISD. There are no sources in the article indicating that VLIW should be MIMD. Can anyone provide sources for VLIW being MIMD? (talk) 07:52, 4 June 2013 (UTC)

  1. ^