Into the Itanium, Part 2
In our last piece, you were introduced to Itanium from a business and historical standpoint. Today, however, we're going to get into the nitty-gritty and show how it differs from the x86 processor that the vast majority of you are running right now.
Instruction Execution
In your normal P4 or Athlon, code comes in, gets executed and so on, right? Well, there's a bit more to it than that. In a CISC (complex instruction set computer) architecture like those two, the original code written in C, Java, or another language is turned by a compiler into code your processor understands natively. That code uses the processor's "instruction set," in this case 32-bit x86 (IA-32), a set of commands that all make sense to the CPU at a low level. The instructions then travel through the memory hierarchy until they reach the front end of the processor, which fetches them in program order as a single sequential stream.
At this point, the instructions are decoded further into smaller, simpler operations called "micro-ops" or "uops," which the hardware tries to send to the various execution units in parallel as much as possible. They can also be executed "out of order": a uop whose inputs are ready can run ahead of an older one that is still waiting, which is one reason x86 processors are capable of being so blindingly quick. However, all this hardware takes up a large number of transistors and a lot of die space, and it is extremely complicated to make it function both correctly and quickly. To keep the program's results correct, every uop is tracked in a "reorder buffer," which retires finished uops in the original program order. This whole approach is called dynamic, or hardware-based, instruction scheduling.
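To make the idea concrete, here's a toy model of dynamic scheduling in Python. It's a sketch under simplifying assumptions, not a description of how a P4 or Athlon actually works: each uop is described by the register it writes, the registers it reads, and a latency in cycles; register renaming is assumed, so every uop writes a unique register and only read-after-write dependencies matter; and the `Uop` class, the `run_out_of_order` function, and its `width` parameter are hypothetical names made up for this illustration. Each cycle, the model issues up to `width` uops whose inputs are ready, in whatever order that allows, while the reorder buffer only retires finished uops from its head, in program order.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str       # hypothetical label, e.g. "add r2"
    dest: str       # register this uop writes (assumed unique, as if renamed)
    srcs: tuple     # registers this uop reads
    latency: int    # cycles until its result is available (assumed >= 1)


def run_out_of_order(uops, width=2):
    """Toy dynamic scheduler: issue ready uops in any order, retire in order."""
    # Find each uop's producers (read-after-write dependencies only).
    # Sources with no producer in this window (e.g. "a") count as already ready.
    producer = {}
    deps = []
    for i, u in enumerate(uops):
        deps.append([producer[s] for s in u.srcs if s in producer])
        producer[u.dest] = i

    done_at = {}        # uop index -> cycle its result becomes available
    issue_order = []
    retire_order = []
    rob_head = 0        # oldest uop that has not retired yet
    cycle = 0
    while rob_head < len(uops):
        # Issue: scan oldest-first and start up to `width` uops whose inputs are ready.
        issued = 0
        for i, u in enumerate(uops):
            if issued == width:
                break
            if i in done_at:
                continue  # already issued
            if all(d in done_at and done_at[d] <= cycle for d in deps[i]):
                done_at[i] = cycle + u.latency
                issue_order.append(u.name)
                issued += 1
        # Retire: the reorder buffer releases finished uops only from its head.
        while (rob_head < len(uops) and rob_head in done_at
               and done_at[rob_head] <= cycle):
            retire_order.append(uops[rob_head].name)
            rob_head += 1
        cycle += 1
    return issue_order, retire_order, cycle


program = [
    Uop("load r1", "r1", ("a",), 3),   # slow memory load
    Uop("add r2", "r2", ("r1",), 1),   # must wait for the load
    Uop("mul r3", "r3", ("b",), 1),    # independent of the load
    Uop("sub r4", "r4", ("c",), 1),    # independent of the load
]
issued, retired, cycles = run_out_of_order(program)
print("issued: ", issued)
print("retired:", retired)
```

In this run the independent `mul` and `sub` issue while the slow load is still outstanding, so the issue order differs from program order, yet everything still retires in the order it came in.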
In Itanium, the compiler is responsible for that scheduling. Instead of the hardware dynamically choosing the order of execution, and working out what can be done in parallel by looking at dependencies and available resources, all of this is done ahead of time, when the program is compiled. That puts much of the burden for the chip's performance on the compiler.
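Real Itanium compilers use far more sophisticated analysis, but the core idea can be sketched in a few lines of Python. This is a minimal illustration, not how any actual Itanium compiler works: instructions are hypothetical `(text, dest, srcs)` tuples, each register is assumed to be written only once (so only read-after-write dependencies matter), every instruction is assumed to take a single step, and `compile_time_schedule` is a made-up name. The function places each instruction in the earliest group that comes after every instruction producing its inputs, so everything inside one group can safely run at the same time.

```python
def compile_time_schedule(instrs):
    """Group instructions so that nothing in a group depends on another member."""
    group_of = {}    # register -> index of the group that produces it
    groups = []
    for text, dest, srcs in instrs:
        # Earliest legal group: one past the latest group producing any input.
        # Registers with no producer in this code (e.g. "a", "b") are free inputs.
        g = max((group_of[s] + 1 for s in srcs if s in group_of), default=0)
        group_of[dest] = g
        while len(groups) <= g:
            groups.append([])
        groups[g].append(text)
    return groups


# Illustrative pseudo-assembly, not real Itanium syntax.
code = [
    ("r1 = load [a]", "r1", ("a",)),
    ("r2 = load [b]", "r2", ("b",)),
    ("r3 = r1 + r2", "r3", ("r1", "r2")),
    ("r4 = r1 << 2", "r4", ("r1",)),
    ("r5 = r3 - r4", "r5", ("r3", "r4")),
]
for i, group in enumerate(compile_time_schedule(code)):
    print(f"group {i}: {group}")
```

The two loads land together, the add and shift (which both only need results from the first group) land together, and the final subtract gets a group of its own. In the Itanium model, the hardware no longer has to discover this parallelism on its own; the grouping is decided before the program ever runs.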

Whereas a conventional compiler simply translates source code into machine instructions, an Itanium compiler also packs those instructions into "bundles" of three and attaches a template that identifies the type of each instruction slot: memory access, integer, floating point, branch, and so on. (Each bundle is 128 bits long: a 5-bit template plus three 41-bit instruction slots.) Because this work is done at compile time, the compiler can guarantee that instructions marked to issue together have no "read after write" or similar dependencies between them, using "stops" encoded in the template to separate instructions that do depend on each other, and it can extract as much parallelism as possible.
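The bundle format itself is easy to model. The sketch below, in Python, packs a 5-bit template and three 41-bit instruction slots into a single 128-bit value and unpacks it again (5 + 3 × 41 = 128), assuming the IA-64 layout: template in bits 0-4, followed by slot 0, slot 1, and slot 2 in ascending bit order. The function names are hypothetical, and the template and slot values in the example are placeholders rather than real Itanium templates or encoded instructions.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer.

    Layout (assumed IA-64): template in bits 0-4, slot 0 in bits 5-45,
    slot 1 in bits 46-86, slot 2 in bits 87-127.
    """
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError(f"slot {i} must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask for i in range(3)]
    return template, slots


bundle = pack_bundle(0x10, [0x1, 0x2, 0x3])   # placeholder values, not real opcodes
print(hex(bundle))
print(unpack_bundle(bundle))
print(len(bundle.to_bytes(16, "little")), "bytes")
```

Keeping every bundle the same fixed size means the front end always knows where the next three instructions start, and the template tells it up front which kind of execution unit each one needs.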
More By DMOS