Into the Itanium, Part 2 - Additions Specific to IA-64
(Page 3 of 4 )
Most people are aware of how branch handling is done on modern CPUs. As a brief recap: if/else statements are used when you can't know ahead of time which of two or more possible scenarios is going to happen. For example, something like this might show up at some point in your code:
if (a > b)
    c = c + 1
else
    d = d*e + f
This is what's called a "control" dependency. Coming into that area, the processor doesn't know which path to take until the comparison has been evaluated, so it essentially has to guess. Modern CPUs like the Pentium 4 have become very accurate at guessing which path will be taken. However, when they guess wrong, the pipeline has to be flushed. And with the exceedingly long pipeline found in the Prescott core, that's a real pisser for performance.
What IA-64 does instead is turn a "control" dependency into a "data" dependency. This is done with something called "predication." There is a separate 64-bit predicate register in the core, and each bit can be set individually, which in effect gives you 64 one-bit registers. Compare instructions set these predicate bits, and other instructions can then be guarded by them. Instead of having branches and jumps dependent on performing a compare, the instructions from both paths are sent into the processor in parallel. While those are in the pipeline, the predicate bits determine which one takes effect, and which one is treated as a "NOP" or "no operation." For example, our code above instead becomes:
p1, p2 = compare (a>b)
if (p1) c = c + 1
if (p2) d = d*e + f
In that first line, if the comparison is true then p1 (the predicate bit in position 1) is set to 1, while p2 is set to zero. The opposite happens if the comparison is false. Now each instruction is dependent on the value of p1 or p2 rather than on a branch. Both can be issued in parallel, which avoids jumps or branches and, best of all, avoids the pipeline flush that would follow if the CPU guessed the wrong path.
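To make the transformation concrete, here is a rough sketch in C of the same example written both ways. It only models the idea: the `State` struct and the function names are made up for illustration, and a C compiler is free to turn the `?:` selections back into branches, so this shows the logic of predication rather than what Itanium hardware actually executes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two variables the example updates. */
typedef struct { int c, d; } State;

/* Conventional version: the CPU has to predict which way the branch goes. */
static State with_branch(int a, int b, State s, int e, int f) {
    if (a > b)
        s.c = s.c + 1;
    else
        s.d = s.d * e + f;
    return s;
}

/* If-converted version: one compare writes two complementary predicates,
 * and each operation is guarded by one of them. */
static State with_predicates(int a, int b, State s, int e, int f) {
    bool p1 = (a > b);          /* p1, p2 = compare (a>b) */
    bool p2 = !p1;

    int c_new = s.c + 1;        /* both results are computed unconditionally */
    int d_new = s.d * e + f;

    s.c = p1 ? c_new : s.c;     /* if (p1) c = c + 1   */
    s.d = p2 ? d_new : s.d;     /* if (p2) d = d*e + f */
    return s;
}

/* Asserts that both versions produce the same state for the given inputs. */
static void check_same(int a, int b, State s, int e, int f) {
    State x = with_branch(a, b, s, e, f);
    State y = with_predicates(a, b, s, e, f);
    assert(x.c == y.c && x.d == y.d);
}
```

Calling `check_same` with a few inputs on either side of the comparison confirms that the guarded version behaves like the original if/else. Note that the predicated version always does the work of both paths, which is the price paid for never mispredicting.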

Speculation is another feature added in the Itanium architecture. Control and data speculation are ways to help hide memory latency. Waiting on memory is a good way to kill performance. For every current processor, main memory is quite a bit slower than the CPU itself. So when the CPU asks for something from memory, it has to sit and wait for that data to come back before it can do anything that depends on it. Since the CPU runs faster, it wastes multiple cycles waiting. Even when something is sitting in one of the higher cache levels, there is still latency involved in finding that piece of information and then bringing it into the register file.
To get around that, Itanium allows the compiler to move load instructions earlier in the code to "hide" their latency. I mentioned earlier that for some bundles, the compiler simply isn't able to find enough instructions to execute in parallel, and inserts "nops" to fill up the bundle. One option is to move a load from later in the code into that empty slot. This way, when the data is needed later, instead of waiting on it, it's already available.
"Data" speculation is where a load is moved ahead of a store that originally preceded it. Since that store might have written to the same address and made the preloaded data stale, a check is needed to ensure that has not occurred. In the spot where the load originally sat, a check is inserted instead to see whether the data is still valid. If it's fine, the latency of the load has been hidden. If the store did change the data, a recovery load is performed, and you just don't gain any performance. Control speculation is similar, except the load is moved above a branch that originally guarded it, and a check at the load's original position confirms that the early load completed without a fault before its result is used.
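The load, store, check sequence for data speculation can be sketched in C as follows. This is a toy model under stated assumptions, not how the hardware is programmed: `LoadTracker` is a hypothetical one-entry stand-in for the table the processor uses to remember advanced loads (the Advanced Load Address Table on Itanium), and all function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-entry tracker for an advanced (hoisted) load. */
typedef struct {
    const int *addr;  /* address the advanced load read from */
    bool valid;       /* cleared when a later store hits that address */
} LoadTracker;

/* Advanced load: read the value early and remember where it came from. */
static int advanced_load(LoadTracker *t, const int *addr) {
    t->addr = addr;
    t->valid = true;
    return *addr;
}

/* Store: write the value and invalidate the tracked load on an address match. */
static void tracked_store(LoadTracker *t, int *addr, int value) {
    *addr = value;
    if (t->valid && t->addr == addr)
        t->valid = false;
}

/* Check: keep the early value if it is still valid, else do a recovery load. */
static int check_load(const LoadTracker *t, int early_value, bool *recovered) {
    if (t->valid) {
        *recovered = false;
        return early_value;
    }
    *recovered = true;
    return *t->addr;  /* recovery load picks up the stored value */
}

/* Original program order: store to *p, then load from *q.
 * Speculated order: the load from *q is hoisted above the store. */
static int store_then_load(int *p, int *q, int value, bool *recovered) {
    LoadTracker t = { NULL, false };
    int early = advanced_load(&t, q);  /* load moved ahead of the store */
    tracked_store(&t, p, value);       /* the store that originally came first */
    return check_load(&t, early, recovered);
}
```

When `p` and `q` point to different locations, the check passes and the early value is used as-is. When they alias, the check fails and the recovery load returns the freshly stored value, so the result is always what the original ordering would have produced.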
More By DMOS