Into the Itanium, Part 3 - Introducing Montecito
(Page 4 of 4)

As you can make out, there are two cores there, as well as a boatload of cache. On top of that, a form of Hyper-Threading has also entered the picture within each core. While it's not the best of diagrams, you can see that the functional units in each core have not changed much from the diagrams further back in the article; the only differences are the relocated bus interfaces and the huge amount of surrounding cache added to the die.
Meet Montecito. The idea is to build this monster on the 90nm process, with 1.7 BILLION transistors. That's going to take up nearly 580 square millimeters of prime silicon real estate. As can be expected, it's not going to be cheap to buy. But what about to run? Itanium has never been known to be particularly power-efficient, thanks to all its functional units, and putting that on the currently rather leaky 90nm process is asking for a large appetite for watts.
Not so, sayeth Intel. Their "Power Management," as labeled in the diagram, is supposedly going to keep this thing in the 100W area. Outright, that seems like an impossible goal, since its Madison 9M predecessor has a max power use of 130W with only one core. The addition of cache isn't a concern; cache doesn't affect your power budget much on its own. Intel seems confident, however, that a judicious use of voltage reduction and clock manipulation will be sufficient to accomplish the task. The interesting part is that this technology, which Intel calls Foxton, can work on a local scale, not just globally as previous power management tools such as SpeedStep do. This means that if certain parts of the processor are not being used (say the integer units and cache), it can lower their clock and voltage while raising those of the FP units, dynamically keeping the chip within a given power envelope.
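To make the local power management idea above concrete, here is a minimal sketch in Python. It is not Intel's algorithm: it simply assumes the textbook approximation that dynamic power scales with capacitance times voltage squared times frequency, and every name and number in it (the `Unit` class, `fit_to_budget`, the capacitance values, the 1.2V and 1.6GHz operating point) is hypothetical and chosen only so the unthrottled total lands near 130W.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One block of the chip (hypothetical model, not real Montecito data)."""
    name: str
    cap: float         # effective switched capacitance, arbitrary units
    base_volts: float  # nominal voltage
    base_ghz: float    # nominal clock
    busy: bool         # is the current workload using this block?
    scale: float = 1.0 # common scaling factor applied to voltage and clock

    def power(self) -> float:
        # Textbook dynamic-power approximation: P ~ C * V^2 * f
        v = self.base_volts * self.scale
        f = self.base_ghz * self.scale
        return self.cap * v * v * f


def total_power(units) -> float:
    return sum(u.power() for u in units)


def fit_to_budget(units, budget, step=0.05, floor=0.5) -> bool:
    """Throttle idle units first; touch busy units only as a last resort.

    Returns True if the total ends up within the budget.
    """
    for throttle_busy in (False, True):
        group = [u for u in units if u.busy == throttle_busy]
        while total_power(units) > budget and any(u.scale > floor for u in group):
            for u in group:
                u.scale = max(floor, u.scale - step)
    return total_power(units) <= budget


# An FP-heavy workload: integer units and cache sit mostly idle.
units = [
    Unit("integer", cap=20, base_volts=1.2, base_ghz=1.6, busy=False),
    Unit("cache",   cap=12, base_volts=1.2, base_ghz=1.6, busy=False),
    Unit("fp",      cap=24, base_volts=1.2, base_ghz=1.6, busy=True),
]

before = total_power(units)
ok = fit_to_budget(units, budget=100.0)
print(f"before: {before:.1f}, after: {total_power(units):.1f}, within budget: {ok}")
for u in units:
    print(f"{u.name:8s} scale={u.scale:.2f}")
```

Because voltage and clock are scaled together, each step down cuts a unit's power roughly with the cube of the scale factor, which is why modest throttling of the idle blocks can free a lot of headroom. The sketch only throttles; spending the freed headroom on raising the FP units' clock, as the text describes, is left out to keep it short.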
The threading part is interesting. Unlike the P4 design, Itanium is usually very good at keeping its units working away, assuming that the compiler did its job and found a large amount of parallelism in the code. I can only assume the threading is there instead to offset the bus deficiencies that two cores on one die are going to incur. Need something from memory, but the other core has control of the bus to the outside world? Switch to another thread, and work from there instead. With copious amounts of register space, the ability to dynamically reassign the register stack, and of course all that cache, each core should be able to switch to another thread while buffering the memory request. When the new thread in turn stalls on a memory dependency, the core can go back to the one it previously left without much penalty, thanks to the Itanium's rather short pipeline. It just quickly renames the registers to get back to where it was, without having to load anything.
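The payoff of switching threads on a memory stall can be illustrated with a toy cycle counter. This is a sketch under loud assumptions, not a model of the real hardware: the `simulate` function, the op encoding ("c" for one cycle of compute, "m" for a memory access), the fixed 10-cycle memory latency, and the zero-cost thread switch are all hypothetical simplifications.

```python
def simulate(threads, mem_latency=10, switch_on_stall=True):
    """Count cycles for one core to finish all threads.

    threads: list of op lists; "c" = one cycle of compute,
             "m" = memory access that blocks the thread for mem_latency cycles.
    switch_on_stall: if True, the core jumps to another ready thread whenever
             the current one is waiting on memory (switch cost assumed zero).
    """
    n = len(threads)
    pc = [0] * n        # next op index for each thread
    ready_at = [0] * n  # first cycle at which each thread may issue again
    cycle = 0
    current = 0

    def unfinished(i):
        return pc[i] < len(threads[i])

    while any(unfinished(i) for i in range(n)):
        stalled = ready_at[current] > cycle
        if not unfinished(current) or (switch_on_stall and stalled):
            pending = [i for i in range(n) if unfinished(i)]
            ready = [i for i in pending if ready_at[i] <= cycle]
            if ready:
                current = ready[0]    # switch to a thread that can run now
            elif not unfinished(current):
                current = pending[0]  # nothing ready: wait on a pending thread
        if ready_at[current] <= cycle:
            op = threads[current][pc[current]]
            pc[current] += 1
            if op == "m":
                # Issue takes this cycle; data arrives mem_latency cycles later.
                ready_at[current] = cycle + 1 + mem_latency
        # Otherwise the core idles for this cycle.
        cycle += 1
    return cycle


work = ["c", "c", "m", "c", "c", "m", "c", "c"]
serial = simulate([work, work], switch_on_stall=False)
switched = simulate([work, work], switch_on_stall=True)
print(f"run one after the other: {serial} cycles")
print(f"switch on memory stall:  {switched} cycles")
```

With two identical threads, the switching run finishes in noticeably fewer cycles because one thread's compute overlaps the other's memory wait. The gain shrinks if both threads are stalled at once, which is the idle time still visible in this model.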
So, this concludes our look at Intel's big iron architecture. It's a vast departure from x86, and that's a good thing. x86 has become a mess; it's like a Christmas tree that a bunch of people, both young and old, keep trying to hang things on without removing previous decorations. Itanium is new, different, and a vast improvement all the way from the time a programmer first writes the code until it gets executed. Especially when you look at how it can handle going "wide," it's a much better solution for future dual processing needs. And from Sun to IBM to AMD to now Intel, that IS going to be the future.
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.