Into the Itanium, Part 2

    Table of Contents:
  • Into the Itanium, Part 2
  • Effects of Bundling
  • Additions Specific to IA-64
  • Software Pipelining and Register Stacking



    Into the Itanium, Part 2 - Effects of Bundling

    (Page 2 of 4 )


    The compiler is essentially creating a record of execution; the hardware is merely a playback device, the equivalent of a DVD player. The focus here is parallelism, hence the use of three-instruction bundles. The Itanium architecture is actually capable of handling more than that: it can dispatch two bundles, or six instructions' worth, per cycle if the compiler can find that many pieces to fit together. Otherwise, "no-ops" are added to fill out the bundle, and a "stop" is added to the template to mark where the parallel group ends: everything before the stop can issue together, and anything after it may depend on those results.
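
    The padding and stop behavior described above can be sketched as a toy bundler. This is a minimal illustration, not the real IA-64 encoding: the names `Bundle` and `pack_bundles` are hypothetical, stops are only placed at bundle ends, and the template rules that tie each slot to a unit type are ignored.

```python
from dataclasses import dataclass, field

BUNDLE_SLOTS = 3  # a bundle holds three instruction slots


@dataclass
class Bundle:
    slots: list = field(default_factory=list)
    stop: bool = False  # toy flag: a dependency boundary follows this bundle


def pack_bundles(instructions):
    """Greedily pack (name, deps) pairs into three-slot bundles.

    `deps` is the set of instruction names whose results are needed.
    Unused slots are padded with "nop"; a bundle is marked with a stop
    when the next instruction depends on something issued since the
    previous stop.
    """
    bundles, current, group = [], [], set()

    def flush(stop):
        # Pad the partially filled bundle with no-ops and record it.
        padded = current + ["nop"] * (BUNDLE_SLOTS - len(current))
        bundles.append(Bundle(padded, stop))
        current.clear()

    for name, deps in instructions:
        if deps & group:
            # Dependency on the current parallel group: end it with a stop.
            flush(stop=True)
            group.clear()
        elif len(current) == BUNDLE_SLOTS:
            # Bundle is full but the group can continue into the next one.
            flush(stop=False)
        current.append(name)
        group.add(name)

    if current:
        flush(stop=True)
    return bundles


program = [("a", set()), ("b", set()), ("c", {"a"}), ("d", set())]
for bundle in pack_bundles(program):
    print(bundle.slots, "stop" if bundle.stop else "")
```

    Here `c` needs the result of `a`, so the first bundle is closed early with a no-op and a stop, which is exactly the wasted slot the compiler tries to avoid by finding independent work.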

    When the front end receives a bundle, it takes the instructions and distributes them across the available units. In the Itanium, there are a lot of those to work with. For general-purpose registers (GPRs), which data has to pass through before being worked on, there are a massive 128 in the programmer model, as opposed to the meager 8 GPRs in x86-32. Itanium also possesses 128 floating-point registers, 128 application registers, and 8 branch registers.

    All those registers are necessary for two reasons. One is to allow all the code to execute in parallel without fighting for resources; the other is to allow more data to sit inside the CPU, reducing trips to the cache and memory and avoiding the latency involved in such operations. Operations sent to the execution core have many possible places to be dispatched. The original Itanium's execution core houses four integer arithmetic logic units (ALUs) (with two ports), two floating-point units (FPUs), and three branch units. It can execute two memory operations, and theoretically all of these could be busy in any given cycle. That's a lot of hardware.
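
    As a rough way to picture those limits, the sketch below checks whether a mix of operations could issue in a single cycle, using only the unit counts quoted above. It is a simplification under stated assumptions: the names `UNITS`, `MAX_ISSUE`, and `can_issue_together` are hypothetical, and real dispatch also depends on issue ports and bundle templates, which are not modeled here.

```python
from collections import Counter

# Per-cycle limits taken from the figures in the text for the original
# Itanium; port and template restrictions are deliberately left out.
UNITS = {"int": 4, "fp": 2, "branch": 3, "mem": 2}
MAX_ISSUE = 6  # two bundles of three instructions


def can_issue_together(ops):
    """Return True if the list of unit kinds fits in one cycle (toy model)."""
    if len(ops) > MAX_ISSUE:
        return False
    need = Counter(ops)
    # Every kind requested must be known and within its unit count.
    return all(need[kind] <= UNITS.get(kind, 0) for kind in need)


print(can_issue_together(["int", "int", "mem", "mem", "fp", "branch"]))
print(can_issue_together(["fp", "fp", "fp"]))
```

    The second call fails the check because three floating-point operations exceed the two FPUs, even though only half of the six issue slots are used.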

    In the current Itanium 2 revision, there are 11 issue ports, created by adding two more multimedia/integer ports. The corresponding execution hardware has also been increased, for a total of six MM/I execution units. Memory throughput was also bumped up: the chip can now do two loads and two stores per clock, instead of the previous two loads or two stores, but not both. These were all added after the original Merced chip was found to have weak integer and memory performance compared to its astounding floating-point capabilities.

    By comparison, a P4 is a very "narrow" processor. Instructions come in more or less one at a time and, instead of being sent "wide" as in the Itanium, are fed into a long pipeline where each stage can hold a couple of separate uops, depending on the execution units available in the out-of-order execution core and on what can be shifted around without breaking any dependencies. The Itanium's pipeline is only one-third the length of the one in the current Prescott iteration of the P4, much closer to the length found in the Pentium III or Athlon core. This is one reason why the P4 must reach insane clock speeds in order to kick out decent performance, compared to the Itanium 2, which runs at a maximum of 1.5GHz. By going "wide," the EPIC architecture simply gets much more done in each one of those clock cycles.
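
    The "wide versus fast" trade-off comes down to simple arithmetic: peak throughput is issue width times clock speed. The sketch below uses the six-instructions-per-cycle and 1.5GHz figures from this article; the width-3 comparison chip is purely hypothetical and is not meant to represent the P4 or any real processor. These are upper bounds only, since real throughput depends on how many slots the compiler actually fills.

```python
def peak_instructions_per_second(issue_width, clock_hz):
    """Upper bound on throughput: instructions per cycle times cycles per second."""
    return issue_width * clock_hz


def clock_needed(target_ips, issue_width):
    """Clock speed a narrower design would need to match a target peak."""
    return target_ips / issue_width


# Figures from the article: two three-instruction bundles per cycle at 1.5GHz.
itanium2_peak = peak_instructions_per_second(6, 1.5e9)

# Hypothetical design that issues three instructions per cycle.
narrow_clock = clock_needed(itanium2_peak, 3)

print(f"Peak: {itanium2_peak / 1e9:.1f} billion instructions/s")
print(f"Width-3 design needs {narrow_clock / 1e9:.1f}GHz to match")
```

    Halving the width doubles the clock required for the same peak, which is the pressure that pushes a narrow design toward a longer pipeline and higher frequencies.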

    More By DMOS

    © 2003-2019 by Developer Shed. All rights reserved.