Memory Bandwidth And Timings
Latency
Increasing bandwidth, as seen above, is relatively easy: add more channels, widen the bus, make more data transfers occur per clock, or simply increase the speed of the data transfer (speeding up the cars on the highway). Lowering latency is not nearly as easy. Yes, that's right, we want to lower the latency all we can. Latency is defined as the time between when a command is given and when that command is actually executed. In any memory read/write operation, many factors add to the latency.
One of the first appears when the Front Side Bus and memory controller are not running at the same clock speed. When using any divider other than 1:1, the two devices have to wait for their clocks to line up before any signal can be sent between them. For example, when a memory divider of 5:4 is chosen, the memory completes only 4 cycles for every 5 cycles of the FSB. That means the two can actually talk to each other only once every 5 FSB cycles. So if the CPU sends the memory a read request on the second cycle after the clocks last matched, the request must sit there waiting 3 more cycles before it can actually go out. This is why it is always better for memory performance to use a 1:1 divider over all others, and hence the proliferation of high-speed modules in today's market, to match up with the faster and faster FSBs being used.
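The waiting described above can be put into a few lines of Python. This is only a sketch of the simplified model used in this article (the clocks line up once every N FSB cycles for an N:M divider), not a description of how any real chipset arbitrates. The function name `fsb_wait_cycles` is made up for illustration.

```python
def fsb_wait_cycles(fsb_ratio: int, cycles_since_sync: int) -> int:
    """FSB cycles a request waits until the two clocks next line up.

    Simplified model from the text: with an FSB:memory divider of
    fsb_ratio:mem_ratio, the clocks align once every fsb_ratio FSB cycles.
    """
    return (fsb_ratio - cycles_since_sync % fsb_ratio) % fsb_ratio


# 5:4 divider, request issued on the 2nd FSB cycle after the clocks matched
print(fsb_wait_cycles(5, 2))  # 3

# 1:1 divider: the clocks are always aligned, so nothing ever waits
print(fsb_wait_cycles(1, 7))  # 0
```

Averaged over all five possible arrival cycles, the 5:4 case costs 2 FSB cycles per request in this model, which is the penalty a 1:1 divider avoids.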
Much of the rest of the latency is picked up in the memory module itself. When the north bridge memory controller sends out a request for a certain address, an ACTIVE command is first sent to the memory. This is followed by the row and bank address, which cause the desired row to go from "pre-charge" to "active". This is the tRP that can be adjusted in most BIOSes. Usually this operation can be done in 2, 3 or 4 cycles. Following that is the Row Address Strobe (RAS) to Column Address Strobe (CAS) Delay, or tRCD, operation. This is the amount of time required to send the contents of the row to a buffer on the module. It also takes 2, 3 or 4 clock cycles. After this comes the much ballyhooed tCL, or CAS Latency, operation. This operation carries with it the column address, and takes 2, 2.5 or 3 clock cycles to send the contents of the now defined cell to the driver. From there the burst occurs to send the data out onto the bus.
Now here is why the CAS Latency matters. If the next memory read or write is to the same row, the only latency incurred is the tCL to move from one cell to another. This often happens, as longer data strings typically take up multiple consecutive addresses. Once another row is needed, however, a latency called tRAS is incurred before the next row can be called, followed by the tRP that starts the process over again. tRAS can be executed in 5, 6, 7 or 8 clock cycles. This means that if a new row is needed, as many as 12 clock cycles of NOPs (No OPeration), 8 for tRAS plus 4 for tRP, must pass before the data can be sent.
This is a much simplified version of what happens during a read/write operation. There are many more operations that occur; however, these are the only ones that most motherboards let you alter in the BIOS.
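To make the cycle counting concrete, here is a rough Python sketch of the simplified sequence described above. It only adds up the four BIOS-adjustable timings the way this article does and ignores everything else a real controller does. The class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """BIOS-adjustable timings, in memory clock cycles (illustrative names)."""
    t_cl: float  # CAS latency: 2, 2.5 or 3
    t_rcd: int   # RAS-to-CAS delay: 2, 3 or 4
    t_rp: int    # row precharge: 2, 3 or 4
    t_ras: int   # row active time: 5, 6, 7 or 8


def first_access_cycles(t: Timings) -> float:
    """Cycles before the burst starts when the row must first be opened."""
    return t.t_rp + t.t_rcd + t.t_cl


def same_row_cycles(t: Timings) -> float:
    """Cycles before the burst starts when the row is already active."""
    return t.t_cl


def row_change_wait_cycles(t: Timings) -> int:
    """Cycles of NOPs (tRAS + tRP) before a different row can be called."""
    return t.t_ras + t.t_rp


fastest = Timings(t_cl=2, t_rcd=2, t_rp=2, t_ras=5)
slowest = Timings(t_cl=3, t_rcd=4, t_rp=4, t_ras=8)

print(first_access_cycles(fastest))     # 6
print(same_row_cycles(fastest))         # 2
print(row_change_wait_cycles(slowest))  # 12
```

The last line reproduces the "as many as 12 clock cycles" figure: the loosest tRAS of 8 plus the loosest tRP of 4.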
They say a picture is worth a thousand words. I don't know if it's worth that much, but here is a diagram showing what I explained above. This is for a tRP, tRCD and tCL of 2, a tRAS of 5, and a burst length of 8.
Now, that probably was a bit confusing. I hope this example will illustrate WHY it is difficult to lower latency, compared with raising bandwidth. Keep in mind that latencies have dropped over the past 10 years from about 120ns (nanoseconds) to around 50ns. In less time than that, bandwidth has moved from a max of about 1GB/s (theoretical) for PC133 to approximately 8GB/s (theoretical) for a dual-channel PC4000 setup!
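For anyone who wants to check the two theoretical bandwidth figures above, a quick Python sketch follows. It assumes the standard 64-bit module bus, a 133MHz single-data-rate clock for PC133, and a 250MHz double-data-rate clock for PC4000. The function name is made up for illustration.

```python
def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int = 1,
                        bus_bits: int = 64, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s.

    clock (MHz) x transfers per clock x bus width (bytes) x channels.
    """
    return clock_mhz * transfers_per_clock * (bus_bits // 8) * channels


# PC133: 133MHz, one transfer per clock, single channel
print(peak_bandwidth_mb_s(133))  # 1064

# Dual PC4000: 250MHz, two transfers per clock (DDR), two channels
print(peak_bandwidth_mb_s(250, transfers_per_clock=2, channels=2))  # 8000
```

That is roughly an eightfold jump in bandwidth, against a latency improvement of a little over twofold.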
Think of a NASCAR team during a pit stop. The car first has to come in and stop in its stall. Then it must be jacked up on the driver side, have fuel inserted, have the tires removed and replaced, and be dropped back down. Everyone then races to the other side, jacks up the passenger side, does the same procedure with the tires, drops the car again, makes any necessary adjustments to the spoiler, and removes the fuel hose before the car can leave. Now, if a crew chief has trained his crew to the point where they physically can't get any faster, what is he left to do? His only option is to remove some of the processes, like only doing the passenger side tires, or adding less fuel. He has to CHANGE the process. The problem with latency is that it is difficult to remove any of the steps and still retain stability. Witness Intel's issues with Performance Acceleration Technology, better known as "PAT" in the i875 boards, and by many different names in i865 ones. Enabling it tends to cause stability problems in the not-as-fast silicon of the Springdales, because it removes some of the steps during memory access.
Springdale/Canterwood Divider Issues
Dividers are what allow you to run your RAM at a speed other than that of the FSB. In the ideal case, both would be identical. However, when overclocking, the CPU and its corresponding FSB often run much faster than current memory modules can handle. So a divider is used to adjust the memory clock speed by running it at a fixed ratio of the FSB clock. For example, in a current overclocked P4 setup, most CPUs will reach 250MHz FSB. Very few memory modules can run at this speed. However, there are many PC3200 modules, and those can be used with a 5:4 divider, which brings the memory down to 200MHz. The memory controller knows to run only 4 clock cycles for every 5 of the FSB. With the previous 533MHz FSB CPUs and a single-channel memory controller board, you wanted to run the memory faster than the FSB to try to make up the bandwidth difference. Here 4:5 or 3:4 dividers were useful, in order to have a higher transfer speed.
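The divider arithmetic is easy to check by hand, but here it is as a small Python sketch. Ratios are written FSB:memory, as in the text. The function names are invented for illustration, and the PC rating helper assumes DDR memory on a 64-bit (8 byte) bus.

```python
def memory_clock_mhz(fsb_mhz: float, fsb_ratio: int, mem_ratio: int) -> float:
    """Memory clock for a given FSB clock and an FSB:memory divider."""
    return fsb_mhz * mem_ratio / fsb_ratio


def ddr_pc_rating(mem_clock_mhz: float) -> float:
    """DDR 'PC' rating in MB/s: clock x 2 transfers per clock x 8 bytes."""
    return mem_clock_mhz * 2 * 8


# Overclocked P4 at 250MHz FSB with a 5:4 divider
mem = memory_clock_mhz(250, 5, 4)
print(mem)                 # 200.0
print(ddr_pc_rating(mem))  # 3200.0

# Older 133MHz (533MHz quad-pumped) FSB with a 4:5 divider: memory runs faster
print(memory_clock_mhz(133, 4, 5))  # 166.25
```

So a 250MHz FSB on a 5:4 divider asks the memory for exactly PC3200 speed, which is why those modules are usable here.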
As many of you know, there have been many memory incompatibilities with these chipsets, the boards that house them, and the memory used in them. The problem lies in the BIOS of the boards and the SPD (serial presence detect) of the memory. In many cases, most famously in the ABIT IC7/IS7 series of boards, many sticks of memory refuse to run any divider besides 1:1, or will run one only up to a certain FSB speed. I have the same problem with my Soltek 86SPE-L "Springdale" motherboard. Using OCZ 3700 Gold memory, I cannot enable either the 5:4 or 3:2 divider at ANY speed. The system simply refuses to boot, regardless of how I set the timings in the BIOS. However, I also have two sticks of Infineon PC3200 memory, which are quite happy running any divider at any speed.
To prove to myself that it wasn't a physical issue with the OCZ, I simply plugged all four modules into the board. Motherboard BIOSes simply default the timings to the slowest SPD detected, which of course was the Infineon one. With those timings loaded, I was again able to run any divider, which would not have been possible if it were a hardware issue with the OCZ. Unfortunately, there is no way for me to reprogram the SPD of the OCZ modules with the one found in the OEM PC3200, and so gain access to both dividers and the much higher speed the OCZ modules are capable of.
Just to make sure that I wasn't limiting myself in the testing, I also used Kingston HyperX 3500 and Corsair XMS 3200 modules. The Corsair would run 5:4 up to about 250MHz on the FSB, but not 3:2. The HyperX exhibited the same behaviour as the OCZ, and would not run either divider. With either set installed alongside the Infineon, the dividers were no longer an issue.
I believe this is the same problem found in the ABIT boards. The only solution is to keep watching for an updated BIOS that solves the incompatibility issue. It seems that the ASUS and EPoX series of i865/i875 boards have fewer problems and, unless you are willing to test many modules, would be the better choice for overclocking using the 5:4 or 3:2 dividers. At 1:1, though, there are no longer any known issues with the ABIT mobos.
Conclusion
I hope that this article has helped to answer some of the questions surrounding memory in current computer systems. It's another topic that generates a lot of confusion for people just getting into computer hardware, and is not typically explained as well as other ones. It's not that hard a topic when you relate it to reasonable analogies. If you have any questions, flames or props about this article (and we're SURE you will), head on over to the Forums, or email me.
DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.