Memory speed vs capacity

Memory speed isn’t often a consideration when building a system, except for those seeking ultimate overclocking performance. While overclocked memory exceeds the JEDEC standards, there are other considerations that may rob you of maximum performance.

I will discuss memory technologies ranging from DDR2 FB-DIMM to modern DDR4 ECC memory and how CPU memory controller limitations affect the actual performance you can expect. The TL;DR is that when you add more DIMMs per channel or more ranks, the memory frequency goes down.

What is a Rank?

In the simplest terms, a memory rank is a bank of memory chips. DIMMs store data 64 bits at a time; however, memory chips are only 8 or 16 bits wide, so multiple chips are combined to serve the 64-bit-wide memory bus.
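To make the arithmetic concrete, here is a minimal sketch of how many chips it takes to fill one rank. The function name is my own, and the 72-bit case assumes the usual ECC layout of 64 data bits plus 8 check bits; treat it as an illustration rather than a description of any particular module.

```python
def chips_per_rank(chip_width_bits, bus_width_bits=64):
    """Return how many DRAM chips are needed to fill one rank."""
    if bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a multiple of the chip width")
    return bus_width_bits // chip_width_bits


print(chips_per_rank(8))                      # x8 chips  -> 8
print(chips_per_rank(16))                     # x16 chips -> 4
print(chips_per_rank(8, bus_width_bits=72))   # ECC (assumed 72-bit bus) -> 9
```

This is why a single rank DIMM built from 8-bit chips typically has 8 chips on it (9 with ECC), and a dual rank DIMM has twice that.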

DIMMs are designated as Single, Dual, or Quad Rank, representing 1, 2, or 4 banks of chips on a single DIMM. Each rank is one bank of memory chips; how many ranks are on a DIMM depends on the density of the memory chips.

A bank of memory chips creates an electrical load. At the speeds memory chips operate, this load is measured as capacitance. The higher the capacitance, the lower the maximum speed. That means that the more ranks you have per DIMM, the lower the maximum speed that RAM can run at.

Buffered and Unbuffered RAM

When purchasing RAM, you have to choose between Buffered and Unbuffered memory. Buffered memory is generally designated with an R in the part number, indicating that it is Registered memory. This doesn’t mean the memory is registered with some central authority; it means that there are buffers called registers that buffer the Address and Command signals on the DIMM. Note that registered memory does not buffer the data signals!

Unbuffered memory, also called UDIMMs, comes in both ECC and non-ECC versions, but unlike Registered memory there are no buffers on the Address and Command signals. ECC memory is not typically compatible with desktop processors; however, that rule can be broken in some cases if you have a Server or Workstation grade motherboard. AMD has unofficial support for Unbuffered ECC memory with Ryzen processors, while the Threadripper line of HEDT processors officially supports ECC memory. Unbuffered memory is typically limited to 2 DIMMs per memory channel.

There are 2 common types of DIMMs that buffer the data signals: FB-DIMM and LR-DIMM. FB stands for Fully Buffered, which means that the Address, Data, and Control signals all pass through a buffer chip. FB-DIMMs also use a serial communication scheme and perform ECC on the commands, not just the data. LR-DIMMs are “load reduced” DIMMs; they add data buffering and otherwise behave like registered DIMMs. LR-DIMMs are used to achieve added capacity by reducing the effective load of the DIMM. A Quad Rank LR-DIMM behaves like a Single or Dual Rank DIMM, effectively doubling your maximum memory capacity.

DIMMs per channel (DPC)

DPC is the term used to describe how many physical memory modules are installed per memory channel. The frequency your memory runs at is dictated largely by the number of DIMMs per channel, while the maximum capacity is often dictated by the total number of memory ranks per DIMM. As you will see later, this isn’t always true for AMD processors.

Intel memory speeds

Intel CPUs support 2, 3, or 4 DIMMs per channel depending on the socket and CPU generation. Motherboards with 3DPC are usually reserved for Server use, while Workstation motherboards typically offer 2DPC. Some server motherboards only offer 1DPC in an effort to save space and energy and to reduce heat output.

The Intel Xeon 5500/5600 series (E55xx/E56xx) processors have 3 memory channels and support up to 3 DIMMs per channel, which means a dual socket system can have up to 18 DIMM slots. Populating all of these slots will cost you memory speed, however. A typical dual socket system can have up to 288GB of memory when using 16GB LR-DIMMs.
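The slot and capacity figures above are just multiplication. A minimal sketch, with a helper name of my own choosing, that reproduces the 18 slot and 288GB numbers:

```python
def max_capacity_gb(sockets, channels_per_socket, dimms_per_channel, dimm_size_gb):
    """Return (total DIMM slots, total capacity in GB) for a fully populated system."""
    slots = sockets * channels_per_socket * dimms_per_channel
    return slots, slots * dimm_size_gb


# Dual socket, 3 channels per CPU, 3DPC, 16GB LR-DIMMs (figures from the text above)
slots, capacity = max_capacity_gb(2, 3, 3, 16)
print(slots, "slots,", capacity, "GB")  # 18 slots, 288 GB
```

The same helper works for any other platform if you substitute its socket, channel, and DPC counts.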

There are 3 memory speeds that the E55xx/E56xx CPUs can run at: 1333MHz, 1066MHz, and 800MHz. To further complicate things, Intel offers processors in this series with a 1333MHz, 1066MHz, or 800MHz bus speed; you will see why in a minute.

With E55xx processors, each additional DIMM per channel drops the memory speed by one step: 1DPC is 1333MHz, 2DPC is 1066MHz, and 3DPC is 800MHz. The E56xx processors are also limited to 800MHz at 3DPC; however, they have a trick up their sleeve: they support 1333MHz with 2DPC!

There are some enthusiast motherboards that will support 1333MHz with 2DPC and an E55xx processor, but those are the exception rather than the rule. There is 1 further exception: LR-DIMMs will run at 1066MHz whether you have 1, 2, or 3 DPC, so there is a small uplift with LR-DIMM memory over RDIMM memory when using 3 DPC.
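The derating rules for these two families fit in a small lookup table. The sketch below uses only the figures quoted in this article; the function and table names are hypothetical, and capping the result at the CPU's own rated bus speed with min() is my assumption about how the two limits combine.

```python
# Memory speed in MHz by DIMMs per channel, for registered (RDIMM) memory.
RDIMM_SPEED_MHZ = {
    "E55xx": {1: 1333, 2: 1066, 3: 800},
    "E56xx": {1: 1333, 2: 1333, 3: 800},
}
# The article's figure for LR-DIMMs at 1, 2, or 3 DPC.
LRDIMM_SPEED_MHZ = 1066


def memory_speed_mhz(cpu_family, dpc, lrdimm=False, cpu_max_mhz=1333):
    """Look up the expected memory speed, capped at the CPU's rated bus speed."""
    if dpc not in (1, 2, 3):
        raise ValueError("this platform supports 1 to 3 DIMMs per channel")
    platform_speed = LRDIMM_SPEED_MHZ if lrdimm else RDIMM_SPEED_MHZ[cpu_family][dpc]
    # Assumption: the slower of the two limits wins.
    return min(platform_speed, cpu_max_mhz)


for family in ("E55xx", "E56xx"):
    for dpc in (1, 2, 3):
        print(family, f"{dpc}DPC", memory_speed_mhz(family, dpc), "MHz")
```

Calling memory_speed_mhz("E55xx", 3, lrdimm=True) returns 1066, which is the small uplift over the 800 that RDIMMs give at 3DPC.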

Later generation Intel microarchitectures suffer less derating because they support 2DPC at full speed. Once you exceed a 1333MHz memory clock, derating comes back into play, with 1866MHz being derated to 1600MHz and sometimes 1600MHz being derated to 1333MHz.

Rank derating

Most of our focus has been on DPC derating because Intel primarily uses the number of DPC to determine memory speed. We saw E56xx ease this restriction slightly, but while the real issue is memory ranks per channel, that isn’t often taken into account, except for high density modules.

DDR3 memory density is mostly limited by the memory chips that are available. Manufacturers only make DDR3 chips up to a given density. As systems moved towards DDR4, chip densities increased and allowed us to break the 16GB barrier.

The largest DDR3 modules are typically 16GB. These are represented electrically as 2 8GB DIMMs in a single socket, thus they are all Quad Rank DIMMs. Normal QR DIMMs are limited to just 2DPC because the memory buses have a maximum load of 8 ranks. Practically this means a single CPU can address up to 96GB of RAM (3 channels x 2DPC x 16GB) without resorting to LR-DIMMs. This also means that you are limited to 1066MHz because there are 2DPC. If you use LR-DIMMs, the maximum capacity becomes a whopping 144GB of memory per CPU (3 channels x 3DPC x 16GB) and you get the bonus of 1066MHz with 3DPC.
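The rank budget can be sketched the same way. The 8 ranks per channel limit and the 3 channel, 3 slot layout come from this article; the helper names are mine, and modeling a Quad Rank LR-DIMM as an effective load of 2 ranks is an assumption based on the earlier statement that it behaves like a Single or Dual Rank DIMM.

```python
MAX_RANKS_PER_CHANNEL = 8  # limit quoted in the article


def max_dpc(ranks_per_dimm, slots_per_channel=3):
    """DIMMs per channel allowed by both the slot count and the rank budget."""
    return min(slots_per_channel, MAX_RANKS_PER_CHANNEL // ranks_per_dimm)


def capacity_per_cpu_gb(ranks_per_dimm, dimm_size_gb, channels=3, slots_per_channel=3):
    """Maximum capacity for one CPU given the effective rank load of each DIMM."""
    return channels * max_dpc(ranks_per_dimm, slots_per_channel) * dimm_size_gb


# 16GB Quad Rank RDIMMs: 8 // 4 = 2DPC
print(capacity_per_cpu_gb(ranks_per_dimm=4, dimm_size_gb=16))  # 96

# 16GB LR-DIMMs, assumed to present an effective load of 2 ranks: 3DPC
print(capacity_per_cpu_gb(ranks_per_dimm=2, dimm_size_gb=16))  # 144
```

Running the numbers like this before ordering memory tells you whether the last slot in each channel is actually usable with the modules you have in mind.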

Exotic Server Hardware

You may now be asking yourself “but what about those 1TB VMware servers I’ve seen that guy on YouTube talk about?” Well, that’s an entirely valid question and the answer centers around specialist (exotic) hardware. Intel produces an E7 series of processors that are made for 2, 4, and 8 way systems. If you have 4 processors, each with 8 DIMM slots, and you use 64GB LR-DIMMs, you can achieve 2TB of RAM at 1066MHz. These types of systems use the LGA1567 socket and have 4 memory channels. Later E7 processors support even larger densities and allow for very large aggregate memory capacities. There are tradeoffs made to have high core counts and memory capacities: accessing all of that memory over a myriad of NUMA interlinks, combined with relatively slow per-core clocks, means they are not hotrods! Your application design must be NUMA aware to take advantage of large memory machines. These are typically seen as database servers or VMware clusters, serving to consolidate hundreds of physical servers into virtual servers.

AMD

I’ve been discussing Intel Xeon hardware extensively, but even a brand new AMD Ryzen processor has limitations on memory speed. The Ryzen series can address up to 128GB of RAM in 2 memory channels, with 2DPC of 32GB DIMMs. The speed of the Ryzen memory bus is derated based on rank loading rather than on DPC alone, as with Intel. If you have 1DPC, the memory bus can run up to 3200MHz; however, when you have 2DPC, the speed depends on whether you have single or dual rank DIMMs. For single rank you are limited to 2933MHz, and with dual rank that drops to 2666MHz.
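Here is the Ryzen rule as a small function. The speeds are the ones quoted above; the function name is hypothetical, and the sketch only covers the 1DPC and 2DPC cases with single or dual rank DIMMs that this article describes.

```python
def ryzen_max_speed_mhz(dpc, ranks_per_dimm):
    """Maximum memory speed for the Ryzen configurations described in the article."""
    if ranks_per_dimm not in (1, 2):
        raise ValueError("only single and dual rank DIMMs are covered here")
    if dpc == 1:
        return 3200
    if dpc == 2:
        # With two DIMMs on a channel, the rank count decides the speed.
        return 2933 if ranks_per_dimm == 1 else 2666
    raise ValueError("Ryzen desktop boards offer at most 2 DIMMs per channel")


print(ryzen_max_speed_mhz(1, 2))  # 3200
print(ryzen_max_speed_mhz(2, 1))  # 2933
print(ryzen_max_speed_mhz(2, 2))  # 2666

# Capacity at 2 channels x 2DPC x 32GB DIMMs
print(2 * 2 * 32, "GB")  # 128 GB
```

The last two lines of output are the tradeoff in a nutshell: reaching 128GB means 4 dual rank DIMMs, and that configuration tops out at 2666MHz.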

When I built my new desktop machine I purchased 2666MHz RAM for the system because I intended to upgrade it to 128GB at some point in the future. Since dual rank RAM with 4 slots can only run at 2666MHz, I didn’t buy faster memory than I could use. The price differential between ECC 2666MHz RAM and ECC 3200MHz RAM was over $100. As luck would have it, what I actually got was 2933MHz “e” RAM that was labeled as 2666MHz, so I “overclocked” my memory to get the rated spec! “e” RAM is 1 timing tier faster than typical JEDEC standards, so your RAS/CAS timings are reduced by 1 tier for that frequency.

Conclusion

Whether you are dealing with old server hardware or brand new hotrod desktop hardware, memory loading and capacity will be a factor in how fast you can access that memory. If you understand the rules and plan carefully, you can bias towards capacity or speed while also not overspending on performance you can’t use.

If you plan on loading up all memory channels on a server, and the maximum speed of the memory is 1066MHz, then buying a Xeon CPU with a 1066MHz bus speed can save you some dollars. If you want to favor performance over capacity, you can forgo some memory to realize the highest access speeds.

Aside

About 10 years ago I worked for a company that had a mishmash of hardware and virtual machines. The first thing I did was build a Supermicro LGA775 server to consolidate all of these VMs and PMs onto. I built 2 of these machines and named them yin and yang; they both had 32GB of RAM. A few years later I wanted to upgrade yin to 64GB, because I wanted a little more headroom. I repeated my last order for 32GB of RAM from the same vendor. It arrived and I populated the server, but only 32GB of RAM was seen.

Supermicro published conflicting data about the max memory capacity for this board: in one place it said 32GB and in another it said 64GB. I chalked this up to a BIOS flashing issue, and since this was a production machine, I didn’t want to tackle a BIOS upgrade (and potential bricking -> downtime) if I didn’t have to, so I left this exercise for a later date. One thing led to another and I switched jobs before I could resolve the memory capacity issue.

Fast forward a few years and I acquired all of those machines from my former employer. 2GHz LGA775 machines are not exactly speedy today, but I did have the opportunity to figure out the memory issue. It turns out that the vendor had shipped me Quad Rank memory instead of Dual Rank, and since you can only have 8 ranks per channel, the board refused to see that 32GB of RAM. Ultimately I swapped the QR memory into yang, interleaving 2 DIMMs per channel, then put the full 64GB of DR memory into yin. It was this experience that sent me down the memory rabbit hole.
