I had the opportunity to acquire some Enterprise hardware from a former employer. This is equipment I purchased and built when I worked there, almost 10 years ago. At the time I was trying to balance cost with performance, so some of the components were not top of the line, while others were fast for their day.
In all, I acquired a couple of LGA771 dual socket 2U systems and a 4U system with a 24 drive enclosure and an LGA1366 Xeon. All systems had Adaptec 5x05Z RAID controllers with 2TB Seagate drives. The LGA1366 Xeon is relatively modern because it represents the first generation of the Core i series architecture. The LGA1366 E5500 Xeons have a base clock of 133MHz, 3 memory channels, and QPI transfer rates of 4.8GT/s, 5.6GT/s, or 6.4GT/s. Depending on the model number, the max memory speed is 800MHz, 1066MHz, or 1333MHz.
It’s worth discussing memory speeds a bit, because the LGA1366 Xeons have some interesting memory speed behavior. The motherboard in the 24 drive NAS machine was a Supermicro X8SAX. This is a Workstation class board without IPMI, but it has 2 PCIe x16 slots. When the machine powers on, an NVIDIA SLI logo is displayed, so it’s clear this board was targeted at dual card SLI workstations. If you installed the proper combination of Xeon CPU and memory, XMP was also enabled. This motherboard would accept up to 24GB of unbuffered ECC or non-ECC RAM, and it would run the memory at 1333MHz with all slots populated. During bootup the BIOS reports the memory running at 1066MHz, but once the OS starts it is actually bumped to the maximum. Some Xeon boards support this.
Now let us look at Enterprise motherboards such as the X8DTN+. This board has an IPMI slot (or integrated IPMI if you get the -F variant) and 18 DDR3 DIMM slots. That’s right: the Xeon chips were designed to access up to 3 DIMMs per memory channel, and with 3 channels per socket and 2 sockets per board, you could have up to 288GB of registered ECC memory in these machines (18 DIMMs of 16GB each), but at a cost.
I found a really excellent whitepaper by Fujitsu which sets out to quantify the limitations I’m about to lay out for you. Please have a look at it: Memory performance of Xeon 5500 (Nehalem EP) based PRIMERGY servers.
The Xeon 5500 series (LGA1366) processors let you access memory at 1333MHz if you have 1 DIMM per channel (1DPC). With 2 DIMMs per channel (2DPC), you can still access the memory at 1333MHz on select motherboards (like the aforementioned X8SAX), but on Enterprise boards the speed drops to 1066MHz. If you populate 3 DIMMs per channel (3DPC), the speed drops further to 800MHz. By the numbers, 800MHz is much slower than 1333MHz, and for some applications it matters: if your application is memory bound rather than CPU or I/O bound, the slower memory speeds will affect you more.
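The capacity versus speed trade-off can be sketched in a few lines of Python. This is only an illustration of the rule described here for Enterprise boards. The function name `memory_config` and the 16GB DIMM size are my own assumptions (16GB is simply 288GB divided by 18 slots), not values taken from the whitepaper.

```python
# Memory speed by DIMMs per channel on Enterprise Xeon 5500 boards,
# as described in the text (select boards like the X8SAX behave differently).
SPEED_MHZ_BY_DPC = {1: 1333, 2: 1066, 3: 800}

CHANNELS_PER_SOCKET = 3  # LGA1366 Xeons have 3 memory channels


def memory_config(sockets, dimms_per_channel, dimm_gb):
    """Return (total capacity in GB, memory speed in MHz) for a population."""
    if dimms_per_channel not in SPEED_MHZ_BY_DPC:
        raise ValueError("expected 1, 2, or 3 DIMMs per channel")
    dimm_count = sockets * CHANNELS_PER_SOCKET * dimms_per_channel
    return dimm_count * dimm_gb, SPEED_MHZ_BY_DPC[dimms_per_channel]


# Hypothetical dual socket board populated with 16GB DIMMs (assumed size).
for dpc in (1, 2, 3):
    capacity_gb, speed_mhz = memory_config(sockets=2, dimms_per_channel=dpc, dimm_gb=16)
    print(f"{dpc}DPC: {capacity_gb}GB at {speed_mhz}MHz")
```

Each step up in DIMMs per channel buys capacity and costs one speed grade, so it pays to fill the first DIMM slot on every channel with the largest modules you can before moving to 2DPC.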
Raw memory throughput (multi-core access) is actually better than on many more modern sockets like the LGA1150 and LGA1151, because those only have 2 memory channels, and their max speed is limited by the memory technology and speed. Intel long locked DDR3 performance to 1333MHz, or in some cases 1600MHz.
The maximum performance of the LGA1366 socket, using 1DPC at 1333MHz, is 35.5GB/s. With 2DPC the throughput drops to 32.1GB/s, and when you go all out with 3DPC, it drops to 25.5GB/s. That last figure is at an effective 800MHz DDR3 access speed.
Now let’s look at a more modern enthusiast processor, the i7-4790K. This is a 4th gen Haswell processor with a base frequency of 4GHz, a single-core turbo of 4.4GHz, and an all-core turbo of 4.2GHz. The i7-4790K is still one of the fastest per-thread processors Intel made, great for workloads that need single threaded performance. Its Achilles heel is the dual channel DDR3 memory bus. The LGA1150 socket doesn’t have enough pins for 3 or 4 channel memory, so these chips are constrained in memory performance. LGA2011 is the higher TDP quad channel platform that typically extends this 4th generation architecture, and chips like the E5-2690 v3 are representative of it. The LGA1150 socket tops out at a theoretical 25.6GB/s of memory throughput with dual channel DDR3-1600, or about 21.3GB/s with DDR3-1333. Moving from DDR3-1333 to DDR3-1600 gives you a slight bump, but not as much as a third memory channel would.
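For reference, theoretical peak memory bandwidth is just channels times 8 bytes per transfer times the transfer rate. The sketch below applies that standard arithmetic. The helper name `peak_bandwidth_gbs` is mine, and measured throughput figures like the ones in the Fujitsu whitepaper will not match these theoretical peaks exactly.

```python
BYTES_PER_TRANSFER = 8  # each DDR3 channel is 64 bits wide


def peak_bandwidth_gbs(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s for a given channel count and DDR rate.

    DDR3-1333 means 1333 million transfers per second (written as 1333MHz
    in the text).
    """
    return channels * BYTES_PER_TRANSFER * mega_transfers_per_s / 1000


configs = [
    ("LGA1150, dual channel DDR3-1333", 2, 1333),
    ("LGA1150, dual channel DDR3-1600", 2, 1600),
    ("LGA1366, triple channel DDR3-1333 (per socket)", 3, 1333),
]
for label, channels, rate in configs:
    print(f"{label}: {peak_bandwidth_gbs(channels, rate):.1f}GB/s")
```

By this arithmetic the 25.6GB/s figure corresponds to dual channel DDR3-1600, dual channel DDR3-1333 works out to about 21.3GB/s, and a single triple channel LGA1366 socket at DDR3-1333 lands near 32GB/s.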
So far, all this talk has been theoretical and based on published specs and whitepapers. Let me share some real world data that might have you choosing an older system over a newish one.
My test environment is a Supermicro X8DTN+ with 2 Xeon X5570 processors and 24GB of dual rank UDIMMs (DDR3-1333MHz 2Rx4 UDIMM), populated at 1DPC. The CPUs have a 95W TDP, a 133MHz base clock, a 3.3GHz single core turbo, and a 3.2GHz all-core turbo.
The competitor is a Supermicro X10SAE with an i7-4790K and 32GB of dual rank Corsair DDR3-1333MHz memory in a 2DPC configuration. It has a 100MHz base clock, a 4.4GHz single core turbo, a 4.2GHz all-core turbo, and an 88W TDP.
By the numbers, the X5570 scores 5,393 on Passmark, while the 4790K scores 11,140. The raw clocks are about 30% faster on the 4790K (4.2GHz vs 3.2GHz all-core), and it is 3 generations newer. The dual Xeon board will consume about twice the power of the single i7, so the Xeon system will always lose on power consumption.
My test was simple. To compare the relative performance of the CPUs, I used Handbrake to transcode an h.264 1080p video file into an equivalent h.264 1080p video file. This test accomplishes a few things at once: it loads all of the cores on the processor, pushes the processor to its all-core turbo, drives power draw up to the TDP limit, and exercises the entire motherboard.
When doing CPU intensive tests, it’s important to ensure the thermal management solution (heatsink) can remove enough heat from the CPU to keep it out of thermal throttling. The passive 2U Supermicro heatsinks are rated for a 95W TDP and work exceptionally well. Intel (Foxconn) produced a similar looking active cooler with 4 heat pipes but half the fins, and that solution is incapable of properly cooling a 95W TDP processor. Under load the X5570 processors reached equilibrium at 75-78 degrees C. The i7-4790K has a Noctua active cooler with dual fans and never gets hotter than about 68 degrees C.
The Xeon X5570 processors were able to maintain an all-core turbo of 3.2GHz throughout the transcoding test, while the i7-4790K sustained 4.2GHz. Interestingly, lscpu shows a CPU max MHz of 4.4GHz for the 4790K but only 2.934GHz for the X5570. It seems the max frequency is reported differently for Enterprise and enthusiast processors.
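If you want to pull that number out of lscpu programmatically, something like the following works. It is a minimal sketch: the helper names are mine, and the sample strings are trimmed, illustrative excerpts in lscpu's "Key: value" layout where only the max MHz values come from my machines.

```python
import shutil
import subprocess


def lscpu_fields(text):
    """Split lscpu's "Key:   value" lines into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def max_ghz(text):
    """Return the "CPU max MHz" field in GHz, or None if it is not reported."""
    value = lscpu_fields(text).get("CPU max MHz")
    if value is None:
        return None
    # Some locales print a decimal comma instead of a decimal point.
    return float(value.replace(",", ".")) / 1000


# Illustrative excerpts, not full lscpu output.
SAMPLE_I7 = "Model name:    i7-4790K\nCPU max MHz:   4400.0000\n"
SAMPLE_XEON = "Model name:    Xeon X5570\nCPU max MHz:   2934.0000\n"

print("i7-4790K:", max_ghz(SAMPLE_I7), "GHz")
print("X5570:", max_ghz(SAMPLE_XEON), "GHz")

# On a Linux box with lscpu installed, check the live value as well.
if shutil.which("lscpu"):
    live = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
    print("this machine:", max_ghz(live), "GHz")
```

Keep in mind this only tells you what the OS advertises as the maximum. The sustained clocks during the transcode had to be observed under load.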
The result of the test is that the dual Xeon X5570 system sustained an average of 50fps during transcoding, while the i7-4790K could only muster 40fps. The same behavior was repeated when actually transcoding a raw Blu-ray movie to h.264 1080p. The X5570 system has twice the threads, about 3/4 the clock speed, a 3 generation microarchitecture handicap, 33% more memory bandwidth, and came out 25% faster than the 4th gen Haswell hotrod.
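The percentage depends on which system you treat as the baseline, so here is the arithmetic spelled out. The function name `relative_change` is just for illustration.

```python
def relative_change(value, baseline):
    """Fractional change of value relative to baseline (0.25 means 25% more)."""
    return value / baseline - 1


xeon_fps = 50  # dual Xeon X5570 average from the transcode test
i7_fps = 40    # i7-4790K average from the same test

print(f"Xeon vs i7: {relative_change(xeon_fps, i7_fps):+.0%}")
print(f"i7 vs Xeon: {relative_change(i7_fps, xeon_fps):+.0%}")
```

Going from 40fps to 50fps is a 25% gain for the Xeon pair, while the same gap expressed from the other side makes the i7 20% slower.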
Let’s break down the costs for these systems. The Xeon X5570 CPUs are about $10-12 for a pair, the heatsinks are $24 for a pair, the motherboard was $59, and the memory would cost ~$30. The i7-4790K was $250, the cooling solution was $70 (or you could use the Intel boxed cooler), the motherboard was $125, and the memory cost $200 (32GB of Corsair Vengeance 1600MHz, but that speed is XMP only, and since it has no 1600MHz JEDEC profile it runs at just 1333MHz). You can buy registered ECC memory for a fraction of the cost of normal PC memory: 128GB costs $120-$150.
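Adding those parts up, and dividing the transcode results by the totals, gives a rough value-for-money picture. This sketch uses the low end of the $10-12 CPU price and ignores the case, power supply, drives, and the electricity bill, so treat it as a back-of-the-envelope comparison only.

```python
# Part prices in USD, taken from the breakdown above.
xeon_parts = {"cpus (pair)": 10, "heatsinks (pair)": 24, "motherboard": 59, "memory": 30}
i7_parts = {"cpu": 250, "cooler": 70, "motherboard": 125, "memory": 200}

# Average transcode rates from the Handbrake test.
xeon_fps, i7_fps = 50, 40

xeon_total = sum(xeon_parts.values())
i7_total = sum(i7_parts.values())

print(f"Dual X5570: ${xeon_total}, {xeon_fps / xeon_total:.3f} fps per dollar")
print(f"i7-4790K:   ${i7_total}, {i7_fps / i7_total:.3f} fps per dollar")
```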
In the end, if you have an EATX case, the dual Xeon X5570 system could prove to be quite the budget beater. I never added up the costs of my desktop machine before, but at $645 vs $123, I might be looking at a dual LGA2011 system for my next upgrade. The TDP of the new AMD Ryzen systems is pretty close to that of 2 Xeon E5-2690 v3 processors, the Xeons score about the same on benchmarks, and you could put together a whole Xeon system for less than the cost of the Ryzen CPU alone.