An army of Xeon cores to do your bidding
44 Cores
I continuously explore options for cost-effective (or just plain cheap) parallel computing rigs, and while next-generation, cutting edge hardware is always interesting, I find that retired options from yesteryear can also show potential when their significantly lower cost is part of the overall assessment. Retrofitting a retired Dell workstation with high-core-count CPUs and the maximum allowable RAM, I built a 44-core compute behemoth for less than $600 to run Monte Carlo [1] simulations. Let me dive into the details!
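The kind of embarrassingly parallel Monte Carlo job this rig is built for can be sketched in a few lines of Python – a toy pi estimator fanned out across processes with `multiprocessing` (the sample and worker counts are illustrative placeholders, not an actual workload):

```python
# Toy parallel Monte Carlo: estimate pi by throwing random points at the
# unit square and counting the hits inside the quarter circle.
import random
from multiprocessing import Pool

def hits(n):
    # Each worker gets its own RNG so no state is shared across processes.
    rng = random.Random()
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))

def estimate_pi(samples=1_000_000, workers=4):
    chunk = samples // workers
    with Pool(workers) as pool:
        inside = sum(pool.map(hits, [chunk] * workers))
    return 4.0 * inside / (chunk * workers)

if __name__ == "__main__":
    print(f"pi is approximately {estimate_pi(workers=44):.4f}")
```

On a machine like this one, the pool size can simply be set to the core count; the estimate tightens as the sample count grows.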
Bill of Materials
Table 1 details my hardware configuration. I found a refurbished Dell Precision T7910 workstation [2] on eBay in nearly perfect cosmetic condition, with a motherboard sporting two LGA 2011-3 processor sockets – both vacant (Figure 1). The stock power supply is rated at 1,300W, more than sufficient for this project, but alas, one of the CPU heat sinks was missing. The listing promised no ventilation shrouds or disks, yet the unit arrived with four hard disks, a DVD-ROM drive, and all the air shrouds, making this a happy purchase ($159).
Table 1
Shockwave Compute Server Specs
| Component | Spec |
|---|---|
| Chassis and motherboard | Dell Precision Workstation T7910 |
| Power | 1,300W |
| CPU | 2x Intel Xeon E5-2699 v4, 22 cores, 2.4GHz, 55MB of cache, LGA 2011-3 |
| GPU, NPU | n/a* |
| Memory | 256GB DDR4 ECC PC4-19200 2,400MHz |
| Storage | 4x 3.5-inch drive bays, slimline optical drive, LSI SAS 3008 12Gbps SAS (6Gbps SATA) |
| Networking | Intel I217 and I210 Gigabit Ethernet controllers, remote wake-up |
| Video | NVIDIA Quadro; HDMI, DP |

*Not applicable.
After temporarily installing a 10-core Xeon from the parts archive and flashing the BIOS to its latest revision with Dell's very reasonable bootable tooling [3] [4], I was able to install the two newly procured CPUs: Intel Xeon E5-2699 v4 processors running at 2.4GHz, each sporting 22 cores and 55MB of cache memory [5]. Fortunately, I had a second heat sink and fan on hand (buying a new one would have cost nearly as much as the workstation itself!). This purchase set me back $250 for two CPUs, which were engineering samples (ES) verified to run validation tests reliably at speed. Unfortunately, the second socket also came with a bent pin, which sent me on a two-week side quest troubleshooting the CPUs and their memory banks until I located it, cleaned it, and very delicately and patiently bent it back into its original position. (See the "Fixing an LGA 2011-3 Socket" box.)
Fixing an LGA 2011-3 Socket
Identifying the issue with the CPU socket required some investigative work of the bisection variety: installing one CPU, then installing the other (both worked), then installing half of the RAM, then the other half (the second test failed), then continuing to divide this failed half until I identified the pair of DIMMs that were not working. However, the DIMMs themselves turned out to be fine (I swapped them with another pair to confirm). Connecting this picture back to the CPU pin was fortuitous: As I was re-seating a heat sink, I noticed some thermal paste out of place, and when I removed the CPU, I found thermal paste in the socket – not a good thing, even when things are working. I washed the thermal paste out with 70 percent isopropyl alcohol loaded in a Waterpik-type device I sourced on AliExpress for $20. Another $20 went to Amazon for a handheld USB microscope [12] to examine the damaged area (Figure 2). Patient use of an index card and tweezers enabled me to rectify the failure. The bent pin controlled the affected DIMM banks.
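In code form, the procedure amounts to a binary search over the installed pairs. This sketch is purely illustrative: the `fails` callback is hypothetical, standing in for a physical install-and-boot test.

```python
# Bisection over DIMM pairs: repeatedly install half of the remaining
# candidates and boot; the failing half must contain the bad pair.
def find_bad_pair(pairs, fails):
    lo, hi = 0, len(pairs)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(pairs[lo:mid]):   # install only this half and run the boot test
            hi = mid
        else:
            lo = mid               # bad pair must be in the other half
    return pairs[lo]

# Example: eight pairs, with "C2" the culprit.
pairs = ["A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"]
print(find_bad_pair(pairs, lambda half: "C2" in half))  # prints C2
```

With 8 candidate pairs, this isolates the fault in three boots instead of eight.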
Total Recall
Sixteen empty DIMM memory slots stared at me asking for attention. Raiding my lab's archive, I found eight perfectly suitable modules already in my possession (16GB DDR4 error correction code (ECC) PC4-19200 2,400MHz, exceeding spec). A little bargain hunting led me to find another eight modules on Amazon ($182 in total) with equivalent specs manufactured by SK Hynix [6]. Collectively, the 16 modules combine to provide 256GB of RAM, half of the maximum that could be installed without resorting to more expensive load-reduced (LR)DIMM, which in turn maxes out at 1TB. The current design provides almost 6GB of RAM per core, and I retain the option to budget another $2,000 to quadruple that amount if a workload is found needing it – a very reasonable compromise.
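For the record, the arithmetic behind those figures:

```python
# The memory math from the paragraph above, spelled out.
modules, size_gb, cores = 16, 16, 44
total = modules * size_gb              # 256GB installed across all 16 slots
print(f"{total}GB total, {total / cores:.1f}GB per core")
# Swapping in 64GB LRDIMMs would max the same 16 slots out at 1TB.
```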
I completed the setup with this newfangled technology called software – Ubuntu 23.10 "Mantic Minotaur" providing the operating system layer, with the distribution including all the necessary message-passing interface (MPI) and parallel processing tools one may want. The `btop` [7] tool is everyone's new favorite in-terminal system monitor, and it provides a first look at the completed system (Figure 3). Note the inclusion of core temperatures.
I already discussed BashTop in a previous issue [8], but today I shall focus on just one aspect of its CPU view: What happens when all those cores heat up? The system idles at 121W once booted up, so I will drive up the temperature with the `matrix` option of the `stress-ng` tool [9], an ominous stressor known for its Intel CPU heating properties [10]:

```
stress-ng --matrix 0 -t 1m --tz --times
```

The zero count syntax requests one stressor to run on each CPU core in the system; `--times` generates statistics on userland and kernel time, and the `--tz` option includes CPU temperature data where available. The CPU clock ran up from a cozy 1.2GHz to 2.1GHz across the board, with all cores pegged at 100 percent, eventually reaching a perfect 44 load average [11] (Figure 4). Temperature did not exceed 72C at the hottest sensor (good cooling on Dell's part), but the power draw tripled, rising to 367W (Figure 5). The power-hungry nature of the beast needs to factor into any cost calculation as much as, if not more than, the hardware cost itself.
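As a quick cross-check on numbers like these – assuming a Linux system – Python exposes the same load-average figure programmatically:

```python
# Sanity-check a full-load run: compare the 1-minute load average against
# the core count; the ratio approaches 1.0 when every core is pegged.
import os

def load_per_core():
    one_minute, _, _ = os.getloadavg()
    return one_minute / os.cpu_count()

print(f"1-minute load per core: {load_per_core():.2f}")
```

Running this alongside `stress-ng` on the 44-core box should report a ratio very close to 1.0 once the stressors settle in.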
The 22 cores (44 threads) of each CPU could turbo boost up to 3.6GHz individually, but it is more interesting in this case to note that 2.4GHz is the maximum speed they can all accelerate to concurrently.
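To verify clock behavior directly rather than reading it off a monitor, the standard Linux cpufreq sysfs files can be polled. This is a generic sketch, not specific to the T7910, and it assumes a cpufreq driver is loaded:

```python
# Peek at live per-core clocks through the Linux cpufreq sysfs interface;
# the files exist only when a cpufreq driver is active, so handle absence.
from pathlib import Path

def core_mhz():
    files = Path("/sys/devices/system/cpu").glob("cpu[0-9]*/cpufreq/scaling_cur_freq")
    return [int(f.read_text()) // 1000 for f in sorted(files)]

clocks = core_mhz()
if clocks:
    print(f"{len(clocks)} cores, spread {min(clocks)}-{max(clocks)} MHz")
else:
    print("no cpufreq data exposed on this system")
```

Under the `stress-ng` run described above, every entry should converge on the all-core maximum rather than the single-core turbo figure.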
At the time of this writing, an Amazon AWS `m7g.12xlarge` instance with 48 virtual cores and only 192GB of RAM costs almost $2/hr ($1.9584, US East, on-demand pricing), so you could think of this new machine as costing 12-1/2 days (300 hours) of AWS compute. Not bad!
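The break-even arithmetic, for the skeptical, using the purchase figures quoted earlier:

```python
# Back-of-the-envelope break-even against on-demand cloud pricing.
build_cost = 159 + 250 + 182       # chassis + CPUs + RAM, in dollars
aws_rate = 1.9584                  # m7g.12xlarge, US East, on-demand, $/hr
hours = build_cost / aws_rate
print(f"${build_cost} buys roughly {hours:.0f} on-demand hours ({hours / 24:.1f} days)")
```

This ignores electricity, which – given the 367W figure above – is exactly the line item the comparison should not forget over the long run.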
Infos
1. The Monte Carlo method: https://en.wikipedia.org/wiki/Monte_Carlo_method
2. Dell Precision T7910 workstation: https://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Dell-Precision-Tower-7000-Series-7910-Spec-Sheet.pdf
3. Dell BIOS updates: https://www.dell.com/support/kbdoc/en-us/000124211/dell-bios-updates
4. Dell bootable DDDP: https://www.dell.com/support/kbdoc/en-us/000145519/how-to-create-a-bootable-usb-flash-drive-using-dell-diagnostic-deployment-package-dddp
5. Intel Ark: Xeon E5-2699 v4: https://ark.intel.com/content/www/us/en/ark/products/91317/intel-xeon-processor-e5-2699-v4-55m-cache-2-20-ghz.html
6. Hynix 16GB DDR4 PC4-19200 2,400MHz ECC REG DIMM: https://www.amazon.com/gp/product/B01N6O511Z/
7. btop(1) man page: https://manpages.ubuntu.com/manpages/noble/en/man1/btop.1.html
8. "Next-generation terminal UI tools" by Federico Lucifredi, ADMIN, issue 64, 2021, https://www.admin-magazine.com/Archive/2021/64/Next-generation-terminal-UI-tools
9. stress-ng by Colin King: https://manpages.ubuntu.com/manpages/jammy/man1/stress-ng.1.html
10. "Creating load for fun and profit" by Federico Lucifredi, ADMIN, issue 75, 2023, https://www.admin-magazine.com/Archive/2023/75/Creating-load-for-fun-and-profit
11. "Law of Averages" by Federico Lucifredi, ADMIN, issue 11, 2012
12. Low-cost USB microscope: https://www.amazon.com/gp/product/B06WD843ZM