Lead Image © Lucy Baldwin, 123RF.com

An army of Xeon cores to do your bidding

44 Cores

Article from ADMIN 79/2024
By Federico Lucifredi
Explore low-cost parallel computing.

I continuously explore options for cost-effective (or just plain cheap) parallel computing rigs, and while next-generation, cutting edge hardware is always interesting, I find that retired options from yesteryear can also show potential when their significantly lower cost is part of the overall assessment. Retrofitting a retired Dell workstation with high-core-count CPUs and the maximum allowable RAM, I built a 44-core compute behemoth for less than $600 to run Monte Carlo [1] simulations. Let me dive into the details!

Bill of Materials

Table 1 details my hardware configuration. I found a refurbished Dell Precision T7910 workstation [2] on eBay in nearly perfect cosmetic condition with a motherboard sporting two LGA 2011-3 processor sockets – both vacant (Figure 1). The stock power supply is rated at 1,300W, more than sufficient for this project, but alas, one of the CPU heat sinks was missing. The description promised no ventilation shrouds or disks, but the unit came with four hard disks, a DVD-ROM drive, and all the air shrouds, making this a happy purchase ($159).

Table 1

Shockwave Compute Server Specs

Component | Spec
Chassis and motherboard | Dell Precision Workstation T7910
Power supply | 1,300W
CPU | 2x Intel Xeon E5-2699 v4, 22 cores, 2.4GHz, 55MB of cache, LGA 2011-3
GPU, NPU | n/a*
Memory | 256GB DDR4 ECC PC4-19200 2,400MHz
Storage | 4x 3.5-inch drive bays, slimline optical drive, LSI SAS 3008 12Gbps SAS (6Gbps SATA)
Networking | Intel I217 and I210 Gigabit Ethernet controllers, remote wake-up
Video | NVIDIA Quadro (HDMI, DP)
*Not applicable.
Figure 1: Inside view of the system before the build-out. Note how sockets are protected by plastic shielding plates.

After temporarily installing a 10-core Xeon from the parts archive and flashing the BIOS to its latest revision with Dell's very reasonable bootable tooling [3] [4], I was able to install the two newly procured CPUs: Intel Xeon E5-2699 v4 processors running at 2.4GHz, each sporting 22 cores and 55MB of cache memory [5]. Fortunately, I had a second heat sink and fan on hand (buying a new one would have cost nearly as much as the workstation itself!). This purchase set me back $250 for two CPUs, which were engineering samples (ES) verified to run validation tests reliably at speed. Unfortunately, the second socket also came with a bent pin, which sent me on a two-week wild side quest troubleshooting the CPUs and their memory banks until I located the pin, cleaned it, and very delicately and patiently bent it back into its original position. (See the "Fixing an LGA 2011-3 Socket" box.)
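
With both processors seated, a quick sanity check from the freshly booted system confirms what the firmware sees. A minimal sketch, assuming the stock lscpu and dmidecode tools that ship with virtually every distribution:

# Sockets, cores per socket, and the exact model string as the kernel sees them
lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|Model name'

# Firmware view of each CPU socket (run as root)
sudo dmidecode --type processor | grep -E 'Socket Designation|Version|Core Count'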

Fixing an LGA 2011-3 Socket

Identifying the issue with the CPU socket required some investigative work of the bisection variety: installing one CPU, then the other (both worked), then installing half of the RAM, then the other half (the second test failed), then continuing to divide the failing half until I had identified the pair of DIMMs that were not working. However, the DIMMs themselves were fine (I swapped them with another pair to confirm). Connecting this picture back to the CPU pin was fortuitous: As I was re-seating a heat sink, I noticed some thermal paste out of place, and when I removed the CPU, I found thermal paste in the socket – not a good thing, even when everything appears to be working. I washed the thermal paste out with 70 percent isopropyl alcohol loaded in a Waterpik-type device I sourced on AliExpress for $20. Another $20 went to Amazon for a handheld USB microscope [12] to examine the damaged area (Figure 2). Patient use of an index card and tweezers enabled me to rectify the failure: The bent pin controlled the affected DIMM banks.

Figure 2: The bent CPU socket pin as seen under the microscope.
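
One shortcut worth knowing for this kind of hunt: the firmware's own inventory of the DIMM slots makes a dead bank stand out immediately. A rough check along these lines (dmidecode assumed installed) lists every slot with the module size it detected; empty or unseen banks report "No Module Installed":

# List each DIMM slot and the module size the firmware detected
sudo dmidecode --type memory | grep -E '^\s*(Locator|Size):'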

Total Recall

Sixteen empty DIMM memory slots stared at me, asking for attention. Raiding my lab's archive, I found eight perfectly suitable modules already in my possession (16GB DDR4 error correction code (ECC) PC4-19200 2,400MHz, exceeding spec). A little bargain hunting turned up another eight modules on Amazon ($182 in total) with equivalent specs, manufactured by SK Hynix [6]. Collectively, the 16 modules provide 256GB of RAM, half of the maximum that could be installed without resorting to more expensive load-reduced DIMMs (LRDIMMs), which in turn max out at 1TB. The current design provides almost 6GB of RAM per core, and I retain the option to budget another $2,000 to quadruple that amount should a workload require it – a very reasonable compromise.
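
A quick way to confirm that all 16 modules registered, and to eyeball the per-core ratio, is sketched below; free and lscpu are standard tools, and the 256 figure is simply the installed total hardcoded for the arithmetic:

# Total usable memory after ECC and kernel reservations
free -h

# Count physical cores (unique core/socket pairs) and compute GB of RAM per core
CORES=$(lscpu -p=CORE,SOCKET | grep -v '^#' | sort -u | wc -l)
echo "$CORES cores, roughly $((256 / CORES)) GB of RAM per core"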

I completed the setup with this newfangled technology called software – Ubuntu 23.10 "Mantic Minotaur" providing the operating system layer, with the distribution including all the necessary message-passing interface (MPI) and parallel processing tools one may want. The btop [7] tool is everyone's new favorite in-terminal system monitor, and it provides a first look at the completed system (Figure 3). Note the inclusion of core temperatures.

Figure 3: Forty-four cores humming along, but why are numbers 32 and 33 doing all the work?
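
Before any real Monte Carlo work, a trivial MPI smoke test is a good way to confirm that the runtime can place one rank on each core. A minimal sketch, assuming Ubuntu's Open MPI packages:

# Install the Open MPI runtime from the Ubuntu archive
sudo apt install openmpi-bin

# Launch 44 ranks, one bound to each physical core; each rank just reports its host
mpirun -np 44 --bind-to core hostname

If all is well, the hostname prints 44 times and btop shows a brief blip across every core.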

I already discussed BashTop in a previous issue [8], but today I shall focus on just one aspect of its CPU view: What happens when all those cores heat up? The system idles at 121W once booted up, so I will drive up the temperature with the matrix option of the stress-ng tool [9], an ominous stressor known for its Intel CPU heating properties [10]:

stress-ng --matrix 0 -t 1m --tz --times

The zero count syntax requests one stressor to run on each CPU core in the system; --times generates statistics on userland and kernel time, and the --tz option includes CPU temperature data where available. The CPU clock ran up from a cozy 1.2GHz to 2.1GHz across the board, with all cores pegged at 100 percent, eventually reaching a perfect 44 load average [11] (Figure 4). Temperature did not exceed 72°C at the hottest sensor (good cooling on Dell's part), but the power draw tripled, rising to 367W (Figure 5). The power-hungry nature of the beast needs to factor into any cost calculation as much as, if not more than, the hardware cost itself.

Figure 4: Turning the system into a space heater with the right benchmark.
Figure 5: Tripled power draw. The electric bill is the largest expense for this system.
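
While the stressor runs, the same story can be read live from a second terminal. A rough sketch, assuming lm-sensors has been installed and configured and that turbostat is available from the distribution's linux-tools packages:

# Refresh the fastest core clocks and the temperature sensors every second
watch -n 1 "grep 'cpu MHz' /proc/cpuinfo | sort -t: -k2 -n | tail -3; sensors | grep -E 'Core|Package'"

# CPU package power from the Intel RAPL counters (CPU only, not wall draw)
sudo turbostat --quiet --show PkgWatt --interval 5

Note that RAPL reports only the processor packages; a whole-system reading such as the 367W above also includes memory, drives, fans, and power supply losses.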

The 22 cores (44 threads) of each CPU could turbo boost up to 3.6GHz individually, but it is more interesting in this case to note that 2.4GHz is the maximum speed they can all accelerate to concurrently.
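
Those limits can be read back from the kernel rather than taken on faith; one quick look, assuming cpupower is installed from the distribution's linux-tools packages:

# Hardware frequency limits and the active scaling policy for core 0
cpupower -c 0 frequency-info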

At the time of this writing, an AWS m7g.12xlarge instance with 48 virtual cores and only 192GB of RAM costs almost $2/hr ($1.9584, US East, on-demand pricing), so you could think of this new machine as costing 12-1/2 days (300 hours) of AWS compute. Not bad!
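
For the record, the back-of-the-envelope arithmetic behind that comparison can be reproduced with bc, plugging in the three itemized purchases ($159 chassis, $250 CPUs, $182 RAM) against the on-demand rate:

# Hours of m7g.12xlarge time the itemized spend buys, and the same in days
echo "scale=1; (159 + 250 + 182) / 1.9584" | bc
echo "scale=1; (159 + 250 + 182) / 1.9584 / 24" | bc

which lands right at the 300 hours, or 12-1/2 days, quoted above.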

The Author

Federico Lucifredi (@0xf2) is the Product Management Director for Ceph Storage at Red Hat and IBM, formerly the Ubuntu Server Product Manager at Canonical, and the Linux "Systems Management Czar" at SUSE. He enjoys arcane hardware issues and shell-scripting mysteries, and takes his McFlurry shaken, not stirred. You can read more from him in the new O'Reilly title AWS System Administration.
