A Brief History of Supercomputers
Trajectory
Of course, clock speeds are not the best approach for comparison, but in the absence of any benchmarks that lasted for more than 15 years during that time period, it will have to serve as a guide. Although absolute clock speed numbers are not important, what is relevant is the growth in clock speeds, as well as the relative values.
From the 1990s through the early 2000s, you can see the trajectory of CPUs. The PC CPUs were very quickly gaining in performance (e.g., adding SIMD instructions). Finally, they became 64-bit in 2003 with the AMD Athlon 64. Clock speeds were also quickly increasing to well over 1GHz and on to 3GHz. Then, in 2005, the Athlon 64 X2 introduced multiple cores on a single die, with clock speeds starting at 1.9GHz.
At the same time, supercomputer processors, which were made in much, much smaller quantities, still had lower clock speeds. Cray was using vector processors that could run vectorizable code extremely fast, but even then, the clock speeds barely reached 1GHz in 2005, when PC CPUs had already passed 2GHz. The SGI MIPS processors, which were also made for the workstation market, were still under 1GHz when the Origin 3000 launched in 2000.
During the 1990s, the pace of PC CPU development was quickening, with good increases in clock speed and increasing parallelism. The L2 cache was also increasing in capacity over time. Then, in 2003, PC CPUs reached 64-bit with high clock speeds, quickly followed by two cores on a die with clock speeds of 2GHz and greater.
Supercomputers enjoyed a great period of growth in the early 1990s, with better clock speeds, great vectorization, and even additional parallelism across nodes. The early experiments of the late 1980s and early 1990s showed that parallelism from large numbers of processors was possible, although software had challenges trying to take advantage of all that processing.
At the same time, Cray was only making a small number of processors compared with the PC market. Large investments were spread across the development of a small number of processors. However, Cray also used workstation processors, specifically DEC Alpha processors, to reduce costs while still maintaining great performance, as reflected in the popularity of the Cray T3D and T3E systems.
SGI also tried using their MIPS processors in both their workstations and Origin supercomputers to help keep system prices down, making them competitive with Cray.
Overall system performance for these supercomputers was increasingly driven by parallel processing across multiple nodes. Meanwhile, the PC processors were very quickly catching up to and surpassing supercomputer processors. Tables 1 and 2 show a brief glimpse of the trajectory of PC CPUs and supercomputer processors from the late 1980s into the early 2000s.
Table 1: PC Processor Progression
Date | Processor | Highlights |
---|---|---|
Apr 1989 | 486DX | On-die L1 cache, much better performance than 386; L2 on motherboard |
Mar 1992 | i486DX2 | 2:1 clock multiplier, 40/20, 50/25, 66/33 speeds; L2 on MB |
Mar 1993 | Pentium | Data bus width doubled to 64 bits, superscalar, FSB of 60-66MHz, clock multiplier of 1; 16–32KiB L1, still external L2 cache |
Mar 1994 | i486DX4 | 3:1 clock multiplier, 75/25, 100/33 speeds; 16KB L1 cache on-die, L2 on motherboard |
Nov 1995 | Pentium Pro | 150–200MHz; on-package L2 cache (256KB to 1MB); decoupled, superscalar, 14-stage super-pipelined, out-of-order execution, two integer units |
Jan 1997 | Pentium MMX | SIMD (MMX), 166–200MHz |
Apr 1997 | AMD K6 | Supports MMX, 166–300MHz; L1 cache 32+32KB, L2 on motherboard |
May 1997 | Pentium II | Improved Pentium Pro, first Xeon naming, 233–450MHz |
May 1998 | AMD K6-2 | MMX and 3DNOW! SIMD, 200–570MHz; 64KiB L1 cache |
Jun 1998 | Pentium II Xeon | SIMD; L2 cache from 512KB to 2MB |
Feb 1999 | Pentium III | 9.5 million transistors, 450 and 500MHz clock speeds (600MHz in 1999); new SIMD, SSE, introduced; achieved 1GHz in early 2001; max. clock speed of 1.3GHz |
Feb 1999 | AMD K6-III | 400 and 450MHz initial clock speed, ending at 500MHz; L2 cache of 256KB; Socket 7; MMX and 3DNOW! SIMD instructions |
Jun 1999 | AMD Athlon | 500–700MHz |
Nov 2000 | Pentium 4 | NetBurst architecture (not successful); introduced SSE2 (still used today); code could be fast but needed new code optimizations; eventually reached 3.8GHz |
Early 2001 | Pentium III | ≥1.0GHz |
May 2001 | Xeon | 32-bit; 1.4, 1.5, 1.7GHz |
Sep 2001 | Xeon | 2.0–3.6GHz |
Sep 2003 | Athlon-64 | 1.0–3.2GHz |
Feb 2005 | Pentium 4F | 64-bit, 2.8–3.8GHz |
May 2005 | Pentium D, Smithfield | Dual-core, 2.66–3.2GHz |
May 2005 | Athlon 64 X2 | Dual-core, 1.9–3.2GHz |
Dec 2006 | Xeon Clovertown | Quad-core, 1.86–2.66GHz |
Jan 2010 | Nehalem | Dual-core; 32+32KB L1, 256KB L2, 3MB L3; 2.8GHz, two threads per core |
Table 2: Supercomputer Processor Progression
Date | Processor | Highlights |
---|---|---|
1985 | NEC SX-1, SX-2 | SX-2: four sets of high-performance vector operation pipelines with up to a maximum of 16 arithmetic units, capable of multiple/parallel operation |
1988 | Cray Y-MP | Eight 32-bit vector processors, 167MHz, SRAM main memory, single-vector pipeline |
1990 | NEC SX-3 | SIMD, MIMD, four arithmetic processors, up to four sharing the same main memory |
1991 | Cray C90 | Dual-vector pipeline, 244MHz, three times Y-MP performance |
1994 | Cray T3D | DEC Alpha 21064 processors, 3D Torus, 64-bit |
1994 | Cray J90 | Up to 32 vector processors, 100MHz, 4GB of memory, 32-processor T932 costing $59.76 million in 2020 dollars |
1994 | NEC SX-4 | First shipped in 1995, several CPUs arranged into a parallel vector processing node; then, those nodes were installed into a regular SMP arrangement |
1995 | Cray T90 | Evolution of C90, 450MHz processors |
1996 | Cray T3E | DEC Alpha 21164 processor, 300MHz; later processors at 450, 600, and even 675MHz; can scale from 8 to 2,176 PEs, each PE with between 64MB and 2GB of memory |
1996 | SGI Origin 2000 | R10000 MIPS processor, 180 to 300 and 400MHz |
1998 | Cray SV-1 | Vector cache, 300MHz, later ran at 500MHz |
1998 | NEC SX-5 | 4TFLOPS, each node used 16 CPUs, up to 128GB memory |
2001 | NEC SX-6 | Single node, up to eight vector processors, up to 64GB of memory, connect up to 128 nodes in a single system; became Earth Simulator |
2003 | Cray X1 | NUMA, vector, 800MHz, eight-wide vector; air-cooled, up to 64 processors; liquid-cooled, 4,096 processors; 1,024 SMP nodes in 2D Torus; code with Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) |
2004 | SGI Origin 3000 | R12000 MIPS processor, up to 360MHz; later R14000 up to 500MHz |
2005 | Cray X1E | Dual-core, 1,150MHz |
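As a rough way to quantify the difference in growth that Tables 1 and 2 illustrate, the following sketch computes a compound annual growth rate in clock speed for each group. The endpoints are an assumption chosen for illustration (the 66MHz i486DX2 and the 3.8GHz Pentium 4F for PCs, the Cray Y-MP and Cray X1E for supercomputers), and the function name `annual_growth` is hypothetical. Clock speed remains only a loose proxy for performance.

```python
# Clock speeds (MHz) and years are taken from Tables 1 and 2.
# The choice of endpoints is an assumption made for illustration only.

def annual_growth(start_mhz, end_mhz, years):
    """Return the compound annual growth rate as a fraction."""
    return (end_mhz / start_mhz) ** (1.0 / years) - 1.0

# PC: i486DX2 at 66MHz (1992) to Pentium 4F at 3,800MHz (2005)
pc_growth = annual_growth(66, 3800, 2005 - 1992)

# Supercomputer: Cray Y-MP at 167MHz (1988) to Cray X1E at 1,150MHz (2005)
super_growth = annual_growth(167, 1150, 2005 - 1988)

print(f"PC clock growth:            {pc_growth:.1%} per year")
print(f"Supercomputer clock growth: {super_growth:.1%} per year")
```

With these endpoints, the PC clock rate grows at roughly three times the annual rate of the supercomputer processors. Picking different rows from the tables changes the exact figures, but not the overall picture.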
Commodity Networking
A critical aspect of making distributed computers work together is networking. When PCs were still in their infancy, specialized networks were awfully expensive and sometimes a little fragile. They were used for critical information transmission in industries such as telecommunications, finance, and government. Supercomputers through the 1990s used some of this specialized networking to achieve high bandwidth and low latency for that time.
For PCs, networking had to match PC pricing. You could not have a $500 to $2,000 PC with a $10,000 networking interface. The specialized networks did not match the low-cost expectation. PCs had to wait for cheaper networking to be developed. This came from Ethernet.
Ethernet was developed around 1973 and 1974 at Xerox PARC, as were so many innovative technologies. Initially, Ethernet ran at 2.94Mbps and was used in several server applications, but not with PCs. In 1980, the Ethernet specification was upgraded to a 10Mbps protocol. Version 2 of the specification, known as Ethernet II, was published in November 1982. By the end of the 1980s, Ethernet had become the overall dominant network technology.
In the early 1980s, Ethernet used 10BASE5 thick coaxial cable (recall the “vampire taps”?), which later changed to the thinner 10BASE2 coaxial cabling many should remember. Then the world moved on to 10BASE-T, which used twisted-pair cables, as is still used today for common networking.
With 10BASE2 coaxial cabling, the use of Ethernet started to grow beyond supercomputers and specialized networks. That growth brought down the prices of Ethernet hardware, including switches and routers, which encouraged more usage, and so on.
Around 1995, the next generation of Ethernet, Fast Ethernet, was introduced. This is probably the start of true commodity networking, with a performance of 100Mbps, 10 times faster than the previous generation. It was a quantum leap in performance, with prices dropping rapidly to the point where it became ubiquitous. The low prices allowed Fast Ethernet network interface cards (NICs), Ethernet switches, and Ethernet routers to be put into homes.
The first cluster I helped bring into Lockheed Martin used Fast Ethernet as the cluster interconnect. Our computational fluid dynamics applications scaled very well over Fast Ethernet, to the point that a single application could run across the entire cluster. Granted, it was only 64 nodes with dual processors at that time, but the price and performance were revelations to us.
Gigabit Ethernet, commonly referred to as “GigE,” runs at 1,000Mbps and was introduced in 1999. It could still use twisted-pair cabling (1000BASE-T), keeping prices low, and delivered another 10 times jump in performance for commodity networking. GigE is still going strong for small HPC systems and in homes.
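To put the successive 10 times jumps in perspective, the short sketch below estimates the best-case time to move a data set at each nominal Ethernet line rate mentioned above. The 1GB payload and the names `ETHERNET_MBPS` and `ideal_transfer_seconds` are hypothetical, and the calculation deliberately ignores protocol overhead, latency, and contention, so real transfers would take longer.

```python
# Nominal Ethernet line rates in megabits per second, as given in the text.
ETHERNET_MBPS = {
    "Experimental Ethernet": 2.94,
    "10Mbps Ethernet": 10,
    "Fast Ethernet": 100,
    "Gigabit Ethernet": 1000,
}

def ideal_transfer_seconds(size_bytes, rate_mbps):
    """Best-case transfer time, ignoring protocol overhead and latency."""
    bits = size_bytes * 8
    return bits / (rate_mbps * 1_000_000)

payload = 1_000_000_000  # hypothetical 1GB data set (assumption)

times = {name: ideal_transfer_seconds(payload, rate)
         for name, rate in ETHERNET_MBPS.items()}

for name, seconds in times.items():
    print(f"{name:22s} {seconds:8.1f} s")
```

Under these idealized assumptions, the same payload that needs 800 seconds on 10Mbps Ethernet needs 80 seconds on Fast Ethernet and 8 seconds on GigE.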
Commodity networking, starting with Fast Ethernet, came about around the same time as commodity processors (PC CPUs). In a definite sense, they fed off each other. As networking got less expensive, it was cost effective to buy more PCs and add more capability, pushing PC prices down. As processors got less expensive, more systems were purchased, and each needed networking, which increased networking volumes and drove down networking costs.