Supercomputers "R" Us
Linking PCs for Unprecedented Power
In 1993, Hank Dietz and his graduate students at Purdue University decided to do some tinkering. They built a "cluster": they took four off-the-shelf personal computers, wired them together, and added their homemade network hardware to make a do-it-yourself supercomputer. By February 1994, the first generation of their special Linux PC cluster was born. It was cheap. It was fast. And they were first.
About half a year later, some other folks built their first cluster, named it 'Beowulf,' and went down in history as the creators of the Linux cluster.
The Purdue group had already built several clusters by that point, the largest of which was named "Spareparticus." So why didn't Spareparticus become synonymous with cheap, fast computing clusters?
"Maybe it was the name," Dietz says with a shrug. "It just doesn't sound serious enough."
In any case, he never had any intention of treating Spareparticus (shown at right) as a serious computer. What began as an experiment to test what Dietz calls a "sneaky way" of performing certain kinds of communication within a parallel supercomputer (work his research group had been doing since 1987) turned into something much more.
In 1993 his group had a breakthrough on how to implement his ideas in custom hardware [see Putting the "Custom" in Network Hardware at bottom], but they needed to build a mock-up to try it out. "So after a little bit of head-scratching, we realized that instead of building the machine from scratch the easiest way to build a prototype would be to take a bunch of PCs and tie them together with a very simple version of our custom stuff and use Ethernet to pass the messages between PCs," he says.
"Only after we actually built our first cluster we realized, 'Gee, this is working really well.'" And by "really well" Dietz means he was getting better performance from the shabby-looking cluster than he was getting from the million-dollar-plus supercomputer that sat next to it.
And since then Dietz, who came to the University of Kentucky in 1999, has had two goals: one, make it easier for anyone to build his own supercomputing cluster, and, two, create clusters capable of high-end processing for cutting-edge research.
Exploiting the Parallel
The concept of parallel processing can be boiled down to a need for speed. The best way to rev up a computer program is to divide it into multiple fragments that can run simultaneously, each on its own processor (that's the Pentium or Athlon chip, the brain of a PC). Intel's Pentium 4 and AMD's Athlon XP processors are based on the same idea. Inside those tiny chips of silicon, hundreds of things are happening simultaneously.
The same kind of parallelism is being used to take complex computer programs into the world of day-to-day manufacturing. "Suddenly, you don't have to spend tens of millions of dollars to buy a supercomputer to run the fancy applications the top national labs have been working on for the past five to 10 years," Dietz says. In fact, for a mere $40,000, you can build a cluster to solve mysteries like computational fluid dynamics (CFD), something engineers at Lexmark, the international company headquartered in Lexington, need to understand as they design the next generation of laser and inkjet printers.
"The complexity of those problems is not much different from the complexity of doing CFD modeling for the space shuttle. The budget is the big difference," Dietz says. Thomas Hauser, Ray LeBeau and George Huang, UK mechanical engineers who have worked on things like how to make printers quieter and how to make ink dry faster for Lexmark, have been using Dietz's clusters and are currently building a similar cluster dedicated to their research.
Dietz's work was initially funded by the Office of Naval Research. He has also won grants from NSF and NASA and has received donated equipment and funds from a number of corporate sponsors such as AMD, Intel, IBM, and Texas Instruments.
His focus can be summed up in three words: whole system design. And this focus is evident in the name he gave the UK research group: KAOS. This is not only a Kentuckyfied spelling of CHAOS (which stands for the three basic components of a cluster: compilers, hardware architectures, and operating systems), but also a reference to the old "Get Smart" TV series, in which KAOS was the organization fighting against CONTROL.
"This strange name gives a feel that, in some sense, this is a chaotic field. We're looking at how a lot of different things interact and that, by its own nature, is a complicated problem," he says. "A lot of people are building clusters, but we're just about the only ones who are looking at compilers, operating systems, and the hardwareeven building our own custom hardwareall as one problem."
For those of you who don't speak geek, a compiler is what translates programs written in source code (human-readable language) into object code (language the machine can understand). "You write code in PASCAL, Java, C, C++, FORTRAN, then you run that code through a compiler and it generates what actually runs on the machine," Dietz explains. Instead of focusing on writing the world's best compiler for one programming language, Dietz's team built tools so that anyone can build customized compilers. These tools, known as the Purdue Compiler Construction Tool Set or PCCTS, have been used all over the globe by people in industry and academia.
Research assistant Tim Mattox (left), grad student Galen Rasche and Hank Dietz with the tools of their trade: a rear-projection screen, the cluster supercomputer KLAT2, and a video wall made up of standard computer monitors.
Dietz's focus on hardware architectures is pretty straightforward. "Hardware is the stuff you can touch. What we're looking at is how the different components are put together to build a cluster."
The operating system of choice for most computer scientists is Linux, primarily because it's "open sourced," which means you can modify the source code. (With operating systems like Windows you don't have access to the source code; all you get is the object code, the stuff only the machine understands.)
"We use Linux because it's tuned for giving you peak performance with just a handful of things running on the machine at a time," Dietz says. "Linux was intended primarily for use on a personal computer, because if you're running a program, you want that to take priority over everything else. But the reason it's good for us is when you're running a parallel program across a whole bunch of machines you don't want any other crud getting in the way.
"Think of it like this: If you have 64 machines working together on a problem and one of them is off doing something else rather than its part of the job, everybody's going to end up waiting for that guy."
A Groundbreaking Cluster
KAOS's credo: while using the most economical parts possible, get the peak performance out of a computer application by "tuning" a cluster so that all of the parts work in perfect harmony. In April 2000, with the help of more than 30 UK students led by Tim Mattox (a research assistant who has been working with Dietz since the days of Spareparticus), Dietz built a cluster that is the poster-child for this concept.
Its name: KLAT2 (Kentucky Linux Athlon Testbed 2). It's an intimidating machine purely on the basis of numbers. KLAT2 is 64 PCs (plus two "hot spares," back-ups that can be swapped in if one of the 64 fails). Each of the PCs (a.k.a. nodes) has four network interface cards (NICs), the standard for Ethernet connectivity. KLAT2 has nine 32-way switches and nine different colors of wires.
"In order to have a cluster work well as a single machine, all of the nodes need to be able to communicate very efficiently. Part of that is having very low latency (a brief delay in communications), and part of it is having very high bandwidth (being able to send a lot of data through at a time)," Dietz says.
The ideal way to connect machines is to tie them directly to one another. "But when you have a large number of machines, the number of wires involved makes that infeasible almost immediately, so you have to start thinking about a network structure that has at least one switch between the machines," he says. "A switch is nothing more than a box that a bunch of machines can be wired to so that any machine plugged into that box can talk to any other machine, and they can all be talking simultaneously."
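To see how quickly direct wiring gets out of hand, here is a small sketch. It assumes one dedicated cable per pair of machines, so each machine also needs one network port for every other machine. The function names are illustrative.

```python
def direct_links(n_machines: int) -> int:
    """Cables needed to wire every pair of machines directly together."""
    return n_machines * (n_machines - 1) // 2


def ports_per_machine(n_machines: int) -> int:
    """Network ports each machine needs when it is cabled to every other one."""
    return n_machines - 1


for n in (4, 16, 64):
    print(n, "machines:", direct_links(n), "cables,",
          ports_per_machine(n), "ports per machine")
```

For 64 machines this works out to 2,016 cables and 63 ports in every PC, which is why a switch becomes necessary.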
The normal way, he says, to connect a cluster of PCs is by building a hierarchy of switches: every machine plugs into a switch, and the switches plug into another switch, with the result that a message has to go through three switches to get from one machine to another. But that's a sluggish solution, and Dietz and his team chose to think outside the box.
"We realized that the optimal network would have any pair of PCs able to communicate with only a single-switch latency and full bandwidth," he says. "But who said it had to be the same switch for all pairs of PCs? So you realize, Ah! It's cheap to put multiple network interface cards in each machine, and what we can do is plug each of those NICs into a different network switch in such a way that the wiring pattern guarantees that any pair of PCs has at least one switch in common." Dietz calls this concept a "Flat Neighborhood Network," because there is only one level of switches.
When KLAT2 was built, the latest-greatest word in network speed was gigabit technology (which can transfer data at a rate of one billion bits per second). But the cost was prohibitive.
A year and a half ago, NICs for gigabit hardware ranged from $320 to $1500 (today the cheapest ones are $50) and gigabit switches were very expensive. With four NICs per PC, you can see why gigabit wasn't the economical way to go. "Fast Ethernet (100 megabits per second) is really, really cheap," Dietz says. "NICs cost about 10 bucks apiece." And when it came to switches, cost was also the deciding factor: "Thirty-two-way fast Ethernet switches were cheap."
But one question remained: What was the best way to create a blueprint to show how to hook up this 264-wire beast?
Let a computer figure it out. So they did.
Natural Selection Creates a Cluster
KLAT2 was the first machine ever built with a network designed by computer, and the application that concocted the groundbreaking wiring scheme is known as a genetic search algorithm. These kinds of algorithms aren't new. They've been around for over a decade, Dietz says, used mostly to solve straightforward mathematical problems.
"Genetic search algorithms do the same kind of thing natural selection does," Dietz says. "We set up a 'population' of different network designs (interconnection patterns) and the program evaluates them based on this question: 'Can every machine talk to every other machine?' If they can't, that pattern is thrown out."
The search algorithm then takes the attributes of the good wiring patterns and mixes them up a little to see if the resulting designs are more efficient. "So now the question is, 'How do two wiring patterns have sex with each other?' (We call it crossover.) We've written into the algorithm a very simple way of taking subsets of the wiring patterns and exchanging them (like gene sequences in human DNA). It's a little devious, and I believe we're the first ones to do that with a genetic search algorithm."
By repeating this exchange over many "generations," Dietz's algorithm evolved a wiring pattern for KLAT2. But it took an incredibly powerful computer to do this. Dietz just happened to have several such computers: the other clusters in his lab.
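The general shape of such a search can be sketched in a few lines. This is an illustration under stated assumptions, not the KAOS group's program: the fitness score (pairs of PCs sharing a switch, minus a penalty for overloaded switches), the one-point crossover, the mutation rate, and every function name are choices made here for the example.

```python
import random
from itertools import combinations


def random_wiring(rng, n_pcs, n_switches, nics):
    """One candidate design: for each PC, the set of switches its NICs use."""
    return [frozenset(rng.sample(range(n_switches), nics)) for _ in range(n_pcs)]


def fitness(wiring, ports_per_switch):
    """Pairs of PCs sharing a switch, penalized for overloaded switches."""
    covered = sum(1 for a, b in combinations(wiring, 2) if a & b)
    load = {}
    for switches in wiring:
        for s in switches:
            load[s] = load.get(s, 0) + 1
    overflow = sum(max(0, n - ports_per_switch) for n in load.values())
    return covered - len(wiring) * overflow


def crossover(rng, mom, dad):
    """Exchange a subset of the wiring: PCs before the cut come from one parent."""
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]


def mutate(rng, wiring, n_switches, nics, rate=0.3):
    """Occasionally rewire one randomly chosen PC."""
    child = list(wiring)
    if rng.random() < rate:
        i = rng.randrange(len(child))
        child[i] = frozenset(rng.sample(range(n_switches), nics))
    return child


def evolve(n_pcs, n_switches, nics, ports, generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    population = [random_wiring(rng, n_pcs, n_switches, nics) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, then refill with offspring.
        population.sort(key=lambda w: fitness(w, ports), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            children.append(mutate(rng, crossover(rng, mom, dad), n_switches, nics))
        population = survivors + children
    return max(population, key=lambda w: fitness(w, ports))


best = evolve(n_pcs=10, n_switches=6, nics=3, ports=6)
print(fitness(best, 6), "out of a possible", 10 * 9 // 2)
```

Because the better half of each generation survives unchanged, the best score never gets worse from one generation to the next. A search at KLAT2's scale is a far larger problem than this toy run.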
KLAT2 relies on Fast Ethernet switches to pass messages among PCs.
"The computing power it took to run the genetic search algorithm was more than most people would have been able to invest in it, so, in some sense, having our older clusters was the enabling technology for building this one," he says.
At the time KLAT2 was built, it cost $41,000. Today it would cost about $25,000. "And at the time KLAT2 was created, it was the first general-purpose supercomputer to achieve better than one gigaflop [one billion floating-point operations per second, the basic measurement of computing performance] per $1,000 spent on the machine," Dietz says.
In fact, KLAT2's price-performance ratio was $650 per gigaflop. (Traditional supercomputers were closer to $10,000 per gigaflop, and most Beowulf clusters were around $3,000 per gigaflop.)
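The figures quoted here can be cross-checked with a little arithmetic. The sketch below uses only the numbers in this article. The implied total performance is derived from them and is not a figure Dietz states, so treat it as a rough estimate.

```python
KLAT2_COST = 41_000          # dollars, at the time it was built
KLAT2_PER_GFLOP = 650        # dollars per gigaflop
TRADITIONAL_PER_GFLOP = 10_000
BEOWULF_PER_GFLOP = 3_000


def gflops_per_thousand(dollars_per_gflop):
    """Convert dollars per gigaflop into gigaflops per $1,000 spent."""
    return 1000 / dollars_per_gflop


def implied_gflops(cost, dollars_per_gflop):
    """Total performance implied by a purchase price and a price-performance ratio."""
    return cost / dollars_per_gflop


print(round(gflops_per_thousand(KLAT2_PER_GFLOP), 2))         # about 1.54 gigaflops per $1,000
print(round(implied_gflops(KLAT2_COST, KLAT2_PER_GFLOP), 1))  # about 63.1 gigaflops (derived)
print(round(TRADITIONAL_PER_GFLOP / KLAT2_PER_GFLOP, 1))      # about 15.4 times cheaper per gigaflop
print(round(BEOWULF_PER_GFLOP / KLAT2_PER_GFLOP, 1))          # about 4.6 times cheaper per gigaflop
```

The first line confirms that $650 per gigaflop is the same claim as "better than one gigaflop per $1,000."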
The machine drew praise from the computing community. The technologies developed and demonstrated in KLAT2 were recognized by Computerworld magazine as one of the six most significant contributions of information technology to the advancement of science in 2001, and earned an honorable mention in the price-performance category of the 2000 Gordon Bell Awards. ("These are the closest thing to a Nobel Prize in supercomputing," Dietz says. "They give out zero to six awards each year, so it's a big deal.")
There are several clones of KLAT2 based on the Flat Neighborhood Network concept. Last summer Mattox flew to Keele University in England to help researchers set one up to do environmental geophysics modeling. Dietz and Mattox are currently working with Tim Dowling at the University of Louisville to set up a slightly smaller version (with 40 PCs) to do planetary weather modeling.
Dedicated lab equipment. That's what Dietz sees as the future of his supercomputing clusters. "We want to find a handful of very important applications that lots and lots of people would like to have available to them and, essentially, beat those applications to death in terms of making them as efficient as possible," he says. This would mean rewriting the code to take the best advantage of parallel processing, something that, over the years, he's developed a fairly large bag of tricks to help him do.
"We want to come up with a little cookbook-like design mechanism for building the clusters and essentially put that out as full public domain so anybody can take that set of design specs and either build it themselves or have somebody build it for them. The goal is that they'll not even realize it's a computerit'll just be a funny-looking piece of lab equipment."
The Right Tool for the Job
Dietz unveiled an innovative tool, "The Aggregate Cluster Design Rules," at the Supercomputing 2001 conference in Denver last November. This Web-based form allows users to input very specific information on the application they want to run and the desired components, including their cost, and it spits out the best possible network designs for a cluster.
It does an exhaustive search: when Dietz filled out the form as an example, it searched 57 million different cluster designs. Of those, a mere 3.5 million were viable solutions. It then ranked those 3.5 million according to efficiency. "You can win arguments with people: 'Why is this a good design?' 'Here's why.' It gives you specs on every aspect of the cluster and lets you do side-by-side comparisons of the designs."
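The enumerate, filter, and rank pattern behind such a tool can be sketched briefly. The article does not describe the real form's inputs or scoring, so everything here is an assumption: the catalog entries and their prices and speeds are invented placeholders, viability is reduced to a budget check, and designs are ranked by dollars per gigaflop.

```python
from itertools import product


def search_designs(node_options, nic_options, node_counts, budget):
    """Try every combination, keep those within budget, rank by cost per gigaflop.

    node_options: (name, cost in dollars, gigaflops per node)
    nic_options:  (name, cost per NIC in dollars, NICs per node)
    """
    viable = []
    for node, nic, count in product(node_options, nic_options, node_counts):
        node_name, node_cost, node_gflops = node
        nic_name, nic_cost, nics_per_node = nic
        cost = count * (node_cost + nics_per_node * nic_cost)
        if cost > budget:
            continue  # not a viable design under this budget
        viable.append({
            "node": node_name, "nic": nic_name, "count": count,
            "cost": cost, "gflops": count * node_gflops,
        })
    viable.sort(key=lambda d: d["cost"] / d["gflops"])
    return viable


# Hypothetical catalog; none of these prices or speeds come from the real tool.
nodes = [("budget-pc", 500, 1.0), ("fast-pc", 900, 2.0)]
nics = [("fast-ethernet", 10, 4), ("gigabit", 50, 1)]
for design in search_designs(nodes, nics, node_counts=[16, 32, 64], budget=40_000)[:3]:
    print(design)
```

Even this tiny catalog yields a dozen candidate designs. With realistic component lists the count multiplies quickly, which fits the millions of designs the real form examines.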
Built into this tool are many of the tricks Dietz has picked up in seven years of designing clusters, and, while he says it can't build the cluster for you, it offers a realistic starting point for someone in the know who wants to make their own cluster and who takes about 10 minutes to fill out the form.
"The standard way people have been told to build clusters is to go out and buy the best components they can afford, throw them together using the best networking hardware they can afford, and wah-la. They'll have a system that works perfectly and within a day they'll be running their code on it.
"It doesn't work that way." You don't need the most expensive stuff out there, but you do need some common sense, he says. "The design tool is nice because it makes it very clear exactly why the winning designs are the winners, and it forces people to be a little more logical about the whole idea that what they're building needs to be designed as a cluster supercomputer, not just thrown together and cross-your-fingers."
Dietz says a number of computer vendors who saw this tool at the conference can't wait to put a version on their own Web sites, customized with their components and current prices, and Dietz is more than happy to let them do it. Money isn't the issue for him. He's investing in the future of computing by designing cheap solutions to complex problems.
Show Me the Pictures
Something that started as a way to show just how fast his custom hardware can perform aggregate operations has become one of the big draws at research exhibitions: video walls.
Dietz's lab features a 16-PC video wall using standard computer monitors. The PCs were donated by AMD, the second-largest supplier of PC processors in the world. "This machine is actually a parallel supercomputer that does a lot of processing on data in addition to display processing. That allows you to do computationally demanding things like simultaneously pan and zoom," he says.
Dietz's laptop video wall displays an image of Tux, the Linux mascot. His custom network hardware (bottom) uses parallel printer ports to synchronize the nine laptops.
Dietz's video wall of nine laptops at last year's supercomputing conference drew much more attention than the nine video projectors that made a 7.5-by-10-foot rear-screen projection display. (Both were displaying a computer simulation of fluid flow around a turbine blade.) "Other people had rear-screen projection, but nobody was stupid enough to bring a cluster of laptops," he laughs.
"And the really scary part was our laptop cluster was the highest-resolution video wall at the show." Through some special technology known as "sub-pixel rendering," Dietz's laptop wall showed a much clearer, more detailed picture than anything else at the conference. The video wall library software created by Dietz's group and released as public domain isn't the most powerful video wall software available, he says, but it does make it easy for almost any application to use the video wall, without rewriting any code.
Putting the "Custom" in Network Hardware
It's smaller than a breadbox. In fact, it's about the size of an old-style toaster. But this humble-looking hardware is unbelievably efficient at allowing a cluster of unmodified PCs to do the kind of processing once possible only on pricey supercomputers.
Named PAPERS (Purdue's Adapter for Parallel Execution and Rapid Synchronization) and created in 1994, Dietz's design uses a custom board that connects to PCs via their parallel ports, the same port on your desktop computer where you normally hook up a printer.
Why printer ports? Because the waiting period is about as short as you can get.
"If you just want to be able to get a little bit of data out of the machine as fast as possible, the parallel printer port is the way to go, because you can get something out in about one microsecond," Dietz says. That's faster than any other standard PC hardwarethings like Ethernet interfaces take a while to get started sending data.
"Standard networking hardware is very good at sending a message from one processor to another, point-to-point. But if you want to ask questions like, 'Is everybody done yet?' or 'Who's done with their work so I can give them some more?'the kind of questions that come up a lot in parallel programsyou can't answer very efficiently using point-to-point messages.
"Those are what we call aggregate questionsthey require you to accumulate data from all of the processors." PAPERSwhich has already gone through 19 generations and which Dietz describes as an "aggregate function network"was specifically designed to do this.
"In a single operation, an aggregate function network collects data from everybody, reduces that data down to one piece of information (the answer to the question), and then returns that to everybody." He says if you tried to do this kind of thing with a traditional network, there'd be a whole flurry of communications that would slow the entire system down.
"Our network hardware is not commercially available from anybody at this time, which means we have to build the stuff ourselves," Dietz says. "But the good news is our designs are free, publicly available, and well documented on the Web."
Dietz's first acquaintance with the University of Kentucky was through Jim Lumpp, an associate professor in electrical and computer engineering. Lumpp was working on cloning Dietz's custom hardware for his own research about a year before Dietz came to UK.
Dietz says the real draw of UK was the fact that it sports both top-notch engineering and medical schools. "I got married two and a half years ago. My wife [Sabire Ozcan, a biochemistry researcher] was in a non-tenured faculty position at Washington University in St. Louis, I was tenured at Purdue in Indiana, and we were looking for a university where we could both work," he says. "There are very few universities that have good engineering and med schools right next to each other.
"On top of that, this is a university, and a state, that has a strong focus on building up computer engineering, and I liked that attitude." Dietz joined UK in July 1999 and a few months later was named the James F. Hardymon Chair in Networking. This chair is part of the Research Challenge Trust Fund program known as "Bucks for Brains," where the state matched, dollar-for-dollar, gifts by UK's major donors to endow chairs and professorships.