The ongoing evolution of scientific supercomputing

Ubiquity, Volume 2000 Issue June, June 1 - June 30, 2000 | BY John Gehl

UBIQUITY: Not too many years ago, most information technology professionals had backgrounds such as your own, which is in mechanical engineering and nuclear engineering. How do you perceive the changes that have happened since people formally trained in computer science have begun to play a greater role in shaping the profession?

SIDNEY KARIN: That's a very good question. People who were trained as I was trained thought that learning Fortran was all there was to computer science. Of course, we eventually learned that Fortran is a very tiny window into the world of computing, but we're still resolving the gap between the hard scientists and the computer scientists. One of the things we're doing at the San Diego Supercomputer Center and with NPACI (the National Partnership for Advanced Computational Infrastructure) is bringing together computer scientists and applications people to work on research problems that are simultaneously interesting to both kinds of people.

UBIQUITY: Can you declare the effort to be a success?

KARIN: Absolutely -- but the fact remains that there are still many applications people who don't appreciate the value that computer science brings to the table beyond presenting them with a language in which to program.

UBIQUITY: Are some disciplines better and some disciplines more recalcitrant?

KARIN: Not disciplines so much as individuals. Some individuals are better and some individuals are more recalcitrant, perhaps. Many people who do a lot of high-performance computing generally write their own codes, because they have to; for example, there aren't any standard community codes or commercial codes in the high-energy physics area. In contrast, most people in mechanical engineering who use computers are using a standard like Nastran and not writing their own codes. That's not to belittle standard code in any way, but the end-users of standard code are not the ones who in any real sense interact with computer science. The originators of the community codes are in a similar position to the high-energy physicists.

UBIQUITY: What happens to the people who don't originate community codes?

KARIN: They're forced to wait. When something new comes along like, for example, parallel computing -- which is more or less replacing vector supercomputing as we speak -- the early adopters are scientists like the high-energy physicists. Of course, it turns out that their problems happen to be parallel in nature, so maybe this is a slightly specious example. But the point is that most mechanical engineers have to wait until the authors of the community codes migrate that software onto the parallel systems.

UBIQUITY: So what will the future be like?

KARIN: I think what's going to happen in the long run is something that we're already involved with to a certain extent, which is that all of this will be hidden. In fact, I think that's not only the way it's going to happen; I think it's the desirable way for it to go. These days most people have little idea about how an internal combustion engine works in principle, much less are capable of identifying the components when they open the hoods of their cars. Even I, as a mechanical engineer and one who used to play with hotrods 30 or 40 years ago, open the hood and don't know what the pieces are. But I can drive my car just fine. I think the future use of high-performance computing is going to have a similar take on it.

UBIQUITY: In passing, you mentioned that vector computing is "going away as we speak." Is vector computing essentially dead?

KARIN: It's a long way from dead, but it's no longer the premier or predominant way in which high-performance scientific computing takes place. There's a transition going on. For example, at the San Diego Supercomputer Center, we operate a 14-processor Cray T90 (one of the biggest or fastest vector supercomputers anybody ever made), and we also have a more powerful IBM SP with more than one thousand nodes. The SP is more difficult to use because of the level of parallelism, and the efficiency of use -- that is, the fraction of the peak that's achievable in practice -- is much lower. But the peak is sufficiently higher so that on balance it's a more powerful machine for real problems -- that is, the real problems that can be effectively implemented on a very highly parallel system. So we're in this transition where people are learning how to implement algorithms on those machines with a very high degree of parallelism. It takes much more effort than it did even in the early days of vector machines. One of the differences that I've always noted is that any implementation on a scalar machine that worked was not very far from the optimum implementation.
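
[Editor's sketch: the trade-off Karin describes, a lower fraction of peak on a much higher peak, can be written as sustained performance = peak x efficiency. The Python below is only an illustration under assumed numbers. The function name and every figure in it are hypothetical and are not measurements of the Cray T90, the IBM SP, or any other real machine.]

```python
def sustained_gflops(peak_gflops: float, efficiency: float) -> float:
    """Return sustained performance: peak multiplied by the fraction of peak achieved."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return peak_gflops * efficiency


# Hypothetical figures chosen only to illustrate the argument.
# A vector machine: modest peak, high fraction of peak reached in practice.
vector_machine = sustained_gflops(peak_gflops=25.0, efficiency=0.50)

# A highly parallel machine: much higher peak, much lower fraction of peak.
parallel_machine = sustained_gflops(peak_gflops=1000.0, efficiency=0.10)

# The parallel machine still delivers more, despite its lower efficiency.
print(f"vector:   {vector_machine:.1f} GFLOPS sustained")
print(f"parallel: {parallel_machine:.1f} GFLOPS sustained")
```

[Under these assumed values the lower-efficiency machine still comes out ahead, which is the sense in which a machine can be "more powerful for real problems" while being harder to use.]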

UBIQUITY: Say some more about optimization.

KARIN: Well, people like myself worked very hard many years ago to optimize. When we got a factor of two, we thought that we'd really done something, and had probably just implemented it wrong the first time. But on the vector machines, we could get much more than a factor of two. And on parallel machines, we can get factors that go up as high as the number of processors. There's a lot more to be gained from the extra effort that it takes to deal with the extra complexity.
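
[Editor's sketch: Karin's remark that parallel speedups can "go up as high as the number of processors" describes the ideal case. One common way to reason about how close a code gets to that ceiling is Amdahl's law, which is not mentioned in the interview and is brought in here only as an illustration. The function name and the parallel fractions below are assumptions.]

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallelizable fraction of the work."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads on a 1024-processor machine.
processors = 1024
for fraction in (1.0, 0.999, 0.99, 0.5):
    speedup = amdahl_speedup(fraction, processors)
    print(f"parallel fraction {fraction:5.3f}: speedup {speedup:8.1f}x")
```

[Only a fully parallel workload reaches a factor equal to the processor count; any serial remainder caps the gain, which is one reason the extra implementation effort matters.]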

UBIQUITY: You may think this interview is taking a morbid turn, because the next question is: what else is dead? Is Fortran dead?

KARIN: No. I'm not sure it will ever die completely. In fact, it's perfectly appropriate for some things.

UBIQUITY: So its future is secure for reasons other than that there's a lot of legacy code lying around?

KARIN: Legacy is a very big driver but it's not the only driver. Legacy can be looked at in a number of ways, one of which is the code itself, but another is the number of people who understand the language and are still writing it. And, furthermore, the numbers of people who are still teaching and learning it. It's certainly not on the ascendancy at the moment, but I don't see it going away for a long time to come.

UBIQUITY: What is the language of choice now?

KARIN: The field is no longer so narrow that you could say there is a language of choice. I would say C is probably the most predominant language in scientific computing these days. There's also a lot of activity in C++ and in Fortran and in Java and so on. Although Java is not yet exciting anybody in high performance applications, it's clear that it's on the horizon.

UBIQUITY: Is there some reason Java is not exciting anybody in high performance applications?

KARIN: I'm not the expert here, so I don't want to comment on the specifics, but Java doesn't deliver the last ounce of performance that people generally look for when they've gone to the trouble of getting a high performance computer and want to get the most of it.

UBIQUITY: In the not-too-distant past some engineers and scientists liked to mock parallel computing by comparing it to the idea of using thousands of chickens to pull a wagon.

KARIN: There are a lot of jokes like that and they're not entirely off the mark. On the other hand, suppose you had a horse and a billion chickens and you had a wagon that weighed a million pounds. Well, the horse has no chance. Assembling all those chickens is hard -- but at least it has a chance. Chuck Leith, a physicist at NCAR and at Livermore, liked to say: "You don't understand. Parallel computing isn't any good. It's only necessary." And I think that's the essence of it. Nobody on the applications side wants a parallel computer. Frankly, there's nothing interesting from a physicist's standpoint about parallel computing. You just want to get the computing job done. It's a lot simpler to do it on a scalar machine with one processor. But we're running out of performance there by factors of a thousand and more.

UBIQUITY: What's the hot issue now in high-performance computing?

KARIN: The hot issue these days is clusters. Let me back into your question. Supercomputing, in some sense, started in 1976 when Seymour Cray delivered the first Cray-1 to Los Alamos. Now, you could argue that we had CDC-7600s before that and maybe some people were already using the term, but it became popular with the Cray-1, which was different in that the previous hottest performance machine had a maximum performance of 4 megaflops, whereas the Cray-1 came out and had a peak performance of 160 megaflops. That was a huge difference. We haven't seen anything since of that scale of change in things in a single leap. That put supercomputing in a class by itself. It turns out that not only was the Cray-1 the highest performance machine of its time by far, but in addition it was the highest price/performance machine. So if you had enough of a demand, even if you didn't need the peak performance, to justify a machine like that, there was a lot of interest in using it because of the price/performance for large scale computing. That's changed over the years and it's no longer the case that the peak performance supercomputers, whatever kind they are these days, are also simultaneously the best price/performance computers.

UBIQUITY: Where is the best price/performance to be found?

KARIN: The best price/performance these days seems to be in the commodity cluster arena. At the moment, that's the way to do what we call capacity computing rather than capability computing, which aims at the peak. There's this interesting disconnect between the people who say, "Why would you do anything but this because it's the most cost-effective to compute" and the people who say, "Why would I bother with that? I can't get the maximum performance out of it." Usually it's just a matter of people answering different questions when they have that kind of a disagreement. You ask a question appropriately and you get the right answer. If the question is "How do I get the most bang-for-my-buck?" then clusters are probably the answer these days. But you may just want to ask the question "How do I get the most bang?" -- if you're trying to do something like predicting climate change and so on, where you can't go back and say, "I don't have the right answer but I've got a cheap answer." It's not acceptable. Then you need to be looking at the highest peak performance machine. Right now, the commodity clusters are not the highest peak but they're definitely the highest price/performance. There's a fair amount of contention and argument about when and what you should get now, and how much money should you spend for what purpose and so on.
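
[Editor's sketch: the two questions Karin distinguishes, "most bang" versus "most bang for the buck", amount to ranking the same machines by two different keys. The Python below is an illustration only. The class, the system names, and all performance and price figures are hypothetical and do not describe any real product.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A hypothetical machine described by performance and purchase price."""
    name: str
    gflops: float
    price_dollars: float

    @property
    def gflops_per_million_dollars(self) -> float:
        """Price/performance: performance bought per million dollars."""
        return self.gflops / (self.price_dollars / 1_000_000)


# Made-up candidates for illustration.
systems = [
    System("commodity PC cluster", gflops=50.0, price_dollars=500_000),
    System("workstation cluster", gflops=120.0, price_dollars=3_000_000),
    System("single-vendor parallel system", gflops=400.0, price_dollars=30_000_000),
]

# Capability computing asks: which machine delivers the most performance?
most_bang = max(systems, key=lambda s: s.gflops)

# Capacity computing asks: which machine delivers the most performance per dollar?
most_bang_per_buck = max(systems, key=lambda s: s.gflops_per_million_dollars)

print("most bang:         ", most_bang.name)
print("most bang per buck:", most_bang_per_buck.name)
```

[With these assumed figures the two questions pick different machines, which mirrors the disagreement described above: both camps can be right about the question each is answering.]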

UBIQUITY: Can you give an example of a typical commodity cluster? What's in it?

KARIN: People often think of commodity clusters as these piles of PCs that have a relatively low-speed interconnect, like Ethernet, and a bunch of off-the-shelf boxes from Dell or someplace like that. Another thing that people think of as a commodity cluster is a bunch of workstations, let's say, high performance Suns or Alphas from Compaq, that are connected with a dedicated, high speed interconnect. There is a big price difference between those two configurations and also a big performance difference. At one end of the price spectrum you have a pile of commodity PCs linked by cheap, low-speed networks used in concert; next you have high-performance workstations linked by high-performance networks; further along the spectrum you get dedicated, very high-speed switches provided by a single vendor like, say, IBM, who is the vendor for the machine we have now.

UBIQUITY: Where does the scientific community stand on open source versus proprietary operating systems?

KARIN: The community -- I wouldn't say is in chaos -- but it's exploring lots of different considerations. There's a fairly strong open-source Linux community, which we're exploring a bit at SDSC. It has the potential to revolutionize the way things are done. But the community of people interested in Linux clusters at the department level -- 128 up to 256 boxes -- is a lot different than the much bigger community interested in thousands of processors linked together. There are different problems that have to be dealt with. It's an interesting and a rapidly changing landscape that we're looking at at the moment.

UBIQUITY: Some of the words that you've used -- like convergence and commodity and for that matter, bang-for-the-buck -- bring up the question of the commercialization and commoditization of supercomputing. What's happened?

KARIN: Those of us in the scientific computing community who are my age are used to the idea that vendors provided machines designed to our specifications. That's not quite precisely right, but the vendors were targeting -- as best they could -- scientific applications. And there were different vendors for different applications. Cray didn't sell a lot of machines to banks. IBM didn't sell a lot of machines to Los Alamos. They optimized their machines for different customer bases. These days, for a lot of reasons, high-end machines are essentially the same architecture as low-end and commercial machines.

UBIQUITY: What are some of those reasons?

KARIN: Gary Smaby, who's a popular analyst in this area, has pointed out that the traditional high-performance computing marketplace dropped below a billion dollars a year a few years ago. And yet, if you go to IBM and ask what is the size of the market for the IBM SP parallel systems, the company says its share is several billion dollars. The explanation, of course, is that to a good approximation, the thousands of 16, 32 and 64 processor systems that IBM ships as servers in the commercial space are identical to the machines shipped to Livermore and to San Diego Supercomputer Center and so on. The engineering is amortized over the entire marketplace. As a result, machines are available to the scientific community that would not have been available if we weren't using the same technology.

UBIQUITY: Is the scientific community better or worse off using commercially available systems?

KARIN: You have things like the Cray T3E, a very powerful machine. But it's not popular enough and you can't make small versions of it and so the engineering is amortized over a small base. It doesn't matter that I might be able to get a higher fraction of the peak performance on a machine like that when, for a much lower price, I can get more performance, though not as efficiently, on something based on the same kind of servers used by The Charles Schwab Company and AOL. So it is true that we're now using the same kinds of systems that the commercial world uses. It's also true that the scientific community has less influence over the design of our systems than in the past. I think it's also very true that we're better off because we have higher performance systems than we would have had if we continued to have an industry supporting only high-end, scientific computing. I'm not sure the whole community would agree with that statement, but I've been saying that for some time and I believe it strongly. There's little question in my mind that we're better off this way. We're too small a part of the overall computing enterprise to drive things successfully for only our purposes.

UBIQUITY: Speaking of influence, how would you rate the influence of the scientific supercomputing community with federal funding sources?

KARIN: You mean, this morning or this afternoon? I don't know. I think that scientific computing as well as computing overall is beginning to be appreciated as a huge, economic driver in the economy. That's not news -- you see those kinds of comments in the popular press. On the other hand, I just saw that the follow-on terascale system at NSF was deleted in the House committee projections for next year's budget. That one didn't please me so much. But there are many steps along the way to getting a budget.

UBIQUITY: In general, is there a fair amount of optimism within the community about the future?

KARIN: I think there is. If you go around and find out how many people are grumbling and annoyed and dissatisfied, well, it's almost everybody. But what if you look at where things are today and where they are going? High performance computing is on the same exponential as every other form of computing. We have far more powerful computers today than we did five years ago, ten years ago, and so on. I think that's largely a result of progress in computer technology, driven by the commercial marketplace as well as scientific advances. So everybody wants more, everybody wants an easier way to program, everybody wants more stable environments in which to program rather than changing architectures, and all of that. Those complaints are legitimate. If you want to list all of the complaints and sympathize with the people complaining, it's easy to do. But if you step back and look at what we can do today versus what we could do 20 years ago, it's incredible and it's continuing to explode.
