Tuesday, May 22, 2007

Petascale Systems Integration into Large Scale Facilities workshop

I didn't get to go to this. I really wanted to, but, unfortunately, work and life conspired against me. My wife has finals this week and she needed me home pronto to take care of our daughter, cook, etc., so she could study. Also, I was on rotation from last Tuesday until today. Bleh.

There was some discussion of the workshop after the fact, and there was a sense of disappointment with it because there was a lot of 'the problems we're going to face are the problems that we have been facing', at least as far as integration into existing sites goes. One of the anecdotes that was related back to me, though, was what was done for LLNL and their petascale system. They have a multi-hundred-million-dollar budget for it, and they are supposed to get 30 megawatts - yes, thirty fscking megawatts - as an upgrade to their building for HPC systems. The whole East SF Bay, so goes that same source, only uses 90 megawatts now. Now, as an exercise, start scaling up for that 500 petaflop system that the climatologist quoted in that presentation I mention now and again. Um, yeah, can you say very dedicated power source?
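For the curious, here is a back-of-the-envelope version of that exercise in Python. It is only a sketch under loud assumptions: I'm treating "petascale" as exactly 1 petaflop, pretending the whole 30 megawatt building upgrade feeds that one system, and scaling power linearly with flops (real hardware gets more efficient over time, so this is a worst case, not a forecast). The function name is made up for illustration.

```python
# Figures from the anecdote above.
BUILDING_UPGRADE_MW = 30.0    # power upgrade quoted for the LLNL building
EAST_BAY_USAGE_MW = 90.0      # East SF Bay usage, per the same source

# Assumptions (mine, not from the workshop).
ASSUMED_PETAFLOPS_NOW = 1.0   # "petascale" taken as exactly 1 petaflop
TARGET_PETAFLOPS = 500.0      # the climatologist's wish-list system


def scaled_power_mw(target_pf, base_pf=ASSUMED_PETAFLOPS_NOW,
                    base_mw=BUILDING_UPGRADE_MW):
    """Naive linear scaling: power grows in proportion to flops."""
    return base_mw * (target_pf / base_pf)


power_mw = scaled_power_mw(TARGET_PETAFLOPS)
east_bays = power_mw / EAST_BAY_USAGE_MW

print(f"Naive power estimate: {power_mw:,.0f} MW")
print(f"That is about {east_bays:.0f} East Bays' worth of power")
```

Under those assumptions the answer comes out in the gigawatts, which is the point: without big efficiency gains, the power bill alone rules out simply scaling up what we build today.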

Another anecdote that was related, and I think you, Horst, are the source (if you're reading), concerned the vast numbers of processors that petascale systems are going to have. MPP doesn't even begin to describe it! We're talking 100s of 1000s here. Programming them with the current models would be like programming the old 68k line of processors from Motorola by telling each and every transistor, by hand, when it should expect a one or a zero. Possible, marginally, but insanely inefficient for effective and timely coding. The current models for coding parallel programs will have to be dumped. According to my source, there were few suggestions on how to go about it, even though systems on that scale are rapidly approaching. (Sorry if I mangled your analogy, Horst; feel free to correct.)

The thought of managing such a system at 99.99% reliability with 100s of 1000s of processors gives me a headache. With 100,000 nodes, that works out to about 10 nodes down and requiring a sysadmin to work on them at any given time.
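The arithmetic behind that headache fits in a few lines of Python. This is a minimal sketch, assuming the 99.99% figure applies to each node independently and that "down" just means unavailable at a given instant; the function name and the node counts beyond 100,000 are my own picks for illustration.

```python
def expected_down(units: int, availability: float) -> float:
    """Expected number of units unavailable at any instant,
    assuming each unit is independently up with the given probability."""
    return units * (1.0 - availability)


# "100s of 1000s" of nodes, each assumed to be 99.99% available.
for units in (100_000, 300_000, 500_000):
    down = expected_down(units, 0.9999)
    print(f"{units:>7,} units at 99.99%: about {down:.0f} down at any time")
```

The number grows linearly with machine size, so the sysadmin load does too unless the per-node reliability improves along with the node count.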

Anyone who intends to talk to me about the Pending Singularity had better be ready to be giggled at, as far as I am concerned.
