Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

June 2008 - Posts

  • Diminishing returns from virtualization will affect larger core count server sales

    I just got back from an HP show where I had some interesting conversations about a crisis that doesn't seem to have happened yet.

    The key to the Sherlock Holmes story "Silver Blaze" was this: why didn't the dog bark?

    Why hasn't the so-called multicore crisis been seen as a crisis? Why isn't there any angst over the lack of parallelized applications and engines? Why is IT still buying multicore servers when they aren't doing development that can use that architecture?

    One answer has to do with virtualization. The hypothesis is that the initial transition to multicore has been welcomed with open arms as IT managers sought a way to reduce power and cooling requirements, and multicore servers combined with virtualization software made that easy, inexpensive, and readily available. "Yes boss, we had a problem with heat and power, but we have taken care of it." Multicore crisis averted, because these boxes helped solve a real problem for IT. A different problem, but still a real one.

    So the sales of 4 and 8 core servers have been robust and the hardware guys are happy, the software guys are happy, and IT is happy - what could possibly go wrong?

    The problem is that these sales will not continue at larger core counts. It doesn't take an MBA to see the diminishing returns: there is no reason to buy a 32-core server when four 8-core servers will do the job. Those four are less expensive than a single 32-core machine, they put our eggs in four baskets instead of one, and we are already following that pattern. So the crisis that didn't happen when single-processor servers became multicore has only been delayed until now, when there is no obvious reason for IT to buy the larger core count boxes.

    For companies like HP, AMD, and Intel, this bump in the road is coming later than predicted, but it is coming -- virtualization did not eliminate it. The smart people in these companies are beginning to ask whether the forestalled crisis is arriving, and how they can help IT keep gaining business value from investment in larger core count machines.

    What are you going to do with a 32-core machine? Break it up into 32 app servers? Or take advantage of the power of concurrent programming to enable high performance commercial computing (HPCC)?

  • The zettaflop, the yottaflop and the xeraflop

    At a recent industry conference, it was shown that the biggest growth area in computing is HPC, or High Performance Computing. This was surprising, even to those in this field, as it has historically been a fairly small, insular, academic area populated by geeky professors and hard-working grad students, all using Fortran.

    But a new phrase has become common: HPCC, or High Performance Commercial Computing. It is an attempt to distinguish the business world from the academic world while still acknowledging their shared interests. But how closely aligned are these two spaces?

    From our perspective, not so much. Academics use clusters running Fortran. A speaker at a Supercomputing conference I attended pounded the lectern and declared, "Fortran is good enough, dammit!" Academics rely on grad students, who are very smart, essentially immobile, and virtually slave labor. And while performance matters to everyone, academics don't really care how long the overall project takes; you get 7 years to write your dissertation.

    In the commercial space, Java is the most used application language, people are diverse in their abilities, and they cost a lot. Worse yet, if they don't like their situation, they quit and move. And projects have far shorter lifespans -- if the ROI doesn't meet the hurdle rate, the project is killed.

    These thoughts were triggered by a New York Times article, "Military Supercomputer Sets Record":

    "Solving that programming problem is important because in just a few years personal computers will have microprocessor chips with dozens or even hundreds of processor cores. The industry is now hunting for new techniques for making use of the new computing power. Some experts, however, are skeptical that the most powerful supercomputers will provide useful examples."

    Maybe that skepticism is justified. What do you think?

  • Explaining our secret sauce - Part Two

    The first time I blogged on this, in "Explaining our secret sauce", I referenced the Wikipedia entry for "dataflow programming".

    I have subsequently learned that it would be more accurate to point to Kahn Process Networks as the specific type of dataflow that was the genesis for our Pervasive DataRush library and engine.

    "KPN is a common model for describing signal processing systems where infinite streams of data are incrementally transformed by processes executing in sequence or parallel. Despite parallel processes, multitasking or parallelism are not required for executing this model.

    In a KPN, processes communicate via unbounded FIFO channels. Processes read and write atomic data elements or tokens from and to channels. Writing to a channel is non-blocking, i.e. it always succeeds and does not stall the process, while reading from a channel is blocking, i.e. a process that reads from an empty channel will stall and can only continue when the channel contains sufficient data items (tokens). Processes are not allowed to test an input channel for existence of tokens without consuming them. Given a specific input (token) history for a process, the process must be deterministic so that it always produces the same outputs (tokens). Timing or execution order of processes must not affect the result and therefore testing input channels for tokens is forbidden."
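    To make those channel rules concrete, here is a minimal Java sketch of a three-process Kahn-style network (source, square, sink). It is an illustration only, not DataRush code; the names `KpnSketch` and `runNetwork` are hypothetical. It assumes a `LinkedBlockingQueue` created without a capacity bound can stand in for an unbounded FIFO channel: `put` never stalls on it, while `take` blocks on an empty queue. Each process only ever calls `take`, never peeks, as the model requires. The `END` sentinel is an added assumption so the demo terminates; KPN streams themselves may be infinite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class KpnSketch {
    // Hypothetical end-of-stream token (an assumption; KPN streams may be infinite).
    private static final int END = Integer.MIN_VALUE;

    // Runs source -> square -> sink as three threads joined by FIFO channels.
    public static List<Integer> runNetwork(int n) throws InterruptedException {
        // Unbounded channels: put() never blocks; take() blocks while empty.
        BlockingQueue<Integer> a = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> b = new LinkedBlockingQueue<>();
        List<Integer> results = new ArrayList<>(); // touched only by the sink

        Thread source = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) a.put(i); // non-blocking writes
                a.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread square = new Thread(() -> {
            try {
                while (true) {
                    int token = a.take();              // blocking read, no peeking
                    if (token == END) { b.put(END); break; }
                    b.put(token * token);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread sink = new Thread(() -> {
            try {
                while (true) {
                    int token = b.take();
                    if (token == END) break;
                    results.add(token);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start(); square.start(); sink.start();
        source.join(); square.join(); sink.join(); // join makes sink's writes visible
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runNetwork(5)); // prints [1, 4, 9, 16, 25]
    }
}
```

    However the three threads happen to be scheduled, the output is the same, which is the determinism property the quoted definition describes.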

  • Multi-threaded development joins Gates as yesterday's man

    "...the validity of multi threading is under attack. Veteran programmer Knuth said in a recent interview that multi threading may not be up to the task and could fail. As such, he is "unhappy" with the current trend towards multi-core architectures."

    "To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multi threading idea turns out to be a flop...," Knuth said.

  • Analysis -- the "Plastics" of today

    Before the current crop of graduates was born, there was the movie "The Graduate". In one famous scene, our hero walks around a party in his honor and is given some sage career advice: "One word. Plastics".

    The newest member of the DataRush team, who is still a student, found the following quote and thought it was interesting and very DataRush-related. Hal Varian, Google’s chief economist (and intellectual superhero), says in a Freakonomics blog post:

    “If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on.”

  • The X=X+1 Issue

    Here is a very clear explanation of why writing parallel apps is a challenge with procedural languages, as opposed to declarative approaches. Although Java is procedural, our Java implementation of dataflow avoids this pitfall.

    "The ability to assign a memory location with variable data is the cornerstone of computer programming. It turns out the ability to re-assign that same memory location is perhaps one of the biggest consternations in the parallel programming world. At the time, multiple assignment seemed like (and was) a good idea. Coupled with a looping control structure, massive calculations could be performed. For instance, one could write a program to sum the numbers from 1 to 100 (and beyond with simple variable change):"

    LET SUM = 0
    FOR I = 1 TO 100
      LET SUM = SUM + I
    NEXT I

    "While incredibly useful, multiple assignment creates “state” within the program because the value of variables can change as the program runs. The typical programming language allows you to manage the state of program variables, i.e., the memory location that holds the variable SUM changes over time. As any programmer will tell you, program state is easily managed on a single von Neumann CPU through the plethora of programming languages. (Some may argue this point, however.) On a large number of CPUs, managing coupled program states becomes extremely difficult. In a parallel environment, it is up to the programmer to manage the state of I and SUM. In a sequential single CPU environment, the programmer can assume I = 100 after the loop completes. In a parallel environment, where the loop may have been broken into parts, I may equal 10 on one CPU, and 20 on another. Of course, the example is simple and obvious. Creating large complex parallel applications does not present such luxuries, however."

    "The simple powerful idea of multiple assignment has come back to bite us in the asymptote."
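    One common way around the shared-SUM problem the quote describes is to give each worker its own private accumulator and combine the partial results once, on a single thread, so no memory location is re-assigned by more than one thread. The Java sketch below illustrates that idea; it is not the DataRush implementation. `PartialSums` and `parallelSum` are hypothetical names, and splitting the range into equal contiguous chunks is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class PartialSums {
    // Sums 1..n with `workers` threads; each task owns its own local sum.
    public static long parallelSum(int n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk + 1;
                final int hi = Math.min(n, (w + 1) * chunk); // empty range if lo > hi
                parts.add(pool.submit(() -> {
                    long local = 0;                  // private state, never shared
                    for (int i = lo; i <= hi; i++) local += i;
                    return local;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get(); // combine on one thread
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(100, 4));                           // prints 5050
        System.out.println(IntStream.rangeClosed(1, 100).parallel().sum()); // prints 5050
    }
}
```

    The last line shows the same split-then-combine reduction using the standard `IntStream` API, which hides the per-chunk state from the programmer entirely.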

  • Multicore is most disruptive

    Gartner has published a list of the top 10 most disruptive technologies of the next 5 years, and multicore is number one.

    You can see the presentation here:

    Top-10 Disruptive Technologies for 2008 to 2012 (speakers: Carl Claunch & David Cearley)


    We would probably include cheap storage as the second item on the list (Fry's has 1-terabyte drives for $169 this week), but the combination of cheap storage and multicore power is surely a disruptive pairing. Why is more power disruptive? Because that additional capability is being provided in a new form, and this changes all the steps that come before it in the IT workflow. The old ways of utilizing this resource don't work, and the switch from a procedural code path to a dataflow architecture is certainly disruptive to all the programmers, analysts, coders, and architects who have to build applications that run on these new platforms.

    Is your team ready?
