Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.
  • Diminishing returns from virtualization will affect larger core count server sales

    I just got back from an HP show where I had some interesting conversations about the crisis that doesn't seem to have happened yet.

    The key to the Sherlock Holmes mystery "Silver Blaze" was this: why didn't the dog bark?

    Why hasn't the so-called multicore crisis been seen as a crisis? Why isn't there any angst over lack of parallelized applications and engines? Why is IT still buying multicore servers when they aren't doing development that can utilize that architecture?

    One answer has to do with virtualization. The hypothesis is that the initial transition to multicore has been welcomed with open arms as IT managers sought a way to reduce power and cooling requirements, and multicore servers combined with virtualization software made that easy, inexpensive, and readily available. "Yes boss, we had a problem with heat and power, but we have taken care of it." Multicore crisis averted, because these boxes helped solve a real problem for IT. A different problem, but still a real one.

    So the sales of 4- and 8-core servers have been robust and the hardware guys are happy, the software guys are happy, and IT is happy. What could possibly go wrong?

    What could go wrong is that these sales will not continue at larger core counts. It doesn't take an MBA to understand that there are diminishing returns: there is no reason to go to a 32-core server when four 8-core servers will do the job. Those four are less expensive than the single 32-core machine, they put our eggs in four baskets instead of one, and they follow a pattern we already use. So the crisis that didn't happen when single-processor servers became multicore has only been delayed until now, when there is no obvious reason for IT to buy the larger core count boxes.

    For companies like HP, AMD, and Intel, this bump in the road is coming later than predicted, but it is coming -- virtualization did not eliminate it. The smart people in these companies are beginning to ask if the forestalled crisis is arriving, and how they can help IT continue to gain business value from continued investment in larger core count machines.

    What are you going to do with a 32-core machine? Break it up into 32 app servers? Or take advantage of the power of concurrent programming to enable high performance commercial computing (HPCC)?

  • the zettaflop, the yottaflop and the xeraflop

    At a recent industry conference, it was shown that the biggest growth area in computing is HPC, or High Performance Computing. This was surprising, even to those in this field, as it has historically been a fairly small, insular, academic area populated by geeky professors and hard-working grad students, all using Fortran.

    But a new phrase has become common, HPCC, or High Performance Commercial Computing. This is an attempt to make a distinction between the academic world and the business world, while still acknowledging their joint interests. But how closely aligned are these two spaces?

    From our perspective, not so much. Academics use clusters, running Fortran. A speaker at a Supercomputing conference I attended pounded the lectern and declared, "Fortran is good enough, dammit!" Academics use grad students, who are very smart, are essentially immobile, and are virtually slave labor, and while performance matters to all, academics don't really care how long the overall project takes, as you get 7 years to write your dissertation.

    In the commercial space, Java is the most used application language, people are diverse in their abilities, and they cost a lot. Worse yet, if they don't like their situation, they quit and move. And projects have far shorter lifespans -- if the ROI doesn't meet the hurdle rate, the project is killed.

    These thoughts were triggered by a New York Times article, "Military Supercomputer Sets Record":

    "Solving that programming problem is important because in just a few years personal computers will have microprocessor chips with dozens or even hundreds of processor cores. The industry is now hunting for new techniques for making use of the new computing power. Some experts, however, are skeptical that the most powerful supercomputers will provide useful examples."

    Maybe that skepticism is justified. What do you think?

  • Explaining our secret sauce - Part Two

    The first time I blogged on this, Explaining our secret sauce, I referenced the Wikipedia entry for "dataflow programming".

    I have subsequently learned that it would be more accurate to point to Kahn Process Networks as the specific type of dataflow that was the genesis for our Pervasive DataRush library and engine.

    "KPN is a common model for describing signal processing systems where infinite streams of data are incrementally transformed by processes executing in sequence or parallel. Despite parallel processes, multitasking or parallelism are not required for executing this model.

    In a KPN, processes communicate via unbounded FIFO channels. Processes read and write atomic data elements or tokens from and to channels. Writing to a channel is non-blocking, i.e. it always succeeds and does not stall the process, while reading from a channel is blocking, i.e. a process that reads from an empty channel will stall and can only continue when the channel contains sufficient data items (tokens). Processes are not allowed to test an input channel for existence of tokens without consuming them. Given a specific input (token) history for a process, the process must be deterministic so that it always produces the same outputs (tokens). Timing or execution order of processes must not affect the result and therefore testing input channels for tokens is forbidden."
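    The rules quoted above can be illustrated with a minimal Java sketch. This is not DataRush code; the class name `KahnSketch`, the three-process pipeline, and the `END` sentinel are all assumptions made for the example (a true KPN stream is unbounded and needs no end marker). Each process only does blocking reads with `take()` and writes with `add()` to a queue that is effectively unbounded, and no process peeks at a channel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class KahnSketch {
    // Hypothetical end-of-stream token so the example terminates.
    private static final int END = Integer.MIN_VALUE;

    // Runs source -> square -> sink and returns the tokens seen by the sink.
    static List<Integer> run(int count) {
        // A LinkedBlockingQueue created without a capacity is effectively
        // unbounded, so add() does not stall; take() blocks while empty.
        BlockingQueue<Integer> a = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> b = new LinkedBlockingQueue<>();
        List<Integer> out = new ArrayList<>();

        Thread source = new Thread(() -> {
            for (int i = 1; i <= count; i++) a.add(i);
            a.add(END);
        });
        Thread square = new Thread(() -> {
            try {
                // Blocking read, transform, write: no polling of the channel.
                for (int t = a.take(); t != END; t = a.take()) b.add(t * t);
                b.add(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread sink = new Thread(() -> {
            try {
                for (int t = b.take(); t != END; t = b.take()) out.add(t);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Start order is deliberately "backwards": the result must not depend on it.
        sink.start();
        square.start();
        source.start();
        try {
            source.join();
            square.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```

    Because each process depends only on the history of tokens it has read, the sink sees the same sequence regardless of how the threads are scheduled.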

  • Multi-threaded development joins Gates as yesterday's man

    "...the validity of multi threading is under attack. Veteran programmer Knuth said in a recent interview that multi threading may not be up to the task and could fail. As such, he is "unhappy" with the current trend towards multi-core architectures."

    “To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multi threading idea turns out to be a flop...” Knuth said.

  • Analysis -- the "Plastics" of today

    Before our current crop of graduates were born, there was the movie "The Graduate". One famous scene has our hero walking around at a party in his honor. He is given some sage career advice: "One word. Plastics".

    The newest member of the DataRush team, who is still a student, found the following quote and thought it was interesting and very DataRush related. Hal Varian, Google’s chief economist (and intellectual superhero), says on a Freakonomics blog post:

    “If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on.”

  • The X=X+1 Issue

    A very clear explanation of why writing parallel apps is a challenge with procedural languages, versus declarative approaches. Although Java is procedural, our Java implementation of dataflow avoids this pitfall.

    "The ability to assign a memory location with variable data is the cornerstone of computer programming. It turns out the ability to re-assign that same memory location is perhaps one of the biggest consternations in the parallel programming world. At the time, multiple assignment seemed like (and was) a good idea. Coupled with a looping control structure, massive calculations could be performed. For instance, one could write a program to sum the numbers from 1 to 100 (and beyond with simple variable change):"

    LET SUM = 0
    FOR I = 1 to 100
    LET SUM = SUM + I
    NEXT

    "While incredibly useful, multiple assignment creates “state” within the program because the value of variables can change as program runs. The typical programming language allows you to manage the state of program variables. i.e. the memory location that holds the variable SUM changes over time. As any programmer will tell you, program state is easily managed on a single Von Neumann CPU through the plethora of programming languages. (Some may argue this point, however.) On a large number of CPUs, managing coupled program states becomes extremely difficult. In a parallel environment, it is up to the programmer to manage the state of I and SUM. In a sequential single CPU environment, the programmer can assume I = 100 after the loop completes. In a parallel environment, where the loop may have been broken into parts, I may equal 10 on one CPU, and 20 on another. Of course, the example is simple and obvious. Creating large complex parallel applications does not present such luxuries, however."

    "The simple powerful idea of multiple assignment has come back to bite us in the asymptote."
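    One common way around the shared SUM problem described in the quote is to give every worker its own range and its own private accumulator, and to combine the partial results only at the end. The Java sketch below illustrates that idea; it is not how DataRush is implemented, and the class name `PartitionedSum` and its `sum(n, workers)` method are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PartitionedSum {
    // Sums 1..n by splitting the range into one chunk per worker.
    static long sum(int n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int lo = w * chunk + 1;
                final int hi = Math.min(n, lo + chunk - 1);
                Callable<Long> task = () -> {
                    long local = 0; // private to this task, never shared
                    for (int i = lo; i <= hi; i++) local += i;
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            // The only combining step: add up the finished partial sums.
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(100, 4));
    }
}
```

    No task ever reads or writes another task's `local` or loop counter, so there is no coupled state to manage; the price is that the programmer has to do the partitioning by hand.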

  • Multicore is most disruptive

    Gartner has published a list of the top 10 most disruptive technologies of the next 5 years, and multicore is number one.

    You can see the presentation here:

    Top-10 Disruptive Technologies for 2008 to 2012 Speaker: Carl Claunch & David Cearley

    We would probably include cheap storage as the second item on the list (Fry's has 1-terabyte drives for $169 this week), and the combination of cheap storage and multicore power is surely a disruptive pairing. Why is more power disruptive? Because that additional capability is being provided in a new form, and this changes all the steps that come before it in the IT workflow. The old ways of utilizing this resource don't work, and the switch from a procedural code path to a dataflow architecture is certainly disruptive to all the programmers, analysts, coders, and architects who have to construct applications that run on these new platforms.

    Is your team ready?

  • Explaining our secret sauce

    We wish we had some competition.

    If there were some other vendor implementing dataflow, then we would not have the sole responsibility of teaching people about this approach. While there are certainly a large number of people who are familiar with the concept, most of the people we meet when presenting DataRush are not.

    When we explain and illustrate the phenomenal results (as in 3 hours to 22 seconds), often the first question is "How can this be?" My response has been to point them to the Wikipedia entry for dataflow programming.

    "Dataflow languages contrast with the majority of programming languages, which use the imperative programming model. In imperative programming the program is modeled as a series of operations, the data being effectively invisible. This distinction may seem minor, but the paradigm shift is fairly dramatic, and allows dataflow languages to be spread out across multicore, multiprocessor systems for free."

    Check it out as a good place to start!

  • The tubas are blaring and the drums are pounding.

    I just returned from our first presentation of DataRush to potential partners, SIs, and developers. While we have exhibited at a bunch of shows like JavaOne, Innotech, and the HP Global Showcase, and while we have briefed a bunch of smart press people and analysts, this was our first time with this type of audience.

    Overall, we were very pleased at the interest and response to our story. The IT environment is looking for practical solutions to utilizing the multicore power that is flooding into the market, and these people see the tremendous opportunity inherent in platform transitions.

    Today's Fry's ad shows a Gateway quad-core for $560.

    Charles de Gaulle said that to be successful, one needs to "find a parade and get in front of it", and this parade is already marching down the street.

  • Low-Latency Technology Outpacing Programmers’ Capabilities

    "Again and again, executives said that finding enough programmers who are able to write "parallel" code -- programs that efficiently divide workloads across distributed processors -- is almost impossible. As Wall Street firms rely on multicore processing and even distributed computing to handle the ever-growing number of trade-related messages that are sensitive to any increase in data latency, the divergence between the capabilities of the technology and the capabilities of the programmers is becoming painfully evident."

  • The Lawnmower Law

    This article is a simple illustrative introduction to Amdahl's Law.

    "Indeed, just as with parallel processors, there is a point of diminishing return. Adding the first 10 riding mowers reduced the time by 36 minutes. Adding another 30 only saved me 4 minutes. Adding 100 mowers makes little sense since I’ll never get below 20 minutes. (Although I would love to see such a lawn mowing demolition derby — in my neighbor's yard of course.)"
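    The shape of that curve is easy to reproduce. The Java sketch below models elapsed time as a fixed serial portion plus a parallel portion divided among the workers, which is the core of Amdahl's Law. The 20-minute serial portion echoes the floor mentioned in the quote; the 40-minute parallel portion is an assumption picked for illustration and is not taken from the article, so the output will not match the article's figures exactly. The class and method names are hypothetical.

```java
final class AmdahlSketch {
    // Elapsed time when only the parallel portion is shared among the workers.
    static double elapsed(double serialMinutes, double parallelMinutes, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        return serialMinutes + parallelMinutes / workers;
    }

    // Speedup relative to a single worker.
    static double speedup(double serialMinutes, double parallelMinutes, int workers) {
        return elapsed(serialMinutes, parallelMinutes, 1)
                / elapsed(serialMinutes, parallelMinutes, workers);
    }

    public static void main(String[] args) {
        double serial = 20.0;   // assumed: work that cannot be split
        double parallel = 40.0; // assumed: work that splits evenly
        for (int n : new int[] {1, 10, 40, 100}) {
            System.out.printf("%3d mowers: %.1f min, speedup %.2fx%n",
                    n, elapsed(serial, parallel, n), speedup(serial, parallel, n));
        }
    }
}
```

    With these assumed numbers, ten mowers cut the job from 60 minutes to 24, while going from 10 to 40 mowers saves only 3 more, and no number of mowers gets below the 20-minute serial portion.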

  • Intel Itanium to go quad-core in early 2009

    I didn't know there was an Itanium Solutions Alliance, but this article includes the road map for Intel's continued commitment to Itanium, notably the move to a single-die quad-core.

    I noticed a display for the Itanium JVM in the Intel booth at JavaOne, and it is also mentioned in the article, but it doesn't specify the OS. We are looking forward to testing on it as soon as we can.

    As you can see here, we achieved great results in our scalability tests on the 32-core HP Integrity server running the HP-UX JVM, developed by HP.

    These unit growth numbers are very impressive as well: "...On another front, the Itanium Solutions Alliance announced that worldwide annual Itanium-based factory system revenue and system volume continued to grow in 2007, with a year-over-year increase of 30.8 and 36.3 percent, respectively. The Asia-Pacific region led the way, with year-over-year growth in factory system revenue and system volume of 61 percent and 45 percent, respectively."

  • Back from JavaOne

    We all made it back without anyone getting the norovirus, and it was a really great show for us. We were excited to be invited to show DataRush in the AMD booth and to participate in their keynote talk. It is clear that lots of developers, architects, and project managers are feeling the pain of parallel programming, and are realizing that this challenge is real, is not going to go away, and that they have to find alternative approaches.

    I was too busy to walk the floor, but it was fun for me to talk to the Tommy II autonomous vehicle team and hear firsthand about the DARPA challenge. What caught your eye?

  • Multicore Parallel Programming: Can We Please do it Right This Time? – IEEE Electronic Design Processes Workshop 2008

    Dr. Tim Mattson of Intel is someone to whom we listen closely. He has a better sense of the current situation than nearly anyone else we have spoken with, and he understands and appreciates our efforts with DataRush. He recently spoke at the IEEE DATC Electronic Design Processes Workshop, and this article highlights some of his points.

    Mattson demolished another favorite idol of computer science: the parallel programming language. It’s a well-known trait of computer scientists that they will try to solve every problem with a new programming language. The parallel programming problem is no exception. Mattson displayed an eye chart listing the names of nearly 250 parallel programming languages developed just during the 1990s. “This is silly,” said Mattson, “If creating a new language was the solution, the problem would already be solved. This is not the path to a solution.”

  • Programming Language popularity

    Although it happens less often now, we are still occasionally asked about our choice of Java as the language we used to create DataRush. There were many reasons, all valid, but my initial response when I first heard the question was, "Why not Java?" I had not understood how much of the market's perception is still shaped by impressions from a decade ago -- Java is slow, is big, is constraining. But those who are actively writing applications today are clear on the reality of Java today. For example, we were recently told that in one of our most interesting arenas, financial services, over 50% of new applications are being developed in Java. But the topic still comes up, and I was pleased to be pointed to this link showing the top 10 languages over time.

    The TIOBE Programming Community index gives an indication of the popularity of programming languages. The index is updated once a month. The ratings are based on the number of skilled engineers world-wide, courses and third party vendors. The popular search engines Google, MSN, Yahoo!, and YouTube are used to calculate the ratings.

© 2008 Pervasive Software Inc. All Rights Reserved.