Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

Of teraflops and terabytes

Before the holidays, I attended SC '09, this year's supercomputing conference.  While supercomputing has traditionally been the domain of academia, the needs of business and scientific computing are converging.  The datasets and processing needs of companies are growing such that the sizes of the problems addressed in both worlds are approaching the same.  Hardware vendors are producing solutions which address both the need to store large volumes of data and to provide large amounts of computational power.  But to do anything, the data and the computation must - at some point in time - be co-located.  With conventional server computing, this isn't really a problem, since everything is local to the machine.  In the more scalable architectures, however, data can be non-local.  Furthermore, the cost of accessing remote data can be great (and even variable in grid-based architectures).  So can you bring the terabytes and teraflops together to solve your problems?

The traditional supercomputing solution is message passing (MPI), moving the data to the computation.  Nodes send messages to and receive messages from each other as needed to exchange data.  A very flexible, low-level approach.  But you also want to make sure you spend most of your time processing the data, not communicating with other nodes.  I attended a number of presentations discussing aspects of this, such as: optimizing the I/O during the initialization phase, load balancing the workload, assigning work to nodes to minimize data transfer costs, utilizing asynchronous messaging to reduce stalls in processing.

Of course, the opposite approach is possible too, moving the computation to the data. In the ideal case, this what happens in map/reduce models like Hadoop.  The computational and storage architecture is one; data is spread across the nodes, with each map running on local data.  This means that distribution of the data becomes part of the storage - it doesn't completely disappear, but becomes a one-time, up-front cost.

And what about dataflow?   As the name suggests, in the dataflow model the data moves.  In fact, you can view dataflow as a more structured form of MPI.  However, this structure provides one thing for free - pipelining allows computation and communication to overlap without special programmer effort.  This works equally well for both disk I/O - you can get good throughput on sequential reads from disk - and for cross-node communication.

None of these are a universal solution.  Each has strengths and weaknesses.  As stated, MPI provides flexibility, but also places most of the responsibility on the programmer.  On the other hand, while map/reduce and dataflow do much of this work transparently, they both require thinking about problems in a way which fit the paradigm - which may not be the intuitive approach to the solution.

 

Comments

No Comments