Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

What could you do with a 100x performance improvement?

Traditionally, "Java performance" has been a bit of an oxymoron.  In the early days of Java 1.1, performance was a secondary consideration after ease of programming and usability.  But the last few years have seen some amazing performance enhancements in the JVM, such as Escape Analysis, Compressed References, and other JDK7 performance work.  And that's not even counting the myriad smaller changes that the Sun JVM engineers are working on.

As a Java developer, you don't have to wait for JVM improvements to get better performance out of your code.  There are best practices on how to use the Java language (Java Concurrency, Java Performance Tuning) and specialized hardware such as Azul Systems that can help.  But what if, after doing all sorts of optimizations, your code STILL isn't running as fast as you want it to?  This is especially common when it comes to multiple cores, which can be difficult to fully utilize.
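To give a feel for what "using multiple cores" means when you do it by hand, here is a minimal sketch that splits a simple sum across a fixed thread pool using the standard java.util.concurrent classes.  The class name ParallelSum and its sum method are made up for this illustration; this is plain Java, not DataRush code, and it says nothing about how DataRush works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: hand-rolled data parallelism in plain Java.
public class ParallelSum {

    // Sums the array by handing each worker one contiguous slice.
    // Assumes workers >= 1.
    public static long sum(final int[] data, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<Future<Long>>();
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                parts.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long s = 0;
                        for (int i = from; i < to; i++) {
                            s += data[i];
                        }
                        return s;
                    }
                }));
            }
            // Combine the partial sums; get() blocks until each slice is done.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 10;
        }
        // One worker per core reported by the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(data, cores));
    }
}
```

Even for something this trivial you end up managing the partitioning, the thread pool, and the merge step yourself, and real workloads rarely split this evenly.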

The DataRush team has been working on DataRush 5.0, which is scheduled to be released sometime in the middle of 2010.  Until then, I thought I'd show you some of the performance that we're seeing with the current builds.  All of the applications were run on the following system configuration:

  • 4p/24c AMD Opteron 8435 2.6GHz
  • 64 GB DDR2
  • Windows 2008 R2 64-bit
  • Java6_u16 64-bit Server JVM

The following algorithms were run, with these run times:

  • Naive Bayes
    • Learner - 3.6 seconds
    • Predictor - 7.8 seconds
  • Kmeans - 3.2 seconds

The DataRush engine fully utilized all twenty-four cores through the complete run!  The Naive Bayes algorithm was run on an 8 GB data file, while Kmeans was run on a 3.2 GB data file.  The DataRush team is working on more algorithms, and these are just the first set of impressive results.

What does this mean for your business?  Imagine being able to run complex calculations, analytics, and other algorithms in seconds!  Instead of waiting to run calculations overnight, you could run computations as needed and as often as needed.  Update models in near real time, take different inputs, and view results as they happen, so you can make better business decisions based on your whole data set instead of a sample.  If this sounds interesting, head over to Pervasive DataRush to get your two-week trial of DataRush.

About azeemj

Azeem Jiva is currently the software development manager for Pervasive DataRush. He leads a team of highly skilled developers building innovative, leading-edge parallel data processing platforms and applications designed to fully utilize multicore processing. Prior to joining Pervasive, Azeem contributed to the development of the Sun HotSpot JVM compiler while working at Sun and AMD. He has submitted US patents for code generation, virtualization and GPU. Azeem is also a respected member of the Java development community, is widely published and speaks regularly at industry conferences.