Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

June 2010 - Posts

  • Need to Boost Your Analytics?

    You probably know that Pervasive DataRush is a fast processing engine well suited to data preparation, but did you know that the just-released version 4.4 also includes an analytics platform? Pervasive DataRush for Analytics lets users analyze their entire data set, however large, instead of just a sample, and use the new core analytics library to run data mining operations on the high-throughput, scalable Pervasive DataRush engine.

    The new DataRush Core Analytics Library includes several classifiers (Naïve Bayes, KNN, and decision trees); clustering with a k-means algorithm; association rule mining (ARM); linear, logistic, multiple, and polynomial regression; and PMML model support.

    Pervasive DataRush for Analytics helps organizations dramatically reduce the time needed for complex analysis of big data on a single multicore server. A recent Pervasive DataRush case study shows how reading production parameters and adjusting production optimization models in seconds with DataRush for Analytics, rather than in minutes or hours, positively affected the organization's bottom line. Organizations can now process complex risk calculations faster than before and with higher accuracy, which is a huge advantage for timely decision support.

    The enhanced 4.4 release also includes expanded data preparation capabilities, a Java SDK for extending and customizing data preparation and analytics capabilities, and a JavaScript-based scripting interface to help you take control of your ever-growing data.


     

  • Hadoop as a stepping stone ...

    Gordon Brown posted a blog entry on Redfin commenting on a presentation by Jeff Hammerbacher about the use of Hadoop at Facebook. Having solved many big data problems at Facebook, Jeff has great credibility in this area. Jeff discusses how Hadoop was used to replace a massive, centralized data warehouse that had been pushed to its limits. Hadoop definitely saved the day at Facebook. With Hadoop, Facebook has been able to process the huge amounts of data required to keep adding needed features to the site.

    Jeff makes several interesting points about Hadoop. I think one of the more interesting is that he views Hadoop as a stepping stone to what's really needed: an extremely flexible approach to solving large data problems. He also points out that once developers moved away from the constraint-heavy environment of SQL into the more open environment of Hadoop, their creativity went up, way up. Remove unnecessary constraints and innovation follows.

    Jeff mentioned dataflow as an architecture made for big data processing. Working on Pervasive DataRush, I couldn't agree more. DataRush is based on a dataflow architecture, which makes it possible to build big data applications that handle these volumes in a scalable manner. Not only that, the programming model is very flexible and easy to learn. To Jeff's point, DataRush can remove some of the constraints and inflexibility of a limited Hadoop programming model. As always, innovation will follow.
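
    For readers new to the term, here is a minimal sketch of the general dataflow idea in Python. It is not the DataRush API: the operator names (`source`, `map_op`, `filter_op`, `sink`) and the `run_pipeline` helper are hypothetical and exist only for illustration. Each operator runs in its own thread and passes records to the next one through a bounded queue, so all stages can work on different records at the same time.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of a stream


def source(records, out_q):
    """Emit each input record downstream, then signal end of stream."""
    for record in records:
        out_q.put(record)
    out_q.put(DONE)


def map_op(fn, in_q, out_q):
    """Apply fn to each record as it arrives and pass the result on."""
    for record in iter(in_q.get, DONE):
        out_q.put(fn(record))
    out_q.put(DONE)


def filter_op(pred, in_q, out_q):
    """Pass on only the records for which pred is true."""
    for record in iter(in_q.get, DONE):
        if pred(record):
            out_q.put(record)
    out_q.put(DONE)


def sink(in_q, results):
    """Collect the records that reach the end of the graph."""
    for record in iter(in_q.get, DONE):
        results.append(record)


def run_pipeline(records):
    """Wire source -> square -> keep even -> sink and run all stages concurrently."""
    # Bounded queues give back-pressure: a fast stage blocks instead of
    # buffering an unbounded amount of data in memory.
    q1, q2, q3 = (queue.Queue(maxsize=16) for _ in range(3))
    results = []
    stages = [
        threading.Thread(target=source, args=(records, q1)),
        threading.Thread(target=map_op, args=(lambda x: x * x, q1, q2)),
        threading.Thread(target=filter_op, args=(lambda x: x % 2 == 0, q2, q3)),
        threading.Thread(target=sink, args=(q3, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

    The application is described as a graph of operators rather than as a fixed map and reduce pair, which is the flexibility referred to above. A production engine would also run several copies of an operator in parallel across cores; this sketch only shows the pipelining between stages.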

    But can DataRush handle the volumes of Facebook? Perhaps not at the highest levels of Hadoop, at least not today. But we have shown incredible throughput numbers of 2 terabytes per hour on the Malstone B10 benchmark. And that is on a single machine, based on Intel 7500 processors. This 32-core box outperformed a 20-node (80-core) cluster by a factor of 26. As compute density increases dramatically, scaling up to take advantage of many-core systems is imperative and pushes out the need to invest in a cluster for handling big data. Scaling up is where DataRush started and where it continues to excel today.
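
    To put those figures in per-second and per-core terms, here is a small back-of-the-envelope calculation. It rests on assumptions not stated in the benchmark summary: that a terabyte means 10^12 bytes, and that "a factor of 26" is a throughput ratio on the same workload. The function name `throughput_summary` is made up for this example.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6


def throughput_summary(tb_per_hour=2.0, single_cores=32,
                       cluster_cores=80, speedup=26.0):
    """Convert the quoted benchmark figures into MB/s and per-core rates."""
    single_mb_s = tb_per_hour * TB / 3600 / MB
    # Assumption: the cluster's throughput is the single box's divided by the speedup.
    cluster_mb_s = single_mb_s / speedup
    single_per_core = single_mb_s / single_cores
    cluster_per_core = cluster_mb_s / cluster_cores
    return {
        "single_mb_s": single_mb_s,
        "single_mb_s_per_core": single_per_core,
        "cluster_mb_s": cluster_mb_s,
        "cluster_mb_s_per_core": cluster_per_core,
        "per_core_ratio": single_per_core / cluster_per_core,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:.2f}")
```

    Under those assumptions, 2 terabytes per hour is roughly 556 MB per second in aggregate, or about 17 MB per second per core, and the per-core advantage over the cluster works out to 26 × 80 / 32 = 65.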
