Gordon Brown posted a blog entry on Redfin commenting on a presentation by Jeff Hammerbacher about the use of Hadoop at Facebook. Having solved many big data problems at Facebook, Jeff has great credibility in this area. Jeff discusses how Hadoop was used to replace a massive, centralized data warehouse that had been pushed to its limits. Hadoop definitely saved the day at Facebook. With Hadoop, Facebook has been able to process the huge amounts of data required to continue adding needed features to the site.
Jeff makes several interesting points about Hadoop. One of the more interesting is that he views Hadoop as a stepping stone to what's really needed: an extremely flexible approach to solving large data problems. He also points out that once developers moved away from the constraint-heavy environment of SQL into the more open environment of Hadoop, their creativity went up, way up. Remove unnecessary constraints and innovation follows.
Jeff mentioned dataflow as an architecture made for big data processing. Working on Pervasive DataRush, I couldn't agree more. DataRush is based on a dataflow architecture, which makes it possible to build big data applications that scale with the volume of data. Not only that, the programming model is very flexible and easy to learn. To Jeff's point, DataRush can remove some of the constraints and inflexibility of the limited Hadoop programming model. As always, innovation will follow.
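To make the dataflow idea concrete, here is a minimal sketch in plain Python. It is not DataRush code and does not use the DataRush API. The operator names (`reader`, `transform`, `collect`) and the `run_pipeline` helper are hypothetical, made up for illustration. The point is only the shape: independent operators connected by queues, each processing records as they arrive.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream (illustrative convention)


def reader(records, out_q):
    """Source operator: push each record downstream, then signal completion."""
    for rec in records:
        out_q.put(rec)
    out_q.put(_DONE)


def transform(fn, in_q, out_q):
    """Transform operator: apply fn to each record as it arrives."""
    while True:
        rec = in_q.get()
        if rec is _DONE:
            out_q.put(_DONE)  # pass the end-of-stream marker along
            return
        out_q.put(fn(rec))


def collect(in_q, results):
    """Sink operator: gather incoming records into a list."""
    while True:
        rec = in_q.get()
        if rec is _DONE:
            return
        results.append(rec)


def run_pipeline(records, fn):
    """Wire reader -> transform -> collect and run each operator in its own thread."""
    # Small bounded queues give back-pressure: a fast producer blocks
    # until the downstream operator catches up.
    q1 = queue.Queue(maxsize=4)
    q2 = queue.Queue(maxsize=4)
    results = []
    threads = [
        threading.Thread(target=reader, args=(records, q1)),
        threading.Thread(target=transform, args=(fn, q1, q2)),
        threading.Thread(target=collect, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"], str.upper))
```

Because each stage only knows about its input and output queues, stages can be added, removed, or rearranged without touching the others. Python threads will not give a real multicore speedup here, so treat this as a picture of the structure rather than of the performance.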
But can DataRush handle the volumes of Facebook? Perhaps not at the highest levels of Hadoop, at least not today. But we have shown incredible throughput numbers of 2 terabytes per hour on the Malstone B10 benchmark. And that is on a single machine, based on Intel 7500 processors. This 32-core box outperformed a 20-node (80-core) cluster by a factor of 26. As compute density increases dramatically, scaling up to take advantage of many-core systems is imperative, and it pushes out the need to invest in a cluster for handling big data. Scaling up is where DataRush started and where it continues to excel today.
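For a rough sense of scale, the figures above can be turned into per-second and per-core numbers. This is back-of-the-envelope arithmetic, not additional benchmark data. It assumes decimal units (1 TB = 10^12 bytes) and that the factor of 26 compares throughput on the same workload. The variable names are mine.

```python
# Figures quoted in the text
throughput_tb_per_hour = 2   # single-machine throughput
single_machine_cores = 32
cluster_cores = 80           # 20 nodes
speedup_vs_cluster = 26

# Assumption: decimal units, 1 TB = 10**12 bytes and 1 MB = 10**6 bytes
bytes_per_hour = throughput_tb_per_hour * 10**12
mb_per_second = bytes_per_hour / 3600 / 10**6

# Average throughput per core on the single machine
per_core_mb_per_second = mb_per_second / single_machine_cores

# Assumption: the factor of 26 is a throughput ratio on the same workload,
# so normalizing by core count gives a per-core advantage.
per_core_advantage = speedup_vs_cluster * cluster_cores / single_machine_cores

print(f"{mb_per_second:.1f} MB/s overall")
print(f"{per_core_mb_per_second:.2f} MB/s per core")
print(f"{per_core_advantage:.0f}x per-core advantage over the cluster")
```

Under those assumptions, 2 terabytes per hour works out to roughly 556 MB/s, and a 26x advantage with 32 cores against 80 is about 65x on a per-core basis.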