Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

July 2009 - Posts

  • The Power of Social Media

    Beyond blogging, social media plays an increasingly important role for organizations.  It gives them the chance not only to share information through multiple outlets, but also to connect on a more personal level with their fans.  We all understand the importance of networking and cultivating relationships; social media is another way to connect and cultivate in the virtual world.


    Another bonus of social media is site optimization.  Most organizations follow SEO rules and may still be struggling for higher rankings.  One reason may be that SEO is an ongoing process that needs to be updated and evolve over time, not a one-time fix-all.  Read more about blogging for SEO purposes.


    So what does all that have to do with Pervasive DataRush?  Here at PDR, we understand the importance of evolving with technology and getting our feet wet beyond our own walls.  There is so much information out there to share.  We want to share our thoughts on technology, business, marketing and more.  Stay connected and follow us on Twitter and Facebook, with more outlets to come in the near future.  Feel free to share your thoughts with us the soon-to-be old-fashioned way: by email.

  • Back to Washington D.C. Next Week

    DataRush is gaining considerable traction in the Federal marketplace.  We recently added Summit Government Services as a partner to help develop our market presence.  I will be visiting the D.C. area again next week for more meetings with systems integrators, who are seeing strong interest in how DataRush can add value to their current and new bid programs supporting the Government.  A key point of interest has been DataRush's ability to speed up application areas that require aggregating data from a variety of disparate sources (both military and civilian).

    With the constant need to counter the terrorism threat, our predictive analytics capabilities are creating interest in the intelligence community.  With President Obama's goal of a single-payer health care system, we are being asked to support significant efforts in the areas of data quality and data matching.  Lots of good things are happening here, with more to come.

     

    More to follow

     


  • Just returned from KDD’09 conference

    The Fifteenth ACM SIGKDD International Conference on Knowledge
    Discovery and Data Mining (KDD’09) took place last week in Paris, France.
    The annual ACM SIGKDD conference is the premier international
    forum for data mining researchers and practitioners from academia,
    industry, and government to share their ideas, research results and
    experiences.

    For several reasons, this year’s KDD was special.  First,
    it received a record 659 submissions, more than 10% up
    from last year.  Second, it marked the first time the event was held in
    Europe, in beautiful Paris!  Third, the KDD Cup 2009 competition drew a
    record 4741 complete valid entries out of 7877 total entries.

    Pervasive Software had a team of five people in attendance and
    sponsored exhibitor booth #4 in the “Foyer Rives de Seine”.

    In the booth, we ran demonstrations of our KDD’09 featured project,
    “Pervasive’s Dataflow Solution to the Netflix Recommender System”,
    on an HP 4-core (quad-core Intel Q9300 @ 2.53GHz) laptop.

    On Monday, June 29th, the plenary invited talk by David Hand attacked
    the widespread use of AUC by exposing its fundamental incoherence
    and closed with a family of coherent alternative scores.  The evening gala
    and poster session was hosted by the Mayor of Paris, Bertrand Delanoë,
    at the Paris City Hall, the Hôtel de Ville.  The poster on a
    MapReduce cluster implementation of the Bayesian Browsing Model for
    petabyte-scale click log data was the most interesting to me.  Using exact
    inference and a single-pass algorithm, it reportedly
    performs 1.15 billion queries in 3 hours.

    Tuesday, June 30th was a big day for Pervasive Software at KDD 2009!
    Pervasive Software was recognized as a technology leader on two fronts:
    it was selected for the panel on open standards in data mining (PMML) and
    for an industrial research talk on parallel data mining.

    Panel: Open Standards and Cloud Computing
    Pervasive’s own CTO, Mike Hoskins, joined a distinguished panel of thought leaders
    in the data mining industry, with representatives from
    DMG / Open Data Group, IBM, KNIME, KXEN,
    MicroStrategy, SAS, SPSS and Pervasive Software.
     Mike’s presentation included:

    Talk: Pervasive Parallelism in Data Mining
    This industry session presentation unveiled the unmatched runtime performance
    of our Dataflow Solution to the Netflix Recommender System.  The bottom
    line of this research is that the Pervasive DataRush parallelism engine
    produced movie recommendations with comparable accuracy 9 to 44 times
    faster than top Netflix Prize solutions.  Our solution predicted 100 million
    ratings in 16.31 minutes and achieved an RMSE of 0.88846.
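
    For readers less familiar with the accuracy metric quoted above, here is a
    minimal Python sketch of how RMSE (root mean squared error) is computed,
    along with the throughput implied by the reported runtime.  The function
    names and the toy ratings are made up for illustration; this is not
    DataRush code and not the evaluation script used for the talk.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

def ratings_per_second(num_ratings, minutes):
    """Average prediction throughput for a run of the given length."""
    return num_ratings / (minutes * 60.0)

# Hypothetical toy data: four predicted ratings vs. the true star ratings.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 5]
toy_rmse = rmse(predicted, actual)

# Figures quoted in the post: 100 million ratings in 16.31 minutes.
throughput = ratings_per_second(100_000_000, 16.31)
print(f"toy RMSE: {toy_rmse:.4f}, throughput: {throughput:,.0f} ratings/sec")
```

    A lower RMSE means predictions sit closer to the true ratings, and the
    quoted runtime works out to roughly 100,000 predicted ratings per second.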


    Most of the feedback I received on the presentation came during
    the evening poster sessions.  From the academic perspective, parallel
    data mining is a hot topic.  The scientific community will no longer settle
    for sampling or weeks of processing time.  At the SciDAC 2009
    Conference in San Diego three weeks ago, scientists were solving
    exascale data and computational problems with parallelism and
    149,504 processing cores (Jaguar XT5).  From the industrial
    perspective, the volumes of data and the high cost of power and facilities
    have brought about infrastructure as a service, application as a service
    and platform as a service.  Commodity multi-core, cloud and cluster
    computing are the hardware options, but the software must be
    re-architected to exploit parallelism from coarse grain to fine grain.  The
    impressive runtimes attracted attendees to our talk.  The dataflow
    computational model solved data mining problems that were otherwise
    constrained by heap size, and the DataRush framework facilitated
    fine-grain parallelism for rapid algorithm development, even for those
    without parallel programming experience.
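
    To illustrate why a dataflow style sidesteps heap limits, here is a small
    Python sketch built from generators: each stage pulls one record at a
    time from the stage before it, so memory use stays constant no matter how
    many records flow through.  This is only an analogy under stated
    assumptions.  The stage names are hypothetical, it is not the DataRush
    API, and it runs single-threaded, so it shows the streaming side of the
    idea and not the parallelism.

```python
import math

def read_ratings(rows):
    """Source stage: emit (predicted, actual) pairs one at a time."""
    for row in rows:
        yield row

def squared_errors(pairs):
    """Transform stage: turn each pair into its squared error."""
    for predicted, actual in pairs:
        yield (predicted - actual) ** 2

def streaming_rmse(errors):
    """Sink stage: keep only a running sum and count, never the full data."""
    total = 0.0
    count = 0
    for error in errors:
        total += error
        count += 1
    if count == 0:
        raise ValueError("no records reached the sink")
    return math.sqrt(total / count)

# Hypothetical toy input; in practice the source would read from disk.
toy_rows = [(3.5, 4), (4.0, 4), (2.0, 1), (5.0, 5)]
result = streaming_rmse(squared_errors(read_ratings(iter(toy_rows))))
print(f"streaming RMSE: {result:.4f}")
```

    Because the stages only communicate through the records passed between
    them, a dataflow engine is free to run them concurrently on separate
    cores, which this single-threaded sketch does not attempt.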

     
    Overall at KDD09, Parallel Data Mining and scalable algorithms
    received special attention with a total of 13 talks (9.3% of talks)
    dedicated to this topic. The common thread was the need to
    efficiently mine gigabytes and terabytes of data in a timely fashion.
    It is time to have Industry and Research Track Sessions
    dedicated to High Performance and Parallel Data Mining!
      
    REFS:  13 KDD09 papers on scalable & parallel DM

    1.       Pervasive Parallelism in Data Mining: Dataflow solution to Co-clustering Large and Sparse Netflix Data (Srivatsava Daruru, Nena M Marin, Matt Walker, Joydeep Ghosh)

    2.       Demo D07 - SHIFTR: A Fast and Scalable System for Ad Hoc Sensemaking of Large Graphs (Duen Horng Chau, Aniket Kittur, Hanghang Tong, Christos Faloutsos, Jason I. Hong)

    3.       Parallel Community Detection on Large Networks with Propinquity Dynamics (Yuzhou Zhang, Jianyong Wang, Yi Wang, Lizhu Zhou)

    4.       Scalable Graph Clustering Using Stochastic Flows: Applications to Community Discovery (Venu Satuluri, Srinivasan Parthasarathy)

    5.       W02 - SkyTree: Scalable Skyline Computation for Sensor Data (Jongwuk Lee, Seung-won Hwang)

    6.       W05 - Scalable Clustering and Keyword Suggestion for Online Advertisements (Anton Schwaighofer, Joaquin Quiñonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich)

    7.       Scalable Pseudo-Likelihood Estimation in Hybrid Random Fields (Antonino Freno, Marco Gori, Edmondo Trentin)

    8.       BBM: Bayesian Browsing Model from Petabyte-scale Data (Chao Liu, Fan Guo, Christos Faloutsos)

    9.       Large-Scale Graph Mining Using Backbone Refinement Classes (Andreas Maunz, Christoph Helma, Stefan Kramer)

    10.   Social Influence Analysis in Large-scale Networks (Jie Tang, Jimeng Sun, Chi Wang, Zi Yang)

    11.   Large-Scale Behavioral Targeting (Ye Chen, John F. Canny, Dmitry Pavlov)

    12.   Mind the Gaps: Weighting the Unknown in Large-Scale One-Class Collaborative Filtering (Rong Pan, Martin Scholz)

    13.   Large-Scale Sparse Logistic Regression (Jun Liu, Jieping Ye, Jianhui Chen)

     

     

  • Jazoon 2009

    I was lucky enough to attend and present at Jazoon 2009 in Zurich, Switzerland.  My talk, "How badly written optimizations can undo automatic JVM benefits", was aimed at showing how applying optimizations at random can actually hurt performance and in some cases make the code harder to read.  The talk was well received, and about 150 people showed up to listen.  The slides are up on the web page and I'd love comments.

    I spent the first day at the GlassFish Community Day, and while I'm really more of a compiler guy, I found the new features in GlassFish 3.0 quite interesting.  The new OSGi modularity is useful because it allows features to be installed and used as needed, without having to take on everything that comes with GlassFish.  I'm also finding that, as I learn more about dynamic languages, support for Grails and Rails in GlassFish is a good thing and should speed up adoption of these languages and platforms.  Overall it was an enlightening day all about GlassFish, which I thoroughly enjoyed.

    The rest of the conference was quite good, although a couple of the talks were geared towards promoting software that the speaker's company was selling rather than showing how to do something cool and interesting in Java.  Even so, there were some amazing talks, especially the general session talks "Smithying in the 21st Century" by Neal Ford and "What they don't teach you about software at school: Be Smart!" by Ivar Jacobson.  Both speakers are excellent, and their talks were geared towards concepts that programmers should know beyond just programming.

    Other talks I attended that were enjoyable included:

    A few personal things I noticed during my time at Jazoon: many talks discussed JavaFX, and it was heavily discussed during the breaks by many of the attendees.  JavaFX is a game changer, and we'll have to see how this plays out in the months and years ahead as Silverlight and Flex react to JavaFX and the performance and ease of use that it brings.  The other big trend in the Java world is using the JVM to run languages other than Java.  Clojure and Scala were the big alternatives, with the largest presence and the most discussion, but JRuby, Jython and others were represented as well.  In a weird way, these alternative JVM languages could be what saves Java from being surpassed by newer technologies.  Clojure, Scala and other languages will use the large set of Java libraries to fill in gaps and integrate with the JVM, and Java as a platform may find new life and remain useful and competitive.

    Posted Jul 07 2009, 11:57 AM by azeemj