Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

June 2011 - Posts

  • Pervasive Is Not Only Presenting but Also Introducing Pervasive TurboRush for Hive at Hadoop Summit

    Pervasive Software’s Chief Integration Technologist Paul Dingman will be a presenter at Hadoop Summit 2011 on June 29, 2011, in Santa Clara, CA. Paul is participating in the Application and Research track and will present from 1:15 pm to 1:45 pm PDT. Paul's presentation is "Hadoop on a Personal Supercomputer."

    He will discuss the potential advantages of running Hadoop on single many-core machines with large disk arrays, illustrating cases where Hadoop running on one or a few fat nodes can deliver faster results and be more cost-effective than Hadoop running on a greater number of lower-end machines. He will also discuss opportunities to exploit intra-server parallelism to reduce task communication and coordination overhead.

    The Yahoo Hadoop Summit Agenda is now posted—and you can register online.
     
    Big News to Expect at the Hadoop Summit: We’ll Introduce Pervasive TurboRush™ for Hive

    As more companies adopt Apache Hadoop software for big data preparation and analytics, many are using Apache Hive for its ability to run SQL-like queries and analyze large datasets stored in the Hadoop file system. 

    We saw the opportunity to provide a “turbocharger” to make Hive queries run more efficiently, without developers having to modify them or learn another tool. At the Hadoop Summit the Pervasive DataRush team will introduce Pervasive TurboRush for Hive. Queries run faster on less hardware, all without code changes. Stay tuned!

    You’ll also hear news about Pervasive DataRush, Community Edition for Hadoop, a version of Pervasive DataRush for developers who are prototyping in Hadoop and looking to get better performance from their MapReduce cluster jobs. If you’re prototyping a Hadoop job and you’re sensitive to scaling and containing costs, this edition is for you. We’ll offer users a managed community site with a forum, as well as support.
     
    And Don’t Forget: BigDataCamp the night before the Hadoop Summit

    The evening before Hadoop Summit 2011, users of Hadoop and other big data-related technologies will exchange ideas in a fast-paced, peer-driven “unconference.” BigDataCamp will be a knowledge transfer and networking opportunity for data engineers, enterprise architects, developers, analysts, data miners and business intelligence professionals. Led by CloudCamp's Dave Nielsen, attendees will share their thoughts in open discussions on pre-defined and majority-vote topics, including best practices in application development and advanced analytics. Pervasive DataRush Chief Technologist Jim Falgout will present a brief lightning talk, as will Concurrent Founder and CTO Chris Wensel, Foursquare Engineer Ben Lee and others.

    Supported by Pervasive DataRush, Aster Data, Concurrent and EMC, BigDataCamp will be held on June 28, 2011, from 5:30 pm to 10:00 pm PDT, at the Network Meeting Center at Techmart.

    Tickets are going fast so you may want to register now.


     

  • Big Data Analytics Digest: Receive the latest news, thought leadership and Pervasive Big Data Analytics activity through RSS

    If you’ve explored our website lately, you may have noticed that we now offer a daily news aggregator called Big Data Analytics Digest. It’s accessible on our home page, too.

    We avidly follow the torrent of news and commentary in the realm of Big Data Analytics, so we rolled up our sleeves and created the digest to give our customers, industry watchers and analysts a glimpse of what we find interesting in the ever-changing Big Data Analytics scene—including breaking Hadoop-related news.

    So, if you subscribe, what will you get?

    Links to the latest Big Data news from the likes of BeyeNetwork, Computerworld, Enterprise Irregulars, Forrester, GigaOM, Infoworld, IT Business Edge, O’Reilly Radar, ReadWriteWeb, SDTimes, TechCrunch, The 451 Group and others.

    Thought leadership from the leading names in Big Data Analytics, including Pervasive DataRush Chief Technologist Jim Falgout and Pervasive CTO Mike Hoskins. 

    (FYI: if you haven’t yet seen Jim’s widely read eWeek article on enhancing existing applications with embedded analytics, take a look.)

    A source for upcoming Big Data industry events and presentations

    Technical resources

    Updates on the companies and products shaping Big Data Analytics

    Pervasive DataRush news PLUS updates from our Innovation Labs 

    In addition, we offer a comprehensive blogroll that will keep you in touch with the leading bloggers on the Big Data Analytics market.

    I’m open to all suggestions and criticisms to make the digest even better. Just drop me a line at joe.dubin@pervasive.com.


    Joe Dubin,
    Pervasive DataRush Product Manager

  • Pervasive DataRush Chief Technologist Jim Falgout to Speak at AMD Fusion Summit on June 14

    Jim Discusses Leveraging Multicore Systems for Hadoop and HPC Workloads

    Check out Pervasive DataRush Chief Technologist Jim Falgout at the AMD Fusion Developer Summit June 13-16 at Meydenbauer Center in Bellevue, WA. The Summit includes more than 90 technology sessions across eight Technology Topics. AMD tech leaders, industry experts, and members of academia lead the sessions.

    Jim’s presentation (Session 1421), slated for Tuesday, June 14, at 5:15 pm, will center on the critical importance of unlocking the parallelism of multicore servers and clusters, particularly Hadoop-based clusters, to solve big data problems. Despite the promise of scaled-out hardware, users are encountering long processing times and complexity in building MapReduce jobs.

    Jim argues that the right approach to solving these challenges is to better exploit the performance potential of multicore hardware. His presentation includes examples of Hadoop and HPC workloads, including web analytics and bioinformatics, running on clusters and single multicore servers, and details the performance and energy-efficiency gains that are possible with scaling up as well as out.

    Learn more about the technology behind his approach.

    As mentioned, the Summit covers eight technology topics including:

    Developer Tools: Covers development tools ranging from compilers and debuggers to performance visualization tools. Sessions cover the state of the art in compiler technology (CPU and GPU), debugging and profiling OpenCL™, and automatic data movement.

    Enterprise Computing: Features sessions that discuss using multicore technology to handle large data, showcase software being developed today utilizing multicore CPUs, and show early work of applying the data parallel capabilities of GPUs to databases.

    High Performance Computing: Presents a sampling of portable and standards based heterogeneous computing. Come see innovative uses of GPUs, extreme optimizations, power efficient implementations, benchmarks, libraries, and real world applications in physics, chemistry, finance and rendering.

    Multimedia Processing: Sessions on image processing, audio processing, video processing, telepresence, video quality enhancement, computer vision, transcoding, content recognition, image retrieval, multimedia algorithm optimization for parallel processing and codecs.

    Professional Graphics and Visual Computing: Focuses on various areas of visual computing, including mixed-mode OpenGL/DX/OpenCL™ interoperability, and advanced rendering and compute techniques.

    Programming Models: Showcases the state of the art in parallel programming models and techniques for heterogeneous platforms. Topics covered include: programming models for next generation GPU architectures and techniques for building domain specific languages on heterogeneous platforms.

    Security: Sessions on password recovery and audit, encryption, and steganography detection.

    User Interface and Media Experiences: A focus on gesture recognition, touch recognition, face recognition, UIs for new user experiences, video management, video playback, and Web user experiences.

    There’s something for everyone. The Pervasive DataRush team hopes to see you at the AMD Fusion Developer Summit!

     

  • A Detailed Summary of McKinsey Report on Big Data

    We wanted to share a detailed summary of the report on Big Data by McKinsey Global Institute (MGI) that was released last month, as it contains relevant points and interesting statistics.
     
    The report describes the state and growing role of digital data, which has now entered every sector and economy, as well as the impact of the growing amount of data. MGI claims that there is strong evidence that Big Data can contribute significantly to national economies, creating substantial value for the world economy. Its research suggests that the public sector can increase its productivity through effective use of Big Data. For instance, the value to the US healthcare system could be $300 billion a year, and US retailers could boost their operating profit margins by 60 percent. However, MGI notes the challenges organizations face in reaching the full potential of Big Data, such as a shortage of the analytical and managerial talent needed to make Big Data valuable for businesses.
     
    Some relevant key findings from the research include:

    1.     MGI estimates that the new data stored by enterprises exceeded 7 exabytes globally in 2010. In addition, the new data stored by consumers around the world exceeded 6 exabytes in the same year.

    2.     Organizations increasingly use Big Data analytics to make decisions, analyzing datasets on customers, employees and sensors embedded in products, including data from mobile and social networks. This is leading to new business models, products and services. For example, products can be matched more closely to customers.

    3.     The use of Big Data will encourage new growth opportunities and competition among businesses.

    4.     Some sectors are positioned for greater gains from the use of Big Data. Those sectors include computer and electronic products, finance and insurance, and government. Other sectors, such as education, have experienced negative productivity growth, which MGI attributes to high systemic barriers.

    5.     Concerns over the use of Big Data include data policies, privacy issues, developing new techniques and technologies, organizational change and talent, access to and integration of information from various data sources, and industry structure.

    6.     Under techniques for analyzing Big Data, MGI listed various methods from statistics and machine learning for data mining, including association rule learning, cluster analysis and classification. Data fusion and data integration are also significant techniques that allow analysis of data from multiple databases to extract valuable insight.

    7.     Visualization supports Big Data analysis and is becoming more important as a way to present information in a form that people can readily understand.
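
    As a rough illustration of one technique named in item 6, association rule learning, the sketch below computes the two standard measures for a rule "A implies B": support (how often A and B occur together) and confidence (how often B occurs when A does). The transactions, function names and thresholds are invented for this example and do not come from the MGI report.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing antecedent that also contain consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules
```

    Calling pair_rules(transactions) on the toy data keeps only rules between bread, milk and butter that clear both thresholds. Production data mining libraries use far more efficient candidate generation than this brute-force pass over item pairs.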
     
    Please see below for a direct link to the full report. The report includes an overview of Big Data techniques and technologies, the potential of Big Data and key findings in five domains (health care, public sector administration, retail, manufacturing and personal location data), and the implications for organization leaders and policy makers.
     
    McKinsey Global Institute, May 2011
    Big data: The next frontier for innovation, competition, and productivity

     
