Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.
  • We have moved

    We have moved our blogs here.
  • Making The Most Out Of Your Data: Big Data Opportunities

    Last month, Forrester Research released a report for CIOs, “Expand Your Digital Horizon With Big Data,” that focused on how they should approach big data in order to take full advantage of it for their businesses. It addressed how big data is influencing markets across industries and is prevalent in various business sectors, such as healthcare, web marketing and telecommunications. The report also discussed several factors for companies to consider when working with their data, such as redefining approaches to using data beyond traditional BI tools.

    A key point is that the major challenges of big data include not only the cost of the technology but also the shortage of data scientists. Companies are starting to seek professionals with big data skills, spurred by new pressure to handle large volumes of unstructured data. According to a recent Forbes article by Quentin Hardy, the “data scientist” phenomenon has recently emerged in big data conversations, with ongoing debate over its definition, the skill set required and the data scientist’s role in the big data trend.

    Forrester analyst Brian Hopkins, who co-authored the report with Boris Evelson, Sharyn Leaver, Connie Moore, Alex Cullen, Mike Gilpin and Mackenzie Cahill, posted a brief summary of the report on his blog, emphasizing three key questions for companies to address in order to understand and create a big data plan: 1) What is new about big data? 2) What is it? and 3) How will it influence our market?


    Some interesting data points, findings and recommendations from the report include:

    1. Forrester surveyed 60 of its clients who are using or experimenting with big data computing. 75% of the surveyed clients responded that data volume was the main reason for looking into big data solutions. 58% of respondents in Forrester's June 2011 Global Big Data Online Survey reported interest in insight driven by an analytics approach.

    2. 70% of respondents expressed interest in big data for managing current enterprise information. This suggests that many early adopters are using big data solutions to understand existing information rather than new data sources.

    3. Cost is the underlying theme for big data challenges. These challenges include:

    · Volume, in terms of data amount exceeding how it can be stored cost-effectively

    · Velocity, in terms of processing data fast enough for businesses to respond and adapt rapidly

    · Variety, in terms of integration costs of adding new data feeds and interpreting variable data structures

    · Variability, in terms of complex and highly variable data structures that complicate analysis

    4. Big data is still in its early stages and thus poses a challenge to businesses. To succeed and overcome the challenges, businesses should:

    · Encourage collaboration between business and IT

    · Create new, agile and compliant processes and approaches to deliver big data solutions

    · Adapt quickly to fast-paced, changing technology trends for rapid growth

    As the big data trend continues to spread across industries, companies that have a handle on managing and using their data can develop forward-looking strategies and gain an advantage over their competitors. CIOs are facing increased pressure to find ways to turn the company’s data into meaningful insight for business decisions. This raises an important question: how is the CIO’s role evolving as new big data technologies emerge and IT spending on big data increases? Is your business feeling the pressure to join the competitive big data crowd?


    Pervasive Big Data encompasses Pervasive’s DataRush big data software platform, which lets companies consume high-volume, variable data for complex analysis. We have been watching this big data trend grow for the past two years and have built our tools to help with the challenges of processing and analyzing big data.

  • Is Your Big Data Problem Solved Yet?

    We’re certainly more informed these days. Even my grandmother is hearing of the challenges of big data. But after listening to last week's DM Radio episode, Avoiding Bottlenecks and Hurdles in Data Delivery, it appears that the big data crisis is subsiding, according to Philip Russom. Or is it?

    Russom points out that the biggest bottleneck in big data is moving the data from processing into data flow. Old ways of processing data relied on the hardware. The faster the hardware, the faster the processing…right? Pervasive’s Big Data Director David Inbar pointed out that old software IS the bottleneck. Organizations are throwing more hardware at slow processes when poorly written software is causing the problem.

    So we have organizations that have overhauled their technology to stay ahead of the curve, but they haven’t updated their software to process in parallel on that new technology. Do they still sell single-core work machines? If your machine has more than one core, your software needs to process in parallel to use it efficiently. Parallel processing is the ability to carry out multiple operations simultaneously, and it allows organizations to scale and relieve bottlenecks in data traffic. Once you’ve parallelized your working environment, is your big data problem solved? Probably not.
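
    To make "processing in parallel" a little more concrete, here is a minimal sketch of the usual pattern: split the data into chunks, work on the chunks concurrently, then combine the partial results. It is plain Python, not Pervasive DataRush code, and the function names and sample log lines are made up for illustration. It uses a thread pool only to keep the sketch portable; for CPU-heavy work in standard Python, a process pool (concurrent.futures.ProcessPoolExecutor) would typically be needed to actually keep several cores busy.

```python
from concurrent.futures import ThreadPoolExecutor


def count_errors(chunk):
    """Count the log lines in one chunk that contain the word ERROR."""
    return sum(1 for line in chunk if "ERROR" in line)


def split(records, parts):
    """Split a list into `parts` nearly equal, contiguous chunks."""
    size, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def parallel_count(records, workers=4):
    """Split the data, count each chunk concurrently, then combine."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_errors, chunks))
    return sum(partial_counts)


if __name__ == "__main__":
    # Hypothetical sample data, invented for this sketch.
    log = ["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout"] * 1000
    print(parallel_count(log))
```

    The point of the sketch is the shape of the program rather than the speed-up: software has to be written so that the work can be divided and recombined before extra cores can help at all.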

    I’m not sure if the big data crisis is subsiding OR if big data awareness is more prevalent. Organizations are certainly starting to experience the IT side of big data, but what about educating the people running the analytics? In Data Scientists Are Rocking the Big Data World, Jeff Kelly mentions that there are few formal training and educational programs that focus on big data and analytics. To get the right answer, you have to ask the right question. And to ask the right question, you have to understand the quality of your data.

    Hopefully parents in the high tech industry are coaxing their children and their educational system to jump on the big data train today because I have a feeling that it’s going to be a long ride.

  • Webinar: Big Data and Hadoop with guest speaker Jim Kobielus

    To help enterprises learn more about big data and how it fits with their traditional data warehouse and data mart, Pervasive DataRush and Karmasphere are hosting a webinar titled, "Big Data: The Role, Value and Best Practices of Hadoop," taking place September 7 at 9:00 a.m. PDT/1:00 p.m. EDT (for the Americas), and September 8 at 3:00 p.m. BST (for Europe). During the webinar, guest speaker James G. Kobielus, senior analyst with Forrester Research, Inc., will discuss the concept of big data, and the emerging software platform used for big data: Hadoop. Following the discussion, Pervasive DataRush and Karmasphere will each give a short overview of their big data offerings.


    Attendees can expect to learn:

    - Where Hadoop adds the most business value.
    - Best practices for using Hadoop.
    - How companies are combining Hadoop with standard data warehouses and data marts.
    - The Forrester model of the Hadoop ecosystem.

    It should be a lively discussion. If you would like to attend the webinar, please register here: http://lp.pervasive.com/BigDataWebinarWithJamesKobielus.html

  • BIG Thoughts on BIG Data

    We had the pleasure of meeting with David Linthicum last month to get his thoughts on Big Data.

     

    Many organizations are facing the reality that they have more data coming in than they can process. Apache Hadoop offers a big opportunity for businesses, but, according to David Linthicum, enterprises are in an experimental phase with Hadoop right now, trying to learn how it works and how they can best use it.

    The open-source yellow elephant has spawned a large following. Yahoo! recently launched a Hadoop spinoff company, Hortonworks, which aims to be a prime player in the Big Data game with enterprise-quality products. Hortonworks recently presented at OSCON Data 2011, where it revealed its current goals for Hadoop security and framework scalability.

    Software vendors are working on filling in the gaps in Hadoop to solve business problems for the enterprise.  Cloudera is aiming to make the Hadoop experience pain-free with its fully integrated Hadoop distribution (CDH) and management suite; Karmasphere provides big data analytics solutions; Pervasive DataRush optimizes Hadoop jobs to increase performance on any platform.

    Where is Hadoop going?  We are paving the path as we speak.  Stay tuned to this blog site for further details.  And follow the latest industry news on Pervasive's Big Data Digest!

     

  • Capture All Your Data, All The Time

    We are in Washington D.C. this week exhibiting at the FOSE show and we're showing this awesome demo that we've been working on. Pervasive has taken Pervasive DataRush to new levels in order to meet specific cyber security challenges. It’s no secret that we’ve entered the big data era. There are millions of devices generating data every second: log events, security events, network traffic, firewalls, and so much more (this variety is shown as shapes in the diagram below). And there’s lots of great software out there to look at these events, but only for a short time frame. One of the most daunting challenges facing organizations today is capturing, archiving, and analyzing ALL this data at any given time. There’s so much data that today’s software is failing to archive and analyze cyber security events as a WHOLE.

    Until now.

    Pervasive has developed a Historical Event Processing proof of concept (POC) that leverages and exposes the power of Pervasive DataRush as it captures and archives one million events per second into Hadoop’s HBase. Holy smokes, that’s amazing! This consumption rate is orders of magnitude faster than any solution on the market today.

    For this POC, we used a single server box with 48 cores, 40 drives, and 258G of memory. The processing rate will increase with more cores or additional nodes. We used Pervasive DataRush listeners for multiple log events and archivers to write to any database. And we actually captured 1.6 million events in 62 seconds, to be exact. Once the millions of events are captured and archived, you can use Pervasive DataRush to launch any set of queries or apply data mining algorithms to perform deep analytics on the dataset as a whole to look for a change in patterns...particularly useful in cyber security. For this POC, we used a Hive query to count the server process types that generated each event and calculate percentages. Once the query was completed, Pervasive DataRush sent counts of each message to Google Charting to display the data visually. The entire process helps meet the demands of today's cyber security challenges, but we're especially impressed by the speed at which we're able to capture these events and run queries.
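
    For readers who want to picture the aggregation step, here is a small sketch of "count events per process type and calculate percentages." It is plain Python rather than the Hive query or any DataRush API used in the POC, and the event records and the "process" field name are assumptions invented for the example.

```python
from collections import Counter


def process_type_percentages(events):
    """Return {process_type: (count, percent_of_total)} for a list of events."""
    counts = Counter(event["process"] for event in events)
    total = sum(counts.values())
    return {
        process: (count, 100.0 * count / total)
        for process, count in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical events; the field name "process" is an assumption.
    sample = [
        {"process": "sshd"},
        {"process": "httpd"},
        {"process": "sshd"},
        {"process": "cron"},
    ]
    for process, (count, pct) in sorted(process_type_percentages(sample).items()):
        print(f"{process}: {count} events ({pct:.1f}%)")
```

    A table of counts and percentages like this is the kind of result that can then be handed to a charting tool for display.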

    If we captured a million events in one minute, imagine how much data is created every day. Now organizations can capture all the data, archive it, and perform deep analytics as a whole. Stop by our booth #1528 at FOSE to see the POC in person.

  • Pervasive Not Only Presenting But Introducing Pervasive TurboRush for Hive at Hadoop Summit

    Pervasive Software’s Chief Integration Technologist Paul Dingman will be a presenter at Hadoop Summit 2011 on June 29, 2011, in Santa Clara, CA. Paul is participating in the Application and Research track and will present from 1:15 pm to 1:45 pm PDT. Paul's presentation is "Hadoop on a Personal Supercomputer."

    He will discuss the potential advantages of running Hadoop on single many-core machines with large disk arrays, illustrating cases where Hadoop running on one, or a few, fat nodes can deliver faster results and be more cost effective than Hadoop running on a greater number of lower end machines. He will also discuss opportunities to exploit intra-server parallelism to reduce task communication and coordination overhead.

    The Yahoo Hadoop Summit Agenda is now posted—and you can register online.
     
    Big News to Expect at the Hadoop Summit: We’ll Introduce Pervasive TurboRush™ for Hive

    As more companies adopt Apache Hadoop software for big data preparation and analytics, many are using Apache Hive for its ability to run SQL-like queries and analyze large datasets stored in the Hadoop file system. 

    We saw the opportunity to provide a “turbocharger” to make Hive queries run more efficiently, without developers having to modify them or learn another tool. At the Hadoop Summit the Pervasive DataRush team will introduce Pervasive TurboRush for Hive. Queries run faster on less hardware, all without code changes. Stay tuned!

    You’ll also hear news about the Pervasive DataRush, Community Edition for Hadoop – a version of Pervasive DataRush for developers doing prototyping in Hadoop and looking to get better performance in their Map/Reduce cluster jobs. If you’re prototyping a Hadoop job and you’re sensitive to scaling and containing costs, this edition is for you. We’ll offer users a managed community site with forum, as well as support.
     
    And Don’t Forget: BigDataCamp the night before the Hadoop Summit

    The evening before Hadoop Summit 2011, users of Hadoop and other big data-related technologies will exchange ideas in a fast-paced, peer-driven “unconference.” BigDataCamp will be a knowledge transfer and networking opportunity for data engineers, enterprise architects, developers, analysts, data miners and business intelligence professionals. Led by CloudCamp's Dave Nielsen, attendees will share their thoughts in open discussions with pre-defined and majority-vote topics, including best practices in application development and advanced analytics. Pervasive DataRush Chief Technologist Jim Falgout will present a brief lightning talk, as will Concurrent Founder and CTO Chris Wensel, Foursquare Engineer Ben Lee and others.

    Supported by Pervasive DataRush, Aster Data, Concurrent and EMC, BigDataCamp will be on June 28, 2011, from 5:30 PM-10:00 PM (PT), at the Network Meeting Center at Techmart.

    Tickets are going fast so you may want to register now.


     

  • Big Data Analytics Digest: Receive the latest news, thought leadership and Pervasive Big Data Analytics activity through RSS

    If you’ve explored our website lately, you may have noticed that we now offer a daily news aggregator called Big Data Analytics Digest. It’s accessible on our home page, too.

    We avidly follow the torrent of news and commentary in the realm of Big Data Analytics, so we rolled up our sleeves and created the digest to give our customers, industry watchers and analysts a glimpse of what we find interesting in the ever-changing Big Data Analytics scene—including breaking Hadoop-related news.

    So, if you subscribe, what will you get?

    Links to the latest Big Data news from the likes of BeyeNetwork, Computerworld, Enterprise Irregulars, Forrester, GigaOM, Infoworld, IT Business Edge, O’Reilly Radar, ReadWriteWeb, SDTimes, TechCrunch, The 451 Group and others.

    Thought leadership from the leading names in Big Data Analytics, including Pervasive DataRush Chief Technologist Jim Falgout and Pervasive CTO Mike Hoskins. 

    (FYI – if you haven’t had a chance to read Jim’s widely read article on enhancing existing applications with embedded analytics in eWeek, take a look.)

    A source for upcoming Big Data industry events and presentations

    Technical resources

    Updates on the companies and products shaping Big Data Analytics

    Pervasive DataRush news PLUS updates from our Innovation Labs 

    In addition, we offer a comprehensive blogroll that will keep you in touch with the leading bloggers on the Big Data Analytics market.

    I’m open to all suggestions and criticisms to make the digest even better. Just drop me a line at joe.dubin@pervasive.com.


    Joe Dubin,
    Pervasive DataRush Product Manager

  • Pervasive DataRush Chief Technologist Jim Falgout to Speak at AMD Fusion Summit on June 14

    Jim Discusses Leveraging Multicore Systems for Hadoop and HPC Workloads

    Check out Pervasive DataRush Chief Technologist Jim Falgout at the AMD Fusion Developer Summit June 13-16 at Meydenbauer Center in Bellevue, WA. The Summit includes more than 90 technology sessions across eight Technology Topics. AMD tech leaders, industry experts, and members of academia lead the sessions.

    Jim’s presentation, Session 1421 slated for Tuesday June 14th at 5:15 pm, will center on the critical importance of unlocking the parallelism of multi-core servers and clusters, particularly Hadoop-based clusters, to solve big data problems. Despite the promise of scaled-out hardware, users are encountering long processing times and complexities building MapReduce jobs.

    Jim argues that the right approach to solving these challenges is to better exploit the performance potential of multicore. His presentation includes examples of Hadoop and HPC workloads running on clusters and single multi-core servers—including web analytics and bioinformatics—detailing the performance and energy efficiency gains that are possible with scaling up, as well as out.

    Learn more about the technology behind his approach.

    As mentioned, the Summit covers eight technology topics including:

    Developer Tools: Covers development tools ranging from compilers and debuggers to performance visualization tools. Sessions cover the state of the art in compiler technology (CPU and GPU), debugging and profiling OpenCL™, and automatic data movement.

    Enterprise Computing: Features sessions that discuss using multicore technology to handle large data, showcase software being developed today utilizing multicore CPUs, and show early work of applying the data parallel capabilities of GPUs to databases.

    High Performance Computing: Presents a sampling of portable and standards based heterogeneous computing. Come see innovative uses of GPUs, extreme optimizations, power efficient implementations, benchmarks, libraries, and real world applications in physics, chemistry, finance and rendering.

    Multimedia Processing: Sessions on image processing, audio processing, video processing, telepresence, video quality enhancement, computer vision, transcoding, content recognition, image retrieval, multimedia algorithm optimization for parallel processing and codecs.

    Professional Graphics and Visual Computing: Focuses on various areas of visual computing, including mixed-mode OpenGL/DX/OpenCL™ interoperability, and advanced rendering and compute techniques.

    Programming Models: Showcases the state of the art in parallel programming models and techniques for heterogeneous platforms. Topics covered include: programming models for next generation GPU architectures and techniques for building domain specific languages on heterogeneous platforms.

    Security: Sessions on password recovery and audit, encryption, and steganography detection.

    User Interface and Media Experiences: A focus on gesture recognition, touch recognition, face recognition, UIs for new user experiences, video management, video playback, and Web user experiences.

    There’s something for everyone. The Pervasive DataRush team hopes to see you at the AMD Fusion Developer Summit!

     

  • A Detailed Summary of McKinsey Report on Big Data

    We wanted to share a detailed summary of the report on Big Data by McKinsey Global Institute (MGI) that was released last month, as it contains relevant points and interesting statistics.
     
    The report describes the state and growing role of digital data that has now entered every sector and economy, as well as the impact of the growing amount of data. MGI claims that there is strong evidence that Big Data can contribute significantly to national economies, creating substantial value for the overall world economy. Their research suggests that the public sector can increase its productivity through effective use of Big Data. For instance, the value to the US healthcare system could be $300 billion a year, and US retailers could boost their operating profit margins by 60 percent. However, MGI notes the challenges organizations face with reaching the full potential of Big Data, such as limited analytical and managerial talent to make big data advantageous and valuable for businesses.
     
    Some relevant key findings from the research include:

    1. MGI estimates that the new data stored by enterprises globally exceeded 7 exabytes in 2010. In addition, the new data stored by consumers around the world exceeded 6 exabytes in the same year.

    2. Organizations are increasingly using Big Data analytics to make decisions, analyzing datasets on customers and employees, from mobile and social networks, and from sensors embedded in products. This is leading to new business models, products and services, for example a better match between products and customers.

    3. The use of Big Data will encourage new growth opportunities and competition among businesses.

    4. Some sectors are positioned for greater gains from the use of Big Data. Those sectors include computer and electronic products, finance and insurance, and government. Some public-sector areas, such as education, have experienced negative productivity growth due to high systemic barriers.

    5. Concerns over the use of Big Data include data policies, privacy issues, developing new techniques and technologies, organizational change and talent, access to and integration of information from various data sources, and industry structure.

    6. Under techniques for analyzing Big Data, MGI listed various methods from statistics and machine learning for data mining, including association rule learning, cluster analysis and classification. Data fusion and data integration are also significant techniques that allow analysis of data from multiple databases to extract valuable insight.

    7. Visualization is becoming more important to Big Data as a way to present information in a form that people can readily understand.
     
    Please see below for a direct link to the full report. The report includes an overview of Big Data techniques and technologies, the potential of Big Data and key findings in five types of verticals (health care, public sector administration, retail, manufacturing and personal location data), and the implications for organization leaders and policy makers.
     
    McKinsey Global Institute, May 2011
    Big data: The next frontier for innovation, competition, and productivity

     

  • Tackling Big Data is a Big Job

    Tackling big data is not a job that is going to be solved by programmers alone. It’s going to be solved in concert with data scientists and analysts. What tools exist for both programmers and non-programmers? We love KNIME for this and we love DataRush for KNIME even more.

    KNIME, an open-source data mining and visualization tool, allows users to visually create data flows, execute selected analysis steps, and later investigate the results through interactive views on data and models. KNIME’s drag-and-drop interface gives non-programmers a chance to easily build work flows and store the analysis process for later expansion. DataRush for KNIME allows big data analytics to be processed in a fraction of the time without writing scripts.


    This Wednesday, May 11, Pervasive DataRush and KNIME will host a joint webinar, Scalable Data Analytics: Tools for Big Data Mining and Visualization, that will show KNIME’s intuitive user interface and demo how DataRush accelerates KNIME’s already powerful data mining capabilities. Michael Berthold, CEO of KNIME, will show interactive analytics and visualizations produced with the KNIME platform. With DataRush for KNIME, Davin Potts will show how anyone who can use a mouse can now use DataRush to harness the full power of the cores in their server without needing to be a programmer.
     

    Making available the power of DataRush through a tool like KNIME means empowering data scientists, analysts, and programmers alike to help tackle big data analytics.  If you miss the webinar, please email us for a recorded version of the live event.

     

  • Hitting the East Coast this week: GigaOM Structure Big Data 2011

    Pervasive Software is a proud sponsor and exhibitor of GigaOM’s Structure Big Data 2011 taking place on March 23 in New York City. As noted on their website, the Structure Big Data conference is designed to get you up to speed on how to make money using the data already locked in your organization.

    In our booth at GigaOM Structure Big Data, we’ll be showing the latest release of Pervasive DataRush, a software framework for building high-performance applications for big data.   Organizations can develop big data applications that run faster, cost less, and use less energy – whether on a cluster or a single server, on premises or in the cloud.  Check out this video of Jim Falgout discussing the latest version of Pervasive DataRush and where it can be used.  Not only are we exhibiting, but our CTO and general manager of Integration Products, Mike Hoskins, will participate on the panel titled, “The Many Faces of Map Reduce - Hadoop and Beyond,” taking place at 1:30 p.m. EST.  Mike will be discussing parallelism and big data.  Mike is always atop the latest industry trends and will be available to meet and discuss any questions you may have.   If you’re not already registered and would like to attend,  you can save $100 off registration by using this link.  

    GigaOM is sure to be a success and we hope to see you there!

     

  • Takeaways from the O'Reilly Strata Conference

    Attending the O'Reilly Strata Conference, I received lots of food for thought about the future of Big Data, as well as further validation that Pervasive DataRush™ is a good framework to respond to many of the information explosion challenges now or soon to be facing organizations. Here are some of my takeaways from this insightful event.

    Joe Dubin
    Manager, Product Marketing
    Pervasive DataRush

    Information is Black Gold

    Metamarkets CTO Mike Driscoll told technology executives to think 'oil' when it comes to information. Driscoll, quoting Gartner's Peter Sondergaard, stresses, "Information will be the 'oil of the 21st century'. It will be the resource running our economy in ways not possible in the past." Driscoll describes Big Data as the 'tar sands' of the information economy: it contains valuable stores of information that are expensive to extract. Once extracted, the challenge is to analyze the data, using it to learn and predict.

    Driscoll sees three major forces driving Big Data:

    • Ubiquitous sensor networks (mobile phones, as an example).
    • Cloud computing obviating the need to manage compute power. Drawing an analogy to an electric grid, Driscoll says that businesses don't invest capital in power generation, and the cloud enables a similar trend for compute power.
    • Machine learning, with Driscoll citing the progress made in the DARPA grand challenge and the Netflix prize.

    Other emerging trends Driscoll has a pulse on are:

    • The Need for Data Scientists: Already in short supply, the demand for data scientists is growing. Companies are looking for those with interdisciplinary skills in math, statistics, bioinformatics, physics, programming (and hacking) skills, and, above all, curiosity. In fact, many speakers ended their presentations with a message to data scientists: We're hiring.
    • The Rise of Data Publishers (i.e., the reassertion of control by data producers): Companies recognize the value of their own data and are pulling back from third-party data processors.
    • The End of Privacy (or the Rethinking of Privacy): The view that visibility of personal data can be restricted is shifting to one inclined to restricting allowed usage of that data. In other words, policing usage will become more prevalent.
    • The Rise of Data Start-ups: A class of companies is emerging whose supply chains consist of nothing but data. Their inputs are collected through partnerships or from publicly available sources, processed, and transformed into traffic predictions, news aggregations, or real estate valuations. Data start-ups are the wildcatters of the information age, searching for opportunities across the data landscape.

    Data science, Driscoll firmly believes, can solve big problems for organizations, namely making sense of the world and scaling up decision making. As a case in point, he cites the use of data mining to reduce health care costs by identifying the neediest patients and improving their health care.

    Read more of Driscoll's commentary.

    Traditional BI and Applications are Complementary

    Dr. Barry Devlin of 9sight Consulting, a founder of the DW industry with over 30 years in DW and BI, suggests that two approaches to handling the information explosion organizations face are complementary: Traditional BI, with its database-centric approach and emphasis on consistency, traceability, and data quality, and Applications, built on technologies like Hadoop and MapReduce, with support for large, rapidly changing datasets.

    I found it interesting that Dr. Devlin presented Traditional BI and Applications as two worlds that need to work together, and can. Our product Pervasive DataRush works in both spaces: on the traditional BI side, it's the basis for our Pervasive DataMatcher and Pervasive Data Profiler products, and for applications, it integrates with Hadoop, accelerating MapReduce jobs up to 10x.

    Take a look at Dr. Devlin's slide presentation.

    The Multicore Crisis and Emerging Technologies
    One of the highlights of Strata was listening to Third Nature President Mark Madsen's survey of new technologies, particularly technology innovations and systems powering the analytic database landscape today. Madsen underscored the multicore crisis and the end of Moore's law free lunch as major factors in shaping data technology. 

    Pre-2005, Madsen says, the trend was for CPU manufacturers to increase clock rates with every new chip, and everyone's software would automatically run faster. But increasing speed also increases power consumption and heat generated. As a remedy, CPU makers moved towards putting multiple cores on a chip. Madsen, however, points out, "Putting more engines in your car doesn't make it go faster; you need to redesign to take advantage of them. Achieving multicore performance is fundamentally different than getting a free boost from clock rates increasing."  I couldn't agree more!

    Companies operating at petabyte scale like Google, eBay and Twitter are the exception, Madsen states. Most companies in need of Big Data Analytics have less than six terabytes of data, and he finds that the computational needs of data analytics are pushing companies from running on PCs to SMP servers and clusters. I would add that Pervasive DataRush can help here.

    Mark was throwing out insights faster than I could write them down... thankfully, his slides are available. I recommend taking a more in-depth look. I know I will.

  • Pervasive DataRush, Parallelism, Big Data and Hadoop are Top of Mind in Upcoming Pervasive Presentations

    January, February and March will be busy months for our technology evangelists. Pervasive DataRush Director of Product Management Davin Potts, Pervasive DataRush Chief Technologist Jim Falgout and Pervasive Software Chief Technology Officer Mike Hoskins will be speaking at major conferences over the next few weeks.

    At the O’Reilly Strata Conference (www.strataconf.com/strata2011) in San Francisco, running from February 1-3, Davin’s “Supercharge Development and Performance of Hadoop Applications” presentation will take place on February 2 at 2:30 p.m. PST in the Mission City B4 room. He will discuss how to get more done faster with high-performance MapReduce and how to expand the universe of Hadoop possibilities with tools, such as Pervasive DataRush™, that speed and simplify development and deployment of analytic applications. Pervasive Software is also exhibiting at the Strata event; you can find us at booth 501.

    For those in the Austin area, on January 29, Davin also will be speaking at DataDay Austin (www.datadayaustin.eventbrite.com). He will discuss “Reducing Complexity and Increasing Efficiency in the Land of Hadoop.” We hope you can join him.

    Pervasive Chief Technologist Jim Falgout will be addressing the European Data Innovation Summit (www.europeanintegrationsummit2011.com) audience in London at the Hilton London Tower Bridge on February 2. Jim’s presentation will be “Best Practices in Building Custom Applications for Big Data.” He will describe how the dataflow-based Pervasive DataRush framework delivers massive built-in parallelism to power applications that automatically scale up to consume the full capacity of commodity multicore servers.

    Jim also will be speaking at the KNIME User Group Meeting (www.knime.org/about/events/ugm-workshop-2011), being held in Zurich on February 28-March 4. Stay tuned for more details.

    At the GigaOM Structure Big Data conference (http://event.gigaom.com/bigdata/) on March 23 at Pier Sixty in New York City, Mike Hoskins will be part of the Hadoop panel that will be gathering at 1:30 p.m. Mike will be discussing parallelism and Big Data. Mike is always a font of industry knowledge and an astute trend predictor (and entertaining to boot).

    We look forward to seeing you, no matter the time zone, in the weeks ahead!
  • Intel’s Concurrency Checker confirms powerful scalability of Pervasive DataRush & Pervasive DataMatcher

    The Intel Concurrency Checker evaluates how well an application’s performance scales on multi-core systems and helps to further optimize it. The tool checks application threading and threading concurrency, and it can also measure the effect of specific code enhancements: run the application before and after the change and compare the measured results.

    Computed Scaling
    Computed scaling, or concurrency level, is Intel’s measure to predict the performance improvement factor when an application is run on a multi-core system, compared to a single-core system. The concurrency level is measured over a 30-second interval.  Pervasive Software recently used the Concurrency Checker to conduct software performance testing of Pervasive DataRush v4.4 running the MalStone B benchmark and Pervasive DataMatcher 5.0.
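
    As a rough illustration of what a "performance improvement factor" means, the sketch below computes the generic speedup ratio (single-core elapsed time divided by multi-core elapsed time) and the share of the available cores that ratio represents. This is the textbook notion of speedup, not Intel's actual formula for the concurrency level, which this post does not describe. The function names and the timings are hypothetical, made up purely for the example, and are not measured results.

```python
def scaling_factor(single_core_seconds, multi_core_seconds):
    """Generic speedup: how many times faster the multi-core run was."""
    if single_core_seconds <= 0 or multi_core_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return single_core_seconds / multi_core_seconds


def parallel_efficiency(factor, cores):
    """Fraction of ideal (linear) scaling achieved on the given core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return factor / cores


# Hypothetical timings chosen only for illustration; not benchmark results.
factor = scaling_factor(single_core_seconds=120.0, multi_core_seconds=10.0)
efficiency = parallel_efficiency(factor, cores=16)
```

    With these made-up numbers the factor is 12 on 16 cores, or 75% of perfectly linear scaling. A factor close to the core count is what "scaling well" means in practice.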

    Pervasive DataRush 4.4 running the MalStone B Benchmark
    MalStone benchmarks, developed by the Open Cloud Consortium, provide a method for assessing data-intensive application performance for cloud-based clusters. MalStone datasets consist of information about web site visits and cyber infection status. The benchmarks calculate the rate of infection for each site (an anomaly that might signal intrusions or attempted intrusions). 

    Because the benchmark is largely a ‘Read’ operation, the result shows Pervasive DataRush’s ability to tackle I/O-intensive data activities with phenomenal throughput.
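
    For readers unfamiliar with the benchmark, the per-site infection rate described above can be sketched in a few lines of Python. This is a simplified illustration only: the record shape (a site identifier paired with a compromised flag), the function name and the sample data are all assumptions, and the sketch does not reproduce the actual MalStone record layout, the exact MalStone B definition, or anything about how Pervasive DataRush implements it.

```python
from collections import defaultdict


def infection_rate_by_site(visits):
    """Return {site_id: fraction of visits flagged as compromised}.

    `visits` is an iterable of (site_id, compromised) pairs; this record
    shape is a hypothetical simplification, not the MalStone format.
    """
    totals = defaultdict(int)
    infected = defaultdict(int)
    for site_id, compromised in visits:
        totals[site_id] += 1
        if compromised:
            infected[site_id] += 1
    return {site: infected[site] / totals[site] for site in totals}


# Made-up sample records for illustration only.
sample_visits = [
    ("site-a", True),
    ("site-a", False),
    ("site-b", False),
    ("site-a", True),
    ("site-b", False),
]
rates = infection_rate_by_site(sample_visits)
```

    Each record is read once and contributes to two counters, which is why a workload like this is dominated by how quickly the data can be read rather than by computation.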

    System Information:
    Cores: 16
    Processor:  164 Family 6 Model 15 Stepping 11 Intel Xeon CPU E7330 at 2.4GHz
    Operating System:  Windows Server 2008 R2 Standard Edition (build 7600), 64-bit
    Sockets: 4
    Logicals: 16
     
    Pervasive DataMatcher 5.0
    On the same 16-core system, Pervasive DataMatcher 5.0’s computed measured value was amazing.

    The test serves as another proof point of the CPU- and process-intensive capabilities of Pervasive DataMatcher, which allows users to fully utilize the capacity of their multicore systems.

    System Information:
    Cores: 16
    Processor:  164 Family 6 Model 15 Stepping 11 Intel Xeon CPU E7330 at 2.4GHz
    Operating System:  Windows Server 2008 R2 Standard Edition (build 7600), 64-bit
    Sockets: 4
    Logicals: 16

    Intel Case Study

    Pervasive’s testing results further validate the computed scaling performance of our highly robust Pervasive DataRush-based products. Candidly (and no pun intended), our results were off the scale. They also serve as further evidence that the multicore reality is here and now.

    If you are a developer or other professional seeking software capable of tackling big data challenges, we encourage you to check out Pervasive DataRush yourself. The Pervasive DataRush team also invites you to read Intel’s case study of our use of the Concurrency Checker. Look for the case study soon.


    *Based on 30-second elapsed time.
