<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://cs.pervasive.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Pervasive DataRush</title><link>http://cs.pervasive.com/blogs/datarush/default.aspx</link><description>This blog is syndicated from the &lt;a href="http://www.pervasivedatarush.com/"&gt;Pervasive DataRush&lt;/a&gt; site.</description><dc:language>en</dc:language><generator>CommunityServer 2007 SP1 (Build: 20510.895)</generator><item><title>Going really fast ...</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/03/11/going-really-fast.aspx</link><pubDate>Thu, 11 Mar 2010 22:05:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43376</guid><dc:creator>jfalgout</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/03/11/going-really-fast.aspx#comments</comments><description>&lt;p&gt;In a recent &lt;a title="Would you like to go fast? Really fast?" href="http://www.thevirtualcircle.com/2010/03/would-you-like-to-go-fast-really-fast/"&gt;blog post&lt;/a&gt;, Robin Bloor discusses how parallelism is required for software to go &lt;em&gt;&lt;strong&gt;really fast&lt;/strong&gt;&lt;/em&gt; on today&amp;#39;s multicore computers. He makes this point about &lt;a title="MapReduce Definition" href="http://en.wikipedia.org/wiki/MapReduce"&gt;MapReduce&lt;/a&gt;: using MapReduce on problems it wasn&amp;#39;t intended to solve is &amp;quot;... &lt;em&gt;like playing golf with a single club&lt;/em&gt;&amp;quot;. I&amp;#39;d like to expound a bit on this analysis.&lt;/p&gt;
&lt;p&gt;MapReduce was most famously implemented by Google to meet its need to index the World Wide Web. Quite an undertaking! And MapReduce proved critical to its success. The MapReduce programming model fits perfectly with the problem of finding words within documents and creating indices for later (very fast) lookup.&lt;/p&gt;
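&lt;p&gt;To make that fit concrete, here is a minimal, single-process Python sketch of the MapReduce pattern building an inverted index, which maps each word to the documents that contain it. It illustrates only the programming model. The function names are hypothetical, and a real MapReduce framework such as Hadoop runs the map, shuffle and reduce steps in parallel across many machines.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, doc_id) pair for every word in one document."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Reduce: collapse all postings for one word into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run the map phase, group intermediate pairs by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, posting in map_phase(doc_id, text):
            grouped[word].append(posting)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


index = build_index({1: "big data on multicore", 2: "Big clusters"})
print(index["big"])  # [1, 2]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each step is a function over key/value pairs, which makes the model easy to distribute. It also explains why a multi-step analysis has to be expressed as a chain of such jobs.&lt;/p&gt;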
&lt;p&gt;However, the MapReduce programming model can be limiting when applied to other, more complex problems. Many deep data analysis algorithms require multiple complex steps to produce their output. In these cases, a more general-purpose programming paradigm is a better, more efficient fit. Hence Robin&amp;#39;s analogy to &amp;quot;... &lt;em&gt;playing golf with a single club&lt;/em&gt;&amp;quot;.&lt;/p&gt;
&lt;p&gt;Robin goes on to discuss &lt;a title="Pervasive DataRush" href="http://www.pervasivedatarush.com/"&gt;DataRush&lt;/a&gt; and the capabilities it brings to bear. Because it is based on a &lt;a title="Dataflow Programming" href="http://en.wikipedia.org/wiki/Dataflow_programming"&gt;dataflow&lt;/a&gt; architecture, DataRush has a programming model that is much more flexible and general-purpose than MapReduce. I wouldn&amp;#39;t use DataRush to index all the content of the internet, but it has proven to be an excellent tool for general data processing and data mining. And it can use all of the cores available on today&amp;#39;s multicore systems. To put this into perspective, we have &lt;a title="Cluster in a box" href="http://cs.pervasive.com/blogs/datarush/archive/2010/03/05/cluster-on-a-chip.aspx"&gt;benchmarks&lt;/a&gt; showing a terabyte an hour (and even better) of network log data processed for a cyber security application on a single box.&lt;/p&gt;
&lt;p&gt;So does playing golf with only one club mean you can&amp;#39;t play golf? Of course not. But it does mean you can&amp;#39;t play as well or as efficiently as if you used all the clubs at your disposal. The flexibility of DataRush allows you to utilize a full programming paradigm especially suited to &lt;strong&gt;big data&lt;/strong&gt; problems.&lt;/p&gt;
&lt;p&gt;Robin has a knack for the turn of a phrase. Check him out on &lt;a title="Robin on Twitter" href="http://twitter.com/robinbloor"&gt;Twitter&lt;/a&gt;. He also has&amp;nbsp;a very funny (and informative) book out called &amp;quot;&lt;a title="Words You Don&amp;#39;t Know" href="http://www.wordsyoudontknow.com/book/show-me-the-book/the-book-words-you-dont-know/"&gt;Words You Don&amp;#39;t Know&lt;/a&gt;&amp;quot;. I especially like the chapter on swear words. &lt;img src="http://cs.pervasive.com/emoticons/emotion-1.gif" alt="Smile" /&gt;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43376" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Robin+Bloor+parallelism+DataRush+MapReduce+Google+algorithms+big+data+benchmarks+data+mining+multicore+Java/default.aspx">Robin Bloor parallelism DataRush MapReduce Google algorithms big data benchmarks data mining multicore Java</category></item><item><title>HIMSS 2010 - DataRush in Health IT making a difference in patient care</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/03/11/himss-2010-datarush-in-health-it-hit-making-a-difference-in-patient-care.aspx</link><pubDate>Thu, 11 Mar 2010 14:13:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43370</guid><dc:creator>n5712036</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/03/11/himss-2010-datarush-in-health-it-hit-making-a-difference-in-patient-care.aspx#comments</comments><description>&lt;p style="MARGIN:0in 0in 0pt;" class="Default"&gt;&lt;font size="3" face="Trebuchet MS"&gt;The &lt;/font&gt;&lt;a href="http://www.pervasivedatarush.com/"&gt;&lt;font size="3" face="Trebuchet MS"&gt;DataRush&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Trebuchet MS"&gt; team just returned from attending the &lt;/font&gt;&lt;a href="http://www.himssconference.org/"&gt;&lt;b 
style="mso-bidi-font-weight:normal;"&gt;&lt;font color="#800080" size="3" face="Trebuchet MS"&gt;HIMSS 2010&lt;/font&gt;&lt;/b&gt;&lt;/a&gt;&lt;font size="3" face="Trebuchet MS"&gt; conference in Atlanta (March 1-4&lt;sup&gt;th&lt;/sup&gt;).&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 0pt;" class="Default"&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 0pt;" class="Default"&gt;&lt;a href="http://www.himss.org/"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;font color="#800080" size="3" face="Trebuchet MS"&gt;HIMSS&lt;/font&gt;&lt;/b&gt;&lt;/a&gt;&lt;font size="3" face="Trebuchet MS"&gt; is traditionally a large conference. As massive as the Atlanta convention center is, HIMSS once again filled it this year. Recorded attendance on March 2nd was 27,451, including 13,530 exhibitors. This year’s theme was “&lt;b style="mso-bidi-font-weight:normal;"&gt;Health IT&lt;/b&gt; (HIT) making a difference in patient care”. The &lt;/font&gt;&lt;a href="http://www.pervasivedatarush.com/"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;font size="3" face="Trebuchet MS"&gt;DataRush&lt;/font&gt;&lt;/b&gt;&lt;/a&gt;&lt;font size="3" face="Trebuchet MS"&gt; team set up camp at booth #3325. There it met with presenters, vendors and attendees to discuss large-data pain points and the synergies between interoperable health IT and the DataRush parallelism engine.&lt;br /&gt;&lt;br /&gt;&lt;/font&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Day 1: &lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp;&lt;/span&gt;We started early with a lunch briefing hosted by a global leader in IT services. The architect of the largest healthcare informatics data warehouse shared his experiences with content management and searchable health information. Performance bottlenecks are common in data loading and consumption for data warehousing (DW) and BI. 
Additionally, consolidating data from up to 40 member companies created challenges for the speed and accuracy of data de-duplication.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;For DW loading, Pervasive DataRush offers a platform that leverages multicore processors to deliver very high data throughput.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;For data quality and consolidation, &lt;/span&gt;&lt;a href="http://www.pervasivedatarush.com/solutions/Pages/DataMatcher.aspx"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;Pervasive DataMatcher&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; combines the DataRush parallelism engine with sophisticated parametric string-matching algorithms to provide high precision (fidelity) and recall (completeness) in fuzzy matching.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;/span&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Once the exhibit hall opened mid-morning, the &lt;b style="mso-bidi-font-weight:normal;"&gt;DataRush&lt;/b&gt; booth featured live demos of the Pervasive DataRush Analytics plug-ins for &lt;strong&gt;KNIME&lt;/strong&gt;.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;a href="http://www.knime.org/"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;KNIME&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; (Konstanz Information Miner) is an open-source data mining tool.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;The upcoming DataRush 5.0 release provides KNIME nodes that offer DataRush-powered versions of several data mining algorithms. The KNIME interface provides a user-friendly, workflow-style environment in which data mining practitioners can rapidly orchestrate and deploy predictive analyses. 
Healthcare system integrators delivering &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Electronic_medical_record"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;EMR&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; and &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Electronic_health_record"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;EHR&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;-based decision support are turning to predictive analytics and data mining. The speed and accuracy of Pervasive DataRush, combined with the KNIME interface, enable fast processing of, and responses from, terabyte-scale heterogeneous data repositories.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;/span&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Day 2:&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;We were booked with back-to-back briefings on specialized areas of healthcare data exchange, interoperability, network support and cyber security. Healthcare data exchange deals with transformations between end-user formats and standards such as ANSI EDI and HL7.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;These transformations are handled by ETL tools.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;DataRush-powered schema-to-schema transformations support healthcare standards as defined by the &lt;/span&gt;&lt;font face="Calibri"&gt;&lt;b&gt;&lt;span style="FONT-SIZE:12pt;mso-bidi-font-size:10.0pt;"&gt;American National Standards Institute&lt;/span&gt;&lt;/b&gt;&lt;span style="FONT-SIZE:12pt;mso-bidi-font-size:10.0pt;"&gt; (&lt;/span&gt;&lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/American_National_Standards_Institute"&gt;&lt;span style="FONT-SIZE:12pt;mso-bidi-font-size:10.0pt;"&gt;&lt;font color="#800080" face="Calibri"&gt;ANSI&lt;/font&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="FONT-SIZE:12pt;mso-bidi-font-size:10.0pt;"&gt;&lt;font face="Calibri"&gt;)&lt;/font&gt;&lt;/span&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;/span&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;In cyber security, the Pervasive DataRush team presented results from a benchmark study based on the &lt;/span&gt;&lt;a href="http://rgrossman.com/2009/05/25/malstone-benchmark/"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;MalStone&lt;/font&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; benchmark for web exploits. The DataRush implementation of MalStone processes close to 1 TB of web log files per hour. Furthermore, &lt;b style="mso-bidi-font-weight:normal;"&gt;DataRush is 12 times faster on 19 fewer computers&lt;/b&gt; than a MalStone implementation using &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Mapreduce"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;MapReduce&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; on a 20-node cluster. Customers requiring enhanced cyber security discussed their bottlenecks in vulnerability assessments. 
An important task in cyber security involves analyzing large volumes of network data against databases such as Common Vulnerabilities and Exposures (&lt;b style="mso-bidi-font-weight:normal;"&gt;CVE&lt;/b&gt;) software flaws and Common Configuration Enumeration (&lt;b style="mso-bidi-font-weight:normal;"&gt;CCE&lt;/b&gt;) misconfigurations. The findings are then scored with the vulnerability severity scoring system used by the National Institute of Standards and Technology (&lt;b style="mso-bidi-font-weight:normal;"&gt;NIST&lt;/b&gt;).&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;The Common Vulnerability Scoring System (&lt;/span&gt;&lt;a href="http://nvd.nist.gov/cvss.cfm"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;CVSS&lt;/font&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;) provides an open framework for communicating the characteristics and impacts of IT vulnerabilities. DataRush offers data parallelism, in which subsets of the network data are read concurrently in blocks and processed. 
Here, processing means concurrently computing the CVSS base-metric values for “AccessVector”, “AccessComplexity”, “Authentication”, “ConfidentialityImpact”, “IntegrityImpact” and “AvailabilityImpact”. The final step in the processing pipeline aggregates these values into the CVSS “BaseScore”.&lt;/span&gt;&lt;/p&gt;
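&lt;p&gt;The base-score aggregation described above follows the CVSS version 2 base equations. The Python sketch below is illustrative only and is not the DataRush implementation. It scores one finding from its six base metrics using the weights published in the CVSS v2 specification. The function name and the use of one-letter metric codes per argument are assumptions. Each record is scored independently, so a data-parallel engine can apply the same function to many blocks of records at once.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# CVSS v2 base-metric weights, as published in the CVSS v2 specification.
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}     # Local, Adjacent network, Network
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}  # High, Medium, Low
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}    # Multiple, Single, None
IMPACT = {"N": 0.0, "P": 0.275, "C": 0.660}            # None, Partial, Complete


def cvss2_base_score(av, ac, au, c, i, a):
    """Compute the CVSS v2 BaseScore for one finding from its six base metrics."""
    impact = 10.41 * (1 - (1 - IMPACT[c]) * (1 - IMPACT[i]) * (1 - IMPACT[a]))
    exploitability = 20 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]
    if impact == 0:
        return 0.0  # f(Impact) is 0, so the base score is 0
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * 1.176, 1)


# Each finding is scored independently, which is what makes this step data-parallel.
findings = [("N", "L", "N", "C", "C", "C"), ("N", "L", "N", "P", "P", "P")]
print([cvss2_base_score(*f) for f in findings])  # [10.0, 7.5]
```
&lt;/code&gt;&lt;/pre&gt;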
&lt;p&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Regarding network support and interoperability, discussions centered on DataRush implementations of parallel encryption and decryption algorithms. They also covered baseline change-detection algorithms for detecting anomalies in network log files. &lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp;&lt;/span&gt;These are novel applications of DataRush, driven entirely by customer need.&lt;/span&gt;&lt;/p&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Day 3:&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;Our booth saw its heaviest traffic of the conference.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;Fraud, waste and abuse in healthcare came up frequently among vendors and attendees.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;The Pervasive DataRush team gave a live demo of the &lt;/span&gt;&lt;a href="http://en.wikipedia.org/wiki/Benfords_law"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;Benford’s Law&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; Fraud Detection operator, recently added to the Pervasive DataRush-Analytics library and the DataRush KNIME plug-ins package. The implementation and application of this operator are explained in detail in my blog post titled “Finding a needle in a hay stack”; click &lt;/span&gt;&lt;a href="http://cs.pervasive.com/search/SearchResults.aspx?u=227845&amp;amp;o=DateDescending"&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;&lt;font color="#800080"&gt;here&lt;/font&gt;&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; &lt;/span&gt;&lt;/b&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;to read it.&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;Other topics of interest at &lt;b style="mso-bidi-font-weight:normal;"&gt;HIMSS&lt;/b&gt; included image processing and image content management. 
Image management solutions today use picture archiving and communication systems (PACS).&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;b style="mso-bidi-font-weight:normal;"&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;PACS&lt;/span&gt;&lt;/b&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt; is a combination of hardware and software dedicated to the short- and long-term storage, retrieval, management, distribution and presentation of images. Image processing pipelines contain many inefficiencies that could benefit from &lt;b style="mso-bidi-font-weight:normal;"&gt;DataRush&lt;/b&gt; parallelism. Higher-quality images, more efficient packaging and faster delivery can lead to earlier diagnosis and characterization of disease, with the potential to improve patient outcomes and reduce costs.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="FONT-FAMILY:&amp;#39;Trebuchet MS&amp;#39;,&amp;#39;sans-serif&amp;#39;;COLOR:black;FONT-SIZE:12pt;mso-bidi-font-family:&amp;#39;Trebuchet MS&amp;#39;;"&gt;To summarize, we have found healthcare IT to be a great fit for the DataRush parallelism engine and its analytics operator library. 
HIMSS was a successful event for the Pervasive DataRush team in terms of leads generated, networking opportunities and the flow of ideas for applying &lt;b style="mso-bidi-font-weight:normal;"&gt;DataRush&lt;/b&gt; to healthcare operations and analytics.&lt;/span&gt; 
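&lt;p&gt;The Benford’s Law operator demonstrated on Day 3 rests on a simple observation. In many naturally occurring sets of numbers, the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1 and fewer than 5% start with 9. The Python sketch below is not the operator itself, whose implementation is described in the post linked above. It is a minimal illustration that compares observed leading-digit counts with Benford’s expectation using a chi-square statistic. The function names and the 15.5 threshold are illustrative choices.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math
from collections import Counter


def leading_digit(x):
    """First significant digit of a nonzero number (0.0123 formats as "1.230000e-02")."""
    return int(f"{abs(x):e}"[0])


def benford_chi_square(values):
    """Chi-square statistic comparing observed leading digits with Benford's Law.

    Benford's Law expects digit d to lead with probability log10(1 + 1/d).
    With 8 degrees of freedom, values above about 15.5 are significant at the
    5% level and flag the data for a closer look.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    n = len(digits)
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts[d] - expected) ** 2 / expected
    return chi2


natural = [2 ** k for k in range(1, 1001)]  # powers of two follow Benford closely
uniform = list(range(1, 10)) * 100          # every leading digit equally often
print(benford_chi_square(natural) > 15.5, benford_chi_square(uniform) > 15.5)  # False True
```
&lt;/code&gt;&lt;/pre&gt;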
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43370" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush/default.aspx">DataRush</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Data+Matching/default.aspx">Data Matching</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Predictive+Analytics/default.aspx">Predictive Analytics</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/multicore+revolution/default.aspx">multicore revolution</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Multicore/default.aspx">Multicore</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Datarush+team/default.aspx">Datarush team</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/accurate/default.aspx">accurate</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/accuracy/default.aspx">accuracy</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data+mining/default.aspx">data mining</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/dataflow/default.aspx">dataflow</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java/default.aspx">java</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Pervasive/default.aspx">Pervasive</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java+applications/default.aspx">java applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/building+block+of+predictive+analytics/default.aspx">building block of predictive analytics</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+engine/default.aspx">DataRush engine</category><category 
domain="http://cs.pervasive.com/blogs/datarush/archive/tags/dataflow+model/default.aspx">dataflow model</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/bottleneck/default.aspx">bottleneck</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/predictive+health+application/default.aspx">predictive health application</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+library/default.aspx">DataRush library</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+applications/default.aspx">DataRush applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/scalable/default.aspx">scalable</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data-intensive+applications/default.aspx">data-intensive applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Pervasive+Software/default.aspx">Pervasive Software</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/dataflow+MPI+map_2F00_reduce+teraflops+terabytes/default.aspx">dataflow MPI map/reduce teraflops terabytes</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java+virtual+machine/default.aspx">java virtual machine</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Java+performance/default.aspx">Java performance</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/algorithms/default.aspx">algorithms</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/datarush-analytics/default.aspx">datarush-analytics</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data+parallelism/default.aspx">data parallelism</category></item><item><title>Cluster in a 
box</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/03/05/cluster-on-a-chip.aspx</link><pubDate>Fri, 05 Mar 2010 14:29:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43320</guid><dc:creator>jfalgout</dc:creator><slash:comments>1</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/03/05/cluster-on-a-chip.aspx#comments</comments><description>&lt;p&gt;Processors continue to become more &amp;quot;core dense&amp;quot; as we approach a watershed in multicore development. We&amp;#39;ve reached the point where a 4-processor system takes on the characteristics of a &amp;quot;cluster in a box&amp;quot;. Mike Hoskins, our CTO at Pervasive, has dubbed March 2010 the &amp;quot;Multicore Month of March&amp;quot;, a clever alliteration alluding to the latest processor announcements from Intel and AMD. Intel is launching its &lt;a title="Nehalem EX Announcement" href="http://www.intel.com/pressroom/archive/releases/2009/20090526comp.htm"&gt;Nehalem EX&lt;/a&gt; with 8 cores, and AMD its Magny-Cours processors with up to 12 cores. This AMD &lt;a href="http://blogs.amd.com/work/tag/magny-cours/"&gt;blog&lt;/a&gt; outlines a contest to put an AMD 48-core system to good use; the winner will receive a free 48-core box.&lt;/p&gt;
&lt;p&gt;But what to do with all those cores? At &lt;a title="Pervasive Software" href="http://www.pervasive.com/"&gt;Pervasive&lt;/a&gt;, we&amp;#39;ve been developing a platform called &lt;a title="Pervasive DataRush" href="http://www.pervasivedatarush.com/"&gt;DataRush&lt;/a&gt; that helps developers of all skill levels write high-performance, scalable Java code. Based on a dataflow architecture, the underlying framework has a very simple programming paradigm. Because the framework takes a shared-nothing approach, developers are freed from managing hundreds of threads, worrying about synchronization and deadlock, or dealing with other parallel programming headaches.&lt;/p&gt;
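&lt;p&gt;To illustrate the dataflow idea in miniature, the Python sketch below wires independent operators together with bounded queues. It is not the DataRush API, and every name in it is hypothetical. Each operator runs on its own thread and touches nothing but its input and output queues, so the application code needs no locks. The bounded queues also provide back-pressure when a downstream operator falls behind.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import queue
import threading

END = object()  # end-of-stream marker passed along the flow


def source(items, out_q):
    """Operator 1: push every input record downstream, then signal end of stream."""
    for item in items:
        out_q.put(item)
    out_q.put(END)


def transform(fn, in_q, out_q):
    """Operator 2: read, transform and forward records; it shares nothing but its queues."""
    while True:
        item = in_q.get()
        if item is END:
            out_q.put(END)
            return
        out_q.put(fn(item))


def run_flow(items, fn, capacity=64):
    """Wire source -> transform -> sink with bounded queues (bounded means back-pressure)."""
    q1, q2 = queue.Queue(maxsize=capacity), queue.Queue(maxsize=capacity)
    workers = [
        threading.Thread(target=source, args=(items, q1)),
        threading.Thread(target=transform, args=(fn, q1, q2)),
    ]
    for w in workers:
        w.start()
    results = []  # the sink runs on the calling thread
    while True:
        item = q2.get()
        if item is END:
            break
        results.append(item)
    for w in workers:
        w.join()
    return results


print(run_flow(range(5), lambda x: x * x))  # [0, 1, 4, 9, 16]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because of the global interpreter lock, Python threads will not deliver the multicore speedups described here. The sketch shows only the shared-nothing structure that a dataflow engine can spread across cores.&lt;/p&gt;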
&lt;p&gt;We recently came across a set of cluster-based benchmarks called MalStone. The MalStone benchmark was developed by the Open Cloud Consortium (&lt;a href="http://www.opencloudconsortium.com/"&gt;www.opencloudconsortium.com&lt;/a&gt;). For additional information, see &lt;a href="http://www.opencloudconsortium.org/benchmarks"&gt;www.opencloudconsortium.org/benchmarks&lt;/a&gt;. Even though these benchmarks are targeted at clusters, we decided to build a DataRush application for the MalStone B benchmark and run it on a 24-core system. The results were astounding!&lt;/p&gt;
&lt;p&gt;The MalStone B benchmark uses a 10-billion-row log file of site-entity records, nearly 1 terabyte in size. For each site w and each week d, the benchmark computes a ratio over all entities that visited the site. That ratio is the percentage of visits for which the entity became compromised at any time between the visit and the end of week d. This type of processing is typical of &lt;strong&gt;cyber security&lt;/strong&gt; applications.&lt;/p&gt;
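&lt;p&gt;A minimal, single-process Python sketch of the MalStone B computation follows. It is not our DataRush application. It assumes each log record has already been reduced to a (site, entity, visit week) tuple and that each compromised entity has a single known compromise week. It also works at week granularity. The function and parameter names are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict


def malstone_b(visits, compromise_week, num_weeks):
    """Simplified MalStone B: one ratio per (site, week), at week granularity.

    visits: iterable of (site, entity, visit_week) tuples (assumed record layout).
    compromise_week: dict mapping a compromised entity to the week it became
        compromised (entities never compromised are simply absent).
    Returns {(site, d): fraction of visits to the site up to week d whose
    entity became compromised between the visit and the end of week d}.
    """
    totals = defaultdict(int)  # visits to the site made in or before week d
    hits = defaultdict(int)    # those visits followed by a compromise by the end of week d
    for site, entity, visit_week in visits:
        cw = compromise_week.get(entity)
        # A visit in week v can only count toward weeks d = v, v + 1, ...
        for d in range(visit_week, num_weeks):
            totals[(site, d)] += 1
            if cw is not None and cw >= visit_week and d >= cw:
                hits[(site, d)] += 1
    return {key: hits[key] / totals[key] for key in totals}


visits = [("A", "e1", 0), ("A", "e2", 0), ("B", "e1", 1)]
ratios = malstone_b(visits, {"e1": 1}, num_weeks=2)
print(ratios[("A", 1)])  # 0.5
```
&lt;/code&gt;&lt;/pre&gt;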
&lt;p&gt;For our testing at Pervasive, we used a single machine with 4 &lt;a title="AMD Opteron 6-core Processors" href="http://www.amd.com/us/products/server/processors/six-core-opteron/Pages/six-core-opteron.aspx"&gt;AMD 8435&lt;/a&gt; six-core processors running at 2.6 GHz, for a total of 24 cores. The total memory capacity is 64 gigabytes. The I/O system is a RAID array of 5 SATA drives. Although the machine has a large memory capacity, the test generally used only about 9 gigabytes for the JVM.&lt;/p&gt;
&lt;p&gt;Using this system, we achieved a &lt;strong&gt;68&lt;/strong&gt;-minute runtime for the MalStone B benchmark. That&amp;#39;s approaching a terabyte an hour of processing on a single-node, inexpensive server-class machine. Contrast this with the same test run on a 20-node cluster of 4-core boxes using Hadoop: the MalStone B runtime for the Hadoop cluster is &lt;strong&gt;840&lt;/strong&gt; minutes. DataRush on a single machine is &lt;strong&gt;12 times faster&lt;/strong&gt; than the small cluster! To be fair, the processors in the cluster nodes are not directly comparable to the ones in the DataRush test box. Still, this test shows what multicore-aware software can do. A single inexpensive server-class machine running software like DataRush can outperform a small distributed cluster on this workload. Cluster in a box in action!&lt;/p&gt;
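&lt;p&gt;The headline figures follow from simple arithmetic on the runtimes above. This sketch treats the input as 1 TB. Since the file only approaches a terabyte, the real throughput is slightly lower than the printed value. The function name is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def summarize(data_tb, datarush_minutes, hadoop_minutes, hadoop_nodes):
    """Throughput and speedup implied by the MalStone B runtimes."""
    throughput = data_tb * 60 / datarush_minutes  # terabytes per hour
    speedup = hadoop_minutes / datarush_minutes   # single box vs. cluster
    fewer_machines = hadoop_nodes - 1             # one server vs. the whole cluster
    return throughput, speedup, fewer_machines


tput, speedup, fewer = summarize(1.0, 68, 840, 20)
print(f"{tput:.2f} TB/hour, {speedup:.1f}x faster, {fewer} fewer machines")
# 0.88 TB/hour, 12.4x faster, 19 fewer machines
```
&lt;/code&gt;&lt;/pre&gt;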
&lt;p&gt;This is very exciting in the context of the next generation of multicore processors due out in the &amp;quot;Multicore Month of March&amp;quot;. We look forward to running this and other benchmarks on these new systems. Check back soon for updated numbers. I have a sneaking suspicion that we&amp;#39;ll easily break through the terabyte-an-hour threshold with the MalStone benchmark on the latest Intel and AMD systems.&lt;/p&gt;
&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43320" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Java+multicore+DataRush+Malstone+cluster+performance+scalable+benchmark+cyber+security/default.aspx">Java multicore DataRush Malstone cluster performance scalable benchmark cyber security</category></item><item><title>PMML validation</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/02/17/pmml-validation.aspx</link><pubDate>Wed, 17 Feb 2010 17:06:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43167</guid><dc:creator>wtian</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/02/17/pmml-validation.aspx#comments</comments><description>&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Predictive Model Markup Language (&lt;a href="http://www.dmg.org/index.html"&gt;PMML&lt;/a&gt;) is the leading standard for statistical and data mining models. A PMML document is an XML document, with a root element of type PMML, that describes one or more data mining models.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Our Pervasive DataRush-Analytics project provides the following data mining models: AssociationModel, NaiveBayesModel, and RegressionModel. &lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp;&lt;/span&gt;The PMML generated from these models can be shared and exchanged from one environment to another, but&amp;nbsp;the PMML needs to be validated against the schema to find any problems that may need to be fixed.&amp;nbsp; &lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;To guarantee validation, the Pervasive DataRush-Analytics model uses both&lt;/font&gt;&lt;font size="3" face="Calibri"&gt; XSD validation and XSLT validation as recommended by &lt;a href="http://www.dmg.org/v3-2/Interoperability.html"&gt;data mining group&lt;/a&gt;.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;First step:&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;XSD Validation :&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Get the &lt;a href="http://www.dmg.org/current/pmml.xsd"&gt;PMML XSD 3.2 schema&lt;/a&gt; &lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Here is an example of validating PMML file against PMML XSD schema:&lt;/font&gt;&lt;/p&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;public&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;void&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; &lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;pmml&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;XSDValidate(String schemaPath, String sourcePath) {&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;try&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; {&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;SchemaFactory factory = SchemaFactory.&lt;i&gt;newInstance&lt;/i&gt;(XMLConstants.&lt;/font&gt;&lt;i&gt;&lt;font color="#0000c0" size="2"&gt;&lt;font color="#0000c0" size="2"&gt;W3C_XML_SCHEMA_NS_URI&lt;/i&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;);&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Source schemaFile = &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; StreamSource(&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; File(schemaPath));&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;p&gt;Schema schema = factory.newSchema(schemaFile);&lt;/p&gt;
&lt;p&gt;Validator validator = schema.newValidator();&lt;/p&gt;
&lt;p&gt;validator.validate(&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; StreamSource(sourcePath));&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;&lt;/font&gt;&lt;/font&gt;&lt;font face="Times New Roman"&gt;&lt;font size="2"&gt;} &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;catch&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;font face="Times New Roman"&gt; (SAXException e) {&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;..........&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;/font&gt;&lt;font face="Times New Roman"&gt;&lt;font size="2"&gt;} &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;catch&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;font face="Times New Roman"&gt; (IOException e) {&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;..........&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;font face="Times New Roman"&gt;}&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;/blockquote&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/font&gt;
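&lt;p&gt;Without an error handler, a Validator stops at the first schema violation. When fixing a generated PMML file it can help to see every problem at once. The following sketch (a hypothetical helper class, not part of Pervasive DataRush-Analytics) installs a SAX ErrorHandler that records each warning and error with its line number and lets validation continue.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: reports every schema violation instead of only the first. */
final class PmmlXsdReport {

    private PmmlXsdReport() {}

    /** Returns one line per problem found; an empty string means the document is schema-valid. */
    static String validate(String schemaPath, String sourcePath) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File(schemaPath));
        Validator validator = schema.newValidator();
        StringBuilder problems = new StringBuilder();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { record("warning", e); }
            @Override public void error(SAXParseException e) { record("error", e); }
            // Fatal errors (e.g. malformed XML) cannot be recovered from, so rethrow after recording.
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                record("fatal", e);
                throw e;
            }
            private void record(String level, SAXParseException e) {
                problems.append(level).append(" at line ").append(e.getLineNumber())
                        .append(": ").append(e.getMessage()).append('\n');
            }
        });
        validator.validate(new StreamSource(new File(sourcePath)));
        return problems.toString();
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Calling code can treat an empty result as schema-valid; a fatal error such as malformed XML is still thrown as an exception.&lt;/p&gt;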
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;XSD validation is a necessary part, but not sufficient by itself for determining if a PMML model is valid.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Second step: XSLT Validation:&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Get the &lt;a href="http://www.dmg.org/current/pmml.xslt"&gt;PMML XSLT style sheet&lt;/a&gt; &lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Here is an example of XSLT validation.&lt;/font&gt;&lt;/p&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;public&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;void&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; &lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;pmml&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;XSLTvalidate(String s&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;tylesheet&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Path, String sourcePath, String &lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;result&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Path) {&lt;/font&gt;&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;try&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; {&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;
&lt;p&gt;DocumentBuilderFactory docFactory = DocumentBuilderFactory.&lt;i&gt;newInstance&lt;/i&gt;();&lt;/p&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;//This setting will ignore the namespace &lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;font face="Times New Roman"&gt;&lt;font size="2"&gt;docFactory.setNamespaceAware(&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;false&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;&lt;font face="Times New Roman"&gt;);&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;p&gt;DocumentBuilder parser = docFactory.newDocumentBuilder();&lt;/p&gt;Document document = parser.parse(&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; FileInputStream(sourcePath));&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;p&gt;Source &lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;pmml&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Source = &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; DOMSource(document);&lt;/p&gt;Source xsltSource = &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; StreamSource(&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; FileInputStream(s&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;tylesheet&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Path));&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;p&gt;TransformerFactory transFact&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;ory&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; = TransformerFactory.&lt;i&gt;newInstance&lt;/i&gt;();&lt;/p&gt;Transformer trans&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;former&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; = transFact&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;ory&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;.newTransformer(xsltSource);&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;p&gt;trans&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;former&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; .transform(&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;pmml&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Source , &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;new&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; StreamResult(&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;result&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;Path));&lt;/p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;
&lt;p&gt;//check result after transformation&lt;/p&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;
&lt;p&gt;&lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2" face="Courier New"&gt;&lt;font color="#7f0055" size="2" face="Courier New"&gt;&lt;font color="#7f0055" size="2" face="Courier New"&gt;......................&lt;/p&gt;&lt;/blockquote&gt;&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;} &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;catch&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; (TransformerConfigurationException e) {&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;..............&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;} &lt;/font&gt;&lt;b&gt;&lt;font color="#7f0055" size="2"&gt;&lt;font color="#7f0055" size="2"&gt;catch&lt;/b&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; (TransformerException e) {&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt; 
&lt;blockquote&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;..............&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;
&lt;p&gt;&lt;/font&gt;&lt;font size="2" face="Courier New"&gt;&lt;font size="2" face="Courier New"&gt;}&lt;/p&gt;&lt;/blockquote&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="2"&gt;
&lt;p&gt;}&lt;/p&gt;&lt;/font&gt;
&lt;p&gt;Problems may still exist even in PMML that passes both checks, but running them makes such problems much less likely.&amp;nbsp; Once validated, Pervasive DataRush-Analytics models can be used to analyze your business data and predict customer needs.&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43167" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data+mining/default.aspx">data mining</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/XSD+schema+validation/default.aspx">XSD schema validation</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/validate+XML/default.aspx">validate XML</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/datarush-analytics/default.aspx">datarush-analytics</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/PMML+validation/default.aspx">PMML validation</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/XSLT+stylesheet+transformation/default.aspx">XSLT stylesheet transformation</category></item><item><title>DataRush Video Analytics</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/02/17/datarush-in-video-analytics.aspx</link><pubDate>Wed, 17 Feb 2010 14:42:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43163</guid><dc:creator>n5712036</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/02/17/datarush-in-video-analytics.aspx#comments</comments><description>&lt;p&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;&lt;b&gt;Digital video&lt;/b&gt; has become the face of television, the internet and mobile devices.&lt;span&gt;&amp;nbsp; &lt;/span&gt;&lt;span style="COLOR:black;"&gt;According to an official YouTube blog post (May 2009), about 20 hours of video are uploaded to the YouTube site
every minute of real time.&lt;span&gt;&amp;nbsp; &lt;/span&gt;This is equivalent to Hollywood releasing over 114,000 new full-length movies into theaters each week! &lt;/span&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;But digital video also plays a huge role in biomedical devices, surveillance, and manufacturing quality &lt;span style="COLOR:black;"&gt;assurance.&lt;/span&gt;&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;Did you know&lt;span style="COLOR:black;"&gt; there are approximately eight million users sharing 10 petabytes of data (mostly media files) at any given time? This accounts for nearly 10% of the worldwide internet broadband connections [1].&lt;span&gt;&amp;nbsp; &lt;/span&gt;So how can near real-time actionable intelligence be gleaned from the vast amounts of video data being generated? One answer is to exploit the power of emerging commodity multicore computers. When used properly, each core can be used for individual threads of computations, but new software applications will need to be developed to make this happen.&amp;nbsp; &lt;span&gt;&lt;/span&gt;&lt;/span&gt;Today there is a parallel programming gap between multicore systems and software applications.&lt;span&gt;&amp;nbsp; &lt;/span&gt;With the end of the uni-processor performance gains, the average software developer will have to implement parallel programs to maintain performance growth.&lt;span&gt;&amp;nbsp; &lt;/span&gt;The goal in parallel computing is to perform multiple calculations simultaneously.&amp;nbsp; &lt;/font&gt;&lt;/font&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;The &lt;a href="http://www.pervasivedatarush.com/developers/Pages/WhatisPDR.aspx" target="_blank"&gt;Pervasive DataRush&lt;/a&gt;™ (DataRush) platform exploits multiple forms of parallelism facilitating concurrency in &lt;a title="video processing" href="http://en.wikipedia.org/wiki/Video_processing" target="_blank"&gt;&lt;b&gt;video processing&lt;/b&gt;&lt;/a&gt; and &lt;a title="video analytics" href="http://en.wikipedia.org/wiki/Video_analytics" target="_blank"&gt;&lt;b&gt;video analytics&lt;/b&gt;&lt;/a&gt; from spatial-temporal partitioning and down to the pixel level.&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;There are several paths to parallelism given languages and programming frameworks available today, but &lt;/font&gt;&lt;/font&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;a very common path to parallelism today is data parallelism.&lt;span&gt;&amp;nbsp; Data parallelism is a&lt;/span&gt; simple divide-and-conquer technique emerged from &lt;a href="http://en.wikipedia.org/wiki/SPMD" target="_blank"&gt;SPMD &lt;/a&gt;(single program, multiple data) where data is partitioned and distributed over multiple workers (nodes on a cluster, vm’s on a cloud) each running the same program.&lt;span&gt;&amp;nbsp; &lt;/span&gt;&lt;a href="http://hadoop.apache.org/" target="_blank"&gt;Hadoop&lt;/a&gt;, an open source version of &lt;a title="MapReduce" href="http://en.wikipedia.org/wiki/Map_reduce" target="_blank"&gt;&lt;b&gt;MapReduce&lt;/b&gt;&lt;/a&gt;, logically partitions the data and allocates one map task, called a Mapper, per partition. There may be hundreds of Mappers on a single machine. A single threaded legacy application can be deployed to subsets of a large scale datasets on a cloud or grid environment.&lt;span&gt;&amp;nbsp; &lt;/span&gt;A second path is coarse grain parallelism via parallelization of loops (TPL: Parallel.For, Parallel.ForEach and RParallel: runParallel), arrays (ParallelArrays and INVOKE-IN-PARALLEL)&lt;span&gt; &lt;/span&gt;and further orchestration onto multiple workers. True fine grain parallelism requires writing complex and correct multithreaded programs. Fine grain parallelism here refers to thread-level parallelism (not instruction level parallelism).&amp;nbsp;&lt;/font&gt;&lt;/font&gt; &lt;/p&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;&lt;span style="COLOR:black;"&gt;&lt;/span&gt;&lt;/font&gt;&lt;/font&gt;
&lt;p&gt;&lt;font size="3" face="Calibri"&gt;&lt;/font&gt;&amp;nbsp;&lt;/p&gt;&lt;font size="3"&gt;&lt;font face="Calibri"&gt;&lt;/font&gt;&lt;/font&gt;&lt;font size="3" face="Calibri"&gt;&amp;nbsp;&lt;/font&gt;&lt;font size="3" face="Calibri"&gt;&lt;a href="http://cs.pervasive.com/blogs/datarush/DataRushVideo%20Analytics.bmp"&gt;&lt;img border="0" src="http://cs.pervasive.com/blogs/datarush/DataRushVideo%20Analytics.bmp" alt="" /&gt;&lt;/a&gt;&lt;/font&gt;&amp;nbsp; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;font size="3" face="Calibri"&gt;This figure&amp;nbsp;is a cartoon depiction of&amp;nbsp;a data pipeline for Video Object Detection using principal component analysis (PCA) for background subtraction.&amp;nbsp; By projecting the original frame onto its eigenspace and subtracting projected image from original image, foreground objects are clearly identifiable.&lt;/font&gt;&amp;nbsp; This work is based on Yilmaz et al (2006). A white paper&amp;nbsp;detailing this work can be found &lt;a title="White Paper" href="http://www.pervasivedatarush.com/developers/Pages/ArticlesforDevelopers.aspx" target="_blank"&gt;here&lt;/a&gt;. 
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;Video&amp;nbsp;analytics can also be used in medicine for guided surgery and video tele-monitoring of patients.&amp;nbsp; A &amp;nbsp;use&amp;nbsp;case (see Figure 2 below)&amp;nbsp;and task in&amp;nbsp;this video processing&amp;nbsp;pipeline is the identification of regions of interests&amp;nbsp;for physicians and clinicians&amp;nbsp; decision support.&amp;nbsp; &lt;/span&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;DataRush parallelism has being applied in experimentations with&amp;nbsp;K-Means clustering of digital colposcopic images to identify acetic acid enhanced pre-cancerous legions (highlighted in red below).&amp;nbsp;&amp;nbsp;&amp;nbsp;This&amp;nbsp;image analysis&amp;nbsp;can be applied concurrently to individual video frames in order to identify and label ROI&amp;#39;s.&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;&amp;nbsp;&lt;a href="http://cs.pervasive.com/blogs/datarush/DataRushROI_inDigital%20Colposcopy.bmp"&gt;&lt;img style="WIDTH:715px;HEIGHT:292px;" border="0" src="http://cs.pervasive.com/blogs/datarush/DataRushROI_inDigital%20Colposcopy.bmp" width="809" height="310" alt="" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;The &lt;a title="Pervasive DataRush" href="http://www.codeproject.com/KB/showcase/PervasiveDataRush.aspx" target="_blank"&gt;&lt;b&gt;DataRush&lt;/b&gt;&lt;/a&gt; platform is designed specifically to fully utilize emerging commodity multicore computers. It addresses gaps in design time cost, programming, parallelism, scalability and performance/watt, enabling rapid prototyping of video processing applications.&lt;span&gt;&amp;nbsp; &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;&lt;span&gt;The volumes of video streaming onto television, computers and mobile are forcing video processing onto cloud and distributed environments. Current cloud and grid computing platforms are still not capable of real time processing.&amp;nbsp; Video processing is inherently parallel and the current solutions mostly leverage data parallelism.&amp;nbsp; Emergent fine grain parallelism in video processing exploits concurrency at slice-level, frame-level, intra-frame and pixel-level operations. Such fine granularity has been traditionally achieved using video encoding hardware. This hardware based approach lacks flexibility.&amp;nbsp;Our approach introduces&amp;nbsp;a Video processing development platform to exploit multiple levels of parallelism while facilitating rapid development of agile and adaptive video analytical models.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style="LINE-HEIGHT:115%;FONT-FAMILY:&amp;#39;Calibri&amp;#39;,&amp;#39;sans-serif&amp;#39;;FONT-SIZE:11pt;"&gt;&lt;span&gt;&lt;/span&gt;&lt;/span&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43163" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/video+processing/default.aspx">video processing</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/video+analytics/default.aspx">video analytics</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/parallel+image+processing/default.aspx">parallel image processing</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/digital+video/default.aspx">digital video</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data+parallelism/default.aspx">data parallelism</category></item><item><title>What could you do with a 100x performance improvement?</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/02/09/what-could-you-do-with-a-100x-improvement-in-performance.aspx</link><pubDate>Tue, 09 Feb 2010 13:46:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:43079</guid><dc:creator>azeemj</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/02/09/what-could-you-do-with-a-100x-improvement-in-performance.aspx#comments</comments><description>&lt;p&gt;Traditionally &lt;a title="Java Performance" href="http://en.wikipedia.org/wiki/Java_performance" target="_blank"&gt;Java Performance&lt;/a&gt;&amp;nbsp;has always been a bit of a misnomer.&amp;nbsp; In the early days of Java 1.1, performance was a secondary consideration after ease of programming and usability.&amp;nbsp; But the last few years have seen some amazing performance enhancements in the JVM.&amp;nbsp; &lt;a title="Escape Analysis" 
href="http://en.wikipedia.org/wiki/Escape_analysis" target="_blank"&gt;Escape Analysis&lt;/a&gt;, &lt;a title="Compressed Oops" href="http://wikis.sun.com/display/HotSpotInternals/CompressedOops"&gt;Compressed References&lt;/a&gt;, and other &lt;a title="JDK7 performance enhancements" href="http://java.sun.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html"&gt;JDK7 performance enhancements&lt;/a&gt;.&amp;nbsp; I&amp;#39;m not even including the myriad of smaller changes that the Sun JVM engineers are working on.&lt;/p&gt;
&lt;p&gt;As a Java developer, you don&amp;#39;t have to wait for JVM improvements to get better performance out of your code.&amp;nbsp; There are best practices for using the Java language (&lt;a title="Java Concurrency" href="http://jeremymanson.blogspot.com/" target="_blank"&gt;Java Concurrency&lt;/a&gt;, &lt;a title="Java Performance Tuning" href="http://www.javaperformancetuning.com/" target="_blank"&gt;Java Performance Tuning&lt;/a&gt;) and specialized hardware such as &lt;a title="Azul Systems" href="http://azulsystems.com/" target="_blank"&gt;Azul Systems&lt;/a&gt; that can help.&amp;nbsp; But what if, after all sorts of optimizations, your code STILL isn&amp;#39;t running as fast as you want?&amp;nbsp; This is especially likely when it comes to using multiple cores, which can be difficult to utilize fully.&lt;/p&gt;
&lt;p&gt;The DataRush team has been working on DataRush 5.0, which is scheduled for release sometime in the middle of 2010.&amp;nbsp; Until the new version is released, I thought I&amp;#39;d show you some of the performance that we&amp;#39;re seeing with the current builds.&amp;nbsp; All of the applications were run on the following system configuration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;4 processors / 24 cores: AMD Opteron 8435, 2.6 GHz&lt;/li&gt;
&lt;li&gt;64 GB DDR2&lt;/li&gt;
&lt;li&gt;Windows Server 2008 R2, 64-bit&lt;/li&gt;
&lt;li&gt;Java 6 Update 16, 64-bit Server JVM&lt;/li&gt;&lt;/ul&gt;
&lt;p&gt;The following algorithms were run:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a title="Naive bayes" href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier" target="_blank"&gt;Naive Bayes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a title="Kmeans" href="http://en.wikipedia.org/wiki/Kmeans" target="_blank"&gt;Kmeans&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;
&lt;p&gt;The run times for each algorithm were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Naive Bayes&lt;/b&gt;&lt;/li&gt;
&lt;ul&gt;
&lt;li&gt;Learner - 3.6 seconds&lt;/li&gt;
&lt;li&gt;Predictor - 7.8 seconds&lt;/li&gt;&lt;/ul&gt;
&lt;li&gt;&lt;b&gt;Kmeans&lt;/b&gt; - 3.2 seconds&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;
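&lt;p&gt;For a sense of scale, dividing the reported input sizes (8 GB for Naive Bayes, assumed here to apply to both the learner and the predictor, and 3.2 GB for K-means) by the wall-clock times above gives the following effective rates. They say nothing about how many passes each algorithm makes over its data.&lt;/p&gt;
&lt;pre&gt;
```latex
$$\text{Naive Bayes learner: } \frac{8\ \text{GB}}{3.6\ \text{s}} \approx 2.2\ \text{GB/s}$$
$$\text{Naive Bayes predictor: } \frac{8\ \text{GB}}{7.8\ \text{s}} \approx 1.0\ \text{GB/s}$$
$$\text{K-means: } \frac{3.2\ \text{GB}}{3.2\ \text{s}} = 1.0\ \text{GB/s}$$
```
&lt;/pre&gt;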
&lt;p&gt;The DataRush engine fully utilized all twenty-four cores throughout the run!&amp;nbsp; The Naive Bayes algorithm was run on an 8 GB data file, and K-means on a 3.2 GB data file.&amp;nbsp; The DataRush team is working on more algorithms, and these are the first of the impressive results.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;What does this mean for your business?&amp;nbsp; Imagine being able to run complex calculations, analytics, and other algorithms in seconds!&amp;nbsp; Instead of waiting to run calculations overnight, you could run them as needed and as often as needed.&amp;nbsp; You could update models in near real time, try different inputs, and view results as they happen, making better business decisions based on your whole data set instead of a sample.&amp;nbsp; If this sounds interesting, head over to &lt;a href="http://www.pervasivedatarush.com/"&gt;Pervasive DataRush&lt;/a&gt; to get your two-week trial of DataRush.&lt;br /&gt;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=43079" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java+virtual+machine/default.aspx">java virtual machine</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/naive+bayes/default.aspx">naive bayes</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/JVM/default.aspx">JVM</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Java+performance/default.aspx">Java performance</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Kmeans/default.aspx">Kmeans</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/algorithms/default.aspx">algorithms</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java+code+optimization/default.aspx">java code optimization</category><category
domain="http://cs.pervasive.com/blogs/datarush/archive/tags/java+performance+enhancements/default.aspx">java performance enhancements</category></item><item><title>Pervasive Software Among 10 IT Companies to Watch in 2010</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/01/06/pervasive-datarush-among-10-it-companies-to-watch-in-2010.aspx</link><pubDate>Wed, 06 Jan 2010 16:35:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42693</guid><dc:creator>livey</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/01/06/pervasive-datarush-among-10-it-companies-to-watch-in-2010.aspx#comments</comments><description>&lt;p&gt;Just days into 2010, &lt;span style="FONT-WEIGHT:bold;"&gt;Robin Bloor&lt;/span&gt; has included Pervasive among &lt;span style="FONT-WEIGHT:bold;"&gt;‘&lt;/span&gt;&lt;b&gt;&lt;a style="FONT-WEIGHT:bold;" href="http://www.thevirtualcircle.com/2010/01/10-it-companies-to-watch-in-2010/" target="_blank"&gt;10 IT Companies to Watch in 2010&lt;/a&gt;&lt;/b&gt;&lt;span style="FONT-WEIGHT:bold;"&gt;’&lt;/span&gt; for the second year running.&amp;nbsp; Pervasive lands second on his list, and Bloor believes we will be “worth watching this year.”&lt;br /&gt;&lt;br /&gt;Although &lt;span style="FONT-WEIGHT:bold;"&gt;Pervasive Software&lt;/span&gt; comprises four innovative technologies, Bloor praises &lt;span style="FONT-WEIGHT:bold;"&gt;Pervasive DataRush&lt;/span&gt; as the “so-far-little-recognized jewel” due to its &lt;span style="FONT-WEIGHT:bold;"&gt;parallel processing&lt;/span&gt; capabilities.&amp;nbsp; &lt;b&gt;&lt;a href="http://www.pervasivedatarush.com/" target="_blank"&gt;Pervasive DataRush&lt;/a&gt;&lt;/b&gt; made its &lt;b&gt;&lt;a href="http://www.pervasivedatarush.com/developers/Pages/WhatisPDR.aspx"&gt;processing engine&lt;/a&gt;&lt;/b&gt; generally available in March 2009.&amp;nbsp; Bloor adds that “there are not many such engines
and as we advance further into the world of multicore CPUs, every vendor that has a parallel engine is likely to experience strong demand. Parallel engines improve processing speeds by one or two orders of magnitude, bringing down query responses (for example) from &lt;b&gt;&lt;a href="http://www.pervasivedatarush.com/solutions/Pages/PervasiveDataRushLogisticsUseCase.aspx" target="_blank"&gt;hours to minutes&lt;/a&gt;&lt;/b&gt;.”&amp;nbsp; &lt;span style="FONT-WEIGHT:bold;"&gt;Early adopters&lt;/span&gt; of the revolutionary Pervasive DataRush engine are taking advantage of its impressive speedups and beginning to use DataRush to power their own applications.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The Pervasive DataRush team worked diligently to build &lt;span style="FONT-WEIGHT:bold;"&gt;data-intensive applications&lt;/span&gt; for data services, data mining, and predictive analytics, which were also released last year.&amp;nbsp; &lt;b&gt;&lt;a href="http://www.pervasivedatarush.com/solutions/Pages/DataMatcher.aspx" target="_blank"&gt;Pervasive DataMatcher&lt;/a&gt; &lt;/b&gt;helps organizations detect fraud by finding duplicate records.&amp;nbsp; &lt;b&gt;&lt;a href="http://www.pervasivedatarush.com/datamining/Pages/PDR_RecommenderSystem.aspx" target="_blank"&gt;Pervasive RushRecommender&lt;/a&gt; &lt;/b&gt;is a scalable dataflow implementation of collaborative filtering, based on weighted co-clustering, that gives organizations insight into their customers&amp;#39; needs.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Dr. Nena Marín, Pervasive DataRush Chief Scientist, co-presented with the University of Texas at the &lt;b&gt;&lt;a href="http://www.kddcup-orange.com/" target="_blank"&gt;KDD Cup 2009&lt;/a&gt;&lt;/b&gt; in Paris, France.&amp;nbsp; The co-authored paper, titled “Pervasive Parallelism in Data Mining: Dataflow Solution to Co-clustering Large and Sparse Netflix Data,” was selected from among 686 total submissions and described performance improvements for a recommender system, built on the Netflix data, that runs a computationally intensive co-clustering algorithm.&amp;nbsp; The pair went on to present work on accurately predicting customer behavior at Predictive Analytics World 2009, in a talk titled &amp;quot;&lt;b&gt;&lt;a href="http://www.predictiveanalyticsworld.com/dc/2009/agenda.php#day1-18" target="_blank"&gt;Churn, Baby, Churn&lt;/a&gt;&lt;/b&gt;: Fast Scoring on Large Telecom Dataset.&amp;quot;&lt;br /&gt;&lt;br /&gt;Watch this space in the new year as the Pervasive DataRush team uncovers new analytic applications in healthcare and cybersecurity and advances its academic alliance with &lt;a href="http://www.tacc.utexas.edu/" target="_blank"&gt;TACC&lt;/a&gt;.&amp;nbsp; Also on the roadmap are improved speed and content filtering.&amp;nbsp; We continue to address the gap between proliferating &lt;span style="FONT-WEIGHT:bold;"&gt;multicore processors&lt;/span&gt; and exploding volumes of data to get you the information you need quickly.&amp;nbsp; Please &lt;a href="mailto:info@pervasivedatarush.com" target="_blank"&gt;email us&lt;/a&gt; if you&amp;#39;d like to qualify for the Early Adopters Program.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=42693" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Pervasive+DataRush/default.aspx">Pervasive DataRush</category><category 
domain="http://cs.pervasive.com/blogs/datarush/archive/tags/parallel+processing/default.aspx">parallel processing</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+engine/default.aspx">DataRush engine</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Early+adopters/default.aspx">Early adopters</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/multicore+processors/default.aspx">multicore processors</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/10+IT+Companies+to+Watch+in+2010/default.aspx">10 IT Companies to Watch in 2010</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data-intensive+applications/default.aspx">data-intensive applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Robin+Bloor/default.aspx">Robin Bloor</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Pervasive+Software/default.aspx">Pervasive Software</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/KDD+Cup/default.aspx">KDD Cup</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Predictive+Analytics+World/default.aspx">Predictive Analytics World</category></item><item><title>Of teraflops and terabytes</title><link>http://cs.pervasive.com/blogs/datarush/archive/2010/01/06/of-teraflops-and-terabytes.aspx</link><pubDate>Wed, 06 Jan 2010 15:45:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42692</guid><dc:creator>kirwin</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2010/01/06/of-teraflops-and-terabytes.aspx#comments</comments><description>&lt;p&gt;Before the holidays, I attended SC &amp;#39;09, this year&amp;#39;s supercomputing conference.&amp;nbsp; While supercomputing has traditionally been the domain of academia, the needs of 
business and scientific computing are converging.&amp;nbsp; The datasets and processing needs of companies are growing to the point that the problems addressed in both worlds are approaching the same size.&amp;nbsp; Hardware vendors are producing solutions that address both the need to store large volumes of data and the need to provide large amounts of computational power.&amp;nbsp; But to do anything, the data and the computation must, at some point in time, be co-located.&amp;nbsp; With conventional server computing, this isn&amp;#39;t really a problem, since everything is local to the machine.&amp;nbsp; In the more scalable architectures, however, the data may live on a different node from the computation.&amp;nbsp; Furthermore, the cost of accessing remote data can be high (and even variable in grid-based architectures).&amp;nbsp; So can you bring the terabytes and teraflops together to solve your problems?&lt;/p&gt;
&lt;p&gt;The traditional supercomputing solution is message passing (MPI), which moves the data to the computation.&amp;nbsp; Nodes send messages to and receive messages from each other as needed to exchange data.&amp;nbsp; It is a very flexible, low-level approach.&amp;nbsp; But you also want to make sure you spend most of your time processing the data, not communicating with other nodes.&amp;nbsp; I attended a number of presentations discussing aspects of this, such as optimizing I/O during the initialization phase, balancing the workload across nodes, assigning work to nodes so as to minimize data transfer costs, and using asynchronous messaging to reduce processing stalls.&lt;/p&gt;
&lt;p&gt;Of course, the opposite approach is possible too: moving the computation to the data.&amp;nbsp; In the ideal case, this is what happens in map/reduce models like Hadoop.&amp;nbsp; The computational and storage architectures are one and the same; data is spread across the nodes, and each map task runs on local data.&amp;nbsp; This means that distributing the data becomes part of storing it: the cost doesn&amp;#39;t completely disappear, but it becomes a one-time, up-front cost.&lt;/p&gt;
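&lt;p&gt;To make the idea concrete, here is a minimal single-machine sketch in Java (version 10 or later, for var) of the classic map/reduce word count. It is not Hadoop code: each inner array of documents merely stands in for the data stored on one node, and the class and method names are hypothetical. The map step counts the words of each partition independently, using only that partition&amp;#39;s data, and the reduce step merges the partial counts by key.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Returns how often `word` occurs across all partitions. Each inner array stands in
    // for the documents stored on one node (a hypothetical, in-memory layout).
    static long countOf(String[][] partitions, String word) {
        var totals = Arrays.stream(partitions)
                .parallel()
                .map(docs -> {
                    // Map phase: count the words of one partition using only its local documents.
                    var local = Arrays.stream(docs)
                            .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("\\W+")))
                            .filter(w -> !w.isEmpty())
                            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
                    return local;
                })
                // Shuffle and reduce phase: merge the partial counts key by key.
                .flatMap(partial -> partial.entrySet().stream())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Long::sum));
        return totals.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        String[][] partitions = {
            {"the quick brown fox", "jumps over the lazy dog"},
            {"the dog barks", "the fox runs"}
        };
        System.out.println(countOf(partitions, "the")); // prints 4
        System.out.println(countOf(partitions, "fox")); // prints 2
    }
}
```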
&lt;p&gt;And what about dataflow?&amp;nbsp; As the name suggests, in the dataflow model the data moves.&amp;nbsp; In fact, you can view dataflow as a more structured form of MPI.&amp;nbsp; However, this structure provides one thing for free: pipelining lets computation and communication overlap without special effort from the programmer.&amp;nbsp; This works equally well for disk I/O (you can get good throughput on sequential reads from disk) and for cross-node communication.&lt;/p&gt;
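&lt;p&gt;The pipelining point can be illustrated with a small Java sketch that does not use DataRush: two threads act as two dataflow stages, connected by a bounded pipe (java.io.PipedOutputStream and PipedInputStream). The consumer sums records while the producer is still emitting them, and the producer blocks only when the small buffer is full, so the two stages overlap without any explicit coordination code. The class name, the 64-byte buffer size, and the one-number-per-line record format are illustrative assumptions.&lt;/p&gt;

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.IntStream;

public class PipelineSketch {

    // Two-stage pipeline: a producer thread emits the numbers 1..n, one per line,
    // while the calling thread consumes and sums them concurrently.
    static long pipelineSum(int n) {
        try {
            PipedOutputStream producerEnd = new PipedOutputStream();
            // A deliberately small buffer: the producer blocks whenever the consumer
            // falls behind, so the two stages have to run side by side.
            PipedInputStream consumerEnd = new PipedInputStream(producerEnd, 64);

            Thread producer = new Thread(() -> {
                // autoFlush = true pushes each line into the pipe as soon as it is written;
                // closing the writer signals end-of-data to the consumer.
                try (PrintWriter out = new PrintWriter(
                        new OutputStreamWriter(producerEnd, StandardCharsets.UTF_8), true)) {
                    IntStream.rangeClosed(1, n).forEach(out::println);
                }
            });
            producer.start();

            long sum = 0;
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(consumerEnd, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sum += Long.parseLong(line); // processing overlaps with production
                }
            }
            producer.join();
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the producer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(pipelineSum(1000)); // prints 500500
    }
}
```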
&lt;p&gt;None of these is a universal solution; each has strengths and weaknesses.&amp;nbsp; As stated, MPI provides flexibility, but it also places most of the responsibility on the programmer.&amp;nbsp; On the other hand, while map/reduce and dataflow do much of this work transparently, they both require thinking about problems in a way that fits the paradigm, which may not be the most intuitive approach to the solution.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=42692" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/dataflow+MPI+map_2F00_reduce+teraflops+terabytes/default.aspx">dataflow MPI map/reduce teraflops terabytes</category></item><item><title>Code coverage tool for Datarush </title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/12/21/code-coverage-tool-for-datarush.aspx</link><pubDate>Mon, 21 Dec 2009 20:01:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42607</guid><dc:creator>wtian</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/12/21/code-coverage-tool-for-datarush.aspx#comments</comments><description>&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;The&amp;nbsp;&lt;a title="What is PDR?" href="http://www.pervasivedatarush.com/developers/Pages/WhatisPDR.aspx" target="_blank"&gt;DataRush&lt;/a&gt; project is built with &lt;/font&gt;&lt;a href="http://maven.apache.org/what-is-maven.html" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Maven 2&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; and integrated with &lt;/font&gt;&lt;a href="http://hudson-ci.org/" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Hudson&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt;. Maven is a tool for building and managing Java-based projects. Hudson provides an automated continuous integration system, making it easier for developers to integrate changes into the project. &lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;We needed a good code coverage tool for the DataRush project. &lt;/font&gt;&lt;a href="http://en.wikipedia.org/wiki/Code_coverage" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Code coverage&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; is a measure used in software testing of how much of a program&amp;#39;s source code is exercised by its tests. It is a form of white-box testing. Basic code coverage criteria include function coverage, statement coverage, branch coverage, and condition coverage.&lt;/font&gt;&lt;/p&gt;
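&lt;p&gt;The difference between statement coverage and branch coverage is easiest to see on a tiny, hypothetical Java method. A single test with an input above the cap executes every statement (100% statement coverage) but takes only the true side of the if, so branch coverage is 50%. A second test with a smaller input is needed to cover the false side.&lt;/p&gt;

```java
public class CoverageExample {

    // Hypothetical method under test: caps a reading at 100.
    // It contains one decision (the if) and therefore two branches.
    static int cap(int reading) {
        int result = reading;
        if (reading > 100) {
            result = 100; // executed only on the true branch
        }
        return result;    // the false branch has no statements of its own
    }

    public static void main(String[] args) {
        // This call alone executes every statement: 100% statement coverage, 50% branch coverage.
        System.out.println(cap(150)); // prints 100
        // Adding this call also exercises the false branch: 100% branch coverage.
        System.out.println(cap(42)); // prints 42
    }
}
```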
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;a href="http://emma.sourceforge.net/" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;EMMA&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; is a good, free Java code coverage tool, but there is currently no Maven 2 plugin for it. EMMA’s latest release was published in 2006, and it has not been actively developed for several years.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;a href="http://docs.codehaus.org/display/SONAR/Documentation" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Sonar&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; is another good code coverage tool. It requires a relational database to store its measurement data, but it is not able to keep a report history. Sonar’s architecture includes Cobertura, so it generates coverage reports similar to those of the Cobertura tool. The Sonar web server needs a large amount of RAM to run efficiently.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;We chose Cobertura as the code coverage tool for our DataRush project. &lt;/font&gt;&lt;a href="http://cobertura.sourceforge.net/" target="_blank"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Cobertura&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; is a free Java tool that calculates the percentage of code exercised by tests. It can be used to identify which parts of your Java program lack test coverage. It reports coverage percentages for packages, files, classes, methods, lines, and conditionals. Cobertura is based on jcoverage.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Cobertura can be used from Maven 2: the &lt;/font&gt;&lt;a href="http://mojo.codehaus.org/cobertura-maven-plugin/index.html"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Cobertura Maven Plugin&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; provides the features of Cobertura within the Maven environment.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Example of a basic configuration:&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;&amp;lt;project&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;. . .&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;lt;build&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;plugins&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;plugin&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;groupId&amp;gt;org.codehaus.mojo&amp;lt;/groupId&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;artifactId&amp;gt;cobertura-maven-plugin&amp;lt;/artifactId&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;version&amp;gt;2.3&amp;lt;/version&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;configuration&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;formats&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;format&amp;gt;html&amp;lt;/format&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;format&amp;gt;xml&amp;lt;/format&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/formats&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/configuration&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/plugin&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/plugins&amp;gt;&lt;br /&gt;
&amp;nbsp;&amp;nbsp;&amp;lt;/build&amp;gt;&lt;br /&gt;
&amp;lt;/project&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;The report generated by this plugin is the result of running Cobertura against your compiled classes, and it can be used to identify which parts of your Java program lack test coverage.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Cobertura can also be integrated with &lt;/font&gt;&lt;a href="http://wiki.hudson-ci.org/display/HUDSON/Meet+Hudson"&gt;&lt;font color="#800080" size="3" face="Calibri"&gt;Hudson&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt;, which builds and tests software projects continuously.&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;font size="3" face="Calibri"&gt;Here is a sample Cobertura code coverage report from Hudson:&lt;/font&gt;&lt;/p&gt;
&lt;p style="MARGIN:0in 0in 10pt;" class="MsoNormal"&gt;&lt;a href="http://cs.pervasive.com/blogs/datarush/sample.PNG"&gt;&lt;img border="0" src="http://cs.pervasive.com/blogs/datarush/sample.PNG" width="1290" height="761" alt="Sample Cobertura code coverage report in Hudson" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=42607" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush/default.aspx">DataRush</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/maven/default.aspx">maven</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/cobertura/default.aspx">cobertura</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/hudson/default.aspx">hudson</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/code+coverage/default.aspx">code coverage</category></item><item><title>Scalable Multivariate Linear Regression Enabled by DataRush</title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/12/18/scalable-multivariate-linear-regression-enabled-by-datarush.aspx</link><pubDate>Fri, 18 Dec 2009 20:24:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42599</guid><dc:creator>yangyou</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/12/18/scalable-multivariate-linear-regression-enabled-by-datarush.aspx#comments</comments><description>&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt; is a statistical term defined by Merriam-Webster&amp;#39;s dictionary as &amp;quot;a functional relationship between two or more correlated variables that is often empirically determined from data and is used especially to predict values of one variable when given values of the others (&amp;#39;the regression value of y on x is linear&amp;#39;); specifically: a function that yields the mean value of a random variable under the condition that one or more independent variables have specified values.&amp;quot;&lt;/p&gt;
&lt;p&gt;Using DataRush, we implemented &lt;a href="http://en.wikipedia.org/wiki/Linear_regression" target="_blank"&gt;multivariate linear regression&lt;/a&gt; using &lt;a href="http://en.wikipedia.org/wiki/Ordinary_least_squares" target="_blank"&gt;ordinary least squares&lt;/a&gt; (OLS). Put simply, OLS &amp;quot;minimizes the sum of squared distances between the observed responses in a set of data, and the fitted responses from the regression model&amp;quot;.&lt;/p&gt;
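&lt;p&gt;In matrix notation (standard textbook OLS, not notation taken from the DataRush implementation), with n observations, an n × p design matrix X whose columns are the predictors (plus a column of ones if an intercept is fitted), and a response vector y, the criterion and the normal equations it leads to are:&lt;/p&gt;

```latex
\[ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 = \arg\min_{\beta} \lVert y - X\beta \rVert^2 \]
% Setting the gradient with respect to beta to zero gives the normal equations:
\[ X^{\top} X \, \hat{\beta} = X^{\top} y \]
% Every coefficient in them is an inner product of columns of X (or of a column with y):
\[ (X^{\top} X)_{jk} = \sum_{i=1}^{n} x_{ij}\, x_{ik}, \qquad (X^{\top} y)_{j} = \sum_{i=1}^{n} x_{ij}\, y_i \]
```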
&lt;p&gt;Finding the coefficients (the betas) that minimize the sum of squares directly isn&amp;#39;t straightforward. However, it can be shown (cf. M. J. R. Healy, &amp;quot;Multiple Regression with a Singular Matrix&amp;quot;) that the betas can be estimated by solving an upper triangular system of equations derived from the square matrix of inner products between all pairs of columns.&lt;/p&gt;
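&lt;p&gt;A minimal single-threaded Java sketch of this approach follows. It is an illustration, not the DataRush implementation, and the class and method names are hypothetical. It assumes the cross-product matrix X&amp;#39;X is positive definite (no collinear columns), so it sidesteps the singular case that Healy&amp;#39;s paper addresses. It makes one pass over the rows to accumulate the inner products, factors X&amp;#39;X as R&amp;#39;R with R upper triangular (a Cholesky decomposition), and then solves two triangular systems. Because the inner products are plain sums over rows, partial sums for separate blocks of rows could be computed independently and added, which is one reason a computation of this shape parallelizes well.&lt;/p&gt;

```java
import java.util.Locale;

public class OlsSketch {

    // Ordinary least squares via the normal equations X'X b = X'y.
    // Each row of x holds one observation; include a column of ones for an intercept.
    // Assumes X'X is positive definite; the singular case is not handled here.
    static double[] fit(double[][] x, double[] y) {
        int p = x[0].length;
        double[][] xtx = new double[p][p]; // upper triangle of X'X
        double[] xty = new double[p];      // X'y

        // Single pass over the data: accumulate inner products between all pairs of columns.
        // Sums over disjoint blocks of rows could be computed in parallel and then added.
        for (int r = 0; r != x.length; r++) {
            for (int j = 0; j != p; j++) {
                xty[j] += x[r][j] * y[r];
                for (int k = j; k != p; k++) {
                    xtx[j][k] += x[r][j] * x[r][k];
                }
            }
        }

        // Cholesky factorization X'X = R'R, with R upper triangular.
        double[][] rr = new double[p][p];
        for (int j = 0; j != p; j++) {
            for (int k = j; k != p; k++) {
                double s = xtx[j][k];
                for (int m = 0; m != j; m++) {
                    s -= rr[m][j] * rr[m][k];
                }
                if (k == j) {
                    if (!(s > 0)) {
                        throw new IllegalArgumentException("X'X is not positive definite (collinear columns?)");
                    }
                    rr[j][j] = Math.sqrt(s);
                } else {
                    rr[j][k] = s / rr[j][j];
                }
            }
        }

        // Forward substitution: solve R'z = X'y (R' is lower triangular).
        double[] z = new double[p];
        for (int j = 0; j != p; j++) {
            double s = xty[j];
            for (int m = 0; m != j; m++) {
                s -= rr[m][j] * z[m];
            }
            z[j] = s / rr[j][j];
        }

        // Back substitution: solve R b = z (R is upper triangular).
        double[] beta = new double[p];
        for (int j = p - 1; j >= 0; j--) {
            double s = z[j];
            for (int k = j + 1; k != p; k++) {
                s -= rr[j][k] * beta[k];
            }
            beta[j] = s / rr[j][j];
        }
        return beta;
    }

    public static void main(String[] args) {
        // Rows are (1, x1, x2); the responses follow y = 1 + 2*x1 + 3*x2 exactly.
        double[][] x = {{1, 0, 0}, {1, 1, 0}, {1, 0, 1}, {1, 1, 1}, {1, 2, 1}};
        double[] y = {1, 3, 4, 6, 8};
        double[] beta = fit(x, y);
        System.out.printf(Locale.ROOT, "%.3f %.3f %.3f%n", beta[0], beta[1], beta[2]); // prints 1.000 2.000 3.000
    }
}
```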
&lt;p&gt;The DataRush implementation can handle data sets of virtually unlimited size and scales well with the number of cores, as the following images show for a US census demographics dataset containing 63 columns. Compared with single-threaded implementations, the speedup is significant (as much as&amp;nbsp;50x on a 24-core system).&lt;/p&gt;
&lt;p&gt;&lt;img style="WIDTH:694px;HEIGHT:456px;" title="Scalable Multivariate Linear Regression" alt="Scalable Multivariate Linear Regression" src="http://cs.pervasive.com/blogs/datarush/mlr_scale_core.png" width="694" height="456" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img style="WIDTH:730px;HEIGHT:513px;" src="http://cs.pervasive.com/blogs/datarush/mlr_scale_cpu.png" width="730" height="513" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=42599" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush/default.aspx">DataRush</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/scalable/default.aspx">scalable</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/regression/default.aspx">regression</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/parallelization/default.aspx">parallelization</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/multivariate+linear+regression/default.aspx">multivariate linear regression</category></item><item><title>Government Data Mining Opportunities Abound </title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/12/14/government-data-mining-opportunities-abound.aspx</link><pubDate>Mon, 14 Dec 2009 21:17:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42571</guid><dc:creator>Richard Maddox</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/12/14/government-data-mining-opportunities-abound.aspx#comments</comments><description>&lt;p&gt;&lt;font size="3" face="Calibri"&gt;&lt;strong&gt;Government data&lt;/strong&gt;, when made available and presented effectively, can provide useful information to organizations and citizens alike. 
&lt;i style="mso-bidi-font-style:normal;"&gt;Star News Online &lt;/i&gt;presents compelling possibilities in the article &lt;/font&gt;&lt;a href="http://www.starnewsonline.com/article/20091207/ZNYT01/912073001/-1/ARCHIVE?p=1" target="_blank"&gt;&lt;font size="3" face="Calibri"&gt;“Local Governments Offer Data to Miners.”&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; From school violence to bicycle accidents to property values, public-sector data is increasingly available, wide-ranging and voluminous. Claire Cain Miller writes that government data in the hands of developers could result in applications that &lt;strong&gt;improve data usefulness&lt;/strong&gt;, particularly for the public.&lt;/font&gt;&lt;/p&gt;&lt;font size="3" face="Calibri"&gt;While some government organizations do not offer public data to entrepreneurs, others are open to doing so if it can improve the lives of citizens.&amp;nbsp; What is more, there is a growing consensus among public administrators to make data disclosure an official policy. While this is mostly taking place at the state and local levels, Miller reports that “the White House is about to publish a directive expected to give similar instructions to federal agencies.” A major point of the article comes from the executive director of an ecological nonprofit: “On detailed data dealing with very complicated material, you really have to know what you’re looking for in order to distinguish between good data and junk data.” Even more important, patterns and trends in data can only be unlocked with &lt;em&gt;effective&lt;/em&gt; &lt;strong&gt;data mining&lt;/strong&gt;. &lt;/font&gt;&lt;br /&gt;&lt;br /&gt;&lt;font size="3" face="Calibri"&gt;With high-powered commodity &lt;strong&gt;multicore&lt;/strong&gt; hardware now available, the right software platform can unlock patterns in even very large amounts of data at speeds unimaginable until now. Data miners no longer need to rely on a sample of a large data set; they can process terabytes or more in minutes on multicore servers.&amp;nbsp; As a major player in the multicore revolution, &lt;strong&gt;Pervasive DataRush&lt;/strong&gt; offers a rich, high-performance, &lt;/font&gt;&lt;a href="http://www.pervasivedatarush.com/datamining/Pages/default.aspx" target="_blank"&gt;&lt;font size="3" face="Calibri"&gt;automatically scalable architecture for data mining&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; applications. You can also &lt;/font&gt;&lt;a href="http://www.pervasivedatarush.com/industries/Pages/PublicSector.aspx" target="_blank"&gt;&lt;font size="3" face="Calibri"&gt;learn more&lt;/font&gt;&lt;/a&gt;&lt;font size="3" face="Calibri"&gt; about Pervasive DataRush in the public sector.&lt;/font&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;img src="http://cs.pervasive.com/aggbug.aspx?PostID=42571" width="1" height="1"&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Development+articles/default.aspx">Development articles</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Pervasive+DataRush/default.aspx">Pervasive DataRush</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Multicore/default.aspx">Multicore</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/data+mining/default.aspx">data mining</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+applications/default.aspx">DataRush applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/improve+data+usefulness/default.aspx">improve data usefulness</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/government+data/default.aspx">government data</category></item><item><title>DataRush Line-By-Line: Hello, World!</title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/12/10/datarush-line-by-line-hello-world.aspx</link><pubDate>Thu, 10 Dec 2009 22:29:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42536</guid><dc:creator>jhefner</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/12/10/datarush-line-by-line-hello-world.aspx#comments</comments><description>&lt;p&gt;Let’s take a look at a basic &lt;strong&gt;&amp;quot;Hello, World!&amp;quot;&lt;/strong&gt; &lt;a title="DataRush Application" href="http://www.pervasivedatarush.com/solutions/Pages/DataRush.aspx" target="_blank"&gt;DataRush application&lt;/a&gt; (click to download DataRush now!) 
and dissect it line by line.&amp;nbsp; Below is the code we will be analyzing.&amp;nbsp; See if you can get a feel for what is going on in the application, then scroll down to see how your guesses match up with what the application actually does.&lt;/p&gt;
&lt;p&gt;==== SNIP ====&lt;br /&gt;&lt;code&gt;/*01*/ import com.pervasive.datarush.flows.RecordFlow;&lt;br /&gt;/*02*/ import com.pervasive.datarush.operators.ApplicationGraph;&lt;br /&gt;/*03*/ import com.pervasive.datarush.operators.DataflowNodeBase;&lt;br /&gt;/*04*/ import com.pervasive.datarush.operators.OperatorFactory;&lt;br /&gt;/*05*/ import com.pervasive.datarush.operators.io.textfile.ReadDelimitedText;&lt;br /&gt;/*06*/ import com.pervasive.datarush.operators.io.textfile.WriteDelimitedText;&lt;br /&gt;/*07*/ import com.pervasive.datarush.ports.RecordInput;&lt;br /&gt;/*08*/ import com.pervasive.datarush.ports.RecordOutput;&lt;br /&gt;/*09*/ import com.pervasive.datarush.ports.StringInputField;&lt;br /&gt;/*10*/ import com.pervasive.datarush.ports.StringOutputField;&lt;br /&gt;/*11*/ &lt;br /&gt;/*12*/ &lt;br /&gt;/*13*/ public class HelloWorld extends DataflowNodeBase {&lt;br /&gt;/*14*/ &lt;br /&gt;/*15*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public static void main(String[] args) {&lt;br /&gt;/*16*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ApplicationGraph app = OperatorFactory.newApplicationGraph();&lt;br /&gt;/*17*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RecordFlow names = app.add(new ReadDelimitedText(&amp;quot;names.txt&amp;quot;)).getOutput();&lt;br /&gt;/*18*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RecordFlow hellos = app.add(new HelloWorld(names)).getOutput();&lt;br /&gt;/*19*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app.add(new WriteDelimitedText(hellos, &amp;quot;hellos.txt&amp;quot;));&lt;br /&gt;/*20*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; app.run();&lt;br /&gt;/*21*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;/*22*/ &lt;br /&gt;/*23*/ &lt;br /&gt;/*24*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final RecordInput 
input;&lt;br /&gt;/*25*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; private final RecordOutput output;&lt;br /&gt;/*26*/ &lt;br /&gt;/*27*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public HelloWorld(RecordFlow names) {&lt;br /&gt;/*28*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.input = newRecordInput(names);&lt;br /&gt;/*29*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.output = newRecordOutput(this.input.getType());&lt;br /&gt;/*30*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;/*31*/ &lt;br /&gt;/*32*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; @Override&lt;br /&gt;/*33*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public void execute() {&lt;br /&gt;/*34*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final StringInputField name = (StringInputField)this.input.getField(0);&lt;br /&gt;/*35*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; final StringOutputField hello = (StringOutputField)this.output.getField(0);&lt;br /&gt;/*36*/ &lt;br /&gt;/*37*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (this.input.stepNext()) {&lt;br /&gt;/*38*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hello.set(&amp;quot;Hello, &amp;quot; + name.asString() + &amp;quot;!&amp;quot;);&lt;br /&gt;/*39*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.output.push();&lt;br /&gt;/*40*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;/*41*/ &lt;br /&gt;/*42*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; this.output.pushEndOfData();&lt;br /&gt;/*43*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;/*44*/ &lt;br /&gt;/*45*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; public 
RecordFlow getOutput() {&lt;br /&gt;/*46*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return this.output.getFlow();&lt;br /&gt;/*47*/&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;/*48*/ }&lt;br /&gt;&lt;/code&gt;==== SNIP ====&lt;/p&gt;
&lt;p&gt;OK, have an idea about what’s going on here?&amp;nbsp; Let’s break things down.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 13:&lt;/strong&gt; Our class is aptly named HelloWorld.&amp;nbsp; We extend the DataflowNodeBase class, which is one of two primary base classes you’ll encounter when writing &lt;strong&gt;DataRush applications&lt;/strong&gt;.&amp;nbsp; The other base class is DataflowGraphBase.&amp;nbsp; The difference between nodes and graphs is that nodes do work (they have an abstract &amp;quot;execute&amp;quot; method that must be overridden), while graphs hook nodes together (designating producer-consumer relationships).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 16:&lt;/strong&gt; We begin our main method by creating an &lt;strong&gt;ApplicationGraph&lt;/strong&gt; object using a factory method.&amp;nbsp; ApplicationGraph serves as a container for the components of our application.&amp;nbsp; It also exposes a run method, which we invoke once all our components have been added to the container in order to set them all in motion.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 17:&lt;/strong&gt; We add our first component to our ApplicationGraph.&amp;nbsp; It is a ReadDelimitedText component (included in the standard DataRush library) which reads a text file expressed in a field-delimited format.&amp;nbsp; Its output is a stream of records that have been parsed row-by-row from the text file.&lt;/p&gt;
&lt;p&gt;The add method on ApplicationGraph adds the new component to the container and then returns the component so we can invoke the component’s getOutput method.&amp;nbsp; The RecordFlow object returned by getOutput() represents the stream of records.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 18:&lt;/strong&gt; We add our second component to our ApplicationGraph.&amp;nbsp; This is our custom HelloWorld component.&amp;nbsp; We’ll cover its functionality in the lines that follow, but notice that we take the output from ReadDelimitedText (the &amp;quot;names&amp;quot; RecordFlow) and feed it to HelloWorld.&amp;nbsp; Also, as with ReadDelimitedText, we invoke the new component’s getOutput method after add()-ing it to our ApplicationGraph.&amp;nbsp; The RecordFlow that getOutput() returns represents the stream of records containing our &amp;quot;Hello, &amp;lt;name&amp;gt;!&amp;quot; messages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 19:&lt;/strong&gt; We add our final component to our ApplicationGraph.&amp;nbsp; This is a WriteDelimitedText component (also included in the standard DataRush library), which writes a stream of records to a text file in field-delimited format.&amp;nbsp; It exposes no getOutput method because it serves only as a data sink.&amp;nbsp; Notice that we take the output from HelloWorld (the &amp;quot;hellos&amp;quot; RecordFlow) and feed it to WriteDelimitedText to be written to disk.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 20:&lt;/strong&gt; We invoke the run method of our ApplicationGraph.&amp;nbsp; This method causes all of the node components in our container to do their work until all the data from our input file has been consumed.&amp;nbsp; It’s important to note that the run method does not return until all nodes have finished executing.&lt;/p&gt;
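&lt;p&gt;The run-and-wait behavior described above resembles starting one thread per node and then joining all of them. The following is a minimal plain-JDK analogy, not the DataRush API. The class ToyGraph and its add and run methods are hypothetical, and the real ApplicationGraph also wires up ports and flows, which this sketch leaves out.&lt;/p&gt;
&lt;pre&gt;
```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical container that mimics "add nodes, then run and wait";
// this is NOT the DataRush ApplicationGraph API.
public class ToyGraph {
    private Runnable[] nodes = new Runnable[0];

    // Register a node; nothing executes until run() is called.
    public void add(Runnable node) {
        nodes = Arrays.copyOf(nodes, nodes.length + 1);
        nodes[nodes.length - 1] = node;
    }

    // Start every node on its own thread, then block until all of them finish.
    public void run() throws InterruptedException {
        Thread[] threads = new Thread[nodes.length];
        for (int i = 0; i != nodes.length; i++) {
            threads[i] = new Thread(nodes[i]);
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join(); // returns only once this node's thread has terminated
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        ToyGraph graph = new ToyGraph();
        for (int i = 0; i != 3; i++) {
            graph.add(finished::incrementAndGet); // each toy node just counts itself
        }
        graph.run();
        // Safe to read here: run() has joined every node thread.
        System.out.println("nodes finished: " + finished.get()); // prints "nodes finished: 3"
    }
}
```
&lt;/pre&gt;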
&lt;p&gt;&lt;strong&gt;Line 24:&lt;/strong&gt; We start the real contents of our HelloWorld node component.&amp;nbsp; Everything we’ve covered so far has been in our main method and has related to populating and running our ApplicationGraph.&amp;nbsp; Keep in mind that placing the main method in our HelloWorld class is arbitrary.&amp;nbsp; We could equally well have placed the main method in another class, and our application would be unaffected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 28:&lt;/strong&gt; We create a RecordInput object for our node to pull its input from.&amp;nbsp; In DataRush terminology, this RecordInput object is an input port.&amp;nbsp; The input port must be backed by a stream of data, so we pass its factory method our &amp;quot;names&amp;quot; RecordFlow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 29:&lt;/strong&gt; We create a RecordOutput object for our node to push its output to.&amp;nbsp; In DataRush terminology, this RecordOutput object is an output port.&amp;nbsp; The output port must have a schema, or record type, on which to base its outgoing stream of records.&amp;nbsp; In this case, we use the same record type as our input port and pass that type to the output port factory method.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 33:&lt;/strong&gt; We begin our node’s execute method.&amp;nbsp; This method is overridden from DataflowNodeBase and will contain the core processing logic of the node.&amp;nbsp; A node’s execute method is invoked only once during the application’s runtime and thus usually contains a loop for consuming or producing data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 34:&lt;/strong&gt; We extract from our record input port the specific field input port that yields names.&amp;nbsp; Here, we get this field input port by index (it is the 0th field); but, if you know the name of the field from the schema or record type, you could also use that name.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 35:&lt;/strong&gt; We extract from our record output port the specific field output port that is used for pushing &amp;quot;Hello, &amp;lt;name&amp;gt;!&amp;quot; messages.&amp;nbsp; Here, we get this field output port by index (it is the 0th field); but, if you know the name of the field from the schema or record type, you could also use that name.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 37:&lt;/strong&gt; We start our core processing logic loop.&amp;nbsp; The stopping condition for this loop is when we’ve consumed all the data on the stream of records that backs our record input port.&amp;nbsp; All record input ports maintain a cursor on the stream of records that underlies them.&amp;nbsp; Calling the stepNext method on an input port advances this cursor so that the next record in the stream can be accessed.&amp;nbsp; When the cursor can no longer be advanced because end of data has been reached, stepNext() returns false.&lt;/p&gt;
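&lt;p&gt;The cursor semantics of stepNext() can be modeled with a small in-memory class. This is a hypothetical sketch, not the DataRush RecordInput API. ToyRecordInput and its asString(int) accessor are invented names. The cursor starts before the first record, stepNext() returns false once end of data is reached, and reading a field leaves the cursor where it is.&lt;/p&gt;
&lt;pre&gt;
```java
// Hypothetical cursor over in-memory records; NOT the DataRush RecordInput API.
public class ToyRecordInput {
    private final String[][] records; // each row is one record, each column one field
    private int position = -1;        // the cursor starts before the first record

    public ToyRecordInput(String[][] records) {
        this.records = records;
    }

    // Advance the cursor to the next record; return false once end of data is reached.
    public boolean stepNext() {
        if (position + 1 == records.length) {
            return false; // no more records: the cursor stays on the last one
        }
        position++;
        return true;
    }

    // Read a field of the current record; this does NOT move the cursor.
    public String asString(int field) {
        return records[position][field];
    }

    public static void main(String[] args) {
        ToyRecordInput input = new ToyRecordInput(new String[][] {{"Alice"}, {"Bob"}, {"Carol"}});
        int count = 0;
        while (input.stepNext()) {          // same loop shape as line 37
            System.out.println(input.asString(0));
            count++;
        }
        System.out.println(count + " records read"); // prints "3 records read"
    }
}
```
&lt;/pre&gt;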
&lt;p&gt;&lt;strong&gt;Line 38:&lt;/strong&gt; We do three important things in this line:&lt;br /&gt;1.&amp;nbsp;We read the current value of the &amp;quot;name&amp;quot; field input port as a String.&amp;nbsp; Note that reading this value does NOT advance the cursor of our record input port.&lt;br /&gt;2.&amp;nbsp;We prepend &amp;quot;Hello, &amp;quot; and append &amp;quot;!&amp;quot; to the name that was read, forming our &amp;quot;Hello, &amp;lt;name&amp;gt;!&amp;quot; message.&lt;br /&gt;3.&amp;nbsp;We set the current value of the &amp;quot;hello&amp;quot; field output port to our newly formed &amp;quot;Hello, &amp;lt;name&amp;gt;!&amp;quot; message.&amp;nbsp; Note that setting this value does NOT actually push the value onto our output stream.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Line 39:&lt;/strong&gt; Having set the value for our &amp;quot;hello&amp;quot; field output port, we push the value onto our output stream of records.&amp;nbsp; The reason set() and push() are separate methods is that record output ports can contain multiple fields; and, each field is set individually before finally pushing the record.&amp;nbsp; In this sense, field output ports act as registers for a record output port. Invoking a record output port’s push method then pushes the contents of its registers to the output stream.&lt;/p&gt;
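&lt;p&gt;The register analogy can be made concrete with a toy output port. The class ToyRecordOutput below is hypothetical and is not the DataRush RecordOutput API. It uses a StringBuilder as its "output stream" and a pipe character as a field separator, both chosen purely for illustration.&lt;/p&gt;
&lt;pre&gt;
```java
// Hypothetical output port whose fields act as registers until push();
// this is NOT the DataRush RecordOutput API.
public class ToyRecordOutput {
    private final String[] registers;                         // one slot per field
    private final StringBuilder stream = new StringBuilder(); // stands in for the output stream

    public ToyRecordOutput(int fieldCount) {
        this.registers = new String[fieldCount];
    }

    // Setting a field only updates its register; nothing reaches the stream yet.
    public void set(int field, String value) {
        registers[field] = value;
    }

    // Pushing emits the current contents of all registers as one record.
    public void push() {
        stream.append(String.join("|", registers)).append('\n');
    }

    public String contents() {
        return stream.toString();
    }

    public static void main(String[] args) {
        ToyRecordOutput out = new ToyRecordOutput(2);
        for (String name : new String[] {"Alice", "Bob"}) {
            out.set(0, name);                   // fill register 0
            out.set(1, "Hello, " + name + "!"); // fill register 1
            out.push();                         // emit both fields together
        }
        System.out.print(out.contents());
        // prints:
        // Alice|Hello, Alice!
        // Bob|Hello, Bob!
    }
}
```
&lt;/pre&gt;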
&lt;p&gt;&lt;strong&gt;Line 42:&lt;/strong&gt; As the final action in our core processing logic, we call the pushEndOfData method on our record output port.&amp;nbsp; This call is necessary to let downstream nodes know when to terminate their own core processing loops.&lt;/p&gt;
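&lt;p&gt;Two threads connected by a pipe show why end of data must be signaled explicitly. In this plain-JDK analogy, which is not DataRush code, closing the writer plays the role of pushEndOfData(). The downstream loop ends only when readLine() returns null after that close. The class and method names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Plain-JDK analogy of end-of-data signaling between two pipeline stages;
// the pipe and its close() stand in for a DataRush flow and pushEndOfData().
public class EndOfDataSketch {

    static String run(String[] names) throws IOException, InterruptedException {
        PipedWriter upstreamEnd = new PipedWriter();
        PipedReader downstreamEnd = new PipedReader(upstreamEnd); // connects the two ends
        StringBuilder received = new StringBuilder();

        // Upstream stage: write one line per record, then close the pipe.
        Thread producer = new Thread(() -> {
            try (PrintWriter out = new PrintWriter(upstreamEnd)) {
                for (String name : names) {
                    out.println(name);
                }
            } // closing the writer is this sketch's "pushEndOfData()"
        });

        // Downstream stage: loop until readLine() reports end of data (null),
        // which happens only after the upstream stage has closed the pipe.
        Thread consumer = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(downstreamEnd)) {
                String line;
                while ((line = in.readLine()) != null) {
                    received.append("Hello, ").append(line).append("!\n");
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // join() makes the consumer's appends visible here
        return received.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run(new String[] {"Alice", "Bob", "Carol", "World"}));
    }
}
```
&lt;/pre&gt;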
&lt;p&gt;&lt;strong&gt;Line 46:&lt;/strong&gt; Finally, we expose a getOutput method so that other components can consume our node’s output stream.&amp;nbsp; Because other components consume streams of data (RecordFlow objects) rather than record output ports like the one we have been using to push data, we return the RecordFlow yielded by our output port’s getFlow method.&lt;/p&gt;
&lt;p&gt;Now that we’ve seen how our application works, let’s see it in action.&amp;nbsp; Suppose we run the application with a names.txt input file like this:&lt;/p&gt;
&lt;p&gt;==== SNIP ====&lt;br /&gt;&amp;quot;Alice&amp;quot;&lt;br /&gt;&amp;quot;Bob&amp;quot;&lt;br /&gt;&amp;quot;Carol&amp;quot;&lt;br /&gt;&amp;quot;World&amp;quot;&lt;br /&gt;==== SNIP ====&lt;/p&gt;
&lt;p&gt;The application will then produce a hellos.txt output file like this:&lt;/p&gt;
&lt;p&gt;==== SNIP ====&lt;br /&gt;&amp;quot;Hello, Alice!&amp;quot;&lt;br /&gt;&amp;quot;Hello, Bob!&amp;quot;&lt;br /&gt;&amp;quot;Hello, Carol!&amp;quot;&lt;br /&gt;&amp;quot;Hello, World!&amp;quot;&lt;br /&gt;==== SNIP ====&lt;/p&gt;
&lt;p&gt;Note that the quotes in each file are actually field-start and field-end delimiters, and are a by-product of using the default configuration for ReadDelimitedText and WriteDelimitedText.&amp;nbsp; Both components are designed by default to work with files expressed in comma-separated-value (CSV) format.&lt;/p&gt;
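&lt;p&gt;The quoting convention is the usual CSV one described in RFC 4180. A field may be wrapped in double quotes, and a quote character inside a quoted field is written twice. The helpers below are a hypothetical illustration of that convention, not the ReadDelimitedText or WriteDelimitedText implementation. Each greeting contains a comma, so without the quotes a CSV reader would split it into two fields.&lt;/p&gt;
&lt;pre&gt;
```java
// Minimal CSV-style field quoting in the spirit of RFC 4180; an illustration
// only, not the ReadDelimitedText/WriteDelimitedText implementation.
public class CsvFieldSketch {

    // Wrap a value in double quotes, doubling any quote characters it contains.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // If the field is wrapped in double quotes, strip them and un-double inner quotes.
    static String unquote(String field) {
        if (field.matches("(?s)\".*\"")) {
            return field.substring(1, field.length() - 1).replace("\"\"", "\"");
        }
        return field; // unquoted fields are returned unchanged
    }

    public static void main(String[] args) {
        String name = unquote("\"Alice\"");      // the field as stored in names.txt
        String hello = "Hello, " + name + "!";
        // The comma inside the message is why the writer must quote the field.
        System.out.println(quote(hello));        // prints "Hello, Alice!" (with the quotes)
        System.out.println(quote("say \"hi\"")); // prints "say ""hi"""
    }
}
```
&lt;/pre&gt;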
&lt;p&gt;That&amp;#39;s all for our line-by-line walkthrough!&amp;nbsp; Copy and paste the code to try it yourself.&amp;nbsp; I hope you learned something today.&amp;nbsp; If you have any questions or feedback, please feel free to leave a comment.&lt;br /&gt;&lt;/p&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/ApplicationGraph/default.aspx">ApplicationGraph</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/code/default.aspx">code</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/line-by-line/default.aspx">line-by-line</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+applications/default.aspx">DataRush applications</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/HelloWorld/default.aspx">HelloWorld</category></item><item><title>Matisse for Eclipse?</title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/12/01/matisse-for-eclipse.aspx</link><pubDate>Tue, 01 Dec 2009 21:50:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42412</guid><dc:creator>rcauble</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/12/01/matisse-for-eclipse.aspx#comments</comments><description>&lt;p&gt;I&amp;#39;ve been working on &lt;b&gt;KNIME&lt;/b&gt; plug-in development recently for &lt;a class="" href="http://www.pervasivedatarush.com/" target="_blank"&gt;Pervasive DataRush&lt;/a&gt;, and found myself wishing that there was a &lt;b&gt;Matisse&lt;/b&gt; for &lt;b&gt;Eclipse&lt;/b&gt;. As it turns out, as long as you are using Swing (as is the case with KNIME), you can use Netbeans to create a form which then runs in Eclipse!
This is possible in the latest version of Matisse, since the generated Java code uses GroupLayout, a standard part of the JDK since Java 6.&lt;/p&gt;
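&lt;p&gt;For reference, here is a small hand-written form using javax.swing.GroupLayout, the layout manager that Matisse generates code for. It is only a sketch. The class name GroupLayoutSketch and the label, text field and button are illustrative, and real Matisse output is longer and sits in the generated "Do Not Edit" sections. Each component is placed once in the horizontal group and once in the vertical group, which GroupLayout requires.&lt;/p&gt;
&lt;pre&gt;
```java
import javax.swing.GroupLayout;
import javax.swing.JButton;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JTextField;

// Hand-written illustration of GroupLayout; not actual Matisse-generated code.
public class GroupLayoutSketch {

    static JPanel buildForm() {
        JPanel panel = new JPanel();
        GroupLayout layout = new GroupLayout(panel);
        panel.setLayout(layout);
        layout.setAutoCreateGaps(true);          // default spacing between components
        layout.setAutoCreateContainerGaps(true); // default margin around the panel

        JLabel nameLabel = new JLabel("Name:");
        JTextField nameField = new JTextField(20);
        JButton okButton = new JButton("OK");

        // Horizontal axis: a label/field row, with the OK button right-aligned beneath it.
        layout.setHorizontalGroup(layout.createParallelGroup(GroupLayout.Alignment.TRAILING)
                .addGroup(layout.createSequentialGroup()
                        .addComponent(nameLabel)
                        .addComponent(nameField))
                .addComponent(okButton));

        // Vertical axis: the label/field row (baseline-aligned), followed by the button.
        layout.setVerticalGroup(layout.createSequentialGroup()
                .addGroup(layout.createParallelGroup(GroupLayout.Alignment.BASELINE)
                        .addComponent(nameLabel)
                        .addComponent(nameField))
                .addComponent(okButton));
        return panel;
    }
}
```
&lt;/pre&gt;
&lt;p&gt;To display the form, the panel can be added to a JDialog's content pane, followed by a call to pack().&lt;/p&gt;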
&lt;p&gt;The one trick is that &lt;b&gt;Netbeans&lt;/b&gt; can only create a Matisse form in a Netbeans project. However, once created, the form can be copied to an Eclipse project and then opened within Netbeans for further editing and maintenance.&lt;/p&gt;
&lt;p&gt;Here&amp;#39;s a full example:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;div&gt;In Netbeans (I&amp;#39;m using 6.8 Beta), go to File-&amp;gt;New Project, choose Java Application, and click Finish&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;Right-click on &amp;quot;Source Packages&amp;quot;-&amp;gt;New-&amp;gt;Java Package and specify your package name, for example &amp;quot;org.example&amp;quot;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;Right-click on &amp;quot;org.example&amp;quot;-&amp;gt;New-&amp;gt;JDialog Form, specify a name such as &amp;quot;ExampleDialog&amp;quot;, and click &amp;quot;Finish&amp;quot;&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;You&amp;#39;re done with this Netbeans project; you can now close it&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;Now, outside of Netbeans, copy ExampleDialog.java and ExampleDialog.form into your Eclipse project.&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;Inside Netbeans, File-&amp;gt;Open File..., specify the location of ExampleDialog.java in your Eclipse project.&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;You can now edit the form as you would normally.&amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;Here are a couple of&amp;nbsp;caveats to keep in mind:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;div&gt;When running in this mode, I&amp;#39;ve sometimes noticed a bug in Netbeans where the&amp;nbsp;palette doesn&amp;#39;t show up. If this&amp;nbsp;happens, you can work around it by adding&amp;nbsp;components in the &amp;quot;Inspector&amp;quot; window. (It has a right-click menu item, &amp;quot;Add From Palette&amp;quot;.)&lt;/div&gt;&lt;/li&gt;
&lt;li&gt;
&lt;div&gt;&lt;b&gt;Eclipse&lt;/b&gt; doesn&amp;#39;t prevent you from editing the &amp;quot;Do Not Edit&amp;quot;&amp;nbsp;sections of the generated Java code, so be careful not to edit them.&lt;/div&gt;&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;The following link provides a good tutorial on using Matisse:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.javapassion.com/handsonlabs/nbguibuilder/"&gt;http://www.javapassion.com/handsonlabs/nbguibuilder/&lt;/a&gt;&lt;/p&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/GroupLayout/default.aspx">GroupLayout</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/KNIME/default.aspx">KNIME</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Swing/default.aspx">Swing</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Matisse/default.aspx">Matisse</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Eclipse/default.aspx">Eclipse</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/Netbeans/default.aspx">Netbeans</category></item><item><title>Parallelizing Data Compression Using DataRush</title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/11/11/parallizing-data-compression-using-datarush.aspx</link><pubDate>Thu, 12 Nov 2009 05:09:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42207</guid><dc:creator>yangyou</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/11/11/parallizing-data-compression-using-datarush.aspx#comments</comments><description>&lt;p&gt;&lt;b&gt;Compression&lt;/b&gt; has been a routine computing task ever since the creation of zip and other common compression algorithms. The idea is to trade CPU time (normally a small slice) for potentially huge savings in disk storage. This is particularly true for text data, because character streams compress very well, often to around 20% of their original size. However, as data grows ever larger, so does the CPU time needed to compress it.
Certain compressors (such as rar) offer differing levels of compression, and choosing &amp;quot;maximum compression&amp;quot; can make the process extremely CPU-intensive.&amp;nbsp; Bzip2 is a highly effective and common algorithm that typically produces output nearly 20% smaller than gzip, but at a cost. According to &lt;a title="Wiki" href="http://en.wikipedia.org/wiki/Bzip2#cite_note-lzmabenchmarks-1" target="_blank"&gt;Wikipedia&lt;/a&gt;, it is &amp;quot;considerably slower (~12 times vs Deflate on typical data)&amp;quot;.&lt;/p&gt;
&lt;p&gt;To alleviate this problem and accelerate compression, a natural path is parallelization. Yet there is&amp;nbsp;a surprising scarcity of known parallelized implementations in this area. As far as I know, only &lt;a title="bzip2" href="http://compression.ca/pbzip2/"&gt;bzip2&lt;/a&gt; (in the form of PBZIP2) and &lt;a title="7zip" href="http://www.7-zip.org/"&gt;7zip&lt;/a&gt; have parallelized implementations, and 7zip uses at most two CPU cores. The primary hindrance is likely the difficulty of modifying existing algorithms while maintaining the compression ratio and output compatibility with existing decompressors. Most algorithms require a large window on the input data and process it one chunk at a time. Parallelizing the internals of such a multistep compression method can be daunting and error-prone. And because the entire file is handled as a coherent whole, even if the data were broken up, the resulting compressed pieces could not easily be assembled into a single stream that decompressors understand.&lt;/p&gt;
&lt;p&gt;An effective alternative is to treat the input file as a concatenation of multiple smaller files. Each partition is fed into a separate compressor, which operates on it as if it were a complete file. At the output end, the compressors&amp;#39; outputs are concatenated, in the order of the original partitions, into a single compressed file. The catch is that the decompressor must be able to read such a concatenated file and handle the chunk boundaries when reassembling the pieces. It turns out that for gzip and bzip2 this works very effectively, because both formats allow a file to consist of several independently compressed members or streams.&lt;/p&gt;
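&lt;p&gt;The gzip format explicitly allows a file to be a series of members (RFC 1952), and the JDK's GZIPInputStream decodes concatenated members one after another as a single stream. The sketch below compresses three partitions independently and decompresses their concatenation in one pass. The class and method names are illustrative. For bzip2, Apache Commons Compress offers the same behavior through the decompressConcatenated flag of BZip2CompressorInputStream.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Shows that independently compressed gzip members can simply be concatenated.
public class ConcatGzipDemo {

    // Compress one partition into a complete, standalone gzip member.
    static byte[] gzip(byte[] partition) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(partition);
        }
        return bytes.toByteArray();
    }

    // Decompress a stream that may hold several concatenated gzip members.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        String[] partitions = {"Hello, ", "concatenated ", "gzip!"};
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (String partition : partitions) {
            // Each partition is compressed on its own, as if it were a whole file.
            joined.write(gzip(partition.getBytes(StandardCharsets.UTF_8)));
        }
        byte[] restored = gunzip(joined.toByteArray());
        System.out.println(new String(restored, StandardCharsets.UTF_8)); // prints "Hello, concatenated gzip!"
    }
}
```
&lt;/pre&gt;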
&lt;p&gt;The DataRush library has built-in parallelization support, because that&amp;#39;s what DataRush is built for :). Splitting the data file into nearly equal chunks can be considered horizontal partitioning, and one simply needs to round-robin the pieces to the &amp;quot;worker threads&amp;quot;, dealing out as many chunks at a time as there are workers, until the entire file is processed. For best results, the number of partitions should roughly equal the number of CPU cores on the system. To keep memory usage and disk I/O in check, a suitable chunk size is chosen, typically around 1 MB for bzip2. Java I/O stream-based compressors (gzip from the JDK, and bzip2 from Apache Commons Compress) then work in parallel to produce the compressed output. At the output end, a round-robin unpartitioner stitches the pieces back together in order. By reusing existing compressors we adhere to standards and avoid &amp;quot;reinventing the wheel&amp;quot;.&lt;/p&gt;
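&lt;p&gt;The partition, compress, and stitch steps above can be sketched with plain JDK threads. This is an illustration under stated assumptions, not the DataRush operators. It holds the whole input in memory instead of streaming it, and it uses gzip because that needs no extra library. The names parallelGzip and gzipChunk are hypothetical. With W workers, worker w compresses chunks w, w + W, w + 2W, and so on. The members are then concatenated in chunk order, which is what keeps the output decodable.&lt;/p&gt;
&lt;pre&gt;
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Plain-JDK sketch of chunked parallel compression; NOT the DataRush operators.
public class ParallelGzipSketch {

    // Compress data[from, to) as one complete, standalone gzip member.
    static byte[] gzipChunk(byte[] data, int from, int to) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data, from, to - from);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Split data into chunkSize-byte partitions (chunkSize must be positive),
    // deal them round-robin to worker threads, then concatenate in chunk order.
    static byte[] parallelGzip(byte[] data, int chunkSize, int workers) throws InterruptedException {
        int poolSize = Math.max(1, workers);
        int chunkCount = Math.max(1, (data.length + chunkSize - 1) / chunkSize);
        byte[][] members = new byte[chunkCount][];
        Thread[] threads = new Thread[poolSize];
        for (int w = 0; w != poolSize; w++) {
            final int worker = w;
            // Worker w handles chunks w, w + poolSize, w + 2 * poolSize, ...
            threads[w] = new Thread(() -> IntStream.range(0, chunkCount)
                    .filter(c -> c % poolSize == worker)
                    .forEach(c -> {
                        int from = c * chunkSize;
                        int to = Math.min(data.length, from + chunkSize);
                        members[c] = gzipChunk(data, from, to);
                    }));
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also makes each worker's writes to members visible here
        }
        // "Unpartition": stitch the members back together in input order.
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] member : members) {
            joined.writeBytes(member);
        }
        return joined.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] input = "Hello, DataRush! ".repeat(100_000).getBytes(StandardCharsets.UTF_8);
        int workers = Runtime.getRuntime().availableProcessors();
        byte[] compressed = parallelGzip(input, 1024 * 1024, workers); // ~1 MB chunks
        byte[] restored;
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            restored = in.readAllBytes(); // reads every concatenated member in turn
        }
        System.out.println("round trip ok: " + Arrays.equals(input, restored));
    }
}
```
&lt;/pre&gt;
&lt;p&gt;Because each chunk is compressed independently, the result is usually slightly larger than compressing the whole file as one stream: every member carries its own header and starts without any shared context from the previous chunk.&lt;/p&gt;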
&lt;p&gt;To profile the performance of the DataRush solution, a 2 GB text file was compressed using 1 to&amp;nbsp;16 parallel processes on a 16-core x86 PC with regular SATA hard disks connected to the onboard controller. The execution times (including writing&amp;nbsp;the&amp;nbsp;360 MB&amp;nbsp;compressed output to disk) were recorded. As a baseline, the single-threaded native bzip2 binary took 9m18.839s (about 559 s) to complete. Using multiple cores and partitions, we were able to cut the compression time to 78 seconds. Note that perfect, ideal parallelization is not achievable, because the input file still has to be read sequentially and the partitions have to be put back together sequentially, in input order.&lt;/p&gt;
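&lt;p&gt;The reported timings allow a rough worked estimate. Take the single-threaded Java time reported for this experiment (652 s) as the baseline and 78 s as the 16-partition time. If one assumes, purely for illustration, that Amdahl's law applies with N = 16, the implied parallel fraction p comes out near 0.94. The post does not fit any such model, so treat p as a back-of-the-envelope figure that reflects the sequential read and reassembly noted above.&lt;/p&gt;
&lt;pre&gt;
```latex
S_{16} = \frac{T_1}{T_{16}} = \frac{652\ \mathrm{s}}{78\ \mathrm{s}} \approx 8.4
\qquad \text{(and } 559/78 \approx 7.2 \text{ against native bzip2)}

S_N = \frac{1}{(1 - p) + p/N}
\quad\Longrightarrow\quad
p = \frac{1 - 1/S_N}{1 - 1/N}
  = \frac{1 - 78/652}{1 - 1/16}
  \approx \frac{0.880}{0.9375}
  \approx 0.94
```
&lt;/pre&gt;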
&lt;p&gt;&lt;img style="WIDTH:600px;HEIGHT:400px;" src="http://cs.pervasive.com/blogs/datarush/compressiongraph.bmp" width="600" height="400" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;The Java stream-based implementation is slower than the natively compiled C/C++ compressor: the former takes 652 seconds and the latter 559 seconds to compress the 2 GB data file, a 16.6% disadvantage. However, with the help of DataRush&amp;#39;s parallelization, we were able to make use of more CPU cores and achieve near-linear scale-up, down to a much faster run time of around 78 seconds. Run times are listed here: &lt;img style="WIDTH:209px;HEIGHT:142px;" src="http://cs.pervasive.com/blogs/datarush/compressiontimes.bmp" width="209" height="142" alt="" /&gt;&lt;/p&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/parallel+bzip2/default.aspx">parallel bzip2</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/gzip/default.aspx">gzip</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/COMPRESSION/default.aspx">COMPRESSION</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/DataRush+library/default.aspx">DataRush library</category></item><item><title>Presentations Show Strength and Speed of DataRush Performance</title><link>http://cs.pervasive.com/blogs/datarush/archive/2009/11/10/presentations-show-strength-and-speed-of-datarush-performance.aspx</link><pubDate>Tue, 10 Nov 2009 22:38:00 GMT</pubDate><guid isPermaLink="false">3741b99c-ad24-4023-9eca-ddf558b8b674:42182</guid><dc:creator>livey</dc:creator><slash:comments>0</slash:comments><comments>http://cs.pervasive.com/blogs/datarush/archive/2009/11/10/presentations-show-strength-and-speed-of-datarush-performance.aspx#comments</comments><description>&lt;p&gt;It has been a busy few weeks for Pervasive DataRush.&amp;nbsp; First, we headed to Predictive Analytics World in Washington D.C.
where our collaborative work with the University of Texas at Austin on predicting customer behavior for marketing and sales optimization was selected for presentation. Our presentation titled “&lt;a href="http://cs.pervasive.com/blogs/datarush/PAW-Poster.pdf" target="_blank"&gt;Churn, Baby, Churn: Fast Scoring on Large Telecom Dataset&lt;/a&gt;” received the “Best Title” award.&amp;nbsp;&lt;/p&gt;&lt;p&gt;Srivatsava Daruru, from UT Austin, presented&lt;span style="font-weight:bold;"&gt; churn prediction&lt;/span&gt; on a 1.6 GB test set of 50,000 customers and 15,000 heterogeneous variables, using a &lt;span style="font-weight:bold;"&gt;dataflow &lt;/span&gt;computational model.&amp;nbsp; The model used horizontal and vertical partitioning to compute many divergences in parallel, which demonstrated &lt;span style="font-weight:bold;"&gt;runtime scalability &lt;/span&gt;with both the number of cores and the size of the input dataset.&amp;nbsp; “It’s impressive since it only takes 3 minutes on a commodity 16-core server, which equals a scoring runtime of 3.6 milliseconds per customer”, said Nena Marín, Ph.D., Chief Scientist for Pervasive DataRush.&amp;nbsp; &lt;/p&gt;&lt;p&gt;&lt;a href="http://cs.pervasive.com/blogs/datarush/SDC10559.JPG"&gt;&lt;img src="http://cs.pervasive.com/blogs/datarush/SDC10559.JPG" style="width:363px;height:272px;" border="0" alt="" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Pervasive DataRush also exhibited at PAW and showed how choosing DataRush to enable the next generation of analytics saves time and money.&amp;nbsp; As you can see in the graph below, the red line depicts DataRush running on commodity hardware, delivering unprecedented results over competitors…&lt;span style="font-weight:bold;"&gt;up to 500x faster runtimes!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://cs.pervasive.com/blogs/datarush/WOWslide.jpg"&gt;&lt;img src="http://cs.pervasive.com/blogs/datarush/WOWslide.jpg" style="width:628px;height:470px;" border="0"
alt="" /&gt;&lt;/a&gt;&lt;br /&gt;After PAW, it was a hop, skip, and a jump to &lt;a href="http://cs.pervasive.com/controlpanel/blogs/www.integrationext.com" target="_blank"&gt;IntegratioNext&lt;/a&gt; here in Austin, Texas.&amp;nbsp; Ryan Templeton expanded further on the graph above and explained how DataRush manages high data volumes with shorter runtimes using a dataflow implementation. &lt;br /&gt;&lt;br /&gt;Dr. Cosgrove, of &lt;a href="http://www.wellmax.com/" title="WellMax Center for Preventive Medicine" target="_blank"&gt;The &lt;span style="font-weight:bold;"&gt;WellMax Center&lt;/span&gt; for Preventive Medicine&lt;/a&gt;, also presented “Processing Large Datasets to Improve Lives by Identifying Onset of Disease before Symptoms are Present” at IntegratioNext, which highlighted collaboration with Pervasive DataRush to develop a &lt;span style="font-weight:bold;"&gt;predictive health application&lt;/span&gt; that identifies diseases before symptoms are present.&amp;nbsp; WellMax selected Pervasive DataRush because of our ability to analyze highly dimensional datasets quickly.&amp;nbsp; Dr. 
Marín says &amp;quot;The preliminary results presented were based on a “whole patient view” to derive insights into disease precursor patterns that help us live longer and better.&amp;quot;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Audio recordings of all the IntegratioNext presentations should be available soon.&amp;nbsp; Please send an &lt;a href="mailto:info@pervasivedatarush.com"&gt;email&lt;/a&gt; if you&amp;#39;re interested in receiving them.&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;/p&gt;</description><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/dataflow/default.aspx">dataflow</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/churn+prediction/default.aspx">churn prediction</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/runtime+scalability/default.aspx">runtime scalability</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/predictive+health+application/default.aspx">predictive health application</category><category domain="http://cs.pervasive.com/blogs/datarush/archive/tags/WellMax+Center/default.aspx">WellMax Center</category></item></channel></rss>