The DataRush team just returned from the HIMSS 2010 conference in Atlanta (March 1-4).
HIMSS is traditionally a large conference, and as massive as the Atlanta convention center is, HIMSS once again filled the venue this year. Total attendance on March 2nd was recorded at 27,451, of whom 13,530 were exhibitors. This year’s theme was “Health IT (HIT) making a difference in patient care”. The DataRush team set up camp at booth #3325, where it met with presenters, vendors and attendees to discuss large-data pain points and synergies between interoperable health IT and the DataRush parallelism engine.
Day 1: Started early with a lunch briefing with a global leader in IT services. The architect of the largest healthcare informatics data warehouse shared his experiences with content management and searchable health information. Performance bottlenecks are common in data warehousing (DW) and BI data loading and consumption. Additionally, consolidating data from up to 40 member companies posed challenges for the speed and accuracy of data de-duplication. For DW loading, Pervasive DataRush offers a platform that leverages multi-core technology to enable unprecedented data throughput. For data quality and consolidation, Pervasive DataMatcher combines the power of the DataRush parallelism engine with sophisticated parametric string matching algorithms to provide the highest precision (fidelity) and recall (completeness) in fuzzy matching.
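To make the precision and recall terms above concrete, here is a minimal Python sketch of how the two measures are computed for a de-duplication run. It is not DataMatcher code; the function name and the record IDs are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Return (precision, recall) for a set of predicted duplicate pairs."""
    # Normalize each pair so that (a, b) and (b, a) count as the same match.
    predicted = {frozenset(pair) for pair in predicted_pairs}
    truth = {frozenset(pair) for pair in true_pairs}
    true_positives = len(predicted & truth)
    # Precision: share of reported matches that are real duplicates.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real duplicates that were reported.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record IDs: four known duplicate pairs, three reported matches.
true_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")]
predicted_pairs = [("r2", "r1"), ("r3", "r4"), ("r5", "r9")]

precision, recall = precision_recall(predicted_pairs, true_pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up run, two of the three reported matches are correct and two of the four true duplicates are found, so a matcher can look precise while still missing half of the duplicates.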
Once the exhibitors’ area opened midmorning, the DataRush booth featured live demos of the Pervasive DataRush Analytics plug-ins for KNIME. KNIME (Konstanz Information Miner) is an open source data mining tool. The upcoming DataRush release 5.0 provides KNIME nodes that offer DataRush-powered versions of several data mining algorithms. The KNIME interface provides a user-friendly, workflow-like environment in which data mining practitioners can rapidly orchestrate and deploy predictive analysis. Healthcare system integrators delivering EMR- and EHR-based decision support are turning to predictive analytics and data mining. The speed and accuracy of Pervasive DataRush, combined with the KNIME interface, enable instant processing and response from terabyte-scale heterogeneous data repositories.
Day 2: Booked back-to-back with briefings targeting specialized areas in healthcare data exchange, interoperability, network support and cyber security. Healthcare data exchange deals with transformations between end-user formats and standards such as ANSI EDI and HL7. These transformations are facilitated by ETL tools. Pervasive DataRush-powered schema-to-schema transformations support healthcare standards as defined by the American National Standards Institute (ANSI).
Within cyber security, Pervasive DataRush presented results from a benchmark study based on the Malstone algorithm for web exploits. The Malstone DataRush implementation is capable of processing 1 TB of web log files per hour. Furthermore, DataRush was 12 times faster while using 19 fewer computers than the Malstone implementation using MapReduce on a 20-node cluster. Customers requiring enhanced cyber security discussed their bottlenecks in vulnerability assessments. An important task in cyber security involves analyzing large volumes of network data against databases of Common Vulnerabilities and Exposures (CVE) software flaws, Common Configuration Enumeration (CCE) mis-configurations, and similar sources, and then scoring the findings with the National Institute of Standards and Technology (NIST) severity scoring system for vulnerabilities. The Common Vulnerability Scoring System (CVSS) provides an open framework for communicating the characteristics and impacts of IT vulnerabilities. DataRush offers data parallelism, where subsets of network data can be read concurrently in blocks and processed. The processing here refers to concurrent computation of the CVSS core equations for “AccessComplexity”, “Authentication”, “AccessVector”, “ConfidentialityImpact” and “AvailabilityImpact”. The final step in the processing pipeline is to aggregate these scores to produce the CVSS “BaseScore”.
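The scoring pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DataRush implementation: it assumes the CVSS version 2 base equations and metric weights, which also include an “IntegrityImpact” metric alongside the five named above, and the function names and sample vectors are hypothetical. A thread pool stands in for the block-wise processing; in standard Python it shows the structure of the pipeline rather than real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# CVSS v2 base metric weights (assumed version; keys are the usual abbreviations).
ACCESS_VECTOR = {"L": 0.395, "A": 0.646, "N": 1.0}      # Local, Adjacent, Network
ACCESS_COMPLEXITY = {"H": 0.35, "M": 0.61, "L": 0.71}   # High, Medium, Low
AUTHENTICATION = {"M": 0.45, "S": 0.56, "N": 0.704}     # Multiple, Single, None
IMPACT = {"N": 0.0, "P": 0.275, "C": 0.660}             # None, Partial, Complete


def base_score(av, ac, au, conf, integ, avail):
    """Aggregate the per-metric weights into a CVSS v2 BaseScore."""
    impact = 10.41 * (1 - (1 - IMPACT[conf]) * (1 - IMPACT[integ]) * (1 - IMPACT[avail]))
    if impact == 0:
        return 0.0  # f(Impact) is 0 when there is no impact at all
    exploitability = 20 * ACCESS_VECTOR[av] * ACCESS_COMPLEXITY[ac] * AUTHENTICATION[au]
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * 1.176, 1)


def score_block(block):
    """Score every vulnerability vector in one block of records."""
    return [base_score(*vector) for vector in block]


def score_blocks(blocks, workers=4):
    """Score blocks concurrently; results keep the order of the input blocks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_block, blocks))


# Hypothetical vectors: (AccessVector, AccessComplexity, Authentication, C, I, A).
blocks = [
    [("N", "L", "N", "C", "C", "C"), ("N", "L", "N", "P", "P", "P")],
    [("L", "H", "M", "N", "N", "N")],
]
scores = score_blocks(blocks)
print(scores)
```

Because each vector is scored independently of every other, the blocks can be handed to separate workers with no coordination beyond collecting the results, which is what makes this workload a natural fit for data parallelism.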
Regarding network support and interoperability, discussions centered on DataRush implementations of parallelized encryption and decryption algorithms, as well as baseline change detection algorithms that detect anomalies in network log files. These are novel applications of DataRush driven entirely by customer need.
Day 3: Had the most traffic through our booth. Fraud, waste and abuse in healthcare came up frequently among vendors and attendees. The Pervasive DataRush team featured a live demo of the Benford’s Law Fraud Detection operator recently added to the Pervasive DataRush-Analytics library and the DataRush KNIME plug-ins package. The implementation and application of this operator are explained in detail in my blog post titled “Finding a needle in a hay stack”. Other topics of interest at HIMSS included Image Processing and Image Content Management. Image management solutions today use picture archiving and communication systems (PACS). PACS is a combination of hardware and software dedicated to the short- and long-term storage, retrieval, management, distribution, and presentation of images. There are many inefficiencies in image processing pipelines that could benefit from DataRush parallelism. Higher quality, efficient packaging and faster delivery of images result in earlier diagnosis and characterization of disease, with the potential to improve patient outcomes and reduce costs.
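For readers unfamiliar with the Benford’s Law check mentioned above, here is a minimal Python sketch of the idea. It is not the DataRush operator; the function names and sample data are hypothetical. Benford’s Law says that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so a data set whose leading digits stray far from that distribution may deserve a closer look.

```python
import math
from collections import Counter


def benford_expected():
    """Expected probability of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First non-zero digit of a number, or None if it has none (e.g. zero)."""
    text = str(abs(value)).lstrip("0.")
    return int(text[0]) if text and text[0].isdigit() else None


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    n = len(digits)
    return sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items()
    )


# Hypothetical data: powers of two track Benford's Law closely, while the
# integers 100-999 have perfectly uniform leading digits and do not.
powers_of_two = [2 ** n for n in range(1, 501)]
uniform = list(range(100, 1000))

print(f"powers of two: {benford_chi_square(powers_of_two):.2f}")
print(f"uniform:       {benford_chi_square(uniform):.2f}")
```

A larger statistic means a larger departure from the expected distribution. In practice a high value is a prompt for review rather than proof of fraud, since some legitimate data (fixed fees, capped amounts) does not follow Benford’s Law either.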
To summarize, we have found healthcare IT to be a great fit for the DataRush parallelism engine and its analytics operator library. HIMSS was a successful event for the Pervasive DataRush team in the number of leads generated, the networking opportunities and the flow of ideas on applying DataRush to operations and analytics in healthcare.