Pervasive DataRush

This blog is syndicated from the Pervasive DataRush site.

HIMSS 2010 - DataRush in Health IT making a difference in patient care

The DataRush team just returned from the HIMSS 2010 conference in Atlanta (March 1-4).

 

HIMSS is traditionally a large conference. As massive as the Atlanta convention center is, HIMSS once again filled the venue this year. Total attendance on March 2nd was recorded at 27,451, of whom 13,530 were exhibitors. This year’s theme was “Health IT (HIT) making a difference in patient care”. The DataRush team set up camp at booth #3325, where it met with presenters, vendors and attendees to discuss large-data pains and the synergies between interoperable health IT and the DataRush parallelism engine.

Day 1:  Started early with a lunch briefing with a global leader in IT services. The architect of the largest healthcare informatics data warehouse shared his experiences with content management and searchable health information. Performance bottlenecks are common in data warehousing (DW) and BI data loading and consumption. Additionally, consolidating data from up to 40 member companies posed challenges in the speed and accuracy of data de-duplication.  For DW loading, Pervasive DataRush offers a platform that leverages multi-core technology, enabling unprecedented data throughput.  For data quality and consolidation, Pervasive DataMatcher combines the power of the DataRush parallelism engine with sophisticated parametric string matching algorithms to provide the highest precision (fidelity) and recall (completeness) in fuzzy matching.
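To make the precision and recall terms concrete, here is a minimal Python sketch of threshold-based fuzzy matching for de-duplication. It is not the DataMatcher algorithm or API: the similarity measure (Python's standard `difflib.SequenceMatcher`), the helper names `similarity`, `match_pairs` and `precision_recall`, and the idea of a single threshold parameter are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (illustrative measure only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_pairs(records, threshold):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    }


def precision_recall(predicted, truth):
    """Precision: share of predicted pairs that are true duplicates.
    Recall: share of true duplicate pairs that were predicted."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall
```

Raising the threshold typically trades recall for precision, which is why a matcher exposes it as a tunable parameter. Comparing every pair is quadratic in the number of records, so the pair comparisons are the natural part to spread across cores.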

Once the exhibitors’ area opened midmorning, the DataRush booth featured live demos of the Pervasive DataRush Analytics plug-ins for KNIME.  KNIME (Konstanz Information Miner) is an open source data mining tool.  The upcoming DataRush release 5.0 provides KNIME nodes that offer DataRush-powered versions of several data mining algorithms. The KNIME interface provides a user-friendly, workflow-like environment in which data mining practitioners can rapidly orchestrate and deploy predictive analyses. Healthcare system integrators delivering EMR- and EHR-based decision support are turning to predictive analytics and data mining. The speed and accuracy of Pervasive DataRush, combined with the KNIME interface, enable instant processing and response from tera-scale heterogeneous data repositories.

Day 2:  Booked back-to-back with briefings targeting specialized areas in healthcare data exchange, interoperability, network support and cyber security. Healthcare data exchange deals with transformations between end-user formats and standards such as ANSI EDI and HL7.  These transformations are facilitated by ETL tools.  Pervasive DataRush-powered schema-to-schema transformations support healthcare standards as defined by the American National Standards Institute (ANSI).

Within cyber security, Pervasive DataRush presented results from a benchmark study based on the Malstone algorithm for web exploits. The Malstone DataRush implementation is capable of processing 1Tb of web log files per hour. Furthermore, DataRush is 12 times faster, on 19 fewer computers, than a MapReduce implementation of the Malstone algorithm running on a 20-node cluster. Customers requiring enhanced cyber security discussed their bottlenecks in vulnerability assessments. An important task in cyber security involves analyzing large volumes of network data against databases of Common Vulnerabilities and Exposures (CVE) software flaws, Common Configuration Enumeration (CCE) misconfigurations, etc., and then scoring the findings with the National Institute of Standards and Technology (NIST) severity scoring system for vulnerabilities.  The Common Vulnerability Scoring System (CVSS) provides an open framework for communicating the characteristics and impacts of IT vulnerabilities. DataRush offers data parallelism, where subsets of network data can be read in blocks and processed concurrently. The processing here refers to concurrent computation of the CVSS core equations for “AccessComplexity”, “Authentication”, “AccessVector”, “ConfidentialityImpact” and “AvailabilityImpact”. The final step in the processing pipeline is to aggregate these scores to produce the CVSS “BaseScore”.
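As a rough illustration of the scoring step, the sketch below computes the CVSS base score in plain Python and scores blocks of records concurrently. It assumes CVSS version 2 (the version current at the time of the conference), whose base equation also uses an “IntegrityImpact” metric alongside the five named above. It does not use the DataRush API: the thread pool merely stands in for the engine's data parallelism, and the function names `base_score` and `score_blocks` and the dictionary record layout are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Metric weights from the CVSS v2 base equations.
ACCESS_VECTOR = {"LOCAL": 0.395, "ADJACENT_NETWORK": 0.646, "NETWORK": 1.0}
ACCESS_COMPLEXITY = {"HIGH": 0.35, "MEDIUM": 0.61, "LOW": 0.71}
AUTHENTICATION = {"MULTIPLE": 0.45, "SINGLE": 0.56, "NONE": 0.704}
IMPACT = {"NONE": 0.0, "PARTIAL": 0.275, "COMPLETE": 0.660}


def base_score(vuln):
    """Aggregate the per-metric weights into the CVSS v2 BaseScore."""
    exploitability = (
        20
        * ACCESS_VECTOR[vuln["AccessVector"]]
        * ACCESS_COMPLEXITY[vuln["AccessComplexity"]]
        * AUTHENTICATION[vuln["Authentication"]]
    )
    impact = 10.41 * (
        1
        - (1 - IMPACT[vuln["ConfidentialityImpact"]])
        * (1 - IMPACT[vuln["IntegrityImpact"]])
        * (1 - IMPACT[vuln["AvailabilityImpact"]])
    )
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)


def score_block(block):
    """Score one block (a list of vulnerability records)."""
    return [base_score(vuln) for vuln in block]


def score_blocks(blocks, max_workers=4):
    """Score blocks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [score for scores in pool.map(score_block, blocks) for score in scores]
```

Because each record is scored independently of the others, the work splits cleanly by block, which is the property that makes this task a fit for data parallelism.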

Regarding network support and interoperability, discussions centered on DataRush implementations of parallelized encryption and decryption algorithms, as well as baseline change detection algorithms for detecting anomalies in network log files.  These are novel applications of DataRush, driven entirely by customer need.
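One simple way to picture baseline change detection is to summarize a known-good period and flag later values that stray too far from it. The Python sketch below does this with a z-score test. It is an assumption-laden illustration, not the algorithm discussed at the booth: the function name `detect_changes`, the use of per-interval event counts as the log feature, and the default threshold of 3 standard deviations are all hypothetical choices.

```python
from statistics import mean, stdev


def detect_changes(baseline, observed, z_threshold=3.0):
    """Return (index, value) for observed values that deviate from the baseline.

    baseline: numeric values from a known-good period (at least two values),
              e.g. events per minute extracted from a network log.
    observed: new values to check against that baseline.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    anomalies = []
    for index, value in enumerate(observed):
        if sigma == 0:
            # A perfectly flat baseline: any different value is a change.
            deviates = value != mu
        else:
            deviates = abs(value - mu) / sigma > z_threshold
        if deviates:
            anomalies.append((index, value))
    return anomalies
```

Each observed value is tested independently once the baseline statistics are known, so the checking step can be partitioned across log blocks.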

Day 3:  Had the most traffic through our booth.  Fraud, waste and abuse in healthcare came up frequently among vendors and attendees.  The Pervasive DataRush team featured a live demo of the Benford’s Law Fraud Detection operator recently added to the Pervasive DataRush-Analytics library and the DataRush KNIME plug-ins package. The implementation and application of this operator are explained in detail in my blog post titled “Finding a needle in a hay stack”.
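For readers unfamiliar with Benford's Law, it states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. The Python sketch below compares observed leading digits against that expectation with a chi-square statistic. It is a generic illustration and not the DataRush operator; the function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(value):
    """First non-zero digit of a number, or None if it has none (e.g. 0)."""
    for ch in repr(abs(value)):
        if ch in "123456789":
            return int(ch)
        if ch in "eE":
            break  # reached the exponent without finding a digit
    return None


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law.

    Larger values mean a poorer fit; with 8 degrees of freedom the usual
    5% critical value is about 15.51.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    return sum(
        (counts.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in benford_expected().items()
    )
```

A poor fit is a prompt for closer review rather than proof of fraud, since some legitimate data (for example, amounts capped at a fixed limit) does not follow Benford's Law.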

Other topics of interest at HIMSS included image processing and image content management. Image management solutions today use picture archiving and communication systems (PACS).  PACS is a combination of hardware and software dedicated to the short- and long-term storage, retrieval, management, distribution, and presentation of images. There are many inefficiencies in image processing pipelines that could benefit from DataRush parallelism. Higher quality, efficient packaging and faster delivery of images result in earlier diagnosis and characterization of disease, with the potential to improve patient outcomes and reduce costs.

To summarize, we have found healthcare IT to be a great fit for the DataRush parallelism engine and its analytics operator library. HIMSS was a successful event for the Pervasive DataRush team in terms of the number of leads generated, the networking opportunities and the flow of ideas on applying DataRush to operations and analytics in healthcare.

 

About n5712036

Dr. Nena M. Marín joined Pervasive Software Innovations Laboratories (iLabs) in September of 2008. Her research efforts in iLabs focus on parallel data mining algorithms and their applications in business and science. Dr. Marín’s research interests include data-intensive high performance computing, mathematical modeling and simulations of physiological systems, spectral pattern recognition for disease detection and drug delivery, bioinformatics and Monte Carlo simulations in tissue photonics. Her most recent industry research interests include patterns in large-scale and sparse datasets, clustering and unsupervised learning, collaborative filtering recommender systems, and churn analysis for marketing and sales optimization. Dr. Marín’s most recent work, entitled “Pervasive Parallelism in Data Mining: Dataflow Solution to Co-clustering Large and Sparse Netflix Data”, was selected for presentation at the Knowledge Discovery and Data Mining (KDD) Conference in July 2009 in Paris, France. She leads collaborations with academic partners, focusing on bringing the power of commodity multi-core and parallel architectures into the hands of researchers to accelerate the delivery of science. Dr. Marín is a National Science Foundation Fellow. After attaining both a Bachelor of Science Degree in Mechanical Engineering in 1984 and a Master’s Degree in Mechanical Engineering in 1995 at the University of Texas at Austin, Dr. Marín was awarded her Ph.D. in Biomedical Engineering at the University of Texas at Austin in 2005. Her Ph.D. research was funded by the National Institute of Health Program and focused on pattern recognition and automated data mining algorithms for cervical cancer detection. Dr. Marín worked as part of a multidisciplinary team in a Phase II Clinical Trial conducted at M.D. Anderson Cancer Center and the British Columbia Cancer Center in Vancouver, Canada.