Healthcare’s Unstructured Data Problem

June 26, 2016

Now that healthcare has made significant gains in transitioning patient medical records to a digital format, one of the industry’s newest hurdles is finding ways to leverage the bulk of patient data that doesn’t easily lend itself to a discrete data schema.

Popular consensus holds that only about twenty percent of health data falls under the umbrella of structured data. While that small subset of granular and easily categorized data has helped drive tracking and reporting in healthcare – particularly as it relates to the recently dismantled Meaningful Use program – it leaves a sizeable pile of unstructured patient data out of the clinical analytics picture.

Narrative-format data makes up the remaining eighty percent, the unstructured portion of patient data. It includes the unique aspects of care episodes or patient history that get captured during visits, as well as the communicative overviews that are documented and shared between departments and care settings.

Given provider preference for communicative patient engagement over bedside screen time, the surplus of narrative data in clinical documentation isn’t particularly surprising. A narrative synopsis can offer a condensed view of patient data that’s less difficult and time-consuming to process.

What does seem paradoxical, however, is research suggesting that clinicians are prone to gloss over narrative data during care episodes. While that qualitative data is information-rich, the research points to two potential barriers to better use of narrative data at the point of care: “difficulties locating notes in the EHR” due to inconsistent placement of “non-standard” data, and a persistent physician preference for verbal communication over documented notes.

The question then becomes: how do you leverage provider-preferred verbal communication in a digital environment in a way that makes that information more useful?

Some progress has been made in the field of Natural Language Processing (NLP) in healthcare. Advancements in the digital capture and tagging of verbal notes have helped the healthcare IT industry meet providers halfway. Even with the global NLP market anticipated to grow to $2.67 billion by 2020, though, the problem of better narrative data accessibility within the EHR persists.

Keyword extraction and in-body search mechanisms that help providers quickly comb narrative text for keyword markers in the EHR are needed to help advance initiatives related to the better use of unstructured data in healthcare.
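To make the idea of in-body search concrete, here is a minimal Python sketch of combing a narrative note for keyword markers and returning each hit with a little surrounding context. The function name find_keyword_snippets, the sample note and the 30-character window are all illustrative assumptions; this is not how any particular EHR implements search.

```python
import re


def find_keyword_snippets(note_text, keywords, window=30):
    """Return (keyword, snippet) pairs for each case-insensitive whole-word match.

    Hypothetical helper for illustration only; `window` is the number of
    characters of context kept on each side of a match.
    """
    hits = []
    for keyword in keywords:
        # re.escape keeps multi-word or punctuated keywords literal.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(note_text):
            start = max(0, match.start() - window)
            end = min(len(note_text), match.end() + window)
            hits.append((keyword, note_text[start:end].strip()))
    return hits


# Made-up sample note, not real patient data.
note = (
    "Patient reports chest pain on exertion. History of hypertension. "
    "Denies shortness of breath. Hypertension managed with lisinopril."
)

for keyword, snippet in find_keyword_snippets(note, ["hypertension", "chest pain"]):
    print(f"{keyword}: ...{snippet}...")
```

Plain keyword matching like this ignores negation ("denies chest pain"), synonyms and abbreviations, which is part of why fuller NLP tooling is still needed.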

2013 saw the introduction of QPID (Queriable Patient Inference Dossier), a new translation engine with the ability to parse, tag and index data from unstructured text. Several thought leaders in the health data space point to metadata as the next step in advancing unstructured data processing. Adding a layer of summary metadata to contextual, unstructured data elements helps pare text blocks down to their most crucial elements and index that information in a way that makes it suitable for search and analysis.
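As a rough illustration of what a layer of summary metadata might look like, the Python sketch below tags each note with the terms it mentions and builds a small index from those tags. It is not QPID’s method. The TAG_TERMS vocabulary, the function names and the sample notes are hypothetical, and the substring matching is deliberately naive.

```python
from collections import defaultdict

# Hypothetical vocabulary mapping a metadata tag to the terms that trigger it.
TAG_TERMS = {
    "diagnosis": ["hypertension", "diabetes"],
    "medication": ["lisinopril", "metformin"],
}


def summarize(note_id, text):
    """Build a small metadata record listing which known terms a note mentions."""
    lowered = text.lower()
    tags = {}
    for tag, terms in TAG_TERMS.items():
        # Naive substring check; a real system would need proper NLP here.
        found = sorted(term for term in terms if term in lowered)
        if found:
            tags[tag] = found
    return {"note_id": note_id, "tags": tags}


def build_index(records):
    """Map each extracted term to the set of note IDs whose metadata contains it."""
    index = defaultdict(set)
    for record in records:
        for terms in record["tags"].values():
            for term in terms:
                index[term].add(record["note_id"])
    return index


# Made-up sample notes, not real patient data.
notes = {
    "n1": "History of hypertension, managed with lisinopril.",
    "n2": "Type 2 diabetes; started metformin last visit.",
    "n3": "Hypertension and diabetes both well controlled.",
}

metadata = [summarize(note_id, text) for note_id, text in notes.items()]
index = build_index(metadata)
print(sorted(index["hypertension"]))  # prints ['n1', 'n3']
```

Because the index is built from the metadata records, a search only has to look up a term in the index instead of rescanning every block of narrative text.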

Inferences are only as good as the data you have to draw them from. All of the work related to making unstructured data more digestible in clinical analytics – from NLP advancements to a broader embrace of metadata – will help healthcare prep patient data in anticipation of the next hurdle the industry faces, which is finding the signals, patterns and insights buried in that data. Industry stakeholders are now turning a keen eye to machine learning as healthcare’s next promising frontier.