Healthcare Data Quality Digest

Information Management: Digital Empathy, Bad Data, and More

December 6, 2023

By: Charlie Harp

Information Management in Healthcare

Information Management – Digital Empathy

If you are reading this blog, three things are likely true: (1) you are somehow involved in healthcare IT, (2) you can be classified somewhere on the information-nerd spectrum, and (3) you are looking for answers to the issues that plague our industry. If you fit into those three categories (as I do), you almost certainly realize that if our industry had a “coin of the realm” it would be information.

We collect it, receive it, store it, display it, send it, aggregate it and analyze it. So it is a safe bet to say that information is important to us. Unfortunately, it is also the bane of our very existence. Why? The answer is simple…Sometimes it lies to us.

Just as “good data” is useful and illuminating, “bad data” is noisy and deceiving.

In healthcare, information plays a number of roles. We have reference information, which comes from elsewhere and is leveraged to define a portion of our application. We have master information, which is typically something we create and manage to define other, more local, portions of our application. We have instance information, which moves through and accumulates in our application. Regardless of which type of information you are considering, being able to identify which pieces of information are good and which are bad can dramatically affect the performance of your application.

While instance information is typically the most abundant form of information in an application, the impact of bad instance information is typically limited to the instance or the information it is directly related to. Reference information and master information, however, are typically what the instance information relies on to define its identity, classification and other critical meta-data that defines the path and patterns of the instance, and all other instances, through the application. As a result, bad reference or master information can have a significant impact on the performance of an application.
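To make these categories concrete, here is a minimal sketch in Python (with hypothetical type and field names, not drawn from any particular product) of how reference, master, and instance information might relate inside an application, and why a bad reference or master entry casts a longer shadow than a bad instance record.

```python
from dataclasses import dataclass

# Reference information: brought in from elsewhere (e.g., a standard code system)
# and leveraged to define a portion of the application.
@dataclass(frozen=True)
class ReferenceTerm:
    code_system: str
    code: str
    description: str

# Master information: created and managed locally to define other, more local,
# portions of the application (here, a local dictionary entry tied to a reference term).
@dataclass
class MasterDictionaryEntry:
    local_id: str
    local_description: str
    reference: ReferenceTerm

# Instance information: moves through and accumulates in the application,
# relying on master and reference data for its identity and classification.
@dataclass
class PatientProblem:
    patient_id: str
    recorded_on: str
    problem: MasterDictionaryEntry

e11 = ReferenceTerm("ICD-10-CM", "E11.9", "Type 2 diabetes mellitus without complications")
local_dm2 = MasterDictionaryEntry("DM2", "Diabetes, type 2", e11)
problem_entry = PatientProblem("patient-001", "2023-12-06", local_dm2)

# A bad reference or master entry taints every instance record that points at it,
# which is why its impact is so much broader than one bad instance record.
print(problem_entry.problem.reference.description)
```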

As we as an industry have tried to leverage our information in bigger and better ways, it has created an awareness of the importance of data quality, data management, and data governance. But before we can discuss any of these notions, let’s first ask ourselves a question: “What is the definition of bad information?” The assumption is that if we can identify bad information and remove it, whatever remains will be good information.

For the purposes of this article we will consider the concept of bad information within a specific environment, a software application.

If you were to ask most people who is impacted by bad information in a healthcare application, they would likely respond with “the user,” typically a provider or someone who supports the care process. They might also respond with “the patient.” In this case the patient is a manifestation of the instance information, so that is also a valid response. Some, hopefully the engineers, would also respond with “the application.” This is true, and it is something people do not always consider.

I’m sorry, Dave. I’m afraid I can’t do that

In a modern application, the system does not just hold and display information to the user; it is also a consumer of information. In fact, the application itself is the most susceptible to the impact of bad information. It does not have the ability to independently question whether the information is good or bad; it must believe that the information is good in order to function. This should not be a surprise: the annals of science fiction are littered with the rusted corpses of robot villains that, when presented with the notion that their data was incorrect, tragically self-destructed with smoke billowing from their cooling ports and shuddering cries of “does not compute!”

My point is, there is a significant difference between what a human considers bad information and what a software application considers bad information. (There is more to be said about this, but I am saving that for the next post.)

In software design we spend a great deal of time trying to document and understand the players involved in the use of our solutions. This same consideration is rarely extended to the software itself, which can be a fairly significant oversight. This imbalance in consideration can result in a situation where we build an application that streamlines the entry of data for our user population but is totally useless when the software is trying to assist the provider with decision support or research. In order to truly manage information, we need to understand and respect this dynamic. For lack of a better term, let’s call this awareness digital empathy.

Digital Empathy: Understanding that a modern healthcare application is a legitimate consumer of all variants of information and must act on the available information in a literal and logical manner.

What are the major axioms of digital empathy? Let’s try to think like an application.

1. Words are meaningless

When I watched the Charlie Brown Thanksgiving special as a kid, I would always be annoyed when an adult spoke to one of the Peanuts kids: “Waa wa wah, wah wa wawa wah.” What the heck are they saying, and why can’t I understand them? This is what it is like for the application whenever someone enters free-text information. Software relies on structured data sets, using terminologies, in order to process information. Unstructured free text is just “Wa wah wawa wah” that it can store and display later for another Peanuts parent to interpret (not Pigpen’s parents – they were hauled off by social services).

2. Every term is sacred

There is a part of the human brain called the reticular activating system, or RAS. This mechanism constantly pays attention to the world around you, tunes out the noise, and alerts you when something needs your attention. Software applications do not have an RAS, so every piece of information is viewed as relevant unless there is specific logic or content that tells it otherwise. Part of making this leap in understanding is realizing that ALL the information we feed software is important to the software, regardless of where it comes from.

3. Terminology matters

Software knows the code systems that it knows. For an application to consume information, it can’t just be a structured code; it must be in the code system the application is expecting. When I am orchestrating information across an enterprise, consistency in terminology is huge. The exact same application, operating in multiple locations, with different local terminologies, is not the exact same application. Also, in most applications, the ‘words are meaningless’ axiom applies to terminologies as well. The software pays attention to the code and where it believes it came from (the code system or dictionary). If you change the description on a code and, in doing so, change the meaning… guess what… “waa waah wawa wah wawa”…
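As a rough illustration of this axiom (a sketch, not how any particular application actually works), consider a dictionary keyed on the code system plus the code: the human-readable description is just along for the ride, so editing it changes the meaning without the software ever noticing.

```python
# Hypothetical in-memory dictionary keyed on (code_system, code).
# The application trusts the key; the description is only there for display.
native_dictionary = {
    ("LOINC", "4548-4"): "Hemoglobin A1c/Hemoglobin.total in Blood",
}

def lookup(code_system: str, code: str):
    """Return the description the application believes belongs to this code."""
    return native_dictionary.get((code_system, code))

# An incoming term in the expected code system is understood...
print(lookup("LOINC", "4548-4"))

# ...but the same clinical idea expressed in a code system the application does
# not know is just noise ("waa wah wawa wah"), even though a human could read it.
print(lookup("LocalLabCodes", "A1C"))

# And if someone edits the description on an existing code, the software still
# resolves the code exactly as before - the meaning has silently drifted.
native_dictionary[("LOINC", "4548-4")] = "Hemoglobin A1c (point of care)"
print(lookup("LOINC", "4548-4"))
```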

There are innovative technologies, like our Symedical® platform, that help applications cope with these and other limitations that are common in healthcare applications. But even with that kind of advantage, possessing digital empathy is an important tool when you are trying to understand and isolate bad information.

Information Management – Bad Data

We have already discussed the following topics:

  • Information is important
  • There are different categories of information that commonly exist in a software application
  • Bad information impacts the performance of an application
  • The scope of the impact varies based on the category of the information
  • Software is a consumer of information that is especially susceptible to the dangers of bad information
  • Information is something useful we create from data

In this post I am going to focus on “bad data”, the forms that it takes and how we can cope with it today and in the future. First of all, let’s set the context. We are going to be talking about “badness” relative to our metaphorical software application. As was discussed in the last post, human beings have a great capability for coping with bad data but lack the speed required to process information on the scale we require.

Here are the categories of bad data that I am going to cover in this post:

Other Data

This is data in a code system that the application does not understand. It almost always comes from elsewhere through some interface. It can be a standard that the application does not use or proprietary data from the source that sent it. This is the easiest category of bad data to cope with (yes, really). Think about it: we know the source and can isolate the point of entry, so all it requires is semantic interoperability, that is, establishing a mechanism at the point of entry into your application that can evaluate incoming “foreign” terms and reconcile them to the application’s native dictionaries. It will require some elbow grease initially, but a smart platform should be able to do a good bit of the heavy lifting and learn over time. It is worth mentioning that other data can also fall into the categories that follow – even after it has been reconciled.
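A minimal sketch of that point-of-entry mechanism, assuming a simple hand-built mapping table from foreign (code system, code) pairs to native codes; a real semantic platform is far more sophisticated, but the shape is the same.

```python
# Hypothetical mapping table: (foreign code system, foreign code) -> native code.
FOREIGN_TO_NATIVE = {
    ("SENDER-LOCAL-LABS", "GLU-F"): ("LOINC", "1558-6"),  # fasting glucose
}

def reconcile(code_system: str, code: str, description: str) -> dict:
    """Reconcile an incoming foreign term against the application's native dictionary.

    Returns the native (code_system, code) when a mapping exists; otherwise the
    term is quarantined for human review instead of being let into the record.
    """
    native = FOREIGN_TO_NATIVE.get((code_system, code))
    if native is None:
        # No mapping yet: park it rather than pretending we understood it.
        return {"status": "needs_review", "original": (code_system, code, description)}
    return {"status": "reconciled", "native": native, "original": (code_system, code, description)}

print(reconcile("SENDER-LOCAL-LABS", "GLU-F", "GLUCOSE FASTING"))
print(reconcile("SENDER-LOCAL-LABS", "XYZ-1", "MYSTERY RESULT"))
```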

Disemvoweled Data

This is data, whether it comes in as “other data” or it was spawned in the recesses of your own application, that is barely human readable due to missing vowels or truncated words. Some of our applications are long in the tooth. They were created many years ago, when we had database, screen, and paper report field size limitations. Making a complex notion fit in 20 characters is more art than science, and it relies heavily on our amazing brains to convert it into something useful. A good semantic platform can do a lot to turn this into good data, but the best policy, especially if you control this data, is to undergo a data quality initiative. Review proprietary terms and fix them at the source. Not only will it help the software leverage the data, it may stop a less flexible human brain from misinterpreting it.
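Here is a toy sketch of that kind of cleanup, assuming a locally maintained abbreviation dictionary (the entries below are invented); in practice the hard part is building and governing that dictionary, not the string handling.

```python
# Hypothetical abbreviation dictionary built during a data quality review
# of a legacy application's 20-character problem descriptions.
ABBREVIATIONS = {
    "HTN": "hypertension",
    "FX": "fracture",
    "HX": "history of",
    "UNSPEC": "unspecified",
}

def expand(term: str) -> str:
    """Expand a truncated or disemvoweled description word by word.

    Unknown tokens are left alone so a human reviewer can see what remains.
    """
    return " ".join(ABBREVIATIONS.get(word.upper(), word.lower()) for word in term.split())

print(expand("HX HTN UNSPEC"))  # -> "history of hypertension unspecified"
print(expand("FX L WRST"))      # -> "fracture l wrst" (still needs a human)
```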

Impostor Data

Most standard terminologies have something called a “code system ID”. This is a code that uniquely identifies a given terminology. If I have a source code for a term and a code system ID it should universally identify the term. This means if I have mapped that code system ID + source code already I am good to go… Right?

Most of the time I would say “yes.” However, there are some applications where misguided users have the ability to change the term description for a given source code, either in the application dictionary or on the instance data for a patient. This is particularly bad because (1) it is hard to spot and (2) the application will always assume the term is what the code system and source code say it is. A semantic platform can help find and reconcile these terms against a standard, but the trick is what to do with them. If your application is the source and the terminology is supposed to be a standard, then a data quality initiative is necessary. If the data comes from somewhere else and you identify it as an impostor, then isolation is probably necessary. The problem is you don’t know whether the user selected the code or the term, so there is no way to know for sure which one was intended. This can happen with proprietary local terminologies as well. It is important to have a policy of stability: the meaning of a code NEVER changes.
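One way to spot impostors, sketched here under the assumption that you hold a trusted copy of the standard to compare against: check whether the description traveling with a (code system, code) pair still resembles the standard description, and flag mismatches for a human rather than trusting either the code or the text.

```python
from difflib import SequenceMatcher

# Trusted copy of the standard terminology (hypothetical subset for illustration).
STANDARD = {
    ("ICD-10-CM", "I10"): "Essential (primary) hypertension",
}

def is_impostor(code_system: str, code: str, incoming_description: str,
                threshold: float = 0.5) -> bool:
    """Flag a term whose description no longer resembles the standard description.

    We cannot know whether the user meant the code or the edited text, so the
    only safe move is to flag the record for a human, not to auto-correct it.
    """
    expected = STANDARD.get((code_system, code))
    if expected is None:
        return False  # unknown code: an "other data" problem, not an impostor
    similarity = SequenceMatcher(None, expected.lower(), incoming_description.lower()).ratio()
    return similarity < threshold

print(is_impostor("ICD-10-CM", "I10", "Essential hypertension"))           # False: consistent
print(is_impostor("ICD-10-CM", "I10", "Family history of heart disease"))  # True: impostor
```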

Wrong Data

No matter how good your interfaces and policies are, there is always the chance that a user will associate data with a patient that is just not correct. This is not an interoperability problem or a term quality problem, but it is a data quality problem. Identifying wrong data is something humans do all the time. When you look at your clock and it says it’s 2 am but it’s light outside, you know that you are dealing with wrong data. There are analytical mechanisms we can put in place that evaluate a cluster of data and suggest that something isn’t quite right with this picture. While this is not always considered under the heading of data quality or data governance, it still has a significant impact on the quality of our data. The best approach for dealing with this is at the source, but ensuring that accurate data is entered is tricky, for a number of reasons. Having a mechanism that periodically reviews instance data for contextual appropriateness would be a nice fallback, but that is also tricky. Once potentially wrong data is identified, a human must be involved to be the final arbiter of its fate. Another important aspect of wrong data is making sure that, after it is removed, it does not get reintroduced accidentally.
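The “something isn’t right with this picture” idea can be sketched as plausibility rules run over a cluster of instance data; the rules below are invented examples, and the output is a worklist for a human reviewer, never an automatic correction.

```python
from datetime import date

# A cluster of instance data for one patient (hypothetical and simplified).
patient = {
    "id": "patient-002",
    "sex": "M",
    "birth_date": date(1990, 5, 1),
    "problem_codes": {"O80"},  # ICD-10-CM: encounter for full-term uncomplicated delivery
}

def plausibility_findings(p: dict) -> list:
    """Run simple contextual checks and return findings for human review."""
    findings = []
    if p["sex"] == "M" and "O80" in p["problem_codes"]:
        findings.append("Delivery code on a patient recorded as male - verify sex or problem entry.")
    if p["birth_date"] > date.today():
        findings.append("Birth date is in the future - likely a data entry error.")
    return findings

for finding in plausibility_findings(patient):
    print(finding)
```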

Missing Data

That’s right, no data can also be bad data. Data that is missing is as bad as data that is wrong or unusable. If I am trying to create a program for managing my diabetic patients, it is important to know who they are. If a diabetic patient is not coded as such, it will be that much more difficult to include them in the program. This is similar to “Wrong Data” in that it requires a mechanism that looks at a cluster of instance data and asks the question, “What’s missing?” If a data gap is detected, a human should be notified, as with wrong data, and they should determine whether introducing the missing data is appropriate.
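A sketch of the “what’s missing?” question for the diabetes example, using made-up code lists and thresholds: if the medications and labs strongly suggest diabetes but no diabetes diagnosis is coded, surface it to a human rather than adding the code automatically.

```python
# Hypothetical code lists and thresholds, for illustration only.
DIABETES_DX_PREFIX = "E11"                     # ICD-10-CM type 2 diabetes codes start with E11
DIABETES_MED_INGREDIENTS = {"insulin", "metformin"}
A1C_THRESHOLD = 6.5                            # percent; a commonly cited diagnostic threshold

def possible_missing_diabetes_dx(problem_codes, med_ingredients, latest_a1c):
    """Return True when the clinical picture suggests diabetes but no diagnosis is coded."""
    has_dx = any(code.startswith(DIABETES_DX_PREFIX) for code in problem_codes)
    evidence = bool(set(med_ingredients) & DIABETES_MED_INGREDIENTS) or (
        latest_a1c is not None and latest_a1c >= A1C_THRESHOLD
    )
    return evidence and not has_dx

# Metformin on the medication list and an elevated A1c, but no E11.* problem code:
print(possible_missing_diabetes_dx(set(), {"metformin"}, 7.2))       # True  -> notify a human
print(possible_missing_diabetes_dx({"E11.9"}, {"metformin"}, 7.2))   # False -> already coded
```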

Old Data

Time marches on, and as it does, it has an impact on data. This category affects both instance data and reference data. For example, the fact that you had a broken arm when you were six may not be relevant now that you are 30. Knowing the shelf life of a piece of episodic data, and whether or not to bring it into the current context for a patient, can help reduce the noise and provide better results. Likewise, using outdated reference data can also be problematic, especially if that data is used to support clinical decision support.
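A toy sketch of the shelf-life idea, with invented retention windows per category; how long a piece of episodic data stays clinically relevant is ultimately a clinical and governance decision, not a coding one.

```python
from datetime import date, timedelta

# Hypothetical shelf lives per category of episodic data (illustrative only).
SHELF_LIFE = {
    "acute_problem": timedelta(days=365 * 2),  # e.g., a broken arm
    "chronic_problem": None,                   # never ages out of the current context
    "lab_result": timedelta(days=365),
}

def in_current_context(category: str, recorded_on: date, today: date) -> bool:
    """Decide whether a piece of episodic data belongs in the current picture."""
    shelf_life = SHELF_LIFE.get(category)
    if shelf_life is None:
        return True
    return today - recorded_on <= shelf_life

today = date(2023, 12, 6)
print(in_current_context("acute_problem", date(1999, 7, 1), today))    # False: the arm healed
print(in_current_context("chronic_problem", date(1999, 7, 1), today))  # True: still relevant
```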

Duplicated Data

Now that we are sharing data (feel the love), we are susceptible to a new type of problem. We are likely to receive data that is a near duplicate of the data we already have or that we have also received from somewhere else. Being inundated with a plethora of data doppelgängers creates the risk of our instance data becoming a data junk drawer. It is not likely that the patient is taking Coumadin, Warfarin, and Simvastatin simultaneously. Before too long, we will need to evolve coping mechanisms to deal with this problem and synthesize a clinical summary of the patient’s current state.
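One possible coping mechanism, sketched with a hypothetical name-to-ingredient table (in practice this is the kind of thing a terminology like RxNorm provides): normalize each incoming medication to its ingredient and the doppelgängers collapse on their own.

```python
# Hypothetical name-to-ingredient table; a real system would use terminology
# relationships (e.g., RxNorm) rather than a hand-built dictionary.
NAME_TO_INGREDIENT = {
    "coumadin": "warfarin",      # Coumadin is a brand of warfarin
    "warfarin": "warfarin",
    "zocor": "simvastatin",      # Zocor is a brand of simvastatin
    "simvastatin": "simvastatin",
}

def group_by_ingredient(med_names):
    """Group incoming medication entries by their normalized ingredient."""
    grouped = {}
    for name in med_names:
        ingredient = NAME_TO_INGREDIENT.get(name.lower(), name.lower())
        grouped.setdefault(ingredient, []).append(name)
    return grouped

incoming = ["Coumadin", "Warfarin", "Simvastatin"]  # e.g., from three different sources
for ingredient, names in group_by_ingredient(incoming).items():
    print(ingredient, "<-", names)
# warfarin shows up twice under different names: a near duplicate, not two therapies
```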

Uncoded Data

Often referred to as “free text terms,” this is typically what we find when an application has an “other” selection and the user gets to fill in the box. It could also come from an older application that does not use terminologies for some master data elements. The best option is to deal with this at the source. Even if you allow users to create “missing” local terms on the fly, doing so provides an architecture to reconcile those terms in a meaningful way. If that is not an option, a mechanism that assesses the text and reconciles it to a terminology can increase the likelihood that you can make use of this type of data. Natural Language Processing (NLP) solutions struggle in this use case, primarily because free text is rarely in the form of natural language, but there are approaches that can provide a fair degree of success.
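A very rough sketch of text-to-terminology reconciliation using token overlap; it is nowhere near what a real semantic platform does, but it shows why exact-string matching is not enough and why a confidence score plus human review is the usual pattern.

```python
# Hypothetical candidate terms from the application's native dictionary.
CANDIDATES = {
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
}

def best_match(free_text: str, min_overlap: float = 0.5):
    """Score free text against candidate descriptions by shared tokens."""
    tokens = set(free_text.lower().split())
    best_code, best_score = None, 0.0
    for code, description in CANDIDATES.items():
        candidate_tokens = set(description.split())
        overlap = len(tokens & candidate_tokens) / max(len(candidate_tokens), 1)
        if overlap > best_score:
            best_code, best_score = code, overlap
    return (best_code, best_score) if best_score >= min_overlap else (None, best_score)

print(best_match("diabetes type 2"))         # matches E11.9 with a middling score
print(best_match("pt feels tired all day"))  # no confident match -> leave it for a human
```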

Establishing a Data Quality Strategy

If we want to improve our ability to leverage our data, we must take action to cultivate better quality. Understanding the forms of bad data enables us to formulate short-term tactics and longer-term strategies for more meaningful, high-fidelity data. I have shared my thoughts; please feel free to share yours. Is there a category that I missed? Do you have any good “ugly data” stories?

There is a lot of talk about data governance in healthcare. The question is, will a rigid, centrally controlled process work in our industry when it comes to master data and reference terminologies, or is there an alternate approach? The next post will be about a new way of approaching data quality across a distributed enterprise: the information ecosystem.

Charlie Harp

Charlie Harp is founder and CEO of Clinical Architecture, an industry-leading healthcare data quality solutions provider. Charlie has over 35 years of experience as a healthcare software engineer focused on creating tools to better utilize and understand data. He led his team to develop the first deterministic algorithm-based engine for automating semantic interoperability. Charlie often speaks on clinical data quality and usability and is host of the Informonster Podcast.
