The Informonster Podcast
Episode 5: Charlie Harp Discusses Terminology, from its Origins to Why it Matters in Healthcare IT
January 20, 2020
In this episode of the Informonster Podcast, Charlie Harp tackles the concept of terminologies in healthcare, the many places you find them and their relevance to the enterprise, our industry, and you.
I’m Charlie Harp and this is the Informonster podcast. On this episode of the Informonster podcast, we’re going to talk about terminologies. Now if you’ve been working in healthcare IT for any period of time, I have no doubt that you have run across a terminology or two. In fact, you’ve probably run across some things you thought were terminologies that weren’t, and some that you didn’t recognize as terminologies, but they actually were. And that’s what we’re going to talk about in this episode.
What is a terminology? What makes it tick? How can you tell if it’s a good terminology, and how can you tell if it’s a not-so-good terminology? One of the main takeaways I want you to have from this episode is that terminologies are everywhere and a lot of them are not high quality. A lot of them are not beautiful, but that doesn’t change the fact that they’re there and that they have to be dealt with.
Let’s begin by taking a look at the origin of the word “Term”. “Term” was first used in the mid to late 11th century in France. The early semantics of the word related to time. It wasn’t until the late 14th century that “Term” was used in the way we mean it today: a word or a phrase that is used in a precise or limited sense. So, understanding that a terminology is a collection of these terms (words or phrases that are used in a precise or limited sense), when is a terminology not a terminology? Well, let’s start with a few of terminology’s first cousins: vocabulary, dictionary, lexicon and nomenclature. Many people use vocabulary and terminology interchangeably, but there is a difference. A vocabulary is defined as a collection of words or phrases and their explanations. The addition of the explanations means that a vocabulary has a terminology at its core, but it also has a lot more; it’s like a smarty pants terminology. Similar to that is the definition of dictionary. It’s almost identical to the definition of a vocabulary except when it comes to scale: a dictionary is defined as a collection of all of the principal words of a language, their meanings and other information concerning them. So you can think of a dictionary as a vocabulary on steroids.
Next is lexicon. Lexicon breaks down into the meaning “word book”, and it’s typically a complete set of the meaningful units of a language and their meanings. For all intents and purposes, it’s kind of synonymous with dictionary. The last one on our list is nomenclature. A nomenclature, as the name implies, is primarily concerned with the naming of things in a particular domain. There are nomenclatures for naming plants, organisms and planets, and in healthcare we have one called the Unified Code for Units of Measure, or UCUM. A nomenclature is essentially a set of rules that guide the creation of terms within a given domain. So you can almost think of a nomenclature as something that could give birth to a terminology.
Now that you’re in the know, what should you do the next time someone around you uses one of these variants when they actually mean terminology? My advice is just let it go. Don’t be that person. Just feel confident in your own expertise, and always keep in mind that I’m no PhD in informatics, so there’s always a chance that I and the whole internet could be wrong. So we’ve covered the immediate family of terminology cousins. What about the other buzzwords that are often misconstrued? Well, many of the items I’m about to discuss are things that are constructed using terminologies and, as such, can be confused with them. The first is ontology and its posse of specializations: taxonomy, hierarchy and classification.
Now, here at Clinical Architecture we call ontology the “O word”, and we actually have a special swear jar in the kitchen that you have to put a dollar in every time you say it. It and its specializations are constructs intended to express beliefs about the relationships between terms in one or many terminologies. An ontology cannot exist without a terminology or terminologies at its core, but a terminology is not an ontology by itself. In addition to the “O” thing, you have subsets and value sets. Both of these are specializations of terminologies and essentially establish collections of terms to fit a given purpose. The main difference between the two is that a value set can have terms, or members, from multiple terminologies, while a subset is, well, a subset of terms from a single terminology. Now, I’m going to stop there, save the behemoths of the informatics world, the thesauruses, for another episode, and move back to our discussion about terminologies.
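To make the subset versus value set distinction concrete, here is a minimal Python sketch (not from the episode) that treats a collection as a subset only when every member comes from a single terminology. The `Term` class, the `is_subset` helper, and the system labels and codes are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    system: str   # the terminology or code system the term comes from
    code: str
    display: str


def is_subset(members: list[Term]) -> bool:
    """True when every member comes from one terminology (a subset).

    A collection that spans several terminologies can still be a value set,
    but by the definition used here it is not a subset.
    """
    return len({m.system for m in members}) <= 1


# Hypothetical members; system labels and codes are made up for illustration.
anticoagulants = [
    Term("LOCAL-DRUGS", "12", "Coumadin"),
    Term("LOCAL-DRUGS", "14", "Heparin"),
]
mixed = anticoagulants + [Term("PARTNER-DRUGS", "A7", "Warfarin")]

print(is_subset(anticoagulants))  # True
print(is_subset(mixed))           # False: members span two terminologies
```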
Before I continue, though, I do have a confession to make. The terminologies that we are leveraging in healthcare IT are not really just terminologies at all. There’s something called a code system. For all intents and purposes, a code system is just a terminology where every single term has a unique code. As the name implies, it is a system of codes that represents the terminology. We need unique codes so that the software in healthcare can leverage the terminologies that we use to describe the patient, their clinical situation, our organizations and everything we care about.
We do this because software needs structured data to meaningfully reason, report, and react to what’s happening. So when you look at any given system in healthcare, and we have quite a few, you have to remember there are different types of terminologies or code systems. There are standards like SNOMED, LOINC, RxNorm and CPT, things that are curated and distributed to people across the continuum of healthcare. We tend to think of those as terminologies, and when we talk about terminologies and good vocabulary practices, those are the things we tend to focus on. But the truth is they’re actually kind of the exception in healthcare.
If you look at the systems that we’ve built, whether it’s an EHR, a claims processing engine or an inpatient system, these systems all have tables upon tables of enumerated codes (they call them dictionaries) that they use to drive all kinds of processing. We need to interact with those, too. We can’t ignore those. Those are also terminologies, and I tend to think of the difference like this: we have deliberate terminologies, where people sit in a room and contemplate the universe and create concepts with representations and attributes, and then we have the data dictionaries, the tables, the pick lists that are created in these applications, which are kind of accidental terminologies. They’re still terminologies. You still have to deal with them. The software is still using them. They may not be great. What I mean by that is, when you go about creating a terminology, think of a term as a promise. When you create a term and you give it a code, you’re telling the software, “I’m going to create this thing and I’m going to give it the number 12, and whenever you see the number 12 it’s always going to mean this thing.” That’s called a stable identifier. The identifier always represents the thing that it was created to represent. Down the road, you’re not going to change it up so that number 12 represented Coumadin on Monday and on Thursday it represents a banana. The ability to keep a terminology stable really depends on the rigor of the people that are building and maintaining the terminology.
When you’re dealing with a deliberate terminology, you usually have informatics folks that are actually curating that terminology, and they know that if they go about changing something, they really have to retire or deprecate a code and replace it, so that they don’t break that promise to the system that this number will always mean this thing. When you have people that are building accidental terminologies, they don’t always understand the ramifications. They just want someone to be able to pick banana, and “nobody picks Coumadin anymore, so I’ll just change it up.” When that happens, what you’ve got is a terminology that shifts underneath the software, and that can wreak all kinds of havoc, but it’s still a terminology. It still needs to be monitored. It still needs to be managed. There are a lot of these accidental terminologies that come from elsewhere, and when they come from elsewhere, you’re responsible for understanding what they mean when they get inside your house. So when you get a code from another system, you have to stabilize it and you have to monitor it. You don’t know whether that term is going to be stable or unstable, but once again, it’s still a terminology.
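One way to picture “retire and replace instead of redefine” is a small registry that refuses to rebind a code. This is a minimal Python sketch under assumed rules: the `LocalCodeSystem` class and its methods are hypothetical, and a real terminology service would also track versions, effective dates, and who made each change.

```python
from __future__ import annotations


class LocalCodeSystem:
    """Hypothetical local code system that keeps the 'code is a promise' rule."""

    def __init__(self) -> None:
        self._terms: dict[str, str] = {}            # code -> original meaning
        self._retired: dict[str, str | None] = {}   # code -> replacement, if any

    def add(self, code: str, term: str) -> None:
        # Never let an existing code take on a new meaning.
        if code in self._terms:
            raise ValueError(f"code {code!r} already means {self._terms[code]!r}")
        self._terms[code] = term

    def retire(self, code: str, replaced_by: str | None = None) -> None:
        # Retiring stops new use but keeps the old meaning resolvable.
        if code not in self._terms:
            raise KeyError(code)
        if replaced_by is not None and replaced_by not in self._terms:
            raise KeyError(replaced_by)
        self._retired[code] = replaced_by

    def lookup(self, code: str) -> str:
        # Historical data that still carries a retired code keeps its meaning.
        return self._terms[code]

    def is_active(self, code: str) -> bool:
        return code in self._terms and code not in self._retired

    def replacement(self, code: str) -> str | None:
        return self._retired.get(code)


drugs = LocalCodeSystem()
drugs.add("12", "Coumadin")

refusal = None
try:
    drugs.add("12", "Banana")  # the "just change it up" move is refused
except ValueError as err:
    refusal = str(err)

drugs.add("13", "Banana")      # new meaning, new code
drugs.retire("12")             # nobody picks Coumadin anymore

print(refusal)
print(drugs.lookup("12"), drugs.is_active("12"))  # Coumadin False
```

The design choice here is that retirement is additive: nothing about code 12 is overwritten, so any software or stored data that already relies on it still gets the meaning it was promised.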
And that’s another point I want to make. You can’t talk about terminologies and only look to the standards. You have to grapple with whatever terminology or code system comes your way, stable or unstable. Speaking of which, there’s also another type of terminology, which is actually more like a terminology than a code system, and that’s what I call uncoded discrete text. This goes by many names, like free text terms or discrete terms, but what it really is is a collection of words or phrases that do not have a unique identifier. It could be just a list of free text allergies. It could be a list of reasons why somebody didn’t come back. It could be any number of things. The main thing to keep in mind is that just because something doesn’t have a separate code doesn’t mean you can’t use the term itself as its identifier.
For all intents and purposes, what is a code? A code is just a string of alphanumeric characters, yeah? If I just have the word Coumadin, then for all intents and purposes, the word can be its own discrete code. That makes life a little interesting when you’re trying to track changes over time because, if the term itself is its own unique identifier, changing the term changes the unique identifier and therefore creates another unique identifier. It makes the whole tracking and managing of a terminology in motion a little bit different; in some ways it’s a little bit easier. Nothing ever changes. Things just don’t get used anymore and new things spring into being that weren’t there before because somebody changed Coumadin to Coumadin 2. The other thing to keep in mind, when you’re dealing with a terminology, is there’s a whole disparity between what is a term and what is a concept.
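Here is a minimal Python sketch of treating uncoded discrete text as its own identifier. The light normalization in the hypothetical `text_code` helper (Unicode NFKC, collapsing whitespace, case-folding) is an assumption; whether “Coumadin” and “coumadin ” count as the same term is a local policy decision. The sketch shows that renaming a term edits nothing: the old identifier simply stops appearing and a new one shows up.

```python
import unicodedata


def text_code(term: str) -> str:
    """Use the (lightly normalized) text itself as the term's identifier."""
    normalized = unicodedata.normalize("NFKC", term)
    return " ".join(normalized.split()).casefold()


# Hypothetical free-text entries captured on two different days.
monday = ["Coumadin", "coumadin ", "Aspirin"]
thursday = ["Coumadin 2", "Aspirin"]

monday_codes = {text_code(t) for t in monday}
thursday_codes = {text_code(t) for t in thursday}

new_codes = thursday_codes - monday_codes     # identifiers that sprang into being
unused_codes = monday_codes - thursday_codes  # identifiers that stopped being used

print(sorted(new_codes))     # ['coumadin 2']
print(sorted(unused_codes))  # ['coumadin']
```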
When I look at the terminologies we use in healthcare, I tend to be very pragmatic about it. If you look at something as a concept, there’s all kinds of “meaning” behind that, pun intended, because if something’s a concept, the concept itself should be unique in the terminology. I shouldn’t have two Choleras. I should only have one, because if I’m building a conceptual terminology, what I’m really saying is I’m going to create one thing for each concept, whereas a terminology doesn’t so much care about that. A terminology is a collection of words and phrases, as opposed to a conceptual terminology, which is really supposed to be a list of unique concepts. I find that you can always treat a list of concepts as a terminology and be safe, but you can’t always take a terminology and treat it as concepts and be safe, because your terminology may contain things that are conceptually duplicated.
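Before treating a local terminology as a list of concepts, it can be worth checking for obvious conceptual duplicates, such as the same disease entered twice. This minimal Python sketch only groups terms whose text matches after trimming and case-folding; the function name and sample codes are hypothetical, and real duplicates often use different words (a synonym or an abbreviation), which this check will not catch.

```python
from collections import defaultdict


def possible_duplicate_concepts(terms: dict[str, str]) -> list[list[str]]:
    """Group codes whose term text is identical after light normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for code, text in terms.items():
        key = " ".join(text.split()).casefold()
        groups[key].append(code)
    return [sorted(codes) for codes in groups.values() if len(codes) > 1]


# Hypothetical local problem list with the same idea entered twice.
local_problems = {"101": "Cholera", "102": "Typhoid fever", "187": "cholera "}
print(possible_duplicate_concepts(local_problems))  # [['101', '187']]
```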
Here’s another thing. When you put yourself in kind of a purist bucket and you talk about concepts, you create a lot of extra rules and rigor that are not always sustainable. Now, if you’re working with something that is truly a conceptual terminology or a conceptual set of things, then that’s fine, but I find that those things are relatively rare. Take something like SNOMED, which is about concepts and representations: the representations are the descriptions, the different ways of describing a concept, and the concepts themselves have no name; they’re just an idea. But I also find that that’s not a very practical way to approach the world because we, as humans, like to see the words and associate them with things. What you want to call the concept and what I want to call a concept are not the same, because we operate with different terms. You see what I did there?
Setting aside the whole concept-term thing (because, once again, I’m not necessarily qualified to debate that at length; I know people that are), the other thing that’s worth talking about is the difference between attributes and relationships. We talked a little bit about the O word and relationships between terms, but the other thing is attributes. Attributes are things that tend to come along with the term. They’re not enough to make it a vocabulary in the classical definitional sense, where it actually tells you the meaning of the term, but they do tell you something about the term that might be useful. In SNOMED, there’s a handful of attributes, like its semantic tag (what kind of thing is it?). In LOINC, there’s a whole collection of attributes, which tell you things like the category it’s in and its example units.
In RxNorm, there’s a collection of attributes: things that describe the mechanical nature of the term or the utility of the term, but not necessarily the definition of the term. In fact, there are a lot of value sets, terminologies and code systems that have a bucket that says description. That description bucket is filled in, I don’t know, maybe 30% of the time. It’s like they really want to be a vocabulary, but it just takes too much energy to fill in all those descriptions, so they leave it at a code system or terminology. The other thing about terms themselves, which I mentioned a second ago relative to SNOMED, is representations. Representations are just words: variations on the wording that describe the same thing. We tend to call them aliases, but basically they are other descriptions that mean the same thing. Those tend to go along with things that are conceptual in nature, because a concept is kind of an abstraction, and on top of that abstract concept I can give you a description that’s the provider preferred description in English, the patient friendly description in Spanish, an alternate for Dutch, etc. I can give you all kinds of representations that sit on top of that abstract notion, and what it really turns into is almost a linguistic thesaurus of sorts. It basically says, “All these things are really describing this concept, and this concept has a unique identifier.”
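The concept-plus-representations structure described above can be sketched as a small data model. This is a minimal Python illustration, not SNOMED’s actual release format: the class names, field names, identifiers, and `use` labels are all hypothetical, and giving each representation its own identifier is an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Representation:
    rep_id: str     # each representation gets its own identifier
    text: str
    language: str   # e.g. "en", "es", "nl"
    use: str        # e.g. "provider-preferred", "patient-friendly", "alternate"


@dataclass
class Concept:
    concept_id: str  # the abstract concept is identified only by this code
    representations: list[Representation] = field(default_factory=list)

    def describe(self, language: str, use: str | None = None) -> str | None:
        """Return the first representation matching the language (and use)."""
        for rep in self.representations:
            if rep.language == language and (use is None or rep.use == use):
                return rep.text
        return None


# Hypothetical identifiers; none of these come from a real code system.
heart_attack = Concept(
    "C-0001",
    [
        Representation("R-0001", "Myocardial infarction", "en", "provider-preferred"),
        Representation("R-0002", "Ataque al corazón", "es", "patient-friendly"),
        Representation("R-0003", "Hartinfarct", "nl", "alternate"),
    ],
)

print(heart_attack.describe("en"))                      # Myocardial infarction
print(heart_attack.describe("es", "patient-friendly"))  # Ataque al corazón
```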
The representations also have a unique identifier because software likes codes. Once again, those conceptual things are the exception. Most of the things that you’re going to run into in the trenches of healthcare are terminologies. Some can be rough and tumble, some are accidental, some are actually very well maintained. Very few of them are pure. So, let’s talk a little bit about some of the types of things you’ll see in the terms themselves, and then I’m going to talk a little bit about codes. As for the terms themselves: I’ve been at this long enough to remember using audio tapes with computers (I’m not a punch card guy), where it would take you two and a half hours to load a bowling program only to find that it doesn’t work.
Really, we’ve had a pretty interesting change in computing power, both in terms of the ability to display information and to store information. I remember a day when you had limitations on how long a description could be, and a lot of those systems are still out there today with a 40-character limit. These types of limitations on the terms themselves often come from needing to fit the term on a paper report, in a particular field, or in a particular display metaphor, but the bottom line is that when you start looking at terms, they’re not necessarily linguistically pure. You’ll see a lot of truncation, and a lot of vowel removal to shorten things so they fit. That doesn’t mean they’re not terminologies. They’re still terminologies; they just aren’t terribly pretty.
People who have been using them for the last decade are probably pretty adept at making the most of them, but it’s also not unusual for that limited space to produce a less than high quality term. Terms like that show up in a lot of these local dictionaries, and you’re going to have to interact with them in order to accumulate, or do anything meaningful with, all that data that is likely coming to you from different places. When you’re thinking about a code system (and remember, in a code system each term has a code), the codes run the gamut in terms of how they’re structured. There are code systems where the code is just an integer: they have a number wheel, and when they created the first code it was one, the second code was two, and now they’re at code 5,382. Integer code systems are fine.
I personally believe it’s better to have a dumb number than something meaningful, and I’ll talk about that in a minute. It also isn’t terribly informative: you’ll see the number one and you’ll wonder what that means. If people want a shortcut, they start memorizing numbers, and there are a lot of users out there who memorize a lot of numbers. That’s one type of identifier, and a lot of us in the engineering world are big fans of the integer. It’s hard to beat a good integer. Another thing that people have used for code systems in healthcare is mnemonics. I worked in the lab industry for many years, and mnemonics were a big deal. You create a unique mnemonic; it’s usually alphanumeric, and it’s usually some kind of pidgin abbreviation of the term itself. The value of a mnemonic is that people can remember it. People can’t always remember a six digit integer, but they can remember that “acet” is the mnemonic for acetaminophen and “acet100” is the mnemonic for a 100 milligram acetaminophen. Mnemonics are useful as long as the mnemonic makes sense relative to the term. Really, what you’re doing is creating a human-readable shortcut that gets you to the term, and that can get repetitive and a little confusing.
So, it’s one of those things that’s been around for a long time. I know there are a lot of popular systems that use them, and I don’t really have anything negative to say about them, other than that they’re not really unique in the world. We live in a world today where I should be able to do a search and find what I’m looking for without needing a mnemonic as a shortcut. To me that’s kind of an anachronism, but I understand why it brings people comfort. The next thing is a smart code, or a hierarchical code. There’s a whole collection of these. A lot of compendia have them, where the first two bytes mean something, the next two bytes mean something, the two after that mean something, etc. A good example of that is the NDC code. In the drug world, the NDC code is an 11 digit identifier that starts out with a segment that represents, essentially, what used to be the manufacturer or the repackager; then it has a segment that identifies the product, and finally a segment that identifies the package size.
Theoretically, if you see the code, you can understand what that thing is just by interpreting the code. ICD-10-PCS is like that: it’s a seven-character code where each character indicates something about the nature of the procedure and where it sits in the hierarchy. The problem with these is that each segment only has so many combinations, so you can run out of codes, and then the wheels come off because now you’re just making stuff up. When you run out of two-byte combinations, you’re winging it: you’re creating extensions to that two-byte segment. The other thing that happens is, like with NDCs, a manufacturer sells off a bunch of products. Well, guess what? That manufacturer segment is now meaningless because it’s not that manufacturer anymore. Smart codes are good, and were especially good in the old days. You would use segments of that smart code to drive logic, reports and analytics, as opposed to creating a true hierarchy, or a conceptual hierarchy. So the integers, the mnemonics and the smart codes are all things that have been around for a long time.
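As an illustration of reading meaning out of a smart code, this minimal Python sketch splits an 11-digit NDC into the 5-4-2 labeler, product, and package segments used in the 11-digit billing format. It assumes the input is already in 11-digit form (10-digit NDCs come in several layouts and need zero-padding first, which is not handled here), and the sample NDC is made up.

```python
from typing import NamedTuple


class NdcParts(NamedTuple):
    labeler: str   # originally tied to the manufacturer or repackager
    product: str
    package: str


def parse_ndc11(ndc: str) -> NdcParts:
    """Split an 11-digit NDC (5-4-2 layout) into its segments."""
    digits = ndc.replace("-", "")
    if len(digits) != 11 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected an 11-digit NDC, got {ndc!r}")
    return NdcParts(digits[:5], digits[5:9], digits[9:])


# A made-up NDC used only to show the segment boundaries.
parts = parse_ndc11("12345-6789-01")
print(parts)  # NdcParts(labeler='12345', product='6789', package='01')
```

Logic keyed on the labeler segment is exactly the kind of shortcut that can go stale when a product line is sold: the segment still parses, but it may no longer tell you who makes the product.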
Today, we tend to use different things, like Uniform Resource Names, or URNs. These are codes that have meaning, and maybe even point to someplace so you can get even more meaning. You could also have codes that are semi-smart codes, where you have a namespace in the code followed by a dumb number. The namespace portion of the code, whether it’s alpha or numeric, is really just there so you can create things and recognize that they came from someplace, and everything after that is just a dumb number. Ideally it’s a pretty big number, so that you never get to the point where you run out of space, because if a code has a fixed structure and people have planned around it, running out of room can create data headaches for the people that are trying to consume the terminology. The last thing that I’ll throw out about codes is Globally Unique Identifiers.
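A semi-smart code of the kind described above can be sketched as a namespace prefix followed by a zero-padded dumb number. In this minimal Python sketch the `NamespacedCodeGenerator` class, the “ACME” namespace, the hyphen separator, and the 12-digit width are all hypothetical choices; the width is deliberately generous so the number wheel is unlikely ever to run out.

```python
import itertools


class NamespacedCodeGenerator:
    """Hypothetical generator: '<namespace>-<zero-padded sequence number>'."""

    def __init__(self, namespace: str, width: int = 12, start: int = 1) -> None:
        if "-" in namespace:
            raise ValueError("namespace must not contain the '-' separator")
        self.namespace = namespace
        self.width = width
        self._counter = itertools.count(start)

    def next_code(self) -> str:
        n = next(self._counter)
        if n >= 10 ** self.width:
            # Fail loudly instead of silently widening the number.
            raise OverflowError("numeric space for this namespace is exhausted")
        return f"{self.namespace}-{n:0{self.width}d}"


def namespace_of(code: str) -> str:
    """Recover where a code came from; the number itself carries no meaning."""
    return code.split("-", 1)[0]


gen = NamespacedCodeGenerator("ACME")
first, second = gen.next_code(), gen.next_code()
print(first, second)        # ACME-000000000001 ACME-000000000002
print(namespace_of(first))  # ACME
```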
Now, I’m a big fan of the Globally Unique Identifier. It’s basically a code that can be generated from the MAC address of the PC that’s doing the generation, a timestamp, and a bunch of other data. The net result is a code that is globally unique, hence the name Globally Unique Identifier. The benefit of that is, if I create a code on my computer at 12 o’clock and you create a code on your computer at the same time, unlike code systems where I’m using integers or mnemonics, I never have to worry that you and I are going to create the same code. My code for my thing is unique, globally, everywhere. Now, a Globally Unique Identifier, or GUID, is a big, ugly, complex alphanumeric thing. I know some pretty smart people, but I don’t know many people who memorize GUIDs like they were mnemonics. It’s one of those things where you really have to break your brain out of the idea that “I’m going to remember that number.” If you’re using GUIDs and you’re doing testing, you’re doing a lot of cutting and pasting. You’re not going to remember and type out a GUID anytime soon.
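Python’s standard `uuid` module is one readily available way to generate GUIDs (called UUIDs in the standards). This short sketch contrasts a time-based version 1 UUID, which combines a timestamp with a node ID that is usually the MAC address, and a version 4 UUID, which is built from random bits. The 10,000-code loop is only a quick sanity check, not a proof of uniqueness.

```python
import uuid

# Version 1: timestamp plus a node ID; Python uses the hardware (MAC) address
# when it can find one and falls back to a random 48-bit number otherwise.
time_based = uuid.uuid1()

# Version 4: random bits only, so no hardware or time information is embedded.
random_based = uuid.uuid4()

print(time_based, time_based.version)
print(random_based, random_based.version)

# Generate a batch and check for collisions; with 122 random bits per
# version 4 UUID, a repeat here is vanishingly unlikely.
batch = {str(uuid.uuid4()) for _ in range(10_000)}
print(len(batch))
```

The 36-character strings this prints are exactly the kind of value you end up copying and pasting rather than memorizing.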
Now, one of the things you can do with a proprietary or local terminology is improve it. You can seek to make it better and more descriptive. You can look to retire things that are no longer appropriate. Just because it didn’t come from a standards organization doesn’t mean you can’t apply good vocabulary practices. There are a number of great sources that describe good vocabulary practices, and I have talked about it in my blog in the past, so you’re welcome to look that up. If you Google Cimino Desiderata, you’ll find a really great reference that talks about how to create a good vocabulary. But what I find is that most people are not necessarily curating terminologies. They’re coping with them. Remember that every terminology you encounter, whether it comes in an HL7 transaction, a C-CDA or a FHIR bundle, carries information.
That information could be vital to you making the right decision about a patient. And so the first thing I would say is, “Don’t underestimate the importance of any terminology, not just the standard terminologies. And if you are in the business of curating a terminology, do put in the time and diligence to imbue it with as much quality as you can.” Somebody down the road has to consume that terminology that you’ve made. When you’re creating a terminology, whether it’s accidental or deliberate, put thought into it, and just remember that, in healthcare, the terminologies you make make a big difference to the people that we’re taking care of throughout the industry. So, that’s all I’ve got on terminology. I appreciate you listening and look forward to talking to you again in a future episode of the Informonster podcast. This is Charlie Harp saying, thank you very much.