Merging Ontologies and Data from Electronic Health Records

The Electronic Health Record (EHR) is a system for collecting and storing patient medical records as data that can be accessed automatically, thereby assisting the medical decision-making process. EHRs exist in several formats, and each format lists thousands of keywords for classifying patients' data. The keywords are specific medical jargon; hence, data classification is very accurate. However, because the keywords that make up these formats express concepts without definitions or references, their proper use is left to clinicians and can be affected by their background; as a result, the interpretation of data can become slower or less accurate than desired. This article presents an approach that accurately relates data in EHRs to ontologies in the medical realm. Thanks to ontologies, clinicians can be assisted when writing or analysing health records: for example, our solution promptly suggests rigorous definitions for scientific terms and automatically connects data spread over several parts of an EHR. The first step of our approach converts selected data and keywords from several EHR formats into a format that is easier to parse; the second step merges the extracted data with specialised medical ontologies. Finally, enriched versions of the medical data are made available to professionals. The proposed approach was validated on samples of real-world medical records and ontologies. The results show versatility in handling data, precise query results, and appropriate suggestions for relations among medical records.

  • Electronic Health Records
  • HL7
  • ontology
  • data integration
  • data analysis

1. Introduction

The Electronic Health Record (EHR) can be seen as the natural evolution of the physical medical record, which consisted of hard-copy medical documents for a patient. Nowadays, electronic versions of medical records allow citizens to easily track their entire healthcare history and share it with healthcare professionals. Moreover, doctors are better equipped to keep up with the large amount of data, can access it more easily, and can deliver significantly improved health services to patients. To facilitate the syntactic analysis of such data and to provide some level of interoperability between different software systems, documents follow a standard format: Health Level 7 (HL7) Clinical Document Architecture (CDA) [1,2]. Such a standard is a valuable tool for data sharing; however, its adoption has been slow due to the length and complexity of the defined formats. In fact, given the wide spectrum and complexity of the medical field, several templates have been defined within HL7, one for each type of clinical document (http://cdasearch.hl7.org, last accessed on 10 November 2023).
Documents produced according to HL7 present two critical issues that derive from its formats: (i) the complex organisation of data makes the data difficult to read, i.e., navigating the formats to extract data is a complex task, and this can be a significant impediment when developing software systems that extract data from an EHR [2]; (ii) although the terms are very specific, they present only the essential pieces of information, without any hints to other related terms or factors, e.g., related diseases, symptoms, or possible causes. Moreover, pieces of data are scattered throughout the document, without evident relationships among them. There is therefore a need for a way to organise data in an EHR that makes it possible to discover and suggest potential diseases, causes, etc., related to the medical values and conditions.
To organise knowledge in a domain, some previous approaches have attempted to build ontologies automatically [3]; however, in the medical field it is paramount to have a highly reliable ontology curated by experts. A significant number of well-established ontologies exist in the biomedical realm (http://obofoundry.org, last accessed on 10 November 2023). Although ontologies have been used to guide the diagnosis process, they have not been related to EHR data [4].

2. Merging Ontologies and Data from Electronic Health Records

In recent years, the high complexity of clinical documents has been widely recognised [6], and the importance of correctly interpreting data for an appropriate diagnosis has also been highlighted [7,8,9]. Moreover, there are gaps in the representation of symptoms and clinical data in the structure of EHR clinical documents, because the most significant descriptions of conditions remain in unprocessed free text [10,11,12]. The importance of sharing and using EHRs has also been shown [13].
JSON is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications (e.g., sending some data from the server to the client, so it can be displayed on a web page, or vice versa). Many researchers have highlighted that JSON is a notation that is gradually replacing XML due to its relative simplicity, intuitiveness, compactness, and the ability to directly map the native data types of popular programming languages [14,15].
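The direct mapping between JSON and native data types can be illustrated with a short Python sketch. The record below is purely hypothetical (the field names and values are assumptions made for illustration, not taken from any EHR format); it only shows that strings, numbers, booleans, nulls, and lists survive a round trip through JSON text unchanged.

```python
import json

# Hypothetical clinical observation; all field names and values are
# illustrative assumptions, not part of HL7 or any other standard.
observation = {
    "code": "EX:0001",                # placeholder term identifier
    "display": "example finding",
    "value": 72,                      # JSON number  <-> Python int
    "unit": "/min",                   # JSON string  <-> Python str
    "abnormal": False,                # JSON false   <-> Python bool
    "notes": None,                    # JSON null    <-> Python None
    "qualifiers": ["resting", "seated"],  # JSON array <-> Python list
}

# Serialise to JSON text and parse it back into native Python objects.
encoded = json.dumps(observation, sort_keys=True)
decoded = json.loads(encoded)
```

After the round trip, `decoded` compares equal to `observation`, which is the property that makes JSON convenient for passing records between software components.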
A first prototype tool for converting clinical documents to a JSON format was introduced by Rinner et al. [16]. The goal of their work was to allow easier access to health data stored in the Austrian information system for EHRs (ELGA) and to convert CDA documents into Fast Healthcare Interoperability Resources (FHIR), via XML. The limitation of this proposal is that it implements a basic prototype with limited capabilities for converting an XML document to an equivalent JSON file. Moreover, it does not interpret or enrich the processed data.
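To give a feel for what converting a CDA-style XML document to JSON involves, the following is a minimal sketch using only the Python standard library. It is not the tool of Rinner et al. nor the implementation discussed in this entry: the sample document is a drastically simplified fragment, the code `EX:0001` is a placeholder, and the function name `cda_sections_to_json` and the output layout are assumptions made for illustration. The only detail taken from the HL7 CDA standard is the XML namespace `urn:hl7-org:v3`.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by HL7 CDA documents.
CDA_NS = {"cda": "urn:hl7-org:v3"}

# Drastically simplified CDA-like fragment with placeholder content.
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component><section>
    <title>Problems</title>
    <entry><observation classCode="OBS" moodCode="EVN">
      <code code="EX:0001" displayName="example disease A"/>
    </observation></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""


def cda_sections_to_json(xml_text):
    """Flatten each section into a title plus the codes of its observations."""
    root = ET.fromstring(xml_text)
    sections = []
    for section in root.iterfind(".//cda:section", CDA_NS):
        title = section.findtext("cda:title", default="", namespaces=CDA_NS)
        codes = [
            {"code": c.get("code"), "display": c.get("displayName")}
            for c in section.iterfind(".//cda:observation/cda:code", CDA_NS)
        ]
        sections.append({"title": title, "codes": codes})
    # Return JSON text; the nested XML path is replaced by a flat list.
    return json.dumps({"sections": sections}, indent=2)


converted = json.loads(cda_sections_to_json(SAMPLE_CDA))
```

Even in this toy case the observation code sits six levels below the document root in XML, whereas in the JSON output it is reachable as `converted["sections"][0]["codes"][0]`.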
Rousseau et al. [15] aimed to provide a more complete clinical document to facilitate doctors' work by integrating Social Determinants of Health (SDoH) into EHRs, leveraging a specific ontology for this purpose. Specifically, they combine the HL7 standard with the SNOMEDCT ontology (https://bioportal.bioontology.org/ontologies/SNOMEDCT, last accessed 10 January 2024), using JSON as a means to better connect data in medical records to external data sources. The authors concluded that research and development teams in health IT and healthcare services strive to make information in existing semantic frameworks more accessible and better connected to EHRs. However, they left the integration of data from other ontologies as future work.
Ontologies are among the most effective and most widely adopted means of representing knowledge in the medical domain, as in any other [17,18,19,20]. Several authors (e.g., [21]) have used them to facilitate medical decision-making, e.g., to suggest diagnoses and treatments, also taking advantage of probabilistic algorithms applied to the ontology data. However, successfully processing clinical data to navigate complex ontologies down to a plausibly correct diagnosis is not an easy task, because probabilistic algorithms applied to the rigid, complex structure of an ontology can fail when the clinical data are poorly specified or relate to little-known cases. Other authors [22,23,24,25] have been more concerned with expanding the domain of medical knowledge through an accurate review of the related concepts, e.g., the analysis of disease ontology terms or IDs (DOIDs), highlighting the importance of an enriched and complete ontology for the analysis of diseases.
A further obstacle is the complexity of the HL7 CDA standard itself, whose data format is very widespread but also difficult to manage and to apply in daily routine [16]. Specifically, the approach presented here is the first to process HL7 XML-formatted clinical documents in order to produce a single JSON object that is both enriched and easier to understand and process. This allows EHRs to go beyond merely storing and exchanging clinical records and to become data objects that are effective, practical tools supporting clinicians' everyday work.
In fact, the resulting medical document offers a clearer and more complete overall clinical picture of the patient, since it integrates the data in the patient's medical records with data from multiple related medical ontologies. As a consequence, it facilitates the analysis of the patient by providing the definitions of the terms and the relationships among all the basic pieces of clinical data, assisting clinicians in an in-depth analysis of the patient and in reaching a correct diagnosis. Moreover, having to manage a single JSON-formatted document object is of great added value, not only because it can be integrated more easily into multiple frameworks or software systems but also because it is better suited to automated processing and exchange.
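The idea of enriching record data with ontology definitions and relationships can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the source paper: the ontology is a toy in-memory dictionary with placeholder identifiers (`EX:0000`, `EX:0001`), and the record layout, the `is_a` key, and the function name `enrich` are all hypothetical.

```python
import copy

# Toy ontology: placeholder IDs, labels, and definitions (all hypothetical).
TOY_ONTOLOGY = {
    "EX:0000": {
        "label": "example disease group",
        "definition": "Placeholder definition of the parent class.",
        "is_a": [],
    },
    "EX:0001": {
        "label": "example disease A",
        "definition": "Placeholder definition of a specific disease.",
        "is_a": ["EX:0000"],
    },
}

# Record as it might look after conversion to JSON (hypothetical layout).
record = {
    "patient": "anonymous",
    "entries": [
        {"section": "Problems", "code": "EX:0001"},
        {"section": "Problems", "code": "EX:9999"},  # not in the ontology
    ],
}


def enrich(rec, ontology):
    """Return a copy of rec whose entries carry ontology data when available."""
    enriched = copy.deepcopy(rec)  # leave the input record untouched
    for entry in enriched["entries"]:
        term = ontology.get(entry["code"])
        if term is None:
            entry["ontology"] = None  # unknown code: keep the entry, flag the gap
            continue
        entry["ontology"] = {
            "label": term["label"],
            "definition": term["definition"],
            # Resolve parent IDs to labels so related concepts are readable.
            "parents": [
                {"id": pid, "label": ontology[pid]["label"]}
                for pid in term["is_a"]
                if pid in ontology
            ],
        }
    return enriched


enriched_record = enrich(record, TOY_ONTOLOGY)
```

Keeping the enrichment in a separate `ontology` key means the original coded data stay unchanged and the added definitions can be dropped or refreshed independently.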

This entry is adapted from the peer-reviewed paper 10.3390/fi16020062
