Caboodle data warehouse

5/16/2023

Over the past 20 years, we have seen an increased uptake of electronic health records (EHRs) within health care organizations, with much of this being attributable to national efforts to have health care organizations transition to full EHR systems. These EHRs represent a rich data asset, but there remains a challenge in the secondary use of the data for improving clinical care through activities such as service improvement and clinical research. In many cases, EHRs have simply replicated the paper system that they replaced and have not taken full advantage of the opportunities presented by having the health records in this new electronic format. While functional systems to address these gaps are emerging, many of the tools and data analytic approaches used on EHR data are limited to structured data, such as coded diagnoses and numeric clinical measurements. However, the structured data account for only a small portion of the EHR data, as it is estimated that almost 80% of the recorded information remains unstructured in the form of images, free-text records, and other such data formats. In particular, the free-text records often contain important clinical information, such as patient diagnoses, that has not yet been recorded as structured data. An additional difficulty is that a hospital's record is typically distributed across numerous disconnected data systems, which presents a challenge in data harmonization.

Working with EHRs thus presents 2 challenges: first, harmonizing and accessing the hospital's entire record from both existing and legacy data systems and, second, having tools and techniques available to mine and extract data from within these records, especially the unstructured free text. Manual analysis of unstructured text is time-consuming, so there has been much interest in developing automated methods for extracting accurate structured information from the free-text records. Interpreting free text is a major analytic challenge because clinical text is written in a variety of styles by numerous authors and may contain misspellings, negations, and other linguistic features. There has been intense interest in developing natural language processing (NLP) techniques to interpret clinical text. Early methods used a rule-based approach, but more modern algorithms incorporate machine learning techniques, enabling the algorithms to "learn" as more data are analyzed.

The CogStack platform was developed to address these exact problems. The platform can be described as an information retrieval system designed to interface with a hospital's EHR system. It was initially developed with an emphasis on ingestion and harmonization of records from multiple data systems within a health care organization. While certain off-the-shelf NLP tools were explored in the first iteration, they were added as a proof of concept to demonstrate that the platform could be configured to interact with such tools. In this paper, we discuss the experience of deploying CogStack at University College London Hospitals (UCLH) and highlight modifications to the platform that have improved its data harmonization and NLP capabilities. Our deployment of CogStack has focused on addressing 3 key issues that we feel are universal to all research-driven health care organizations.

The EHRs of an organization will typically be distributed across a number of different vendor systems, posing a challenge for the use of this information for clinical care and research. It is not uncommon for an organization to have to maintain oversight over a myriad of data systems and vendors because different clinical specialties have different requirements for how data are stored and managed. The resulting heterogeneity in data means that it is challenging for the organization to find a common data model, or even a process, through which the organization's entire record can be harmonized. Methods and systems through which data are stored, collected, and retrieved have been improving in order to tackle this challenge. Most notably, many National Health Service (NHS) trusts have opted to transition to full-scale EHR systems (eg, Epic), each of which typically enforces its own data model. Some vendors, such as Epic, go further by providing additional systems that allow data from third-party and legacy systems to be integrated with data collected via their own systems (Epic Clarity/Caboodle). Messaging standards (eg, HL7 Fast Healthcare Interoperability Resources), standardized terminologies (eg, Systematized Nomenclature of Medicine - Clinical Terms), and standardized clinical information models (eg, openEHR archetypes) aim to improve interoperability between systems, but much more work is needed in this area.
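To make the harmonization problem concrete, the following is a minimal Python sketch of mapping records from 2 disconnected systems onto one common structure. It is an illustration only and not how CogStack, Clarity, or Caboodle actually work: the `ClinicalNote` class, the `from_legacy_system` and `from_vendor_system` functions, and every field name (`PatID`, `NoteDate`, `Body`, `patient_id`, `created`, `note_text`) are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ClinicalNote:
    """Hypothetical common record that every source is mapped onto."""
    patient_id: str
    recorded_on: date
    text: str
    source: str


def from_legacy_system(row: dict) -> ClinicalNote:
    # Assumed legacy layout: 'PatID', 'NoteDate' as DD/MM/YYYY, 'Body'.
    return ClinicalNote(
        patient_id=row["PatID"].strip(),
        recorded_on=datetime.strptime(row["NoteDate"], "%d/%m/%Y").date(),
        text=row["Body"].strip(),
        source="legacy",
    )


def from_vendor_system(row: dict) -> ClinicalNote:
    # Assumed vendor layout: 'patient_id', ISO 8601 'created', 'note_text'.
    return ClinicalNote(
        patient_id=str(row["patient_id"]),
        recorded_on=date.fromisoformat(row["created"]),
        text=row["note_text"].strip(),
        source="vendor",
    )


def harmonize(legacy_rows, vendor_rows):
    """Merge both sources into one list ordered by patient and date."""
    notes = [from_legacy_system(r) for r in legacy_rows]
    notes += [from_vendor_system(r) for r in vendor_rows]
    return sorted(notes, key=lambda n: (n.patient_id, n.recorded_on))
```

Calling `harmonize` with one list of rows per source returns a single list of `ClinicalNote` objects, so downstream search or NLP code only has to deal with one shape of record. Each source gets its own small adapter function, which keeps vendor-specific quirks (identifier padding, date formats) out of the shared model.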
