Preprocessing narrative texts in electronic medical records to identify hospital adverse events: A scoping review

Authors: Jafarpour H, Wu G, Cheligeer CK, Yan J, Xu Y, Southern DA, Eastwood CA, Zeng Y, Quan H


Affiliations

1 Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: hamed.jafarpour@concordia.ca.
2 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: Guosong.wu@ucalgary.ca.
3 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: cheligeerken@ucalgary.ca.
4 Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: jun.yan@concordia.ca.
5 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: yuxu@ucalgary.ca.
6 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: dasouthe@ucalgary.ca.
7 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: caeastwo@ucalgary.ca.
8 Concordia University, Gina Cody School of Engineering and Computer Science, Concordia Institute for Information Systems Engineering, 1515 Sainte Catherine West, Montreal, H3G 2W1, Quebec, Canada. Electronic address: yong.zeng@concordia.ca.
9 University of Calgary, Department of Community Health Sciences, Cumming School of Medicine, 2500 University Drive NW, Calgary, T2N 1N4, Alberta, Canada. Electronic address: hquan@ucalgary.ca.

Description

Background: Narrative electronic medical records (EMR), which include textual notes created by clinicians within healthcare environments, represent a significant resource for documenting various facets of patient care. This form of text exhibits distinctive characteristics, such as the occurrence of grammatically incorrect sentences, abbreviations, frequent acronyms, specialized characters with particular meanings, negation expressions, and sporadic misspellings. As a result, a primary goal in processing these textual notes is to implement effective preprocessing techniques that enhance data quality and ensure consistency across all entries. Recent advancements in algorithms and methodologies within the fields of natural language processing (NLP), machine learning (ML), and large language models (LLM) have prompted researchers to leverage narrative EMR for the detection of hospital adverse events (HAE).
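The text characteristics listed above, such as abbreviations, acronyms and negation expressions, are usually handled by dedicated preprocessing steps. The following is a minimal Python sketch of two such steps. It assumes a small hypothetical abbreviation dictionary (`ABBREVIATIONS`) and a simplified, NegEx-style list of negation cues (`NEGATION_CUES`). The function names are illustrative, and this is not the method of the review or of any included study. Real systems rely on curated clinical lexicons and handle negation scope far more carefully.

```python
import re

# Hypothetical abbreviation dictionary (assumption); real systems use curated clinical lexicons.
ABBREVIATIONS = {
    "pt": "patient",
    "sob": "shortness of breath",
    "hx": "history",
    "uti": "urinary tract infection",
}

# Small illustrative set of negation cues, loosely in the style of NegEx triggers (assumption).
NEGATION_CUES = ("no", "denies", "without", "negative for")


def expand_abbreviations(text: str) -> str:
    """Lowercase the note and replace whole-word abbreviations with their expansions."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word, word)

    # Only alphabetic runs are candidates; punctuation and spacing are left untouched.
    return re.sub(r"[a-z]+", repl, text.lower())


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if a negation cue appears anywhere before the concept in the same sentence."""
    sentence = sentence.lower()
    idx = sentence.find(concept.lower())
    if idx == -1:
        return False  # concept not mentioned, so it cannot be negated here
    preceding = sentence[:idx]
    # Word boundaries prevent cues from matching inside longer words.
    return any(re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES)


if __name__ == "__main__":
    expanded = expand_abbreviations("Pt denies SOB. Hx of UTI.")
    print(expanded)  # patient denies shortness of breath. history of urinary tract infection.
    print(is_negated(expanded.split(".")[0], "shortness of breath"))  # True
```

Because the negation check accepts any cue earlier in the same sentence, it will over-trigger on long sentences; established tools such as NegEx restrict the scope of each cue.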

Methods: The scoping review adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. A scoping review protocol was developed and utilized to guide the research process, clearly outlining the eligibility criteria, information sources, search strategies, data management, selection process, data collection procedures, data items, outcomes and prioritization, data synthesis, and meta-bias considerations. The search strategy was implemented across nine engineering and medical electronic databases.

Results: Of the 3,264 studies retrieved, 48 unique studies were included in the review. Responses to the research questions were systematically extracted from these studies. The review identified challenges associated with preprocessing narrative texts in EMR for HAE identification. Additionally, three research gaps were identified: (1) the imperative need for a pipeline to preprocess narrative EMR for the identification of HAE, (2) the need for a robust system capable of managing the extensive volume of narrative EMR data, and (3) the need for a temporal event system; all three are essential for effective HAE detection. The study also underscored the essential role of preprocessing tasks in improving HAE detection performance. It emphasized that extracting N-grams from clinical text, normalizing these N-grams through lemmatization and/or stemming, and establishing semantic feature extraction are preprocessing tasks that significantly affect HAE detection performance. While LLM-based systems inherently incorporate tokenization and normalization within their frameworks, it remains crucial during preprocessing to address features that are semantically relevant to the specific type of HAE.
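The N-gram extraction and normalization steps highlighted above can be illustrated with a small Python sketch. It assumes a toy suffix-stripping function (`crude_stem`) standing in for a real stemmer such as Porter's, or for a lemmatizer, and produces bag-of-N-gram counts that a downstream HAE classifier could consume. All names are hypothetical, and the sketch is not drawn from any included study.

```python
import re
from collections import Counter

# Toy suffix list (assumption) standing in for a real stemmer or lemmatizer;
# it is NOT linguistically complete and only shows where normalization fits.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Split lowercased text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous sequences of n tokens."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, max_n: int = 2) -> Counter:
    """Count normalized 1..max_n-grams, giving a bag-of-N-grams feature vector."""
    stems = [crude_stem(t) for t in tokenize(text)]  # normalize before building N-grams
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        counts.update(" ".join(g) for g in ngrams(stems, n))
    return counts


if __name__ == "__main__":
    feats = ngram_features("Patient bleeding after surgery; bleeds again")
    print(feats["bleed"])  # 2
```

Normalizing before building N-grams collapses surface variants such as "bleeding" and "bleeds" into a single feature ("bleed"), which reduces sparsity in the feature space.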

Conclusion: This scoping review provides valuable insights for researchers focused on HAE detection using narrative EMR data. It elucidates how preprocessing tasks can improve HAE detection performance and draws attention to neglected research gaps in the field. Addressing these gaps will require further investigation in future research.


Keywords: Clinical text; Hospital adverse event; Large language model; Narrative EMR; Natural language processing; Preprocessing


Links

PubMed: https://pubmed.ncbi.nlm.nih.gov/41072367/

DOI: 10.1016/j.artmed.2025.103281