Project
Automated Clinical Coding Using AI Techniques
Clinical coding is essential for utilizing hospital data for funding allocation, benchmarking performance, monitoring outcomes, and research. However, it is a manual, labour-intensive process facing challenges such as a shortage of trained coders, increasing EMR volume and complexity, and coding quality issues. Many factors that are important for patient outcomes are not routinely coded, such as patients' smoking status, functional and cognitive status, living arrangements, or social support.
This project aims to address these challenges using AI, combining deep learning-based large language models with rule-based symbolic AI.
The goals include automating the assignment of diagnosis and health intervention codes and extracting uncoded clinical and lifestyle factors from EMRs. This innovative approach will enhance the efficiency and accuracy of clinical coding in Australia, using local patient data and coding standards to avoid the biases and inaccuracies of foreign commercial solutions.
Aims
This project aims to develop a novel approach to automated clinical coding that combines the latest advances in large language models with symbolic AI. The approach will optimise clinical coding in line with Australian coding guidelines and standards, and will add new value by extending the content of coded data beyond diagnoses and procedures.
This project aims to develop:
- AI algorithms to automate the coding of diagnoses and health interventions from diverse hospital EMRs;
- AI algorithms to extract from EMRs detailed clinical, health and lifestyle factors that are not currently routinely coded, such as patients' smoking status, BMI, functional and cognitive status, living arrangements, social support and languages spoken.
Design
We will utilize open-source large language models (LLMs) in a secure environment, employing a combination of the following methodologies:
- Fine-Tuning of LLMs: We will train LLMs on EMR data using parameter-efficient methods such as LoRA, enabling them to perform clinical coding accurately.
- Prompt Engineering: We will develop a prompt engineering methodology to improve the accuracy of the LLMs in generating clinical codes and in extracting clinical, health, and lifestyle factors.
- Symbolic AI - Post-Processing: We will apply symbolic AI for post-processing to:
a. Detect rare or less frequent clinical codes that the deep learning model might miss.
b. Ensure consistency with Australian Coding Standards.
c. Assist LLMs in the extraction of health and lifestyle factors.
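The symbolic post-processing step described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the code labels (CODE_A and so on), the rule tables and the function name post_process are hypothetical placeholders, not real classification codes, not rules from the Australian Coding Standards, and not the project's implementation.

```python
import re

# Hypothetical placeholders only: these are NOT real classification codes
# and the rules below are NOT taken from the Australian Coding Standards.
VALID_CODES = {"CODE_A", "CODE_B", "CODE_C", "CODE_D"}

# If both codes of a pair are present, keep the first and drop the second.
MUTUALLY_EXCLUSIVE = [("CODE_A", "CODE_B")]

# If the key is present, the value must also be present.
REQUIRES = {"CODE_C": "CODE_D"}

# Text patterns that trigger a code the language model may have missed.
KEYWORD_RULES = {r"\bcurrent smoker\b": "CODE_D"}


def post_process(predicted, note_text):
    """Apply simple symbolic rules to a list of model-predicted codes."""
    # 1. Keep only codes in the valid set, removing duplicates in order.
    codes = []
    for code in predicted:
        if code in VALID_CODES and code not in codes:
            codes.append(code)

    # 2. Recover codes triggered by explicit patterns in the note text.
    for pattern, code in KEYWORD_RULES.items():
        if re.search(pattern, note_text, flags=re.IGNORECASE) and code not in codes:
            codes.append(code)

    # 3. Add required companion codes.
    for trigger, companion in REQUIRES.items():
        if trigger in codes and companion not in codes:
            codes.append(companion)

    # 4. Resolve mutually exclusive pairs.
    for keep, drop in MUTUALLY_EXCLUSIVE:
        if keep in codes and drop in codes:
            codes.remove(drop)

    return codes


result = post_process(
    ["CODE_A", "NOT_A_CODE", "CODE_B", "CODE_A", "CODE_C"],
    "Patient is a current smoker.",
)
print(result)
```

In this sketch the rules run after the language model has produced its candidate codes, so each rule can be inspected and audited independently of the model. A real rule base would be far larger and would need to be derived from the coding standards themselves.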
Data will be drawn from a range of medical record sources, including publicly available datasets such as MIMIC and Australian datasets such as the Cardiac Analytics and Innovation (CardiacAI) Data Repository. The performance of the automated coding and extraction tools will be assessed through rigorous testing and validation processes.
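The text does not name the evaluation metrics. As one possible illustration, assuming each record carries a set of reference codes and a set of predicted codes, micro-averaged precision, recall and F1 could be computed as follows. The function name micro_prf and the code labels are hypothetical.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over per-record code sets."""
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, pred):
        g, p = set(gold_codes), set(pred_codes)
        tp += len(g & p)   # codes predicted and correct
        fp += len(p - g)   # codes predicted but not in the reference
        fn += len(g - p)   # reference codes the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical placeholder labels, not real classification codes.
gold = [["CODE_A", "CODE_B"], ["CODE_C"]]
pred = [["CODE_A"], ["CODE_C", "CODE_D"]]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Micro-averaging pools counts across all records, so frequent codes dominate the score. A macro-averaged variant computed per code would give rare codes more weight.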
Centre for Big Data Research in Health
Dr Oscar Perez Concha
Dr Sanja Lujic