CXRMate-ED: The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It

This is the model and data pipeline for the CXRMate-ED model from:

@inproceedings{nicolson-etal-2025-impact,
    title = "The Impact of Auxiliary Patient Data on Automated Chest {X}-Ray Report Generation and How to Incorporate It",
    author = "Nicolson, Aaron  and Zhuang, Shengyao and Dowling, Jason and Koopman, Bevan",
    editor = "Che, Wanxiang  and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.9/",
    doi = "10.18653/v1/2025.acl-long.9",
    pages = "177--203",
    ISBN = "979-8-89176-251-0",
    abstract = "This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on data from a patient{'}s CXR exam, overlooking valuable information from patient electronic health records. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we investigate the use of patient data from emergency department (ED) records {---} such as vital signs measured and medicines reconciled during an ED stay {---} for CXR report generation, with the aim of enhancing diagnostic accuracy. We also investigate conditioning CXR report generation on the clinical history section of radiology reports, which has been overlooked in the literature. We introduce a novel approach to transform these heterogeneous data sources into patient data embeddings that prompt a multimodal language model (CXRMate-ED). Our comprehensive evaluation indicates that using a broader set of patient data significantly enhances diagnostic accuracy. The model, training code, and dataset are publicly available."
}

The abstract from the paper:

"This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on data from a patient{'}s CXR exam, overlooking valuable information from patient electronic health records. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we investigate the use of patient data from emergency department (ED) records {---} such as vital signs measured and medicines reconciled during an ED stay {---} for CXR report generation, with the aim of enhancing diagnostic accuracy. We also investigate conditioning CXR report generation on the clinical history section of radiology reports, which has been overlooked in the literature. We introduce a novel approach to transform these heterogeneous data sources into patient data embeddings that prompt a multimodal language model (CXRMate-ED). Our comprehensive evaluation indicates that using a broader set of patient data significantly enhances diagnostic accuracy. The model, training code, and dataset are publicly available."

Prepare the dataset:

import transformers

# Paths:
physionet_dir = '/.../physionet.org/files'  # Where MIMIC-CXR, MIMIC-CXR-JPG, and MIMIC-IV-ED are stored.
database_dir = '/.../database/cxrmate_ed'  # The Hugging Face dataset will be saved here.

# Prepare the Hugging Face MIMIC-CXR & MIMIC-IV-ED dataset:
model = transformers.AutoModel.from_pretrained('aehrc/cxrmate-ed', trust_remote_code=True)
model.prepare_data(physionet_dir=physionet_dir, database_dir=database_dir)
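prepare_data builds the Hugging Face dataset from the raw PhysioNet files and saves it to database_dir, so it presumably only needs to be run once. In later sessions the prepared test split can be reloaded with get_dataset, the same call used in the report-generation example below. A minimal sketch:

import transformers

database_dir = '/.../database/cxrmate_ed'  # Same directory passed to prepare_data.

model = transformers.AutoModel.from_pretrained('aehrc/cxrmate-ed', trust_remote_code=True)
test_set = model.get_dataset(database_dir=database_dir, test_set_only=True)
print(test_set[0].keys())  # Inspect the fields of a prepared example.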

Generate a report:

import torch
import transformers

# Device and paths:
device = 'cuda'
database_dir = '/.../database/cxrmate_ed'  # Same directory used when preparing the dataset above.

# Download the model checkpoint and tokenizer:
model = transformers.AutoModelForCausalLM.from_pretrained('aehrc/cxrmate-ed', trust_remote_code=True).to(device=device)
tokenizer = transformers.PreTrainedTokenizerFast.from_pretrained('aehrc/cxrmate-ed')

# Get the Hugging Face MIMIC-CXR & MIMIC-IV-ED test set:
test_set = model.get_dataset(database_dir=database_dir, test_set_only=True)

# Get an example, add mini-batch dimension and move to device:
example = test_set[0]
example = {k: v.to(device).unsqueeze(0) if isinstance(v, torch.Tensor) else [v] for k, v in example.items()}

# Convert the patient data in the batch into embeddings:
inputs_embeds, attention_mask, token_type_ids, position_ids, bos_token_ids = model.prepare_inputs(tokenizer=tokenizer, **example)
    
# Generate reports:
output_ids = model.generate(
    input_ids=bos_token_ids,
    decoder_inputs_embeds=inputs_embeds,
    decoder_token_type_ids=token_type_ids,
    prompt_attention_mask=attention_mask,
    prompt_position_ids=position_ids,
    special_token_ids=[tokenizer.sep_token_id],
    max_length=256,
    num_beams=4,
    return_dict_in_generate=True,
)['sequences']

# Findings and impression section:
findings, impression = model.split_and_decode_sections(output_ids, [tokenizer.sep_token_id, tokenizer.eos_token_id], tokenizer)
for i, j in zip(findings, impression):
    print(f'Findings:\t{i}\nImpression:\t{j}\n\n')
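To run inference over more than one study, the same calls can be wrapped in a loop. Below is a minimal sketch that generates reports for the first few test examples and writes the decoded sections to a CSV file (reports.csv is a hypothetical output path; the loop simply repeats the calls shown above):

import csv

with open('reports.csv', 'w', newline='') as f:  # Hypothetical output path.
    writer = csv.writer(f)
    writer.writerow(['findings', 'impression'])
    for idx in range(3):  # First few test examples.
        example = test_set[idx]
        example = {k: v.to(device).unsqueeze(0) if isinstance(v, torch.Tensor) else [v] for k, v in example.items()}
        inputs_embeds, attention_mask, token_type_ids, position_ids, bos_token_ids = model.prepare_inputs(tokenizer=tokenizer, **example)
        output_ids = model.generate(
            input_ids=bos_token_ids,
            decoder_inputs_embeds=inputs_embeds,
            decoder_token_type_ids=token_type_ids,
            prompt_attention_mask=attention_mask,
            prompt_position_ids=position_ids,
            special_token_ids=[tokenizer.sep_token_id],
            max_length=256,
            num_beams=4,
            return_dict_in_generate=True,
        )['sequences']
        findings, impression = model.split_and_decode_sections(output_ids, [tokenizer.sep_token_id, tokenizer.eos_token_id], tokenizer)
        writer.writerow([findings[0], impression[0]])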

Inference for a study (no emergency data & no Hugging Face Datasets)

import torch
import transformers
from torchvision.io import read_image

# Device and modules:
device = 'cuda'
model = transformers.AutoModelForCausalLM.from_pretrained('aehrc/cxrmate-ed', trust_remote_code=True).to(device=device)
tokenizer = transformers.PreTrainedTokenizerFast.from_pretrained('aehrc/cxrmate-ed')

study_image_paths = ['...', '...']  # e.g., ['img1.jpeg', 'img2.jpeg']. 

indication = '...'  # Set to None if not using.
history = '...'  # Set to None if not using.

images = [read_image(i) for i in study_image_paths]
images = [torch.stack([model.test_transforms(i) for i in images])]
images = torch.nn.utils.rnn.pad_sequence(images, batch_first=True, padding_value=0.0).to(device=device)
image_time_deltas = [[model.zero_time_delta_value] * images.shape[1]]

# Convert the patient data in the batch into embeddings:
inputs_embeds, attention_mask, token_type_ids, position_ids, bos_token_ids = model.prepare_inputs(
    tokenizer=tokenizer, images=images, image_time_deltas=image_time_deltas, study_id=[0], indication=[[indication]], history=[[history]]
)
            
# Generate reports:
output_ids = model.generate(
    input_ids=bos_token_ids,
    decoder_inputs_embeds=inputs_embeds,
    decoder_token_type_ids=token_type_ids,
    prompt_attention_mask=attention_mask,
    prompt_position_ids=position_ids,
    special_token_ids=[tokenizer.sep_token_id],
    max_length=256,
    num_beams=4,
    return_dict_in_generate=True,
)['sequences']

# Findings and impression section:
findings, impression = model.split_and_decode_sections(output_ids, [tokenizer.sep_token_id, tokenizer.eos_token_id], tokenizer)           
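
As in the previous example, the decoded sections can then be printed:

for i, j in zip(findings, impression):
    print(f'Findings:\t{i}\nImpression:\t{j}\n\n')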

MIMIC-CXR & MIMIC-IV-ED dataset:

MIMIC-CXR, MIMIC-CXR-JPG, and MIMIC-IV-ED must be in the same PhysioNet directory, e.g.:

user@cluster:~$ ls /home/user/physionet.org/files
mimic-cxr  mimic-cxr-jpg  mimic-iv-ed
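
Before running prepare_data, it can help to verify that the expected subdirectories exist under physionet_dir. A minimal sketch using the standard library (the directory names follow the listing above):

from pathlib import Path

physionet_dir = Path('/.../physionet.org/files')  # As used for prepare_data above.
for name in ['mimic-cxr', 'mimic-cxr-jpg', 'mimic-iv-ed']:
    assert (physionet_dir / name).is_dir(), f'Missing dataset directory: {name}'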

Download MIMIC-CXR-JPG:

Download the MIMIC-CXR-JPG dataset from https://physionet.org/content/mimic-cxr-jpg, e.g.,

wget -r -N -c -np --user <username> --ask-password https://physionet.org/files/mimic-cxr-jpg/2.1.0/

Note that you must be a credentialised user to access this dataset.

Download the reports from MIMIC-CXR:

MIMIC-CXR-JPG does not include the radiology reports; these are instead distributed with MIMIC-CXR (the DICOM version of the dataset). To download the reports while avoiding the DICOM files (which are very large), use --reject dcm with the wget command from https://physionet.org/content/mimic-cxr, e.g.,

wget -r -N -c -np --reject dcm --user <username> --ask-password https://physionet.org/files/mimic-cxr/2.0.0/

Note that you must be a credentialised user to access this dataset.

Download MIMIC-IV-ED:

Download the MIMIC-IV-ED dataset from https://physionet.org/content/mimic-iv-ed, e.g.,

wget -r -N -c -np --user <username> --ask-password https://physionet.org/files/mimic-iv-ed/2.2/

Note that you must be a credentialised user to access this dataset.

Environment requirements:

Environment requirements can be found here: https://github.com/aehrc/cxrmate-ed/blob/main/requirements.txt.

Training:

The training pipeline for CXRMate-ED is available at: https://github.com/aehrc/cxrmate-ed.
