PLoS One. 2026 Jul 1;21(7):e0350516. doi: 10.1371/journal.pone.0350516. eCollection 2026.
ABSTRACT
OBJECTIVE: To develop a transformer-based natural language processing (NLP) system for detecting adverse drug events (ADEs) from clinical notes in electronic health records (EHRs).
MATERIALS AND METHODS: We fine-tuned BERT Short-Formers and Clinical-Longformer on a processed version of the dataset from the 2018 National NLP Clinical Challenges (n2c2) shared task, Track 2. We compared two data processing methods, a window-based approach and a split-based approach, to identify the optimal one. Model generalizability was evaluated on a dataset extracted from Vanderbilt University Medical Center (VUMC) EHRs.
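The abstract does not specify how the two processing methods segment a note. The following is a minimal sketch under two assumptions. First, the window-based method keeps a fixed number of words on either side of a target mention, such as a drug. Second, the split-based method divides the whole note into a fixed number of contiguous chunks. The function names `window_text` and `split_text` and the whitespace word tokenization are hypothetical illustrations, not the authors' implementation.

```python
def window_text(words, target_index, window=15):
    """Hypothetical window-based processing: keep up to `window` words
    on each side of the word at `target_index` (clipped at note edges)."""
    start = max(0, target_index - window)
    end = min(len(words), target_index + window + 1)
    return words[start:end]


def split_text(words, n_chunks=4):
    """Hypothetical split-based processing: divide the whole note into
    `n_chunks` contiguous chunks whose sizes differ by at most one word."""
    size, remainder = divmod(len(words), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `remainder` chunks each take one extra word.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(words[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    note = "patient developed rash after starting amoxicillin drug was stopped".split()
    print(window_text(note, note.index("amoxicillin"), window=2))
    print(split_text(note, n_chunks=4))
```

Under these assumptions, a 15-word window yields instances of at most 31 words centered on the mention. A 4-chunk split, by contrast, covers the note end to end, so each chunk's length grows with the length of the note.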
RESULTS: On the n2c2 dataset, the best 5-fold cross-validation AUPRC, micro F1, and macro F1 scores were 0.840 (Clinical-Longformer, 4-chunk split), 0.965 (BioBERT, 15-word window), and 0.852 (Clinical-Longformer, 10-chunk split), respectively. On the VUMC dataset, the best AUPRC, micro F1, and macro F1 scores were 0.536 (Clinical-Longformer, 6-chunk split), 0.966 (BERT-base-uncased, 6-chunk split), and 0.762 (Clinical-Longformer, 4-chunk split), respectively.
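A gap like the one between micro F1 and macro F1 reported above can arise when classes are imbalanced. Micro F1 pools true positives, false positives, and false negatives across all classes, so frequent classes dominate. Macro F1 averages per-class F1 scores, so rare classes carry equal weight. The sketch below computes both from scratch. It assumes single-label multiclass predictions and uses hypothetical toy labels. It does not reproduce the authors' evaluation code.

```python
from collections import Counter


def f1_scores(gold, pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed

    def f1(t, f_p, f_n):
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    # Macro: average of per-class F1 scores (each class weighted equally).
    per_class = [f1(tp[c], fp[c], fn[c]) for c in labels]
    macro = sum(per_class) / len(per_class)
    # Micro: F1 over counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


if __name__ == "__main__":
    gold = [0] * 8 + [1] * 2
    pred = [0] * 8 + [1, 0]
    print(f1_scores(gold, pred))
```

With this toy input, micro F1 is 0.9. Macro F1 is about 0.804 because the rare class 1 reaches an F1 of only 2/3.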
DISCUSSION: Transformer-based models demonstrated strong performance for ADE detection, with split-based processing generally outperforming window-based processing. Clinical-Longformer combined with a practical split-based approach showed promise for real-world implementation. Chunk size substantially influenced model performance, even when the text length remained within the models' token limits.
CONCLUSION: Our findings provide practical guidance for developing transformer-based ADE detection systems from clinical notes. The selection of both text preprocessing strategies and model architectures should be guided by note characteristics and practical considerations such as annotation burden.
PMID:42384713 | DOI:10.1371/journal.pone.0350516