Identifying adverse drug reactions by using
BERT models for classification and NER
Divij Mishra
Aravind Rajeev Nair
Naveen Sethuraman
Introduction
● Key application of NLP – auto-extracting information from medical docs
● ADR (Adverse Drug Reaction) identification – classic problem in NLP + healthcare
● Tasks:
○ 1 – Does a given sentence contain an ADR? (binary classification)
○ 2 – If so, identify both the drug and the adverse reaction by name. (NER)
● Methods:
○ Fine-tune BERT, BioBERT, BioClinicalBERT.
Prior work
Model: BERT + Variants
● BERT
○ Transformer-based pre-trained language model
○ Generates contextualized word embeddings for sequences
● Use HuggingFace libraries
● Task 1 – Binary classification
○ Add a linear layer with 2 output nodes
● Task 2 – NER
○ Add a linear layer with 5 output nodes (2 entity types → 5 labels under the BIO scheme); see the sketch below for how both heads are attached
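As a rough illustration, both task heads can be attached with the HuggingFace Transformers Auto classes, which place a linear classification layer on top of the encoder. The base checkpoint name here is plain BERT and simply stands in for whichever variant is being fine-tuned.

```python
from transformers import AutoModelForSequenceClassification, AutoModelForTokenClassification

# Task 1 – sentence classification head: a linear layer with 2 output nodes
clf_model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)

# Task 2 – token classification (NER) head: a linear layer with 5 output nodes
# (O, B-DRUG, I-DRUG, B-EFFECT, I-EFFECT under the BIO scheme)
ner_model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=5)
```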
Model: BERT variants
● BERT performance on a specific domain can be improved by changing the pre-training/fine-tuning corpus!
● BioBERT – BERT further pre-trained on biomedical literature
● BioClinicalBERT – BioBERT weights further fine-tuned on a clinical records corpus
● Both variants can be loaded from the HuggingFace Hub (see the sketch below)
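Swapping in a variant only requires pointing the same Auto classes at a different Hub checkpoint. The checkpoint IDs below are the commonly used public releases and are an assumption about the exact weights used in this project.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hub checkpoint IDs for the two domain-adapted variants
BIOBERT = "dmis-lab/biobert-base-cased-v1.1"           # BERT further pre-trained on biomedical text
BIO_CLINICAL_BERT = "emilyalsentzer/Bio_ClinicalBERT"  # initialized from BioBERT, trained on clinical notes

tokenizer = AutoTokenizer.from_pretrained(BIOBERT)
model = AutoModelForSequenceClassification.from_pretrained(BIOBERT, num_labels=2)
```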
Dataset
● Adverse Drug Event benchmark dataset [Gurulingappa et al., 2012]
● We use the HuggingFace version [ade_corpus_v2]
● Task 1 – sentence classification:
○ Contains 23.5k samples labelled for binary classification
● Task 2 – drug + effect identification:
○ Contains 6.8k samples
○ Given as a relation extraction problem – but since each sentence is annotated for exactly one drug and one effect, we can treat it as an NER problem.
● For both tasks, we used an 80/10/10 train/validation/test split; a loading sketch follows below.
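A minimal sketch of loading the HuggingFace corpus and carving out the 80/10/10 split with the datasets library. The config strings are my best guess at the ade_corpus_v2 configuration names and may need adjusting.

```python
from datasets import load_dataset

# Task 1 – sentence classification config (assumed config name)
clf_ds = load_dataset("ade_corpus_v2", "Ade_corpus_v2_classification")["train"]
# Task 2 – drug/effect annotations, treated here as NER (assumed config name)
ner_ds = load_dataset("ade_corpus_v2", "Ade_corpus_v2_drug_ade_relation")["train"]

# 80/10/10 train/validation/test split (same recipe for both tasks)
tmp = clf_ds.train_test_split(test_size=0.2, seed=42)
heldout = tmp["test"].train_test_split(test_size=0.5, seed=42)
clf_train, clf_val, clf_test = tmp["train"], heldout["train"], heldout["test"]
```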
Experiments – Task 1: Binary classification
● We fine-tuned BERT, BioBERT, and BioClinicalBERT on the ADR classification task for 2 epochs, with early stopping, batch size = 16, and evaluation steps = 100 (plots shown for BioBERT). A setup sketch follows below.
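A hedged sketch of this fine-tuning setup with the Trainer API, continuing from the dataset-loading sketch above. The hyperparameters mirror the slide (2 epochs, batch size 16, evaluation every 100 steps, early stopping); the checkpoint, column name, and patience value are assumptions.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer, EarlyStoppingCallback)

checkpoint = "dmis-lab/biobert-base-cased-v1.1"   # assumed BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)   # "text" column name is an assumption

# clf_train / clf_val come from the dataset-loading sketch above
train_tok = clf_train.map(tokenize, batched=True)
val_tok = clf_val.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="adr-clf",
    num_train_epochs=2,
    per_device_train_batch_size=16,
    evaluation_strategy="steps",   # newer transformers versions rename this to eval_strategy
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,
    load_best_model_at_end=True,   # required by the early-stopping callback
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=val_tok,
    tokenizer=tokenizer,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience is an assumption
)
trainer.train()
```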
Experiments – Task 2: NER
● We fine-tuned BERT, BioBERT, and BioClinicalBERT on the ADR NER task for 3 epochs, with early stopping, batch size = 32, and evaluation steps = 20 (plots shown for BioBERT). The BIO-labelling step is sketched below.
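Because each Task 2 sentence is annotated with exactly one drug and one effect, the main preprocessing step is converting those annotations into BIO labels aligned with the wordpiece tokens. A rough sketch, assuming each example provides (start, end) character offsets for the two entities (the exact field names in ade_corpus_v2 may differ):

```python
from transformers import AutoTokenizer

LABELS = ["O", "B-DRUG", "I-DRUG", "B-EFFECT", "I-EFFECT"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}

def bio_encode(text, drug_span, effect_span, tokenizer):
    """Tokenize one sentence and assign BIO label ids from (start, end) character spans."""
    enc = tokenizer(text, truncation=True, return_offsets_mapping=True)  # needs a fast tokenizer
    labels = []
    for start, end in enc["offset_mapping"]:
        if start == end:                 # special tokens ([CLS], [SEP])
            labels.append(-100)          # ignored by the token-classification loss
            continue
        tag = "O"
        for (s, e), entity in ((drug_span, "DRUG"), (effect_span, "EFFECT")):
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + entity
        labels.append(LABEL2ID[tag])
    enc["labels"] = labels
    return enc

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
example = bio_encode("Severe rash after starting lamotrigine.", (27, 38), (0, 11), tokenizer)
```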
Results
● For Task 1: Binary Classification, we saw that BioBERT performed the best (91% F1-score), with BioClinicalBERT giving comparable performance. BERT performed much worse than the other two.
● For Task 2: NER, all three performed comparably. BioBERT performed the best (65% F1-score).
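For context on how the two F1 numbers are typically computed: sentence-level binary F1 for Task 1 and entity-level F1 (e.g. via seqeval) for Task 2. A tiny sketch with placeholder predictions; the choice of seqeval is an assumption, not necessarily what was used here.

```python
from sklearn.metrics import f1_score
from seqeval.metrics import f1_score as entity_f1

# Task 1 – binary F1 over sentence-level predictions (placeholder values)
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))

# Task 2 – entity-level F1 over BIO tag sequences (one list of tags per sentence)
gold = [["O", "B-DRUG", "O", "B-EFFECT", "I-EFFECT"]]
pred = [["O", "B-DRUG", "O", "B-EFFECT", "O"]]
print(entity_f1(gold, pred))
```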
Conclusion
● Fine-tuned Transformer-based models show good results on ADR detection!
○ An effective method with low data requirements
● Pre-training also matters
○ BioBERT performs significantly better than BERT.
○ However, BioClinicalBERT shows no improvement over BioBERT, which is surprising!
● Room for improvement – previous work reached an 84% score on Task 2; we likely need more hyperparameter tuning and longer training with stronger regularization.