Title: emrQA: A Large Corpus for Question Answering on Electronic Medical Records

Abstract: We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging existing expert annotations on clinical notes for various NLP tasks from the community-shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question-to-logical-form and question-to-answer mapping.
Comments: Accepted at Conference on Empirical Methods in Natural Language Processing (EMNLP) 2018
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1809.00732 [cs.CL]
  (or arXiv:1809.00732v1 [cs.CL] for this version)

Submission history

From: Anusri Pampari
[v1] Mon, 3 Sep 2018 21:56:47 UTC (887 KB)