Towards comprehensive syntactic and semantic annotations of the clinical narrative

Description

This article discusses the creation of annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP).

Physical Description

9 p.

Creation Information

Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F., IV; Warner, Colin; Hwang, Jena D. et al. January 25, 2013.

Context

This article is part of the collection entitled: UNT Scholarly Works and was provided by UNT College of Engineering to Digital Library, a digital repository hosted by the UNT Libraries. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Provided By

UNT College of Engineering

The UNT College of Engineering promotes intellectual and scholarly pursuits in the areas of computer science and engineering, preparing innovative leaders in a variety of disciplines. The UNT College of Engineering encourages faculty and students to pursue interdisciplinary research among numerous subjects of study including databases, numerical analysis, game programming, and computer systems architecture.

What

Descriptive information to help identify this article.

Notes

Abstract

Objective: To create annotated clinical narratives with layers of syntactic and semantic labels to facilitate advances in clinical natural language processing (NLP). To develop NLP algorithms and open source components.

Methods: Manual annotation of a clinical narrative corpus of 127 606 tokens following the Treebank schema for syntactic information, PropBank schema for predicate-argument structures, and the Unified Medical Language System (UMLS) schema for semantic information. NLP components were developed.

Results: The final corpus consists of 13 091 sentences containing 1772 distinct predicate lemmas. Of the 766 newly created PropBank frames, 74 are verbs. There are 28 539 named entity (NE) annotations spread over 15 UMLS semantic groups, one UMLS semantic type, and the Person semantic category. The most frequent annotations belong to the UMLS semantic groups of Procedures (15.71%), Disorders (14.74%), Concepts and Ideas (15.10%), Anatomy (12.80%), Chemicals and Drugs (7.49%), and the UMLS semantic type of Sign or Symptom (12.46%). Inter-annotator agreement results: Treebank (0.926), PropBank (0.891–0.931), NE (0.697–0.750). The part-of-speech tagger, constituency parser, dependency parser, and semantic role labeler are built from the corpus and released open source. A significant limitation uncovered by this project is the need for the NLP community to develop a widely agreed-upon schema for the annotation of clinical concepts and their relations.

Conclusions: This project takes a foundational step towards bringing the field of clinical NLP up to par with NLP in the general domain. The corpus creation and NLP components provide a resource for research and application development that would have been previously impossible.
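The abstract reports inter-annotator agreement scores but does not say how they were computed. As a rough illustration only, one common way to score agreement on named entity spans is exact-match pairwise F1 between two annotators. The sketch below assumes that metric; the `Span` representation, the `pairwise_f1` function, and the sample spans are hypothetical and are not taken from the article.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_offset, end_offset, semantic_label).
Span = Tuple[int, int, str]


def pairwise_f1(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """Exact-match F1 between two annotators' span sets.

    Annotator A is treated as the reference and B as the prediction;
    the F1 value is the same if the roles are swapped.
    """
    if not annotator_a and not annotator_b:
        return 1.0  # nothing to annotate, trivially in agreement
    matched = len(annotator_a & annotator_b)
    if matched == 0:
        return 0.0
    precision = matched / len(annotator_b)
    recall = matched / len(annotator_a)
    return 2 * precision * recall / (precision + recall)


# Made-up example: the annotators disagree on the boundary of the third span.
a = {(0, 12, "Disorders"), (20, 29, "Procedures"), (35, 41, "Anatomy")}
b = {(0, 12, "Disorders"), (20, 29, "Procedures"), (35, 44, "Anatomy")}
print(round(pairwise_f1(a, b), 3))
```

Under exact matching, a single boundary disagreement counts as a miss for both annotators, which is one reason span-level agreement can come out lower than agreement on token-level or tree-level annotations.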

Source

  • Journal of the American Medical Informatics Association, 2013. Oxford, UK: Oxford University Press

Publication Information

  • Publication Title: Journal of the American Medical Informatics Association
  • Volume: 20
  • Issue: 5
  • Pages: 922-930
  • Peer Reviewed: Yes

Collections

This article is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Dates and time periods associated with this article.

Submitted Date

  • September 3, 2012

Accepted Date

  • December 28, 2012

Creation Date

  • January 25, 2013

Added to The UNT Digital Library

  • May 1, 2018, 12:41 a.m.

Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William F., IV; Warner, Colin; Hwang, Jena D. et al. Towards comprehensive syntactic and semantic annotations of the clinical narrative, article, January 25, 2013; Oxford, United Kingdom. (digital.library.unt.edu/ark:/67531/metadc1132762/: accessed June 17, 2018), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT College of Engineering.