Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report.

Physical Description

92 p.

Creation Information

Kegelmeyer, W. Philip; Shead, Timothy M. & Dunlavy, Daniel M. September 1, 2013.

Context

This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided by UNT Libraries Government Documents Department to Digital Library, a digital repository hosted by the UNT Libraries. More information about this report can be viewed below.

Who

People and organizations associated with either the creation of this report or its content.

Authors

  • Kegelmeyer, W. Philip
  • Shead, Timothy M.
  • Dunlavy, Daniel M.

Sponsor

Publisher

  • Sandia National Laboratories
    Publisher Info: Sandia National Laboratories (SNL-CA), Livermore, CA (United States) and Albuquerque, NM
    Place of Publication: Albuquerque, New Mexico

Provided By

UNT Libraries Government Documents Department

Serving as both a federal and a state depository library, the UNT Libraries Government Documents Department maintains millions of items in a variety of formats. The department is a member of the FDLP Content Partnerships Program and an Affiliated Archive of the National Archives.

What

Descriptive information to help identify this report. Follow the links below to find similar items on the Digital Library.

Description

This SAND report summarizes the activities and outcomes of the Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed improving the accuracy of conditional random fields for named entity recognition through the use of ensemble methods. Conditional random fields (CRFs) are powerful, flexible probabilistic graphical models often used in supervised machine learning prediction tasks associated with sequence data. Specifically, they are currently the best known option for named entity recognition (NER) in text. NER is the process of labeling words in sentences with semantic identifiers such as “person”, “date”, or “organization”. Ensembles are a powerful statistical inference meta-method that can make most supervised machine learning methods more accurate, faster, or both. Ensemble methods are normally best suited to “unstable” classification methods with high variance error. CRFs applied to NER are very stable classifiers and, as such, would initially seem to be resistant to the benefits of ensembles. The NEEEEIT project nonetheless worked out how to generalize ensemble methods to CRFs, demonstrated that accuracy can indeed be improved by proper use of ensemble techniques, and generated a new CRF code, “pyCrust”, and a surrounding application environment, “NEEEEIT”, which implement those improvements. The summary practical advice that results from this work, then, is: When making use of CRFs for label prediction tasks in machine learning, use the pyCrust CRF base classifier with NEEEEIT's bagging ensemble implementation. (If those codes are not available, then destabilize your CRF code via every means available, and generate the bagged training sets by hand.) If you have ample pre-processing computational time, do “forward feature selection” to find and remove counter-productive feature classes. Conversely, if pre-processing time is limited, use NEEEEIT's “edited clone” pyCrust mechanism, along with a more modest use of bagging, to generate ensembles much more quickly.
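
The description's fallback advice, generating the bagged training sets by hand, can be illustrated with a minimal Python sketch. It is not taken from the report and does not use pyCrust or NEEEEIT: the function names `bagged_training_sets` and `majority_vote` are hypothetical, the toy sentences are made up for illustration, and the per-token plurality vote is an assumed combination rule that may differ from the one the report develops for CRFs.

```python
import random
from collections import Counter


def bagged_training_sets(sentences, n_bags, seed=0):
    """Draw n_bags bootstrap samples from a list of labeled sentences.

    Each bag is sampled with replacement and has the same size as the
    original training set, which is the standard bagging recipe.
    """
    rng = random.Random(seed)
    n = len(sentences)
    return [
        [sentences[rng.randrange(n)] for _ in range(n)]
        for _ in range(n_bags)
    ]


def majority_vote(label_sequences):
    """Combine label sequences for one sentence by per-token plurality vote.

    Assumption: every ensemble member labels the same tokens, so all
    sequences have the same length.
    """
    return [
        Counter(labels_at_position).most_common(1)[0][0]
        for labels_at_position in zip(*label_sequences)
    ]


# Hypothetical toy data: (tokens, labels) pairs.
sentences = [
    (["Alice", "joined", "Sandia"], ["person", "O", "organization"]),
    (["Meeting", "on", "Monday"], ["O", "O", "date"]),
    (["Bob", "left", "yesterday"], ["person", "O", "date"]),
]

# One bag per ensemble member; each member would be trained on its own bag.
bags = bagged_training_sets(sentences, n_bags=5, seed=0)

# Predictions from three hypothetical ensemble members for one sentence.
predictions = [
    ["person", "O", "organization"],
    ["person", "O", "O"],
    ["O", "O", "organization"],
]
combined = majority_vote(predictions)
```

In this sketch each bag would be used to train one CRF with whatever CRF code is at hand, and the trained models' label sequences for a sentence would then be passed to `majority_vote`.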

Language

Item Type

Identifier

Unique identifying numbers for this report in the Digital Library or other systems.

  • Report No.: SAND2013-9344
  • Grant Number: AC04-94AL85000
  • Office of Scientific & Technical Information Report Number: 1115263
  • Archival Resource Key: ark:/67531/metadc864353

Collections

This report is part of the following collection of related materials.

Office of Scientific & Technical Information Technical Reports

When

Dates and time periods associated with this report.

Creation Date

  • September 1, 2013

Added to The UNT Digital Library

  • Sept. 16, 2016, 12:32 a.m.

Description Last Updated

  • Feb. 17, 2017, 4:30 p.m.

Citations, Rights, Re-Use

Kegelmeyer, W. Philip; Shead, Timothy M. & Dunlavy, Daniel M. Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report, report, September 1, 2013; Albuquerque, New Mexico. (digital.library.unt.edu/ark:/67531/metadc864353/: accessed August 22, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu; crediting UNT Libraries Government Documents Department.