Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report.

Kegelmeyer, W. Philip; Shead, Timothy M.; Dunlavy, Daniel M.


92 p.

This report is part of the collection entitled: Office of Scientific & Technical Information Technical Reports and was provided to UNT Digital Library by the UNT Libraries Government Documents Department.

Extracted Text

The following text was automatically extracted from the image on this page using optical character recognition software:

Abstract

This SAND report summarizes the activities and outcomes of the Network and Ensemble
Enabled Entity Extraction in Informal Text (NEEEEIT) LDRD project, which addressed
improving the accuracy of conditional random fields for named entity recognition through the use
of ensemble methods.
Conditional random fields (CRFs) are powerful, flexible probabilistic graphical models
often used in supervised machine learning prediction tasks associated with sequence data.
Specifically, they are currently the best known option for named entity recognition (NER)
in text. NER is the process of labeling words in sentences with semantic identifiers such as
"person", "date", or "organization".
Ensembles are a powerful statistical inference meta-method that can make most supervised
machine learning methods more accurate, faster, or both. Ensemble methods are normally best
suited to "unstable" classification methods with high variance error. CRFs applied to NER
are very stable classifiers, and as such, would initially seem to be resistant to the benefits of
ensembles.
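As an illustration of the ensemble idea described above, the following is a minimal Python sketch of bagging, the ensemble method named in this report's practical advice: each ensemble member is trained on a bootstrap resample of the training sentences, and the members vote on each token's label. It is a sketch under stated assumptions, not the pyCrust or NEEEEIT implementation. The names `bootstrap_sample`, `train_lookup_tagger`, and `bagged_tagger` are hypothetical, the base learner is a simple word-to-label lookup standing in for a CRF, and per-token majority voting is only one simple way to combine sequence predictions; this page does not say how NEEEEIT combines them.

```python
import random
from collections import Counter, defaultdict


def bootstrap_sample(sentences, rng):
    """Draw len(sentences) sentences with replacement (one bagged training set)."""
    return [rng.choice(sentences) for _ in sentences]


def train_lookup_tagger(sentences):
    """Hypothetical stand-in for a CRF: tag each word with its most frequent label."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, label in sentence:
            counts[word][label] += 1
    table = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Words never seen in this training sample fall back to "O" (no entity).
    return lambda words: [table.get(word, "O") for word in words]


def bagged_tagger(sentences, n_models=11, seed=0, train=train_lookup_tagger):
    """Train n_models taggers on bootstrap samples; combine them by per-token vote."""
    rng = random.Random(seed)
    models = [train(bootstrap_sample(sentences, rng)) for _ in range(n_models)]

    def predict(words):
        votes = [model(words) for model in models]
        # zip(*votes) groups the labels that all models proposed for each position.
        return [Counter(position).most_common(1)[0][0] for position in zip(*votes)]

    return predict
```

A stable base learner produces nearly identical members on every resample, so the vote changes little; that is the difficulty the abstract describes for CRFs. Any training function that takes a list of labeled sentences and returns a tagging function can be passed as `train` in place of the lookup tagger.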
The NEEEEIT project nonetheless worked out how to generalize ensemble methods to
CRFs, demonstrated that accuracy can indeed be improved by proper use of ensemble
techniques, and generated a new CRF code, "pyCrust", and a surrounding application environment,
"NEEEEIT", which implement those improvements.
The summary practical advice that results from this work, then, is:
• When making use of CRFs for label prediction tasks in machine learning, use the pyCrust
  CRF base classifier with NEEEEIT's bagging ensemble implementation. (If those codes
  are not available, then destabilize your CRF code via every means available, and generate
  the bagged training sets by hand.)
• If you have ample pre-processing computational time, do "forward feature selection" to
  find and remove counter-productive feature classes.
• Conversely, if pre-processing time is limited, use NEEEEIT's "edited clone" pyCrust
  mechanism, along with a more modest use of bagging, to generate ensembles much
  more quickly.
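The "forward feature selection" step in the second item can be sketched as a greedy loop: start with no feature classes, repeatedly add the single class that most improves a validation score, and stop when no remaining class helps. The classes that are never added are the candidates for removal. This is the common textbook form of the technique, offered as an assumption; the report's exact procedure is not given on this page. The names `forward_select`, `toy_score`, and `TOY_GAIN`, the feature class names, and the scores are all hypothetical. In practice `score` would train a CRF restricted to the given feature classes and return its held-out accuracy.

```python
def forward_select(feature_classes, score):
    """Greedy forward selection.

    feature_classes: candidate feature class names.
    score: callable taking a frozenset of names and returning a number
           (higher is better), e.g. held-out accuracy of a trained model.
    Returns (selected names in the order they were added, best score).
    """
    selected = []
    remaining = list(feature_classes)
    best = score(frozenset())
    while remaining:
        # Score every one-class extension of the current selection.
        trials = [(score(frozenset(selected + [name])), name) for name in remaining]
        top_score, top_name = max(trials, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining class improves the score
        selected.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return selected, best


# Hypothetical toy scorer: each class adds a fixed number of points to a
# baseline. Integers keep the arithmetic exact. A real scorer would retrain
# and evaluate a model for every call.
TOY_GAIN = {
    "capitalization": 300,
    "word_shape": 100,
    "gazetteer": 50,
    "noisy_suffix": -40,  # a counter-productive feature class
}


def toy_score(names):
    return 500 + sum(TOY_GAIN[name] for name in names)


selected, best = forward_select(list(TOY_GAIN), toy_score)
rejected = [name for name in TOY_GAIN if name not in selected]
print("kept:", selected, "rejected:", rejected, "score:", best)
```

With k candidate classes the loop calls `score` at most 1 + k(k+1)/2 times, and each call implies a full training run, which is consistent with the advice to use this step only when ample pre-processing time is available.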

Citing and Sharing

Kegelmeyer, W. Philip; Shead, Timothy M. & Dunlavy, Daniel M. Network and Ensemble Enabled Entity Extraction in Informal Text (NEEEEIT) final report., report, September 1, 2013; Albuquerque, New Mexico. (https://digital.library.unt.edu/ark:/67531/metadc864353/m1/4/: accessed April 25, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Government Documents Department.
