Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers

Metadata

Metadata describes a digital item, providing (if known) such information as creator, publisher, contents, size, relationship to other resources, and more. Metadata may also contain "preservation" components that help us to maintain the integrity of digital files over time.

Title

  • Main Title: Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers
  • Series Title: Mapping Texts

Creator

  • Author: Torget, Andrew J., 1978-
    Creator Type: Personal
    Creator Info: University of North Texas
  • Author: Mihalcea, Rada, 1974-
    Creator Type: Personal
    Creator Info: University of North Texas
  • Author: Christensen, Jon
    Creator Type: Personal
    Creator Info: Stanford University
  • Author: McGhee, Geoff
    Creator Type: Personal
    Creator Info: Stanford University

Contributor

  • Funder: National Endowment for the Humanities
    Contributor Type: Organization

Date

  • Creation: 2011

Language

  • English

Description

  • Content Description: Paper on combining text-mining and geo-visualization to unlock the research potential of historical newspapers.
  • Physical Description: 53 p.

Subject

  • Keyword: text-mining
  • Keyword: geo-visualization
  • Keyword: newspapers
  • Keyword: historical documents

Source

  • Grant: National Endowment for the Humanities Level II Digital Humanities Start-Up Grant

Collection

  • Name: UNT Scholarly Works
    Code: UNTSW

Institution

  • Name: UNT College of Arts and Sciences
    Code: UNTCAS

Rights

  • Rights Access: public

Resource Type

  • Paper

Format

  • Text

Identifier

  • Grant Number: HD-51188-10
  • Archival Resource Key: ark:/67531/metadc83797

Degree

  • Academic Department: History
  • Academic Department: Computer Science and Engineering

Note

  • Display Note: Abstract: In September 2010, the University of North Texas (in partnership with Stanford University) was awarded a National Endowment for the Humanities Level II Digital Humanities Start-up Grant (Award #HD-51188-10) to develop a series of experimental models for combining the possibilities of text-mining with geospatial mapping in order to unlock the research potential of large-scale collections of historical newspapers. Using a sample of approximately 230,000 pages of historical newspapers from the 'Chronicling America' digital newspaper database, we developed two interactive visualizations of the language content of these massive collections of historical documents as they spread across both time and space: one measuring the quantity and quality of the digitized content, and a second measuring several of the most widely used large-scale language pattern metrics common in natural language processing work. This white paper documents those experiments and their outcomes, as well as our recommendations for future work.