Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers Metadata
- Main Title: Mapping Texts: Combining Text-Mining and Geo-Visualization To Unlock The Research Potential of Historical Newspapers
- Series Title: Mapping Texts
- Author: Torget, Andrew J., 1978- (Creator Type: Personal; Creator Info: University of North Texas)
- Author: Mihalcea, Rada, 1974- (Creator Type: Personal; Creator Info: University of North Texas)
- Author: Christensen, Jon (Creator Type: Personal; Creator Info: Stanford University)
- Author: McGhee, Geoff (Creator Type: Personal; Creator Info: Stanford University)
- Funder: National Endowment for the Humanities (Contributor Type: Organization)
- Creation: 2011
- Content Description: Paper on mapping texts and combining text-mining and geo-visualization to unlock the research potential of historical newspapers.
- Physical Description: 53 p.
- Keyword: text-mining
- Keyword: geo-visualization
- Keyword: newspapers
- Keyword: historical documents
- Grant: National Endowment for the Humanities Level II Digital Humanities Start-Up Grant
- Name: UNT Scholarly Works (Code: UNTSW)
- Name: UNT College of Arts and Sciences (Code: UNTCAS)
- Rights Access: public
- Grant Number: HD-51188-10
- Archival Resource Key: ark:/67531/metadc83797
- Academic Department: History
- Academic Department: Computer Science and Engineering
- Display Note: Abstract: In September 2010, the University of North Texas (in partnership with Stanford University) was awarded a National Endowment for the Humanities Level II Digital Humanities Start-up Grant (Award #HD-51188-10) to develop a series of experimental models for combining the possibilities of text-mining with geospatial mapping in order to unlock the research potential of large-scale collections of historical newspapers. Using a sample of approximately 230,000 pages of historical newspapers from the 'Chronicling America' digital newspaper database, we developed two interactive visualizations of the language content of these massive collections of historical documents as they spread across both time and space: one measuring the quantity and quality of the digitized content, and a second measuring several of the most widely used large-scale language pattern metrics common in natural language processing work. This white paper documents those experiments and their outcomes, as well as our recommendations for future work.