The College of Liberal Arts & Social Sciences prepares students to be the next generation of innovators, scholars, entrepreneurs, and civic leaders. The College comprises more than 20 departments hosting more than 70 degree programs.
Book chapter that documents the “Mapping Texts” project, an experiment focused on the problem of OCR noise in historical newspapers.
Physical Description
20 p.
Notes
Abstract: This paper documents the “Mapping Texts” project, an experiment focused on the problem of OCR noise in historical newspapers. The purpose of the project was to combine natural language processing with data visualization to measure both OCR noise rates and their effects on how scholars detect meaningful high-level trends embedded in large-scale digital newspaper collections. The project developed two interactive visualizations measuring OCR quality and its effects on detecting high-level trends in the data, revealing the depth of the challenges facing humanities scholars seeking greater transparency of OCR data in historical newspaper databases.
Publication Title:
Digitised Newspapers - A New Eldorado for Historians?
Collections
This chapter is part of the following collection of related materials.
UNT Scholarly Works
Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.
Torget, Andrew J., 1978-. Mapping Texts: Examining the Effects of OCR Noise on Historical Newspaper Collections, chapter, 2023; Berlin, Germany. (https://digital.library.unt.edu/ark:/67531/metadc2031349/: accessed December 7, 2023), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Liberal Arts & Social Sciences.