Mapping Texts: Examining the Effects of OCR Noise on Historical Newspaper Collections

PDF Version Also Available for Download.

Description

Book chapter that documents the “Mapping Texts” project, an experiment focused on the problem of OCR noise in historical newspapers.

Physical Description

20 p.

Creation Information

Torget, Andrew J., 1978-. Created 2023.

Context

This chapter is part of the collection entitled: UNT Scholarly Works and was provided by the UNT College of Liberal Arts & Social Sciences to the UNT Digital Library, a digital repository hosted by the UNT Libraries. More information about this chapter can be viewed below.

Who

People and organizations associated with either the creation of this chapter or its content.

Author

Torget, Andrew J., 1978-

Publisher

De Gruyter Oldenbourg (Berlin, Germany)

Provided By

UNT College of Liberal Arts & Social Sciences

The College of Liberal Arts & Social Sciences prepares students to be the next generation of innovators, scholars, entrepreneurs, and civic leaders. The College comprises more than 20 departments hosting more than 70 degree programs.

What

Descriptive information to help identify this chapter. Follow the links below to find similar items on the Digital Library.

Degree Information

Notes

Abstract: This paper documents the “Mapping Texts” project, an experiment focused on the problem of OCR noise in historical newspapers. The purpose of the project was to combine natural language processing with data visualization to measure both OCR noise rates and their effects on how scholars detect meaningful high-level trends embedded in large-scale digital newspaper collections. The project developed two interactive visualizations measuring OCR quality and its effects on detecting high-level trends in the data, revealing the depth of the challenges facing humanities scholars seeking greater transparency of OCR data in historical newspaper databases.

Source

  • Digitised Newspapers - A New Eldorado for Historians? Berlin, Germany: De Gruyter Oldenbourg, 2023

Language

Item Type

Identifier

Unique identifying numbers for this chapter in the Digital Library or other systems.

Publication Information

  • Publication Title: Digitised Newspapers - A New Eldorado for Historians?

Collections

This chapter is part of the following collection of related materials.

UNT Scholarly Works

Materials from the UNT community's research, creative, and scholarly activities and UNT's Open Access Repository. Access to some items in this collection may be restricted.

When

Dates and time periods associated with this chapter.

Creation Date

  • 2023

Added to The UNT Digital Library

  • Jan. 4, 2023, 4:55 p.m.

Description Last Updated

  • Sept. 5, 2023, 2:11 p.m.

Torget, Andrew J., 1978-. Mapping Texts: Examining the Effects of OCR Noise on Historical Newspaper Collections, chapter, 2023; Berlin, Germany. (https://digital.library.unt.edu/ark:/67531/metadc2031349/: accessed December 7, 2023), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Liberal Arts & Social Sciences.
