Assessing the Impact of Image Resolution on OCR Transcription Accuracy

PDF Version Also Available for Download.

Description

Article investigating the relationship between image resolution and OCR (optical character recognition) performance, with a focus on both character-level accuracy and the integrity of subsequent text processing pipelines. The findings have practical implications for document digitization workflows, especially in resource-constrained environments where high-resolution image storage and processing may be difficult to justify. It was presented at the 3rd International Workshop on Digital Language Archives, held on December 15-16, 2025 as part of the ACM/IEEE Joint Conference on Digital Libraries 2025.

Physical Description

5 p.

Creation Information

Boubehziz, Toufik; Koudoro-Parfait, Caroline & Lejeune, Gaël. December 30, 2025.

Context

This article is part of the collection entitled: International Workshop on Digital Language Archives and was provided by the UNT College of Information to the UNT Digital Library, a digital repository hosted by the UNT Libraries. More information about this article can be viewed below.

Who

People and organizations associated with either the creation of this article or its content.

Authors

Provided By

UNT College of Information

Situated at the intersection of people, technology, and information, the College of Information's faculty, staff and students invest in innovative research, collaborative partnerships, and student-centered education to serve a global information society. The college offers programs of study in information science, learning technologies, and linguistics.

What

Descriptive information to help identify this article. Follow the links below to find similar items on the Digital Library.

Titles

Degree Information

Notes

Abstract: Despite advancements in OCR algorithms, the quality of input images remains a critical factor influencing recognition accuracy and subsequent text processing. For digital libraries, a question remains open: what is the optimal resolution at which documents should be stored? One might expect the highest resolution to be the best choice, but input quality also affects data storage and computing time, and the real influence of image resolution (and size) on OCR and subsequent tasks remains an open question. High-resolution images typically allow OCR engines to better distinguish character features, leading to improved recognition performance. Conversely, low-resolution images often result in increased character ambiguity, misclassifications, and noise, thereby reducing overall OCR reliability. These recognition errors not only compromise the immediate output quality but also propagate into downstream text processing tasks such as information retrieval, named entity recognition, and natural language understanding. In this paper, we investigate the relationship between image resolution and OCR performance, with a focus on both character-level accuracy and the integrity of subsequent text processing pipelines. By analyzing OCR outputs across a range of resolutions and evaluating their impact on various post-recognition tasks, we seek to identify resolution thresholds that balance processing efficiency with textual fidelity. The findings have practical implications for document digitization workflows, especially in resource-constrained environments where high-resolution image storage and processing may be difficult to justify.
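The abstract refers to character-level accuracy without stating how it is computed. A common way to quantify it is the character error rate (CER): the edit distance between a reference transcription and the OCR output, divided by the length of the reference. The sketch below is illustrative only and is not taken from the article; the function names and the sample strings are assumptions, and the article may use a different metric or normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertion, deletion, substitution each cost 1)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (hypothetical helper, not from the article)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Made-up OCR outputs for one reference line; the labels are placeholders,
    # not resolutions or results reported in the article.
    reference = "digital library"
    outputs = {
        "higher-resolution scan": "digital library",
        "lower-resolution scan": "digita1 libraiy",
    }
    for label, text in outputs.items():
        print(f"{label}: CER = {character_error_rate(reference, text):.3f}")
```

Computing this score for the same page at several resolutions gives one curve per document, which is one way to look for the resolution thresholds the abstract mentions.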

Source

  • 3rd International Workshop on Digital Language Archives, December 15-16, 2025.
  • ACM/IEEE Joint Conference on Digital Libraries, December 15-19, 2025.

Language

Item Type

Identifier

Unique identifying numbers for this article in the Digital Library or other systems.

Publication Information

  • Publication Title: Proceedings of the 3rd International Workshop on Digital Language Archives: LangArc 2025
  • Page Start: 25
  • Page End: 29
  • Peer Reviewed: Yes

Relationships

Collections

This article is part of the following collection of related materials.

International Workshop on Digital Language Archives

This interactive workshop explores a broad scope of issues related to digital language archives—digital libraries that preserve and provide online access to language data. The collection includes proceedings and articles from the workshop.

Related Items

Proceedings of the International Workshop on Digital Language Archives: LangArc-2025 (Book)

Conference proceedings of the 3rd International Workshop on Digital Language Archives held on December 15-16, 2025 as part of the ACM/IEEE Joint Conference on Digital Libraries 2025. It includes 11 peer-reviewed papers that were presented at the workshop and an introduction from the workshop organizers.

Relationship to this item: (Is Part Of)

Proceedings of the International Workshop on Digital Language Archives: LangArc-2025, ark:/67531/metadc2543332

When

Dates and time periods associated with this article.

Creation Date

  • December 30, 2025

Added to The UNT Digital Library

  • Jan. 13, 2026, 7:15 a.m.

Description Last Updated

  • Jan. 20, 2026, 4:32 p.m.

Boubehziz, Toufik; Koudoro-Parfait, Caroline & Lejeune, Gaël. Assessing the Impact of Image Resolution on OCR Transcription Accuracy, article, December 30, 2025; (https://digital.library.unt.edu/ark:/67531/metadc2543322/: accessed March 16, 2026), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT College of Information.