Making Digital Collection Audio Visual Materials Accessible: Final Report
An important part of the project involved evaluating the quality of machine-generated and human-mediated transcripts. This matters because machine-generated transcripts are becoming increasingly accurate and generally cost a fraction of what human-generated ones do.
I evaluated two vendor services that provide automated transcription and one whose employees generate or correct machine transcriptions. Machine transcriptions tended to reach accuracy rates of roughly 85-95%, depending on a variety of factors. Slow, clearly enunciated, well-miked English speech generally performed well. Speakers with accents, regionalisms, or rapid pacing, and those who sang, fared less well. One service was overly aggressive in capturing stutter-starts ("ums"), while the other took a less literal approach, cutting and cleaning up poor speech habits. Both had trouble with complex sentence structures and punctuation. Inconsistent handling of numbers, unique or specialized terms, and proper nouns also made it hard to consider these services serious options for research-grade materials, where we would likely hope to achieve 99% accuracy. However, at 1/10 to 1/4 the price of a human-generated service, machine transcription may be worth considering for some collections. Both services returned transcriptions within a few minutes of upload, provided APIs, batch imports, and file renaming and management, and offered capable, modern web-based editors for corrections and export to WebVTT. The human-mediated service, by contrast, usually returned files within a few days and, except where the source media contained cross-talk, poor audio quality, or non-English speech, returned near-perfect results.
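The report does not say how the 85-95% and 99% accuracy figures were measured. One common way to score a transcript against a corrected reference is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the machine output into the reference, divided by the number of reference words. The minimal Python sketch below assumes that metric, with simple normalization (lowercasing and ignoring punctuation); the function names are hypothetical and are not part of any vendor's API. Because it ignores punctuation, it would not capture the punctuation problems noted above.

```python
import re


def _words(text: str) -> list[str]:
    # Assumed normalization: lowercase and keep only letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (classic Levenshtein table, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    # WER can exceed 1.0 when there are many insertions, so clamp at zero.
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    print(round(word_accuracy(ref, hyp), 3))  # two substitutions in nine words
```

For scale: at 85-95% word accuracy, a 1,000-word transcript would still contain roughly 50 to 150 errors to correct, compared with about 10 at a 99% target.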
Part of my evaluation involved employing student workers to correct machine-generated content. In some ways the workflow resembles the metadata authoring practices we already use in the Digital Projects Unit, though with different tools. Once corrected, the outputs appear to be of similar quality to those from vendors that provide human-generated or human-corrected transcripts. A number of modern browser-based editors can generate transcription files, either from scratch or from existing source files. Several factors will likely affect the price, but in the end I believe employing student workers would be slightly more expensive in the long run, while offering certain advantages in limited cases:
• First, most transcription services are currently English-only, or they offer foreign-language transcription either at a much higher rate or through machine-only generation of varying quality. For foreign-language content, or multilingual content such as a number of our oral histories, it might be wise to employ a bilingual Spanish-speaking student who can transcribe and potentially translate it.
• Second, musical transcriptions are virtually impossible today, but experienced music library students or staff with access to scores, libretti, and similar materials could provide some measure of interpretation. Local experts would be well suited either to generate lyric-based transcriptions or to add timestamped markers for movements, sections within pieces, and so on. Such a service would be largely novel in the world of digital libraries, and however
Hicks, William. Making Digital Collection Audio Visual Materials Accessible: Final Report, report, September 30, 2019; [Denton, Texas]. (https://digital.library.unt.edu/ark:/67531/metadc1616591/m1/4/: accessed July 18, 2024), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu.