Speech Recognition Using a Synthesized Codebook

Physical Description

x, 164 leaves: ill.

Creation Information

Smith, Lloyd A. (Lloyd Allen) August 1988.

Context

This dissertation is part of the collection entitled UNT Theses and Dissertations and was provided by UNT Libraries to Digital Library, a digital repository hosted by the UNT Libraries. It has been viewed 27 times. More information about this dissertation can be viewed below.

Who

People and organizations associated with either the creation of this dissertation or its content.

Rights Holder

For guidance see Citations, Rights, Re-Use.

  • Smith, Lloyd A. (Lloyd Allen)

Provided By

UNT Libraries

With locations on the Denton campus of the University of North Texas and one in Dallas, UNT Libraries serves the school and the community by providing access to physical and online collections, The Portal to Texas History and UNT Digital Libraries, academic research, and much more.

What

Descriptive information to help identify this dissertation. Follow the links below to find similar items on the Digital Library.

Description

Speech sounds generated by a simple waveform synthesizer were used to create a vector quantization codebook for use in speech recognition. Recognition was tested on the TI-20 isolated-word database using a conventional dynamic time warping (DTW) matching algorithm. Input speech was band-limited to 300 to 3300 Hz, then passed through the Scott Instruments Corp. Coretechs process, implemented on a VET3 speech terminal, to create the speech representation for matching. Synthesized sounds were processed in software by a VET3 signal processing emulation program. Emulation and recognition were performed on a DEC VAX 11/750.
The experiments were organized in two series. A preliminary experiment, using no vector quantization, provided a baseline for comparison.
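As an illustration of the kind of DTW matching the baseline relies on, here is a minimal Python sketch. It is not the dissertation's implementation: the Euclidean frame distance, the unconstrained step pattern, and the names `dtw_distance` and `recognize` are all assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Accumulated cost of the cheapest monotonic alignment of two sequences.

    Assumption: a basic step pattern (insert, delete, match) with no
    slope constraints or path-length normalization.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b's frame
                                 cost[i][j - 1],      # stretch seq_a's frame
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical one-dimensional "feature" sequences, for illustration only.
templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]}
best = recognize([[0.0], [0.0], [1.0], [2.0]], templates)
```

An isolated-word recognizer of this style stores one or more templates per vocabulary word and picks the word whose template aligns most cheaply with the input.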
The original codebook contained 109 vectors, all derived from two-formant synthesized sounds. Over the first series of experiments, this codebook was decimated according to how many times each vector was used in quantizing the training data in the previous experiment, in order to determine the smallest subset of vectors suitable for coding the speech database. The second series of experiments altered several test conditions in order to evaluate the applicability of the minimal synthesized codebook to conventional codebook training.
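Usage-based decimation of a codebook can be sketched roughly as follows. The abstract does not state the exact pruning rule, so keeping the `keep` most frequently used vectors is an assumption, as are the squared Euclidean distance and the function names.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Index of the codebook entry nearest to the given vector."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))

def usage_counts(training_vectors, codebook):
    """How many training vectors map to each codebook entry."""
    counts = Counter(quantize(v, codebook) for v in training_vectors)
    return [counts[i] for i in range(len(codebook))]

def decimate(codebook, training_vectors, keep):
    """Keep the `keep` most-used entries, preserving their original order.

    Assumption: a simple top-k rule; the dissertation's actual rule may differ.
    """
    counts = usage_counts(training_vectors, codebook)
    ranked = sorted(range(len(codebook)), key=lambda i: counts[i], reverse=True)
    return [codebook[i] for i in sorted(ranked[:keep])]

# Hypothetical data, for illustration only.
codebook = [[0.0], [10.0], [20.0]]
training = [[0.1], [0.2], [9.0], [0.3]]
smaller = decimate(codebook, training, keep=2)
```

Repeating this step after each experiment, with counts taken from the most recent quantization of the training data, shrinks the codebook progressively.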
The baseline recognition rate was 97%. The recognition rate for synthesized codebooks was approximately 92% for sizes ranging from 109 down to 16 vectors. Accuracy for smaller codebooks was slightly below 90%. Error analysis showed that the primary loss in dropping below 16 vectors was in the coding of voiced sounds with high-frequency second formants. The 16-vector synthesized codebook was chosen as the seed for the second series of experiments.
After one training iteration, and using a normalized distortion score, trained codebooks achieved an accuracy of 95.1%. When codebooks were trained and tested on different sets of speakers, accuracy was 94.9%, indicating that the training introduced very little speaker dependence.
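To make "one training iteration" from a seed codebook concrete, here is a hedged Python sketch of a single generalized Lloyd (k-means style) update. The abstract does not name the training algorithm or define the normalized distortion score, so the update rule and the name `train_one_iteration` are assumptions, and the score normalization is not modeled.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_one_iteration(seed_codebook, training_vectors):
    """One Lloyd update: assign each vector to its nearest seed entry,
    then replace each entry with the centroid of its assigned vectors."""
    clusters = [[] for _ in seed_codebook]
    for v in training_vectors:
        nearest = min(range(len(seed_codebook)),
                      key=lambda i: sq_dist(v, seed_codebook[i]))
        clusters[nearest].append(v)
    updated = []
    for seed, members in zip(seed_codebook, clusters):
        if not members:
            # Assumption: an entry that attracts no vectors is left unchanged.
            updated.append(list(seed))
            continue
        dim = len(seed)
        updated.append([sum(m[d] for m in members) / len(members)
                        for d in range(dim)])
    return updated

# Hypothetical two-dimensional data, for illustration only.
seed = [[0.0, 0.0], [10.0, 10.0]]
training = [[1.0, 1.0], [3.0, 1.0], [9.0, 11.0]]
trained = train_one_iteration(seed, training)
```

Starting from a synthesized seed rather than a random one means the first update only has to pull each entry toward the natural-speech vectors nearest to it.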

Collections

This dissertation is part of the following collection of related materials.

UNT Theses and Dissertations

Theses and dissertations represent a wealth of scholarly and artistic content created by master's and doctoral students in the degree-seeking process. Some ETDs in this collection are restricted to use by the UNT community.

When

Dates and time periods associated with this dissertation.

Creation Date

  • August 1988

Added to The UNT Digital Library

  • Aug. 22, 2014, 6 p.m.

Description Last Updated

  • Oct. 30, 2015, 2:04 p.m.

Citations, Rights, Re-Use

Smith, Lloyd A. (Lloyd Allen). Speech Recognition Using a Synthesized Codebook, dissertation, August 1988; Denton, Texas. (digital.library.unt.edu/ark:/67531/metadc332203/: accessed December 17, 2017), University of North Texas Libraries, Digital Library, digital.library.unt.edu.