Search Results

Hopfield Networks as an Error Correcting Technique for Speech Recognition

Description: I experimented with Hopfield networks in the context of a voice-based, query-answering system. Hopfield networks are used to store and retrieve patterns. I used this technique to store queries represented as natural language sentences and I evaluated the accuracy of the technique for error correction in a spoken question-answering dialog between a computer and a user. I show that the use of an auto-associative Hopfield network helps make the speech recognition system more fault tolerant. I also looked at the available encoding schemes to convert a natural language sentence into a pattern of zeroes and ones that can be stored in the Hopfield network reliably, and I suggest scalable data representations which allow storing a large number of queries.
Access: This item is restricted to the UNT Community Members at a UNT Libraries Location.
Date: May 2004
Creator: Bireddy, Chakradhar
Partner: UNT Libraries
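
Illustration (not part of the catalog record): the pattern storage and retrieval mentioned in the description above can be sketched as a small auto-associative Hopfield network. This is a minimal sketch under stated assumptions, not the author's implementation: it uses bipolar values (+1/-1) rather than zeroes and ones, Hebbian weights, and asynchronous updates, and the names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Build a Hebbian weight matrix from bipolar (+1/-1) patterns.

    Assumption: all patterns have the same length; self-connections are zero.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until no unit changes (or max_sweeps is hit)."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * s[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


# Two stored patterns, standing in for two encoded queries (hypothetical data).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# Flip the first bit of the first pattern to simulate a recognition error.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(weights, noisy)
```

In this toy case the network settles back on the first stored pattern, which is the error-correcting behavior the description refers to. How sentences are encoded into such patterns is the subject of the thesis and is not reproduced here.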

The Effectiveness of Speech Recognition as a User Interface for Computer-Based Training

Description: Some researchers are saying that natural language is probably one of the most promising interfaces for use in the long term for simplicity of learning. If this is true, then it follows that speech recognition would be ideal as the interface for computer-based training (CBT). While many speech recognition applications are being used as a means for a computer interface, these are usually confined to controlling the computer or causing the computer to control other devices. The user input or interface has been the recipient of a strong effort to improve the quality of the communication between man and machine and is proposed to be a dominant factor in determining user productivity, performance, and satisfaction. However, other researchers note that full natural interfaces with computers are still a long way from being the state of the art in technology. The focus of this study was to determine if the technology of speech recognition is an effective interface for an academic lesson presented via CBT. How does one determine if learning has been affected and how is this measured? Previous research has attempted to quantify a learning effect when using a variety of interfaces. This dissertation summarizes previous studies using other interfaces and those using speech recognition. It attempted to apply a framework used to measure learning effectiveness in some of these studies to quantify the measurement of learning when speech recognition is used as the sole interface. The focus of the study was on cognitive processing which affects short-term memory and, in turn, the effect on original learning (OL). The methods and procedures applied in an experimental study were presented.
Date: August 1995
Creator: Creech, Wayne E. (Wayne Everette)
Partner: UNT Libraries

Can You Hear Me Now? Benefits of Frequency-Modulated (FM) Systems for Adults and Children Using Cochlear Implants: A Meta-Analysis Approach [Presentation]

Description: Presentation for the 2007 University Scholars Day at the University of North Texas discussing research on the benefits of frequency-modulated (FM) systems for adults and children using cochlear implants.
Date: March 29, 2007
Creator: Kleineck, Mary Pat & Schafer, Erin
Partner: UNT Honors College

Automatic Speech Recognition Using Finite Inductive Sequences

Description: This dissertation addresses the general problem of recognition of acoustic signals which may be derived from speech, sonar, or acoustic phenomena. The specific problem of recognizing speech is the main focus of this research. The intention is to design a recognition system for a definite number of discrete words. For this purpose specifically, eight isolated words from the TIMIT database are selected. Four medium-length words, "greasy," "dark," "wash," and "water," are used. In addition, four short words are considered: "she," "had," "in," and "all." The recognition system addresses the following issues: filtering or preprocessing, training, and decision-making. The preprocessing phase uses linear predictive coding of order 12. Following the filtering process, a vector quantization method is used to further reduce the input data and generate a finite inductive sequence of symbols representative of each input signal. The sequences generated by the vector quantization process of the same word are factored, and a single ruling or reference template is generated and stored in a codebook. This system introduces a new modeling technique which relies heavily on the basic concept that all finite sequences are finitely inductive. This technique is used in the training stage. In order to accommodate the variabilities in speech, the training is performed casually, and a large number of training speakers is used from eight different dialect regions. Hence, a speaker-independent recognition system is realized. The matching process compares the incoming speech with each of the templates stored, and a closeness ratio is computed. A ratio table is generated and the matching word that corresponds to the smallest ratio (i.e., indicating that the ruling has removed most of the symbols) is selected. Promising results were obtained for isolated words, and the recognition rates ranged between 50% and 100%.
Date: August 1996
Creator: Cherri, Mona Youssef, 1956-
Partner: UNT Libraries
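
Illustration (not part of the catalog record): the vector quantization step in the description above, which turns a sequence of feature frames into a sequence of symbols, can be sketched as a nearest-codeword lookup. This is only a sketch under assumptions: the two-dimensional frames and the three-entry codebook are made-up stand-ins for order-12 LPC vectors, `quantize` is a hypothetical name, and the finite inductive factoring that follows in the dissertation is not shown.

```python
def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codebook vector.

    Distance is squared Euclidean; the returned list of indices is the
    symbol sequence that later stages would operate on.
    """
    symbols = []
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((x - y) ** 2 for x, y in zip(frame, codebook[k])),
        )
        symbols.append(nearest)
    return symbols


# Hypothetical codebook and frames, chosen only to make the mapping visible.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]
symbols = quantize(frames, codebook)  # one codebook index per frame
```

Each frame is replaced by a single small integer, which is how quantization reduces the input data before templates are built.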

Speech Recognition Using a Synthesized Codebook

Description: Speech sounds generated by a simple waveform synthesizer were used to create a vector quantization codebook for use in speech recognition. Recognition was tested over the TI-20 isolated word data base using a conventional DTW matching algorithm. Input speech was band limited to 300-3300 Hz, then passed through the Scott Instruments Corp. Coretechs process, implemented on a VET3 speech terminal, to create the speech representation for matching. Synthesized sounds were processed in software by a VET3 signal processing emulation program. Emulation and recognition were performed on a DEC VAX 11/750. The experiments were organized in two series. A preliminary experiment, using no vector quantization, provided a baseline for comparison. The original codebook contained 109 vectors, all derived from two-formant synthesized sounds. This codebook was decimated through the course of the first series of experiments, based on the number of times each vector was used in quantizing the training data for the previous experiment, in order to determine the smallest subset of vectors suitable for coding the speech data base. The second series of experiments altered several test conditions in order to evaluate the applicability of the minimal synthesized codebook to conventional codebook training. The baseline recognition rate was 97%. The recognition rate for synthesized codebooks was approximately 92% for sizes ranging from 109 to 16 vectors. Accuracy for smaller codebooks was slightly less than 90%. Error analysis showed that the primary loss in dropping below 16 vectors was in coding of voiced sounds with high-frequency second formants. The 16-vector synthesized codebook was chosen as the seed for the second series of experiments. After one training iteration, and using a normalized distortion score, trained codebooks performed with an accuracy of 95.1%. When codebooks were trained and tested on different sets of speakers, accuracy was 94.9%, indicating ...
Date: August 1988
Creator: Smith, Lloyd A. (Lloyd Allen)
Partner: UNT Libraries
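
Illustration (not part of the catalog record): the description above mentions a conventional DTW matching algorithm for isolated-word recognition. A minimal sketch of dynamic time warping and template matching follows, assuming Euclidean frame distance and the basic three-way step pattern; the thesis's actual feature representation, local constraints, and normalized distortion score are not reproduced, and the names `dtw_distance` and `recognize` and the toy templates are hypothetical.

```python
def frame_distance(u, v):
    """Euclidean distance between two feature frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Extend the cheapest of: vertical step, horizontal step, diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy one-dimensional "feature" sequences standing in for word templates.
templates = {
    "rising": [(0.0,), (1.0,), (2.0,)],
    "falling": [(2.0,), (1.0,), (0.0,)],
}
# A slower version of the rising contour; DTW absorbs the repeated first frame.
spoken = [(0.0,), (0.0,), (1.0,), (2.0,)]
best = recognize(spoken, templates)
```

Because the warping path may repeat frames, an utterance spoken at a different rate can still align with its template at low cost, which is why DTW suits isolated-word matching.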