Capstone Project

Group 2007-22
Status completed
Title Development of an Automatic Speech Recognition (ASR) System in Real-life Environments
Supervisor Dr. M. O. Ahmad
Description Automatic speech recognition (ASR) is the process by which a machine (or computer) identifies spoken words. It has a growing range of applications: security checking, voice-driven user interfaces, dictation applications for command and control, data entry and document preparation, telephony without pressing buttons, and speech-to-text for people with hearing impairments. The objective of this project is to develop a limited-vocabulary, isolated word/digit recognition system for real-life noisy environments using the Hidden Markov Model (HMM). The system will be speaker-dependent.

Students will first briefly learn the basic speech-processing terminology related to speech recognition. They will then implement the HMM method with the widely used HMM toolkit HTK, recognizing speech by estimating the likelihood of each phoneme, with the likelihoods computed using a Gaussian mixture model (GMM). Before it is tested on speech data, the ASR system will be trained by having the speaker repeat a set of standard words/digits. Training a recognizer usually improves its accuracy. Particular attention will be given to feature extraction, the most important step in the HMM method.

The difficulty of ASR depends strongly on environmental noise. In the presence of background noise, a mismatch between the distributions of speech features in the training and testing environments can severely degrade recognition performance. Because noise robustness is crucial to deploying speech recognition in real-world applications, considerable effort has been devoted to noise-robust speech recognition over almost two decades. Students will investigate the effect of placing a speech enhancement block before feature extraction in the presence of background noise. They will also incorporate recently proposed noise-robust features.
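A minimal sketch can illustrate the scoring step behind HMM/GMM isolated-word recognition. It makes several simplifying assumptions. There is one left-to-right HMM per vocabulary word, instead of the phoneme models described above. Each state emits through a diagonal-covariance GMM. The feature frames are assumed to have been extracted already. The function names (`log_forward`, `recognize`, and so on) and the toy two-word models are hypothetical and are not part of HTK. For each word model, the recognizer runs the forward algorithm in the log domain and picks the word with the highest likelihood.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x, mean, var):
    """Log density of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
        for xi, mu, v in zip(x, mean, var)
    )


def log_gmm(x, mixture):
    """Log likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    return log_sum_exp(
        [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in mixture]
    )


def log_forward(frames, log_trans, state_gmms):
    """Forward algorithm: log P(frames | word HMM).

    The model starts in state 0 and must end in the last state.
    log_trans[i][j] = log P(next state j | current state i);
    state_gmms[j] is the emission GMM of state j.
    """
    n = len(state_gmms)
    alpha = [NEG_INF] * n
    alpha[0] = log_gmm(frames[0], state_gmms[0])
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n)])
            + log_gmm(x, state_gmms[j])
            for j in range(n)
        ]
    return alpha[-1]


def recognize(frames, word_models):
    """Return the vocabulary word whose HMM scores the frames highest."""
    return max(word_models, key=lambda w: log_forward(frames, *word_models[w]))


# Toy usage with hypothetical 1-D features and two 2-state left-to-right
# word models (one single-Gaussian "mixture" per state, unit variance).
half = math.log(0.5)
left_to_right = [[half, half], [NEG_INF, 0.0]]
models = {
    "yes": (left_to_right, [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])]]),
    "no": (left_to_right, [[(1.0, [5.0], [1.0])], [(1.0, [0.0], [1.0])]]),
}
frames = [[0.1], [0.0], [4.9], [5.2]]
print(recognize(frames, models))  # prints "yes"
```

In a real system, the parameters would come from the speaker's training repetitions rather than being set by hand (in HTK, the HInit, HRest and HERest tools handle this). HTK's decoder, HVite, also uses Viterbi (best-path) scoring rather than the full forward sum shown here.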
As a real-life test, they will evaluate the performance of the developed ASR system in difficult conditions, such as inside a crowded metro station or a cafeteria at a busy time. Experiments will be conducted on real-life data. For simulations, they will use a standard database as well as their own recorded speech.
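The listing does not say how the simulated noisy test sets will be built. One common approach, assumed here, is to take clean utterances from the standard database and add a background-noise recording (for example, one captured in a metro station) at a controlled signal-to-noise ratio, defined as SNR = 10·log10(P_speech / P_noise). The sketch below scales the noise so that the mixture hits a target SNR. The function names and the synthetic tone/noise signals are hypothetical stand-ins for real recordings.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(speech, noise, target_snr_db):
    """Return speech + g*noise, with g chosen so the mixture has the target SNR.

    SNR is defined as 10*log10(P_speech / P_scaled_noise). The noise must be at
    least as long as the speech; only its first len(speech) samples are used.
    """
    if len(noise) < len(speech):
        raise ValueError("noise recording is shorter than the speech signal")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise recording is silent")
    # Solve 10*log10(p_speech / (g^2 * p_noise)) = target_snr_db for g.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of a mixture, treating (noisy - speech) as the added noise."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10.0 * math.log10(mean_power(speech) / mean_power(residual))


# Toy usage: a 440 Hz tone stands in for speech and Gaussian noise for the
# recorded background noise (both at a hypothetical 8 kHz sampling rate).
fs = 8000
speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
for snr in (20, 10, 0):
    noisy = add_noise_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 2))
```

Sweeping the target SNR this way shows how recognition accuracy degrades as noise increases. On-site recordings remain the real test, because noise in places like a busy cafeteria is often non-stationary.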
Requirement Strong motivation to build a real-time speech recognition system; basic programming knowledge of Matlab and C/C++; a background in digital signal processing. The team will implement its system mainly in software and test the accuracy of the recognition system in real-life environments. Finally, a GUI has to be built for the real-time recognition system. Prerequisite courses: ELEC 361 (Signals and Systems), ELEC 442 (Digital Signal Processing), ENGR 371 (Probability and Statistics in Engineering).
Tools Matlab, C/C++, a PC with a working sound card, a good-quality microphone, and speakers
Number of Students 0
Comments: Dr. M. O. Ahmad, Professor, Dept. of ECE; Email: omair@ece.concordia.ca; Room: S-EV5.107; Tel: 848-2424 Ext. 3075