Authors: Gonzalo Sad; Lucas D. Terissi; Juan C. Gómez.

Abstract: This paper describes an isolated word speech recognition system based on audio-visual features. The inclusion of visual features related to mouth movements aims to improve the recognition rates, mainly under noisy audio conditions. The proposed system combines three classifiers based on audio, visual and audio-visual information, respectively. A Spanish audio-visual database is employed to test the proposed system. The experimental results show that a significant improvement is achieved when the visual information is considered. The structure of the proposed system makes it possible to improve the recognition rates over a wide range of signal-to-noise ratios.
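To illustrate what combining an audio, a visual and an audio-visual classifier can look like, here is a minimal late-fusion sketch. It assumes that each classifier outputs one score per vocabulary word on a common scale (for example, posterior probabilities) and that the scores are merged with a weighted sum rule. The function name `fuse_classifiers`, the weights and the example scores are hypothetical. This is not necessarily the fusion strategy used in the paper.

```python
from typing import Dict, Sequence

# Hypothetical score format: word -> score on a common scale (e.g. posteriors).
Scores = Dict[str, float]


def fuse_classifiers(
    audio: Scores,
    visual: Scores,
    audio_visual: Scores,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> str:
    """Return the word with the highest weighted sum of classifier scores.

    Illustrative weighted sum rule (an assumption, not the paper's method).
    """
    if len(weights) != 3:
        raise ValueError("expected one weight per classifier")
    # Only words scored by all three classifiers are candidates.
    vocabulary = audio.keys() & visual.keys() & audio_visual.keys()
    if not vocabulary:
        raise ValueError("classifiers share no vocabulary words")
    w_a, w_v, w_av = weights
    fused = {
        word: w_a * audio[word] + w_v * visual[word] + w_av * audio_visual[word]
        for word in vocabulary
    }
    # Sort first so ties are broken deterministically (alphabetically).
    return max(sorted(fused), key=fused.get)


# Hypothetical scores for a noisy utterance of "uno": audio alone prefers "dos".
audio = {"uno": 0.40, "dos": 0.45, "tres": 0.15}
visual = {"uno": 0.70, "dos": 0.20, "tres": 0.10}
audio_visual = {"uno": 0.60, "dos": 0.30, "tres": 0.10}

print(fuse_classifiers(audio, visual, audio_visual))                       # uno
print(fuse_classifiers(audio, visual, audio_visual, weights=(1, 0, 0)))  # dos
```

In this toy example, the audio classifier used alone picks the wrong word, while the fused decision recovers it. In a real system, the weights could plausibly be tuned according to the estimated signal-to-noise ratio, but that tuning is an assumption of this sketch.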

Production: Isolated Word Speech Recognition Improvements Based on the Fusion of Audio, Video and Audio-Video Classifiers.

