Springer, 2016. 134 pages.
This book addresses cognitively inspired multimodal speech processing. It is the result of several years of research by the authors spanning many disparate areas, including cognitive inspiration, speech processing, image processing, and machine learning.
This book presents a novel two-stage multimodal speech enhancement system that uses both visual and audio information to filter speech. It then explores extending this system with fuzzy logic, as a proof of concept for an envisaged cognitively inspired, autonomous, adaptive, and context-aware multimodal system. The concept of single-modality two-stage filtering is extended to include the visual modality: noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, and this pre-processed speech is then enhanced further by audio-only beamforming. The resulting system is designed to function in challenging noisy speech environments and is evaluated using sentences from different speakers in the GRID corpus together with a range of noise recordings. Both objective and subjective test results show that this initial system delivers very encouraging results when filtering speech mixtures in difficult reverberant environments. Some limitations of this initial framework are identified, and the system is then extended with a fuzzy logic-based framework, for which a proof-of-concept demonstration is implemented. Results show that this proposed autonomous, adaptive, and context-aware multimodal framework delivers very positive results in difficult noisy speech environments, making cognitively inspired use of audio and visual information depending on environmental conditions.
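The two-stage idea (visually derived Wiener filtering, then audio-only beamforming) can be illustrated with a minimal sketch. This is not the authors' implementation, and several assumptions apply. It works on a single STFT frame. It takes the visually derived clean-speech power estimate as a given input; the mapping from visual features to that estimate is not shown. It estimates noise power as the noisy power minus that estimate. It uses a plain frequency-domain delay-and-sum beamformer as a simple stand-in for whatever beamformer the book actually uses. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def wiener_gain(noisy_power: float, speech_power: float, floor: float = 1e-10) -> float:
    """Wiener gain S / (S + N) for one frequency bin.

    Assumption: the noise power N is estimated as the noisy power minus the
    (visually derived) speech power estimate S, floored to avoid division by zero.
    """
    noise_power = max(noisy_power - speech_power, floor)
    return speech_power / (speech_power + noise_power)


def stage1_visual_wiener(channels: Sequence[Sequence[complex]],
                         speech_power_est: Sequence[float]) -> List[List[complex]]:
    """Stage 1: apply a per-bin Wiener gain, driven by the speech power
    estimate, to every microphone channel independently."""
    return [
        [x * wiener_gain(abs(x) ** 2, s) for x, s in zip(spectrum, speech_power_est)]
        for spectrum in channels
    ]


def stage2_delay_and_sum(channels: Sequence[Sequence[complex]],
                         bin_freqs_hz: Sequence[float],
                         delays_s: Sequence[float]) -> List[complex]:
    """Stage 2 (stand-in): frequency-domain delay-and-sum beamforming.

    delays_s[m] is the arrival delay of the target at microphone m relative to
    a reference, so channel m carries S(f) * exp(-j 2 pi f tau_m); multiplying
    by exp(+j 2 pi f tau_m) re-aligns it before averaging across microphones.
    """
    m = len(channels)
    out = []
    for k, f in enumerate(bin_freqs_hz):
        acc = 0j
        for ch, tau in zip(channels, delays_s):
            acc += ch[k] * cmath.exp(2j * math.pi * f * tau)
        out.append(acc / m)
    return out


def two_stage_enhance(channels, speech_power_est, bin_freqs_hz, delays_s):
    """Hypothetical two-stage pipeline: Wiener pre-filtering, then beamforming."""
    prefiltered = stage1_visual_wiener(channels, speech_power_est)
    return stage2_delay_and_sum(prefiltered, bin_freqs_hz, delays_s)
```

For example, with two microphones, a target arriving 1 ms later at the second one, and a speech power estimate that matches the true source power, the pipeline passes the source spectrum through essentially unchanged. A bin whose estimated speech power is zero is suppressed entirely. Delay-and-sum is used here only because it is the simplest beamformer to write out. Adaptive beamformers replace the fixed average with data-dependent weights.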
This book aims to provide a comprehensive introduction to the field of cognitively inspired audiovisual speech processing. The field has many facets, including audio-only speech processing, image tracking, region of interest (ROI) extraction, and fuzzy logic, and very few works combine all of these aspects into a single comprehensive reference. This book therefore contains fully referenced and easily accessible summaries of all of these areas, along with an introduction to the cognitively inspired basis of multimodal speech filtering, and combines them with research into a cognitively inspired speech processing system. We also include guidance on performing objective and subjective speech evaluations. It is hoped that those who are interested in this fascinating area can use the guides and the research presented in this book as the basis of their own research.
Audio and Visual Speech Relationship
The Research Context
A Two Stage Multimodal Speech Enhancement System
Experiments, Results, and Analysis
Towards Fuzzy Logic Based Multimodal Speech Filtering
Evaluation of Fuzzy Logic Proof of Concept
Potential Future Research Directions