Academic Press, 2010. 343 pp.
Over the last 10–15 years it has become obvious that most human-to-human interactions, especially in the context of collaborative and creative activities, necessarily involve several communication senses, also referred to as 'modes' (hence the term 'multimodal'). Complementary, often asynchronous (not carrying the same semantic information at exactly the same time), and interacting at different hierarchical processing (granularity) levels, these very sophisticated communication and information-exchange channels include, amongst others, spoken and written language understanding, visual processing (face recognition, gesture and action recognition, etc.), non-verbal cues from video and audio, emotional cues (cognitive psychology), attention focus, postures, and expressions. Human-to-human interaction is also constrained by implicit social parameters such as group behaviour and size, the level of common understanding, and the personality traits of the participants. Still, even when communication is not optimal, humans are adept at integrating all these sensory channels and fusing even very noisy information to meet the needs of the interaction.
Machines today, including state-of-the-art human–computer interaction systems, are far less able to emulate this ability. Furthermore, it is sometimes claimed that progress in single communication modes (such as speech recognition and visual scene analysis) will necessarily rely on progress in the complementary communication modes. Based on this observation, it has become quite clear that significant progress in human–computer interaction will require a truly multidisciplinary approach, driven by principled multimodal signal processing theories, aimed at analysing, modelling, and understanding how to extract the sought information (to meet the needs of the moment) from multiple sensory channels.
The research areas covered by the present book are often at different levels of advancement. However, while significant progress has recently been made in the field of multimodal signal processing, we believe that the potential of the field 'as a whole' is still underestimated and under-represented in the current literature.
This book is thus meant to give a sampled overview of what is possible today, while illustrating the rich synergy between various approaches from signal processing, machine learning, and social/human behaviour analysis and modelling. Given the breadth of the book, we have also chosen to limit its technical depth, making it accessible to a larger audience while showing its potential usefulness in several application domains.
Part I Signal Processing, Modelling and Related Mathematical Tools.
Statistical Machine Learning for HCI.
Speech Processing.
Natural Language and Dialogue Processing.
Image and Video Processing Tools for HCI.
Processing of Handwriting and Sketching Dynamics.
Part II Multimodal Signal Processing and Modelling.
Basic Concepts of Multimodal Analysis.
Multimodal Information Fusion.
Modality Integration Methods.
A Multimodal Recognition Framework for Joint Modality Compensation and Fusion.
Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions.
Part III Multimodal Human–Computer and Human-to-Human Interaction.
Multimodal Input.
Multimodal HCI Output: Facial Motion, Gestures and Synthesised Speech Synchronisation.
Interactive Representations of Multimodal Databases.
Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour.