Springer, 2013. — 358 p.
Intelligent Audio Analysis unites methods of audio signal processing and machine learning. Other terms exist for this field or its sub-fields and might have been used instead, such as Computer Audition or Machine Listening; each is used by a partly different research community, with a slightly different understanding of the core application field and the inventory of methods.
Besides Automatic Speech Recognition, which has been researched for more than half a century, an increasing number of further speech and speaker characterisation tasks have recently been pursued in the literature. In addition, the younger field of Music Information Retrieval is growing, and there is emerging interest in the computationally ‘intelligent’ analysis of general sound events. Fields of application comprise audio coding, editing, interaction, search and surveillance, as well as coaching and entertainment applications.
This book first promotes a unified view of the multiplicity of resulting tasks. It further provides a broad overview of the field, enriched by extensive examples of recent research applications, mostly based on the author’s latest work. The focus lies on realistic conditions and on standardisation through open-source software implementations and comparative evaluations. The main goal is to increase robustness by contemporary and innovative methods such as automated data acquisition by semi-supervised learning, audio signal enhancement by non-negative matrix factorisation, systematic feature brute-forcing, and the application of memory-enhanced learning algorithms, for example in combination with graphical model structures. Machine-based recognition of speech, non-linguistic vocalisations, and para-linguistic speaker states and traits serves as an example of application in the domain of speech processing. As for music processing, examples include blind separation of instruments; determination of tempo, metre and ballroom dance style; analysis of musical key, chord progression and structure; and estimation of music mood and singer traits. Finally, these examples are complemented by the recognition of general sound events along with their emotional connotation.
In the outlook, avenues towards evolutionary, unsupervised and holistic audio-signal analysis are shown.
It is thus hoped that the book will be of interest to a very broad and interdisciplinary range of researchers and practitioners in academia and industry, ranging from engineering and computer science to the fields of speech, language, music and general audio science with their manifold sub-fields. It further addresses readers from early to very advanced level; naturally, not all details can be provided at every point, and further reading will help wherever the reader finds it most useful.
Intelligent Audio Analysis: A Definition
Motivation, Aims, and Solutions
Structure of the Book
Intelligent Audio Analysis Methods
Chain of Audio Processing
Audio Data
Audio Features
Audio Recognition
Audio Source Separation
Audio Enhancement and Robustness
Intelligent Audio Analysis Applications
Applications in Intelligent Speech Analysis
Applications in Intelligent Music Analysis
Applications in Intelligent Sound Analysis
Discussion
Vision
Appendix A: openSMILE Standardised Feature Sets