Springer, 2011. — 185 p.
A self-learning speech controlled system has been developed for unsupervised speaker identification and speech recognition. The benefits of a speech controlled device that identifies its main users by their voice characteristics are obvious: the human-computer interface may be personalized, and new ways of interacting with a speech controlled system may be developed to simplify the handling of the device. Furthermore, speech recognition accuracy may be significantly improved if knowledge about the current user can be employed.
The speech modeling of a speech recognizer can be improved for particular speakers by adapting the statistical models to speaker-specific data. The adaptation scheme presented captures speaker characteristics after only a few utterances and transitions smoothly to a highly specific speaker modeling. Speech recognition accuracy may be continuously improved. It is therefore quite natural to employ speaker adaptation for a fixed number of users. Optimal performance may be achieved when each user attends a supervised enrollment and identifies himself whenever the speaker changes or the system is reset. A more convenient human-computer communication may be achieved if users can be identified by their voices.
Whenever a new speaker profile is initialized, fast yet robust information retrieval is required to identify the speaker on successive utterances. Such a scenario presents a unique challenge for speaker identification: it has to perform well on short utterances, e.g. commands, even in adverse environments. Since the speech recognizer has very detailed knowledge about speech, it seems reasonable to employ its speech modeling to improve speaker identification. A unified approach has been developed for simultaneous speech recognition and speaker identification. This combination allows the system to keep long-term adaptation profiles in parallel for a limited group of users.
New users should not be forced to attend an inconvenient and time-consuming enrollment. Instead, new users should be detected in an unsupervised manner while operating the device. New speaker profiles have to be initialized based on the first occurrences of a speaker. Experiments on the evolution of such a system were carried out on a subset of the SPEECON database. The results show that in the long run the system produces adaptation profiles which give continuous improvements in speech recognition and speaker identification rates. A variety of applications may benefit from a system that adapts individually to several users.
Fundamentals.
Combining Self-Learning Speaker Identification and Speech Recognition.
Combined Speaker Adaptation.
Unsupervised Speech Controlled System with Long-Term Adaptation.
Evolution of an Adaptive Unsupervised Speech Controlled System.
Summary and Conclusion.