Springer, 2012. — 83 p. The fast pace of the advancement in information and communications technology is reshaping our society and vastly increasing our capabilities for faster learning, higher achievements, and better and wider communication, in addition to more effective and productive collaboration among speech scientists and engineers. One of the important frontiers of...
PhD dissertation. — Carnegie Mellon University, 1990. — 153 p. This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different...
Kluwer, 1993. — 197 p. The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker...
PhD dissertation. — University of Cambridge, 1996. — 146 p. HMM-based speech recognition systems have recently demonstrated impressive recognition performance. Many of these systems attempt to provide low error rates for a large range of speakers. However, the performance of these speaker independent systems is generally inferior to speaker dependent systems trained for a...
Springer, 2011. — 163 p. Many of the things we think about, actions we take, the way we react to stimuli, generate a feeling or subjective experience, for example, an emotion, or a mood. The generic term used in the twentieth century psychology and philosophy literature to denote such an emotion or mood is an old, Middle English (fourteenth century) word affect. The outward...
Pergamon Press, 1976. — 149 p.
The study of speech is a multidisciplinary subject, and the topic of this book is no exception. The production of speech is properly the province of the anatomist and the physiologist, but in practice it has been studied mainly by the phonetician with help from the physicist. The sounds of speech have been classified by the phonetician, and...
Cambridge University Press, 2004. — 226 p. Although widely employed in image processing, the use of fractal techniques and the fractal dimension for speech characterization and recognition is a relatively new concept, which is now receiving serious attention. This book represents the fruits of research carried out to develop novel fractal-based techniques for speech and audio...
München: Lincom Europa, 2005. – 143 p.
This monograph describes an experiment in Forensic Speaker Identification, showing how speech samples from the same speaker can be discriminated from speech from different speakers using acoustic features commonly used in forensics. It also explains what is now considered the legally and logically correct approach to Forensic Speaker...
Morgan & Claypool, 2005. — 136 p. Immediately following the Second World War, between 1947 and 1955, several classic papers quantified the fundamentals of human speech information processing and recognition. In 1947 French and Steinberg published their classic study on the articulation index. In 1948 Claude Shannon published his famous work on the theory of information. In 1950...
Master's thesis, Mississippi State University, 2003. — 80 p. Spoken language processing is one of the oldest and most natural modes of information exchange between human beings. For centuries, people have tried to develop machines that can understand and produce speech the way humans do so naturally. The biggest problem in our inability to model speech with computer programs...
PhD dissertation. — Griffith University, Brisbane, Australia, 2005. — 208 p. Incorporating information from the short-time phase spectrum into a feature set for automatic speech recognition (ASR) may possibly serve to improve recognition accuracy. Currently, however, it is common practice to discard this information in favour of features that are derived purely from the...
Bradford Book, 1995. — 549 p. The chapters in this book represent the outcome of a research workshop held at the Park Hotel Fiorelle, Sperlonga, 16-20 May 1988. Twenty-five participants gathered in this small coastal village in Italy, where the Emperor Tiberius kept a summer house, to discuss psycholinguistic and computational issues in speech and natural-language processing....
Springer, 1999. — 212 p. Automatic speech recognition and processing has received a lot of attention during the last decade. Prototypes for speech-to-speech translation are currently being developed that show first impressive results for this highly complex endeavor. They demonstrate that machines can actually be helpful in communicating information between persons speaking...
PhD dissertation. — Universitat Politècnica de Catalunya, 2006. — 348 p. This PhD thesis addresses the topic of speaker diarization for meetings. In answering the question "Who spoke when?", the presented speaker diarization system is able to process a variable number of microphones spread around the meeting room and determine the optimum output without any prior...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2005. — 242 p. Nowadays, state-of-the-art automatic speaker recognition systems show very good performance in discriminating between voices of speakers under controlled recording conditions. However, the conditions in which recordings are made in investigative activities (e.g., anonymous calls and wire-tapping)...
Springer, 2015. — 72 p. This book presents state-of-the-art research in speech emotion recognition. Readers are first presented with basic research and applications; more advanced information is then gradually introduced, giving readers comprehensive guidance for classifying emotions through speech. Simulated databases are used and results extensively compared, with the features and the...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2008. — 159 p. In this thesis, we investigate the use of posterior probabilities of sub-word units directly as input features for automatic speech recognition (ASR). These posteriors, estimated from data-driven methods, display some favourable properties such as increased speaker invariance, but unlike conventional...
Springer, 1999. — 315 p.
This book is intended for researchers who want to keep abreast of current developments in corpus-based natural language processing. It is not meant as an introduction to this field; for readers who need one, several entry-level texts are available, including those of (Church and Mercer, 1993; Charniak, 1993; Jelinek, 1997).
This book captures the...
New York, USA: Routledge, 2019. — 289 p. — (Routledge Research in Language Education). — ISBN 978-1-138-73312-1. Quantitative Data Analysis for Language Assessment Volume I: Fundamental Techniques is a resource book that presents the most fundamental techniques of quantitative data analysis in the field of language assessment. Each chapter...
New York, USA: Routledge, 2020. — 260 p. — (Routledge Research in Language Education). — ISBN 978-1-138-73314-5. Quantitative Data Analysis for Language Assessment Volume II: Advanced Methods demonstrates advanced quantitative techniques for language assessment. The volume takes an interdisciplinary approach and taps into expertise from...
Springer, 1991. — 376 p. Speech coding has been an ongoing area of research for several decades, yet the level of activity and interest in this area has expanded dramatically in the last several years. Important advances in algorithmic techniques for speech coding have recently emerged and excellent progress has been achieved in producing high quality speech at bit rates as low...
Kluwer, 1993. — 267 p. This volume contains 34 chapters, loosely grouped into six topical areas. The chapters in this volume reflect the progress and present the state of the art in low bit rate speech coding primarily at bit rates from 2.4 kbit/s to 16 kbit/s. Together they represent important contributions from leading researchers in the speech coding community. The book...
Taylor & Francis, 1993. — 225 p. This text deals with two important technologies in human-computer interaction: computer generation of synthetic speech and computer recognition of human speech. These technologies are quite different and the ergonomics problems in implementation are also different. Nonetheless, synthetic speech and speech recognition are usually dealt with in the...
Springer, 2017. — 251 p. This book provides scientific understanding of the most central techniques used in speech coding for both advanced students and professionals with a background in speech, audio, and/or digital signal processing. It provides a clear connection between the Whys, Hows, and Whats, such that the necessity, purpose and solutions provided by tools...
Springer, 2013. — 74 p. The diagnosis and monitoring of many common neurological conditions routinely involve acoustic analysis of the subject’s speech by an expert clinician. There are two significant problems with this: one is that the analysis is time-consuming, hence expensive, and therefore often performed too infrequently, and the other is that the results of the analysis...
Springer, 2004. — 237 p. Spoken dialog systems allow people to get information, conduct business, and be entertained, simply by speaking to a computer. There are hundreds of these systems currently in use, handling millions of interactions every day. How do they work? What problems do they solve? The goal of this book is to answer these questions and others like them, including:...
EURASIP Journal on Audio, Speech, and Music Processing, 2009. — 66 p. The aim of this special issue is to provide a detailed description of state-of-the-art systems for animating faces during speech, and identify new techniques that have recently emerged from both the audiovisual speech and computer graphics research communities. This special issue is a followup to the first LIPS...
Cambridge: Cambridge University Press, 2012. — 508 p.
When we speak, we configure the vocal tract which shapes the visible motions of the face and the patterning of the audible speech acoustics. Similarly, we use these visible and audible behaviors to perceive speech. This book showcases a broad range of research investigating how these two types of signals are used in spoken...
Springer, 2005. — 203 p. The goal of this book is to present a discussion of the ideas arising from the European Special Event (ESE) on the Integration of Phonetic Knowledge in Speech Technology at Eurospeech 2001 in Aalborg. Where there is discussion, there must be unresolved questions, doubts must exist, integration is not a fait accompli. The different questions asked,...
PhD dissertation. — Massachusetts Institute of Technology, 2002. — 153 p. This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer...
Springer, 2011. — 1029 p. — ISBN10: 0387775919, ISBN13: 978-0387775913. When I was being interviewed at the handwriting recognition group of IBM T.J. Watson Research Center in December of 1990, one of the interviewers asked me why, being a mechanical engineer, I was applying for a position in that group. Well, he was an electrical engineer and somehow was under the impression...
New York: Springer, 2018. — 112 p. This book presents and develops several important concepts of speech enhancement in a simple but rigorous way. Many of the ideas are new; not only do they shed light on this old problem but they also offer valuable tips on how to improve on some well-known conventional approaches. The book unifies all aspects of speech enhancement, from single...
Springer, 2011. — 88 p. Signal enhancement is a fundamental topic of signal processing in general and of speech processing in particular [1]. In audio and speech applications such as cell phones, teleconferencing systems, hearing aids, human–machine interfaces, and many others, the microphones installed in these systems always pick up some interferences that contaminate the...
Springer, 2012. — 112 p. This work addresses this problem in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different...
Springer, 2015. — 113 p. This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are...
Morgan & Claypool, 2011. — 112 p. This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired...
Springer, 2009. — 235 p. Noise is everywhere and in most applications that are related to audio and speech, such as human-machine interfaces, hands-free communications, voice over IP (VoIP), hearing aids, teleconferencing/telepresence/telecollaboration systems, and so many others, the signal of interest (usually speech) that is picked up by a microphone is generally...
Academic Press, 2014. — 138 p. Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes of methods and have been introduced in somewhat...
Springer, 2005. — 415 p. We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc) that require at least one microphone, the signal of interest is usually contaminated by background noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools...
PhD dissertation. — Swiss Federal Institute of Technology Lausanne, 2005. — 123 p. The goal of the thesis is to investigate different approaches that combine and integrate Automatic Speech Recognition (ASR) and Speaker Recognition (SR) systems, with applications to (1) User-Customized Password Speaker Verification (UCP-SV) systems and (2) joint speech and speaker...
PhD dissertation. — Columbia University, 2011. — 190 p. A fundamental challenge for current research on speech science and technology is understanding and modeling individual variation in spoken language. Individuals have their own speaking styles, depending on many factors, such as their dialect and accent as well as their socioeconomic background. These individual differences...
PhD dissertation. — Boston University, 1998. — 216 p. Acoustic modeling and analysis of speech based on phonetic features is explored in the current research for speaker-independent speech recognition. Phonetic features are minimal speech units that describe the manner and place of articulation of the sounds of a language. In this research, it is shown that phonetic features...
PhD dissertation. — Cambridge University, 1996. — 199 p. The past 15 years have seen dramatic improvements in the performance of computer algorithms which attempt to recognise human speech. The falling error rates achieved by the best speech recognition systems on limited tasks have recently enabled the development of a diverse range of applications which promise to have a sign...
Springer, 2020. — 808 p. — (Modern Acoustics and Signal Processing). — ISBN 978-3-030-00385-2. This book offers a computational framework for modeling active exploratory listening that assigns meaning to auditory scenes. Understanding auditory perception and cognitive processes involved with our interaction with the world are of high relevance for a vast variety of ICT systems...
Kluwer, 2000. — 397 p. As the title indicates, "Intonation: Analysis, Modelling and Technology" is a contribution to the study of prosody, with major emphasis on intonation. Intonation and tonal themes are thus the central object of the volume, although temporal and dynamic aspects are also taken into consideration by a good number of papers. Although tonal and prosodic...
Springer, 2009. — 228 p. The development of computer and telecommunication technologies led to a revolution in the way that people work and communicate with each other. One of the results is that a large amount of information will increasingly be held in a form that is natural for users, as speech in natural language. In the presented work, we investigate the speech signal...
Kluwer, 1994. — 329 p. This book describes how large multi-layer perceptron networks containing more than 150,000 weights were trained and integrated into a state-of-the-art Hidden Markov Model (HMM) recognizer to provide improved acoustic-phonetic modeling and improved recognition accuracy. The lessons learned along the way form a case study which demonstrates how hybrid...
Wiesbaden: Springer, 2016. — 148 p. Almut Braun carried out forensic phonetic speaker identification experiments (voice lineups) with 306 lay listeners. Blind listeners significantly outperformed sighted listeners when the speech recordings were presented in studio quality. For recordings in mobile phone quality or of whispering voices, blind and sighted listeners achieved...
MIT Press, 1990. — 854 p. Auditory Scene Analysis addresses the problem of hearing complex auditory environments, using a series of creative analogies to describe the process required of the human auditory system as it analyzes mixtures of sounds to recover descriptions of individual sounds. In a unified and comprehensive way, Bregman establishes a theoretical framework that...
Ellis Horwood Limited, 1987. — 282 p. An increased understanding of human speech comprehension is a major goal for research groups working in a number of closely related disciplines. We take the position that genuine advances in our understanding of speech comprehension will be based on explicit computational models of aspects of this process which yield predictions testable...
Springer, 2011. — 200 p. Many existing natural language and spoken language dialogue systems are either very limited in the scope of domain functionality or require a rather cumbersome interaction. With an increasing number of application domains, ranging from unified messaging to trip planning and appointment scheduling, it seems to be obvious that the current interfaces need...
Master's thesis, Massachusetts Institute of Technology, 2000. — 65 p. The thesis discusses the development and evaluation of a word spotting understanding system within a spoken language system. A word spotting understanding server was implemented within the GALAXY [4] architecture using the JUPITER [3] weather information domain. Word spotting was implemented using a simple...
John Wiley, 2007. — 373 p. The Media Resource Control Protocol (MRCP) is a key enabling technology delivering standardised access to advanced media processing resources including speech recognisers and speech synthesisers over IP networks. MRCP leverages Internet and Web technologies such as SIP, HTTP, and XML to deliver an open standard, vendor-independent, and versatile...
PhD dissertation. — Cambridge University, 1996. — 202 p. This dissertation investigates some aspects of speech processing using linear models and single hidden layer neural networks. The study is divided into two parts which focus on speech modelling and speech classification respectively. The first part of the dissertation examines linear and nonlinear vocal tract models for...
PhD dissertation. — Massachusetts Institute of Technology, 1999. — 226 p. This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based...
Kluwer, 1998. — 249 p. This book is a revised version of my doctoral thesis which was submitted in April 1993. The main extension is a chapter on evaluation of the system described in Chapter 8 as this is clearly an issue which was not treated in the original version. This required the collection of data, the development of a concept for diagnostic evaluation of linguistic word...
Master's thesis, Massachusetts Institute of Technology, 2008. — 103 p. Discriminative training for acoustic models has been widely studied to improve the performance of automatic speech recognition systems. To enhance the generalization ability of discriminatively trained models, a large-margin training framework has recently been proposed. This work investigates large-margin...
Springer, 2006. — 398 p. There is no question of the value of applying automatic speech recognition technology as one of the interaction tools between humans and different computational systems. There are many books on design standards and guidelines for different practical issues, such as Gibbon's book Handbook of Standards and Resources for Spoken Language System (1997) and...
Springer, 2010. — 351 p. In recent years spoken language research has been successful in establishing technology which can be used in various applications, and which has also brought forward novel research topics that advance our understanding of the human speech and communication processes in general. This book got started in order to collect these different trends together,...
PhD dissertation. — Massachusetts Institute of Technology, 1999. — 111 p. In this thesis, a method for designing a hierarchical speech recognition system at the phonetic level is presented. The system employs various component modules to detect acoustic cues in the signal. These acoustic cues are used to infer values of features that describe segments. Features are considered...
CRC Press, 2003. — 385 p. Approaches to the problems of designing speech and language processing algorithms for human machine communication used to be taken from the perspectives of linguistics and speech science, until the late 1970s. Due to the advances in computing and statistical modeling, data driven pattern recognition methods have become a fast moving research area during...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 185 p. Despite the proliferation of speech-enabled applications and devices, speech-driven human-machine interaction still faces several challenges. One of these issues is the new word or the out-of-vocabulary (OOV) problem, which occurs when the underlying automatic speech recognizer (ASR) encounters a word it...
Master's thesis, Massachusetts Institute of Technology, 1996. — 70 p. The objective of this research is to investigate the use of a hierarchical framework for phonetic classification of speech. The framework is motivated by the observation that a wide variety of measurements may be needed to make phonetic distinctions among different types of speech sounds. The measurements...
Kluwer, 1987. — 278 p. It is well-known that phonemes have different acoustic realizations depending on the context. Thus, for example, the phoneme /t/ is typically realized with a heavily aspirated strong burst at the beginning of a syllable as in the word Tom, but without a burst at the end of a syllable in a word like cat. Variation such as this is often considered to be problematic...
Master's thesis, Massachusetts Institute of Technology, 1997. — 86 p. The problem addressed by this research is the automatic construction of a model of the fundamental frequency (F0) contours of a given speaker to enable the synthesis of new contours for use in Text-to-Speech synthesis. The parametric F0 generation model designed by Fujisaki is used to analyze observed F0...
Springer, 2010. — 352 p. More and more devices for human-to-human and human-to-machine communications, where sound pickup and rendering is necessary, require some sophisticated algorithms. This is due to the fact that the acoustic environment in which we live and communicate is extremely challenging. The difficult problems encountered in this environment are very well known...
Master's thesis, Middle East Technical University, 2003. — 115 p. This study aims to build a new language model that can be used in a Turkish large vocabulary continuous speech recognition system. Turkish is a very productive language in terms of word forms because of its agglutinative nature. For languages like Turkish, the vocabulary size is far from being acceptable....
Springer, 2011. — 267 p. The telephony network broadly changed during the last decades with the intensive introduction of Voice over Internet Protocol (VoIP) technology and third generation mobile networks. These networks enable new transmission paradigms that affect the perceived quality of speech signals. The perceived characteristics of a speech signal transmitted by a VoIP...
Kluwer, 2001. — 328 p. Modern speech synthesis began in the 1950s with the development of electronic formant synthesisers, such as PAT (Parametric Artificial Talker) designed by Walter Lawrence in the UK and OVE designed by Gunnar Fant in Sweden. Many others followed and, with the widespread introduction of fast digital computers, became implemented as computer programs. The best...
PhD dissertation. — The University of British Columbia, 2002. — 119 p. Modern speech synthesizers use concatenated words and sub-word segments, such as diphones, to synthesize natural speech. Synthesizers available today can synthesize speech with only a limited selection of voices provided by the vendors. The voice segments (e.g. words & diphones) are often created using...
Springer, 2018. — 144 p. This book presents the consolidated acoustic data for all phones in Standard Colloquial Bengali (SCB), commonly known as Bangla, a Bengali language used by 350 million people in India, Bangladesh, and the Bengali diaspora. The book analyzes the real speech of selected native speakers of the Bangla dialect to ensure that a proper acoustical database is...
CRC Press, 2002. — 400 p. A wide range of potential sources of noise and distortion can degrade the quality of the speech signal in a communication system. Noise Reduction in Speech Applications explores the effects of these interfering sounds on speech applications and introduces a range of techniques for reducing their influence and enhancing the acceptability, intelligibility,...
Plenum Press, 1983. — 505 p. The work reported in this book results from years of research oriented toward the goal of making an experimental model capable of understanding spoken sentences of a natural language. This is, of course, a modest attempt compared to the complexity of the functions performed by the human brain. A method is introduced for conceiving modules performing...
John Wiley, 2005. — 273 p. In many situations, the dialogue between two human beings seems to be performed almost effortlessly. However, building a computer program that can converse in such a natural way with a person, on any task and under any environmental conditions, is still a challenge. One reason why is that a large amount of different types of knowledge is involved in...
IEEE/Wiley-Interscience, 2000. — 1041 p. Purposes and Scope. The purposes of this book are severalfold. Principally, of course, it is intended to provide the reader with solid fundamental tools and sufficient exposure to the applied technologies to support advanced research and development in the array of speech processing endeavors. As an academic instrument, however, it may also...
PhD dissertation. — Katholieke Universiteit Leuven, 2001. — 197 p. The task of a speech recogniser is to transcribe human speech into text. To do so, modern recognisers rely firmly on the principles of statistical pattern recognition. This statistical framework allows the problem of speech recognition to be decomposed into a set of well-defined sub-tasks, namely the extraction...
Morgan & Claypool, 2006. — 118 p. Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech chain starts with the formation of a linguistic message in a speaker’s brain and ends with the arrival of the message in a listener’s brain. Given the intricacy of the dynamic speech process and its fundamental importance...
PhD dissertation. — Mississippi State University, 1999. — 95 p. The ability to correctly pronounce names of entities, such as people, places and organizations, is a critical component of effective verbal communication. In many situations, such as looking up information on a person or a place (e.g. airline reservations, directory assistance etc.), it is customary to alternate...
Academic Press, 2019. — 199 p. — ISBN: 978-0-12-818130-0. This book investigates the utilization of speech analytics across several systems and real-world activities, including sharing data analytics, creating collaboration networks between several participants, and implementing video-conferencing in different application areas. Chapters focus on the latest applications of...
New York: Academic Press, 2021. — 191 p. Applied Speech Processing: Algorithms and Case Studies is concerned with supporting and enhancing the utilization of speech analytics in several systems and real-world activities, including sharing data analytics related information, creating collaboration networks between several participants, and the use of video-conferencing in...
John Wiley & Sons, Inc., 2013. — 384 p. — 3rd Edition. In English. Fully updated for the latest speech recognition tools and features, this bestselling guide helps you conquer Dragon NaturallySpeaking and gets you started creating documents, sending e-mail, searching the web, and more using only your voice. You'll learn Dragon basics like dictation, formatting, and...
PhD dissertation. — Brown University, 2000. — 122 p. A combination of microphone arrays and sophisticated signal processing has been applied to the remote acquisition of high-quality speech audio. These applications all exploit the spatial filtering ability of an array, which allows the speech signal from one talker to be enhanced as the signals from other talkers and unwanted...
Kluwer, 2005. — 327 p. There is a serious problem in the recognition of sounds. It derives from the fact that they do not usually occur in isolation but in an environment in which a number of sound sources (voices, traffic, footsteps, music on the radio, and so on) are active at the same time. When these sounds arrive at the ear of the listener, the complex pressure waves...
IOS Press, 2006. — 389 p. That speech is a dynamic process strikes as a tautology: whether from the standpoint of the talker, the listener, or the engineer, speech is an action, a sound, or a signal continuously changing in time. Yet, because phonetics and speech science are offspring of classical phonology, speech has been viewed as a sequence of discrete events-positions of...
Springer, 2017. — 77 p. In the last few years, we saw the rise of practical speech recognition applications, which work well in English and a few other languages. There is no doubt that this trend will continue and a more natural interaction between humans and technology will become part of our lives. Language is one of the most important components of one’s culture and identity....
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2005. — 196 p. Standard hidden Markov model (HMM) based automatic speech recognition (ASR) systems usually use cepstral features as acoustic observation and phonemes as subword units. Speech signal exhibits wide range of variability such as, due to environmental variation, speaker variation. This leads to different...
Dissertation, Katholieke Universiteit Leuven, 1998. — 218 p. In general the aim of an automatic speech recognition system is to write down what is said. State of the art continuous speech recognition systems for large vocabulary - for an entire language - consist of four basic modules: the signal processing, the acoustic modelling, the language modelling and the search engine....
Master's thesis, Massachusetts Institute of Technology, 1992. — 97 p.
The Viterbi search is an important but computationally expensive algorithm for speech recognition. Even with the substantial advances expected in processor technology the massive computational resources required will remain prohibitive for operation of a speech recognition system in real time. This...
Springer, 1997. — 306 p. The field of speech synthesis has seen a large increase in commercial applications in the last ten years. As recently as 1986, there were only a few companies in the synthesis market, all exploiting one of two basic technologies, either formant-based phonemic synthesis or LPC-based diphone synthesis. While these approaches still form the basis of most...
Springer, 2008. — 305 p. This book has its point of departure in courses held at the Tenth European Language and Speech Network (ELSNET) Summer School on Language and Speech Communication which took place at NISLab in Odense, Denmark, in July 2002. The topic of the summer school was Evaluation and Assessment of Text and Speech Systems. Nine (groups of) lecturers contributed to...
Springer, 2008. — 338 p. — (Text, Speech and Language Technology Series 39). This book edition highlights recent trends and important issues that still remain only partially solved or even unsolved within the broad field of discourse and dialogue. The field is discussed and illustrated both from an overall spoken (multimodal) dialogue system perspective as well as from a more...
CMP Books, 2001. — 338 p. In the summer of 2000, I came across the VoiceXML 1.0 standard published by the VoiceXML Forum. I downloaded the specification and began to read it. I had been working on software development in computer telephony for more than 10 years, but I was completely baffled; I couldn't understand most of the specification. I had no idea what the motivation or...
Springer Science+Business Media, 2012. — 120 p. — ISBN 978-1-4614-1905-1, e-ISBN 978-1-4614-1906-8. This book describes novel approaches to improve automatic speech recognition for dialectal Arabic. Since the existing dialectal Arabic speech resources, that are available for the task of training speech recognition systems, are very sparse and are lacking quality, we describe how...
Springer, 2011. — 137 p. I know what you are asking yourself – there are a lot of books available in speech processing, what is novel in this book? Well, I can summarize the answer for this question in the following points: You always see different algorithms for speech enhancement, deconvolution, signal separation, watermarking, and encryption, separately, without specific...
Master's thesis (Magister), Universität Wien, 2009. — 126 p. In recent years the number of court cases involving speech recordings of suspects as evidence, for example taken from telephone conversations, has seen a substantial increase. Forensic speech evidence is expected to gain even more importance, as speech communication technologies have become ubiquitous. Likewise the role of...
John Wiley, 2013. — 355 p. This book came about as a result of the standing-room-only special session on crowdsourcing for speech processing at Interspeech 2011. There has been a great amount of interest in this new technique as a means to solve some persistent issues. Some researchers dived in head first and have been using crowdsourcing for a few years by now. Others waited...
Springer, 2016. — 288 p. This volume brings together through a peer-revision process advanced research results obtained on nonlinear speech processing, following the tradition initiated by the European COST Action 277: “Nonlinear Speech Processing” (http://www.cost.eu/COST_Actions/ict/277). The research published in this book was discussed for the first time at the 7th edition...
Kluwer, 2004. — 333 p. This is a collection of articles spanning half a century of speech research. It started at the Ericsson Telephone Company in Stockholm, 1946-1949. The following two years were spent at MIT. In 1951 a small research group was established at the KTH in Stockholm. This unit, the Speech Transmission Laboratory, became the foundation for our present department...
Springer, 2014. — 53 p. As the wavelets gain wide applications in different fields, especially within the signal processing realm, this chapter will provide a survey on widespread employing of wavelets analysis in different applications of speech processing. Many speech processing algorithms and techniques still lack some sort of robustness which can be improved through the use...
NY: Springer International Publishing, 2014. — 53 p.
This book provides a survey of the widespread use of wavelet analysis in different applications of speech processing. The author examines development and research in different applications of speech processing. The book also summarizes the state-of-the-art research on wavelets in speech processing.
2nd Ed. — Springer, 2017. — 96 p. — (SpringerBriefs in Electrical and Computer Engineering). — ISBN10: 3319690019, 13 978-3319690018. This new edition provides an updated and enhanced survey on employing wavelets analysis in an array of applications of speech processing. The author presents updated developments in topics such as; speech enhancement, noise suppression, spectral...
2nd Ed. — Springer, 2017. — 115 p. — (SpringerBriefs in Electrical and Computer Engineering). — ISBN10: 3319690019, 13 978-3319690018. This new edition provides an updated and enhanced survey on employing wavelets analysis in an array of applications of speech processing. The author presents updated developments in topics such as; speech enhancement, noise suppression, spectral...
Dissertation, Universitat Politècnica de Catalunya, 2008. — 156 p. Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has seen increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the...
2nd edition. — Springer Vieweg, 2013. — xv, 398 p. — ISBN: 978-3-642-31502-2, ISBN: 978-3-642-31503-9. A classic of speech processing brought up to the current state of the art, which always connects the theoretical foundations to practical applications. With new chapters on the fundamentals of signal analysis and on spoken dialogue systems. Electronic supplementary material is available at...
PhD dissertation. — Daimler-Benz AG, 1998. — 181 p. This thesis deals with the problem of Out-Of-Vocabulary words in speech recognition. The standard response of speech recognition systems whenever they encounter such OOV words is to (silently) misrecognize them without issuing any warning to the user. In order to avoid this undesired behaviour, two different strategies are...
PhD dissertation. — Johns Hopkins University, 2009. — 317 p. The output of a speech recognition system is often not what is required for subsequent processing, in part because speakers themselves make mistakes (e.g. stuttering, self-correcting, or using filler words). A system would accomplish speech reconstruction of its spontaneous speech input if its output were to...
Springer, 1972. — 446 p. The second, expanded edition of James Flanagan's monograph "Speech Analysis, Synthesis and Perception" (the first edition, from 1965, was translated into Russian in 1968 by the publisher Svyaz). For students of speech signal processing.
Springer, 2017. — 109 p. Speech communication assumes a dominant role in how we communicate, and it is nowadays available to support interaction with machines in a wide range of scenarios, ranging from personal assistants for smartphones to home entertainment. While in many circumstances audible speech may suffice, there are a multitude of scenarios for which it is inadequate...
Cambridge: Cambridge University Press, 2012. — 155 p.
The mechanism of speech is a very complex one and in order to undertake any analysis of language it is important to understand the processes that go to make up the message that a speaker transmits and a listener receives. Professor Fry therefore first takes the reader through the various stages of the speech chain: from...
Springer, 2011. — 221 p. The analysis and measurement of the spectrum of a speech signal is one of the most important areas of sound signal processing for a number of fields, yet it is not an area to which a book has been specifically devoted. The accurate determination of the speech spectrum is commonly pursued in diverse areas including speech processing, recognition, and...
A study of digital speech processing, synthesis and recognition. This edition contains sections on the international standardization of robust and flexible speech coding techniques, waveform unit concatenation-based speech synthesis, large vocabulary continuous-speech recognition based on statistical pattern recognition, and more.
Second Edition, Revised and Expanded. — Marcel Dekker, 2001. — 477 p. More than a decade has passed since the first edition of Digital Speech Processing, Synthesis, and Recognition was published. The book has been widely used throughout the world as both a textbook and a reference work. The clear need for such a book stems from the fact that speech is the most natural form of...
Presentation slides. 43 p. Outline: Fundamentals of automatic speech recognition; Acoustic modeling; Language modeling; Database (corpus) and task evaluation; Transcription and dialogue systems; Spontaneous speech recognition; Speech understanding; Speech summarization; Summary. Annotation: Speech recognition technology has made significant progress with many potential...
Marcel Dekker, 1992. — 871 p. This book originated in an invitation from Marcel Dekker, Inc., to put together a book of original articles on various aspects of speech signal processing. After discussing the possible scope of such a book with several of our colleagues, we decided that the chapters should stress the advances during the past five to ten years. The past decade has...
Lausanne: Frontiers Media SA, 2020. — 310 p. Spoken language is conveyed via well-coordinated speech movements, which act as coherent units of control referred to as gestures. These gestures and their underlying movements show several distinctive features. However, currently, no existing theory successfully accounts for all properties of these movements. Even though models in...
NOWPress, 2007. — 24 p. — (Foundations and Trends in Signal Processing). Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. As a consequence, almost all present day large vocabulary continuous speech recognition (LVCSR) systems are based on HMMs. Whereas the basic principles underlying HMM-based LVCSR are...
PhD dissertation. — University of Cambridge, 1995. — 132 p. This thesis details the development of a model-based noise compensation technique, Parallel Model Combination (PMC). The aim of PMC is to alter the parameters of a set of Hidden Markov Model (HMM) based acoustic models, so that they reflect speech spoken in a new acoustic environment. Differences in the acoustic...
Springer, 2014. — 198 p. The automatic detection of people’s identity from their voices is part of modern telecommunication services. This generally requires the telephone transmission of the speech to remote servers that perform the recognition task. The transmission may introduce severe distortions that degrade the system performance and hence represents one of the major...
PhD dissertation. — Mississippi State University, 2002. — 200 p. Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm...
Springer, 2011. — 125 p. The preparation of the present brief book was motivated by the significant and long-standing interest of the speech processing community to short-time cepstrum-based parameterization of speech. In approximately 100 pages, this volume brings together relevant information about 11 speech parameterization techniques and some of their variants that emerged...
Springer, 2008. — 483 p. Years ago when speech technology was younger, the designers of telephony-based speech recognition applications discovered something interesting. If human factors design, now often called user interface design, is applied to the prompts and flow of these applications, the result is improved system performance. Previously, nearly the only path of performance...
PhD dissertation. — Purdue University, 2013. — 146 p. The areas of mispronunciation detection (or accent detection more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work....
PhD dissertation. — University of York, 2014. — 323 p. The research presented in this thesis examines the calculation of numerical likelihood ratios using phonetic and linguistic parameters derived from a corpus of recordings of speakers of Southern Standard British English. The research serves as an investigation into the development of the numerical likelihood ratio as a...
CRC Press, 2000. — 247 p. A comprehensive description of speech coding algorithms and methods. Implementation details of these algorithms in widely used speech codecs. Speech Production The Speech Chain Articulation Excitation Vocal Tract Phonemes Source-Filter Model Speech Analysis Techniques Sampling the Speech Waveform Systems and Filtering Z-Transform Fourier Transform Discrete...
Springer, 2014. — 188 p. Most applications of digital speech processing deal with speech or speaker pattern recognition. To understand the practical implementation of speech or speaker recognition techniques, one needs to understand the concepts of digital speech processing and pattern recognition. This book aims at giving a balanced treatment of...
Springer, 2002. — 134 p. Speech recognition technology is being increasingly employed in human-machine interfaces. Two of the key problems affecting such technology, however, are its robustness across different speakers and robustness to non-native accents, both of which still create considerable difficulties for current systems. In this book methods to overcome these problems...
PhD dissertation. — University of Szeged, Hungary, 2010. — 153 p. Even from the beginning of speech recognition technology two aspects proved to be very important, and perhaps the two most important ones. The first one was a goal: recognizing the word or sentence spoken as accurately as possible evidently has a high focus, as this is the purpose of the whole speech...
PhD dissertation. — Université de Neuchâtel, 1998. — 228 p. Several speech processing applications such as digital hearing aids and personal communications devices are characterized by very tight requirements in power consumption, size, and voltage supply. These requirements are difficult to fulfill, given the complexity and number of functions to be implemented, together with...
Now Publishers, 2010. — 152 p. — (Foundations and Trends in Signal Processing). In December 1974 the first real-time conversation on the ARPAnet took place between Culler-Harrison Incorporated in Goleta, California, and MIT Lincoln Laboratory in Lexington, Massachusetts. This was the first successful application of real-time digital speech communication over a packet network and...
Springer, 2004. — 487 p. — (Springer Handbook of Auditory Research, Volume 18). Although our sense of hearing is exploited for many ends, its communicative function stands paramount in our daily lives. Humans are, by nature, a vocal species and it is perhaps not too much of an exaggeration to state that what makes us unique in the animal kingdom is our ability to communicate via the...
InTech, 2007. — 470 p.
Digital speech processing is a major field in current research all over the world, in particular for automatic speech recognition (ASR). Very significant achievements have been made since the first attempts of digit recognizers in the 1950’s and 1960’s when spectral resonances were determined by analogue filters and logical circuits. As Prof. Furui...
Monograph. — Ljubljana: Znanstvenoraziskovalni center Slovenske akademije znanosti in umetnosti (ZRC SAZU), 2000. — 149 p. — (Linguistica et philologica, 3; ISSN 2712-2689). — ISBN 961-6358-21-9. The monograph ''Samodejno tvorjenje govora iz besedil'' (Automatic Speech Generation from Text) is a revised doctoral dissertation that was carried out at the Faculty of Electrical Engineering in Ljubljana. The original text of the doctoral...
Springer, 2011. — 125 p. Automatic speech recognition systems are increasingly applied for modern communication. One example is call centers, where speech recognition based systems provide information or help sort customer queries in order to forward them to the appropriate experts. The big advantage of those systems is that the computers can be online 24 h a day to process...
PhD dissertation. — University of Cambridge, 2001. — 136 p. Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the...
PhD dissertation. — Massachusetts Institute of Technology, 1998. — 173 p. The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, fixed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be significantly improved through a...
PhD dissertation. — Mississippi State University, 2000. — 49 p. Progress on speech recognition technology has been impressive. There are now commercial products that allow automatic dictation, telephone voice interfaces, and voice activated appliances. It has been ovTwo to three paragraphs explaining the state of speech research today melding into how the important role that...
PhD dissertation. — Mississippi State University, 2002. — 85 p. The prominent modeling technique for speech recognition today is the hidden Markov model with Gaussian emission densities. However, they suffer from an inability to learn discriminative information. Artificial neural networks have been proposed as a replacement for the Gaussian emission probabilities under the belief that...
Kluwer, 1990. — 454 p. Speech sound production is one of the most complex human activities: it is also one of the least well understood. This is perhaps not altogether surprising as many of the complex neurological and physiological processes involved in the generation and execution of a speech utterance remain relatively inaccessible to direct investigation, and must be inferred...
McGraw-Hill, 2003. — 338 p. The focus of this book is the narrow question of how to assess quality of packet-switched voice services in general and VoIP services in particular. The approach taken in answering this vexing question is one that I have exploited to very good effect in more than 35 years’ working in the general area of test and evaluation of telecommunications...
Blackwell, 2010. — 279 p. In undergraduate courses that include phonetics, students typically acquire skills both in ear-training and an understanding of the acoustic, physiological, and perceptual characteristics of speech sounds. But there is usually less opportunity to test this knowledge on sizeable quantities of speech data partly because putting together any database that is...
Kluwer, 1999. — 328 p. This book is the development of a series of lectures to undergraduate and postgraduate students at Macquarie University on basic principles in acoustic phonetics and speech signal processing. The first part of the book (Chapters 1 to 4) is intended to provide students with the ability to interpret acoustic records of speech signals in their various forms....
O’Reilly Media, Inc., 2013. — 242 p. Go under the hood of an operating Voice over IP network, and build your knowledge of the protocols and architectures used by this Internet telephony technology. With this concise guide, you’ll learn about services involved in VoIP and get a first-hand view of network data packets from the time the phones boot through calls and subsequent...
PhD dissertation. — Massachusetts Institute of Technology, 1998. — 181 p. This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker...
Morgan & Claypool, 2008. — 121 p. In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective...
University of Rochester, Rochester, New York, 1997. Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers’ Utterances in Spoken Dialog, by Peter Anthony Heeman. Interactive spoken dialog provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2010. — 214 p. Conventional speech recognition systems are based on Gaussian hidden Markov models (HMMs). Discriminative techniques such as log-linear modeling have been investigated in speech recognition only recently. This thesis establishes a log-linear modeling framework in the context of discriminative...
Springer, 2013. — 227 p. One of the main reasons for the complexity of spoken dialogue system (SDS) development is the multi-domain and thus the multi-topic nature of real-life processes. If the application domain is not clearly defined, collecting a corpus or establishing valid rules to control the dialogue flow of the SDS becomes a complex task. Within the framework...
Springer, 2013. — 301 p. The book covers a wide range of disciplines related to speech and language and vocal communication in animals. In Part I, the first chapter deals with the current state of understanding of the neurology of speech and language in terms of brain substrates, representation, and theoretical models. The second chapter is a review of what is known about the...
Springer, 2008. — 445 p. Cost reduction is of increasing importance for medium and large enterprises. Seen in this context, Interactive Voice Response (IVR) systems are becoming more and more significant. IVR systems can help to automate business processes as for example in call centers, which are now a growing market for IVR systems. Automatic speech recognition (ASR) is the...
Springer, 2011. — 185 p. A self-learning speech controlled system has been developed for unsupervised speaker identification and speech recognition. The benefits of a speech controlled device which identifies its main users by their voice characteristics are obvious: The human-computer interface may be personalized. New ways for interacting with a speech controlled system may...
Springer, 1983. — 713 p. Pitch (i.e., fundamental frequency F0 and fundamental period T0) occupies a key position in the acoustic speech signal. The prosodic information of an utterance is predominantly determined by this parameter. The ear is more sensitive to changes of fundamental frequency than to changes of other speech signal parameters by an order of magnitude. The...
PhD dissertation. — Massachusetts Institute of Technology, 1995. — 173 p. This thesis is directed toward the characterization of the problem of new out-of-vocabulary words for continuous-speech recognition and understanding. It is motivated by the belief that this problem is critical to the eventual deployment of the technology and that a thorough understanding of the problem...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2004. — 170 p. This work describes an algorithm to increase the noise robustness of automatic speech recognition systems. In many practical applications recognition systems have to work in adverse acoustic environment conditions. Distortions and noises caused by the transmission are typical for telephone...
PhD dissertation. — University of Washington, 2008. — 149 p. Increasing amounts of easily available electronic data are precipitating a need for automatic processing that can aid humans in digesting large amounts of data. Speech and video are becoming an increasingly significant portion of on-line information, from news and television broadcasts, to oral histories, on-line...
Springer, 2017. — 170 p. Text-to-Speech (TTS) synthesis, i.e., artificially produced speech, has finally attained a quality level that makes it possible to include it into ordinary services that are used by common people. With the increasing processing power of smartphones and the development of intelligent personal assistants like Siri, Cortana, and Google Now, synthetic...
Springer, 2015. — 212 p. The volume addresses issues concerning prosody generation in speech synthesis, including prosody modeling, how we can convey para- and non-linguistic information in speech synthesis, and prosody control in speech synthesis (including prosody conversions). A high level of quality has already been achieved in speech synthesis by using selection-based...
PhD dissertation. — Helsinki University of Technology, 2009. — 66 p. Automatic speech recognition systems are devices or computer programs that convert human speech into text or make actions based on what is said to the system. Typical applications include dictation, automatic transcription of large audio or video databases, speech-controlled user interfaces, and automated...
Springer, 2012. — 109 p. Speech production and perception, man’s most widely used means of communication, has been the subject of research and intense study for more than 10 decades. Conventional theories of speech production are based on linearization of pressure and volume velocity relations and the speech production system is modeled as a linear source-filter model. This...
2nd edition. — Taylor & Francis, 2001. — 317 p. As information technology continues to make more impact on many aspects of our daily lives, the problems of communication between human beings and information-processing machines become increasingly important. Up to now such communication has been almost entirely by means of keyboards and screens, but there are substantial...
Morgan & Claypool, 2013. — 164 p. This book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition mainly focusing on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find a sequence of words that best matches an input speech...
Kluwer, 2000. — 359 p. The study of prosody is perhaps the area of speech research which has undergone the most noticeable development during the past ten to fifteen years. As an indication of this, one can note, for example, that at the latest International Conference on Spoken Language Processing in Philadelphia (October 1996), there were more sessions devoted to prosody than...
PhD dissertation. — Massachusetts Institute of Technology, 2000. — 200 p. Lexical Access From Features (LAFF) is a proposed knowledge-based speech recognition system which uses landmarks to guide the search for distinctive features. The first stage in LAFF must find Vowel landmarks. This task is similar to automatic detection of syllable nuclei (ASD). This thesis adapts and...
PhD dissertation. — Stanford University, 2006. — 202 p. In a natural environment, speech often occurs simultaneously with acoustic interference. Many applications, such as automatic speech recognition and telecommunication, require an effective system that segregates speech from interference in the monaural (one-microphone) situation. While this task of monaural speech...
PhD dissertation. — University of Florida, 2012. — 207 p. Advanced signal processing techniques can help us analyze signals of interest well and perform proper operations on them for many useful applications. In this dissertation, we aim at developing signal processing techniques for speaker recognition (e.g. feature extraction, classifier design) and for...
Prentice Hall, 2001. — 965 p. Recognition and understanding of spontaneous unrehearsed speech remains an elusive goal. To understand speech, a human considers not only the specific information conveyed to the ear, but also the context in which the information is being discussed. For this reason, people can understand spoken language even when the speech signal is corrupted by...
PhD dissertation. — Universiteit Twente, 2008. — 184 p. In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions of the audio to be...
PhD dissertation. — University of Miami, 2009. — 169 p. Emotion conveys the psychological state of a person. It is expressed by a variety of physiological changes, such as changes in blood pressure, heart beat rate, degree of sweating, and can be manifested in shaking, changes in skin coloration, facial expression, and the acoustics of speech. This research focuses on the...
Kluwer, 1992. — 254 p. After almost three scores of years of basic and applied research, the field of speech processing is, at present, undergoing a rapid growth in terms of both performance and applications and this is fuelled by the advances being made in the areas of microelectronics, computation and algorithm design. Speech processing relates to three aspects of voice...
InTech, 2011. — 442 p.
The book Speech Technologies addresses different aspects of the research field and a wide range of topics in speech signal processing, speech recognition and language processing. The chapters are divided in three different sections: Speech Signal Modeling, Speech Recognition and Applications. The chapters in the first section cover some essential topics...
Springer, 2010. — 187 p. The idea for this book was formed during the doctorate of Bernd Iser. Bernd Iser was working on efficient and robust bandwidth extension algorithms in hands-free systems for Harman/Becker Automotive Systems. It turned out that bandwidth extension of speech signals was a topic of appreciable interest, where lots of scientific publications discussing...
The Distinctive Features and their Correlates. — The M.I.T. Press, 1952. — 74 p. This report proposes some questions to be discussed by specialists working on various aspects of speech communication. These questions concern the ultimate discrete components of language, their specific structure, their inventory in the languages of the world, their identification on the acoustical...
Dissertation. — Cambridge University, 1995. — 157 p. The research presented in this thesis addresses the topic of ad hoc retrieval of information from collections of spoken items such as radio news bulletins. Modern digital computers are becoming increasingly adept at processing nontextual data, such as speech. Consequently, new methods are required to allow users to pin-point...
PhD dissertation. — University of California, Berkeley, 2004. — 100 p. From cell phones and PDAs to huge automated call centers, speech recognition is becoming more and more ubiquitous. As demand for automatic speech recognition (ASR) applications increases, so too does the need to run ASR algorithms on a variety of unconventional computer architectures. One such architecture...
Springer, 2005. — 207 p. As part of the steady progress being made in the field of information and telecommunication techniques, voice and speech quality assessment of systems has gained in importance over the last years. An engineering approach to voice and speech quality of systems includes the consideration of how a system is perceived by its users, and how the needs and...
London: A Bradford Book, 1998. — 305 p.
This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data...
PhD dissertation. — Carnegie Mellon University, 2007. — 177 p. The automatic speaker recognition technologies have developed into more and more important modern technologies required by many speech-aided applications. The main challenge for automatic speaker recognition is to deal with the variability of the environments and channels from where the speech was obtained. In...
Springer, 2004. — 292 p. The importance of speech and language technologies continues to grow as information, and information needs, pervade every aspect of our lives and every corner of the globe. Speech and language technologies are used to automatically transcribe, analyze, route and extract information from high-volume streams of spoken and written information. Equally...
PhD dissertation. — Purdue University, 2000. — 223 p. Some of the major research issues in the field of speech recognition revolve around methods of incorporating additional knowledge sources, beyond the short-time spectral information of the speech signal, into the recognition process. These knowledge sources, which may include information about prosody, language structure,...
Master Thesis. — University of Cambridge, 1997. — 67 p. This project investigates the problem of labelling segments in a speaker-tracking system. A mathematical representation of each segment is sought which encaptures the speaker-dependent information available. It is shown that both the covariance matrix and the Maximum Likelihood Linear Regression (MLLR) matrix provide...
John Wiley, 2009. — 181 p. State-of-the-art speech and language technology has reached a level that allows us to build interactive applications which the users can have short conversations with in order to search for information. We are already dealing with electronic banking facilities, information providing systems, restaurant guides, timetable services, assisting translation...
Morgan & Claypool, 2010. — 167 p. Considerable progress has been made in recent years in the development of dialogue systems that support robust and efficient human–machine interaction using spoken language. Spoken dialogue technology allows various interactive applications to be built and used for practical purposes, and research focuses on issues that aim to increase the...
Kluwer, 2002. — 193 p. As the performance of speaker-independent continuous speech recognition has improved over the last decade, increasing attention has been given to the poor recognition performance obtained for some speakers, noisy conditions and environments where the quality and the type of the communication channel is unknown. At the same time an increasing number of...
Kluwer, 2001. — 277 p. Consider a computer system that you can talk to using ordinary speech (either directly or perhaps using your telephone), and that you can ask questions concerning such things as timetables for public transportation. For example, you might ask the system the departure time of a train from Brussels to Amsterdam, specifying that you wish to arrive in...
Draft, 2nd edition: Prentice Hall, 2008 — 1024 p. An explosion of Web-based language techniques, the merging of distinct fields, the availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology – at all levels and with all modern technologies – this book...
PhD dissertation. — Boston University, 1997. — 186 p. The goal of this dissertation is to develop effective strategies for the adaptation of acoustic parameters for a large vocabulary continuous speech recognition (LVCSR) system from a small amount of speech. Typically this implies adapting a system characterized by millions of parameters from a few minutes of speech. This is...
PhD dissertation. — University of Illinois, 2010. — 131 p. The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the...
PhD dissertation. — University of Cambridge, 1998. — 97 p. Most modern speech recognition systems are based on hidden Markov models. Yet despite their widespread use many of their properties are not well understood. This work aims to increase our understanding about the training of hidden Markov models for classification. We first examine the question of what is the best...
Arunachal Pradesh: Technical and Scientific Publisher, 2017. — 11 p. Speech editing is nothing more than moving about some arrays of numbers. Enhancement filters can be used to remove both natural and intentional noise, to a reasonable extent. And pitch and formant analysis can be used to give a general idea of whether two speakers are the same person or not. There are also other...
John Wiley, 2002. — 407 p. Making machines speak like humans is a dream that is slowly coming to fruition. When the first automatic computer voices emerged from their laboratories twenty years ago, their robotic sound quality severely curtailed their general use. But now after a long period of maturation, synthetic speech is beginning to reach an initial level of acceptability....
Albany: Singular Publishing Group, 2001. — 319 p. An Introduction to the Study of Speech Acoustics. Acoustic Theory of Speech Production. Introduction to the Acoustic Analysis of Speech. The Acoustic Characteristics of Vowels and Diphthongs. The Acoustic Characteristics of Consonants. The Acoustic Correlates of Speaker Characteristics. Suprasegmental Properties of Speech. Speech...
PhD dissertation. — Hebrew University, 2007. — 110 p. Automatic speech recognition has long been a considered dream. While ASR does work today, and it is commercially available, it is extremely sensitive to noise, talker variations, and environments. The current state-of-the-art automatic speech recognizers are based on generative models that capture some temporal dependencies...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2008. — 178 p. The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem are the most successful examples of such systems. In this thesis, we present a principled framework for...
John Wiley, 2003. — 222 p. In general, voice transmission over the Internet protocol (IP), or VoIP, means transmission of real-time voice signals and associated call control information over an IP-based (public or private) network. The term IP telephony is commonly used to specify delivery of a superset of the advanced public switched telephone network (PSTN) services using IP...
Master Thesis. — Eidgenössische Technische Hochschule Zürich, 2003. — 119 p. Large vocabulary speech recognition systems traditionally represent words in terms of smaller subword units. During training and recognition they require a mapping table, called the dictionary, which maps words into sequences of these subword units. The performance of the speech recognition system...
PhD dissertation. — University of Cambridge, 2001. — 127 p. The work in this thesis concerns Named Entity (NE) recognition from speech and its use in the generation of enhanced speech recognition output with automatic punctuation and automatic capitalisation. A method for the automatic generation of rules is proposed for NE recognition. Punctuation marks are generated using...
PhD dissertation. — Massachusetts Institute of Technology, 2003. — 115 p. The singing voice is the oldest and most variable of musical instruments. By combining music, lyrics, and expression, the voice is able to affect us in ways that no other instrument can. As listeners, we are innately drawn to the sound of the human voice, and when present it is almost always the focal...
Master Thesis. — Temple University, 2001. — 41 p. Co-channel speech occurs when one speaker’s speech is corrupted by another speaker’s speech. Speech recognition systems, speaker identification systems, speech coding systems, gisting and natural language processing systems work on the basis that there is only one speaker’s speech. If there is more than one speaker (co-channel...
PhD dissertation. — University of California, 2003. — 178 p. This work concerns the automatic speech recognition (ASR) problem, which roughly speaking, consists in converting digitized speech into text. More specifically, we study front ends and acoustic modeling, which together with language modeling and search, constitute a typical ASR system. The main approach is to...
PhD dissertation. — Universität Oldenburg, 2003. — 205 p. Why are even the most advanced computers not able to understand speech nearly half as well as human beings? Even though the rapidly growing performance of microprocessors has enabled speech technology to exhibit major, revolutionary advancements within the last decades, we still are not able to communicate with a...
PhD dissertation. — Queensland University of Technology, 2010. — 237 p. Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In noise-free environments, word recognition performance of...
PhD dissertation. — Indian Institute of Technology, Madras, 2009. — 195 p. The primary mode of excitation of the vocal-tract system during speech production is due to the vibration of the vocal folds. For voiced speech, the most significant excitation takes place around the instant of glottal closure, called the epoch. The objective of this work is to extract the epoch...
PhD dissertation. — Technische Universität Berlin, 2008. — 221 p. It has long been a dream of many to be able to speak to a computer and be understood. Whereas this dream will remain in the realm of fantasy for a while, there are some applications which appear worthwhile as well as achievable. One of those is speech recognition in car environments, useful, as it may be used to...
Springer, 2011. — 387 p. Automatic speech recognition suffers from a lack of robustness with respect to noise, reverberation and interfering speech. The growing field of speech recognition in the presence of missing or uncertain input data seeks to ameliorate those problems by using not only a preprocessed speech signal but also an estimate of its reliability, to selectively...
Springer, 1997. — 367 p. Speech technology, the automatic processing of (spontaneously) spoken words and utterances, now is known to be technically feasible and will become the major tool for handling the confusion of languages. The economic implications of this tool are obvious, in particular in the multilingual European Union. Potential and current applications are dictation...
John Wiley, 2015. — 583 p. Emotion represents a psychological state of the human mind. Researchers from different domains have diverse opinions about the developmental process of emotion. Philosophers believe that emotion originates as a result of substantial (positive or negative) changes in our personal situations or environment. Biologists, however, consider our nervous and...
Springer, 2012. — 161 p. This book came out of approximately ten years of continuing research at Yamagata University. With the emergence of numerous algorithms for a variety of speech processing applications, such as coding, enhancement, and synthesis, a variety of distortion can now be observed. These disturbances degrade the speech quality in an unexpected manner. For...
Second Edition — John Wiley & Sons Ltd, 2004. — 459 p. This Second Edition continues to provide the fundamental technical background required for low bit rate speech coding and the hottest developments in digital speech coding techniques that are applicable to evolving communication systems. Features new chapters on Pitch Estimation and Voice-Unvoiced Classification of Speech,...
Springer, 2015. — 87 p. Voice-based call centers or business process outsourcing units generate huge amounts of speech data everyday during their day-to-day operations. Large and diverse types of information are hidden in these natural language conversations, which is begging to be exploited. The whole area of voice analytics deals with the aspect of deriving usable information...
Addison Wesley, 2003. — 155 p. Most people have experienced an automated speech-recognition system when calling a company. Instead of prompting callers to choose an option by entering numbers, the system asks questions and understands spoken responses. With a more advanced application, callers may feel as if they're having a conversation with another person. Not only will the...
Springer, 2017. — 233 p. — ISBN: 3319536117. This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general...
Springer, 2019. — 282 p. — ISBN: 978-3-030-15852-1. This book explores the processes of spoken language production and perception from a neurobiological perspective. After presenting the basics of speech processing and speech acquisition, a neurobiologically-inspired and computer-implemented neural model is described, which simulates the neural processes of speech processing...
Springer, 2013. — 134 p. During production of speech human beings impose emotional cues on the sequence of sound units to convey the intended message. Speech without emotional information is unnatural and monotonous. Most of the existing speech systems are able to process studio recorded neutral speech. However, in the present real world communication scenario, speech systems...
Springer, 2016. — 126 p. Speech enhancement is incorporated as an essential component in all voice communication devices to improve their performance in noisy environments. Speech enhancement is an important issue for mobile phones, hands-free telephones and also for hearing aids. It has been a challenging problem for researchers to develop new enhancement algorithms that...
ISTE/John Wiley, 2013. — 221 p. The preparation of this book was carried out while preparing an accreditation to supervise research. This is a synthesis covering the past 10 years of research, since my doctorate [LAN 04], in the field of man–machine dialogue. The goal here is to outline the theories, methods, techniques and challenges involved in the design of computer programs...
Kluwer, 1996. — 524 p. The term speech and speaker recognition often refers to the science and technology of developing algorithms and implementing them on machines to recognize the linguistic content in a spoken utterance and to identify the talker who speaks the utterance. Since speech is the most natural means of communication among human beings, it also plays a key role in the...
World Scientific, 2007. — 563 p. It is generally agreed that speech will play a major role in defining next-generation human-machine interfaces because it is the most natural means of communication among humans. To push forward this vision, speech research has enjoyed a long and glorious history spanning the entire twentieth century. As a result in the last three decades we...
PhD dissertation. — Massachusetts Institute of Technology, 2014. — 188 p. The ability to infer linguistic structures from noisy speech streams seems to be an innate human capability. However, reproducing the same ability in machines has remained a challenging task. In this thesis, we address this task, and develop a class of probabilistic models that discover the latent...
Springer, 1989. — 216 p. Speech Recognition has a long history of being one of the difficult problems in Artificial Intelligence and Computer Science. As one goes from problem solving tasks such as puzzles and chess to perceptual tasks such as speech and vision, the problem characteristics change dramatically: knowledge poor to knowledge rich; low data rates to high data rates;...
Master Thesis. — Massachusetts Institute of Technology, 1996. — 65 p. In an effort to reduce the degradation in speech recognition performance caused by variations in vocal tract shape among speakers, this thesis studies a set of low complexity, maximum likelihood based speaker normalization procedures. By approximately modeling the vocal tract as a simple acoustic tube, these...
PhD dissertation. — University of Cambridge, 1995. — 175 p. Hidden Markov models (HMMs) have been used successfully for speech recognition for many years. However, in some respects the assumptions behind HMM models are poor. HMMs model only the within-class data and no attempt is made at discriminating between classes. This is a problem, especially in speaker independent...
Master Thesis. — Helsinki University of Technology, 1999. — 113 p. Synthetic or artificial speech has been developed steadily during the last decades. Especially, the intelligibility has reached an adequate level for most applications, especially for communication impaired people. The intelligibility of synthetic speech may also be increased considerably with visual...
Springer, 2012. — 184 p. — ISBN 978-1-4614-4802-0, ISBN 978-1-4614-4803-7. Data driven methods have long been used in Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) synthesis and have more recently been introduced for dialogue management, spoken language understanding, and Natural Language Generation. Machine learning is now present end-to-end in Spoken Dialogue...
Springer, 2015. — 250 p. This book addresses the subject of emotional speech, especially its encoding and decoding process during interactive communication, based on an improved version of Brunswik’s Lens Model. The process is shown to be influenced by the speaker’s and the listener’s linguistic and cultural backgrounds, as well as by the transmission channels used. Through...
Academic Press, 2016. — 303 p. Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with...
Springer, 2012. — 264 p. This book is organized by research topic. Each chapter focuses on a major topic and can be read independently. Each chapter contains advanced algorithms along with real speech examples and evaluation results to validate the usefulness of the selected topics. Special attention has been given to the topics related to improving overall system robustness...
PhD dissertation. — University of Cambridge, 2007. — 181 p. It is well known that the performance of automatic speech recognition degrades in noisy conditions. To address this, typically the noise is removed from the features or the models are compensated for the noise condition. The former is usually quite efficient, but not as effective as the latter, often computationally...
PhD dissertation. — Technische Universität München, 2006. — 132 p. This thesis presents a system for the interpretation of natural speech which serves as input module for a spoken dialog system. It carries out the task of extracting application-specific pieces of information from the user utterance in order to pass them to the control module of the dialog system. By following...
IGI Global, 2009. — 573 p. It has been widely accepted that speech perception is a multimodal process and involves information from more than one sensory modality. The famous McGurk effect [McGurk and MacDonald, Nature 264(5588): 746–748, 1976] shows that visual articulatory information is integrated into our perception of speech automatically and unconsciously. For example, a...
Master Thesis. — China University of Science and Technology, 1996. — 78 p. This research identifies the problems encountered in transmitting voice over the Internet and proposes approaches to solve these problems. The current Internet is not very suitable for transmitting real-time data because its underlying protocols and switches were only engineered to transmit non-real...
PhD dissertation. — l’École Nationale Supérieure des Télécommunications, 2007. — 178 p. Speech is one of the most natural ways of communication for human beings. The task which extracts the intended message content in the signal is automatic speech recognition (ASR). Since the human speech carries not only the linguistic information but also the personal information such as the...
PhD dissertation. — University of Cambridge, 2005. — 158 p. Selecting the optimal model structure with the "appropriate" complexity is a standard problem for training large vocabulary continuous speech recognition (LVCSR) systems, and machine learning in general. State-of-the-art LVCSR systems are highly complex. A wide variety of techniques may be used which alter the system...
PhD dissertation. — Purdue University, 2004. — 253 p. Although speech recognition technology has significantly improved during the past few decades, current speech recognition systems output only a stream of words without providing other useful structural information that could aid a human reader and downstream language processing modules. This thesis research focuses on the...
PhD dissertation. — Massachusetts Institute of Technology, 2005. — 140 p. Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to...
PhD dissertation. — Cambridge University, 2001. — 157 p. This dissertation details the development and evaluation of techniques to enhance speech corrupted by unknown independent additive noise when only a single microphone is available. It therefore seeks to address a deficiency of many speech enhancement systems which require a priori knowledge of the interfering noise...
New York: Taylor & Francis, 2007. — 608 p. — ISBN: 0849350328, 9780849350320. The first book to provide comprehensive and up-to-date coverage of all major speech enhancement algorithms proposed in the last two decades, Speech Enhancement: Theory and Practice is a valuable resource for experts and newcomers in the field. The book covers traditional speech enhancement algorithms,...
New York: CRC Press, 2013. — 705 p. This text is, in part, an outgrowth of graduate course on speech signal processing at the University of Texas at Dallas since the fall of 1999. The fact that no textbook existed at the time on speech enhancement, other than a few edited books suitable for the experts, made it difficult to teach the fundamental principles of speech enhancement in...
PhD dissertation. — Cambridge University, 2010. — 191 p. In recent years, systems based on support vector machines (SVMs) have become standard for speaker verification (SV) tasks. An important aspect of these systems is the dynamic kernel. These operate on sequence data and handle the dynamic nature of the speech. In this thesis a number of techniques are proposed for improving...
Cambridge University Press, 2020. — 329 p. — ISBN: 978-1-108-42812-5. This book will help readers understand fundamental and advanced statistical models and deep learning models for robust speaker recognition and domain adaptation. This useful toolkit enables readers to apply machine learning techniques to address practical issues, such as robustness under adverse acoustic...
Springer, 2007. — 438 p. We are surrounded by sounds. Such a noisy environment makes it difficult to obtain desired speech and it is difficult to converse comfortably there. This makes it important to be able to separate and extract a target speech signal from noisy observations for both man–machine and human–human communication. Blind source separation (BSS) is an approach for...
PhD dissertation. — Johns Hopkins University, 2000. — 117 p. This thesis explores new ways of utilizing the information existing in word lattices produced by speech recognition systems to improve the accuracy of the recognition output and obtain a more perspicuous representation of a set of alternative hypotheses. We change the standard problem formulation of searching among a...
ISTE/John Wiley, 2009. — 505 p. This book, entitled Spoken Language Processing, addresses all the aspects covering the automatic processing of spoken language: how to automate its production and perception, how to synthesize and understand it. It calls for existing know-how in the field of signal processing, pattern recognition, stochastic modeling, computational linguistics,...
Springer, 1976. — 300 p. During the past ten years a new area in speech processing, generally referred to as linear prediction, has evolved. As with all scientific research, results did not always get published in a logical order and terminology was not always consistent. In mid-1974, we decided to begin an extra hours and weekends project of organizing the literature in linear...
Wiley-ISTE, 2021. — 208 p. — (Cognitive Science Series). — ISBN 978-1-78630-319-6. The text sets out in simple and accessible terms the various methods of acoustic analysis of speech, placing them in their historical context, allowing a better understanding of the mathematical and technical solutions adopted today in phonetics and experimental phonology. Without mathematical...
John Wiley, 2008. — 555 p. When the book Digital Speech Transmission – Enhancement, Coding and Error Concealment by Peter Vary and Rainer Martin appeared in 2006, it was clear that a subject of this importance and this range could not be treated in all its details on 600-some pages. Important aspects had to be left out and had to be postponed to a succeeding volume. The...
Springer, 2012. — 70 p. Human beings recognize speaker, language and speech using multiple cues present in speech signal and evidences are combined to arrive at a decision. Humans use several prosodic cues for these recognition tasks. But conventional automatic speaker, language and speech recognition systems mostly rely on spectral/cepstral features which are affected by...
Springer, 2019. — 70 p. — ISBN: 978-1-4614-1158-1. Human beings recognize speaker, language, emotion, and speech using multiple cues present in speech signal and evidences are combined to arrive at a decision. Humans use several prosodic cues for these recognition tasks. But conventional automatic speaker, language, emotion, and speech recognition systems mostly rely on...
Master Thesis. — Mississippi State University, 2008. — 70 p. In this work, nonlinear acoustic information is combined with traditional linear acoustic information to produce a noise-robust feature set for speech recognition. Classical acoustic modeling has relied on the assumption of linear acoustics where signal processing is performed in the signal's frequency domain....
Springer, 2004. — 431 p. The present coming of age of speech technologies coincides with the advent of mobile computing and the accompanying need for ubiquitous information access. This has generated enormous commercial interest around deploying speech interaction to IT-based services. In his book, Michael gives an in-depth review of the nuts and bolts of constructing speech...
PhD dissertation. — Kungliga Tekniska högskolan, Stockholm, 2006. — 350 p. Speaker verification is the biometric task of authenticating a claimed identity by means of analyzing a spoken sample of the claimant's voice. The present thesis deals with various topics related to automatic speaker verification (ASV) in the context of its commercial applications, characterized by...
University of Ljubljana, 2012. — 116 p. The two main objectives of this project are to analyse the efficiency of several techniques widely used in the field of emotion recognition from spoken audio signals and, secondly, to obtain empirical data showing that it is actually feasible to do so with a more than acceptable performance rate. For that purpose, our research will...
Master's thesis, Massachusetts Institute of Technology, 1991. — 89 p. One of the most critical and yet unsolved problems in phonetic recognition is the transformation of the continuous speech signal to a discrete representation for accessing words in the lexicon. In order to find an efficient description of speech for recognition tasks, our research investigates the use of...
PhD dissertation. — Faculté Polytechnique de Mons, 2004. — 143 p. Confidence measures for the results of speech/speaker recognition make the systems more useful in the real time applications. Confidence measures provide a test statistic for accepting or rejecting the recognition hypothesis of the speech/speaker recognition system. Speech/speaker recognition systems are usually...
Master's thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2009. — 91 p. The classical front-end analysis in speech recognition is a spectral analysis which produces feature vectors consisting of mel-frequency cepstral coefficients (MFCC). MFCC are based on a standard power spectrum estimate which is first subjected to a log-based transform of the...
InTech, 2008. — 576 p. After decades of research activity, speech recognition technologies have advanced in both the theoretical and practical domains. The technology of speech recognition has evolved from the first attempts at speech analysis with digital computers by James Flanagan’s group at Bell Laboratories in the early 1960s, through to the introduction of dynamic...
PhD dissertation. — Brno University of Technology, 2012. — 133 p. Statistical language models are a crucial part of many successful applications, such as automatic speech recognition and statistical machine translation (for example, the well-known Google Translate). Traditional techniques for estimating these models are based on N-gram counts. Despite known weaknesses of N-grams and...
John Wiley, 2002. — 403 p. Playing with a new technology is fun. I have been a teacher in one form or another for over 20 years, but it still gets me excited when I see something that seems so obvious and so simple that it is shocking it hasn’t been done before. That’s the way I feel about VoiceXML. VoiceXML makes it possible for anyone who can build a basic Web page to create...
Kluwer, 2004. — 104 p. The conjunction of several factors having occurred throughout the past few years will make humans significantly change their behavior vis-à-vis machines. In particular the use of speech technologies will become normal in the professional domain, but also in everyday life. The performance of speech recognition components has significantly improved: only...
Newnes, 2011. — 381 p. Voice over IP (VoIP) in particular and Voice over Packet (VoP) in general have been advocated and studied since the mid 1970s. It was the advent of DSP technology for voice compression in the late 1980s and early 1990s that gave these services the impetus they needed to enter the mainstream. Commercial-grade technologies and services started to appear in...
Master's thesis, Universität Karlsruhe, 2006. — 86 p. The following report describes the work at the interAct on Bulgarian speech recognition, including the collection of data, training a Bulgarian speech recognizer and experimenting with Russian text data to improve the recognition. It also gives an overview of the unique traits of the Bulgarian language, introduces the main...
Springer, 2018. — 120 p. This book shows ways of augmenting the capabilities of Natural Language Processing (NLP) systems by means of cognitive-mode language processing. The authors employ eye-tracking technology to record and analyze shallow cognitive information in the form of gaze patterns of readers/annotators who perform language processing tasks. The insights gained from...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2003. — 158 p. In this work, normalization techniques in the acoustic feature space are studied which improve the robustness of automatic speech recognition systems. It is shown that there is a fundamental mismatch between training and test data which causes degraded recognition performance. Adaptation and...
Springer, 2005. — 490 p. An increasing number of telephone services are offered in a fully automatic way with the help of speech technology. The underlying systems, called spoken dialogue systems (SDSs), possess speech recognition, speech understanding, dialogue management, and speech generation capabilities, and enable a more-or-less natural spoken interaction with the human...
Springer, 1991. — 402 p. Today there is a great deal of interest and excitement in the investigation of artificial neural networks. Yet, when things sort themselves out, neural networks will do less than their most fervent supporters in their most enthusiastic moments suggest. But they will do more than the most pessimistic estimates of their most adamant detractors. We will...
PhD dissertation. — Massachusetts Institute of Technology, 2002. — 178 p. Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management and spoken language generation. In such a conversational interface,...
Springer, 2013. — 59 p. A leading use of speech recognition technology is the conversion of large speech databases into text for indexing and retrieval purposes. Using a large vocabulary continuous speech recognition (LVCSR) engine seems to provide a natural solution, as speech can be fully converted into text and then indexed and searched. One method used for searching speech...
Master's thesis, Indian Institute of Technology, 2001. — 85 p. The thesis presents a novel situationally-aware multimodal spoken language system called Fuse that performs speech understanding for visual object selection. Fuse uses semantic information from immediate visual context to guide spoken language recognition and understanding. An experimental task was created in which...
IGI Global, 2010. — 342 p. As social scientists often define it, technology refers to devices and processes that extend our natural capabilities. Microscopes make it possible to see smaller things and telescopes enable us to see things that are further away. Cars extend the amount of space that we are able to travel far beyond where our feet can take us during a given period of...
Springer, 2007. — 362 p. The best way to introduce this textbook is by using the words Volker Dellwo and his colleagues had chosen to begin their chapter How Is Individuality Expressed in Voice? While they use this statement to motivate the introductory chapter on speech production and the phonetic description of speech, it constitutes a framework of the entire book as...
Springer, 2007. — 316 p. The best way to introduce this textbook is by using the words Volker Dellwo and his colleagues had chosen to begin their chapter How Is Individuality Expressed in Voice? While they use this statement to motivate the introductory chapter on speech production and the phonetic description of speech, it constitutes a framework of the entire book as...
John Wiley, 2008. — 592 p. Voice over IP (VoIP) gained popularity through actual deployments and by making use of VoIP-based telephone and fax calls with global roaming and connectivity via the Internet. Several decades of effort have gone into VoIP, and these efforts are benefitting real applications. Several valuable books have been published by experts in the field. While I...
Master's report, Carnegie Mellon University, 2005. — 46 p. Given a speech signal there are two kinds of information that may be extracted from it. On one hand there is the linguistic information about what is being said, and on the other there is also speaker specific information. This report deals with the task of speaker recognition where the goal is to determine which one of a...
Springer, 2010. — 490 p. Speech dereverberation has been on the agenda of the signal processing community for several years. It is only in the last decade, however, that the topic has really taken off, as seen from the growing number of publications appearing in the journals and at conferences. One of the reasons that the topic has become more popular is the rapidly growing...
PhD dissertation. — Carnegie Mellon University, 2004. — 101 p. Accurate recognition of spontaneous speech is one of the most difficult problems in speech recognition today. When speech is produced in a carefully planned manner, automatic speech recognition (ASR) systems are very successful at accurate recognition and transcription. In response to casual speech, ASR systems produce...
Springer, 2010. — 382 p. Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics provides a forum for today’s speech technology industry leaders – drawn from private enterprises and academic institutions all over the world – to discuss the challenges, advances, and aspirations of voice technology. The collection of essays contained in this volume...
Springer, 2013. — 72 p. AT&T, Yahoo! Research, and other companies, along with academicians, technology developers, and market analysts. They analyze the growing markets for mobile speech, new methodological approaches to the study of natural language, empirical research findings on natural language and mobility, and future trends in mobile speech. This book is divided into...
Springer, 2012. — 546 p. — ISBN: 978-1-4614-0263-3. Forensic Speaker Recognition: Law Enforcement and Counter-Terrorism is an anthology of the research findings of thirty-five speaker recognition experts from around the world. The book provides a multidimensional look at the complex science involved in determining whether a suspect’s voice truly matches forensic speech samples,...
Springer, 1983. — 503 p. This volume contains invited and contributed papers presented at the NATO Advanced Study Institute on "Recent Advances in Speech Understanding and Dialog Systems" held in Bad Windsheim, Federal Republic of Germany, July 5 to July 18, 1987. It is divided into the three parts Speech Coding and Segmentation, Word Recognition, and Linguistic...
PhD dissertation. — University of Cambridge, 2001. — 134 p. Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech...
EURASIP Journal on Advances in Signal Processing, 2010. — 94 p. Significant knowledge about microphone arrays has been gained from years of intense research and product development. There have been numerous applications suggested, for example, from large arrays (in the order of 100 elements) for use in auditoriums to small arrays with only 2 or 3 elements for hearing aids and...
PhD dissertation. — McGill University, 1991. — 169 p. Hidden Markov Models (HMMs) are one of the most powerful speech recognition tools available today. Even so, the inadequacies of HMMs as a "correct" modeling framework for speech are well known. In that context, we argue that the maximum mutual information estimation (MMIE) formulation for training is more appropriate...
IEEE Press, 2000. — 560 p. Speech communication is an interdisciplinary subject. Although much of the research material for the book comes from engineering literature (e.g., IEEE journals), a wide variety of sources is employed (especially for Chapters 3-5). The book is directed primarily at an engineering audience (e.g., to a final-year undergraduate or graduate course in...
Entropic Ltd., 1999. — 667 p. The HTK Application Programming Interface (HAPI) is a library of functions providing the programmer with an interface to any speech recognition system supplied by Entropic or developed using the Hidden Markov Model Toolkit (HTK). HTK is a set of Unix tools which are used to construct all the components of a modern speech recogniser. One of the...
PhD dissertation. — University of Cambridge, 1995. — 146 p. In recent years considerable progress has been made in the field of continuous speech recognition where the predominant technology is based on hidden Markov models (HMMs). HMMs represent sequences of time varying speech spectra using probabilistic functions of an underlying Markov chain. However because the probability...
Springer, 2015. — 336 p. This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech...
CRC Press, 2010. — 381 p. It is becoming increasingly apparent that all forms of communication—including voice—will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding....
PhD dissertation. — Západočeská univerzita v Plzni, 2008. — 125 p. This thesis deals with the problem of building language models for automatic continuous speech recognition of inflectional languages. Impressive progress has been made in large vocabulary continuous speech recognition in the last decades. However, recognition systems for English perform noticeably better than the other,...
Nova Science Publishers, 2022. — 240 p. — (Computer Science, Technology and Applications). Speech represents the most natural means of communication between humans. By using Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems, machines also become able to interact with humans using speech. This is of particular importance for building interactive robots or...
Master's thesis, Mississippi State University, 2000. — 135 p. Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noise-free environments is comparable to that achieved by human transcribers. This advancement in...
Master's thesis, Massachusetts Institute of Technology, 2002. — 79 p. This thesis is concerned with improving the performance of speaker recognition systems in three areas: speaker modeling, verification score computation, and feature extraction in telephone quality speech. We first seek to improve upon traditional modeling approaches for speaker recognition, which are based on...
PhD dissertation. — Massachusetts Institute of Technology, 2006. — 176 p. We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a pre-specified inventory of lexical units (i.e. phones or...
PhD dissertation. — Carnegie Mellon University, 2013. — 145 p. Speech is one of the most private forms of personal communication. A sample of a person’s speech contains information about the gender, accent, ethnicity, and the emotional state of the speaker apart from the message content. Speech processing technology is widely used in biometric authentication in the form of...
De Gruyter, 2019. — 287 p. — (Speech Technology and Text Mining in Medicine and Health Care). — ISBN 978-1-61451-759-7. Signal and Acoustic Modeling for Speech and Communication Disorders demonstrates how speech signal processing and acoustic modeling can be instrumental in early detection and successful intervention with speech deficits resulting from Parkinson’s disease,...
John Wiley, 2006. — 274 p. The total number of mobile phone subscribers worldwide is expected to exceed two billion in 2006. While ordinary voice calling remains the dominant application, mobile devices are becoming increasingly sophisticated, with features like multimedia messaging, cameras, web browsers, games, video, and music. The data capabilities of mobile networks are...
Master's thesis, University of Illinois, 2010. — 91 p. In this thesis, we describe a biometric authentication system that is capable of recognizing its users’ voice using advanced machine learning and digital signal processing tools. The proposed system can both validate a person’s identity (i.e. verification) and recognize it from a larger known group of people (i.e....
Author: Votrax. Year: unknown. Language: English. Format: PDF. Pages: 22. About the book: The SC-01 Speech Synthesizer is a completely self-contained solid state device. This single chip phonetically synthesizes continuous speech, of unlimited vocabulary, from low data rate inputs. Speech is synthesized by combining phonemes (the building blocks of speech) in the appropriate sequence. The...
Emerald Group, 2012. — 459 p. The last 15 years have seen a revolution in auditory physiology, but the new ideas have been slow to gain currency outside specialist circles. Undoubtedly, one of the main reasons for this has been the lack of a general source for non-specialists, and it is hoped that this book will bring current thinking to a much wider audience. While the book is...
The MIT Press, 2012. — 339 p. — ISBN: 978-0-262-01685-8. In English. In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech...
Springer, 2010. — 279 p. During the past years the mystery of emotions has increasingly attracted interest in research on human–computer interaction. In this work we investigate the problem of how to incorporate the user’s emotional state into a spoken language dialogue system. The book describes the recognition and classification of emotions and proposes models integrating...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2005. — 172 p. This thesis deals with linear transformations at various stages of the automatic speech recognition process. In current state-of-the-art speech recognition systems linear transformations are widely used to care for a potential mismatch of the training and testing data and thus enhance the...
Springer, 2015. — 187 p. If we want the vocal human–computer interaction to become more intuitive, it is inevitable to make the computer notice, interpret, and react to human ways of expression and patterns in communication beyond the recognition of the mere word strings. This is specifically important when it comes to subtle or hidden characteristics carrying connotations or...
PhD dissertation. — University of Cambridge, 2003. — 172 p. This thesis investigates the use of discriminative criteria for training HMM parameters for speech recognition, in particular the Maximum Mutual Information (MMI) criterion and a new criterion called Minimum Phone Error (MPE). Investigations are conducted into the practical issues relating to the use of MMI for speech...
Master's thesis, University of Cambridge, 1999. — 42 p. Most if not all speech recognition systems use Hidden Markov Models (HMM) to model the production of speech from sequences of phones or other basic units of speech. HMMs need to be trained, and this is done using speech utterances whose transcription is known. The most common method of training HMMs is known as...
Proceedings of the VI International Conference MEMSTECH 2010. — Lviv, Polyana, 2010. — P. 254-259. The possibility of combining the advantages of the formant and modulation methods of evaluating speech intelligibility is shown.
Proceedings of the Xth International Conference "Perspective Technologies and Methods in MEMS Design" (MEMSTECH 2014). — June 2014, Lviv, Ukraine. — P. 100-103. Enhancement of speech distorted by reverberation is a topical issue. Before suppression of late reverberation by spectral subtraction or frequency correction techniques, it is necessary to estimate the spectrum of the...
Master's thesis, Wilfrid Laurier University, 1989. — 177 p. Speech Recognition is a rapidly expanding field with many useful applications in man-machine interfacing. One of the main benefits of speech control is the flexibility and ease of use allowed an operator for any number of specific applications. Speech recognition units (SRU) are currently at a high level of...
Master's thesis, Universitetet i Trondheim, 1994. — 97 p. Automatic recognition of speech has come a long way from the first serious attempts at machine recognition of a few isolated words in the 1950's. Today, commercial recognizers capable of recognizing several tens of thousands of words spoken as isolated utterances are available on a PC platform and the first speaker...
Master's thesis, Helsinki University of Technology, 2007. — 66 p. The duration of phones plays a significant part in the comprehension of speech. Finnish, for example, has several word pairs which can be distinguished mainly by the duration of their phones. In automatic speech recognition, it is very important to detect these differences. Modern speech recognition systems,...
Prentice-Hall, 2002. — 800 p. Speech and hearing, man's most used means of communication, have been the objects of intense study for more than 150 years, from the time of von Kempelen's speaking machine to the present day. With the advent of the telephone and the explosive growth of its dissemination and use, the engineering and design of ever more bandwidth-efficient and...
New York: Prentice-Hall, 2006. — 802 p. Essential principles, practical examples, current applications, and leading-edge research. In this book, Thomas F. Quatieri presents the field's most intensive, up-to-date tutorial and reference on discrete-time speech signal processing. Building on his MIT graduate course, he introduces key principles, essential applications, and...
John Wiley, 2006. — 338 p. VoIP means transmitting speech over computer networks. In contrast to classical telephony, where research into the relation between physical transmission parameters, the resulting speech signal and the related speech quality has a longer tradition, speech quality of VoIP has only recently become an issue. The present book tries to merge knowledge of the...
Proceedings of the IEEE. — Feb 1989. — Volume 77, Issue 2. — p. 257-286. This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech...
Prentice-Hall International, Inc., Englewood Cliffs, New Jersey, 1993. — 507 p. From the preface of the book: "...the fundamental goal of the book would be to provide a theoretically sound, technically accurate, and reasonably complete description of the basic knowledge and ideas that constitute a modern system for speech recognition by machine."
Prentice Hall, 1978. — 512 p. A classic book on the digital processing of speech signals. Contents: Fundamentals of Digital Processing. Digital Models for Speech Signal. Time-Domain Methods for Speech Processing. Digital Representations of the Speech Waveform. Short-Time Fourier Analysis. Homomorphic Speech Processing. Linear Predictive Coding of Speech. Digital Speech Processing for Man-Machine...
NOWPress, 2007. — 194 p. — (Foundations and Trends in Signal Processing). A concise overview of modern approaches to digital speech processing. Since even before the time of Alexander Graham Bell’s revolutionary invention, engineers and scientists have studied the phenomenon of speech communication with an eye on creating more efficient and effective systems of human-to-human and...
Boston: Pearson, 2010. — 1060 p. Speech signal processing has been a dynamic and constantly developing field for more than 70 years. The earliest speech processing systems were analog systems. They included, for example, the Voder (voice demonstration recorder) for synthesizing speech by manual controls, developed by Homer Dudley and colleagues at Bell Labs in the 1930s and...
PhD dissertation. — Cambridge University, 2013. — 266 p. The discriminative approach to speech recognition offers several advantages over the generative, such as a simple introduction of additional dependencies and direct modelling of sentence posterior probabilities/decision boundaries. However, the number of sentences that can possibly be encoded into an observation sequence...
John Wiley, 2012. — 302 p. Advances in computing–in terms of both the creation of novel mathematical techniques and the design of data-driven technologies–have fuelled the ubiquitous development and deployment of speech technologies over the last two decades. Some of the core speech technologies and their applications to coding, recognition, synthesis, enhancement and such have...
Kluwer, 1995. — 471 p. The term speech processing refers to the scientific discipline concerned with the analysis and processing of speech signals for getting the best benefit in various practical scenarios. These different practical scenarios correspond to a large variety of applications of speech processing research. Examples of some applications include enhancement, coding,...
InTech, 2012. — 326 p. — ISBN: 9535108313, ISBN: 9789535108313. This book focuses primarily on speech recognition and the related tasks such as speech enhancement and modeling. This book comprises 3 sections and thirteen chapters written by eminent researchers from USA, Brazil, Australia, Saudi Arabia, Japan, Ireland, Taiwan, Mexico, Slovakia and India. Section 1 on speech...
InTech, 2012. — 149 p. Speech processing is the process by which speech signals are interpreted, understood, and acted upon. Interpretation and production of coherent speech are both important in the processing of speech. It is done by automated systems such as voice recognition software or voice-to-text programs. Speech processing includes speech recognition, speaker recognition,...
Springer, 1998. — 130 p. Once in a while, something nice happens, as if by coincidence, serendipitously. It happened to me when T.V. Raman asked me to supervise his Ph.D. thesis on building a system to speak documents, especially those with technical content or a lot of structure. The project had many interesting points, for example: the need for a programming language for writing...
Springer, 2015. — 156 p. "Ultra Low Bit-Rate Speech Coding" focuses on the specialized topic of speech coding at very low bit-rates of 1 Kbits/sec and less, particularly at the lower ends of this range, down to 100 bps. The authors set forth the fundamental results and trends that form the basis for such ultra low bit-rates to be viable and provide a comprehensive overview of...
Springer, 2012. — 136 p. During production of speech human beings impose durational constraints and intonation patterns on the sequence of sound units to convey the intended message. This inherent ability of the human beings in using the prosody (duration and intonation) knowledge is naturally acquired, and is difficult to articulate. But for synthesizing speech from a text by...
Springer, 2013. — 127 p. — ISBN 978-1-4614-6359-7, ISBN 978-1-4614-6360-3. Human beings use speech as a primary mode of communication for conveying messages. A speech signal carries multiple cues related to intended message, speaker and language identities, behavioural and emotional mood of the speaker and characteristics of background environment. Human beings exploit all...
Springer, 2014. — 129 p. Robust speech systems in mobile environment have gained a special interest in recent years in order to enable access to remote voice-activated services. In this context, three major challenges that need to be considered are: varying background conditions, speech coding, and transmission channel errors. In this book, we focus on improving the recognition...
Springer, 2015. — 119 p. This book discusses the contribution of excitation source information in discriminating language. The authors focus on the excitation source component of speech for enhancement of language identification (LID) performance. Language specific features are extracted using two different modes: (i) Implicit processing of linear prediction (LP) residual and...
Springer, 2017. — 100 p. The goal of developing a phone recognition system (PRS) is to derive the sequence of basic sound units from the speech signal. Most of the state-of-the-art PRSs are developed using spectral features such as Mel frequency cepstral coefficients. Spectral features mainly represent the gross shape of the vocal tract, but not the information related to the...
Lippincott Williams & Wilkins, 2011. — 416 p. Written in a clear, reader-friendly style, Speech Science Primer serves as an introduction to speech science and covers basic information on acoustics, the acoustic analysis of speech, speech anatomy and physiology, and speech perception. It also includes topics such as research methodology, speech motor control, and history/evolution...
PhD dissertation. — University of Cambridge, 2009. — 206 p. State-of-the-art speech recognition systems are based on statistical techniques and use hidden Markov models (HMMs) as acoustic models. These acoustic models are trained from a large amount of speech data usually collected from a large number of speakers and in different acoustic environments. The training data...
PhD dissertation. — Brown University, 2007. — 131 p. The problem addressed is the real-time labeling of talker identity for conversational speech from several talkers moving freely around a conference-sized room. Because the number of talkers and the identities of the talkers are unknown prior to system startup, labeling consists of marking each speech interval with a unique...
Proceedings of the 28th Annual Conference of the Cognitive Science Society. — Vancouver, 2006. — 6 p. Previous work provided corpus evidence for structural priming for specific syntactic constructions. The present paper extends these results by investigating priming effects involving arbitrary syntactic rules in spoken dialogue corpora. We demonstrate the existence of within-...
8th ELSNET Summer School, Chios Island, Greece, July 15-30 2000, Revised Lectures. — Springer, 2003. — 202 p. This book originated from the 8th ELSNET Summer School on Language and Communication that was held in the summer of 2000 on the island of Chios in Greece. ELSNET is the European Network in Human Language Technologies, a network of some 140 academic institutions and private...
PhD dissertation. — University of Toronto, 2008. — 269 p. Robust speech recognition in acoustic environments that contain multiple speech sources and/or complex non-stationary noise is a difficult problem, but one of great practical interest. The formalism of probabilistic graphical models constitutes a relatively new and very powerful tool for better understanding and...
Master's thesis, IDIAP Research Institute, 2015. — 45 p. The general subject of this work is to present mathematical methods encountered in automatic speech recognition (ASR). Learning, evaluation and decoding problems are important parts in ASR and need hidden Markov models to solve them. These processes are explained in the first chapter after some basic definitions. Because...
Kluwer, 1989. — 169 p. In order to perceive speech and other sounds, the incoming sound wave must be transformed into a variety of representations, each bringing forth different aspects of the signal, its source, and meaning. Understanding how we perceive and how machines can be made to perceive auditory signals means, in part, discovering appropriate representations for the...
Taylor & Francis, 2002. — 359 p. This book is about an aspect of applied scholarly endeavour, forensic phonetics, that carries with it very serious social responsibilities. The book makes it clear that forensic speaker identification requires scholarly expertise, and in several disparate areas. Expertise, like forensically useful fundamental frequency, is a long-term thing. It...
PhD dissertation. — Carnegie Mellon University, 1994. — 114 p. Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model’s parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other...
PhD dissertation. — University of Cambridge, 2004. — 157 p. Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated...
Springer, 1995. — 517 p. This book collects the contributions to the NATO Advanced Study Institute on New Advances and Trends in Speech Recognition and Coding, held in Bubión, Granada (Spain), from June 28th to July 10th 1993. The goal of the ASI was to bring together the most important experts on speech recognition and coding to discuss and disseminate their most recent...
Master's thesis, Massachusetts Institute of Technology, 2004. — 105 p. This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar...
Springer, 1997. — 399 p. This book presents a collection of papers from the Spring 1995 Workshop on Computational Approaches to Processing the Prosody of Spontaneous Speech, hosted by the ATR Interpreting Telecommunications Research Laboratories in Kyoto, Japan. The workshop brought together leading researchers in the fields of speech and signal processing, electrical...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 164 p. This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. These classes are formed by grouping together parts of the acoustic signal that have...
Springer, 2009. — 206 p. State-of-the-art automatic speech recognition (ASR) systems use statistical data-driven methods based on hidden Markov models (HMMs). Although such approaches have proved to be efficient choices, ASR systems often perform much worse than human listeners, especially in the presence of unexpected acoustic variability. To improve performance, we usually...
Master's thesis, Massachusetts Institute of Technology, 2000. — 107 p. This thesis explores the use of discriminative training to improve acoustic modeling in a segment-based speech recognizer. In contrast with the more commonly used Maximum Likelihood training, discriminative training considers the likelihoods of competing classes when determining the parameters for a...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2015. — 128 p. Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to find regular patterns which speakers exhibit in a multiparty interaction also known as...
PhD dissertation. — Johns Hopkins University, 2000. — 143 p. Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. For this reason, pronunciation modeling has received considerable attention in recent automatic speech recognition literature. Most of the attention...
Springer, 2014. — 199 p. Speech is a naturally occurring nonstationary signal that is not only essential for person-to-person communication but has also become an important aspect of Human Computer Interaction (HCI). Some of the issues related to analysis and design of speech-based applications for HCI have received widespread attention. With continuous upgradation of processing techniques,...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2000. — 155 p. In this work, a framework for efficient discriminative training and modeling is developed and implemented for both small and large vocabulary continuous speech recognition. Special attention will be directed to the comparison and formalization of varying discriminative training criteria and...
Springer, 2012. — 251 p. — ISBN10: 1461445922, ISBN13: 9781461445920. In Monitoring Adaptive Spoken Dialog Systems, authors Alexander Schmitt and Wolfgang Minker investigate statistical approaches that allow for recognition of negative dialog patterns in Spoken Dialog Systems (SDS). The presented stochastic methods allow a flexible, portable and accurate use. Beginning with the...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2006. — 157 p. In this work a number of novel techniques for improved treatment of spontaneous speech variabilities in large vocabulary automatic speech recognition are developed and evaluated on US English conversational speech and spontaneous medical dictations. Two main aspects of spontaneous speech modeling...
Springer, 2004. — 399 p. The first edition having been sold out, gives me a welcome opportunity to augment this volume by some recent applications of speech research. A new chapter, by Holger Quast, treats speech dialogue systems and natural language processing. Dictation programs for word processors, voice dialing for mobile phones, and dialogue systems for air travel...
John Wiley, 2014. — 345 p. It might be safe to claim that 20 years ago, neither the term ‘computational paralinguistics’ nor the field it denotes existed. Some 10 years ago, the term did not yet exist either. However, in hindsight, the field had begun to exist if we think of the first steps towards the automatic processing of emotions in speech in the mid-1990s. For example,...
Academic Press, 2006. Natural language processing from a multilingual perspective. Contents: Language Characteristics; Linguistic Data Resources; Multilingual Acoustic Modeling; Multilingual Dictionaries; Multilingual Language Modeling; Multilingual Speech Synthesis; Automatic Language Identification; Other Challenges: Non-native Speech, Dialects, Accents, and Local Interfaces; Speech-to-Speech...
PhD dissertation. — Massachusetts Institute of Technology, 2009. — 108 p. While automatic speech recognition (ASR) systems have steadily improved and are now in widespread use, their accuracy continues to lag behind human performance, particularly in adverse conditions. This thesis revisits the basic acoustic modeling assumptions common to most ASR systems and argues that...
Brain & Language. — 2013. — No. 124. — p. 174-183. Behavioral syntactic priming effects during sentence comprehension are typically observed only if both the syntactic structure and lexical head are repeated. In contrast, during production syntactic priming occurs with structure repetition alone, but the effect is boosted by repetition of the lexical head. We used fMRI to...
Springer, 2011. — 113 p. Soft Computing (SC) techniques have been recognized nowadays as attractive solutions for modeling highly nonlinear or partially defined complex systems and processes. These techniques resemble biological processes more closely than conventional (more formal) techniques. However, despite its increasing popularity, soft computing lacks a precise...
Dissertation. — Universitat Politècnica de Catalunya, 1985. — 250 p. There has been a substantial interest in the last few decades in the problem of training computers to recognize human speech. In spite of the concentrated efforts of conscientious teams of researchers, however, the solution remains elusive, unless the task is kept so restricted as to be uninteresting. These...
PhD dissertation. — University of Pennsylvania, 2007. — 133 p. Automatic speech recognition (ASR) depends critically on building acoustic models for linguistic units. These acoustic models usually take the form of continuous-density hidden Markov models (CD-HMMs), whose parameters are obtained by maximum likelihood estimation. Recently, however, there has been growing interest...
InTech, 2010. — 174 p. Speech processing has come a long way since the year of 1947, when R. K. Potter, G. A. Kopp, and H. Green from Bell Labs introduced the sound spectrograph, the first instrument to produce human voice-prints in the short-time Fourier-transform domain. Ever since, speech recognition has been constantly evolving. From isolated word recognition with small...
Doctoral thesis for the degree of PhD: 6D070400 – Computing Systems and Software. — Suleyman Demirel University. — Kaskelen, 2020. — 111 p. Scientific supervisors: Assoc. Prof. Kanat Kozhakmet, PhD; Paulo Menezes, PhD, Professor at Coimbra University (Portugal). Relevance: Emotions take a significant place in interpersonal human interactions and relationships. Emotion affects our...
PhD dissertation. — Massachusetts Institute of Technology, 2006. — 127 p. In this thesis, we have focused on improving the acoustic modeling of speech recognition systems to increase the overall recognition performance. We formulate a novel multi-stream speech recognition framework using multi-tape finite-state transducers (FSTs). The multi-dimensional input labels of the...
PhD dissertation. — Universitat Pompeu Fabra, 2009. — 103 p. The aim of a speech emotion recognizer is to produce an estimate of the emotional state of the speaker given a speech fragment as an input. In other words we seek a solution for the tricky problem: given a speech fragment how to know what the speaker is feeling, even if she did not intend us to know that. Speech...
Master's thesis, Carnegie Mellon University, 1995. — 40 p. This report describes a series of experiments that measure speech rate and that attempt to improve speech recognition accuracy for rapidly-spoken speech. Descriptions of several measures of speech rate are presented, with their advantages and disadvantages. Speech recognition results obtained using several...
PhD dissertation. — University of Cambridge, 2006. — 176 p. The most extensively and successfully applied acoustic model for speech recognition is the Hidden Markov Model (HMM). In particular, a multivariate Gaussian Mixture Model (GMM) is typically used to represent the output density function of each HMM state. For reasons of efficiency, the covariance matrix associated with...
Singapore: Springer, 2019. — 426 p. This book is about recent research in the area of profiling humans from their voice, which seeks to deduce and describe the speaker's entire persona and their surroundings from voice alone. It covers several key aspects of this technology, describing how the human voice is unique in its ability to both capture and influence the human persona...
Springer, 2010. — 177 p. Speech Processing has rapidly emerged as one of the most widespread and well-understood application areas in the broader discipline of Digital Signal Processing. Besides the telecommunications applications that have hitherto been the largest users of speech processing algorithms, several nontraditional embedded processor applications are enhancing their...
Master's thesis, Massachusetts Institute of Technology, 2004. — 187 p. Currently, most dialog systems are restricted to single user environments. This thesis aims to promote an untethered multi-person dialog system by exploring approaches to help solve the speech correspondence problem (i.e. who, if anyone, is currently speaking). We adopt a statistical framework in which this...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2003. — 199 p. In this work, the application of across-word phoneme models during large vocabulary continuous speech recognition is studied. A recognition system will be developed which allows for the training of high performance across-word phoneme models, the efficient application of these across-word phoneme...
Oxford University Press, 1994. — 314 p. The most sophisticated and efficient means of communication between humans is spoken natural language (NL). It is a rare circumstance when two people choose to communicate via another means when spoken natural language is possible. Ochsman and Chapanis [OC74] conducted a study involving two person teams solving various problems using...
Cambridge. Technical Report Number 740, 2009. ISSN: 1476-2986 The focus of this research is on analysis of a wide range of emotions and mental states from non-verbal expressions in speech. In particular, on inference of complex mental states, beyond the set of basic emotions, including naturally evoked subtle expressions and mixtures of expressions.
Springer, 2013. — 415 p. Summarising a research programme that lasted for more than 6 years is a demanding task due to the wealth of deliverables, publications and final results of each of the projects concerned. In addition to the content-related topics, which interest scientists, research programmes also lead to new insights for policy makers and programme managers. The former...
Master's thesis, Mississippi State University, 2002. — 91 p. Rapid advances in speech recognition theory, as well as computing hardware, have led to the development of machines that can take human speech as input, decode the information content of the speech, and respond accordingly. Real-time performance of such systems is often dominated by the evaluation of likelihoods in...
Master's thesis, Mississippi State University, 2006. — 94 p. Early human language technology systems were designed in a monolithic fashion. As these systems became more complex, this design became untenable. In its place, the concept of distributed processing evolved wherein the monolithic structure was decomposed into a number of functional components that could interact...
EURASIP Journal on Audio, Speech, and Music Processing, 2010. — 90 p. One of the most important aspects of spoken language is its large degree of variability. Variability in speech is caused by many different sources, for instance, changes of the acoustic environment or transmission channel and differences between speakers or various speaking styles. Successful speech processing...
Springer, 1996. — 682 p. This book is one outcome of the NATO Advanced Studies Institute (ASI) Workshop, "Speechreading by Man and Machine," held at the Chateau de Bonas, Castera-Verduzan (near Auch, France) from August 28 to September 8, 1995 - the first interdisciplinary meeting devoted to the subject of speechreading ("lipreading"). The forty-five attendees from twelve...
Springer, 2010. — 354 p. This book describes the development and evaluation of a novel type of spoken language dialogue system that proactively interacts in the conversation with two users. Spoken language dialogue systems are increasingly deployed in more and more application domains and environments. As a consequence, the demands posed on the systems are rising rapidly. In...
PhD dissertation. — Cambridge University, 2003. — 163 p. Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on using formant features. Formants are the resonant frequencies in the vocal tract which form the...
Springer, 2007. — 279 p. The last meeting of the Management Committee of the COST Action 277: Nonlinear Speech Processing was held in Heraklion, Crete, Greece, September 20–23, 2005 during the Workshop on Nonlinear Speech Processing (WNSP). This was the last event of COST Action 277. The Action started in 2001. During the workshop, members of the Management Committee and...
Springer, 2011. — 82 p. Spoken dialog systems have been the object of intensive research interest over the past two decades, and hundreds of scientific articles as well as a handful of text books such as [25, 52, 74, 79, 80, 83] have seen the light of day. What most of these publications lack, however, is a link to the real world, i.e., to conditions, issues, and environmental...
PhD dissertation. — Carnegie Mellon University, 1996. — 113 p. Speech recognition systems suffer from degradation in recognition accuracy when faced with input from noisy and reverberant environments. While most users prefer a microphone that is placed in the middle of a conference table, on top of a computer monitor, or mounted in a wall, the recognition accuracy obtained with...
Springer, 2013. — 278 p. Since the release of the first Internet Phone in 1995, Voice over Internet Protocol (VoIP) has grown exponentially, from a lab-based application to today’s established technology, with global penetration, for real-time communications for business and daily life. Many organisations are moving from the traditional PSTN networks to modern VoIP solutions...
Master's thesis, Mississippi State University, 2003. — 62 p. Supervised learning using Hidden Markov Models has been used to train acoustic models for automatic speech recognition for several years. Typically clean transcriptions form the basis for this training regimen. However, results have shown that using sources of readily available transcriptions, which can be erroneous...
CRC Press, 2000. — 798 p. Speech has evolved over a period of tens of thousands of years as the primary means of communication between human beings. Since the evolution of speech and of Homo sapiens have proceeded hand-in-hand, it seems reasonable to assume that human speech production mechanisms, and the resulting acoustic signal, are optimally adapted to human speech perception...
Springer, 2023. — 214 p. — (Artificial Intelligence: Foundations, Theory, and Algorithms). — ISBN 978-981-99-0826-4. Text-to-speech (TTS) synthesis is an Artificial Intelligence (AI) technique that renders a preferably naturally sounding speech given an arbitrary text. It is a key technological component in many important applications, including virtual assistants, AI-generated...
Springer, 2008. — 403 p. The remarkable advances in computing and networking have sparked an enormous interest in deploying Automatic Speech Recognition on Mobile Devices and Over Communication Networks, and the trend is accelerating. This yields an abundance of practical systems, operational algorithms and scientific publications. There is, however, no integrated book...
PhD dissertation. — Massachusetts Institute of Technology, 2005. — 123 p. Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall naturally into two categories....
Wiley, 2005. — xi, 342 p. — ISBN 978-0470012604. With a growing need for understanding the process involved in producing and perceiving spoken language, this timely publication answers these questions in an accessible reference. Containing material resulting from many years’ teaching and research, Speech Synthesis provides a complete account of the theory of speech. By bringing...
Cambridge University Press, 2009. — 642 p. Speech processing technology has been a mainstream area of research for more than 50 years. The ultimate goal of speech research is to build systems that mimic (or potentially surpass) human capabilities in understanding, generating and coding speech for a range of human-to-human and human-to-machine interactions. In the area of speech...
PhD dissertation. — Carnegie Mellon University, 1995. — 190 p. This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their...
PhD dissertation. — Queensland University of Technology, Australia, 2005. — 248 p. Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword...
Springer, 2018. — 82 p. With the invention of less expensive means of internet access, voice communication via social media is on the rise, which often comprises threats and distortions. Incorrect speaker/speech identification may sometimes lead to ambiguities in speaker identification and misunderstandings. Therefore, proper identification of speech is a must in speech...
2014. — 88 p. — ASIN B00NV4DZ86. Learn to love Dragon Naturally Speaking with just 100+ Commands. Get off to a flying start, improve your skills, speak with confidence - using this new 60 page, illustrated colour guide. Dragon speech recognition can transform the way people work with their computers - students, doctors, writers, family historians, people with dyslexia or...
Springer, 2013. — 142 p. Speech is the most natural mode of communication and yet attempts to build systems which support robust habitable conversations between a human and a machine have so far had only limited success. A key reason is that current systems treat speech input as equivalent to a keyboard or mouse, and behaviour is controlled by pre-defined scripts that try to...
Springer, 2008. — 176 p. Applications of Discrete Wavelet Transform and Wavelet Denoising to Speech Classification, Speech Enhancement and Robust Speech Recognition. In this work, we study the application of wavelet analysis for robust speech processing. Reliable time-scale features (TS) which characterize the relevant phonetic classes such as voiced (V), unvoiced (UV), silence...
John Wiley, 2011. — 471 p. Although there are a number of books and textbooks on speech processing or natural language processing (even some covering speech and language processing), there are no books focusing on spoken language understanding (SLU) approaches and applications. In that respect, living between two worlds, SLU has not received the attention it deserves in spoken language...
Radboud Repository of the Radboud University. — Nijmegen, 2020. — 5 p. The paper presents an implemented model for priming speech recognition, using contextual information about salient entities. The underlying hypothesis is that, in human-robot interaction, speech recognition performance can be improved by exploiting knowledge about the immediate physical situation and the...
Springer, 2021. — 180 p. — (T-Labs Series in Telecommunication Services). — ISBN 978-3-030-71388-1. Human information processing in speech quality assessment. This book provides a new multi-method, process-oriented approach towards speech quality assessment, which allows readers to examine the influence of speech transmission quality on a variety of perceptual and cognitive...
PhD dissertation. — University of Cambridge, 1995. — 170 p. Conventional speech recognition systems require information from two knowledge sources - a family of acoustic models and a language model. The acoustic models incorporate knowledge extracted from the speech waveform and they are commonly based on hidden Markov models (HMMs). HMMs have been used successfully for speech...
PhD dissertation. — Cambridge University, 2011. — 336 p. A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser’s distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model compensation techniques usually approximate the corrupted speech...
Springer, 2000. — 302 p. This book originates from the Fifth European Summer School on Language and Speech Communication that was held in the summer of 1997 in Leuven, Belgium, under the auspices of the European Language and Speech Network (ELSNET). The central topic of the summer school was "Lexicon Development for Language and Speech Processing"; the choice of this theme was...
Springer, 2005. — 371 p. The chapters in this book jointly contribute to what we shall call the field of natural and multimodal interactive systems engineering. This is not yet a well-established field of research and commercial development but, rather, an emerging one in all respects. It brings together, in a process that, arguably, was bound to happen, contributors from many...
Springer, 1995. — 589 p. Text-to-speech synthesis involves the computation of a speech signal from input text. Accomplishing this requires a system that consists of an astonishing range of components, from abstract linguistic analysis of discourse structure to speech coding. Several implications flow from this fact. First, text-to-speech synthesis is inherently...
2nd Edition. — Wiley, 2024. — 595 p. — ISBN 9781119060994. Enables readers to understand the latest developments in speech enhancement/transmission due to advances in computational power and device miniaturization. The Second Edition of Digital Speech Transmission and Enhancement has been updated throughout to provide all the necessary details on the latest advances in the...
John Wiley, 2006. — 644 p. The digital processing, storage, and transmission of speech signals have gained great practical importance. The main application areas are digital mobile radio, acoustic human–machine communication, and digital hearing aids. In fact, these applications are the driving force behind many scientific and technological developments in this field. A...
Springer, 2013. — 146 p. In this book, hierarchical structures based on neural networks are investigated for automatic speech recognition. These structures are mainly evaluated in the task of phoneme recognition under the Hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) paradigm. The baseline hierarchical scheme consists of two levels where each level is based on...
PhD dissertation. — Johns Hopkins University, 2005. — 172 p. Automatic Speech Recognition (ASR) is a sequential pattern recognition problem in which the patterns to be hypothesized are words while the evidence presented to the recognizer is the acoustics of a spoken utterance. Given an acoustic signal, a speech recognizer attempts to classify it as the sequence of words that...
John Wiley, 2013. — 501 p. The term computer speech recognition conjures up visions of the science-fiction capabilities of HAL 9000 in 2001: A Space Odyssey, or Data, the anthropoid robot in Star Trek, who can communicate through speech with as much ease as a human being. However, our real-life encounters with automatic speech recognition are usually rather less impressive,...
Springer, 2018. — 417 p. The recent progress on machine learning and signal processing has enabled the development of technologies for automatic analysis of sound scenes and events by computational means. This has attracted several research groups and companies to investigate this new field, which has potential in several applications and also has several research challenges....
Eamon Dolan/Houghton Mifflin Harcourt, 2019. — 259 p. — ISBN10: 1328799301, ISBN13: 978-1328799302. The next great technological disruption is coming. The titans of Silicon Valley are racing to build the last, best computer that the world will ever need. They know that whoever successfully creates it will revolutionize our relationship with technology—and make billions of dollars in...
Wai C. Chu. Speech Coding Algorithms: Foundation and Evolution of Standardized Coders. Mobile Media Laboratory, DoCoMo USA Labs, San Jose, California. Wiley & Sons. — 578 p. Speech coding is a highly mature branch of signal processing deployed in products such as cellular phones, communication devices, and more recently, voice over internet protocol. This book...
Morgan Kaufmann, 1990. — 630 p. Despite several decades of research activity, speech recognition still retains its appeal as an exciting and growing field of scientific inquiry. Many advances have been made during these past decades; but every new technique and every solved puzzle opens a host of new questions and points us in new directions. Indeed, speech is such an intimate...
Springer, 2013. — 207 p. In the present book, speech transmission quality is modeled on the basis of perceptual dimensions that are relevant for today’s public-switched and packet-based telecommunication systems. The complete transmission path from the mouth of the speaker to the ear of the listener is regarded, and both narrowband (300–3400 Hz) as well as wideband (50–7000 Hz)...
Karlsruher Institut Für Technologie, 2014. — 256 p. This thesis aims at enhancing and improving myoelectric Silent Speech recognition. Based on a standard speech recognition toolchain, we systematically develop methods and algorithms to adapt these components in a way specifically suited for the EMG signal. While our main goal is to improve the recognition accuracy of the Silent...
Master's thesis, Massachusetts Institute of Technology, 1998. — 77 p. In this thesis, eigenstructure based noise suppression techniques are developed to improve the performance of LPC spectral estimation of speech signals in the presence of additive white noise. LPC estimation error increases as the SNR of the speech signal decreases, thus affecting the performance of speech...
Kluwer, 2004. — 124 p. Speech is the most natural form of communication among humans. As machines become ever more capable and their use more widespread due to advances in computing, the need to allow natural communication between a human and a machine also gains critical significance. In order to realize such a system, it is essential that the speech communication process is well...
Master's thesis, Massachusetts Institute of Technology, 2009. — 90 p. This research explores applications of joint letter-phoneme subwords, known as graphones, in several domains to enable detection and recognition of previously unknown words. For these experiments, graphones models are integrated into the SUMMIT speech recognition framework. First, graphones are applied to...
Master's thesis, Massachusetts Institute of Technology, 2008. — 135 p. This thesis addresses the problem of obtaining an accurate spectral representation of speech formant structure when the voicing source exhibits a high fundamental frequency. Our work is inspired by auditory perception and physiological modeling studies implicating the use of temporal changes in speech by...
PhD dissertation. — Cambridge University, 2014. — 231 p. Model-based approaches are a powerful and flexible framework for robust speech recognition. This framework has been extensively investigated during the past decades and has been extended in a number of ways to handle distortions caused by various acoustic factors, including speaker differences, channel distortions and...
PhD dissertation. — Massachusetts Institute of Technology, 2001. — 191 p. The general goal of this thesis is to model the prosodic aspects of speech to improve human-computer dialogue systems. Towards this goal, we investigate a variety of ways of utilizing prosodic information to enhance speech recognition and understanding performance, and address some issues and difficulties...
Cambridge: Cambridge University Press, 2015. — 446 p. — ISBN 978-1-107-05557-5. With this comprehensive guide you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing. A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models and latent topic...
Cambridge University Press, 2015. — xxii, 424 p. — ISBN 978-1-107-05557-5. With this comprehensive guide you will learn how to apply Bayesian machine learning techniques systematically to solve various problems in speech and language processing. A range of statistical models is detailed, from hidden Markov models to Gaussian mixture models, n-gram models and latent topic...
Springer, 2017. — 436 p. — ISBN: 9783319646794. The text provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include...
New York: Springer, 2017. — 436 p. The text provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of...
Springer, 1987. — 168 p. This book has its origins in a programme of work conducted at British Telecom Research Laboratories, aimed at developing easily usable, intelligent systems, based on human-computer interaction via spoken and written language, particularly the former. This involved the authors, as members of the Human Factors Division, in conducting a series of...
PhD dissertation. — Stanford University, 1985. — 155 p. This thesis is concerned with how a person can listen to one person speaking in the presence of an interfering talker using a monaural recording of the conversation. Of course people have two ears, and the directional capabilities that a person gains from using two ears to focus on one talker are very important. However,...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2002. — 191 p. In this thesis, the use of word posterior probabilities for large vocabulary continuous speech recognition is investigated in a unified, statistical framework. The word posterior probabilities are directly derived from the sentence posterior probabilities which are an essential part of Bayes’...
PhD dissertation. — Katholieke Universiteit Nijmegen, 2002. — 149 p. Speech is variable. The way in which a sound, word or sequence of words is pronounced can be different every time it is produced (Strik and Cucchiarini 1999). This pronunciation variation can be the result of: Intra-speaker variability: the variation in pronunciation for one and the same speaker. Inter-speaker...
PhD dissertation. — Cambridge University, 1999. — 266 p. The thesis considers a novel technique for adaptation of speaker models, called eigenvoice decomposition (ED), based around reducing the dimension of the search space of acoustic models. The technique is compared both practically and theoretically with several other adaptation techniques. The use of Principal Component...
PhD dissertation. — University of Cambridge, 2000. — 141 p. This dissertation concerns the development of statistical language models for use in automatic speech recognition systems. Natural language, which is a complex and variable phenomenon, has been shown to be modelled best using statistical language models. Large training corpora (comprising around one hundred million...
PhD dissertation. — Massachusetts Institute of Technology, 1995. — 159 p. This thesis studies and interprets the inventory of acoustic events associated with the changing vocal-tract configurations that characterize fricatives preceding vowels. Theoretical considerations of the articulatory, aerodynamic and acoustic aspects of the production of fricatives provide the foundation...
PhD dissertation. — Cambridge University, 1999. — 156 p. Computer-assisted language learning (CALL) systems which are able to listen to a student's speech and to judge its quality would be very valuable for foreign language teaching. However, currently it is difficult to integrate pronunciation teaching and assessment in computer-assisted language learning systems. Two major...
Delmar, Cengage Learning, 2009. — 396 p. — ISBN: 1435427270. Understanding Voice Over IP Technology provides students with the in-depth knowledge of Voice over IP and the TCP/IP protocol that it is based on. Voice over IP technology, or making telephone calls over data networks such as the Internet, has now reached the tipping point, and is expected to eventually become the...
John Wiley, 2009. — 584 p. A serious book on modern speech technologies. As the authors of Distant Speech Recognition note, automatic speech recognition is the key enabling technology that will permit natural interaction between humans and intelligent machines. Core speech recognition technology has developed over the past decade in domains such as office dictation and...
PhD dissertation. — Johns Hopkins University, 2002. — 174 p. In this thesis, we have studied how to use non-local dependencies to improve the performance of language models and how to combine useful information obtained from different sources together in one framework using maximum entropy approaches. We have presented fast training methods to solve the problem of heavy...
PhD dissertation. — Queen’s University, Kingston, Ontario, Canada, 2009. — 126 p. Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. The area has received rapidly increasing research interest over the past few years. However, designing powerful spectral features for high-performance speech...
Elsevier, 2015. — 194 p. In the information communication field, speech communication via network becomes an important way to transfer information. With the development of information technology, speech communication is widely used for military, diplomatic, and economic purposes as well as in cultural life and scientific research. Therefore, speech secure communication and the...
Springer, 2014. — 215 p. Speech and hearing sciences are fundamental to numerous technological advances of the digital world in the past decade, from music compression in MP3 to digital hearing aids, from network based voice enabled services to speech interaction with mobile phones. Mathematics and computation are intimately related to these leaps and bounds. On the other hand,...
PhD dissertation. — University of Missouri-Columbia, 2007. — 101 p. In this dissertation work, new approaches are proposed for online large vocabulary conversational speech recognition, including a fast confusion network algorithm for aligning competing word hypotheses, novel features and a Random Forests based classifier for word confidence annotation, new improvements in...
PhD dissertation. — University of Cambridge, 2015. — 259 p. In continuous speech recognition, observations are sequential data with variable length, and labels are sequence of words (or sub-words) possibly having unbounded number of classes. It is thus impractical to robustly construct models for the whole word sequence. To address this problem, rather than treating the whole...
PhD dissertation. — École Polytechnique Fédérale de Lausanne, 2015. — 124 p. Speaker diarization is the task of identifying who spoke when in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers. Diagnostical studies on state-of-the-art diarization systems have isolated three main issues with the...
Kluwer, 1997. — 247 p. This book originates from the 2nd European Summer School on Language and Speech Communication that was held in the summer of 1994 in Utrecht, The Netherlands. During two weeks, 90 participants enjoyed 14 courses that were focussed on the theme "Corpus-Based Methods in Language and Speech Processing". The enthusiasm of the participants for the topic and the...
Springer, 2014. — 321 p. — ISBN10: 1447157788, ISBN13: 978-1-4471-5778-6. This book provides a comprehensive overview of the recent advancement in the field of automatic speech recognition with a focus on deep learning models including deep neural networks and many of their variants. This is the first automatic speech recognition book dedicated to the deep learning approach. In...
Master's thesis. — Massachusetts Institute of Technology, 2008. — 75 p. Efficient error correction of recognition output is a major barrier in the adoption of speech interfaces. This thesis addresses this problem through a novel correction framework and user interface. The system uses constraints provided by the user to enhance re-recognition, correcting errors with minimal user...
PhD dissertation. — University of Cambridge, 2006. — 194 p. In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it...
PhD dissertation. — Brown University, 2007. — 139 p. Talker recognition and microphone arrays have each been widely studied individually. The problem of distant-talking speech recognition using microphone arrays has become a topic of an increasing number of research papers recently. However, the problem of distant-talking speaker recognition is receiving much less attention. In...
PhD dissertation. — Cambridge University, 2014. — 221 p. Discriminative training criteria and discriminative models are two effective improvements for HMM-based speech recognition. This thesis proposed a structured support vector machine (SSVM) framework suitable for medium to large vocabulary continuous speech recognition. An important aspect of structured SVMs is the form of...
Master's thesis. — McGill University, 2000. — 53 p. Automatic speech recognition by machine has been a goal of speech researchers for more than 40 years. In recent years we have seen great advances in speech recognition technology. Some speech recognition techniques have entered into the market place and been used in applications such as command-and-control, credit-card...
PhD dissertation. — University of Illinois, 1999. — 119 p. The standard hidden Markov model (HMM) has been proved to be the most successful model for speech recognition. A most widely addressed problem of the HMM is the assumption of independent observations given the state sequence. In the past few years, a wide range of state-space models and graphical models, such as...
PhD dissertation. — Rheinisch-Westfälische Technische Hochschule, 2006. — 156 p. In this thesis, the use of multiple acoustic features of the speech signal is considered for speech recognition. The goals of this thesis are twofold: on the one hand, new acoustic features are developed, on the other hand, feature combination methods are investigated in order to find an effective...
Moscow: Radio i Svyaz Publishing, 2004. 164 p.
Abstract.
The book covers digital speech processing methods for forming a sequence of feature vectors, and two types of speech signal classification tasks: continuous speech recognition and speaker identification by voice. For feature vector formation, the main focus is on methods...
Moscow: Radio i Svyaz, 2004. — 164 p. The book covers digital speech processing methods for forming a sequence of feature vectors, and two types of speech signal classification tasks: continuous speech recognition and speaker identification by voice. For feature vector formation, the main focus is on methods for detecting and filtering...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences: 05.11.16 - Information-Measuring and Control Systems (instrument engineering). — Penza State University. — Penza, 2015. — 24 p. Scientific advisor: Dr. Tech. Sci., Professor P.P. Churakov. The aim of the dissertation research is to improve existing and develop new algorithms...
Dissertation for the degree of Candidate of Technical Sciences: 05.11.16 - Information-Measuring and Control Systems (instrument engineering). — Penza State University. — Penza, 2015. — 222 p. Scientific advisor: Dr. Tech. Sci., Professor P.P. Churakov. Introduction. Analytical review of algorithms for processing voice commands and of voice control systems. Analysis of the subject...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences, Ulyanovsk State Technical University (UlSTU), Ulyanovsk, 2006. - 19 p.
Specialty: 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific advisor: Doctor of Technical Sciences, Head of the CAD Department at UlSTU, Professor V. R. Krasheninnikov
The aim of the dissertation is to develop effective methods...
Moscow: Voenizdat, 1974. — 136 p.: ill. The brochure presents one of the most complex problems of our time: automatic recognition of speech signals and machine (artificial) reproduction of connected speech. The brochure covers all the main aspects of this problem; it formulates the prerequisites that created the need for technology enabling direct spoken communication between a human...
Kyiv: Naukova Dumka, 1987. – 264 p.
The monograph addresses automatic analysis, recognition, semantic interpretation, synthesis, and compressed transmission of speech signals as applied to spoken dialogue between a human and a computer in formalized and natural domain-specific languages, for use in human-machine systems for collecting and processing information and...
St. Petersburg: ITMO University, 2022. – 86 p. Contains a set of laboratory assignments covering several sections of the course "Speaker Recognition". It provides the basic theoretical background needed for each assignment, the content, procedure, and submission requirements of each laboratory task, review questions, lists of recommended reading, and examples of program code,...
Thesis for the qualification level of Master of Electronics.
Specialty: 8.090803 - Electronic Systems.
Scientific advisor: Aleksandr Anatolyevich Veligorsky.
Chernihiv, 2004.
Sections:
Analytical review of speech coding methods.
Principles of building systems with adaptive differential pulse-code modulation.
Influence of system parameters on the quality...
Derkach M.F., Gumetsky R.Ya., Gura B.M., Chaban M.E.
Lviv: Vyshcha Shkola, 1983. — 168 p.
The monograph examines dynamic spectrograms of sounds, syllables, words, and continuous phrases of Russian speech. The main focus is on how the work of the articulatory organs during the pronunciation of speech signals is reflected in spectrograms. Particular importance is attached to studying the dynamics of the articulatory...
Vyshcha Shkola, 1983. — 169 p. The monograph examines dynamic spectrograms of sounds, syllables, words, and continuous phrases of Russian speech. The main focus is on how the work of the articulatory organs during the pronunciation of speech signals is reflected in spectrograms. Particular importance is attached to studying the dynamics of the articulatory process as reflected in the dynamic spectra of speech....
Tbilisi: Metsniereba, 1974. — pp. 87-140. The book is the latest volume in a series of collections compiled at the Laboratory of Experimental Phonetics of the Institute of Linguistics of the Academy of Sciences of the Georgian SSR. The series is devoted to general phonetics and the phonetics of the Kartvelian languages. The first four collections were published in 1966-1973 under the titles "Problems of Speech Analysis and Synthesis" (1966), "Problems...
Article. — Elektronika i Svyaz. Thematic issue "Electronics and Nanotechnologies". — 2010. — No. 3. — pp. 152-159. The effect of noise and reverberation interference on the accuracy of measuring the level distribution function of a speech signal is investigated.
In 2 parts. — Study guide. — Minsk: BSUIR, 2008. — 44 p. This guide presents methodological recommendations for carrying out laboratory work in the course "Speech Interface". The first part includes laboratory assignments on the regularities of Russian speech and hands-on study of the main characteristics of the speech signal....
Dissertation for the degree of Doctor of Philosophy (PhD): 6D060200 – Computer Science. — L. N. Gumilyov Eurasian National University. — Astana: 2014. — 175 p. Scientific advisors: Dr. Tech. Sci., Professor A.Ә. Sharipbay; Dr. Phys.-Math. Sci., Professor V.Yu. Shelepov. The aim of this dissertation is a comprehensive study and...
Dissertation, St. Petersburg Institute for Informatics and Automation, 2007. — 176 p. The main aim of the dissertation is to develop a model for speaker-independent recognition of continuous Russian speech with a large vocabulary that speeds up speech processing while preserving recognition accuracy. To achieve this aim, in the course of the dissertation...
Kyiv: Poligraf Konsalting, 2005. — 138 p. The book presents a spectral-temporal description of speech signals as functions of several variables. It gives a solution to the problem of finding the parameters of the frequency function of the speech-production system in the one- and two-dimensional cases from the spectral function of the speech signal. It also presents some algorithms and a classification of knowledge-base tasks for recognition...
Moscow: Mir, 1985. — 237 p. — (In the World of Science and Technology). The book describes theoretical research and practical developments in speech synthesis technology. The author also gives concrete circuit diagrams of electronic units used in real speech synthesizers. Fundamentals of computer speech synthesis. How we speak. A little about linguistics. The ethics of behavior of a speech-synthesizing computer. A little...
Moscow: Mir, 1985. — 237 p. — (In the World of Science and Technology). The book describes theoretical research and practical developments in speech synthesis technology. The author also gives concrete circuit diagrams of electronic units used in real speech synthesizers. The book is addressed to a wide range of readers interested in the achievements of modern technology; it will be especially useful...
St. Petersburg: GUAP, 2013. — 314 p. The monograph outlines the range of problems associated with automatic analysis of conversational Russian speech in interactive dialogue systems. It describes methods for distant speech recording, accounting for pronunciation variability, compact vocabulary representation, and syntactic-statistical language modeling in systems of automatic...
Candidate of Technical Sciences dissertation: 05.13.11. — St. Petersburg Institute for Informatics and Automation. — St. Petersburg: 2011. — 137 p. The aim of the dissertation is to develop methods, algorithms, and software for acoustic-phonetic modeling of word pronunciation variability and syntactic-statistical language modeling in order to improve the accuracy of recognizing conversational...
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 2013. — 316 p.
The monograph outlines the range of problems associated with automatic analysis of conversational Russian speech in interactive dialogue systems. It describes methods for distant speech recording, accounting for pronunciation variability in conversational speech, compact vocabulary representation, as well as...
Article. — Naukoiemni Tekhnolohii. — 2015. — No. 3. — pp. 210-220. Konakhovych H. F. Comparative analysis of the Fourier transform, cosine transform, and wavelet transform as spectral analysis of digital speech signals / H. F. Konakhovych, O. I. Davletiants, O. Yu. Lavrynenko, D. I. Bakhtiiarov // Naukoiemni Tekhnolohii. — 2015. — No. 3. — pp. 210-220. It is proposed to use the method...
Guide to laboratory and practical classes for the course "Life Safety. Part 2: Information Security". TTI SFedU Publishing. Taganrog, 2011. 48 p. Intended for students of radio engineering specialties for studying the types, characteristics, design principles, and algorithmic models of analog time-domain and frequency-domain scramblers...
Article. — Information and Telecommunication Technologies and Mathematical Modeling, 2011. — pp. 135-137. The paper discusses the main stages of audio signal preprocessing. The implemented preprocessing module includes DC offset removal, filtering, and detection of voice command boundaries. This module will later be used in a system...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences. Ulyanovsk State Technical University (UlSTU), Ulyanovsk, 2007. - 19 p.
Specialty: 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific advisor: Doctor of Technical Sciences, Professor, Head of the CAD Department at UlSTU, V. R. Krasheninnikov.
The aim of the dissertation is to develop methods, algorithms, and...
Master's thesis, National Aviation University. — Kyiv, 2014. — 107 p. Specialty 8.05090302 "Telecommunication Systems and Networks". Scientific advisor: Dr. Tech. Sci., Prof. O.I. Davletiants. The thesis addresses current problems of speech signal compression and ways to improve its quality. Algorithms were developed to improve the quality of compression of speech...
Laboratory practicum. ‒ Vladimir: Vladimir State University named after Alexander and Nikolay Stoletovs, 2024. — 84 p. — ISBN 978-5-9984-1821-1. Algorithms for processing speech signals in telephony are studied by means of functional modeling in MATLAB. The main approaches to compressing speech data streams are considered:...
W. A. Lea, E. P. Neuburg, T. B. Martin, J. R. Welch, V. W. Zue, R. M. Schwartz, J. E. Shoup, A. R. Smith, M. R. Sambur, F. Hayes-Roth, G. Goodman, R. Reddy.
Methods of Automatic Speech Recognition: In 2 volumes. Translated from English / Ed. by W. Lea. – Moscow: Mir, 1983. – Vol. 1. 328 p., ill.
The monograph is written by leading specialists from the USA, France, Italy, Japan, and the Polish People's Republic in...
J. A. Barnett, M. I. Bernstein, et al.
Methods of Automatic Speech Recognition: In 2 volumes. Translated from English / Ed. by W. Lea. – Moscow: Mir, 1983. – Vol. 2. 392 p., ill.
The monograph is written by leading specialists from the USA, France, Italy, Japan, and the Polish People's Republic in the field of speech recognition. The Russian translation is published in two volumes. Volume 2 is devoted to specific systems...
Author's abstract of a dissertation.
Specialties:
05.13.01 – System Analysis, Control, and Information Processing (in science and technology)
05.11.16 – Information-Measuring and Control Systems (in industry and medicine)
The work was carried out at the State Educational Institution of Higher Professional Education "Izhevsk State Technical University" (GOU VPO...
Translated from English. — Ed. by Yu. N. Prokhorov and V. S. Zvezdin. — Moscow: Svyaz, 1980. — 308 p.: ill. The book gives a full treatment of the issues involved in processing speech signals using linear prediction methods. It presents speech analysis algorithms and procedures for synthesizing speech from a set of informative parameters, carried through to programs in FORTRAN. It considers...
Translated from English. — Ed. by Yu. N. Prokhorov and V. S. Zvezdin. — Moscow: Svyaz, 1980. — 308 p. The book gives a full treatment of the issues involved in processing speech signals using linear prediction methods. It presents speech analysis algorithms and procedures for synthesizing speech from a set of informative parameters, carried through to programs in FORTRAN. It considers issues...
Study guide. — St. Petersburg: ITMO University, 2017. — 45 p. These guidelines are intended for master's students in program 09.04.02 taking the course "Speech Synthesis". The laboratory assignments presented in the guidelines make up a practicum for the "Speech Synthesis" course and will help students become closely familiar with the technology of synthesizing intonated speech. In...
Monograph. — Kherson: FOP Vyshemyrskyi V.S. Publishing, 2018. — 168 p. Current methods of analyzing the human voice signal are reviewed. Modern methods of personal authentication based on voice signal analysis are studied. A local maxima method is developed that segments the voice signal more accurately than existing...
Article. — Rechevye Tekhnologii. — 2008. — No. 1. — pp. 93-113. — Rechevye Tekhnologii. — 2008. — No. 2. — pp. 81-96. The article analyzes the development of speech conversion technologies, mainly as applied to communications equipment, from the first vocoders of the late 1930s to the speech codecs of mobile systems. Contents: Introduction. First stage (1936-1952). Second stage (1953-1974). Quality...
Ed. by M. A. Sapozhkov. — Moscow: Radio i Svyaz, 1987. — 168 p.
Many research centers in the USSR and abroad are conducting intensive research on transmitting speech signals over narrowband communication channels, automatic recognition of voice commands in data processing and transmission systems, teaching people with hearing and speech impairments, non-native speakers, and others. This research is the subject of...
Author's abstract of a dissertation for the degree of Candidate of Technical Sciences. Specialty: 05.12.04 Radio Engineering, including Television Systems and Devices. — Vladimir, Yaroslavl State University (YarGU), 2011. — 20 p. Speech recognition systems are now becoming increasingly widespread, especially in applications where spoken dialogue is the most convenient means of control and exchange...
Kishinev: Shtiintsa, 1987. — 175 p. General issues of building automatic speech recognition and synthesis systems are considered. The book contains information on speech production and speech signals and on digital speech processing, and gives a brief description of modern domestic and foreign speech recognition and synthesis systems. The book is intended for the general reader, students of technical universities,...
Study guide for full-time students of the specialty "Electronic Computing Equipment". — Minsk: BSUIR, 2005. — 51 p. The guide describes algorithms used in speech processing: a speech detector, linear-prediction-based analysis, and vector quantization. Examples are given of applying vector quantization to...
Methodological guide for full-time students of the specialty "Electronic Computing Equipment". — Minsk: BSUIR, 2004. — 66 p. The guide covers psychoacoustically motivated methods of speech signal compression based on linear prediction and the discrete wavelet packet transform (DWPT). It presents the algorithmic support and software...
Translated by R. Popov, Kemerovo, 2000. - 79 p.
Original work published in 1993.
In this paper we review the components of signal processing algorithms. These algorithms are presented as part of a general overview of the signal parameterization problem, which is divided into three areas: measurement, transformation, and statistical modeling. To this end, the paper includes...
Monograph. — Moscow: State Publishing House of Literature on Communications and Radio, 1962. — 391 p. The monograph "Calculation and Measurement of Speech Intelligibility" presents the theory of intelligibility, with a qualitative and quantitative description of the properties and acoustic characteristics of speech and hearing that determine the amount of phonetic and semantic information transmitted over telephone and...
Moscow: Radio i Svyaz, 1989. — 248 p., ill. — ISBN: 5-256-00267-8. The monograph describes the current state of technology that uses spoken communication between a human and a machine (robot). This area of research and engineering is developing steadily in the most technologically advanced countries, which is due primarily to...
Moscow: Radio i Svyaz, 1989. — 248 p. The monograph describes the current state of technology that uses spoken communication between a human and a machine (robot). This area of research and engineering is developing steadily in the most technologically advanced countries, which is due primarily to the adoption of computing technology and...
3rd ed. — Moscow: KomKniga, 2012. — 328 p. The book is devoted to the problems of controlling technical devices by means of spoken language, which bears directly on the development of voice-controlled robotic systems. The work covers various aspects of the linguistic component in such systems. It emphasizes the particular importance of research in fundamental and...
MVP - A real-time voice-changing program, the best of its kind and highly configurable. It comes with an audio driver and is very easy to install. No viruses. Release year: 2012. Interface language: English. Crack: included. OS: Windows XP/Vista/7.
Voice-changing program AV VCS 4.0.48.
An interesting program that jokers, prank callers, and karaoke fans have long dreamed of! All you have to do is speak (or sing) into the microphone, and the program changes your voice in real time. Want to speak (or sing) with the voice of Baskov or Shufutinsky? No problem. Want to speak in a female or male voice? Go ahead!...
Article. — Systemy Obrobky Informatsii. — 2014. — No. 7(123). — pp. 59-66. In building automatic speech recognition systems, an important task is correcting speech signals distorted by reverberation, which requires first measuring the reverberation time. Blind measurements of reverberation time are less effective, in terms of speech recognition accuracy, compared with...
Proceedings of the XXXI International Scientific and Technical Conference "Electronics and Nanotechnology", April 12-14, 2011. Analytical and experimental studies of the algorithms of the formant-modulation method of speech intelligibility estimation are carried out.
Article. — Elektronika i Svyaz. — 2009. — pp. 18-25. The frequency distribution densities of formants for Ukrainian and Russian speech are estimated and compared.
Naukovi Visti NTUU "KPI". — 2013. — No. 1. — pp. 13-19. This paper looks for elements of a technology for comparatively fast construction of the audio part of noisy Ukrainian speech corpora. To this end, it examines the characteristics of the most widely used modern noisy speech corpora, which made it possible to formulate principles for developing such corpora. The correctness...
Article. — Elektronika i Svyaz. Thematic issue "Electronics and Nanotechnology". — 2010. — No. 2. — pp. 217-223. Shows that the advantages of the formant and modulation methods can be combined to create a new instrumental method of speech intelligibility estimation, the formant-modulation method.
Article. — Elektronika i Svyaz. Thematic issue "Problems of Electronics". Part 1. — 2008. — pp. 227-231. The main versions of the formant method of speech intelligibility estimation are compared.
Article. — Elektronika i Svyaz. — 2007. — No. 5. — pp. 63-70. A brief analytical review of the versions of the formant method for calculating and measuring speech intelligibility.
Elektronika i Svyaz, thematic issue "Electronics and Nanotechnology", part 2, 2009, No. 4-5, pp. 88-94.
The distribution functions of Ukrainian and Russian speech are compared. Preliminary conclusions are drawn about the comparative intelligibility of Ukrainian and Russian speech under identical interference conditions.
Elektronika i Svyaz, thematic issue "Problems of Electronics", part 1, 2007, pp. 137-141. Some aspects of calculating and measuring speech intelligibility at low signal-to-noise ratios are considered. Conditions for correct measurement of the speech signal distribution function are obtained.
Elektronika i Svyaz, thematic issue "Problems of Electronics", part 1, 2007, pp. 142-147. Some aspects of calculating and measuring speech intelligibility at low signal-to-noise ratios are considered. The procedure for converting speech signal distribution functions into perception coefficients is refined.
Izvestiya Vuzov. Radioelektronika, vol. 57, No. 2, 2014, pp. 55-59. The effectiveness of protective structures is usually assessed by the signal-to-interference ratio at the receiving point. This paper proposes using speech intelligibility instead, a criterion that is more convenient from the end user's point of view.
Article. — Elektronika i Svyaz. Thematic issue "Electronics and Nanotechnology". — 2011. — No. 2. — pp. 79-85. The procedure for estimating speech intelligibility by the formant-modulation method is described. Analytical expressions for the measurement error are obtained. Computer simulation of the corresponding measurement system is carried out. It is shown that the measurement results are well...
Elektronika i Svyaz. — 2011. — No. 6. — pp. 16-24. The traditional formant method and the new formant-modulation method of measuring speech intelligibility are compared in terms of measurement accuracy and speed.
Article. — Elektronika i Svyaz. — 2010. — No. 6, Part 2. — pp. 117-124. Schemes are developed for model-based and experimental studies of the comparative effectiveness of the formant and formant-modulation methods of speech intelligibility estimation.
Analytical and experimental studies are carried out on how reverberation interference affects the accuracy of measuring the distribution function of speech signal levels.
Article. — Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika. — 2015, vol. 58. — No. 7. — pp. 40-47. Using computer simulation, this paper develops recommendations for optimizing the estimation of the late reverberation spectrum with respect to criteria such as speech signal quality and automatic speech recognition accuracy.
Elektronika i Svyaz, vol. 20, No. 2(85), 2015, pp. 33-40. It is established that phase distortions of speech signals are acceptable to the human auditory system if the maximum difference in group delay between the high-frequency and low-frequency regions does not exceed 50 ms; in that case, interference between adjacent vowel and consonant sounds is practically imperceptible to the ear.
Moscow: Radio i Svyaz, 1981. — 496 p., ill.
Covers digital processing of speech signals in information transmission systems and in voice control of computers. Presents the problems of digital representation of speech signals: time sampling, interpolation, quantization, and digital filter design. Discusses methods of building digital transmission systems, systems...
Translated from English. Edited by M. V. Nazarov and Yu. N. Prokhorov. — Moscow: Radio i Svyaz, 1981. — 496 p., ill. Covers digital processing of speech signals in information transmission systems and in voice control of computers. Presents the problems of digital representation of speech signals: time sampling, interpolation, quantization, and digital filter design. Discusses...
Peter the Great St. Petersburg Polytechnic University, Institute of Computer Science and Technology, Department of Computer Intelligent Technologies, Timonin V. M., St. Petersburg, 2015, 80 p. The topic belongs to the field of digital speech signal processing. A new human voice morphing algorithm with additional capabilities is proposed and implemented. An application was written...
Tbilisi: Metsniereba, 1976. — 183 p. The monograph is devoted to the problem of automatic voice identification. It touches on a range of issues related to the study of individual voice characteristics that show up during a person's real speech activity. It discusses in detail the role of individual phonemes and their combinations, as well as of more complex semantic units of speech, in conveying...
St. Petersburg: ITMO University, 2014. — 92 p. This textbook covers technologies for synthesizing intonated speech. Speech synthesis is one of the most important tasks in speech processing and is widely used in modern information technology. The material is divided into 6 sections. It presents the history of the subject and the main stages in the development of automatic synthesis systems. The textbook...
Moscow: Svyazizdat, 1963. — 452 p. The book is devoted to speech transformations as applied to problems in communications engineering and cybernetics. It is intended for specialists in communications engineering, automation, and cybernetics, and for engineers, graduate students, and researchers studying speech transformation.
Moscow: State Publishing House of Literature on Communications and Radio, 1963. — 452 p. The book is devoted to speech transformations as applied to problems in communications engineering and cybernetics. It is intended for specialists in communications engineering, automation, and cybernetics, and for engineers, graduate students, and researchers studying speech transformation.
Moscow: Nauka, 1992. — 392 p. — ISBN 5-02-014665-X. Computer-based speech synthesis is an integral part of modern information technology. Speech synthesis methods are widely used in information and reference systems, computer-assisted learning systems, and so on. Readers of this book can become acquainted with various methods of modeling the processes...
St. Petersburg: NRU ITMO, 2021. – 101 p. The textbook is intended for master's students in the field "Information Systems and Technologies", specialization "Speech Information Systems". It presents the fundamentals of speech signal analysis and processing. The material provides a foundation for subsequent advanced courses in speech signal processing...
Textbook. — St. Petersburg: ITMO University, 2024. — 97 p. The book presents material from the second part of the lecture course "Digital Speech Signal Processing", given over a number of years to students in the field "Information Systems and Technologies". It assumes familiarity with the course "Digital Signal Processing". The book presents the main terms and...
Laboratory workshop guide – St. Petersburg: NRU ITMO, 2016. – 71 p.
The guide gives a brief description of a laboratory workshop on the digital speech signal processing methods used in speech analysis and synthesis systems, in automatic speech recognition systems, and in automatic speaker identification and verification systems.
The guide...
Textbook. — St. Petersburg: ITMO University, 2016. — 138 p. The textbook covers automatic speech recognition methods. The material is divided into 16 sections. The first two sections deal with speech production and perception by the auditory system. Each section gives brief theoretical and/or practical information. The textbook can be used in...
Textbook. — St. Petersburg: ITMO University, 2017. — 152 p. The textbook covers automatic speech recognition methods. The material is divided into 16 sections. The first two sections deal with speech production and perception by the auditory system. Each section gives brief theoretical and/or practical information. The textbook can be...
Lab manual. — Yaroslavl: YarGU, 2015. — 44 p. The publication describes the basics of the theory of blind separation of signal mixtures and provides practical assignments involving analytical calculations and computer simulation. The main focus is on approaches applicable to separating mixtures of audio signals recorded with several microphones. The lab manual is intended...
Lab manual. — Yaroslavl: YarGU, 2018. — 40 p. Provides brief theoretical background on a number of digital speech signal processing problems, along with practical assignments built around computer simulation. The lab manual is intended for students taking the course "Digital Speech Signal Processing". The material can also be used in preparing...
Moscow: Svyaz, 1968. — 395 p.
This monograph by J. Flanagan, a well-known American scientist, examines in detail a wide range of issues related to the properties of speech as an information carrier, its main parameters, and the problems of analysis, synthesis, and automatic recognition. The characteristics of speech communication channels are evaluated. Much attention is given to the problems...
Translated from English by A. A. Pirogov. — Moscow: Svyaz, 1968. — 397 p.
This monograph by J. Flanagan, a well-known American scientist, examines in detail a wide range of issues related to the properties of speech as an information carrier, its main parameters, and the problems of analysis, synthesis, and automatic recognition. The characteristics of speech communication channels are evaluated. Much attention...
Abstract of a dissertation for the degree of Candidate of Technical Sciences. UlGTU, Ulyanovsk, 2008. - 19 p.
Specialty: 05.13.18 Mathematical Modeling, Numerical Methods, and Software Packages
Scientific advisor: Doctor of Technical Sciences, Professor V. R. Krasheninnikov
The aim of the dissertation is to develop efficient algorithms for detecting RA boundaries...
Article. — Vestnik Molodykh Uchenykh. Tekhnicheskie Nauki. — 2005. — No. 08. — pp. 5-17. The paper gives a general classification of problems in the field of speech technologies and shows the characteristic features of the temporal structure of speech signals that matter most for building a mathematical model. A generalized model of the speech process is proposed that makes it possible to perform analytical...
Moscow: Radio i Svyaz, 2000. — 456 p. Covers the problems of digital speech processing and transmission in systems with compression, statistical multiplexing, and packet switching, in IP telephony, and in ATM and Frame Relay networks. Analyzes the design principles, characteristics, and operating features of waveform coders, vocoders, and hybrid coders implementing the CELP, LD-CELP, ACELP, MBE,...
Moscow: Infra-M, 2015. — 346 p. The monograph covers the theory, algorithms, and practical methods of implementing digital processing and recognition of speech signals. It presents the fundamentals of mathematical analysis of digital signals needed for speech processing. The acoustic theory of speech production is briefly outlined, with construction of a general discrete model. The main characteristic...
Article. — TIIER (Russian edition of Proceedings of the IEEE). — 1970, vol. 58. — No. 5. — pp. 111-117. Accurate estimation of both discrete and continuous parameters of speech signals has played a leading role in speech research and in speech processing technology. It is of interest that the most effective parameter estimation methods have often relied on intuition based on knowledge of the nature of speech signals and the process of their...
Report: Electronic Speech Recognition Systems. Contents: Introduction; Historical overview and main applications; Main speech recognition methods; Unsolved problems and a look into the future; References.