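Automation foreign-literature translation: An improved speech recognition method for intelligent robots (edited and revised draft). Content summary: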

[...] problem of speaker recognition system. Speech recognition can be viewed as a pattern recognition task that includes training and recognition. The speech signal can be viewed as a time sequence and characterized by the powerful hidden Markov model (HMM). Through feature extraction, the speech signal is converted into feature vectors, which act as observations. In the training procedure, these observations are used to estimate the model parameters of the HMM. These parameters include the probability density functions of the observations for their corresponding states, the transition probabilities between the states, etc. After parameter estimation, the trained models can be used for the recognition task. The input observations are recognized as words, and the recognition accuracy can then be evaluated. The whole process is illustrated in Fig. 1.

Fig. 1 Block diagram of speech recognition system

3 Theory and method

Extraction of speaker-independent features from the speech signal is the fundamental problem of a speaker recognition system. The standard methodology for solving this problem uses Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). Both of these methods are linear procedures based on the assumption that speaker features have properties caused by the vocal tract resonances. These features form the basic spectral structure of the speech signal. However, the nonlinear information in speech signals is not easily extracted by the present feature extraction methodologies, so we use the fractal dimension to measure nonlinear speech turbulence. This paper investigates and implements a speaker identification system using both traditional LPCC and nonlinear multiscaled fractal dimension feature extraction.

3.1 Linear Predictive Cepstral Coefficients

The linear prediction coefficients (LPC) are a parameter set obtained from linear prediction analysis of speech. They describe the correlation between adjacent speech samples.
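To make the correlation between adjacent speech samples concrete, the following minimal sketch computes the normalized short-time autocorrelation of one analysis frame. The frame is a synthetic stand-in for voiced speech (a 200 Hz tone plus noise at an assumed 8 kHz sampling rate), and the function name `normalized_autocorrelation` is hypothetical, not taken from the paper.

```python
import math
import random


def normalized_autocorrelation(frame, max_lag):
    """Return r[k] / r[0] for k = 0..max_lag, computed on a mean-removed frame."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    r = [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(max_lag + 1)]
    return [v / r[0] for v in r]


# Assumption: synthetic 30 ms frame at 8 kHz, standing in for real speech.
random.seed(0)
frame = [
    math.sin(2 * math.pi * 200 * i / 8000) + 0.05 * random.gauss(0, 1)
    for i in range(240)
]
rho = normalized_autocorrelation(frame, 4)
print([round(v, 3) for v in rho])
```

A lag-1 value close to 1 indicates that each sample is largely predictable from its predecessor, which is the property that linear prediction analysis exploits.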
Linear prediction analysis is based on the following basic concept: a speech sample can be approximated by a linear combination of several past speech samples. By minimizing the sum of squared differences between the actual speech samples and the predicted samples within a short-time analysis frame, a unique set of prediction coefficients can be determined. The LPC coefficients can be used to estimate the cepstrum of the speech signal, which is a special processing method in short-time cepstral analysis of speech. The system function of the channel model is obtained by linear prediction analysis as in (1), where p is the linear prediction order, a_k (k = 1, 2, …, p) are the prediction coefficients, and h(n) is the impulse response. Taking the cepstrum of h(n), (1) can be expanded as (2). The cepstrum coefficients calculated according to (5) are called LPCC, where n is the LPCC order. Before extracting the LPCC parameters, the speech signal should undergo pre-emphasis, framing, windowing, endpoint detection, etc., so t…
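The processing chain described above (pre-emphasis, windowing, linear prediction, then cepstrum) can be sketched as follows. This is an illustrative sketch, not the paper's implementation. It assumes the all-pole convention H(z) = 1 / (1 - sum_{k=1..p} a_k z^-k), solves for the coefficients with the autocorrelation method and the Levinson-Durbin recursion, and uses the standard LPC-to-cepstrum recursion for that convention. The pre-emphasis factor 0.97, the Hamming window, the order 8, and all function names are assumptions, and endpoint detection is omitted.

```python
import math
import random


def pre_emphasize(samples, alpha=0.97):
    """High-pass filter y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value."""
    return [samples[0]] + [samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))]


def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[i] * frame[i + k] for i in range(len(frame) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor x^[n] = sum_k a_k x[n-k].

    Returns ([a_1..a_p], final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a, n_ceps):
    """Cepstrum c_1..c_n of h(n) for H(z) = 1 / (1 - sum a_k z^-k); gain term c_0 omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Assumption: synthetic two-tone frame at 8 kHz, standing in for real speech.
random.seed(1)
signal = [
    math.sin(2 * math.pi * 200 * i / 8000)
    + 0.5 * math.sin(2 * math.pi * 900 * i / 8000)
    + 0.05 * random.gauss(0, 1)
    for i in range(240)
]
frame = hamming(pre_emphasize(signal))
lpc, residual_energy = levinson_durbin(autocorrelation(frame, 8), 8)
lpcc = lpc_to_lpcc(lpc, 12)
print([round(v, 4) for v in lpcc])
```

As a sanity check on the recursion, a single-pole model with a_1 = 0.5 has the closed-form cepstrum c_n = 0.5^n / n, which `lpc_to_lpcc([0.5], 4)` reproduces. Note that the number of cepstral coefficients may exceed the prediction order p, since the recursion continues past n = p using only earlier cepstral values.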