This example demonstrates a method of speech coding known as phonetic vocoding. This technique
compresses speech by using an HMM-based speech recogniser at the coder to quantise the speech into
phonetic units, and then uses the HMM model statistics at the decoder to reconstruct the spectral
envelope. Forty-nine phonemes are modelled by left-right, single-mixture Hidden Markov Models. The
recognition/encoding stage uses Adaptive Mel-cepstral Coefficients (AMC) plus energy and their delta
terms, while the synthesis/decoding stage uses models trained on Line Spectral Frequencies (LSF).
The phoneme index, state durations and speaker adaptation information are transmitted along with
prosody information. The pitch contour is coded using Piecewise Linear Approximation (PLA). The
example may be compared with the original speech and also with speech produced using the same
speech synthesiser without any quantisation of the spectral envelope or pitch parameters.
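
As an illustration of the pitch coding step, the following is a minimal sketch of one way a Piecewise Linear Approximation of a pitch contour could be computed. It is not the coder's actual algorithm, which is not described here. The function names (`pla_encode`, `pla_decode`), the recursive splitting strategy and the `max_error_hz` tolerance are all assumptions made for the example, and unvoiced frames are not handled.

```python
def pla_encode(f0, max_error_hz=5.0):
    """Approximate a pitch contour by straight-line segments.

    f0 is a list of pitch values in Hz, one per frame (hypothetical input format).
    Returns a list of (frame_index, pitch) breakpoints. Linear interpolation
    between consecutive breakpoints stays within max_error_hz of f0.
    """
    if not f0:
        raise ValueError("empty pitch contour")
    if len(f0) == 1:
        return [(0, f0[0])]

    def split(lo, hi):
        # Find the frame that deviates most from the chord joining lo and hi.
        worst_idx, worst_err = None, 0.0
        for i in range(lo + 1, hi):
            interp = f0[lo] + (f0[hi] - f0[lo]) * (i - lo) / (hi - lo)
            err = abs(f0[i] - interp)
            if err > worst_err:
                worst_idx, worst_err = i, err
        # Accept the segment if every frame is within tolerance.
        if worst_idx is None or worst_err <= max_error_hz:
            return [lo, hi]
        # Otherwise place a breakpoint at the worst frame and recurse.
        left = split(lo, worst_idx)
        right = split(worst_idx, hi)
        return left[:-1] + right

    return [(i, f0[i]) for i in split(0, len(f0) - 1)]


def pla_decode(breakpoints):
    """Rebuild a frame-by-frame contour from (frame_index, pitch) breakpoints."""
    contour = []
    for (a, fa), (b, fb) in zip(breakpoints, breakpoints[1:]):
        for i in range(a, b):
            contour.append(fa + (fb - fa) * (i - a) / (b - a))
    contour.append(breakpoints[-1][1])
    return contour


# A contour that rises for four frames and then falls is captured by three breakpoints.
contour = [100, 105, 110, 115, 120, 118, 116, 114, 112, 110]
points = pla_encode(contour, max_error_hz=1.0)
rebuilt = pla_decode(points)
```

In this sketch only the breakpoints would need to be transmitted, so a larger `max_error_hz` trades pitch accuracy for fewer breakpoints.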
Coding Sample 1: Male Speaker (each 134K)
(Ref: J. Dines and S. Sridharan, "A speaker independent phonetic vocoder for the English language", IEEE International
Symposium on Intelligent Signal Processing and Communication Systems (ISPACS-2000), pp. 696-701.)